SQL - Order of Execution | Michael C. Yu

Introduction

In my career as a data analyst, I recognized that SQL is still the most powerful tool. Most of the time, working on exploratory analysis and validating hypotheses with data were the main drivers of impact.

This article will walk you through some of the basic concepts of SQL order of execution. I believe understanding this can be helpful for anyone new to data analytics.

Who is the best Chris?

Let’s go through an example together. You are an analyst at Maverick Corp. and just received a task from the sales manager.

Some employees are debating who is the top sales named Chris in the company right after watching the Avengers movie. The sales manager thinks it will be way more efficient to have the analyst provide the data but adds a constraint that only sales that have completed more than 1 transaction can count.

The sales table contains the employees’ names and unique identification numbers, and the transaction table contains the data for each transaction the sales have made.

Table: sales

id	name
1	Chris Hamsvalue
2	Chris Evians
3	Benedict Lumberjack
4	Chris P. Ratatouille
5	Chris Pint

Table: transactions

transaction_id	sales_id	amount_usd
1	1	10000
2	2	20000
3	4	20000
4	1	40000
5	5	80000
6	3	20000
7	2	40000
8	4	30000

This is the query you will write to get the required answer:

SELECT name AS sales_name,
       SUM(amount_usd) AS ttl_sales
FROM employee e
JOIN transactions t
ON e.id = t.sales_id
WHERE name LIKE 'Chris %'
GROUP BY name
HAVING COUNT(transaction_id) > 1
ORDER BY ttl_sales DESC
LIMIT 1;

However, the order of execution/operation is different from how you write the code.

FROM (JOIN) - Pick and join tables to get raw data.
WHERE - Filter the raw data.
GROUP BY - Aggregate the raw data.
HAVING - Filter the aggregated data.
SELECT - Return the filtered data.
ORDER BY - Sort the filtered data.
LIMIT - Limit the returned data to number of rows.

FROM/ JOIN

The first step is the statements in FROM clause followed by the JOIN clause. The outcome of these operations is combining the columns of the two tables.

FROM employee e
JOIN transactions t

id	name	transaction_id	sales_id	amount_usd
1	Chris Hamsvalue	1	1	10000
1	Chris Hamsvalue	2	2	20000
1	Chris Hamsvalue	3	4	20000
1	Chris Hamsvalue	4	1	40000
1	Chris Hamsvalue	5	5	80000
1	Chris Hamsvalue	6	3	20000
1	Chris Hamsvalue	7	2	40000
1	Chris Hamsvalue	8	4	30000
…

But this is not the data set you want. We work with relational databases. You need to specify the relation between tables. In this case, we state in ON clause that we only want the rows where the employee id matches the sales id.

ON e.id = t.sales_id

id	name	transaction_id	sales_id	amount_usd
1	Chris Hamsvalue	1	1	10000
1	Chris Hamsvalue	4	1	40000
2	Chris Evians	2	2	20000
2	Chris Evians	7	2	40000
3	Benedict Lumberjack	6	3	20000
4	Chris P. Ratatouille	8	4	30000
4	Chris P. Ratatouille	3	4	20000
5	Chris Pint	5	5	80000

WHERE

Now we have the qualified rows, and the next step is the WHERE clause. WHERE filters each row with the stated conditions. When rows do not return a true value, they will be removed from the data set. Benedict unfortunatly did not meet the requirements.

WHERE name LIKE 'Chris %'

id	name	transaction_id	sales_id	amount_usd
1	Chris Hamsvalue	1	1	10000
1	Chris Hamsvalue	4	1	40000
2	Chris Evians	2	2	20000
2	Chris Evians	7	2	40000
4	Chris P. Ratatouille	8	4	30000
4	Chris P. Ratatouille	3	4	20000
5	Chris Pint	5	5	80000

GROUP BY

GROUP BY clause aggregates rows of the data that have the same distinct value. You will notice that if we group by name, we can not directly group the other columns and this is quite often the reason why your query does not work. This clause is used with aggregation functions such as sum() or count() to show one value per aggregated field. Is this example, we will add the count() functions to the other columns, the function will return the number of values(rows) this column has.

GROUP BY name

count(id)	name	count(transaction_id)	count(sales_id)	count(amount_usd)
2	Chris Hamsvalue	2	2	2
2	Chris Evians	2	2	2
2	Chris P. Ratatouille	2	2	2
1	Chris Pint	1	1	1

HAVING

The HAVING clause is similar to the WHERE clause. But instead of filtering data by individual rows, it filters data by the aggregated rows created after the GROUP BY clause. Since Chris Pint only has one sales record, even with the highest sales, he is still filtered out.

HAVING COUNT(transaction_id) > 1

count(id)	name	count(transaction_id)	count(sales_id)	sum(amount_usd)
2	Chris Hamsvalue	2	2	50000
2	Chris Evians	2	2	60000
2	Chris P. Ratatouille	2	2	50000

SELECT

In the SELECT clause, we decide what columns will display as the results of the query, and if there are some functions like sum() or count(), etc to run.

SELECT 
	name AS sales_name,
	SUM(amount_usd) AS ttl_sales

sales_name	ttl_sales
Chris Hamsvalue	50000
Chris Evians	60000
Chris P. Ratatouille	50000

ORDER BY

The last is the ORDER BY clause. The top sales will logically have the highest sales number. We can set the ordering by ttl_sales in descending order to complete the objective.

ORDER BY ttl_sales DESC

sales_name	ttl_sales
Chris Evians	60000
Chris Hamsvalue	50000
Chris P. Ratatouille	50000

LIMIT

We want to know who the best Chris is. Set the LIMIT to 1 to only show the first row.

LIMIT 1

sales_name	ttl_sales
Chris Evians	60000

Conclusion

Not everything is covered here and there are a bunch of other details when writing queries. But these are the essential basics you should know to kickstart your learning. Hopefully, you find this article helpful!

SQL - Order of Execution