Introduction
In my career as a data analyst, I recognized that SQL is still the most powerful tool. Most of the time, working on exploratory analysis and validating hypotheses with data were the main drivers of impact.
This article will walk you through some of the basic concepts of SQL order of execution. I believe understanding this can be helpful for anyone new to data analytics.
Who is the best Chris?
Let’s go through an example together. You are an analyst at Maverick Corp. and just received a task from the sales manager.
Some employees are debating who is the top sales named Chris in the company right after watching the Avengers movie. The sales manager thinks it will be way more efficient to have the analyst provide the data but adds a constraint that only sales that have completed more than 1 transaction can count.
The sales table contains the employees’ names and unique identification numbers, and the transaction table contains the data for each transaction the sales have made.
Table: sales
id | name |
---|---|
1 | Chris Hamsvalue |
2 | Chris Evians |
3 | Benedict Lumberjack |
4 | Chris P. Ratatouille |
5 | Chris Pint |
Table: transactions
transaction_id | sales_id | amount_usd |
---|---|---|
1 | 1 | 10000 |
2 | 2 | 20000 |
3 | 4 | 20000 |
4 | 1 | 40000 |
5 | 5 | 80000 |
6 | 3 | 20000 |
7 | 2 | 40000 |
8 | 4 | 30000 |
This is the query you will write to get the required answer:
SELECT name AS sales_name,
SUM(amount_usd) AS ttl_sales
FROM employee e
JOIN transactions t
ON e.id = t.sales_id
WHERE name LIKE 'Chris %'
GROUP BY name
HAVING COUNT(transaction_id) > 1
ORDER BY ttl_sales DESC
LIMIT 1;
However, the order of execution/operation is different from how you write the code.
- FROM (JOIN) - Pick and join tables to get raw data.
- WHERE - Filter the raw data.
- GROUP BY - Aggregate the raw data.
- HAVING - Filter the aggregated data.
- SELECT - Return the filtered data.
- ORDER BY - Sort the filtered data.
- LIMIT - Limit the returned data to number of rows.
FROM/ JOIN
The first step is the statements in FROM clause followed by the JOIN clause. The outcome of these operations is combining the columns of the two tables.
FROM employee e
JOIN transactions t
id | name | transaction_id | sales_id | amount_usd |
---|---|---|---|---|
1 | Chris Hamsvalue | 1 | 1 | 10000 |
1 | Chris Hamsvalue | 2 | 2 | 20000 |
1 | Chris Hamsvalue | 3 | 4 | 20000 |
1 | Chris Hamsvalue | 4 | 1 | 40000 |
1 | Chris Hamsvalue | 5 | 5 | 80000 |
1 | Chris Hamsvalue | 6 | 3 | 20000 |
1 | Chris Hamsvalue | 7 | 2 | 40000 |
1 | Chris Hamsvalue | 8 | 4 | 30000 |
… |
But this is not the data set you want. We work with relational databases. You need to specify the relation between tables. In this case, we state in ON clause that we only want the rows where the employee id matches the sales id.
ON e.id = t.sales_id
id | name | transaction_id | sales_id | amount_usd |
---|---|---|---|---|
1 | Chris Hamsvalue | 1 | 1 | 10000 |
1 | Chris Hamsvalue | 4 | 1 | 40000 |
2 | Chris Evians | 2 | 2 | 20000 |
2 | Chris Evians | 7 | 2 | 40000 |
3 | Benedict Lumberjack | 6 | 3 | 20000 |
4 | Chris P. Ratatouille | 8 | 4 | 30000 |
4 | Chris P. Ratatouille | 3 | 4 | 20000 |
5 | Chris Pint | 5 | 5 | 80000 |
WHERE
Now we have the qualified rows, and the next step is the WHERE clause. WHERE filters each row with the stated conditions. When rows do not return a true value, they will be removed from the data set. Benedict unfortunatly did not meet the requirements.
WHERE name LIKE 'Chris %'
id | name | transaction_id | sales_id | amount_usd |
---|---|---|---|---|
1 | Chris Hamsvalue | 1 | 1 | 10000 |
1 | Chris Hamsvalue | 4 | 1 | 40000 |
2 | Chris Evians | 2 | 2 | 20000 |
2 | Chris Evians | 7 | 2 | 40000 |
4 | Chris P. Ratatouille | 8 | 4 | 30000 |
4 | Chris P. Ratatouille | 3 | 4 | 20000 |
5 | Chris Pint | 5 | 5 | 80000 |
GROUP BY
GROUP BY clause aggregates rows of the data that have the same distinct value. You will notice that if we group by name, we can not directly group the other columns and this is quite often the reason why your query does not work. This clause is used with aggregation functions such as sum() or count() to show one value per aggregated field. Is this example, we will add the count() functions to the other columns, the function will return the number of values(rows) this column has.
GROUP BY name
count(id) | name | count(transaction_id) | count(sales_id) | count(amount_usd) |
---|---|---|---|---|
2 | Chris Hamsvalue | 2 | 2 | 2 |
2 | Chris Evians | 2 | 2 | 2 |
2 | Chris P. Ratatouille | 2 | 2 | 2 |
1 | Chris Pint | 1 | 1 | 1 |
HAVING
The HAVING clause is similar to the WHERE clause. But instead of filtering data by individual rows, it filters data by the aggregated rows created after the GROUP BY clause. Since Chris Pint only has one sales record, even with the highest sales, he is still filtered out.
HAVING COUNT(transaction_id) > 1
count(id) | name | count(transaction_id) | count(sales_id) | sum(amount_usd) |
---|---|---|---|---|
2 | Chris Hamsvalue | 2 | 2 | 50000 |
2 | Chris Evians | 2 | 2 | 60000 |
2 | Chris P. Ratatouille | 2 | 2 | 50000 |
SELECT
In the SELECT clause, we decide what columns will display as the results of the query, and if there are some functions like sum() or count(), etc to run.
SELECT
name AS sales_name,
SUM(amount_usd) AS ttl_sales
sales_name | ttl_sales |
---|---|
Chris Hamsvalue | 50000 |
Chris Evians | 60000 |
Chris P. Ratatouille | 50000 |
ORDER BY
The last is the ORDER BY clause. The top sales will logically have the highest sales number. We can set the ordering by ttl_sales in descending order to complete the objective.
ORDER BY ttl_sales DESC
sales_name | ttl_sales |
---|---|
Chris Evians | 60000 |
Chris Hamsvalue | 50000 |
Chris P. Ratatouille | 50000 |
LIMIT
We want to know who the best Chris is. Set the LIMIT to 1 to only show the first row.
LIMIT 1
sales_name | ttl_sales |
---|---|
Chris Evians | 60000 |
Conclusion
Not everything is covered here and there are a bunch of other details when writing queries. But these are the essential basics you should know to kickstart your learning. Hopefully, you find this article helpful!