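A subquery is a SELECT statement nested inside another SQL statement. The inner query (the subquery) supplies values that the outer query uses. A non-correlated subquery does not reference the outer query and can be run on its own; a correlated subquery references columns of the outer query and is logically evaluated once for each outer row.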
Subqueries can appear in the SELECT list and in the FROM, WHERE, and HAVING clauses.
A subquery can return a single value (a scalar), a single column of values (a list), or a full table of rows and columns.
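The HAVING clause is listed above but not demonstrated elsewhere, so here is a minimal sketch of one. It assumes Python's built-in sqlite3 module and a hypothetical Employee table with made-up salaries; only the SQL string is meant to carry over to other databases. The subquery returns a single value (the company-wide average), which HAVING compares with each department's average.

```python
import sqlite3

# In-memory SQLite database with a hypothetical Employee table (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (employee_name TEXT, salary INTEGER, department_id INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [
        ("Alice", 5000, 1), ("Bob", 3000, 1),
        ("Carol", 7000, 2), ("Dan", 9000, 2),
        ("Eve", 2500, 3), ("Frank", 1500, 3),
    ],
)

# Scalar subquery in HAVING: keep only the departments whose average salary
# is higher than the company-wide average (28000 / 6, about 4666.67).
query = """
SELECT department_id, AVG(salary) AS dept_avg
FROM Employee
GROUP BY department_id
HAVING AVG(salary) > (SELECT AVG(salary) FROM Employee)
ORDER BY department_id;
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(2, 8000.0)]
```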
Syntax:
SELECT column_name(s)
FROM table_name
WHERE column_name IN (SELECT column_name FROM table_name);
Let's break down the syntax:
- The outer query starts with SELECT, followed by the column(s) you want to retrieve, and names the source table in the FROM clause.
- The WHERE clause filters the rows based on a condition.
- Inside the WHERE clause, the IN operator compares a column from the outer query with the values returned by the inner subquery.
- The inner subquery is enclosed in parentheses and follows the IN operator. It can be any valid SELECT query that returns a single column; an outer row is kept when its value appears in that column.
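To see the IN form run end to end, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The Department and Employee tables, their columns, and the data are hypothetical, chosen only so the subquery's result is easy to check by eye.

```python
import sqlite3

# Hypothetical Department and Employee tables, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (department_id INTEGER, location TEXT);
CREATE TABLE Employee (employee_name TEXT, department_id INTEGER);
INSERT INTO Department VALUES (1, 'Nairobi'), (2, 'Lagos'), (3, 'Nairobi');
INSERT INTO Employee VALUES ('Alice', 1), ('Bob', 2), ('Carol', 3), ('Dan', 2);
""")

# The subquery returns one column (the IDs of departments in Nairobi);
# IN keeps the employees whose department_id appears in that list.
query = """
SELECT employee_name
FROM Employee
WHERE department_id IN (SELECT department_id FROM Department WHERE location = 'Nairobi')
ORDER BY employee_name;
"""
names = [row[0] for row in conn.execute(query)]
print(names)  # ['Alice', 'Carol']
```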
Execution Order
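In SQL, a non-correlated subquery like the one in the syntax above is logically evaluated first: the inner query produces its result independently, and the outer query then uses that result as a filter or criterion. In the syntax example, the query inside the parentheses runs first, and its result determines which rows the outer SELECT returns. The database's optimizer may rewrite the query internally (for example, into a join), but the result is the same. A correlated subquery, which references columns of the outer query, is instead logically evaluated once for each outer row.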
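The order of evaluation can be checked directly for a non-correlated subquery: run the inner query by itself, then pass its result into the outer query as a literal list, and the output matches the nested version. This sketch assumes Python's sqlite3 module and hypothetical Department and Employee tables; it demonstrates the logical equivalence, not how a particular database engine physically executes the query.

```python
import sqlite3

# Hypothetical tables in an in-memory SQLite database (made-up data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (department_id INTEGER, location TEXT);
CREATE TABLE Employee (employee_name TEXT, department_id INTEGER);
INSERT INTO Department VALUES (1, 'Nairobi'), (2, 'Lagos'), (3, 'Nairobi');
INSERT INTO Employee VALUES ('Alice', 1), ('Bob', 2), ('Carol', 3), ('Dan', 2);
""")

# Non-correlated: the inner query never mentions the outer query's tables.
inner = "SELECT department_id FROM Department WHERE location = 'Nairobi'"

# Step 1: run the inner query on its own.
ids = sorted(row[0] for row in conn.execute(inner))

# Step 2: run the full nested query.
nested = conn.execute(
    f"SELECT employee_name FROM Employee WHERE department_id IN ({inner}) "
    "ORDER BY employee_name"
).fetchall()

# Step 3: run only the outer query, passing the inner result as parameters.
placeholders = ", ".join("?" for _ in ids)
manual = conn.execute(
    f"SELECT employee_name FROM Employee WHERE department_id IN ({placeholders}) "
    "ORDER BY employee_name",
    ids,
).fetchall()

print(ids)               # [1, 3]
print(nested == manual)  # True
```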
Importance of Subqueries
- Comparing Groups to Summarized Values:
Subqueries are commonly used to compare aggregated or summarized values with individual records in a table. This enables you to filter or retrieve data based on conditions that involve group-level calculations.
SELECT e.employee_name, e.salary, e.department_id
FROM Employee e
WHERE e.salary > (
SELECT AVG(salary)
FROM Employee
WHERE department_id = e.department_id
);
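In the above example, we retrieve the name, salary, and department ID of every employee who earns more than the average salary of their own department. An aggregate function such as AVG cannot be used directly in a WHERE clause, so the average is computed in a subquery instead. Because the subquery references e.department_id from the outer query, it is a correlated subquery: it is logically evaluated for each employee row, computing the average for that employee's department, rather than once before the outer query runs.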
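A minimal sketch of the query above, run with Python's built-in sqlite3 module. The sample rows are made up; they are chosen so that each department has a different average (4000, 8000 and 2000), which makes the per-department comparison visible.

```python
import sqlite3

# In-memory SQLite database with a hypothetical Employee table (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (employee_name TEXT, salary INTEGER, department_id INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [
        ("Alice", 5000, 1), ("Bob", 3000, 1),
        ("Carol", 7000, 2), ("Dan", 9000, 2),
        ("Eve", 2500, 3), ("Frank", 1500, 3),
    ],
)

# Correlated subquery: e.department_id comes from the current outer row,
# so the inner AVG is computed for that employee's department only.
query = """
SELECT e.employee_name, e.salary, e.department_id
FROM Employee e
WHERE e.salary > (
    SELECT AVG(salary)
    FROM Employee
    WHERE department_id = e.department_id
)
ORDER BY e.employee_name;
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Alice', 5000, 1), ('Dan', 9000, 2), ('Eve', 2500, 3)]
```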
- Reshaping Data:
Subqueries can help you transform data from a long format to a wide format or vice versa, to suit specific analysis or reporting needs.
Sales table:
transaction_id | product_name | sales_date | quantity_sold |
1 | Product A | 2023-01-01 | 10 |
2 | Product B | 2023-01-01 | 5 |
3 | Product C | 2023-01-01 | 8 |
4 | Product A | 2023-01-02 | 15 |
5 | Product B | 2023-01-02 | 7 |
6 | Product C | 2023-01-02 | 9 |
In the table above, each product's sales for a given date are stored on separate rows. Suppose we want one row per date, with Product A, Product B, and Product C each in its own column. We can achieve that with scalar subqueries in the SELECT list:
SELECT DISTINCT sales_date,
(SELECT SUM(quantity_sold) FROM Sales s1 WHERE s1.sales_date = s.sales_date AND s1.product_name = 'Product A') AS "Product A",
(SELECT SUM(quantity_sold) FROM Sales s2 WHERE s2.sales_date = s.sales_date AND s2.product_name = 'Product B') AS "Product B",
(SELECT SUM(quantity_sold) FROM Sales s3 WHERE s3.sales_date = s.sales_date AND s3.product_name = 'Product C') AS "Product C"
FROM Sales s
ORDER BY sales_date;
The final output would be something like:
sales_date | Product A | Product B | Product C |
2023-01-01 | 10 | 5 | 8 |
2023-01-02 | 15 | 7 | 9 |
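To confirm the reshaped output, the sketch below loads the sample Sales table into an in-memory SQLite database through Python's sqlite3 module and runs the pivot query. Storing sales_date as TEXT and quoting the column aliases with double quotes are choices made for SQLite here.

```python
import sqlite3

# Load the sample Sales table from the article into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (transaction_id INTEGER, product_name TEXT, "
    "sales_date TEXT, quantity_sold INTEGER)"
)
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [
        (1, "Product A", "2023-01-01", 10),
        (2, "Product B", "2023-01-01", 5),
        (3, "Product C", "2023-01-01", 8),
        (4, "Product A", "2023-01-02", 15),
        (5, "Product B", "2023-01-02", 7),
        (6, "Product C", "2023-01-02", 9),
    ],
)

# One scalar subquery per product turns product rows into product columns.
query = """
SELECT DISTINCT sales_date,
  (SELECT SUM(quantity_sold) FROM Sales s1
    WHERE s1.sales_date = s.sales_date AND s1.product_name = 'Product A') AS "Product A",
  (SELECT SUM(quantity_sold) FROM Sales s2
    WHERE s2.sales_date = s.sales_date AND s2.product_name = 'Product B') AS "Product B",
  (SELECT SUM(quantity_sold) FROM Sales s3
    WHERE s3.sales_date = s.sales_date AND s3.product_name = 'Product C') AS "Product C"
FROM Sales s
ORDER BY sales_date;
"""
cur = conn.execute(query)
headers = [d[0] for d in cur.description]
rows = cur.fetchall()
print(headers)  # ['sales_date', 'Product A', 'Product B', 'Product C']
for row in rows:
    print(row)
# ('2023-01-01', 10, 5, 8)
# ('2023-01-02', 15, 7, 9)
```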
- Combining Data That Cannot Be Joined:
When tables cannot be joined directly, for example because they share no key column, subqueries let you combine values computed from each table into a single result.
Sales table:
transaction_id | sales_date | amount_sold |
1 | 2023-01-15 | 100.50 |
2 | 2023-02-10 | 75.25 |
3 | 2023-02-25 | 120.75 |
4 | 2023-03-05 | 90.00 |
Expenses table:
expense_id | expense_date | amount_spent |
1 | 2023-01-20 | 50.00 |
2 | 2023-02-15 | 60.75 |
3 | 2023-03-10 | 45.50 |
The two tables share no common key column, so there is no natural join condition between them, but we can still use a subquery to combine information from both, matching sales and expenses by month:
SELECT
m.month,
m.total_sales,
(SELECT SUM(e.amount_spent) FROM Expenses e WHERE EXTRACT(MONTH FROM e.expense_date) = m.month) AS total_expenses
FROM (
SELECT EXTRACT(MONTH FROM sales_date) AS month, SUM(amount_sold) AS total_sales
FROM Sales
GROUP BY EXTRACT(MONTH FROM sales_date)
) AS m
ORDER BY m.month;
The final output would be:
month | total_sales | total_expenses |
1 | 100.50 | 50.00 |
2 | 196.00 | 60.75 |
3 | 90.00 | 45.50 |
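A runnable sketch of the monthly comparison, assuming SQLite through Python's sqlite3 module. SQLite has no EXTRACT function, so strftime('%m', ...) is used to pull out the month instead. Sales are first totaled per month in a derived table, and a correlated subquery then looks up each month's expenses, which avoids referencing an ungrouped column from inside the subquery.

```python
import sqlite3

# Load the article's sample Sales and Expenses tables (dates stored as TEXT).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales (transaction_id INTEGER, sales_date TEXT, amount_sold REAL);
CREATE TABLE Expenses (expense_id INTEGER, expense_date TEXT, amount_spent REAL);
INSERT INTO Sales VALUES
  (1, '2023-01-15', 100.50), (2, '2023-02-10', 75.25),
  (3, '2023-02-25', 120.75), (4, '2023-03-05', 90.00);
INSERT INTO Expenses VALUES
  (1, '2023-01-20', 50.00), (2, '2023-02-15', 60.75), (3, '2023-03-10', 45.50);
""")

# strftime('%m', ...) returns the month as text ('01', '02', ...),
# so it is cast to INTEGER to compare months numerically.
query = """
SELECT m.month,
       m.total_sales,
       (SELECT SUM(e.amount_spent)
          FROM Expenses e
         WHERE CAST(strftime('%m', e.expense_date) AS INTEGER) = m.month) AS total_expenses
FROM (
    SELECT CAST(strftime('%m', sales_date) AS INTEGER) AS month,
           SUM(amount_sold) AS total_sales
    FROM Sales
    GROUP BY month
) AS m
ORDER BY m.month;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (1, 100.5, 50.0)
# (2, 196.0, 60.75)
# (3, 90.0, 45.5)
```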
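This approach works, but use it sparingly: a correlated subquery may be evaluated once per outer row, which can be slow on large tables. Common table expressions (CTEs, written with WITH) or joins on pre-aggregated results are often easier to read and can perform better, although actual performance depends on the database and its query planner.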
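For comparison, here is a sketch of the same monthly report written with CTEs, again assuming SQLite via Python's sqlite3 module (with strftime standing in for EXTRACT). Once each table is aggregated to one row per month, the month becomes a shared key, so the two results can be joined. Whether this runs faster than the subquery version depends on the database and the data size; the clearer gain is that each step has a name.

```python
import sqlite3

# Load the article's sample Sales and Expenses tables (dates stored as TEXT).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales (transaction_id INTEGER, sales_date TEXT, amount_sold REAL);
CREATE TABLE Expenses (expense_id INTEGER, expense_date TEXT, amount_spent REAL);
INSERT INTO Sales VALUES
  (1, '2023-01-15', 100.50), (2, '2023-02-10', 75.25),
  (3, '2023-02-25', 120.75), (4, '2023-03-05', 90.00);
INSERT INTO Expenses VALUES
  (1, '2023-01-20', 50.00), (2, '2023-02-15', 60.75), (3, '2023-03-10', 45.50);
""")

# Each CTE reduces one table to one row per month; the final SELECT joins
# them on that month. LEFT JOIN keeps months that have sales but no expenses.
query = """
WITH monthly_sales AS (
    SELECT CAST(strftime('%m', sales_date) AS INTEGER) AS month,
           SUM(amount_sold) AS total_sales
    FROM Sales
    GROUP BY month
),
monthly_expenses AS (
    SELECT CAST(strftime('%m', expense_date) AS INTEGER) AS month,
           SUM(amount_spent) AS total_expenses
    FROM Expenses
    GROUP BY month
)
SELECT s.month, s.total_sales, x.total_expenses
FROM monthly_sales AS s
LEFT JOIN monthly_expenses AS x ON x.month = s.month
ORDER BY s.month;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (1, 100.5, 50.0)
# (2, 196.0, 60.75)
# (3, 90.0, 45.5)
```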
Subqueries in the SELECT clause
SELECT employee_name, salary,
(SELECT AVG(salary) FROM Employee) AS avg_salary
FROM Employee;
A scalar subquery in the SELECT list returns a single value, which is displayed as a column in every row of the result set. It must return one column and at most one row (if it returns no rows, the value is NULL). For example, you can use a scalar subquery to calculate an average or find a maximum value and display it alongside each row, as above.
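A minimal sketch of the scalar subquery above, assuming Python's sqlite3 module and a hypothetical Employee table with made-up salaries. Because this subquery does not reference the outer row, every row shows the same average.

```python
import sqlite3

# In-memory SQLite database with a hypothetical Employee table (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (employee_name TEXT, salary INTEGER, department_id INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [
        ("Alice", 5000, 1), ("Bob", 3000, 1),
        ("Carol", 7000, 2), ("Dan", 9000, 2),
        ("Eve", 2500, 3), ("Frank", 1500, 3),
    ],
)

# The scalar subquery yields one value (the overall average salary),
# repeated as the avg_salary column on every row.
query = """
SELECT employee_name, salary,
       (SELECT AVG(salary) FROM Employee) AS avg_salary
FROM Employee
ORDER BY employee_name;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```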
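Subqueries in the FROM clause
A subquery in the FROM clause (often called a derived table) produces a result set that the outer query treats like a temporary table. Give it an alias, as with AS CategorySales below; many databases require one.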
SELECT category_name, total_quantity_sold
FROM (
SELECT category_name, SUM(quantity_sold) AS total_quantity_sold
FROM Products p
JOIN Categories c ON p.category_id = c.category_id
GROUP BY category_name
) AS CategorySales;
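The sketch below runs the derived-table query in SQLite through Python's sqlite3 module, with hypothetical Products and Categories tables and made-up quantities. It also shows the outer query filtering on total_quantity_sold, a column that exists only in the derived table.

```python
import sqlite3

# Hypothetical Categories and Products tables (made-up data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (category_id INTEGER, category_name TEXT);
CREATE TABLE Products (product_id INTEGER, product_name TEXT,
                       category_id INTEGER, quantity_sold INTEGER);
INSERT INTO Categories VALUES (1, 'Beverages'), (2, 'Snacks');
INSERT INTO Products VALUES (1, 'Tea', 1, 30), (2, 'Coffee', 1, 20), (3, 'Chips', 2, 15);
""")

# The derived table: one row per category with its total quantity sold.
derived = """
SELECT c.category_name, SUM(p.quantity_sold) AS total_quantity_sold
FROM Products p
JOIN Categories c ON p.category_id = c.category_id
GROUP BY c.category_name
"""

# The outer query reads CategorySales like an ordinary table...
all_rows = conn.execute(
    f"SELECT category_name, total_quantity_sold FROM ({derived}) AS CategorySales "
    "ORDER BY category_name"
).fetchall()

# ...including filtering on its computed column.
big_sellers = conn.execute(
    f"SELECT category_name, total_quantity_sold FROM ({derived}) AS CategorySales "
    "WHERE total_quantity_sold > 20"
).fetchall()

print(all_rows)     # [('Beverages', 50), ('Snacks', 15)]
print(big_sellers)  # [('Beverages', 50)]
```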
Subqueries in the WHERE clause
A subquery in the WHERE clause filters the outer query's rows based on a value, or list of values, that the subquery computes.
SELECT e.employee_name, e.salary, e.department_id
FROM Employee e
WHERE e.salary > (
SELECT AVG(e2.salary)
FROM Employee e2
WHERE e2.department_id = e.department_id
);
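Table aliases matter in correlated subqueries. In the sketch below (SQLite via Python's sqlite3 module, made-up data), the first query uses Employee.department_id inside a subquery that itself reads from Employee without an alias. SQL resolves that name to the innermost table, so the condition compares each inner row with itself and the subquery returns the company-wide average. The second query gives the inner and outer tables distinct aliases and gets a true per-department comparison.

```python
import sqlite3

# In-memory SQLite database with a hypothetical Employee table (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (employee_name TEXT, salary INTEGER, department_id INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [
        ("Alice", 5000, 1), ("Bob", 3000, 1),
        ("Carol", 7000, 2), ("Dan", 9000, 2),
        ("Eve", 2500, 3), ("Frank", 1500, 3),
    ],
)

# Pitfall: Employee.department_id resolves to the subquery's own Employee
# table, so the condition holds for every row here and AVG(salary) is the
# company-wide average (about 4666.67).
unaliased = """
SELECT employee_name
FROM Employee
WHERE salary > (
    SELECT AVG(salary)
    FROM Employee
    WHERE department_id = Employee.department_id
)
ORDER BY employee_name;
"""

# Fix: distinct aliases tie the subquery to the current outer row,
# so each employee is compared with their own department's average.
aliased = """
SELECT e.employee_name
FROM Employee e
WHERE e.salary > (
    SELECT AVG(e2.salary)
    FROM Employee e2
    WHERE e2.department_id = e.department_id
)
ORDER BY e.employee_name;
"""

overall_avg_names = [r[0] for r in conn.execute(unaliased)]
dept_avg_names = [r[0] for r in conn.execute(aliased)]
print(overall_avg_names)  # ['Alice', 'Carol', 'Dan']
print(dept_avg_names)     # ['Alice', 'Dan', 'Eve']
```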
Let's practice!