Subqueries

A subquery is a SELECT statement nested inside another SQL statement. The inner query is enclosed in parentheses. A non-correlated subquery can run on its own, independently of the outer query, while a correlated subquery refers to columns of the outer query.

Subqueries can appear in the SELECT list and in the FROM, WHERE, and HAVING clauses.

A subquery can return a table (several rows and columns), a list (a single column of values), or a scalar (a single value).

Syntax:

SELECT column_name(s)
FROM table_name
WHERE column_name IN (SELECT column_name FROM table_name);

Let's break down the syntax:

  • The outer query starts with the SELECT statement, followed by the column(s) you want to retrieve from the table.

  • After that, you specify the table name in the FROM clause.

  • The WHERE clause is used to filter the rows based on a condition.

  • Inside the WHERE clause, you use the IN operator to compare a column from the outer query with the result of the inner subquery.

  • The inner subquery is enclosed in parentheses and follows the IN operator. It can be any valid SQL query that returns a column or a set of values.

Execution Order

Conceptually, subqueries are evaluated from the inside out. A non-correlated inner query is processed first, and its result is then used as the filter or criterion for the outer query. A correlated subquery, which refers to columns of the outer query, is instead evaluated for each candidate row of the outer query.

In the syntax above, the SELECT inside the parentheses runs first, and its results determine which rows the outer SELECT returns.
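The inside-out evaluation can be checked by hand. The sketch below assumes SQLite through Python's built-in sqlite3 module, and the customers and orders tables are hypothetical examples, not part of the text. It runs the inner query alone, passes its result to the outer query, and compares that with the single nested statement.

```python
import sqlite3

# Hypothetical tables (customers, orders) invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cleo');
    INSERT INTO orders VALUES (10, 1), (11, 3), (12, 3);
""")

# Step 1: run the inner query on its own to get the list of ids.
inner_ids = [row[0] for row in conn.execute(
    "SELECT DISTINCT customer_id FROM orders ORDER BY customer_id"
)]

# Step 2: pass that list to the outer query as bound parameters.
placeholders = ", ".join("?" for _ in inner_ids)
two_step = conn.execute(
    f"SELECT customer_name FROM customers "
    f"WHERE customer_id IN ({placeholders}) ORDER BY customer_name",
    inner_ids,
).fetchall()

# The same result from a single statement with a nested subquery.
nested = conn.execute("""
    SELECT customer_name
    FROM customers
    WHERE customer_id IN (SELECT customer_id FROM orders)
    ORDER BY customer_name
""").fetchall()

print(inner_ids)
print(two_step == nested)
```

This equivalence holds because the inner query here does not refer to the outer one. A database engine is free to plan the nested statement differently as long as the result is the same.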

Importance of Subqueries

  1. Comparing groups to summarized values

    Subqueries are commonly used to compare aggregated or summarized values with individual records in a table. This enables you to filter or retrieve data based on conditions that involve group-level calculations.

     SELECT e.employee_name, e.salary, e.department_id
     FROM Employee e
     WHERE e.salary > (
         SELECT AVG(salary)
         FROM Employee
         WHERE department_id = e.department_id
     );
    

    In the above example, we retrieve the employee name, salary, and department ID from the Employee table. We want the employees who earn more than the average salary of their own department. An aggregate function cannot be used directly in the WHERE clause, so the subquery computes the average. Because the subquery refers to e.department_id from the outer query, it is a correlated subquery: conceptually, it is evaluated for each row of the outer query and returns the average for that row's department.
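As a rough check of this behaviour, the sketch below runs the same query against a small in-memory table. It assumes SQLite through Python's sqlite3 module, and the five sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (employee_name TEXT, salary REAL, department_id INTEGER);
    -- Made-up rows: department 1 averages 4000, department 2 averages 4000.
    INSERT INTO Employee VALUES
        ('Ana', 5000, 1), ('Ben', 3000, 1),
        ('Cleo', 4000, 2), ('Dev', 6000, 2), ('Eli', 2000, 2);
""")

# For each outer row e, the inner query averages only e's own department.
above_dept_avg = conn.execute("""
    SELECT e.employee_name, e.salary, e.department_id
    FROM Employee e
    WHERE e.salary > (
        SELECT AVG(salary)
        FROM Employee
        WHERE department_id = e.department_id
    )
    ORDER BY e.employee_name
""").fetchall()

print(above_dept_avg)
```

Cleo earns exactly her department's average, so the strict `>` comparison leaves her out.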

  2. Reshaping data

Subqueries can help you transform data from a long format to a wide format or vice versa, making it easier to meet specific data or reporting requirements.

transaction_id  product_name  sales_date  quantity_sold
1               Product A     2023-01-01  10
2               Product B     2023-01-01  5
3               Product C     2023-01-01  8
4               Product A     2023-01-02  15
5               Product B     2023-01-02  7
6               Product C     2023-01-02  9

For example, in the above table each product appears on its own row. Suppose we want Products A, B, and C each in its own column, with one row per sales date. We can achieve that with subqueries:

-- Column aliases that contain spaces are written in double quotes (standard SQL identifiers).
SELECT DISTINCT s.sales_date,
    (SELECT SUM(s1.quantity_sold) FROM Sales s1 WHERE s1.sales_date = s.sales_date AND s1.product_name = 'Product A') AS "Product A",
    (SELECT SUM(s2.quantity_sold) FROM Sales s2 WHERE s2.sales_date = s.sales_date AND s2.product_name = 'Product B') AS "Product B",
    (SELECT SUM(s3.quantity_sold) FROM Sales s3 WHERE s3.sales_date = s.sales_date AND s3.product_name = 'Product C') AS "Product C"
FROM Sales s;

The final output would be something like:

sales_date  Product A  Product B  Product C
2023-01-01  10         5          8
2023-01-02  15         7          9
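
The reshaping can be tried end to end on the sample rows. This is a minimal sketch assuming SQLite through Python's sqlite3 module. The table and data come from the example, while the ORDER BY is added here only to make the row order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales (
        transaction_id INTEGER, product_name TEXT,
        sales_date TEXT, quantity_sold INTEGER
    );
    INSERT INTO Sales VALUES
        (1, 'Product A', '2023-01-01', 10), (2, 'Product B', '2023-01-01', 5),
        (3, 'Product C', '2023-01-01', 8),  (4, 'Product A', '2023-01-02', 15),
        (5, 'Product B', '2023-01-02', 7),  (6, 'Product C', '2023-01-02', 9);
""")

# One scalar subquery per product turns rows (long format) into columns (wide format).
pivot = conn.execute("""
    SELECT DISTINCT s.sales_date,
        (SELECT SUM(s1.quantity_sold) FROM Sales s1
         WHERE s1.sales_date = s.sales_date AND s1.product_name = 'Product A') AS "Product A",
        (SELECT SUM(s2.quantity_sold) FROM Sales s2
         WHERE s2.sales_date = s.sales_date AND s2.product_name = 'Product B') AS "Product B",
        (SELECT SUM(s3.quantity_sold) FROM Sales s3
         WHERE s3.sales_date = s.sales_date AND s3.product_name = 'Product C') AS "Product C"
    FROM Sales s
    ORDER BY s.sales_date
""").fetchall()

for row in pivot:
    print(row)
```

Each product needs its own subquery, so this pattern suits a small, fixed set of values.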
  3. Combining data that cannot be joined

In situations where joining tables is not possible due to incompatible structures or missing relationships, subqueries allow you to combine data from different sources or conditions seamlessly.

sales table:

transaction_id  sales_date  amount_sold
1               2023-01-15  100.50
2               2023-02-10  75.25
3               2023-02-25  120.75
4               2023-03-05  90.00

expenses table:

expense_id  expense_date  amount_spent
1           2023-01-20    50.00
2           2023-02-15    60.75
3           2023-03-10    45.50

The two tables cannot be joined directly because they share no common identifier, but we can use a subquery to retrieve information from both:

-- Aggregate sales per month first, then look up each month's expenses.
SELECT
    m.month,
    m.total_sales,
    (SELECT SUM(e.amount_spent) FROM Expenses e WHERE EXTRACT(MONTH FROM e.expense_date) = m.month) AS total_expenses
FROM (
    SELECT EXTRACT(MONTH FROM sales_date) AS month, SUM(amount_sold) AS total_sales
    FROM Sales
    GROUP BY EXTRACT(MONTH FROM sales_date)
) AS m
ORDER BY m.month;

The final output would be:

month  total_sales  total_expenses
1      100.50       50.00
2      196.00       60.75
3      90.00        45.50

This approach works, but use it sparingly: a correlated subquery may be evaluated once per outer row, which can hurt performance on large tables. In such cases, consider using common table expressions (CTEs) instead.
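One possible CTE version of the monthly report is sketched below, assuming SQLite through Python's sqlite3 module. SQLite has no EXTRACT, so strftime('%m', ...) stands in for it. The CTE names monthly_sales and monthly_expenses are hypothetical. Each table is aggregated once, and the two summaries are then joined on the month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales (transaction_id INTEGER, sales_date TEXT, amount_sold REAL);
    CREATE TABLE Expenses (expense_id INTEGER, expense_date TEXT, amount_spent REAL);
    INSERT INTO Sales VALUES
        (1, '2023-01-15', 100.50), (2, '2023-02-10', 75.25),
        (3, '2023-02-25', 120.75), (4, '2023-03-05', 90.00);
    INSERT INTO Expenses VALUES
        (1, '2023-01-20', 50.00), (2, '2023-02-15', 60.75), (3, '2023-03-10', 45.50);
""")

# Each CTE summarizes one table per month; the final SELECT joins the summaries.
monthly = conn.execute("""
    WITH monthly_sales AS (
        SELECT CAST(strftime('%m', sales_date) AS INTEGER) AS month,
               SUM(amount_sold) AS total_sales
        FROM Sales
        GROUP BY month
    ),
    monthly_expenses AS (
        SELECT CAST(strftime('%m', expense_date) AS INTEGER) AS month,
               SUM(amount_spent) AS total_expenses
        FROM Expenses
        GROUP BY month
    )
    SELECT ms.month, ms.total_sales, me.total_expenses
    FROM monthly_sales ms
    LEFT JOIN monthly_expenses me ON me.month = ms.month
    ORDER BY ms.month
""").fetchall()

for row in monthly:
    print(row)
```

The LEFT JOIN keeps months that have sales but no expenses, which mirrors the subquery version returning NULL for such months. Whether this is actually faster depends on the database engine and the data, so it is worth measuring.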

Subqueries in the SELECT clause

SELECT employee_name, salary, 
    (SELECT AVG(salary) FROM Employee) AS avg_salary
FROM Employee;

Scalar subqueries in the SELECT clause retrieve a single value and display it as a column in the query's result set. For example, you can use a scalar subquery to calculate an average or find the maximum value and display it next to each row in the result.
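A small sketch of this, assuming SQLite through Python's sqlite3 module and three made-up employees. The extra diff_from_avg column is a hypothetical addition showing that the scalar value can also be used inside an expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (employee_name TEXT, salary REAL);
    -- Made-up rows; the overall average salary is 4000.
    INSERT INTO Employee VALUES ('Ana', 5000), ('Ben', 3000), ('Cleo', 4000);
""")

# The scalar subquery returns one value, which is repeated on every row.
rows = conn.execute("""
    SELECT employee_name, salary,
        (SELECT AVG(salary) FROM Employee) AS avg_salary,
        salary - (SELECT AVG(salary) FROM Employee) AS diff_from_avg
    FROM Employee
    ORDER BY employee_name
""").fetchall()

for row in rows:
    print(row)
```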

Subqueries in the FROM clause

A subquery in the FROM clause produces a result set that acts as a temporary table (a derived table) for the main query. It must usually be given an alias, such as CategorySales below.

SELECT category_name, total_quantity_sold
FROM (
    SELECT category_name, SUM(quantity_sold) AS total_quantity_sold
    FROM Products p
    JOIN Categories c ON p.category_id = c.category_id
    GROUP BY category_name
) AS CategorySales;
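
The derived table can then be filtered like any ordinary table. The sketch below assumes SQLite through Python's sqlite3 module. The column layout of Products and Categories, the sample rows, and the outer WHERE filter are hypothetical additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schemas: the text names the tables but not their full layout.
    CREATE TABLE Categories (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE Products (product_name TEXT, category_id INTEGER, quantity_sold INTEGER);
    INSERT INTO Categories VALUES (1, 'Drinks'), (2, 'Snacks');
    INSERT INTO Products VALUES ('Tea', 1, 10), ('Coffee', 1, 20), ('Chips', 2, 5);
""")

# The inner query builds per-category totals; the outer query filters on them.
best_sellers = conn.execute("""
    SELECT category_name, total_quantity_sold
    FROM (
        SELECT c.category_name, SUM(p.quantity_sold) AS total_quantity_sold
        FROM Products p
        JOIN Categories c ON p.category_id = c.category_id
        GROUP BY c.category_name
    ) AS CategorySales
    WHERE total_quantity_sold > 10
    ORDER BY category_name
""").fetchall()

print(best_sellers)
```

Filtering on total_quantity_sold in the outer WHERE is possible here because the aggregate has already been computed inside the derived table.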

Subqueries in the WHERE clause

A subquery in the WHERE clause filters rows based on a condition computed by the inner query.

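-- Distinct aliases (e, e2) make the inner query compare against the outer row's department.
SELECT e.employee_name, e.salary, e.department_id
FROM Employee e
WHERE e.salary > (
    SELECT AVG(e2.salary)
    FROM Employee e2
    WHERE e2.department_id = e.department_id
);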
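
One detail worth checking in a correlated subquery like this is which table each column reference binds to. The sketch below assumes SQLite through Python's sqlite3 module, with four made-up employees. It compares a version where the inner query refers to the table by its bare name with a version that uses distinct aliases (e and e2 are hypothetical alias names).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (employee_name TEXT, salary REAL, department_id INTEGER);
    -- Made-up rows: overall average 6000, department 1 average 4000, department 2 average 8000.
    INSERT INTO Employee VALUES
        ('Ana', 5000, 1), ('Ben', 3000, 1), ('Cleo', 9000, 2), ('Dev', 7000, 2);
""")

# Bare table name: inside the subquery, Employee.department_id binds to the
# inner Employee, so the condition is always true and AVG covers every row.
bare_name = [r[0] for r in conn.execute("""
    SELECT employee_name
    FROM Employee
    WHERE salary > (
        SELECT AVG(salary)
        FROM Employee
        WHERE department_id = Employee.department_id
    )
    ORDER BY employee_name
""")]

# Distinct aliases: e.department_id clearly refers to the outer row.
aliased = [r[0] for r in conn.execute("""
    SELECT e.employee_name
    FROM Employee e
    WHERE e.salary > (
        SELECT AVG(e2.salary)
        FROM Employee e2
        WHERE e2.department_id = e.department_id
    )
    ORDER BY e.employee_name
""")]

print(bare_name)
print(aliased)
```

The first list compares every salary with the overall average, while the second compares it with the average of the employee's own department.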

Let's practice!