Joins in SQL: Bridging the Gap Between Data Tables

A JOIN is a clause in SQL used to combine rows from two or more tables based on a related column.

Tables that share information are linked by keys: a primary key uniquely identifies each row within its own table, and a related table refers to that row by storing the same value, usually as a foreign key.

A JOIN uses these keys to combine row data from separate tables, so you can retrieve data from multiple tables in a single query.

SYNTAX:

SELECT column_name(s)
FROM table1
INNER JOIN table2
ON table1.column_name = table2.column_name;

SELECT column_name(s): This part of the query specifies which columns you want to retrieve from the tables.

FROM table1: This part of the query indicates that you are selecting data from table1. table1 is the first table in the join operation.

INNER JOIN table2: This part of the query specifies that you want to perform an INNER JOIN with table2.

ON table1.column_name = table2.column_name: This is the join condition. It defines how the two tables are related. In this example, you are joining the tables on the equality of values in table1.column_name and table2.column_name.

If the join column has the same name in both tables, you can use USING instead of ON.

Example:

SELECT first_name, last_name
FROM customers
INNER JOIN orders
USING (customer_id);

In this query, we use USING (customer_id) to specify that the join condition should be based on the customer_id column, which is common to both the customers and orders tables. This can make the SQL query more readable and concise when dealing with common column names in joined tables.
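To check that ON and USING really describe the same join, here is a minimal sketch that runs both forms with Python's built-in sqlite3 module. The sample rows and the order_id column are made up for illustration; only customer_id, first_name and last_name come from the query above.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# Join condition spelled out with ON.
on_rows = conn.execute("""
    SELECT first_name, last_name
    FROM customers
    INNER JOIN orders ON customers.customer_id = orders.customer_id
    ORDER BY orders.order_id
""").fetchall()

# Same join written with USING, since both tables call the column customer_id.
using_rows = conn.execute("""
    SELECT first_name, last_name
    FROM customers
    INNER JOIN orders USING (customer_id)
    ORDER BY order_id
""").fetchall()

print(on_rows == using_rows)  # True
print(using_rows)  # one row per order, so Ada appears twice
```

Note that a customer shows up once for every matching order, which is why Ada is listed twice.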

Importance of Joins

  1. Data retrieval. Joins allow extraction of data from multiple tables in a single query.

  2. Data analysis. Joins enable complex data analysis by combining and aggregating data from various tables.

  3. Data normalization. In normalized databases, data is distributed across several tables to minimize redundancy and improve data integrity. Joins reassemble normalized data when necessary, while the stored tables stay normalized and efficient.

💡 Database normalization: In a database, data is often distributed across multiple tables to improve data organization and reduce redundancy. This is useful because it minimizes duplicate data in any single table and allows the data in each table to grow independently. To draw insight from data held in separate tables, we write queries that combine it.
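As a rough illustration of joining and aggregating normalized data, the sketch below stores customers and their orders in separate tables and uses a join to total the orders per customer. It runs on Python's built-in sqlite3 module; the table layout, the amount column and the sample rows are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Customer details are stored once; orders only keep the customer's key.
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Alan');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 75.0), (12, 2, 40.0);
""")

# Join the two tables, then aggregate the joined rows per customer.
totals = conn.execute("""
    SELECT c.first_name, COUNT(*) AS order_count, SUM(o.amount) AS total_spent
    FROM customers c
    INNER JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id, c.first_name
    ORDER BY c.customer_id
""").fetchall()

print(totals)  # [('Ada', 2, 100.0), ('Alan', 1, 40.0)]
```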

Different types of Joins

Employee table:

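+----+-----------+--------------+------------------+
| id | firstName | date_of_hire | benefits         |
+----+-----------+--------------+------------------+
| 1  | John      | 2020-01-15   | Health Insurance |
| 2  | Alice     | 2019-03-10   | Dental Insurance |
| 3  | Bob       | 2021-05-20   | Retirement Plan  |
| 4  | Sarah     | 2018-11-02   | Vision Plan      |
| 5  | Emily     | 2022-02-18   | None             |
+----+-----------+--------------+------------------+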

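Salary table:

+-------------+----------+
| employee_id | salary   |
+-------------+----------+
| 1           | 60000.00 |
| 2           | 55000.00 |
| 3           | 70000.00 |
| 7           | 75000.00 |
| 40          | 80000.00 |
+-------------+----------+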

INNER JOIN/JOIN:

Selects records that have matching values in both tables.

Example:

SELECT firstName, salary, date_of_hire, benefits
FROM employee e
INNER JOIN salary s
ON e.id = s.employee_id

Sample output:

+-----------+---------+--------------+------------------+
| firstName | salary  | date_of_hire | benefits         |
+-----------+---------+--------------+------------------+
| John      | 60000.00| 2020-01-15   | Health Insurance |
| Alice     | 55000.00| 2019-03-10   | Dental Insurance |
| Bob       | 70000.00| 2021-05-20   | Retirement Plan  |
+-----------+---------+--------------+------------------+
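If you want to reproduce this result yourself, the following sketch loads the two sample tables into an in-memory SQLite database through Python's built-in sqlite3 module and runs the same kind of INNER JOIN. The column types are assumptions, and an ORDER BY is added only to make the row order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, firstName TEXT, date_of_hire TEXT, benefits TEXT);
    CREATE TABLE salary (employee_id INTEGER, salary REAL);
    INSERT INTO employee VALUES
        (1, 'John',  '2020-01-15', 'Health Insurance'),
        (2, 'Alice', '2019-03-10', 'Dental Insurance'),
        (3, 'Bob',   '2021-05-20', 'Retirement Plan'),
        (4, 'Sarah', '2018-11-02', 'Vision Plan'),
        (5, 'Emily', '2022-02-18', 'None');
    INSERT INTO salary VALUES
        (1, 60000.00), (2, 55000.00), (3, 70000.00), (7, 75000.00), (40, 80000.00);
""")

# Only ids 1, 2 and 3 exist in both tables, so only those rows survive.
inner_rows = conn.execute("""
    SELECT firstName, salary
    FROM employee e
    INNER JOIN salary s ON e.id = s.employee_id
    ORDER BY e.id
""").fetchall()

print(inner_rows)  # [('John', 60000.0), ('Alice', 55000.0), ('Bob', 70000.0)]
```

Sarah and Emily are dropped because they have no salary row, and the salary rows for employee_id 7 and 40 are dropped because no employee has those ids.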

LEFT (OUTER) JOIN:

Retrieves all rows from the first (left) table, whether or not they have a match in the second table. Where there is no match, the columns from the right table are NULL.

Useful when you want all records from the left table and only the matching ones from the right.

Example:

SELECT firstName, salary, date_of_hire, benefits
FROM employee e
LEFT JOIN salary s
ON e.id = s.employee_id

sample output:

+-----------+---------+--------------+------------------+
| firstName | salary  | date_of_hire | benefits         |
+-----------+---------+--------------+------------------+
| John      | 60000.00| 2020-01-15   | Health Insurance |
| Alice     | 55000.00| 2019-03-10   | Dental Insurance |
| Bob       | 70000.00| 2021-05-20   | Retirement Plan  |
| Sarah     | NULL    | 2018-11-02   | Vision Plan      |
| Emily     | NULL    | 2022-02-18   | None             |
+-----------+---------+--------------+------------------+
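A common use of the NULLs produced by a LEFT JOIN is finding rows that have no match at all. The sketch below, which assumes Python's built-in sqlite3 module and a trimmed copy of the sample tables, first runs the LEFT JOIN and then filters for the unmatched employees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Sample tables trimmed to the columns this join needs.
    CREATE TABLE employee (id INTEGER PRIMARY KEY, firstName TEXT);
    CREATE TABLE salary (employee_id INTEGER, salary REAL);
    INSERT INTO employee VALUES (1, 'John'), (2, 'Alice'), (3, 'Bob'), (4, 'Sarah'), (5, 'Emily');
    INSERT INTO salary VALUES (1, 60000.00), (2, 55000.00), (3, 70000.00), (7, 75000.00), (40, 80000.00);
""")

# Every employee is kept; SQL NULL comes back as None in Python.
left_rows = conn.execute("""
    SELECT e.firstName, s.salary
    FROM employee e
    LEFT JOIN salary s ON e.id = s.employee_id
    ORDER BY e.id
""").fetchall()

# Keep only the employees for which the right side found no match.
missing_salary = conn.execute("""
    SELECT e.firstName
    FROM employee e
    LEFT JOIN salary s ON e.id = s.employee_id
    WHERE s.employee_id IS NULL
    ORDER BY e.id
""").fetchall()

print(left_rows[-2:])   # [('Sarah', None), ('Emily', None)]
print(missing_salary)   # [('Sarah',), ('Emily',)]
```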

RIGHT (OUTER) JOIN:

The opposite of a left join.

Retrieves all rows from the right (second) table and the matching rows from the first table.

Less commonly used than the left join, since it can be rewritten as a left join with the tables swapped.
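The rewrite mentioned above can be sketched as follows. The RIGHT JOIN form is shown only as a comment, because older SQLite versions do not accept it; the query that actually runs is the equivalent LEFT JOIN with the tables swapped. The setup uses Python's built-in sqlite3 module and a trimmed copy of the sample tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Sample tables trimmed to the columns this join needs.
    CREATE TABLE employee (id INTEGER PRIMARY KEY, firstName TEXT);
    CREATE TABLE salary (employee_id INTEGER, salary REAL);
    INSERT INTO employee VALUES (1, 'John'), (2, 'Alice'), (3, 'Bob'), (4, 'Sarah'), (5, 'Emily');
    INSERT INTO salary VALUES (1, 60000.00), (2, 55000.00), (3, 70000.00), (7, 75000.00), (40, 80000.00);
""")

# Intended query:
#   SELECT firstName, salary FROM employee e RIGHT JOIN salary s ON e.id = s.employee_id
# Equivalent form: salary becomes the left table, so every salary row is kept.
right_rows = conn.execute("""
    SELECT e.firstName, s.salary
    FROM salary s
    LEFT JOIN employee e ON e.id = s.employee_id
    ORDER BY s.employee_id
""").fetchall()

print(right_rows)
# [('John', 60000.0), ('Alice', 55000.0), ('Bob', 70000.0), (None, 75000.0), (None, 80000.0)]
```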

FULL JOIN:

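Retrieves all rows from both tables. Rows that match are combined, and rows without a match in the other table are still returned, with NULL in the other table's columns.

Not supported in all database systems (e.g., MySQL), but it can often be emulated by combining a left join and a right join with UNION.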

Example:

SELECT firstName, salary, date_of_hire, benefits
FROM employee e
FULL JOIN salary s
ON e.id = s.employee_id

Sample output:

+-----------+---------+--------------+------------------+
| firstName | salary  | date_of_hire | benefits         |
+-----------+---------+--------------+------------------+
| John      | 60000.00| 2020-01-15   | Health Insurance |
| Alice     | 55000.00| 2019-03-10   | Dental Insurance |
| Bob       | 70000.00| 2021-05-20   | Retirement Plan  |
| Sarah     | NULL    | 2018-11-02   | Vision Plan      |
| Emily     | NULL    | 2022-02-18   | None             |
| NULL      | 75000.00| NULL         | NULL             |
| NULL      | 80000.00| NULL         | NULL             |
+-----------+---------+--------------+------------------+
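On systems without FULL JOIN, one possible emulation is a LEFT JOIN for everything on the employee side, plus the salary rows that found no employee, glued together with UNION ALL. The sketch below is one way to write it, not the only one; it uses Python's built-in sqlite3 module and a trimmed copy of the sample tables, and writes the second half as a LEFT JOIN with the tables swapped so it also runs on older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Sample tables trimmed to the columns this join needs.
    CREATE TABLE employee (id INTEGER PRIMARY KEY, firstName TEXT);
    CREATE TABLE salary (employee_id INTEGER, salary REAL);
    INSERT INTO employee VALUES (1, 'John'), (2, 'Alice'), (3, 'Bob'), (4, 'Sarah'), (5, 'Emily');
    INSERT INTO salary VALUES (1, 60000.00), (2, 55000.00), (3, 70000.00), (7, 75000.00), (40, 80000.00);
""")

full_rows = conn.execute("""
    -- Part 1: all employees, with their salary where one exists.
    SELECT e.firstName, s.salary
    FROM employee e
    LEFT JOIN salary s ON e.id = s.employee_id
    UNION ALL
    -- Part 2: only the salary rows that matched no employee.
    SELECT e.firstName, s.salary
    FROM salary s
    LEFT JOIN employee e ON e.id = s.employee_id
    WHERE e.id IS NULL
""").fetchall()

print(len(full_rows))  # 7
for row in full_rows:
    print(row)
```

The WHERE e.id IS NULL filter in the second part is what stops the matched rows from being counted twice.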

SELF JOIN:

  • A table is joined with itself, usually using aliases to differentiate between the instances of the table.

  • Useful when you want to compare records within the same table, for example to pair each employee with their manager.

Example:

SELECT e1.name AS employee1, e2.name AS employee2
FROM employees e1
INNER JOIN employees e2
ON e1.manager_id = e2.employee_id;
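To see what this query returns, here is a small runnable sketch using Python's built-in sqlite3 module. The employees table and its four rows are invented for the example; the query is the one above, with an ORDER BY added so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: manager_id points at another row of the same table.
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'Grace',    NULL),
        (2, 'Linus',    1),
        (3, 'Margaret', 1),
        (4, 'Dennis',   2);
""")

# e1 plays the role of the employee, e2 the role of that employee's manager.
pairs = conn.execute("""
    SELECT e1.name AS employee1, e2.name AS employee2
    FROM employees e1
    INNER JOIN employees e2 ON e1.manager_id = e2.employee_id
    ORDER BY e1.employee_id
""").fetchall()

print(pairs)  # [('Linus', 'Grace'), ('Margaret', 'Grace'), ('Dennis', 'Linus')]
```

Grace does not appear in the first column because her manager_id is NULL, and an inner join drops rows without a match.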

CROSS (Cartesian) JOIN:

Pairs each row from the first table with every row of the second table.

If the first table has x rows and the second has y rows, the result has x multiplied by y rows.

Syntax:

SELECT Customers.CustomerName, Products.ProductName
FROM Customers
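CROSS JOIN Products;

Customers table:

+------------+--------------+
| CustomerID | CustomerName |
+------------+--------------+
| 1          | Customer A   |
| 2          | Customer B   |
| 3          | Customer C   |
+------------+--------------+

Products table:

+-----------+-------------+
| ProductID | ProductName |
+-----------+-------------+
| 101       | Product X   |
| 102       | Product Y   |
+-----------+-------------+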

Sample output:

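+--------------+-------------+
| CustomerName | ProductName |
+--------------+-------------+
| Customer A   | Product X   |
| Customer A   | Product Y   |
| Customer B   | Product X   |
| Customer B   | Product Y   |
| Customer C   | Product X   |
| Customer C   | Product Y   |
+--------------+-------------+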
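The x multiplied by y rule is easy to confirm with a quick sketch. This one loads the Customers and Products rows into an in-memory SQLite database through Python's built-in sqlite3 module and counts what the CROSS JOIN returns; the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
    INSERT INTO Customers VALUES (1, 'Customer A'), (2, 'Customer B'), (3, 'Customer C');
    INSERT INTO Products VALUES (101, 'Product X'), (102, 'Product Y');
""")

# No join condition: every customer is paired with every product.
cross_rows = conn.execute("""
    SELECT Customers.CustomerName, Products.ProductName
    FROM Customers
    CROSS JOIN Products
    ORDER BY Customers.CustomerID, Products.ProductID
""").fetchall()

print(len(cross_rows))  # 6, i.e. 3 customers x 2 products
print(cross_rows[0])    # ('Customer A', 'Product X')
```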

NB: Cross joins are not used frequently, as they can be resource-intensive and produce very large result sets.

Different SQL flavors support different join types, so it is good to check what the DBMS you are using supports.

Unions

UNION is used to combine the result sets of two or more SELECT statements into a single result set.

Each SELECT statement within UNION must have the same number of columns and similar or compatible data types. The columns in each SELECT statement must be in the same order.

Unions are useful when you need to consolidate data from different sources or perform complex data aggregation across multiple queries.

Syntax:

SELECT column1, column2, ...
FROM table1
WHERE conditions
UNION [ALL]  -- Use "ALL" to include duplicate rows, or omit it to remove duplicates
SELECT column1, column2, ...
FROM table2
WHERE conditions
-- You can continue with more SELECT statements if needed

  • SELECT column1, column2, ...: In each SELECT statement, specify the columns you want to retrieve from the respective tables.

  • FROM table1: Indicate the first table you want to query.

  • WHERE conditions: Optionally, you can include conditions to filter the rows from the first table.

  • UNION [ALL]: Use "UNION" to remove duplicate rows from the result set. Use "UNION ALL" to include duplicate rows in the result set.

  • SELECT column1, column2, ...: In the second (and subsequent) SELECT statement(s), specify the same number of columns and compatible data types as in the first SELECT statement.

  • FROM table2: Indicate the second (and subsequent) table(s) you want to query.

  • WHERE conditions: Optionally, you can include conditions to filter the rows from the second (and subsequent) table(s).

Example:

SELECT employee_name, department
FROM employees
WHERE salary > 50000
UNION
SELECT contractor_name, department
FROM contractors
WHERE contract_type = 'Full-Time';
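To make the difference between UNION and UNION ALL concrete, the sketch below runs the query above both ways using Python's built-in sqlite3 module. The rows are invented for the example, and one person (Sam in IT) is deliberately placed in both tables so there is a duplicate to remove.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data; Sam / IT qualifies in both tables.
    CREATE TABLE employees (employee_name TEXT, department TEXT, salary REAL);
    CREATE TABLE contractors (contractor_name TEXT, department TEXT, contract_type TEXT);
    INSERT INTO employees VALUES ('Ana', 'Sales', 60000), ('Ben', 'IT', 45000), ('Sam', 'IT', 70000);
    INSERT INTO contractors VALUES ('Sam', 'IT', 'Full-Time'), ('Cy', 'Sales', 'Part-Time'), ('Dee', 'HR', 'Full-Time');
""")

# Same query twice; only the set operator changes.
query = """
    SELECT employee_name, department
    FROM employees
    WHERE salary > 50000
    {operator}
    SELECT contractor_name, department
    FROM contractors
    WHERE contract_type = 'Full-Time'
    ORDER BY 1, 2
"""

union_rows = conn.execute(query.format(operator="UNION")).fetchall()
union_all_rows = conn.execute(query.format(operator="UNION ALL")).fetchall()

print(union_rows)      # [('Ana', 'Sales'), ('Dee', 'HR'), ('Sam', 'IT')]
print(union_all_rows)  # [('Ana', 'Sales'), ('Dee', 'HR'), ('Sam', 'IT'), ('Sam', 'IT')]
```

The result columns take their names from the first SELECT, so the combined column is called employee_name even for rows that came from contractors.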

Let's practice!