Joining Tables
Relational schemas split data across tables on purpose β one row per user, one row per order, the link living in a column called user_id. A join is how you stitch those tables back together at query time. This chapter covers the four join flavours, the row-matching mental model that explains all of them, and the handful of mistakes that produce silently wrong results.
Why Joins Exist
"Show me each order with the customer's name and email" is a question that touches two tables: orders (with the order id, status, date) and users (with the email and name). The schema split them on purpose β names live with the customer, not duplicated on every order. The join is the operation that puts them back together at query time, exactly when you need them.
What goes wrong without joins β duplication
Imagine storing the whole order history in a single flat table:
order_id | user_email | user_name | placed_at | product_name | quantity
---------+-----------------+--------------+---------------------+----------------+---------
1001 | ada@example.com | Ada Lovelace | 2025-02-10 09:00 | SQL Cookbook | 1
1001 | ada@example.com | Ada Lovelace | 2025-02-10 09:00 | Database Mug | 2
1003 | ada@example.com | Ada Lovelace | 2025-02-14 16:45 | SQL Cookbook | 1
1003 | ada@example.com | Ada Lovelace | 2025-02-14 16:45 | Designing Data | 1
Ada's email and name appear four times. If she changes her email tomorrow, you have to remember to update every historical row, and a single missed update produces inconsistent answers depending on which row a report happens to read. The relational fix is to store each fact exactly once: a users row holds Ada's email; an orders row holds a foreign key user_id = 1 that points back. Joins reconstruct the flat view at query time without baking the duplication into storage.
Foreign keys, primary keys, cardinality
Three words to nail down before the syntax:
- Primary key (PK) β one or more columns that uniquely identify a row.
users.idandorders.idare PKs. - Foreign key (FK) β a column whose values are required to exist as PKs in another table.
orders.user_idis an FK pointing atusers.id. Chapter 07 covers the syntax and theON DELETEoptions. - Cardinality β how many rows on each side. One-to-many means one parent row (one user) can have many child rows (many orders). Many-to-many needs a third "junction" table (see
order_itemsin our schema, which sits betweenordersandproducts).
INNER JOIN
INNER JOIN (often shortened to just JOIN) returns every pair of rows β one from each table β where the join condition is true. Rows on either side with no match are dropped. It's the most common join, and the right mental model is not the famous Venn diagram β it's a row-matching table.
-- Each order alongside the customer who placed it
SELECT u.full_name, o.id AS order_id, o.placed_at, o.status
FROM users u
JOIN orders o ON o.user_id = u.id; full_name | order_id | placed_at | status
---------------+----------+---------------------+----------
Ada Lovelace | 1001 | 2025-02-10 09:00:00 | paid
Linus Torvalds | 1002 | 2025-02-11 11:30:00 | paid
Ada Lovelace | 1003 | 2025-02-14 16:45:00 | shipped
Grace Hopper | 1004 | 2025-02-15 08:10:00 | cancelled
Table aliases β short prefixes for readability
Once two or more tables are in scope, you have to disambiguate any column that exists in both (here, both tables have id). The convention is a one- or two-letter alias right after the table name. FROM users u means "the rest of this query can refer to users as u." It applies to every column reference in the query, including the SELECT list.
ON vs USING vs implicit join
The same INNER JOIN can be written three ways. The first is the standard explicit form, the second a shortcut for equality on identically-named columns, and the third is the legacy comma-style join that you'll meet in old code.
-- β
ANSI JOIN with ON β the universal form
SELECT *
FROM orders o
JOIN users u ON u.id = o.user_id;
-- β
USING (col) β shortcut when both sides have a column with the same name
-- Renames it: in the result there's one "user_id" column, not two.
-- Requires the column to exist in BOTH tables with the same name.
SELECT *
FROM orders
JOIN users USING (user_id); -- only works if users also has a column called user_id
-- β Implicit (comma) join β same result, but the WHERE doubles as the join condition
-- Easy to forget the WHERE and get a Cartesian product (Β§4). Avoid in new code.
SELECT *
FROM orders o, users u
WHERE u.id = o.user_id; ON by default. Reach for USING only when both tables genuinely share an identically-named column and you want to collapse it in the result. Avoid the comma form β it's not wrong, but the JOIN keyword makes intent obvious and prevents the missing-WHERE Cartesian-product accident.
Joining three or more tables
Add another JOIN clause for every table you need. Each join adds one matching condition. The order of joins doesn't change the result (for inner joins), but readability matters: chain along the relationships.
-- Every paid line item with customer name and product name
SELECT u.full_name AS customer,
o.id AS order_id,
p.name AS product,
oi.quantity,
oi.unit_price
FROM orders o
JOIN users u ON u.id = o.user_id
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
WHERE o.status = 'paid'
ORDER BY o.id, p.name; customer | order_id | product | quantity | unit_price
---------------+----------+----------------+----------+-----------
Ada Lovelace | 1001 | Database Mug | 2 | 9.50
Ada Lovelace | 1001 | SQL Cookbook | 1 | 34.99
Linus Torvalds | 1002 | Designing Data | 1 | 42.00
order_items in the middle? It's the junction table that resolves the many-to-many relationship between orders and products (Β§1 diagram). Multi-table joins follow the FK arrows β let the schema diagram tell you the right chain.
Outer Joins & Where NULL Comes From
INNER JOIN drops rows with no match. Outer joins keep them, filling the missing side with NULLs. Three flavours β LEFT, RIGHT, FULL β differ only in which unmatched rows survive.
LEFT JOIN β the workhorse
Almost every "show me X with their optional Y" question is a LEFT JOIN. "Users with their orders, including users who've placed none." "Products with their reviews, including products with no reviews." The left table is the spine of the result; rows without a match get NULL on the right.
-- Every user, with their order ids (NULL where the user has no orders)
SELECT u.full_name, o.id AS order_id, o.status
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
ORDER BY u.id, o.placed_at; full_name | order_id | status
---------------+----------+----------
Ada Lovelace | 1001 | paid
Ada Lovelace | 1003 | shipped
Linus Torvalds | 1002 | paid
Grace Hopper | 1004 | cancelled
In the seed schema every user happens to have at least one order, so the LEFT JOIN result looks the same as INNER. The instant a user has no orders, the LEFT version includes them with NULLs; the INNER version silently drops them.
Counting children β the classic LEFT JOIN + GROUP BY
"How many orders has each user placed, including the zeros?" combines a LEFT JOIN with a GROUP BY. Watch which column you count β this is where Chapter 04's COUNT(*) vs COUNT(col) distinction earns its salary.
SELECT u.full_name,
COUNT(o.id) AS order_count -- β
counts non-NULL order ids
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.full_name
ORDER BY u.id; COUNT(*) here. For a user with no orders, the LEFT JOIN still produces one row (with o.id = NULL). COUNT(*) counts that row as 1; COUNT(o.id) correctly ignores it and returns 0. The error is silent: every user with zero orders gets a count of 1.
RIGHT JOIN β and why LEFT is usually preferred
RIGHT JOIN is the mirror of LEFT JOIN. A LEFT JOIN B is identical to B RIGHT JOIN A with the operands swapped. The result-set semantics are the same; only the syntactic position changes. Every team-style guide picks one (almost always LEFT) and forbids the other, because reading a query is harder when half the joins go one direction and half the other.
-- These two return the same rows
SELECT * FROM users u LEFT JOIN orders o ON o.user_id = u.id;
SELECT * FROM orders o RIGHT JOIN users u ON o.user_id = u.id; FULL OUTER JOIN β keep unmatched rows from both sides
Rare in application code, useful in reconciliation reports: "show me every user and every order, paired where possible, flagged where unpaired." MySQL didn't support it until very recently β emulate with LEFT UNION ALL RIGHT WHERE LEFT.key IS NULL on older versions.
SELECT u.full_name, o.id AS order_id
FROM users u
FULL OUTER JOIN orders o ON o.user_id = u.id; The killer outer-join bug β WHERE after LEFT JOIN
This single mistake quietly destroys more outer-join queries than every other join issue combined. Move a predicate from ON to WHERE in a LEFT JOIN and you've turned it into an INNER JOIN.
WHERE o.status = 'paid' after a LEFT JOIN silently filters out your unmatched rows-- β Bug: silently turns LEFT JOIN into INNER JOIN for users with no PAID order
SELECT u.full_name, o.id, o.status
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.status = 'paid'; -- excludes "NULL = 'paid'" rows β unmatched users dropped
-- β
Fix: move the row-level predicate INTO the join condition
SELECT u.full_name, o.id, o.status
FROM users u
LEFT JOIN orders o ON o.user_id = u.id AND o.status = 'paid';
-- β
Or, if you DID want only users who actually have a paid order, be explicit:
SELECT u.full_name, o.id, o.status
FROM users u
JOIN orders o ON o.user_id = u.id -- inner is now the correct intent
WHERE o.status = 'paid'; ON. Predicates on the left table can safely live in WHERE. A predicate on the right table in WHERE is almost always a bug β and one that compiles, runs, and returns plausible-looking (wrong) numbers.
Self Joins & CROSS JOIN
Self join β joining a table to itself
When a table has rows that reference other rows in the same table, you can join it to itself. The classic case is an employees table where every row has a manager_id pointing at another employees row.
LEFT JOIN keeps Ada (no manager).
-- Sample table for this section (run once)
CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name TEXT,
manager_id INTEGER REFERENCES employees(id)
);
INSERT INTO employees VALUES
(1, 'Ada', NULL),
(2, 'Linus', 1),
(3, 'Grace', 1),
(4, 'Donald', 2);
-- Employee with their manager's name
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON m.id = e.manager_id; Other natural self-join use cases: finding pairs of products in the same category, comparing each row to the previous one (more often done with window functions β Chapter 10), or finding all "friend-of-friend" pairs in a social graph (Chapter 09 covers recursive CTEs for arbitrary-depth traversal).
CROSS JOIN β every row paired with every row
CROSS JOIN returns the Cartesian product β every row in the left table paired with every row in the right, no join condition. If left has L rows and right has R rows, the result has LΓR rows. Useful when you genuinely want a grid; dangerous when you wrote it by accident.
CROSS JOIN β 3 sizes Γ 3 colours = 9 SKUs to seedON on a JOIN (or WHERE on a comma join) on two 10,000-row tables = 100,000,000 rows. Memory spikes, query times out.
-- Real use case: pre-generate a SKU grid
SELECT s.value AS size, c.value AS colour
FROM sizes s
CROSS JOIN colours c;
-- Same idea with VALUES (Chapter 06 covers this constructor)
SELECT s.size, c.colour
FROM (VALUES ('S'), ('M'), ('L')) AS s(size)
CROSS JOIN (VALUES ('red'), ('green'), ('blue')) AS c(colour);
-- β The accidental one β note the missing ON clause
SELECT *
FROM users u
JOIN orders o; -- syntax error in most dialects, but...
-- β ...the comma form lets it through silently
SELECT *
FROM users, orders; -- 3 users Γ 4 orders = 12 rows, none of them meaningful (row count of table A) Γ (row count of table B), every value in one column appears once for every value in another, and a report that should return ~100 rows returns hundreds of thousands.
INNER JOIN= matched pairs only. The default; the most common.LEFT JOIN= matched pairs + every unmatched left row, NULL on the right. The workhorse for "show me X with optional Y".RIGHT JOIN= mirror of LEFT. Pick one direction (almost always LEFT) and stick to it.FULL OUTER JOIN= unmatched rows from both sides. Rare; mostly for reconciliation.- In a LEFT JOIN, predicates on the right table go in
ON, notWHEREβ or the join silently becomes INNER. - Count children with
COUNT(child_col), neverCOUNT(*), when the join is LEFT. - Self join needs two aliases. CROSS JOIN multiplies; missing
ONis the most common way to write one by accident.
β Chapter 04 β Aggregation Β· π Continue to Chapter 06 β Writing Data β