Indexes & Performance
An index is a data structure the database maintains alongside a table so it can answer certain queries without scanning every row. This chapter explains how indexes work (B-trees), when to create them, how to read the query plan that proves they’re being used, and the four slow-query patterns that trip up every team eventually.
How Indexes Work
Without an index, the database must sequentially scan every row to find matches. A table with 10 million rows means reading 10 million rows — even if only 3 match. An index is like the index at the back of a textbook: you look up the term, get a page number, and jump directly there.
The most common structure is the B-tree (balanced tree). Keys are stored in sorted order across a hierarchy of pages. Finding a value takes O(log n) page reads — for 10 million rows that’s about 3–4 reads instead of millions.
Which queries benefit?
Equality: WHERE email = ?
Direct B-tree lookup — O(log n). The most common indexed pattern.
Range: WHERE price BETWEEN 10 AND 50
Walk sorted leaf nodes from 10 to 50 — skip everything outside the range.
Sorting: ORDER BY placed_at LIMIT 10
Index is already sorted — read 10 entries from the end. No full sort needed.
Full table: SELECT * FROM orders
If you need every row, a sequential scan is faster (less random I/O jumping).
The cost on writes
Every INSERT/UPDATE/DELETE must also update every affected index. A table with 5 indexes does roughly 5× the write work of an unindexed table. Most tables benefit from 2–5 well-chosen indexes; 20 is too many.
WHERE, JOIN ON, and ORDER BY. Don’t index columns you only SELECT. Don’t index low-cardinality columns alone (e.g. a boolean).
Creating Indexes
Basic syntax
-- Single-column index
CREATE INDEX idx_orders_user_id ON orders (user_id);
-- Unique index (enforces uniqueness)
CREATE UNIQUE INDEX idx_users_email ON users (email);
-- Drop
DROP INDEX idx_orders_user_id; Composite indexes — column order matters
A composite index is sorted by all columns in order. It satisfies queries on a leftmost prefix of those columns — but not ones that skip the leading column.
CREATE INDEX idx_orders_status_placed ON orders (status, placed_at); WHERE status = 'paid' AND placed_at >= ...
Both columns match the leftmost prefix. Jump to 'paid', range-scan dates.
WHERE status = 'shipped'
Leading column alone. Jumps to 'shipped' section, scans all dates within it.
WHERE placed_at >= '2025-02-10'
Skips the leading column (status). The B-tree is sorted status-first — can’t jump in without it.
ORDER BY placed_at DESC LIMIT 10
The B-tree is sorted status-first — can’t extract the global top-10 by date alone. Needs a separate (placed_at) index.
(A, B, C) satisfies (A), (A, B), (A, B, C) — but not (B), (B, C), or (C). Put the most-filtered column first.
Partial indexes
Index only the rows you actually query. Smaller, faster, cheaper to maintain.
-- PostgreSQL / SQLite: index only pending orders
CREATE INDEX idx_orders_pending
ON orders (placed_at)
WHERE status = 'pending';
-- The query that benefits:
SELECT * FROM orders WHERE status = 'pending' ORDER BY placed_at;
-- Planner sees the partial index matches the WHERE and uses it.
-- MySQL: no partial indexes. Workaround: generated columns or covering indexes. Reading Query Plans
EXPLAIN shows the plan the database intends to use. EXPLAIN ANALYZE runs the query and shows real timing. Never guess whether an index is used — ask the planner.
-- PostgreSQL
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 1 ORDER BY placed_at DESC;
-- MySQL
EXPLAIN
SELECT * FROM orders WHERE user_id = 1 ORDER BY placed_at DESC; Two scan types to recognise
Reads every row
Scans the table end-to-end. Fine for small tables or full-table queries. Bad for large tables with selective WHERE — means no usable index exists.
Jumps to matching rows
Walks the B-tree to find matches, then fetches table rows. Index Only Scan is better — all columns are in the index; no table fetch needed.
Annotated example (PostgreSQL)
QUERY PLAN
---------------------------------------------------------------------------
Index Scan using idx_orders_user_id on orders (cost=0.15..8.17 rows=2)
Index Cond: (user_id = 1)
Filter: (status = 'paid')
Rows Removed by Filter: 1
Planning Time: 0.08 ms
Execution Time: 0.03 ms - Node type: Index Scan — good, using an index.
- Index:
idx_orders_user_id— confirms which one. - Index Cond:
user_id = 1— the part satisfied by the index. - Filter:
status = 'paid'— applied after the index fetch. Adding status to the index would eliminate this extra filter. - cost: 0.15..8.17 = startup..total in abstract units. Compare between plans.
- Execution Time: 0.03 ms — the real wall-clock time.
EXPLAIN ANALYZEthe slow query.- Find
Seq Scanon a large table with a selective WHERE — that’s the bottleneck. - Create an index on the filtered column(s).
EXPLAIN ANALYZEagain — confirmIndex Scanand time dropped.
Common Slow Query Patterns
Most SQL performance problems fall into four buckets. Know them and you can spot trouble before production.
1. SELECT * on large tables
Fetching all columns forces full-row reads, prevents Index Only Scan, and sends unnecessary bytes over the network.
-- Slow: all columns, can't use index-only scan
SELECT * FROM orders WHERE status = 'paid';
-- Fast: only needed columns; may use index-only scan
SELECT id, placed_at FROM orders WHERE status = 'paid'; 2. Functions on indexed columns
An index stores raw values. Wrapping the column in a function creates a different value the B-tree doesn’t know — index is bypassed.
-- BAD: function prevents index use
SELECT * FROM orders WHERE DATE(placed_at) = '2025-02-10';
-- GOOD: rewrite as a range on the raw column
SELECT * FROM orders
WHERE placed_at >= '2025-02-10'
AND placed_at < '2025-02-11';
-- Alternative (PostgreSQL): expression index
CREATE INDEX idx_orders_placed_date ON orders (DATE(placed_at)); 3. The N+1 query problem
Application code that loops over rows issuing one query per row. 100 orders with per-order item fetches = 101 queries. A single JOIN does the same work in 1 round trip.
-- N+1 (pseudocode):
-- orders = query("SELECT * FROM orders WHERE user_id = 1")
-- for order in orders:
-- items = query("SELECT * FROM order_items WHERE order_id = ?", order.id)
-- Fixed: one query
SELECT o.id, oi.product_id, oi.quantity
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
WHERE o.user_id = 1;
-- Or batch:
SELECT * FROM order_items WHERE order_id IN (1001, 1002, 1003); 4. Missing indexes on foreign keys
A PK auto-gets an index. A FK does not in PostgreSQL. Without it, joins do a nested-loop over a sequential scan.
-- Slow without index on orders.user_id:
SELECT u.full_name, o.id
FROM users u JOIN orders o ON o.user_id = u.id;
-- Fix:
CREATE INDEX idx_orders_user_id ON orders (user_id); No auto-index on FK
You must create it. Missing FK indexes are the #1 cause of slow joins in PostgreSQL.
Auto-creates FK index
InnoDB automatically adds an index on FOREIGN KEY columns. One less thing to forget.
- Indexes = sorted B-trees for O(log n) lookups. Faster reads, slower writes.
- Index columns in WHERE, JOIN ON, ORDER BY. Don’t index everything.
- Composite index: leftmost-prefix rule. Leading column first.
- Partial indexes shrink to only the rows you care about.
EXPLAIN ANALYZE= proof. Look for Seq Scan on selective queries.- Four killers: SELECT *, functions on indexed columns, N+1 loops, missing FK indexes.
← Chapter 07 — Schema Design · → Continue to Chapter 09 — Subqueries & CTEs →