SQL · Advanced

Indexes & Performance

An index is a data structure the database maintains alongside a table so it can answer certain queries without scanning every row. This chapter explains how indexes work (B-trees), when to create them, how to read the query plan that proves they’re being used, and the four slow-query patterns that trip up every team eventually.

01
Section One · The B-tree mental model

How Indexes Work

root 1..50 51..100 1..25 26..50 51..75 76..100 B-tree: O(log n) lookups regardless of table size

Without an index, the database must sequentially scan every row to find matches. A table with 10 million rows means reading 10 million rows — even if only 3 match. An index is like the index at the back of a textbook: you look up the term, get a page number, and jump directly there.

The most common structure is the B-tree (balanced tree). Keys are stored in sorted order across a hierarchy of pages. Finding a value takes O(log n) page reads — for 10 million rows that’s about 3–4 reads instead of millions.

Which queries benefit?

Index helps

Equality: WHERE email = ?

Direct B-tree lookup — O(log n). The most common indexed pattern.

Index helps

Range: WHERE price BETWEEN 10 AND 50

Walk sorted leaf nodes from 10 to 50 — skip everything outside the range.

Index helps

Sorting: ORDER BY placed_at LIMIT 10

Index is already sorted — read 10 entries from the end. No full sort needed.

Index does NOT help

Full table: SELECT * FROM orders

If you need every row, a sequential scan is faster (less random I/O jumping).

The cost on writes

Every INSERT/UPDATE/DELETE must also update every affected index. A table with 5 indexes does roughly 5× the write work of an unindexed table. Most tables benefit from 2–5 well-chosen indexes; 20 is too many.

Rule of thumb: index columns in WHERE, JOIN ON, and ORDER BY. Don’t index columns you only SELECT. Don’t index low-cardinality columns alone (e.g. a boolean).
02
Section Two · Syntax and strategy

Creating Indexes

Basic syntax

-- Single-column index
CREATE INDEX idx_orders_user_id ON orders (user_id);

-- Unique index (enforces uniqueness)
CREATE UNIQUE INDEX idx_users_email ON users (email);

-- Drop
DROP INDEX idx_orders_user_id;

Composite indexes — column order matters

A composite index is sorted by all columns in order. It satisfies queries on a leftmost prefix of those columns — but not ones that skip the leading column.

CREATE INDEX idx_orders_status_placed ON orders (status, placed_at);
Uses the index

WHERE status = 'paid' AND placed_at >= ...

Both columns match the leftmost prefix. Jump to 'paid', range-scan dates.

Uses the index

WHERE status = 'shipped'

Leading column alone. Jumps to 'shipped' section, scans all dates within it.

Cannot use this index

WHERE placed_at >= '2025-02-10'

Skips the leading column (status). The B-tree is sorted status-first — can’t jump in without it.

Cannot use this index

ORDER BY placed_at DESC LIMIT 10

The B-tree is sorted status-first — can’t extract the global top-10 by date alone. Needs a separate (placed_at) index.

The leftmost-prefix rule: index on (A, B, C) satisfies (A), (A, B), (A, B, C) — but not (B), (B, C), or (C). Put the most-filtered column first.

Partial indexes

Index only the rows you actually query. Smaller, faster, cheaper to maintain.

-- PostgreSQL / SQLite: index only pending orders
CREATE INDEX idx_orders_pending
  ON orders (placed_at)
  WHERE status = 'pending';

-- The query that benefits:
SELECT * FROM orders WHERE status = 'pending' ORDER BY placed_at;
-- Planner sees the partial index matches the WHERE and uses it.

-- MySQL: no partial indexes. Workaround: generated columns or covering indexes.
03
Section Three · Proof, not guesswork

Reading Query Plans

EXPLAIN shows the plan the database intends to use. EXPLAIN ANALYZE runs the query and shows real timing. Never guess whether an index is used — ask the planner.

-- PostgreSQL
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 1 ORDER BY placed_at DESC;

-- MySQL
EXPLAIN
SELECT * FROM orders WHERE user_id = 1 ORDER BY placed_at DESC;

Two scan types to recognise

Sequential Scan (Seq Scan)

Reads every row

Scans the table end-to-end. Fine for small tables or full-table queries. Bad for large tables with selective WHERE — means no usable index exists.

Index Scan / Index Only Scan

Jumps to matching rows

Walks the B-tree to find matches, then fetches table rows. Index Only Scan is better — all columns are in the index; no table fetch needed.

Annotated example (PostgreSQL)

                                    QUERY PLAN
---------------------------------------------------------------------------
 Index Scan using idx_orders_user_id on orders  (cost=0.15..8.17 rows=2)
   Index Cond: (user_id = 1)
   Filter: (status = 'paid')
   Rows Removed by Filter: 1
   Planning Time: 0.08 ms
   Execution Time: 0.03 ms
  • Node type: Index Scan — good, using an index.
  • Index: idx_orders_user_id — confirms which one.
  • Index Cond: user_id = 1 — the part satisfied by the index.
  • Filter: status = 'paid' — applied after the index fetch. Adding status to the index would eliminate this extra filter.
  • cost: 0.15..8.17 = startup..total in abstract units. Compare between plans.
  • Execution Time: 0.03 ms — the real wall-clock time.
Investigation workflow:
  1. EXPLAIN ANALYZE the slow query.
  2. Find Seq Scan on a large table with a selective WHERE — that’s the bottleneck.
  3. Create an index on the filtered column(s).
  4. EXPLAIN ANALYZE again — confirm Index Scan and time dropped.
04
Section Four · What to watch for

Common Slow Query Patterns

Most SQL performance problems fall into four buckets. Know them and you can spot trouble before production.

1. SELECT * on large tables

Fetching all columns forces full-row reads, prevents Index Only Scan, and sends unnecessary bytes over the network.

-- Slow: all columns, can't use index-only scan
SELECT * FROM orders WHERE status = 'paid';

-- Fast: only needed columns; may use index-only scan
SELECT id, placed_at FROM orders WHERE status = 'paid';

2. Functions on indexed columns

An index stores raw values. Wrapping the column in a function creates a different value the B-tree doesn’t know — index is bypassed.

-- BAD: function prevents index use
SELECT * FROM orders WHERE DATE(placed_at) = '2025-02-10';

-- GOOD: rewrite as a range on the raw column
SELECT * FROM orders
WHERE placed_at >= '2025-02-10'
  AND placed_at <  '2025-02-11';

-- Alternative (PostgreSQL): expression index
CREATE INDEX idx_orders_placed_date ON orders (DATE(placed_at));
General rule: never wrap an indexed column in a function on the left side of a comparison. Move the transformation to the right side, or create an expression index.

3. The N+1 query problem

Application code that loops over rows issuing one query per row. 100 orders with per-order item fetches = 101 queries. A single JOIN does the same work in 1 round trip.

-- N+1 (pseudocode):
-- orders = query("SELECT * FROM orders WHERE user_id = 1")
-- for order in orders:
--     items = query("SELECT * FROM order_items WHERE order_id = ?", order.id)

-- Fixed: one query
SELECT o.id, oi.product_id, oi.quantity
FROM   orders o
JOIN   order_items oi ON oi.order_id = o.id
WHERE  o.user_id = 1;

-- Or batch:
SELECT * FROM order_items WHERE order_id IN (1001, 1002, 1003);

4. Missing indexes on foreign keys

A PK auto-gets an index. A FK does not in PostgreSQL. Without it, joins do a nested-loop over a sequential scan.

-- Slow without index on orders.user_id:
SELECT u.full_name, o.id
FROM   users u JOIN orders o ON o.user_id = u.id;

-- Fix:
CREATE INDEX idx_orders_user_id ON orders (user_id);
PostgreSQL

No auto-index on FK

You must create it. Missing FK indexes are the #1 cause of slow joins in PostgreSQL.

MySQL (InnoDB)

Auto-creates FK index

InnoDB automatically adds an index on FOREIGN KEY columns. One less thing to forget.

Mental shortcut for the whole chapter:
  • Indexes = sorted B-trees for O(log n) lookups. Faster reads, slower writes.
  • Index columns in WHERE, JOIN ON, ORDER BY. Don’t index everything.
  • Composite index: leftmost-prefix rule. Leading column first.
  • Partial indexes shrink to only the rows you care about.
  • EXPLAIN ANALYZE = proof. Look for Seq Scan on selective queries.
  • Four killers: SELECT *, functions on indexed columns, N+1 loops, missing FK indexes.

Chapter 07 — Schema Design  ·  → Continue to Chapter 09 — Subqueries & CTEs →