SQL · Expert

Advanced SQL Patterns

This chapter covers four patterns you encounter the moment your SQL leaves a textbook and enters a real codebase: upsert/merge for idempotent data pipelines, JSON columns for semi-structured data, pivot/unpivot for reshaping result sets, and LATERAL joins for correlated derived tables that solve top-N-per-group without window functions.

01
Section One · Idempotent writes

UPSERT & MERGE

An upsert inserts a row if it doesn’t exist, and updates it if it does — in a single atomic statement. It’s the foundation of idempotent data pipelines: re-running the same import produces the same result, with no duplicate-key errors and no lost updates.

PostgreSQL / SQLite: INSERT ... ON CONFLICT

-- Insert a new product, or update its price if the SKU already exists
INSERT INTO products (id, sku, name, price, category)
VALUES (101, 'BK-001', 'SQL Cookbook', 36.99, 'books')
ON CONFLICT (sku) DO UPDATE
  SET price = EXCLUDED.price,
      name  = EXCLUDED.name;

-- EXCLUDED refers to the row that FAILED to insert (the new values).
-- Only the columns you list in SET are updated; others stay unchanged.
DO NOTHING

Skip duplicates silently

ON CONFLICT (col) DO NOTHING — if the row exists, ignore the insert. No error, no update. Useful for “insert if missing” patterns.

DO UPDATE

True upsert

ON CONFLICT (col) DO UPDATE SET ... — if the row exists, update specified columns with the new values. The row is guaranteed to reflect the latest data.

MySQL: INSERT ... ON DUPLICATE KEY UPDATE

INSERT INTO products (id, sku, name, price, category)
VALUES (101, 'BK-001', 'SQL Cookbook', 36.99, 'books')
ON DUPLICATE KEY UPDATE
  price = VALUES(price),
  name  = VALUES(name);
-- VALUES(col) refers to the value that was attempted for insertion.

SQL:2003 MERGE statement

MERGE is the standard’s answer to upsert. It matches a source against a target and specifies what to do on match/no-match. Supported in SQL Server, Oracle, PostgreSQL 15+, and DB2. Not in MySQL or SQLite.

-- PostgreSQL 15+ / SQL Server / Oracle
MERGE INTO products AS target
USING (VALUES (101, 'BK-001', 'SQL Cookbook', 36.99, 'books'))
      AS source (id, sku, name, price, category)
ON target.sku = source.sku
WHEN MATCHED THEN
  UPDATE SET price = source.price, name = source.name
WHEN NOT MATCHED THEN
  INSERT (id, sku, name, price, category)
  VALUES (source.id, source.sku, source.name, source.price, source.category);

Idempotent pipelines

The value of upsert is idempotency: running the same import twice produces the same end state. This is essential for:

  • ETL jobs that might be retried after a partial failure.
  • Event-sourced systems that replay events on restart.
  • CDC (Change Data Capture) streams applying changes to a warehouse.
Common mistake: using ON CONFLICT DO UPDATE SET col = col (updating a column to its current value). This still fires triggers and increments the row version — it’s not a no-op. Use DO NOTHING if you truly want to skip, or add a WHERE clause to only update when values actually changed.
02
Section Two · Semi-structured data

JSON in SQL

Sometimes a column needs to hold flexible, nested data that doesn’t fit a fixed set of columns — user preferences, API responses, event payloads. Storing it as JSON (or JSONB in PostgreSQL) gives you both the flexibility of a document store and the query power of SQL.

Column types

JSON

Stored as text, validated on write

Preserves exact formatting (whitespace, key order). Slower to query because it’s parsed every time. Available in PostgreSQL, MySQL 5.7+.

JSONB (PostgreSQL)

Binary, decomposed, indexable

Stored in a binary format. Doesn’t preserve key order or whitespace. Faster queries, supports GIN indexes for containment checks. Prefer JSONB in PostgreSQL.

Querying nested JSON

-- Hypothetical: users table with a JSONB preferences column
ALTER TABLE users ADD COLUMN prefs JSONB DEFAULT '{}';
UPDATE users SET prefs = '{"theme": "dark", "lang": "en", "notifications": {"email": true, "sms": false}}' WHERE id = 1;

-- Extract a top-level key as text
SELECT prefs ->> 'theme' AS theme FROM users WHERE id = 1;
-- 'dark'

-- Extract a nested key
SELECT prefs -> 'notifications' ->> 'email' AS email_notif FROM users WHERE id = 1;
-- 'true' (as text)

-- Filter by JSON value
SELECT * FROM users WHERE prefs ->> 'theme' = 'dark';

-- Containment operator (JSONB only, uses GIN index)
SELECT * FROM users WHERE prefs @> '{"theme": "dark"}';

-- Array access
-- prefs = '{"tags": ["sql", "postgres"]}'
SELECT prefs -> 'tags' ->> 0 AS first_tag FROM users WHERE id = 1;
-- 'sql'
->

Returns JSON

Extracts a key/index and returns the result as a JSON value (preserving type). Chain with another -> to go deeper.

->>

Returns TEXT

Extracts and casts to text. Use this for comparisons, WHERE clauses, and display. Always the final operator in a chain.

Indexing JSON for performance

-- GIN index on the entire JSONB column (supports @>, ?, ?|, ?& operators)
CREATE INDEX idx_users_prefs ON users USING GIN (prefs);

-- Expression index on a specific key (for ->> equality queries)
CREATE INDEX idx_users_theme ON users ((prefs ->> 'theme'));

-- Now this is fast:
SELECT * FROM users WHERE prefs ->> 'theme' = 'dark';
-- Uses idx_users_theme (expression index)
When to use JSON vs separate tables: use JSON for truly unstructured/variable data (user preferences, third-party payloads) where you rarely join on individual fields. If you find yourself writing WHERE prefs ->> 'x' = ? in every query, that field should be a real column with a proper index.
03
Section Three · Reshaping result sets

Pivot & Unpivot

Pivoting turns row values into columns (long → wide). Unpivoting does the reverse. SQL doesn’t have a universal PIVOT keyword — you build it with conditional aggregation.

Pivot with CASE WHEN + aggregate

Problem: “Show total revenue per user, broken out by order status as columns.”

-- Crosstab using CASE WHEN (works everywhere)
SELECT u.full_name,
       SUM(CASE WHEN o.status = 'paid'      THEN oi.quantity * oi.unit_price ELSE 0 END) AS paid_revenue,
       SUM(CASE WHEN o.status = 'shipped'   THEN oi.quantity * oi.unit_price ELSE 0 END) AS shipped_revenue,
       SUM(CASE WHEN o.status = 'cancelled' THEN oi.quantity * oi.unit_price ELSE 0 END) AS cancelled_revenue
FROM   users u
JOIN   orders o  ON o.user_id = u.id
JOIN   order_items oi ON oi.order_id = o.id
GROUP BY u.full_name
ORDER BY u.full_name;
full_name      | paid_revenue | shipped_revenue | cancelled_revenue
---------------+--------------+-----------------+------------------
Ada Lovelace   |        53.99 |           76.99 |              0.00
Grace Hopper   |         0.00 |            0.00 |              9.50
Linus Torvalds |        42.00 |            0.00 |              0.00

FILTER clause (PostgreSQL shortcut)

PostgreSQL offers FILTER (WHERE ...) on any aggregate — cleaner than CASE WHEN and often faster.

-- Same result, cleaner syntax (PostgreSQL only)
SELECT u.full_name,
       SUM(oi.quantity * oi.unit_price) FILTER (WHERE o.status = 'paid')      AS paid_revenue,
       SUM(oi.quantity * oi.unit_price) FILTER (WHERE o.status = 'shipped')   AS shipped_revenue,
       SUM(oi.quantity * oi.unit_price) FILTER (WHERE o.status = 'cancelled') AS cancelled_revenue
FROM   users u
JOIN   orders o  ON o.user_id = u.id
JOIN   order_items oi ON oi.order_id = o.id
GROUP BY u.full_name;

Unpivot (wide → long)

Turn columns back into rows. Standard SQL uses LATERAL + VALUES; some dialects have UNPIVOT.

-- Given a wide table: monthly_sales(product, jan, feb, mar)
-- Turn each month column into a row:
SELECT product, month, revenue
FROM   monthly_sales
CROSS JOIN LATERAL (
  VALUES ('jan', jan), ('feb', feb), ('mar', mar)
) AS unpivoted(month, revenue);
PostgreSQL extension: the tablefunc module provides crosstab() for dynamic column pivots where the number of output columns is determined at query time. Useful for reports, but overkill for most application queries.
04
Section Four · Correlated derived tables

LATERAL Joins

A normal derived table (subquery in FROM) cannot reference columns from other tables in the same FROM clause. LATERAL removes that restriction — the derived table can “see” the current row of the table to its left. Think of it as a correlated subquery that returns multiple columns and rows.

What LATERAL does

-- Without LATERAL: the subquery can't reference u.id
-- (Error: "invalid reference to FROM-clause entry")
SELECT u.full_name, recent.*
FROM   users u
JOIN   (
  SELECT id, placed_at FROM orders WHERE user_id = u.id  -- ERROR
  ORDER BY placed_at DESC LIMIT 3
) recent ON true;

-- With LATERAL: the subquery CAN reference u.id
SELECT u.full_name, recent.*
FROM   users u
CROSS JOIN LATERAL (
  SELECT id, placed_at FROM orders
  WHERE  user_id = u.id
  ORDER BY placed_at DESC
  LIMIT 3
) recent;

Top-N per group — the LATERAL way

Chapter 10 solved top-N per group with ROW_NUMBER. LATERAL offers a simpler, often faster alternative — especially when an index exists on the correlated column.

-- Top 2 most recent orders per user (LATERAL approach)
SELECT u.full_name, o.id AS order_id, o.placed_at
FROM   users u
CROSS JOIN LATERAL (
  SELECT id, placed_at
  FROM   orders
  WHERE  user_id = u.id
  ORDER BY placed_at DESC
  LIMIT 2
) o
ORDER BY u.full_name, o.placed_at DESC;
full_name      | order_id | placed_at
---------------+----------+---------------------
Ada Lovelace   |     1003 | 2025-02-14 16:45:00
Ada Lovelace   |     1001 | 2025-02-10 09:00:00
Grace Hopper   |     1004 | 2025-02-15 08:10:00
Linus Torvalds |     1002 | 2025-02-11 11:30:00
LATERAL advantage

Uses index + LIMIT directly

The inner query does an index scan on (user_id, placed_at DESC) and stops after N rows. No need to rank the entire table first.

ROW_NUMBER advantage

More portable

Works in MySQL 8+ (which doesn’t support LATERAL until 8.0.14) and SQL Server. Doesn’t require a composite index to be efficient.

LATERAL vs correlated subquery

A correlated subquery in SELECT returns one value. LATERAL returns a full result set (multiple rows, multiple columns). Use LATERAL when you need more than a single scalar from the correlated computation.

-- Correlated subquery: one value per user
SELECT u.full_name,
       (SELECT MAX(placed_at) FROM orders WHERE user_id = u.id) AS last_order
FROM   users u;

-- LATERAL: multiple columns and rows per user
SELECT u.full_name, lo.id, lo.placed_at, lo.status
FROM   users u
CROSS JOIN LATERAL (
  SELECT id, placed_at, status FROM orders
  WHERE  user_id = u.id
  ORDER BY placed_at DESC LIMIT 1
) lo;
Dialect support:
  • PostgreSQL: full LATERAL support since 9.3.
  • MySQL: LATERAL supported since 8.0.14.
  • SQLite: no LATERAL support. Use correlated subqueries or ROW_NUMBER.
  • SQL Server: CROSS APPLY / OUTER APPLY = equivalent of LATERAL.
Mental shortcut for the whole chapter:
  • Upsert: ON CONFLICT DO UPDATE (PG/SQLite), ON DUPLICATE KEY UPDATE (MySQL), MERGE (standard). Makes pipelines idempotent.
  • JSON: use JSONB in PostgreSQL. -> returns JSON, ->> returns text. Index with GIN or expression indexes. Don’t store data in JSON that you frequently filter or join on.
  • Pivot: SUM(CASE WHEN status = 'x' THEN val END) or FILTER (WHERE ...) in PostgreSQL. Unpivot with LATERAL + VALUES.
  • LATERAL: a correlated derived table that sees the current outer row. Perfect for top-N-per-group with an index. CROSS APPLY in SQL Server.

Chapter 10 — Window Functions  ·  → Continue to Chapter 12 — Real-World SQL →