Window Functions Deep Dive
Chapter 04 introduced window functions as “aggregates without collapsing.” This chapter goes all the way: the full anatomy of OVER(), frame clauses that control which rows the function sees, all the ranking functions, value-access functions like LAG/LEAD, and named windows for DRY queries.
Window Function Anatomy
Every window function has the form function() OVER (window_definition). The window definition has three optional parts:
PARTITION BY col— splits rows into independent groups (like GROUP BY, but without collapsing). If omitted, the entire result set is one partition.ORDER BY col— defines the logical order of rows within each partition. Required for ranking and running aggregates.- Frame clause — defines which subset of rows within the partition the function operates on relative to the current row. The default depends on whether ORDER BY is present.
Frame clauses: ROWS BETWEEN vs RANGE BETWEEN
The frame defines a sliding window of rows the aggregate sees for each current row. Two modes:
Physical offsets
Counts actual rows. ROWS BETWEEN 2 PRECEDING AND CURRENT ROW = the current row plus the 2 rows before it (exactly 3 rows). Deterministic even with ties.
Logical (value) offsets
Includes all rows whose ORDER BY value falls within the range. RANGE BETWEEN INTERVAL '7 days' PRECEDING AND CURRENT ROW = all rows within the last 7 days. Includes ties automatically.
The default frame (the counterintuitive bit)
When you write ORDER BY inside OVER but no explicit frame clause, the default is:
- With ORDER BY:
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW— a running total from the start up to (and including) all rows tied with the current row. - Without ORDER BY:
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING— the entire partition.
placed_at value, both see the same running total. Use ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW if you want strictly one-row-at-a-time accumulation.
-- Running total (default frame = RANGE, includes ties)
SELECT placed_at,
SUM(total) OVER (ORDER BY placed_at) AS running_total_range
FROM daily_revenue;
-- Strictly row-by-row running total (ROWS frame)
SELECT placed_at,
SUM(total) OVER (ORDER BY placed_at
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS running_total_rows
FROM daily_revenue;
-- 7-day moving average (RANGE with interval)
SELECT placed_at,
AVG(total) OVER (ORDER BY placed_at
RANGE BETWEEN INTERVAL '6 days' PRECEDING
AND CURRENT ROW)
AS moving_avg_7d
FROM daily_revenue; Named windows with WINDOW ... AS
If multiple columns share the same window definition, name it once and reuse it — DRY and less error-prone.
SELECT id,
placed_at,
SUM(total) OVER w AS running_total,
AVG(total) OVER w AS running_avg,
COUNT(*) OVER w AS running_count
FROM daily_revenue
WINDOW w AS (ORDER BY placed_at ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
ORDER BY placed_at; WINDOW clause is supported in PostgreSQL, MySQL 8+, and SQLite 3.28+. SQL Server does not support it — repeat the OVER inline.
Ranking Functions
Chapter 04 introduced ROW_NUMBER, RANK, and DENSE_RANK with a tie-handling comparison. This section adds NTILE, PERCENT_RANK, and CUME_DIST — the percentile-bucket functions.
Quick recap: the three basic rankings
SELECT name, price,
ROW_NUMBER() OVER (ORDER BY price DESC) AS rn, -- never ties: 1,2,3,4,5
RANK() OVER (ORDER BY price DESC) AS rk, -- ties share, skip: 1,2,2,4,5
DENSE_RANK() OVER (ORDER BY price DESC) AS drk -- ties share, no skip: 1,2,2,3,4
FROM products; NTILE(n) — divide into equal-sized buckets
Splits the partition into n roughly equal groups and assigns a bucket number 1..n. Useful for percentile analysis.
-- Split orders into 4 quartiles by revenue
WITH order_rev AS (
SELECT order_id, SUM(quantity * unit_price) AS total
FROM order_items GROUP BY order_id
)
SELECT order_id, total,
NTILE(4) OVER (ORDER BY total) AS quartile
FROM order_rev; order_id | total | quartile
---------+-------+---------
1004 | 9.50 | 1
1002 | 42.00 | 2
1001 | 53.99 | 3
1003 | 76.99 | 4
PERCENT_RANK and CUME_DIST
Both return a value between 0 and 1 representing a row’s position in the sorted partition:
(rank - 1) / (total - 1)
The relative rank as a fraction. First row = 0, last row = 1. Useful for “this product is more expensive than X% of all products.”
rows ≤ current / total
Cumulative distribution. The fraction of rows that have a value ≤ the current row’s value. Always > 0, last row = 1.
SELECT name, price,
PERCENT_RANK() OVER (ORDER BY price) AS pct_rank,
CUME_DIST() OVER (ORDER BY price) AS cume_dist
FROM products
ORDER BY price; name | price | pct_rank | cume_dist
---------------+-------+----------+----------
Database Mug | 9.50 | 0.00 | 0.33
SQL Cookbook | 34.99 | 0.50 | 0.67
Designing Data | 42.00 | 1.00 | 1.00
Top-N per group (the killer pattern)
Chapter 04 previewed this. Here’s the full production pattern: rank within each group, then filter to keep only the top N.
-- Top 2 most expensive products per category
WITH ranked AS (
SELECT category, name, price,
ROW_NUMBER() OVER (PARTITION BY category
ORDER BY price DESC) AS rn
FROM products
)
SELECT category, name, price
FROM ranked
WHERE rn <= 2; Value Functions
Value functions access a specific row in the window without collapsing. They answer questions like “what was the previous row’s value?” or “what was the first value in this partition?”
LAG / LEAD — peek at adjacent rows
LAG(col, offset, default) looks back N rows. LEAD looks forward. Default offset is 1; default value is NULL.
-- Compare each order's total to the previous order's total
WITH order_totals AS (
SELECT o.id, o.placed_at,
SUM(oi.quantity * oi.unit_price) AS total
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
WHERE o.status != 'cancelled'
GROUP BY o.id, o.placed_at
)
SELECT id,
placed_at,
total,
LAG(total) OVER (ORDER BY placed_at) AS prev_total,
total - LAG(total) OVER (ORDER BY placed_at) AS change
FROM order_totals
ORDER BY placed_at; id | placed_at | total | prev_total | change
-----+---------------------+-------+------------+-------
1001 | 2025-02-10 09:00:00 | 53.99 | NULL | NULL
1002 | 2025-02-11 11:30:00 | 42.00 | 53.99 | -11.99
1003 | 2025-02-14 16:45:00 | 76.99 | 42.00 | 34.99
The first row has LAG = NULL because there’s no previous row. You can supply a default: LAG(total, 1, 0) would return 0 instead of NULL.
FIRST_VALUE / LAST_VALUE
Return the first or last value in the current window frame.
-- Each product vs the cheapest in its category
SELECT category, name, price,
FIRST_VALUE(name) OVER (PARTITION BY category ORDER BY price)
AS cheapest_in_cat,
price - FIRST_VALUE(price) OVER (PARTITION BY category ORDER BY price)
AS premium_over_cheapest
FROM products; The LAST_VALUE frame gotcha
This is the most common window-function bug. LAST_VALUE with the default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) returns the current row’s value — because the frame ends at the current row! To get the true last value in the partition, extend the frame explicitly:
-- BROKEN: LAST_VALUE with default frame = current row's value (useless)
SELECT name, price,
LAST_VALUE(name) OVER (PARTITION BY category ORDER BY price)
AS "wrong_most_expensive"
FROM products;
-- Every row says its OWN name because the frame ends at the current row.
-- FIXED: extend frame to end of partition
SELECT name, price,
LAST_VALUE(name) OVER (
PARTITION BY category ORDER BY price
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS most_expensive_in_cat
FROM products; LAST_VALUE returns the wrong answer without an explicit frameNTH_VALUE
Access the Nth row in the frame. Same frame gotcha applies — always specify an explicit frame if you want to scan the whole partition.
-- Second cheapest product per category
SELECT category, name, price,
NTH_VALUE(name, 2) OVER (
PARTITION BY category ORDER BY price
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS second_cheapest
FROM products; OVER()= function + partition + order + frame. All optional, but order is required for ranking/running aggregates.- Default frame with ORDER BY = RANGE to CURRENT ROW (includes ties). Use ROWS for strict row-by-row logic.
- ROWS = physical offset (count rows). RANGE = logical offset (value-based, includes ties).
- Named windows (
WINDOW w AS (...)) eliminate repetition across multiple columns. - Ranking: ROW_NUMBER (no ties), RANK (skip), DENSE_RANK (no skip), NTILE (buckets), PERCENT_RANK / CUME_DIST (0..1 position).
- Top-N per group = ROW_NUMBER + PARTITION BY + filter in outer query.
- LAG/LEAD: compare to previous/next row. FIRST_VALUE: start of partition. LAST_VALUE: always specify full frame or you get the current row.
← Chapter 09 — Subqueries & CTEs · → Continue to Chapter 11 — Advanced SQL Patterns →