Filtering with WHERE
The WHERE clause is how a database becomes useful โ instead of "give me everything", you ask "give me the rows that match these conditions". This chapter covers comparison operators, combining predicates with AND/OR/NOT, pattern matching with LIKE, range/list shortcuts, and the three-valued logic that makes NULL surprising.
WHERE Basics
Think of WHERE as a row-by-row test. The database walks the table, evaluates your predicate against each row, and keeps the row only if the predicate evaluates to TRUE. Both FALSE and NULL rows are discarded โ that's a fact we'll come back to in ยง4.
The six comparison operators
These work on numbers, strings, and dates. SQL is whitespace-tolerant โ formatting is up to you.
-- Equality and inequality
SELECT * FROM products WHERE price = 42.00;
SELECT * FROM products WHERE price != 42.00; -- also written <> in standard SQL
SELECT * FROM products WHERE price <> 42.00; -- identical to !=
-- Range
SELECT * FROM products WHERE price < 20;
SELECT * FROM products WHERE price <= 20;
SELECT * FROM products WHERE price > 30;
SELECT * FROM products WHERE price >= 30; id | sku | name | price | category
----+--------+----------------+-------+----------
103 | MG-001 | Database Mug | 9.50 | merch
Numbers vs strings โ the quoting rule
String and date literals must be wrapped in single quotes. Numbers must not. Double quotes mean something else in most dialects (an identifier โ a column or table name), so don't use them for values.
-- โ
Number: no quotes
SELECT * FROM products WHERE price = 42;
-- โ
String: single quotes
SELECT * FROM products WHERE category = 'books';
-- โ
Date / timestamp: single quotes, ISO-8601 format
SELECT * FROM orders WHERE placed_at >= '2025-02-12';
-- โ This is wrong in PostgreSQL: "books" is interpreted as a column name
SELECT * FROM products WHERE category = "books";
-- ERROR: column "books" does not exist Case sensitivity โ a dialect minefield
Whether 'Books' matches 'books' depends on three things: the database engine, the column's collation, and (for PostgreSQL) which operator you use. This trips up beginners more than any other quirk.
String comparisons are case-sensitive
'books' = 'Books' is FALSE. Use ILIKE (case-insensitive LIKE) or LOWER(col) = LOWER('Books') when you need to match either case.
Depends on the collation
Default collations like utf8mb4_general_ci are case-insensitive โ 'books' = 'Books' is TRUE. Switch to a _bin collation for case-sensitive matching.
Case-sensitive by default for =
But LIKE is case-insensitive for ASCII by default โ the opposite of =. Set PRAGMA case_sensitive_like = ON to change it.
Table & column names
PostgreSQL folds unquoted names to lowercase; SQL Server is case-insensitive; quoted identifiers ("MyColumn") are case-sensitive everywhere. Stick to snake_case and you'll never hit this.
WHERE category = "books" with double quotes. It works in MySQL (which is non-standard here) but breaks in PostgreSQL with a confusing "column does not exist" error. Use single quotes for values, every time.
Combining Conditions
Real questions are rarely a single comparison. "Show me paid orders placed in the last week from users who joined this year" is three conditions glued together. SQL gives you AND, OR, and NOT for that โ same three connectives as every programming language, with one twist around precedence.
-- AND: both must be true
SELECT * FROM orders
WHERE status = 'paid'
AND placed_at >= '2025-02-11';
-- OR: at least one must be true
SELECT * FROM orders
WHERE status = 'paid'
OR status = 'shipped';
-- NOT: invert a condition
SELECT * FROM orders
WHERE NOT status = 'cancelled'; Precedence โ AND binds tighter than OR
Multiply-before-add. Without parentheses, SQL evaluates AND before OR, exactly like 2 + 3 ร 4 = 14, not 20. This is the single most common logical bug beginners write โ and it returns answers that look plausible, so it's easy to miss.
A worked example
Using the sample schema, find every order that was either paid or shipped and was placed on or after Feb 12.
-- โ Without parens โ probably not what you meant
SELECT id, status, placed_at
FROM orders
WHERE status = 'paid'
OR status = 'shipped'
AND placed_at >= '2025-02-12';
-- Returns orders 1001, 1002 (any 'paid', regardless of date)
-- and 1003 (shipped AND late enough). Order 1004 cancelled is out.
-- โ
With parens โ the question we actually asked
SELECT id, status, placed_at
FROM orders
WHERE (status = 'paid' OR status = 'shipped')
AND placed_at >= '2025-02-12';
-- Returns only 1003 โ the only paid-or-shipped order on/after Feb 12. AND and OR in the same query, add the parentheses anyway. Even when they're not strictly required, they make intent obvious to the next reader (often: you, six months from now).
De Morgan's laws โ distributing NOT
This matters whenever you want to exclude a combination of conditions. Suppose a manager asks: "Show me every order that is not a paid order from February." That's a negation of an AND โ and rewriting it as separate conditions is often easier to read and easier to index (Chapter 08). Negating a compound predicate flips the connective:
-- NOT (A AND B) โก (NOT A) OR (NOT B)
-- NOT (A OR B) โก (NOT A) AND (NOT B)
-- "Orders that are NOT (paid AND placed in Feb)":
SELECT * FROM orders
WHERE NOT (status = 'paid' AND placed_at >= '2025-02-01');
-- Equivalent โ distributed NOT:
SELECT * FROM orders
WHERE status != 'paid'
OR placed_at < '2025-02-01'; Patterns, Ranges & Lists
Equality covers a lot, but real questions need more. "Emails ending in @example.com", "orders placed between Feb 10 and Feb 14", "products in books or merch but not electronics" โ each of these has a clean shortcut.
LIKE & wildcards
LIKE compares against a pattern with two wildcards:
%โ matches zero or more of any character_โ matches exactly one character
-- Emails ending in @example.com
SELECT * FROM users WHERE email LIKE '%@example.com';
-- Names starting with "L"
SELECT * FROM users WHERE full_name LIKE 'L%';
-- 6-character SKU starting with 'BK-'
SELECT * FROM products WHERE sku LIKE 'BK-___';
-- Negation
SELECT * FROM users WHERE email NOT LIKE '%@example.com'; - PostgreSQL: use
ILIKEโemail ILIKE '%@EXAMPLE.com'. - MySQL:
LIKEis already case-insensitive under default collations. - SQLite:
LIKEis case-insensitive for ASCII; case-sensitive for Unicode.PRAGMA case_sensitive_like = ONflips it.
WHERE name LIKE 'Ada' with no wildcards. It's just a slower =. Use = for exact matches.
BETWEEN โฆ AND โ inclusive ranges
BETWEEN x AND y is shorthand for col >= x AND col <= y โ both endpoints are included. It works on numbers, dates, and even strings (alphabetic range).
-- Orders placed Feb 10 through Feb 14 (inclusive)
SELECT * FROM orders
WHERE placed_at BETWEEN '2025-02-10' AND '2025-02-14';
-- Products priced $10 through $40 (inclusive)
SELECT * FROM products WHERE price BETWEEN 10 AND 40;
-- Negation
SELECT * FROM products WHERE price NOT BETWEEN 10 AND 40; placed_at BETWEEN '2025-02-10' AND '2025-02-14' includes Feb 14 โ but only the instant 2025-02-14 00:00:00. An order at 2025-02-14 16:45:00 is excluded! Prefer half-open ranges for timestamps:
placed_at >= '2025-02-10' AND placed_at < '2025-02-15'.
IN (list) โ multiple equality checks
IN is shorthand for a chain of OR col = value. Cleaner to write, faster to read, and the engine can usually evaluate it more efficiently than the equivalent OR chain.
-- Long form
SELECT * FROM orders
WHERE status = 'paid' OR status = 'shipped' OR status = 'pending';
-- Same query, idiomatic
SELECT * FROM orders WHERE status IN ('paid', 'shipped', 'pending');
-- Negation
SELECT * FROM orders WHERE status NOT IN ('cancelled'); NOT IN: if the list contains NULL, NOT IN returns zero rows. Why? NOT IN (1, 2, NULL) expands to x != 1 AND x != 2 AND x != NULL, and that last comparison is always NULL, dragging the whole AND down to NULL โ row dropped. Filter NULLs out of the list, or use NOT EXISTS (covered in Chapter 09).
Three-Valued Logic & IS NULL
Chapter 01 introduced the rule: NULL means "unknown", and any comparison against NULL yields NULL, not TRUE or FALSE. Now we have to deal with what happens when NULL meets AND, OR, and NOT. This is the part of SQL where pure boolean intuition stops working โ but a single truth table covers all of it.
Two practical consequences
Both of these look "obviously equivalent" โ they aren't.
-- โ WHERE x = 5 OR x != 5
-- You might think this matches every row. It does NOT match rows where x IS NULL,
-- because both halves evaluate to NULL and OR(NULL, NULL) = NULL.
-- โก WHERE NOT status = 'paid'
-- This drops rows where status IS NULL โ because NOT NULL is still NULL.
-- If you want "everything except paid, including unknown", you need:
SELECT * FROM orders WHERE status != 'paid' OR status IS NULL; IS NULL and IS NOT NULL โ the only safe checks
IS NULL never returns NULL โ it returns TRUE or FALSE, every time. It's the single escape hatch out of three-valued logic.
-- Find users with no recorded signup date
SELECT * FROM users WHERE created_at IS NULL;
-- Same, negated
SELECT * FROM users WHERE created_at IS NOT NULL;
-- Combine with other predicates as usual
SELECT * FROM users
WHERE created_at IS NOT NULL
AND created_at >= '2025-01-01'; COALESCE: SQL has a helper called COALESCE(a, b, โฆ) that returns the first non-NULL argument, so you can "substitute" a value for NULL. It's most useful when you're shaping the SELECT list around aggregates โ we'll introduce it properly in Chapter 04, where NULLs in COUNT/SUM/AVG create real-world reasons to use it. For now, just remember the name.
- Single quotes for values, double quotes (or none) for identifiers.
- Parentheses any time you mix
ANDandOR. - Half-open ranges (
>= start AND < end) for timestamps,BETWEENfor integers. - NULL drops the row in
WHERE, inNOT, and insideNOT IN.IS NULLis the only safe check.
โ Chapter 01 โ What Is a Database ยท ๐ Continue to Chapter 03 โ Sorting & Limiting โ