Schema Design
Every chapter so far used a schema someone else wrote. Now you design one. CREATE TABLE defines columns and types; constraints (PK, FK, UNIQUE, CHECK, NOT NULL) enforce business rules at the database level; and normalisation tells you when to split a table in two. By the end you'll be able to look at a business requirement and turn it into tables that won't rot.
CREATE TABLE & Data Types
A CREATE TABLE statement declares the table name, its columns (with types), and any inline constraints. The syntax is the same across dialects; the data types diverge at the edges.
CREATE TABLE users (
id INTEGER PRIMARY KEY,
email TEXT NOT NULL UNIQUE,
full_name TEXT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
); Common data types
Whole numbers
IDs, counts, quantities. Use BIGINT for tables you expect to exceed 2 billion rows.
Exact decimals
Money. Never use FLOAT for currency — you'll get rounding errors. NUMERIC(10,2) stores up to 99,999,999.99.
Strings
TEXT is unbounded (PostgreSQL prefers it). VARCHAR(255) is bounded. In PostgreSQL there is no performance difference; in MySQL there is.
true / false
PostgreSQL: native BOOLEAN. MySQL: TINYINT(1) storing 0/1. SQLite: no native type — use 0/1 with a CHECK constraint.
Date + time
Always use TIMESTAMPTZ (with timezone) in PostgreSQL. MySQL's TIMESTAMP auto-converts to UTC. SQLite stores as TEXT in ISO-8601.
128-bit unique ID
Native in PostgreSQL. In MySQL use CHAR(36) or BINARY(16). Great for distributed systems where auto-increment conflicts.
Type coercion traps
Strict — errors on mismatch
INSERT INTO users (id) VALUES ('abc') fails. You must cast explicitly: '42'::integer. This catches bugs early.
Silently coerces or truncates
Inserting 'abc' into an INTEGER gives 0 with a warning. VARCHAR(5) truncates longer strings in non-strict mode. Enable STRICT_TRANS_TABLES.
Dynamic typing
You can insert 'hello' into an INTEGER column. It stores and returns the string unchanged. Use CHECK constraints as guardrails.
Develop strict, deploy strict
PostgreSQL's strictness is a feature. If your tests pass there, they'll pass everywhere with small syntax adjustments.
Constraints
A constraint is a rule the database enforces at write time. If a write violates it, the statement errors and the row is rejected — no corrupt data ever reaches disk.
PRIMARY KEY
Uniquely identifies each row. Implicitly NOT NULL + UNIQUE. One per table — single column or composite.
-- Single-column PK
CREATE TABLE users (
id INTEGER PRIMARY KEY,
email TEXT NOT NULL
);
-- Composite PK
CREATE TABLE order_items (
order_id INTEGER NOT NULL,
product_id INTEGER NOT NULL,
quantity INTEGER NOT NULL,
PRIMARY KEY (order_id, product_id)
); UNIQUE
No two rows share the same value. NULL handling differs: PostgreSQL/SQLite allow multiple NULLs (they're not "equal"); MySQL treats two NULLs as a duplicate.
CREATE TABLE users (
id INTEGER PRIMARY KEY,
email TEXT NOT NULL UNIQUE
); NOT NULL
Prevents the three-valued-logic problems from Chapter 02 at the source. If a column should always have a value, declare it NOT NULL.
CREATE TABLE orders (
id INTEGER PRIMARY KEY,
user_id INTEGER NOT NULL,
status TEXT NOT NULL,
placed_at TIMESTAMP NOT NULL
); CHECK
An arbitrary boolean expression each row must satisfy. Use for domain rules beyond what types express.
CREATE TABLE products (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
price NUMERIC(10,2) NOT NULL CHECK (price >= 0),
category TEXT CHECK (category IN ('books', 'merch', 'electronics'))
);
-- Multi-column check
CREATE TABLE events (
id INTEGER PRIMARY KEY,
start_date DATE NOT NULL,
end_date DATE NOT NULL,
CHECK (end_date >= start_date)
); DEFAULT
Supplies a value when an INSERT omits the column. Common: CURRENT_TIMESTAMP, 'pending', 0, true.
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
status TEXT NOT NULL DEFAULT 'pending',
placed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Insert without specifying status or placed_at:
INSERT INTO orders DEFAULT VALUES;
-- status = 'pending', placed_at = now() Foreign Keys
A foreign key says "the value in this column must exist as a PK (or unique) in that other table." It enforces relationships: you can't create an order for a user that doesn't exist, and you can't (by default) delete a user who still has orders.
CREATE TABLE orders (
id INTEGER PRIMARY KEY,
user_id INTEGER NOT NULL REFERENCES users(id),
status TEXT NOT NULL DEFAULT 'pending',
placed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
); ON DELETE — three actions
Safest
Delete fails with an error if child rows exist. Forces the app to handle children first. Almost always the right choice.
Convenient but dangerous
Deleting one parent silently wipes all children. Use only when children are meaningless without the parent (e.g. order_items without an order).
Keeps children, orphans them
Sets the FK column to NULL. The child survives but loses the relationship. Rarely the right default — consider soft-deletes instead.
Delete bottom-up
Explicitly delete children first, then the parent. Works regardless of FK action and makes intent obvious in code.
-- Syntax with ON DELETE
CREATE TABLE order_items (
order_id INTEGER NOT NULL
REFERENCES orders(id) ON DELETE CASCADE,
product_id INTEGER NOT NULL
REFERENCES products(id) ON DELETE RESTRICT,
quantity INTEGER NOT NULL,
unit_price NUMERIC(10,2) NOT NULL,
PRIMARY KEY (order_id, product_id)
); The circular reference problem
If A references B and B references A, neither can be inserted first. Solutions: make one FK nullable (fill it after both exist), or use PostgreSQL DEFERRABLE constraints that check at COMMIT instead of per-statement.
-- Option 1: nullable FK, update after
CREATE TABLE departments (
id INTEGER PRIMARY KEY, name TEXT NOT NULL,
manager_id INTEGER -- nullable; FK added after employees table exists
);
CREATE TABLE employees (
id INTEGER PRIMARY KEY, name TEXT NOT NULL,
department_id INTEGER NOT NULL REFERENCES departments(id)
);
ALTER TABLE departments ADD CONSTRAINT fk_mgr
FOREIGN KEY (manager_id) REFERENCES employees(id);
-- Insert order: department (manager NULL) -> employee -> update department
INSERT INTO departments (id, name) VALUES (1, 'Engineering');
INSERT INTO employees (id, name, department_id) VALUES (10, 'Ada', 1);
UPDATE departments SET manager_id = 10 WHERE id = 1;
-- Option 2 (PostgreSQL): deferred constraints
ALTER TABLE departments ALTER CONSTRAINT fk_mgr DEFERRABLE INITIALLY DEFERRED;
BEGIN;
INSERT INTO employees (id, name, department_id) VALUES (10, 'Ada', 1);
INSERT INTO departments (id, name, manager_id) VALUES (1, 'Eng', 10);
COMMIT; -- constraint checked here, both rows now exist Normalisation
Normalisation structures tables so each fact is stored exactly once. It prevents the update/delete anomalies from the flat-table example in Chapter 05. The smell to recognise: "if I update this value in one place, do I have to update it somewhere else too?"
1NF — atomic values, no repeating groups
Every cell contains a single value (not a list). Violation: comma-separated phone numbers in one column.
-- NOT 1NF: phones column holds a list
CREATE TABLE contacts_bad (
user_id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
phones TEXT -- '555-1234,555-5678'
);
-- 1NF: separate table for phones
CREATE TABLE contacts (
user_id INTEGER PRIMARY KEY,
name TEXT NOT NULL
);
CREATE TABLE phones (
user_id INTEGER NOT NULL REFERENCES contacts(user_id),
phone TEXT NOT NULL,
PRIMARY KEY (user_id, phone)
);
-- Now: WHERE phone = '555-1234' uses an index correctly 2NF — every non-key column depends on the whole key
Only matters for composite keys. If a column depends on part of the key, split it out.
-- NOT 2NF: product_name depends only on product_id, not the whole PK
CREATE TABLE order_items_bad (
order_id INTEGER,
product_id INTEGER,
product_name TEXT, -- partial dependency on product_id alone
quantity INTEGER,
PRIMARY KEY (order_id, product_id)
);
-- 2NF: product_name lives in the products table (depends on product_id PK)
-- order_items keeps only (order_id, product_id, quantity, unit_price) 3NF — no transitive dependencies
Every non-key column depends directly on the PK, not via another non-key column. Violation: storing category_name alongside category_id in products — name depends on category_id, which depends on the product PK.
-- NOT 3NF: category_name depends on category_id, not product PK directly
CREATE TABLE products_bad (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
category_id INTEGER NOT NULL,
category_name TEXT NOT NULL -- transitive dependency
);
-- 3NF: extract into its own table
CREATE TABLE categories (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL UNIQUE
);
CREATE TABLE products (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
category_id INTEGER NOT NULL REFERENCES categories(id)
); When to intentionally denormalise
Read performance is critical
A joined query runs 50x/second and the join is expensive. Storing a computed order_total avoids the join on every read.
Historical accuracy matters
Storing unit_price on order_items because the product's price changes over time. This records a fact about that moment.
Before you measure
If the normalised version is fast enough, denormalisation just adds complexity. Premature denormalisation = premature optimisation.
Materialised views
A precomputed cached query you refresh periodically. Source stays normalised; cache is denormalised. Best of both worlds.
Junction tables for many-to-many
Our order_items is a junction table resolving the many-to-many between orders and products. The PK is the composite of both FKs; extra columns (quantity, unit_price) live here.
CREATE TABLE student_courses (
student_id INTEGER NOT NULL REFERENCES students(id),
course_id INTEGER NOT NULL REFERENCES courses(id),
enrolled_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (student_id, course_id)
);
-- PK prevents duplicate enrolments. Extra columns live on the junction. CREATE TABLE: name columns, pick types, declare constraints inline or at end.- Constraints: PK (one per table), UNIQUE, NOT NULL, CHECK (business rules), DEFAULT (sane fallbacks).
- Foreign keys:
REFERENCES parent(pk). Default ON DELETE RESTRICT is safest. CASCADE for true child records. - Normalise to 3NF by default: each fact stored once. Denormalise only after measuring a real performance need.
- Junction tables resolve many-to-many. Their PK is the composite of both FKs.
← Chapter 06 — Writing Data · → Continue to Chapter 08 — Indexes & Performance →