SQL · Advanced

Schema Design

Every chapter so far used a schema someone else wrote. Now you design one. CREATE TABLE defines columns and types; constraints (PK, FK, UNIQUE, CHECK, NOT NULL) enforce business rules at the database level; and normalisation tells you when to split a table in two. By the end you'll be able to look at a business requirement and turn it into tables that won't rot.

01
Section One · Defining tables

CREATE TABLE & Data Types

A CREATE TABLE statement declares the table name, its columns (with types), and any inline constraints. The syntax is the same across dialects; the data types diverge at the edges.

CREATE TABLE users (
  id         INTEGER      PRIMARY KEY,
  email      TEXT         NOT NULL UNIQUE,
  full_name  TEXT         NOT NULL,
  created_at TIMESTAMP    DEFAULT CURRENT_TIMESTAMP
);

Common data types

INTEGER / BIGINT

Whole numbers

IDs, counts, quantities. Use BIGINT for tables you expect to exceed 2 billion rows.

NUMERIC(p,s) / DECIMAL

Exact decimals

Money. Never use FLOAT for currency — you'll get rounding errors. NUMERIC(10,2) stores up to 99,999,999.99.

TEXT / VARCHAR(n)

Strings

TEXT is unbounded (PostgreSQL prefers it). VARCHAR(255) is bounded. In PostgreSQL there is no performance difference; in MySQL there is.

BOOLEAN

true / false

PostgreSQL: native BOOLEAN. MySQL: TINYINT(1) storing 0/1. SQLite: no native type — use 0/1 with a CHECK constraint.

TIMESTAMP / TIMESTAMPTZ

Date + time

Always use TIMESTAMPTZ (with timezone) in PostgreSQL. MySQL's TIMESTAMP auto-converts to UTC. SQLite stores as TEXT in ISO-8601.

UUID

128-bit unique ID

Native in PostgreSQL. In MySQL use CHAR(36) or BINARY(16). Great for distributed systems where auto-increment conflicts.

Type coercion traps

PostgreSQL

Strict — errors on mismatch

INSERT INTO users (id) VALUES ('abc') fails. You must cast explicitly: '42'::integer. This catches bugs early.

MySQL

Silently coerces or truncates

Inserting 'abc' into an INTEGER gives 0 with a warning. VARCHAR(5) truncates longer strings in non-strict mode. Enable STRICT_TRANS_TABLES.

SQLite

Dynamic typing

You can insert 'hello' into an INTEGER column. It stores and returns the string unchanged. Use CHECK constraints as guardrails.

Best practice

Develop strict, deploy strict

PostgreSQL's strictness is a feature. If your tests pass there, they'll pass everywhere with small syntax adjustments.

02
Section Two · Declaring rules

Constraints

A constraint is a rule the database enforces at write time. If a write violates it, the statement errors and the row is rejected — no corrupt data ever reaches disk.

PRIMARY KEY

Uniquely identifies each row. Implicitly NOT NULL + UNIQUE. One per table — single column or composite.

-- Single-column PK
CREATE TABLE users (
  id INTEGER PRIMARY KEY,
  email TEXT NOT NULL
);

-- Composite PK
CREATE TABLE order_items (
  order_id   INTEGER NOT NULL,
  product_id INTEGER NOT NULL,
  quantity   INTEGER NOT NULL,
  PRIMARY KEY (order_id, product_id)
);

UNIQUE

No two rows share the same value. NULL handling differs: PostgreSQL/SQLite allow multiple NULLs (they're not "equal"); MySQL treats two NULLs as a duplicate.

CREATE TABLE users (
  id    INTEGER PRIMARY KEY,
  email TEXT NOT NULL UNIQUE
);

NOT NULL

Prevents the three-valued-logic problems from Chapter 02 at the source. If a column should always have a value, declare it NOT NULL.

CREATE TABLE orders (
  id        INTEGER PRIMARY KEY,
  user_id   INTEGER NOT NULL,
  status    TEXT    NOT NULL,
  placed_at TIMESTAMP NOT NULL
);

CHECK

An arbitrary boolean expression each row must satisfy. Use for domain rules beyond what types express.

CREATE TABLE products (
  id       INTEGER PRIMARY KEY,
  name     TEXT NOT NULL,
  price    NUMERIC(10,2) NOT NULL CHECK (price >= 0),
  category TEXT CHECK (category IN ('books', 'merch', 'electronics'))
);

-- Multi-column check
CREATE TABLE events (
  id         INTEGER PRIMARY KEY,
  start_date DATE NOT NULL,
  end_date   DATE NOT NULL,
  CHECK (end_date >= start_date)
);
Dialect note: MySQL 8.0.16+ enforces CHECK constraints. Older versions parse but silently ignore them. Always verify with a deliberate violation on your target version.

DEFAULT

Supplies a value when an INSERT omits the column. Common: CURRENT_TIMESTAMP, 'pending', 0, true.

CREATE TABLE orders (
  id        SERIAL PRIMARY KEY,
  status    TEXT NOT NULL DEFAULT 'pending',
  placed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Insert without specifying status or placed_at:
INSERT INTO orders DEFAULT VALUES;
-- status = 'pending', placed_at = now()
03
Section Three · Relationships

Foreign Keys

A foreign key says "the value in this column must exist as a PK (or unique) in that other table." It enforces relationships: you can't create an order for a user that doesn't exist, and you can't (by default) delete a user who still has orders.

CREATE TABLE orders (
  id        INTEGER PRIMARY KEY,
  user_id   INTEGER NOT NULL REFERENCES users(id),
  status    TEXT NOT NULL DEFAULT 'pending',
  placed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

ON DELETE — three actions

ON DELETE RESTRICT (default)

Safest

Delete fails with an error if child rows exist. Forces the app to handle children first. Almost always the right choice.

ON DELETE CASCADE

Convenient but dangerous

Deleting one parent silently wipes all children. Use only when children are meaningless without the parent (e.g. order_items without an order).

ON DELETE SET NULL

Keeps children, orphans them

Sets the FK column to NULL. The child survives but loses the relationship. Rarely the right default — consider soft-deletes instead.

Best practice

Delete bottom-up

Explicitly delete children first, then the parent. Works regardless of FK action and makes intent obvious in code.

-- Syntax with ON DELETE
CREATE TABLE order_items (
  order_id   INTEGER NOT NULL
    REFERENCES orders(id) ON DELETE CASCADE,
  product_id INTEGER NOT NULL
    REFERENCES products(id) ON DELETE RESTRICT,
  quantity   INTEGER NOT NULL,
  unit_price NUMERIC(10,2) NOT NULL,
  PRIMARY KEY (order_id, product_id)
);

The circular reference problem

If A references B and B references A, neither can be inserted first. Solutions: make one FK nullable (fill it after both exist), or use PostgreSQL DEFERRABLE constraints that check at COMMIT instead of per-statement.

-- Option 1: nullable FK, update after
CREATE TABLE departments (
  id INTEGER PRIMARY KEY, name TEXT NOT NULL,
  manager_id INTEGER  -- nullable; FK added after employees table exists
);
CREATE TABLE employees (
  id INTEGER PRIMARY KEY, name TEXT NOT NULL,
  department_id INTEGER NOT NULL REFERENCES departments(id)
);
ALTER TABLE departments ADD CONSTRAINT fk_mgr
  FOREIGN KEY (manager_id) REFERENCES employees(id);

-- Insert order: department (manager NULL) -> employee -> update department
INSERT INTO departments (id, name) VALUES (1, 'Engineering');
INSERT INTO employees (id, name, department_id) VALUES (10, 'Ada', 1);
UPDATE departments SET manager_id = 10 WHERE id = 1;

-- Option 2 (PostgreSQL): deferred constraints
ALTER TABLE departments ALTER CONSTRAINT fk_mgr DEFERRABLE INITIALLY DEFERRED;
BEGIN;
  INSERT INTO employees (id, name, department_id) VALUES (10, 'Ada', 1);
  INSERT INTO departments (id, name, manager_id) VALUES (1, 'Eng', 10);
COMMIT; -- constraint checked here, both rows now exist
Design tip: circular FKs are a code smell. They usually mean the schema is trying to enforce ordering that belongs in application logic. Ask: "can I make one column nullable or remove the constraint entirely?"
04
Section Four · Eliminating redundancy

Normalisation

Normalisation structures tables so each fact is stored exactly once. It prevents the update/delete anomalies from the flat-table example in Chapter 05. The smell to recognise: "if I update this value in one place, do I have to update it somewhere else too?"

1NF — atomic values, no repeating groups

Every cell contains a single value (not a list). Violation: comma-separated phone numbers in one column.

-- NOT 1NF: phones column holds a list
CREATE TABLE contacts_bad (
  user_id INTEGER PRIMARY KEY,
  name    TEXT NOT NULL,
  phones  TEXT  -- '555-1234,555-5678'
);

-- 1NF: separate table for phones
CREATE TABLE contacts (
  user_id INTEGER PRIMARY KEY,
  name    TEXT NOT NULL
);
CREATE TABLE phones (
  user_id INTEGER NOT NULL REFERENCES contacts(user_id),
  phone   TEXT NOT NULL,
  PRIMARY KEY (user_id, phone)
);
-- Now: WHERE phone = '555-1234' uses an index correctly

2NF — every non-key column depends on the whole key

Only matters for composite keys. If a column depends on part of the key, split it out.

-- NOT 2NF: product_name depends only on product_id, not the whole PK
CREATE TABLE order_items_bad (
  order_id     INTEGER,
  product_id   INTEGER,
  product_name TEXT,       -- partial dependency on product_id alone
  quantity     INTEGER,
  PRIMARY KEY (order_id, product_id)
);

-- 2NF: product_name lives in the products table (depends on product_id PK)
-- order_items keeps only (order_id, product_id, quantity, unit_price)

3NF — no transitive dependencies

Every non-key column depends directly on the PK, not via another non-key column. Violation: storing category_name alongside category_id in products — name depends on category_id, which depends on the product PK.

-- NOT 3NF: category_name depends on category_id, not product PK directly
CREATE TABLE products_bad (
  id            INTEGER PRIMARY KEY,
  name          TEXT NOT NULL,
  category_id   INTEGER NOT NULL,
  category_name TEXT NOT NULL  -- transitive dependency
);

-- 3NF: extract into its own table
CREATE TABLE categories (
  id   INTEGER PRIMARY KEY,
  name TEXT NOT NULL UNIQUE
);
CREATE TABLE products (
  id          INTEGER PRIMARY KEY,
  name        TEXT NOT NULL,
  category_id INTEGER NOT NULL REFERENCES categories(id)
);

When to intentionally denormalise

Denormalise when...

Read performance is critical

A joined query runs 50x/second and the join is expensive. Storing a computed order_total avoids the join on every read.

Denormalise when...

Historical accuracy matters

Storing unit_price on order_items because the product's price changes over time. This records a fact about that moment.

Don't denormalise...

Before you measure

If the normalised version is fast enough, denormalisation just adds complexity. Premature denormalisation = premature optimisation.

Alternative

Materialised views

A precomputed cached query you refresh periodically. Source stays normalised; cache is denormalised. Best of both worlds.

Junction tables for many-to-many

Our order_items is a junction table resolving the many-to-many between orders and products. The PK is the composite of both FKs; extra columns (quantity, unit_price) live here.

CREATE TABLE student_courses (
  student_id  INTEGER NOT NULL REFERENCES students(id),
  course_id   INTEGER NOT NULL REFERENCES courses(id),
  enrolled_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (student_id, course_id)
);
-- PK prevents duplicate enrolments. Extra columns live on the junction.
Mental shortcut for the whole chapter:
  • CREATE TABLE: name columns, pick types, declare constraints inline or at end.
  • Constraints: PK (one per table), UNIQUE, NOT NULL, CHECK (business rules), DEFAULT (sane fallbacks).
  • Foreign keys: REFERENCES parent(pk). Default ON DELETE RESTRICT is safest. CASCADE for true child records.
  • Normalise to 3NF by default: each fact stored once. Denormalise only after measuring a real performance need.
  • Junction tables resolve many-to-many. Their PK is the composite of both FKs.

Chapter 06 — Writing Data  ·  → Continue to Chapter 08 — Indexes & Performance →