NoSQL Databases
Document, key-value, column-family, and graph — when and why.
What NoSQL Actually Means
NoSQL does not mean “no SQL.” It does not mean “better than SQL.” It means “not only SQL” — an umbrella term for a collection of database models, each optimised for an access pattern that relational databases handle poorly or expensively. The category exists because at extreme scale, or for naturally non-tabular data, the costs of the relational model (rigid schemas, joins, cross-node transactions) become genuinely painful. The choice is not relational vs NoSQL; it is one specific access pattern vs another.
Four families dominate the NoSQL landscape, and they are not interchangeable. Each was built around a specific data shape and query model. Picking the wrong family is worse than using a relational database for the same workload — you get all the operational complexity of NoSQL with none of the wins.
Document — JSON-Shaped Data
Model: self-contained JSON/BSON documents in collections.
Strength: flexible schema, hierarchical data, fast prototyping.
Examples: MongoDB, CouchDB, Firestore, DocumentDB.
Canonical use: content management, product catalogs, user profiles with custom fields.
Key-Value — The Hash Map
Model: opaque value indexed by a key. Simplest possible.
Strength: O(1) reads/writes, extreme throughput.
Examples: Redis, DynamoDB, Riak, etcd.
Canonical use: sessions, caches, leaderboards, feature flags.
Column-Family — Wide Rows at Scale
Model: rows with dynamic columns; partitioned across nodes.
Strength: massive write throughput, geographic distribution, time-series.
Examples: Cassandra, HBase, ScyllaDB, Bigtable.
Canonical use: IoT telemetry, event logs, fraud detection feature stores.
Graph — Relationships First
Model: nodes (entities) and edges (relationships) as first-class citizens.
Strength: deep traversals, pattern matching, multi-hop queries.
Examples: Neo4j, Amazon Neptune, ArangoDB, JanusGraph.
Canonical use: social graphs, recommendations, fraud rings, knowledge graphs.
The common thread across all four families: trade ACID guarantees and SQL flexibility for one of {scale, performance, schema flexibility, relationship depth}. Pick the family that matches your access pattern. The wrong NoSQL is worse than a relational database. The right NoSQL is unbeatable.
- NoSQL = “not only SQL” — an umbrella for specialised database models, not a replacement for relational.
- Four families: Document (flexible schema), Key-Value (extreme throughput), Column-Family (write-heavy at scale), Graph (relationships).
- Each trades ACID and query flexibility for one specific advantage.
- The wrong family for your access pattern is worse than just using PostgreSQL.
How They Work Internally
Document databases store self-contained JSON or BSON documents in collections. There is no enforced schema; each document can have different fields. You can build secondary indexes on any field, and aggregation pipelines let you do analytics-style queries without joins. The fundamental design decision — the one that makes or breaks a MongoDB schema — is embed vs reference.
Embed when data is accessed together and the relationship is one-to-few: an order with its line items, a blog post with its tags. Reference when data is accessed independently or the relationship is one-to-many or many-to-many: a user's posts (could be thousands), a product's reviews. Get this wrong and you either fetch a 10 MB document for every page view, or do application-layer joins that defeat the entire model.
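The two shapes are easy to see as plain dictionaries. A minimal sketch, where all collection and field names are illustrative rather than taken from any real schema:

```python
# Embedded: an order and its line items are read and written as one unit.
# The relationship is one-to-few with bounded growth, so embedding fits.
order_embedded = {
    "_id": "order-1001",
    "customer": "alice",
    "items": [
        {"sku": "A-1", "qty": 2, "price": 9.99},
        {"sku": "B-7", "qty": 1, "price": 24.50},
    ],
}

# Referenced: posts are accessed independently of the user and are
# potentially unbounded, so they live in their own collection and
# point back to the user by ID.
user = {"_id": "user-42", "name": "alice"}
post = {"_id": "post-9", "author_id": "user-42", "title": "Hello"}

def order_total(order):
    """One read of the embedded document gives everything the page needs."""
    return sum(item["qty"] * item["price"] for item in order["items"])
```

The payoff of embedding is visible in `order_total`: rendering the order touches exactly one document, no joins, no second round trip.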
Key-value stores are conceptually a hash map, distributed across many nodes. The value is opaque to the database — you cannot query “all users named Alice” because the database does not know what is inside the value. Reads and writes are O(1) by the key. The interesting design choice is partitioning: data is split across nodes by hashing the key, and the right key choice is the difference between linear scaling and a hot partition that takes down a node.
DynamoDB extends the model with a composite key: a partition key (which node holds it) plus a sort key (ordering within the partition). This unlocks range queries within a partition — e.g. “all events for user 42 between yesterday and today” — without giving up O(1) partition routing.
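A toy model of the composite-key idea, with a dict of sorted lists standing in for partitions. This mimics the routing and range-query logic only; it is not the DynamoDB API, and the key formats are invented for illustration:

```python
import bisect
from collections import defaultdict

class CompositeKeyStore:
    """Toy partition-key + sort-key table (illustrative, not DynamoDB)."""
    def __init__(self):
        self.partitions = defaultdict(list)  # pk -> sorted list of (sk, value)

    def put(self, pk, sk, value):
        # Keep each partition ordered by sort key, like a wide row.
        bisect.insort(self.partitions[pk], (sk, value))

    def query_range(self, pk, lo, hi):
        """All items in one partition with lo <= sort key <= hi."""
        part = self.partitions[pk]                    # O(1) hop to the partition
        i = bisect.bisect_left(part, (lo,))
        j = bisect.bisect_right(part, (hi, chr(0x10FFFF)))
        return [value for _, value in part[i:j]]

events = CompositeKeyStore()
events.put("user#42", "2024-05-01T09:00", "login")
events.put("user#42", "2024-05-02T12:30", "purchase")
events.put("user#42", "2024-05-03T08:15", "logout")
events.put("user#99", "2024-05-02T10:00", "login")

# "all events for user 42 between May 1 and May 2"
recent = events.query_range("user#42", "2024-05-01", "2024-05-02T23:59")
```

The partition key picks the node; the sort key makes the range scan a contiguous slice within it. That combination is why the query above never fans out across the cluster.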
In Cassandra and its kin, you do not design a data model. You design a query model. Start with the queries you need to serve and work backwards to the table layout. Joins do not exist. Transactions barely exist. What exists is partition keys (which node holds the data), clustering columns (sort order within a partition), and wide rows where a single row can contain millions of columns. Time-series workloads love this shape because every reading for a sensor goes into the same wide row, sorted by time, on the same node.
Why Cassandra writes are blazing fast: every write hits an in-memory memtable and an append-only commit log on disk — sequential writes only, no seeks. The memtable is periodically flushed to immutable SSTables; a background compaction merges them. Reads are slower because they may have to consult multiple SSTables plus the memtable, but writes are essentially as fast as the disk can append.
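The write path above can be sketched in a few lines. This is a toy model of the LSM-tree idea (append-only commit log, in-memory memtable, immutable flushed SSTables), not Cassandra's actual implementation; the flush threshold is arbitrary:

```python
class LSMSketch:
    """Toy LSM-tree write path: commit log + memtable + immutable SSTables."""
    def __init__(self, flush_threshold=3):
        self.commit_log = []     # stands in for the append-only on-disk log
        self.memtable = {}       # in-memory table of recent writes
        self.sstables = []       # immutable flushed tables, newest last
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))   # sequential append, no seeks
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Freeze the memtable into a sorted, immutable SSTable.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key):
        """Reads may consult the memtable plus several SSTables, newest first."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None

db = LSMSketch(flush_threshold=3)
for key, value in [("a", 1), ("b", 2), ("c", 3), ("a", 9)]:
    db.write(key, value)
```

Note the asymmetry the prose describes: `write` only ever appends, while `read` may have to check several places. Compaction (merging SSTables in the background) is what keeps that read fan-out bounded.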
Embed vs Reference (Document DBs)
Embed when: data accessed together; one-to-few; bounded growth.
Reference when: independent access; one-to-many or many-to-many; potentially unbounded.
Anti-pattern: embedding an unbounded array (comments on a viral post) — documents have a 16 MB limit in MongoDB.
Partition Key Design (Cassandra/DynamoDB)
Goal: spread writes evenly across nodes; keep related data on one node for fast reads.
Good keys: high-cardinality, evenly distributed (UUIDs, hashed user IDs).
Bad keys: sequential IDs, timestamps, country codes — create hot partitions.
Query model first, data model second: start with the query, work backwards to the table. Same row, same query, same node: that's the entire performance philosophy.
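The good-key vs bad-key distinction is easy to demonstrate by routing keys to nodes with a hash, as a distributed store would. Node count and key formats here are illustrative:

```python
import hashlib
from collections import Counter

NODES = 4

def node_for(key: str) -> int:
    """Route a key to a node by hashing it, as a partitioned store does."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NODES

# High-cardinality keys (per-user IDs) spread writes evenly across nodes:
good = Counter(node_for(f"user-{i}") for i in range(10_000))

# A low-cardinality key like country code funnels most traffic to one node,
# no matter how many nodes the cluster has — a hot partition in miniature:
bad = Counter(node_for(country) for country in ["US"] * 9_000 + ["DE"] * 1_000)
```

With the high-cardinality keys every node ends up with roughly a quarter of the writes; with country codes, 90% of the traffic lands wherever `"US"` happens to hash, and adding nodes changes nothing.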
- Document DBs: embed for accessed-together data, reference for independent access; mind document size limits.
- Key-Value: O(1) by key; values are opaque; partition key choice determines scaling shape.
- Cassandra: commit log + memtable + SSTable architecture — writes are sequential and very fast.
- Design for queries first; data model second. NoSQL data modelling rewards thinking like the database, not the domain.
When to Use — and When Not To
There is no single “use NoSQL when…” rule, because NoSQL is not one thing. The decision is per-family: each has its own sweet spot and its own anti-pattern. The most common mistake is treating “NoSQL” as a single choice and picking based on team familiarity rather than access pattern.
Document — USE / NOT USE
USE when: data is naturally document-shaped; schema evolves rapidly; hierarchical data accessed as a unit; rapid prototyping.
DO NOT use when: complex relational queries with joins; cross-document transactions are critical; heavy aggregation at very large scale.
Tell: if you find yourself doing joins in application code, you picked wrong.
Key-Value — USE / NOT USE
USE when: simple lookup by known key; sessions, caches, leaderboards; extreme throughput needed.
DO NOT use when: need to query by value; range queries without composite-key support; complex relationships between entities.
Tell: if you need a secondary index on every other field, you picked wrong.
Column-Family — USE / NOT USE
USE when: time-series data; write-heavy at massive scale; query patterns are known and fixed; geographic distribution required.
DO NOT use when: ad-hoc queries; strong consistency required; small datasets where operational overhead outweighs benefit.
Tell: if you can't enumerate every query at design time, you picked wrong.
Graph — USE / NOT USE
USE when: relationships are the data; social graphs, recommendations, fraud rings, knowledge graphs.
DO NOT use when: data is not naturally a graph; relationships are shallow (1–2 hops — SQL JOIN handles this fine).
Tell: if your queries are mostly “get one entity by ID,” you picked wrong.
The selection criterion is simple: identify your top three queries. If they are deep traversals, use a graph database. If they are key lookups, use key-value. If they are fixed time-series patterns at massive scale, use column-family. If your data fits naturally as documents and your team values flexibility, use document. Otherwise — use a relational database.
- Each NoSQL family has a sweet spot — pick by access pattern, not by team familiarity.
- Document for flexible schema, Key-Value for throughput, Column-Family for time-series and writes, Graph for deep relationships.
- The smell of a wrong choice: doing joins in application code; needing too many secondary indexes; can't predict your queries.
- Default still applies: if none of the four families is a clear win, relational is the safe choice.
Trade-offs & Comparisons
Most NoSQL databases trade strong consistency for availability and partition tolerance — AP in CAP terms. “Eventually consistent” is a precise statement: given no new updates, all replicas will converge to the same value. The catch is in the meantime. In practice, replicas catch up within milliseconds. Worst case — under partition or heavy load — can stretch to seconds or minutes. The implication that surprises engineers: two users may genuinely see different values for the same data at the same moment, and your application has to be designed for that.
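A toy model makes the contract concrete: a primary accepts writes, replicas lag until replication runs, and readers can genuinely see different values in between. The class and its names are invented for illustration:

```python
class EventuallyConsistentKV:
    """Toy model: one primary, replicas that apply writes only when
    replicate() runs — standing in for asynchronous replication lag."""
    def __init__(self, n_replicas=2):
        self.primary = {}
        self.replicas = [{} for _ in range(n_replicas)]
        self.pending = []   # writes accepted but not yet shipped to replicas

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read(self, key, replica=0):
        # Readers hit a replica, which may be stale.
        return self.replicas[replica].get(key)

    def replicate(self):
        """Given no new updates, all replicas converge — the contract."""
        for key, value in self.pending:
            for r in self.replicas:
                r[key] = value
        self.pending.clear()

kv = EventuallyConsistentKV()
kv.write("profile:42", "new-bio")
stale = kv.read("profile:42")   # another user reads before replication: old value
kv.replicate()
fresh = kv.read("profile:42")   # after convergence, every replica agrees
```

The window between `write` and `replicate` is the "meantime" the paragraph above describes: in production it is usually milliseconds, but your code has to be correct when it is not.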
MongoDB vs PostgreSQL is rarely an either/or in modern systems. PostgreSQL with JSONB columns covers most cases where teams reach for MongoDB. The honest distinction is about access shape: do your relationships need to be queried, or just navigated? If the answer is queried — with filters, joins, aggregations — PostgreSQL wins decisively. If the answer is navigated — load a self-contained document, render it — MongoDB's flexibility starts to pay off, especially for early-stage products with churning schemas.
SQL Wins When…
Complex queries: filters across joined tables, aggregations, window functions.
ACID transactions: money, inventory, anything where partial failure is unacceptable.
Referential integrity: the database enforces what bad code might forget.
Hybrid needs: JSONB gives you flexible documents inside a relational row.
NoSQL Wins When…
Document-shaped access: load one self-contained thing, render it.
Massive write throughput: horizontally scaled writes from day one (Cassandra, DynamoDB).
Schema churn: early-stage products where the shape changes weekly.
Naturally non-tabular: graphs, time-series, deeply nested hierarchies.
DynamoDB delivers single-digit millisecond reads at any scale. The catch is that data modelling is unforgiving: pick the wrong partition key and you get a hot partition that throttles every write. Change your access pattern after the table is built and you may have to redesign the entire schema. Global Secondary Indexes (GSIs) let you query by alternate keys, but they are eventually consistent, cost extra, and have their own partition keys to design around. Single-table design — cramming multiple entity types into one table with carefully constructed composite keys — is powerful but requires discipline most teams underestimate.
Eventual consistency is not a bug to work around — it is a contract the database has with you. If your code assumes strong consistency on a system that doesn't provide it, you don't have a database problem. You have an architecture problem.
- Eventual consistency is the headline NoSQL trade-off — usually milliseconds, occasionally seconds.
- MongoDB vs PostgreSQL isn't binary — PostgreSQL JSONB covers the middle ground for most teams.
- DynamoDB rewards careful key design and punishes access-pattern changes — redesigns are real.
- Spanner / CockroachDB occupy the rare middle: distributed, strong consistency, at the cost of latency and price.
Production Patterns & Common Mistakes
The teams that win with NoSQL embrace two practices that feel wrong if you grew up on relational. First, denormalize on purpose: duplicate data across multiple tables or documents so each query reads from a single place. The cost of duplicated writes is a small multiple; the cost of cross-partition joins is unbounded. Second, set TTLs at write time: most NoSQL stores will auto-expire items, dramatically simplifying lifecycle management for sessions, tokens, event logs, and rate-limit counters.
Pattern: Design for Access Patterns
Principle: denormalize intentionally. Duplicate data so each query touches one partition.
Example: store a user's recent orders inside the user document and in a separate orders table — if both queries matter.
Trade-off: more write logic, more storage, simpler reads, no cross-partition joins.
Mantra: writes are cheap; cross-partition reads are expensive.
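A sketch of the dual-write pattern, with plain dicts standing in for two query-shaped tables. All names are illustrative:

```python
# Two query-shaped copies of the same fact:
orders_by_user = {}   # keyed by user_id  -> answers "recent orders for user X"
order_by_id = {}      # keyed by order_id -> answers "show order Y"

def place_order(user_id, order_id, total):
    """Denormalized write: one logical event, two physical writes,
    so each read later touches exactly one partition."""
    order = {"order_id": order_id, "user_id": user_id, "total": total}
    order_by_id[order_id] = order                          # copy 1
    orders_by_user.setdefault(user_id, []).append(order)   # copy 2

place_order("user-42", "o-1", 19.99)
place_order("user-42", "o-2", 5.00)

# Each read is a single-key lookup — no cross-partition join needed.
recent = [o["order_id"] for o in orders_by_user["user-42"]]
```

This is the mantra in code: `place_order` pays a small, fixed write cost so that neither read path ever has to scan or join.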
Pattern: TTL for Auto-Expiry
Principle: set TTL at write time on data with a known lifecycle.
Examples: session tokens (24 h), email verification tokens (1 h), rate-limit counters (1 min), event logs (90 days).
Why it matters: dramatically simplifies cleanup. No cron jobs, no DELETE WHERE, no “why is this table 10 TB?”
Watch: TTL eviction is best-effort — not a security guarantee. Don't rely on it for hard deletion deadlines.
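A minimal sketch of TTL semantics with lazy, best-effort eviction, similar in spirit to how Redis expires keys on access. The class is invented for illustration; injecting a fake clock makes expiry testable without sleeping:

```python
import time

class TTLStore:
    """Toy key-value store with per-item expiry, checked lazily on read."""
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.items = {}   # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self.items[key] = (value, self.clock() + ttl_seconds)

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:   # best-effort eviction on access
            del self.items[key]
            return None
        return value

# A controllable clock stands in for real time.
now = [0.0]
store = TTLStore(clock=lambda: now[0])
store.set("session:abc", "alice", ttl_seconds=10)
live = store.get("session:abc")
now[0] = 11.0
expired = store.get("session:abc")
```

The lazy check in `get` is also why TTL is not a hard-deletion guarantee: until something reads (or a background sweep visits) the key, the expired value still sits on disk.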
Mistake 1 — Mongo as Postgres
Designing normalized schemas with foreign-key-style references; doing application-layer joins in code. Defeats the entire reason to use a document store. Fix: embed where data is accessed together; if you need joins, use Postgres.
Mistake 2 — Hot Partitions
Sequential IDs or timestamps as partition keys → all writes pile onto one node → throttling. Fix: use UUIDs or hash prefixes; in time-series, bucket by user/device + time window, not by raw timestamp.
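The bucketing fix can be sketched as a key-construction helper. The `device#day` format is an illustrative convention, not any library's API:

```python
from datetime import datetime, timezone

def bucketed_key(device_id: str, ts: datetime) -> str:
    """Partition by device + day rather than raw timestamp: one device's
    readings for one day land together on one node, but today's global
    write traffic is spread across as many partitions as there are devices."""
    return f"{device_id}#{ts.strftime('%Y-%m-%d')}"

t1 = datetime(2024, 5, 1, 9, 30, tzinfo=timezone.utc)
t2 = datetime(2024, 5, 1, 18, 0, tzinfo=timezone.utc)
t3 = datetime(2024, 5, 2, 9, 30, tzinfo=timezone.utc)

same_bucket = bucketed_key("sensor-7", t1) == bucketed_key("sensor-7", t2)
new_day = bucketed_key("sensor-7", t3)
```

The bucket size is a tuning knob: a day per device keeps partitions bounded for this example, but a chatty device might need hourly buckets to stay under partition size limits.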
Mistake 3 — Missing Secondary Indexes
Querying by non-key fields without indexes → full table scan → expensive and slow. Fix: identify access patterns up front; create GSIs / secondary indexes for them; accept the storage and write-amplification cost.
Mistake 4 — Eventual Consistency Surprise
Code assumes “just-written = visible to next read.” Bug shows up only under load and never on a developer's machine. Fix: use strong-consistency reads where required (DynamoDB ConsistentRead=true); design idempotent operations; treat post-write state as eventual.
Mistake 5 — NoSQL as a Performance Fix
Migrating to NoSQL because Postgres is “slow” — but slow Postgres is almost always missing indexes or N+1 queries. Fix: profile first; fix the SQL; only migrate when the relational model is genuinely the wrong fit.
Bonus — No Cross-Region Plan
Setting up a global table without thinking about multi-region writes → conflict resolution becomes a nightmare. Fix: pick a single writer region, use last-write-wins or CRDTs intentionally, document the consistency contract.
Almost every NoSQL outage I have debugged comes back to one of these mistakes. NoSQL doesn't fail mysteriously — it fails predictably along the boundaries of its model. Learn the model. Respect the model. The database will reward you.
- Denormalize on purpose — duplicate data across tables/documents so each query touches one partition.
- Use TTL at write time for any data with a known lifecycle — sessions, tokens, logs, counters.
- The five outage mistakes: Mongo-as-Postgres, hot partitions, missing indexes, eventual-consistency surprises, NoSQL-as-perf-fix.
- Profile before you migrate. Most “we need NoSQL” conversations end with a properly-indexed Postgres query.