System Design · Building Blocks · Caching

Caching Strategies

Storing data closer to where it is needed to reduce latency and load.

01
Chapter One

What Caching Is

Doing the Same Work Once

Your database is doing the same work millions of times. The same product page read by thousands of users a minute. The same user profile fetched on every API request. The same configuration value queried on every webhook. Caching is the discipline of doing that work once and reusing the result — trading a small risk of staleness for an enormous reduction in latency and load.

A cache is a faster, smaller storage layer that sits between a requester and the original data source. Everything from the L1 cache on your CPU to a CDN edge in São Paulo is the same idea applied at a different scale. The hierarchy below is why caching works at all: each level is roughly 10–1000× slower than the one above it. A cache hit one level up pays for many cache misses below.

What a Cache Buys You

Speed: orders of magnitude lower latency.

Reduced load: origin data store sees a fraction of the requests.

Resilience: a warm cache lets you survive brief origin outages.

Cost: serving reads from 1 GB of Redis is dramatically cheaper than provisioning the database throughput to serve them directly.

⚠️

What a Cache Costs You

Staleness: the cache and the source can disagree, sometimes for surprising amounts of time.

Complexity: invalidation logic, key design, eviction tuning, monitoring.

Consistency hazards: cache-aside race conditions, stampedes, key collisions.

Operational burden: another stateful service to run, monitor, fail over.

The Storage Hierarchy — Why Caching Works
L1 Cache              ~1 ns      · ~64 KB
L2 Cache              ~10 ns     · ~1 MB
RAM                   ~100 ns    · GBs
SSD                   ~100 μs    · TBs
Network (same DC)     ~1 ms
Database / Cross-DC   ~10–100 ms

Faster and smaller at the top; slower and larger at the bottom.

The two hardest problems in computer science are naming things, cache invalidation, and off-by-one errors. The middle one is genuinely hard. Plan for it from day one — do not retrofit invalidation onto a working cache.

📋 Chapter 1 — Summary
  • A cache is a faster, smaller storage layer between a requester and the source of truth.
  • Each level of the storage hierarchy is 10–1000× slower than the one above — a cache hit one level up is hugely valuable.
  • Caching trades potential staleness for speed, reduced load, and resilience.
  • Invalidation is the hardest problem in caching. Design for it before you ship.
02
Chapter Two

How Caches Work Internally

Hits, Misses, and the Hit Ratio

A cache hit is the entire point: data found in cache, returned in microseconds, source untouched. A miss is the slow path: query the source, store the result, return it. The interesting metric is the hit ratio — the percentage of requests served from cache. A healthy production cache sits at 80–99% depending on the workload. Below 50%, you are paying the cost of cache complexity without getting the benefit; either the working set doesn't fit, the eviction policy is wrong, or you are caching the wrong layer.
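The hit ratio is simple to track in code. A minimal sketch, using a plain dict as a stand-in for a real cache and illustrative names throughout:

```python
# A minimal sketch of hit-ratio tracking around a cache-aside lookup.
# The dict stands in for Redis; class and method names are illustrative.

class InstrumentedCache:
    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key, load_from_source):
        if key in self._store:
            self.hits += 1              # fast path: served from cache
            return self._store[key]
        self.misses += 1
        value = load_from_source(key)   # slow path: query the source
        self._store[key] = value        # store for the next request
        return value

    @property
    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

cache = InstrumentedCache()
cache.get("user:42", lambda k: "profile-data")  # miss: loads and stores
cache.get("user:42", lambda k: "profile-data")  # hit
print(cache.hit_ratio)  # 0.5
```

In production this counter lives in your metrics pipeline rather than the cache object, but the arithmetic is the same.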

Write Policies — How Updates Flow Through

Where you write to determines what staleness, durability, and latency look like. Three policies cover almost every real system. Pick consciously.

Three Write Policies — Cache vs Database
Write-Through
Flow: app → cache (sync) → database (sync).
Pro: cache always consistent.
Con: write latency = DB latency.
Use: consistency-critical paths, e.g. user balances.

Write-Behind
Flow: app → cache (sync); database updated asynchronously (deferred).
Pro: very fast writes.
Con: data loss if cache dies before the flush.
Use: high-throughput, tolerable loss, e.g. metrics, analytics counters.

Write-Around
Flow: app writes straight to the database, skipping the cache; the cache is populated on reads only.
Pro: no cache pollution.
Con: first read after a write is slow.
Use: write-heavy, read-rare data, e.g. audit logs, archival.
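The three policies can be sketched side by side. This uses dicts and a queue as stand-ins for a real cache, database, and flush pipeline; in a real write-behind setup a background worker would drain the queue:

```python
# Sketches of the three write policies against in-memory stand-ins.
# `cache` and `db` are dicts; the names are illustrative, not a real API.
import queue

cache, db = {}, {}
write_queue = queue.Queue()  # deferred writes for write-behind

def write_through(key, value):
    cache[key] = value   # sync: cache updated first
    db[key] = value      # sync: write latency includes the DB write

def write_behind(key, value):
    cache[key] = value             # fast: only the cache write is synchronous
    write_queue.put((key, value))  # DB write deferred; lost if cache dies first

def write_around(key, value):
    db[key] = value       # skip the cache entirely; repopulated on next read
    cache.pop(key, None)  # drop any stale copy so reads don't see old data
```

The shape of each function is the whole story: where the synchronous write lands determines the latency and durability trade-off.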
Eviction Policies — What to Throw Away First

Caches have finite memory. When the cache is full and a new entry needs space, the eviction policy decides who dies. The policy you pick should match your access pattern; a default LRU on a workload that wants LFU is just slow with extra steps.

🕑

LRU — Least Recently Used

How: evict the item not accessed for the longest time.

Best for: general-purpose workloads with temporal locality (recently-accessed items likely accessed again).

Worst for: scan-heavy workloads — one big sequential read flushes the cache.
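LRU is simple enough to sketch in a few lines. A minimal version using Python's `collections.OrderedDict`, with an illustrative capacity:

```python
from collections import OrderedDict

# A minimal LRU cache: OrderedDict remembers insertion order, and
# move_to_end marks an entry as most recently used.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)         # touched: now most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used

c = LRUCache(capacity=2)
c.put("a", 1)
c.put("b", 2)
c.get("a")         # touch "a", so "b" is now least recently used
c.put("c", 3)      # full: evicts "b"
print(c.get("b"))  # None
```

The scan-flush failure mode is visible here too: a loop of `put` calls over fresh keys cycles everything out, no matter how hot the evicted entries were.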

📊

LFU — Least Frequently Used

How: evict the item with the fewest accesses overall.

Best for: frequency-skewed workloads — a small set of items dominates traffic.

Worst for: shifting popularity — old hot items linger; new hot items can't break in.

⏲️

TTL — Time To Live

How: evict after a fixed time, regardless of access.

Best for: time-sensitive data: sessions, tokens, rate-limit counters, cached API responses with known freshness windows.

Worst for: highly-accessed data with no natural freshness ceiling — you evict hot keys for no reason.

FIFO exists. It is rarely the right answer. LRU is the safe default, LFU wins for skewed workloads, TTL wins for time-bounded data — and most production systems combine LRU with TTL.

📋 Chapter 2 — Summary
  • Hit ratio is the headline metric — aim for 80–99%; below 50% the cache is hurting more than helping.
  • Write-through trades latency for consistency; write-behind trades durability for speed; write-around avoids cache pollution at the cost of cold reads.
  • LRU is the default, LFU for frequency-skewed access, TTL for anything time-bounded.
  • Match the policy to the workload — don't take defaults on faith.
03
Chapter Three

When to Use — and When Not To

Both Sides of the Decision

Caching is the engineer's favourite tool because it is conceptually simple and the wins look enormous in benchmarks. The real-world wins come with strings attached: invalidation logic, security boundaries, and the temptation to use a cache to paper over a problem that should be fixed at the source.

USE Caching When…

Read-heavy and shared. Same data read repeatedly by many users.

Expensive queries on stable data. A 200ms aggregation that changes hourly belongs in a cache.

Expensive computation with repeating inputs. Memoization in front of CPU-bound work.

External APIs with rate limits or high latency. Cache responses; respect the provider's freshness contract.

Session and config data. Touched on every request; can't live in the DB's hot path.

DO NOT Cache When…

Data changes on every read. Every miss is the only path. The cache adds work.

Every user sees unique data. Personalised result sets generated from scratch — near-zero hit ratio.

Strict real-time correctness. Money in flight, inventory holds, security tokens. Use the source of truth.

Working set is enormous. If the dataset is so large that hit ratio is permanently low, you're paying for memory that doesn't pay back.

You're hiding a slow query. Fix the query. Then cache.

💣

The Cache Poisoning Risk

If an attacker can write to your cache — through user input that ends up cached, untrusted upstream data, or shared keyspace — they can serve malicious content to every user that hits the same key.

Mitigation: validate everything at write time, namespace keys by trust boundary, never cache untrusted HTML or scripts, and keep authentication state out of shared cache entries.

🔍

The "I'll Cache This" Reflex

When a query is slow, the engineer's instinct is to cache it. Sometimes that's right. More often, the query is missing an index, doing N+1, or scanning a table that should be partitioned.

Mitigation: profile first. If the query plan is bad, fix the plan. A cache in front of a broken query just delays the moment of failure to the next cache miss spike.

Caching a slow query is not fixing it. It is hiding it. Fix the query first — index, denormalize, partition, rewrite. Then add a cache to absorb the steady-state load. Caches in front of unfixed queries are landmines waiting for a cold start.

📋 Chapter 3 — Summary
  • Cache shared, read-heavy data and expensive computations — not unique-per-user, real-time, or already-fast paths.
  • Strict correctness paths (money, inventory, auth) must read the source of truth, not the cache.
  • Cache poisoning is a real attack surface — validate writes; namespace by trust boundary.
  • Fix slow queries before caching them. A cache hides a problem; it does not solve it.
04
Chapter Four

Trade-offs & Comparisons

Redis vs Memcached — The 2026 Verdict

In 2010 this was a real debate. In 2026 it is mostly settled. Redis is the default for nearly every modern caching workload because it does everything Memcached does plus a long list of features you will eventually want. Memcached still wins one specific niche: extreme raw throughput for tiny string values where you don't need any of Redis's extras.

🔹

Redis

Data structures: strings, lists, sets, sorted sets, hashes, streams, bitmaps, HyperLogLog, geo.

Persistence: RDB snapshots + AOF append-only log — can survive restarts.

Replication: primary/replica + Redis Sentinel + Redis Cluster for sharding.

Extras: Lua scripting, pub/sub, transactions (MULTI/EXEC), streams.

Threading: single-threaded for commands (predictable latency); multi-threaded I/O.

Use: almost everything. Default choice.

◻️

Memcached

Data structures: strings only. Opaque blobs.

Persistence: none — restart loses everything.

Replication: none built in. Sharding is client-side consistent hashing.

Extras: almost none. Get, set, delete, increment.

Threading: multi-threaded — better raw throughput per box on multi-core hardware.

Use: simple K/V at extreme throughput when you want zero features.

Redis vs Memcached is not a hard decision in 2026. Use Redis. The only exception is extreme throughput requirements on simple string values where Memcached's threading model wins by a few percent — and you don't need any data structure beyond get/set.

Local Cache vs Distributed Cache

An in-process cache is a hash map living inside your application. Zero network hop, zero serialization, fastest possible. Until your app runs on more than one machine — at which point each instance has its own copy and they disagree. A distributed cache (Redis, Memcached) costs you a 1ms network hop, but every backend sees the same view.

📚

Local (In-Process)

Speed: nanoseconds — no network, no serialization.

Consistency: per-instance. Server A's view diverges from Server B's.

Memory: shared with your app heap — poisons GC if oversized.

Use: immutable lookups (config, feature flags), or per-instance computation memoization.

Examples: Caffeine (Java), Guava, lru-cache (Node).

🌐

Distributed (Redis, Memcached)

Speed: ~1 ms per round trip — network bound.

Consistency: all backends see the same value.

Memory: separate service — your app heap is unaffected.

Use: default for multi-instance applications. Sessions, shared computation, cross-service state.

Pattern: often combined with a small local cache for hot keys (two-tier caching).
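The two-tier lookup is a short function. A sketch with dicts standing in for both tiers; in production the second tier would be Redis and the first a library like Caffeine or lru-cache:

```python
# Two-tier lookup: a small per-process cache for hot keys in front of a
# shared distributed cache. Both tiers are dicts here; names illustrative.

local, distributed = {}, {}

def get(key, load_from_source):
    if key in local:                # tier 1: nanoseconds, per-instance
        return local[key]
    if key in distributed:          # tier 2: ~1 ms, shared by all backends
        local[key] = distributed[key]
        return local[key]
    value = load_from_source(key)   # missed both tiers: hit the source
    distributed[key] = value
    local[key] = value
    return value
```

The catch is the one named above: tier 1 is per-instance, so the local copies can disagree until they expire. Keep local TTLs short.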

The Cache Stampede — The Failure Mode That Kills Databases

A popular cache key expires. In the next millisecond, ten thousand requests simultaneously miss the cache. All ten thousand hit the database at once. The database, which was sized for the cached load, falls over. Symptoms: traffic looks normal, then a single moment of cache eviction takes the entire system down. This is the classic thundering-herd problem applied to caching, and the solutions are well-known and rarely implemented.

Cache Stampede — Three Scenarios
Normal — Cache Hit: N requests → cache (hit). DB sees 0 requests. ✓ healthy.

Stampede — key expired at t=T: 10,000 requests → cache (miss) → database. 10,000 simultaneous queries. ☠ falls over.

Mutex on Miss: 10,000 requests → cache (miss + lock). 9,999 wait while 1 fetches from the database. DB sees 1 query. ✓ survives.

Three solutions, in increasing sophistication: (1) mutex/lock on miss — only one request fetches; the rest wait. (2) probabilistic early expiration — recompute the value before TTL with a small random probability per request, so popular keys refresh asynchronously and never all expire at once. (3) background refresh — a worker proactively refreshes hot keys before they expire. Pick the simplest one that fits your traffic.
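Solution (1) is a few lines of code. A sketch using an in-process lock per key; a multi-instance deployment would use a distributed lock (e.g. Redis `SET NX` with an expiry) instead, and the dict names are illustrative:

```python
import threading

cache = {}
locks = {}                       # one lock per key
locks_guard = threading.Lock()   # protects the locks dict itself

def get_with_mutex(key, load_from_source):
    if key in cache:
        return cache[key]
    with locks_guard:            # find or create this key's lock
        lock = locks.setdefault(key, threading.Lock())
    with lock:                   # only one caller fetches...
        if key in cache:         # ...the rest re-check and find it cached
            return cache[key]
        cache[key] = load_from_source(key)
        return cache[key]
```

The double-check inside the lock is the load-bearing detail: without it, every waiter re-fetches after the first one finishes and the stampede merely queues up instead of going away.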

📋 Chapter 4 — Summary
  • Redis is the default in 2026; Memcached only wins for extreme-throughput simple string workloads.
  • Local caches are nanosecond-fast but per-instance; distributed caches cost ~1 ms but give all backends one view.
  • Cache stampede is the failure mode that takes down databases — design for it.
  • Stampede defenses: mutex on miss, probabilistic early expiration, or background refresh of hot keys.
05
Chapter Five

Production Patterns & Common Mistakes

The Two Patterns You Will Actually Use

Despite a textbook full of caching patterns, two cover almost every production system: cache-aside and read-through. The third pattern most teams need but don't implement is cache warming — the difference between a graceful deployment and a database that catches fire every time you ship.

🔁

Cache-Aside (Lazy Loading)

Flow: app checks cache → on miss, fetches from DB → writes back to cache → returns to caller.

Pro: only caches what is actually used. Cache stays small. Failure of cache doesn't take down reads.

Con: first request for any key is always slow. Stale on writes unless invalidated.

Use: the default. ~80% of production caches are cache-aside.
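The cache-aside flow, plus the delete-on-write invalidation it needs, fits in two small functions. A sketch against a dict with per-key expiry times standing in for Redis; the TTL and names are illustrative:

```python
import time

cache = {}         # key -> (value, expires_at); stand-in for Redis SETEX
TTL_SECONDS = 300  # illustrative freshness window

def read(key, load_from_db):
    entry = cache.get(key)
    if entry and entry[1] > time.time():   # hit and not expired
        return entry[0]
    value = load_from_db(key)              # miss: the slow path
    cache[key] = (value, time.time() + TTL_SECONDS)
    return value

def write(key, value, save_to_db):
    save_to_db(key, value)                 # source of truth first
    cache.pop(key, None)                   # invalidate; next read repopulates
```

Deleting on write rather than updating the cached value sidesteps one class of race: the next read always repopulates from the database, so the cache can lag but cannot hold a value the database never had.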

📖

Read-Through

Flow: app talks only to cache → cache fetches from DB on miss internally → returns to app.

Pro: simpler app code — one data access path.

Con: tighter coupling between cache and DB. Cold starts hurt more — the cache is also the connection pool.

Use: when the cache is a managed service that supports it (e.g. AWS DAX in front of DynamoDB).

🔥

Cache Warming

Why it matters: after a deployment, restart, or failover, your cache is empty and your DB is about to take 10× its normal load.

Strategies: (1) replay recent traffic logs to pre-populate hot keys; (2) gradual traffic shift from old to new instance; (3) snapshot/restore the cache itself; (4) backend pre-fetch of known hot keys at startup.

Verdict: mandatory for any system whose DB cannot handle 100% miss-rate traffic.
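Strategy (4) is the simplest to sketch. A minimal warmer that pre-fetches a known hot-key list at startup; the key names and loader are illustrative, and a real version would read the hot list from traffic logs or metrics:

```python
# Pre-fetch known hot keys at startup so the first requests after a
# deploy hit a warm cache instead of the database.

def warm_cache(cache, hot_keys, load_from_db):
    for key in hot_keys:
        if key not in cache:           # don't clobber anything already live
            cache[key] = load_from_db(key)

cache = {}
warm_cache(cache, ["config:site", "user:1:profile"], lambda k: f"<{k}>")
```

Run this before the instance is added to the load balancer, not after; a warmer racing live traffic defeats the point.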

🔑

Key Design Discipline

Namespace everything: user:{id}:profile, not {id}. Avoids collisions across types and makes wildcard cleanup possible.

Version the schema: user:v2:{id}:profile — lets you deploy a new format without invalidating the old data.

Avoid encoding user data in keys: never put raw user input in a key without sanitization.
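All three rules fit in one small helper. A sketch of a key builder enforcing namespace, version, and sanitization; the version string and separator convention are illustrative:

```python
import re

KEY_VERSION = "v2"  # bump when the cached value's format changes

def cache_key(namespace, *parts):
    """Build a namespaced, versioned key like 'user:v2:42:profile'.
    Parts are sanitized so raw user input cannot inject separators."""
    safe = [re.sub(r"[^A-Za-z0-9_-]", "_", str(p)) for p in parts]
    return ":".join([namespace, KEY_VERSION, *safe])

print(cache_key("user", 42, "profile"))  # user:v2:42:profile
```

Note what sanitization buys you: a malicious part like `1:profile` becomes `1_profile` instead of silently landing on another user's key.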

The Five Mistakes That Break Production Caches
♾️

Mistake 1 — No TTL on Entries

Stale data served indefinitely; cache grows until OOM. Fix: set a TTL on every entry, even “immutable” data — immutable today, schema-changed tomorrow.

🔐

Mistake 2 — Caching User-Specific Data Globally

User A's response cached under a non-user-scoped key; User B retrieves it. Privacy breach + correctness disaster. Fix: include user/tenant ID in the cache key for any user-scoped data.

🐎

Mistake 3 — No Stampede Protection

Hot key expires; thousands of misses hit the DB at once; database crashes. Fix: mutex on miss, probabilistic early expiration, or pre-refresh hot keys in background.

🗄️

Mistake 4 — Cache as Primary Store

No persistence on the cache; eviction or restart loses data. Customers report missing orders. Fix: the source of truth is the database. The cache is a hint.

💥

Mistake 5 — Key Collisions

Different data types share the same key namespace; one overwrites another. Fix: prefix all keys by type: user:, session:, cart:. Audit the keyspace as code.

📊

Bonus — No Hit-Rate Monitoring

Hit ratio drops from 95% to 40% after a deploy and nobody notices until the DB starts smoking. Fix: alarm on hit-ratio thresholds. Treat it as a first-class SLI.

Almost every catastrophic cache outage I have seen comes back to one of these five. They are not subtle bugs — they are missing discipline. Treat your cache configuration the way you treat your database schema: code-reviewed, versioned, monitored, and tested.

📋 Chapter 5 — Summary
  • Cache-aside is the default; read-through for managed caches; cache warming for any deploy that risks 100% miss rate.
  • Always namespace and version cache keys — treat the keyspace as a schema.
  • Never cache user-scoped data under a global key. Always include tenant or user ID.
  • The five outage mistakes: no TTL, user data in global keys, no stampede protection, cache-as-store, key collisions. Audit for all five.
  • Hit ratio is an SLI. Alarm on it. The first sign of trouble shows up here long before it shows up at the DB.