System Design · Case Studies

Case Study: Distributed Cache

Design, trade-offs, and alternatives for a distributed cache at scale.

Chapter One

Problem Statement

What We Are Building

A distributed cache stores frequently accessed data in memory across multiple servers, reducing database load and serving responses in microseconds instead of milliseconds. The challenge is not caching a single value — it is maintaining consistency across a cluster of cache nodes, handling node failures without data loss storms, distributing keys evenly, and invalidating stale data without serving incorrect results. A cache that serves stale data can be worse than no cache at all.

Scale Requirements

Traffic & Scale

10M requests/sec to cache cluster
50TB+ data cached across cluster
1000+ cache nodes in production
Sub-millisecond latency: p99 < 1ms

Requirements

Hit rate: >95% (misses hit database)
Availability: 99.99% — cache down = database overwhelmed
Eviction: graceful under memory pressure (no OOM crashes)
Consistency: stale data bounded (configurable TTL)

A distributed cache is infrastructure that OTHER systems depend on. When your cache goes down, it doesn't just affect one feature — it cascades. Every service that relied on cached responses now hits the database directly. 10M req/sec that were served from cache now flood the DB. The database can't handle 100x its normal load. Result: cache failure = system-wide outage. This is the "thundering herd" or "cache stampede" problem — and it makes cache availability more critical than almost any other component.

📋 Chapter 1 — Summary

10M requests/sec, 50TB+ cached data, 1000+ nodes. Sub-millisecond latency.
Cache failure cascades: 95% of traffic that was cached now hits DB → DB melts.
Hit rate >95%. Stale data bounded by TTL. Graceful eviction under memory pressure.
Cache is infrastructure — its failure is everyone's failure.

Chapter Two

Questions to Ask

Clarifying Before Designing

📦

Data Model

Key-value only? Or complex structures (lists, sets, hashes)?
Average value size? (1KB session vs 1MB page fragment)
Hot key distribution? (Zipfian or uniform?)
TTL per key or global policy?

🔄

Consistency Model

Cache-aside, read-through, or write-through?
How stale can data be? (seconds, minutes, never?)
Active invalidation or TTL-based expiry?
Multi-region consistency needed?

🛡️

Failure Handling

What happens on cache miss? (fallback to DB)
Cache node failure: how fast must recovery be?
Cold start: how to warm cache after restart?
Memory exhaustion: evict or reject new writes?

The cache pattern (aside vs through) determines who owns the write path. Cache-aside: application reads from cache, on miss reads DB and populates cache. Application owns logic. Read-through: cache itself fetches from DB on miss. Write-through: writes go to cache AND DB simultaneously. Write-behind: writes go to cache, async flush to DB. Each trades complexity, consistency, and performance differently.

For This Case Study, Our Answers Are:

Data model: key-value strings + sorted sets (for leaderboards/rate limiting)
Average value size: ~1-10KB (user sessions, API responses, rendered page fragments)
Key distribution: Zipfian — top 1% of keys receive ~50% of traffic (hot key problem expected)
Cache pattern: cache-aside (application controls logic)
Staleness tolerance: up to 60 seconds for most data; 5 seconds for inventory/pricing
Invalidation: TTL-based expiry + active invalidation on writes
Multi-region: yes — separate cache cluster per region, no cross-region consistency
Node failure recovery: replica promotion within 30 seconds
Cold start: gradual traffic rollout (1% → 10% → 100%) + pre-warming from DB snapshot
Eviction policy: LRU (Least Recently Used) — evict cold data under memory pressure

📋 Chapter 2 — Summary

Cache pattern choice (aside/through) determines application complexity and consistency.
Hot keys (Zipfian) need special handling — one popular key can overload a single node.
TTL sets the staleness budget. Too short = low hit rate. Too long = stale data.
Cold start and node failure recovery are operational challenges, not just design ones.

Chapter Three

Naive Design

Single Redis Instance

The simplest design: one Redis server. All application servers read/write to this single instance. Fast (in-memory), simple (one endpoint), and effective up to ~100K requests/sec and ~25GB of data. Beyond that: one server's memory is the ceiling, one server's CPU is the bottleneck, and one server's failure means 100% cache miss rate — every request hits the database simultaneously.

Naive Design — Single Redis Instance

✅

What Works

Simple — one server, no partitioning logic
Atomic operations (INCR, LPUSH) without distributed coordination
Sub-millisecond reads (100K+ ops/sec single-threaded)
Great for <25GB datasets and moderate traffic

💥

What Breaks

Single server memory limit (~25-100GB max practical)
Single point of failure: server dies = all cache gone
Thundering herd: all misses hit DB simultaneously
CPU bound: single-threaded Redis maxes at ~100K complex ops/sec
No horizontal scaling — only vertical

📋 Chapter 3 — Summary

Single Redis: simple, fast, limited. Ceiling at ~25GB and ~100K ops/sec.
SPOF: server failure = 100% miss rate = database flood.
No horizontal scaling. Only solution to more data/traffic is sharding.

Chapter Four

Refined Design

Consistent Hashing with Replication

The refined design distributes keys across multiple cache nodes using consistent hashing. Each key maps to a position on a hash ring, and the node responsible for that range serves it. When a node fails, only its keys are redistributed — not the entire cache. Replication (each key stored on 2-3 nodes) ensures that a single node failure doesn't cause a miss storm. A client-side library handles routing, failover, and connection pooling transparently.

Refined Design — Distributed Cache Cluster

🔗

Key Distribution

Consistent hashing: key → hash → ring position → owning node
Virtual nodes (vnodes): 100-200 per physical node for even distribution
Adding a node: only K/N keys migrate (not all keys reshuffled)
Removing a node: its keys taken by next node on ring

🛡️

Failure Handling

Replication factor 2-3: each key on multiple nodes
Node failure: client routes to replica automatically
No thundering herd: keys still served from surviving replicas
Recovery: new node joins, pulls data from replicas

Consistent hashing with virtual nodes is the foundation of every production distributed cache. Without it, adding or removing a node invalidates all keys (modulo-based hashing). With consistent hashing, only 1/N keys are affected. Virtual nodes (multiple positions per physical node) prevent hotspots from uneven distribution. This is why Memcached, Redis Cluster, and DynamoDB all use variations of consistent hashing.

Consistent Hashing: Adding a Node Moves Only K/N Keys

📋 Chapter 4 — Summary

Consistent hashing: key → ring position → node. Minimal redistribution on changes.
Virtual nodes: 100+ positions per physical node. Even key distribution.
Replication: each key on 2-3 nodes. Node failure doesn't cause miss storm.
Client-side routing: intelligent library handles hash, failover, connection pooling.

Chapter Five

Alternative Approaches

Cache Strategies Compared

Cache-Aside (Lazy Loading)

Read-Through / Write-Through

Application checks cache → miss → reads DB → writes to cache
Application owns all logic — cache is passive storage
Only requested data is cached (no wasted memory)
Cache miss = extra latency (DB round-trip + cache write)
Stale data possible: DB updated but cache not invalidated
Used by: most web applications, Memcached patterns

Cache sits between app and DB. Reads/writes go through cache.
Read-through: cache fetches from DB on miss automatically
Write-through: write to cache + DB synchronously (consistent)
Simpler application code — cache handles DB interaction
Write-through: higher write latency (2 writes per operation)
Used by: DynamoDB DAX, Hibernate L2 cache

Cache Pattern Flows — Read Path Comparison

Write-Behind (Async Write)

Refresh-Ahead (Proactive)

Write to cache immediately, async flush to DB in background
Very fast writes — user gets response before DB write
Risk: cache failure before DB write = data loss
Batching: multiple writes aggregated into one DB write
Good for: write-heavy with acceptable data loss risk (counters, analytics)

Proactively refresh cache entries before they expire
Monitor access patterns: if key accessed frequently, refresh early
No cache miss latency for popular keys (always fresh)
Wastes resources on keys that won't be read again
Good for: predictable access patterns, low-staleness requirements

Write-behind is the only cache pattern where cache failure causes data loss. In cache-aside, read-through, and write-through, the database is always up-to-date or can be recovered from. In write-behind, writes go to cache first — if the cache node fails before flushing to the database, those writes are gone. This makes write-behind appropriate only for truly expendable data: view counts, analytics counters, rate limit buckets. Never use write-behind for anything that would cause a business problem if lost.

📋 Chapter 5 — Summary

Cache-aside: app controls logic. Lazy. Only caches what's needed.
Read/write-through: cache handles DB. Simpler app code. Higher consistency.
Write-behind: fast writes, data loss risk on node failure. Only for expendable data (counters, analytics). Never for transactional data.
Refresh-ahead: proactive refresh. No miss latency for hot keys.

Chapter Six

What Real Companies Did

Production Cache Systems

🔴

Redis (OSS + Enterprise)

Redis Cluster: hash slots (16384) distributed across nodes
Single-threaded per shard — predictable latency
Rich data structures: strings, lists, sets, sorted sets, streams
Redis Sentinel for HA failover (replica promotion)
Used by: Twitter, GitHub, Stack Overflow, Instagram

🟢

Memcached

Pure key-value, multi-threaded — high throughput per node
No built-in clustering — client-side consistent hashing
Slab allocator: efficient memory management, no fragmentation
No persistence — pure volatile cache
Used by: Facebook (mcrouter), Wikipedia, YouTube (early)

📘

Facebook TAO + Memcache

mcrouter: custom proxy for billion-request-scale Memcached
Lease mechanism: prevents thundering herd on hot keys
Regional pools: cache per data center, invalidation via message bus
TAO: social graph cache (objects + associations)
Handles 10B+ cache requests/sec across fleet

☁️

Amazon ElastiCache / DAX

ElastiCache: managed Redis or Memcached clusters
DAX: DynamoDB accelerator (read-through cache, microsecond reads)
Auto-scaling, automatic failover, encryption at rest
Multi-AZ replication for high availability
Used by: most AWS-native applications

Production Cache Systems — Comparison

📋 Chapter 6 — Summary

Redis Cluster: hash slots, rich data structures, Sentinel for HA.
Memcached: pure KV, multi-threaded, client-side hashing. Facebook at billion-scale.
Facebook: mcrouter + lease mechanism + regional pools. 10B+ req/sec.
AWS DAX: read-through cache for DynamoDB. Microsecond reads.

Chapter Seven

Best Practices Extracted

Transferable Lessons

🔑

Cache Key Design

Include version in key: user:v2:123
Namespace by service: auth:session:abc
Avoid hot keys: shard popular keys across multiple entries
Key length matters: shorter keys = less memory overhead
Transfers to: any key-value storage design

⚡

Thundering Herd Prevention

Lock on miss: only 1 request fetches from DB, others wait
Stale-while-revalidate: serve stale, refresh in background
Probabilistic early expiry: random jitter prevents synchronized expiry
Lease mechanism: Facebook's approach — token prevents stampede
Transfers to: any system with expensive fallback on miss

📊

Cache Warming

Pre-populate cache before traffic arrives (deploy, restart)
Shadow traffic: replay production reads against new cache
Gradual rollout: route 1% → 10% → 50% → 100% to new cache
Never go cold-to-hot instantly — database can't absorb the surge
Transfers to: any system with hot/cold state transition

Thundering Herd Prevention — Lock-on-Miss Pattern

📋 Chapter 7 — Summary

Key design: versioned, namespaced, short. Avoid hot keys via sharding.
Thundering herd: lock on miss, stale-while-revalidate, probabilistic expiry.
Cache warming: never go cold-to-hot instantly. Gradual rollout + pre-population.
Negative caching: cache "not found" results with short TTL (30s). Prevents penetration attacks from bypassing cache entirely.

Chapter Eight

What Could Go Wrong

Common Failure Patterns

🐘

Cache Stampede

Popular key expires → 10K simultaneous requests all miss
All 10K requests hit DB for same data simultaneously
DB overwhelmed → slow response → more timeouts → cascade
Fix: lock on miss (one fetches, rest wait), never-expire + background refresh, probabilistic early recompute.

Cache Stampede vs Probabilistic Early Expiry

💀

Cache Penetration

Requests for keys that don't exist in DB either
Every request misses cache AND misses DB — no data to cache
Attacker can exploit: request random non-existent IDs → bypass cache
Fix: cache negative results (empty with short TTL), Bloom filter to reject impossible keys before DB query.

Cache Penetration: Bloom Filter as Pre-Query Guard

❄️

Cold Start Avalanche

Cache cluster restarted (deploy, failure) → 0% hit rate
100% of traffic hits DB → DB can't handle 100x normal load
DB goes down → entire system offline
Fix: rolling restart (one node at a time), cache warming from snapshot, gradual traffic shift to new cluster.

🔥

Hot Key Problem

One key (celebrity post, viral product) gets 100K req/sec
Single cache node serving that key becomes CPU/network bottleneck
Other keys on same node get slow responses (noisy neighbor)
Fix: replicate hot keys to multiple nodes, client-side caching for ultra-hot keys, key splitting (key_1, key_2... round-robin).

The most insidious failure is serving stale data silently. Cache stampede and cold start are visible (latency spikes, errors). But serving stale data — outdated prices, old permissions, deleted content — causes silent corruption. Users see wrong information, make decisions on stale data, and nobody gets alerted because the system appears healthy. Monitor staleness: track cache lag vs source-of-truth age.

📋 Chapter 8 — Summary

Stampede: lock on miss + background refresh. Never let popular keys expire synchronously.
Penetration: cache negatives + Bloom filter. Block impossible keys before DB.
Cold start: rolling restart + warming + gradual traffic shift. Never all-at-once.
Hot key: replicate to multiple nodes or split key. No single node for viral content.

← Payment System Search Engine →