Case Study: Distributed Cache
Design, trade-offs, and alternatives for a distributed cache at scale.
Problem Statement
A distributed cache stores frequently accessed data in memory across multiple servers, reducing database load and serving responses in microseconds instead of milliseconds. The challenge is not caching a single value โ it is maintaining consistency across a cluster of cache nodes, handling node failures without data loss storms, distributing keys evenly, and invalidating stale data without serving incorrect results. A cache that serves stale data can be worse than no cache at all.
Traffic & Scale
- 10M requests/sec to cache cluster
- 50TB+ data cached across cluster
- 1000+ cache nodes in production
- Sub-millisecond latency: p99 < 1ms
Requirements
- Hit rate: >95% (misses hit database)
- Availability: 99.99% โ cache down = database overwhelmed
- Eviction: graceful under memory pressure (no OOM crashes)
- Consistency: stale data bounded (configurable TTL)
A distributed cache is infrastructure that OTHER systems depend on. When your cache goes down, it doesn't just affect one feature โ it cascades. Every service that relied on cached responses now hits the database directly. 10M req/sec that were served from cache now flood the DB. The database can't handle 100x its normal load. Result: cache failure = system-wide outage. This is the "thundering herd" or "cache stampede" problem โ and it makes cache availability more critical than almost any other component.
- 10M requests/sec, 50TB+ cached data, 1000+ nodes. Sub-millisecond latency.
- Cache failure cascades: 95% of traffic that was cached now hits DB โ DB melts.
- Hit rate >95%. Stale data bounded by TTL. Graceful eviction under memory pressure.
- Cache is infrastructure โ its failure is everyone's failure.
Questions to Ask
Data Model
- Key-value only? Or complex structures (lists, sets, hashes)?
- Average value size? (1KB session vs 1MB page fragment)
- Hot key distribution? (Zipfian or uniform?)
- TTL per key or global policy?
Consistency Model
- Cache-aside, read-through, or write-through?
- How stale can data be? (seconds, minutes, never?)
- Active invalidation or TTL-based expiry?
- Multi-region consistency needed?
Failure Handling
- What happens on cache miss? (fallback to DB)
- Cache node failure: how fast must recovery be?
- Cold start: how to warm cache after restart?
- Memory exhaustion: evict or reject new writes?
The cache pattern (aside vs through) determines who owns the write path. Cache-aside: application reads from cache, on miss reads DB and populates cache. Application owns logic. Read-through: cache itself fetches from DB on miss. Write-through: writes go to cache AND DB simultaneously. Write-behind: writes go to cache, async flush to DB. Each trades complexity, consistency, and performance differently.
For This Case Study, Our Answers Are:
- Data model: key-value strings + sorted sets (for leaderboards/rate limiting)
- Average value size: ~1-10KB (user sessions, API responses, rendered page fragments)
- Key distribution: Zipfian โ top 1% of keys receive ~50% of traffic (hot key problem expected)
- Cache pattern: cache-aside (application controls logic)
- Staleness tolerance: up to 60 seconds for most data; 5 seconds for inventory/pricing
- Invalidation: TTL-based expiry + active invalidation on writes
- Multi-region: yes โ separate cache cluster per region, no cross-region consistency
- Node failure recovery: replica promotion within 30 seconds
- Cold start: gradual traffic rollout (1% โ 10% โ 100%) + pre-warming from DB snapshot
- Eviction policy: LRU (Least Recently Used) โ evict cold data under memory pressure
- Cache pattern choice (aside/through) determines application complexity and consistency.
- Hot keys (Zipfian) need special handling โ one popular key can overload a single node.
- TTL sets the staleness budget. Too short = low hit rate. Too long = stale data.
- Cold start and node failure recovery are operational challenges, not just design ones.
Naive Design
The simplest design: one Redis server. All application servers read/write to this single instance. Fast (in-memory), simple (one endpoint), and effective up to ~100K requests/sec and ~25GB of data. Beyond that: one server's memory is the ceiling, one server's CPU is the bottleneck, and one server's failure means 100% cache miss rate โ every request hits the database simultaneously.
What Works
- Simple โ one server, no partitioning logic
- Atomic operations (INCR, LPUSH) without distributed coordination
- Sub-millisecond reads (100K+ ops/sec single-threaded)
- Great for <25GB datasets and moderate traffic
What Breaks
- Single server memory limit (~25-100GB max practical)
- Single point of failure: server dies = all cache gone
- Thundering herd: all misses hit DB simultaneously
- CPU bound: single-threaded Redis maxes at ~100K complex ops/sec
- No horizontal scaling โ only vertical
- Single Redis: simple, fast, limited. Ceiling at ~25GB and ~100K ops/sec.
- SPOF: server failure = 100% miss rate = database flood.
- No horizontal scaling. Only solution to more data/traffic is sharding.
Refined Design
The refined design distributes keys across multiple cache nodes using consistent hashing. Each key maps to a position on a hash ring, and the node responsible for that range serves it. When a node fails, only its keys are redistributed โ not the entire cache. Replication (each key stored on 2-3 nodes) ensures that a single node failure doesn't cause a miss storm. A client-side library handles routing, failover, and connection pooling transparently.
Key Distribution
- Consistent hashing: key โ hash โ ring position โ owning node
- Virtual nodes (vnodes): 100-200 per physical node for even distribution
- Adding a node: only K/N keys migrate (not all keys reshuffled)
- Removing a node: its keys taken by next node on ring
Failure Handling
- Replication factor 2-3: each key on multiple nodes
- Node failure: client routes to replica automatically
- No thundering herd: keys still served from surviving replicas
- Recovery: new node joins, pulls data from replicas
Consistent hashing with virtual nodes is the foundation of every production distributed cache. Without it, adding or removing a node invalidates all keys (modulo-based hashing). With consistent hashing, only 1/N keys are affected. Virtual nodes (multiple positions per physical node) prevent hotspots from uneven distribution. This is why Memcached, Redis Cluster, and DynamoDB all use variations of consistent hashing.
- Consistent hashing: key โ ring position โ node. Minimal redistribution on changes.
- Virtual nodes: 100+ positions per physical node. Even key distribution.
- Replication: each key on 2-3 nodes. Node failure doesn't cause miss storm.
- Client-side routing: intelligent library handles hash, failover, connection pooling.
Alternative Approaches
- Application checks cache โ miss โ reads DB โ writes to cache
- Application owns all logic โ cache is passive storage
- Only requested data is cached (no wasted memory)
- Cache miss = extra latency (DB round-trip + cache write)
- Stale data possible: DB updated but cache not invalidated
- Used by: most web applications, Memcached patterns
- Cache sits between app and DB. Reads/writes go through cache.
- Read-through: cache fetches from DB on miss automatically
- Write-through: write to cache + DB synchronously (consistent)
- Simpler application code โ cache handles DB interaction
- Write-through: higher write latency (2 writes per operation)
- Used by: DynamoDB DAX, Hibernate L2 cache
- Write to cache immediately, async flush to DB in background
- Very fast writes โ user gets response before DB write
- Risk: cache failure before DB write = data loss
- Batching: multiple writes aggregated into one DB write
- Good for: write-heavy with acceptable data loss risk (counters, analytics)
- Proactively refresh cache entries before they expire
- Monitor access patterns: if key accessed frequently, refresh early
- No cache miss latency for popular keys (always fresh)
- Wastes resources on keys that won't be read again
- Good for: predictable access patterns, low-staleness requirements
Write-behind is the only cache pattern where cache failure causes data loss. In cache-aside, read-through, and write-through, the database is always up-to-date or can be recovered from. In write-behind, writes go to cache first โ if the cache node fails before flushing to the database, those writes are gone. This makes write-behind appropriate only for truly expendable data: view counts, analytics counters, rate limit buckets. Never use write-behind for anything that would cause a business problem if lost.
- Cache-aside: app controls logic. Lazy. Only caches what's needed.
- Read/write-through: cache handles DB. Simpler app code. Higher consistency.
- Write-behind: fast writes, data loss risk on node failure. Only for expendable data (counters, analytics). Never for transactional data.
- Refresh-ahead: proactive refresh. No miss latency for hot keys.
What Real Companies Did
Redis (OSS + Enterprise)
- Redis Cluster: hash slots (16384) distributed across nodes
- Single-threaded per shard โ predictable latency
- Rich data structures: strings, lists, sets, sorted sets, streams
- Redis Sentinel for HA failover (replica promotion)
- Used by: Twitter, GitHub, Stack Overflow, Instagram
Memcached
- Pure key-value, multi-threaded โ high throughput per node
- No built-in clustering โ client-side consistent hashing
- Slab allocator: efficient memory management, no fragmentation
- No persistence โ pure volatile cache
- Used by: Facebook (mcrouter), Wikipedia, YouTube (early)
Facebook TAO + Memcache
- mcrouter: custom proxy for billion-request-scale Memcached
- Lease mechanism: prevents thundering herd on hot keys
- Regional pools: cache per data center, invalidation via message bus
- TAO: social graph cache (objects + associations)
- Handles 10B+ cache requests/sec across fleet
Amazon ElastiCache / DAX
- ElastiCache: managed Redis or Memcached clusters
- DAX: DynamoDB accelerator (read-through cache, microsecond reads)
- Auto-scaling, automatic failover, encryption at rest
- Multi-AZ replication for high availability
- Used by: most AWS-native applications
- Redis Cluster: hash slots, rich data structures, Sentinel for HA.
- Memcached: pure KV, multi-threaded, client-side hashing. Facebook at billion-scale.
- Facebook: mcrouter + lease mechanism + regional pools. 10B+ req/sec.
- AWS DAX: read-through cache for DynamoDB. Microsecond reads.
Best Practices Extracted
Cache Key Design
- Include version in key:
user:v2:123 - Namespace by service:
auth:session:abc - Avoid hot keys: shard popular keys across multiple entries
- Key length matters: shorter keys = less memory overhead
- Transfers to: any key-value storage design
Thundering Herd Prevention
- Lock on miss: only 1 request fetches from DB, others wait
- Stale-while-revalidate: serve stale, refresh in background
- Probabilistic early expiry: random jitter prevents synchronized expiry
- Lease mechanism: Facebook's approach โ token prevents stampede
- Transfers to: any system with expensive fallback on miss
Cache Warming
- Pre-populate cache before traffic arrives (deploy, restart)
- Shadow traffic: replay production reads against new cache
- Gradual rollout: route 1% โ 10% โ 50% โ 100% to new cache
- Never go cold-to-hot instantly โ database can't absorb the surge
- Transfers to: any system with hot/cold state transition
- Key design: versioned, namespaced, short. Avoid hot keys via sharding.
- Thundering herd: lock on miss, stale-while-revalidate, probabilistic expiry.
- Cache warming: never go cold-to-hot instantly. Gradual rollout + pre-population.
- Negative caching: cache "not found" results with short TTL (30s). Prevents penetration attacks from bypassing cache entirely.
What Could Go Wrong
Cache Stampede
- Popular key expires โ 10K simultaneous requests all miss
- All 10K requests hit DB for same data simultaneously
- DB overwhelmed โ slow response โ more timeouts โ cascade
- Fix: lock on miss (one fetches, rest wait), never-expire + background refresh, probabilistic early recompute.
Cache Penetration
- Requests for keys that don't exist in DB either
- Every request misses cache AND misses DB โ no data to cache
- Attacker can exploit: request random non-existent IDs โ bypass cache
- Fix: cache negative results (empty with short TTL), Bloom filter to reject impossible keys before DB query.
Cold Start Avalanche
- Cache cluster restarted (deploy, failure) โ 0% hit rate
- 100% of traffic hits DB โ DB can't handle 100x normal load
- DB goes down โ entire system offline
- Fix: rolling restart (one node at a time), cache warming from snapshot, gradual traffic shift to new cluster.
Hot Key Problem
- One key (celebrity post, viral product) gets 100K req/sec
- Single cache node serving that key becomes CPU/network bottleneck
- Other keys on same node get slow responses (noisy neighbor)
- Fix: replicate hot keys to multiple nodes, client-side caching for ultra-hot keys, key splitting (key_1, key_2... round-robin).
The most insidious failure is serving stale data silently. Cache stampede and cold start are visible (latency spikes, errors). But serving stale data โ outdated prices, old permissions, deleted content โ causes silent corruption. Users see wrong information, make decisions on stale data, and nobody gets alerted because the system appears healthy. Monitor staleness: track cache lag vs source-of-truth age.
- Stampede: lock on miss + background refresh. Never let popular keys expire synchronously.
- Penetration: cache negatives + Bloom filter. Block impossible keys before DB.
- Cold start: rolling restart + warming + gradual traffic shift. Never all-at-once.
- Hot key: replicate to multiple nodes or split key. No single node for viral content.