Case Study: Social Feed
Design, trade-offs, and alternatives for a social news feed at scale.
Problem Statement
A social news feed aggregates posts from everyone a user follows and presents them in a personalized, ranked order. When you open Instagram, Twitter, or LinkedIn, the feed is already there: hundreds of candidate posts filtered, ranked, and ready within milliseconds. The challenge is not displaying posts; it is computing a personalized feed for each of 500M users from billions of candidate posts, in under 200ms, while handling the "celebrity problem" where one account has 100M followers and every post must reach all of them.
Traffic & Scale
- 500M daily active users
- Each user follows ~500 accounts on average
- 300M new posts/day (~3,500 posts/sec)
- Feed request: 10B/day (~115K reads/sec)
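As a quick sanity check, the per-second rates can be derived from the daily figures. The sketch below is plain arithmetic. Its only assumption is that the average follower count roughly equals the average follow count (~500), which is the figure the fan-out estimate in the refined design uses.

```python
# Back-of-envelope rates derived from the daily traffic figures above.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

posts_per_day = 300_000_000
feed_reads_per_day = 10_000_000_000
follows_per_user = 500   # average accounts followed
avg_followers = 500      # assumption: average followers ~= average follows

posts_per_sec = posts_per_day / SECONDS_PER_DAY       # ~3,472 (quoted as ~3,500)
reads_per_sec = feed_reads_per_day / SECONDS_PER_DAY  # ~115,741 (quoted as ~115K)

# Pure pull: one query per followed account per feed request.
naive_queries_per_sec = follows_per_user * 115_000    # 57.5M
# Pure push: one feed-cache write per follower per new post.
fanout_writes_per_sec = avg_followers * 3_500         # 1.75M

print(f"{posts_per_sec:,.0f} posts/sec, {reads_per_sec:,.0f} reads/sec")
print(f"pull: {naive_queries_per_sec:,} queries/sec, push: {fanout_writes_per_sec:,} writes/sec")
```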
Requirements
- Feed latency: <200ms (pre-computed or fast assembly)
- Freshness: new post visible in feed within 30 seconds
- Ranking: ML-based relevance, not just reverse chronological
- Celebrity handling: accounts with 100M+ followers
The fundamental question is: when do you compute the feed? You can compute it when the author posts (fan-out on write) or when the reader opens the app (fan-out on read). This single decision determines your entire architecture: storage model, latency profile, cost structure, and how you handle celebrities. There is no right answer; every production system uses a hybrid.
- 500M DAU, 300M posts/day, 10B feed reads/day.
- Core decision: fan-out on write (pre-compute) vs fan-out on read (compute at request time).
- Celebrity problem: 1 post from a 100M-follower account = 100M feed updates.
- Feed must be ranked (not just chronological), fresh (30s), and fast (<200ms).
Questions to Ask
A Twitter-style reverse-chronological timeline is architecturally simpler than a Facebook-style ranked feed with ML scoring. The questions below determine whether you need a pre-computed cache or a real-time assembly system, and how much infrastructure the ranking layer requires.
Feed Model
- Reverse chronological or ML-ranked?
- Single feed or multiple (home, explore, following)?
- Content types: text, images, videos, stories?
- Ads mixed into feed? (sponsored posts)
Social Graph
- Follow model (asymmetric) or friend model (symmetric)?
- Average followers per user? Max followers?
- Celebrity accounts? (1M+ followers)
- Groups/communities in addition to follows?
Freshness & Features
- How fast must a post appear in followers' feeds?
- Engagement counters real-time? (likes, comments)
- Seen/read status tracking?
- Infinite scroll pagination or fixed pages?
The celebrity problem is the defining constraint. If your max follower count is 5K (like early Facebook), fan-out on write works perfectly. If one account has 100M followers (like a Twitter celebrity), fan-out on write means one post triggers 100M writes, taking minutes and costing enormous storage. This is why every production system uses a hybrid: fan-out on write for normal users, fan-out on read for celebrities.
For This Case Study, Our Answers Are:
- Feed model: ML-ranked (not chronological)
- Social graph: follow model (asymmetric), like Twitter, not Facebook friends
- Max followers: up to 100M (celebrity accounts exist)
- Content types: text + images + short video
- Freshness SLA: new post visible in followers' feeds within 30 seconds
- Engagement counters: real-time likes/comments count (not pre-computed)
- Pagination: infinite scroll, cursor-based
- Celebrity threshold: >500K followers → fan-out on read (no pre-computation)
- Active user definition: seen in last 7 days (for selective fan-out)
- Chronological vs ranked: ranked requires ML scoring layer on read path.
- Follow model (asymmetric) creates the celebrity problem. Friend model (symmetric) doesn't.
- Max follower count determines viability of fan-out on write.
- Freshness SLA (30s vs 5min) affects whether pre-computation can use batch processing.
Naive Design
The simplest design: when a user opens the app, query the database for all accounts they follow, fetch recent posts from each, merge and sort by timestamp, and return the top N. This is a pure "pull" model: nothing is pre-computed. It works beautifully for 1,000 users. For 500M users each following 500 accounts, it means 500 queries per feed request × 115K requests/sec = 57.5M database queries per second just for feed generation. The database melts. Beyond latency, the posts table requires a composite index on (author_id, created_at) across hundreds of billions of rows, a read-optimized index that becomes increasingly expensive to maintain as write volume grows.
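The following is a minimal sketch of this pull model. It assumes a hypothetical in-memory dict, `POSTS_BY_AUTHOR`, stands in for the posts table and its (author_id, created_at) index. It shows why cost scales with the follow count: one lookup per followed account, then a k-way merge by timestamp.

```python
import heapq
from dataclasses import dataclass
from itertools import islice


@dataclass(frozen=True)
class Post:
    post_id: int
    author_id: int
    created_at: int  # epoch seconds


# Hypothetical in-memory stand-in for the posts table and its (author_id, created_at)
# index: each author's posts are kept sorted newest-first.
POSTS_BY_AUTHOR: dict[int, list[Post]] = {}


def recent_posts(author_id: int, limit: int) -> list[Post]:
    """One 'query' per followed account (newest-first, limited)."""
    return POSTS_BY_AUTHOR.get(author_id, [])[:limit]


def pull_feed(following: list[int], n: int = 20) -> tuple[list[Post], int]:
    """Naive fan-out on read: query every followed account, merge by timestamp, keep top N."""
    per_author = [recent_posts(a, n) for a in following]  # len(following) queries per request
    # Each input list is already newest-first, so a k-way merge with reverse=True keeps that order.
    merged = heapq.merge(*per_author, key=lambda p: p.created_at, reverse=True)
    return list(islice(merged, n)), len(per_author)


POSTS_BY_AUTHOR[1] = [Post(3, 1, 300), Post(1, 1, 100)]
POSTS_BY_AUTHOR[2] = [Post(2, 2, 200)]
feed, queries = pull_feed([1, 2], n=2)
print([p.post_id for p in feed], queries)
```

Each request costs `len(following)` lookups before any ranking happens. At 500 follows and 115K requests/sec, that is the 57.5M queries/sec figure above.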
What Works
- Simple โ no pre-computation infrastructure needed
- Always fresh โ queries live data every time
- No storage overhead (no pre-computed feeds)
- Celebrity posts appear instantly (no fan-out delay)
What Breaks
- 500 queries per feed request → high latency (2-5s)
- 57M+ DB queries/sec → database cannot handle this
- Ranking requires fetching all posts, then scoring → slow
- Caching helps but invalidation is complex (new posts constantly)
- User experience: slow feed load → users leave
- Fan-out on read: compute feed at request time. Simple but slow and expensive.
- 500 queries per feed × 115K req/sec = unsustainable DB load.
- Latency 2-5 seconds: unacceptable for mobile feed refresh.
- Advantage: always fresh, no storage cost, celebrities handled naturally.
Refined Design
The refined design uses fan-out on write for normal users (pre-compute feeds when a post is created) and fan-out on read for celebrities (merge their posts at read time). When Alice posts, a fan-out service pushes the post ID into the pre-computed feed cache of each of Alice's followers. When a user opens the app, their feed is already waiting in cache: just read and return. Celebrity posts are merged on the fly from a small "celebrity posts" list. Result: feed served in <50ms from cache, with celebrity freshness maintained.
Write Path (Fan-Out)
- Author creates post → stored in Post DB
- Fan-out service gets author's follower list
- If author has <500K followers: push post_id to each follower's cache
- If author has >500K followers: write to "celebrity posts" store only
- Average fan-out: 500 followers × 3,500 posts/sec = 1.75M cache writes/sec
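The write path can be sketched as follows. This is an illustration under stated assumptions: `post_db`, `followers`, `feed_cache`, and `celebrity_posts` are hypothetical in-memory stand-ins for the Post DB, the social graph, the Redis feed cache, and the celebrity posts store. The threshold is a parameter so the demo can use a tiny value. A real fan-out service would read a stored follower count rather than materializing a 100M-entry list, and would enqueue the per-follower writes instead of doing them inline.

```python
from collections import defaultdict, deque

CELEBRITY_THRESHOLD = 500_000  # from "Our Answers"; tunable
FEED_CAP = 1_000               # hypothetical cap on cached post_ids per user

# Hypothetical in-memory stand-ins for the Post DB, follower graph, feed cache, and celebrity store.
post_db: dict[int, dict] = {}
followers: dict[int, list[int]] = defaultdict(list)
feed_cache: dict[int, deque] = defaultdict(lambda: deque(maxlen=FEED_CAP))
celebrity_posts: dict[int, list[int]] = defaultdict(list)


def publish(post_id: int, author_id: int, content: str,
            threshold: int = CELEBRITY_THRESHOLD) -> int:
    """Store the post once, then fan out or not. Returns the number of feed-cache writes."""
    post_db[post_id] = {"author_id": author_id, "content": content}
    audience = followers[author_id]
    if len(audience) < threshold:
        for follower_id in audience:
            feed_cache[follower_id].appendleft(post_id)  # newest first; oldest evicted at cap
        return len(audience)
    # At or above the threshold: skip fan-out; readers merge these at request time.
    celebrity_posts[author_id].insert(0, post_id)
    return 0


followers[1] = [10, 11]      # normal author
followers[2] = [10, 11, 12]  # treated as a "celebrity" under the tiny demo threshold
normal_writes = publish(100, 1, "hi", threshold=3)
celeb_writes = publish(101, 2, "big news", threshold=3)
```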
Read Path (Assembly)
- Read pre-computed feed from Redis cache (list of post_ids)
- Merge in celebrity posts from users this reader follows
- Score/rank merged candidates (ML model)
- Hydrate top-N post_ids with full content from Post Store
- Result: feed in <50ms (cache hit) + 100ms (ranking) = <200ms total
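Here is a sketch of the read path. It assumes the caller supplies `score` (a stand-in for the ML ranking model) and `hydrate` (a stand-in for the Post Store lookup); both names are hypothetical. The point it illustrates is ordering: merge IDs first, rank on IDs, and hydrate only the final top N.

```python
import heapq
from typing import Callable


def assemble_feed(cached_ids: list[int],
                  celebrity_posts: dict[int, list[int]],
                  followed_celebrities: list[int],
                  score: Callable[[int], float],
                  hydrate: Callable[[int], dict],
                  n: int = 20) -> list[dict]:
    # 1. Pre-computed candidates from the reader's feed cache (post_ids only).
    candidates = set(cached_ids)
    # 2. Merge in recent posts from the (few) celebrities this reader follows.
    for celeb_id in followed_celebrities:
        candidates.update(celebrity_posts.get(celeb_id, []))
    # 3. Score every candidate and keep only the top N.
    top_ids = heapq.nlargest(n, candidates, key=score)
    # 4. Hydrate just the winners with full content.
    return [hydrate(pid) for pid in top_ids]


feed = assemble_feed(
    cached_ids=[1, 2, 3],
    celebrity_posts={99: [5, 4], 98: [7]},
    followed_celebrities=[99],             # reader follows celebrity 99 but not 98
    score=lambda pid: pid,                 # stand-in model: higher id scores higher
    hydrate=lambda pid: {"post_id": pid},  # stand-in for a Post Store lookup
    n=3,
)
```

Hydrating after ranking keeps the content fetch proportional to N rather than to the size of the candidate set.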
The threshold between fan-out and no-fan-out is a tunable parameter. Facebook uses ~5K followers. Twitter uses a dynamic threshold based on how many followers are currently online. The higher the threshold, the more you pre-compute (faster reads, more storage + write cost). The lower the threshold, the more you compute on read (slower reads, less storage). Most systems start at 500K and tune based on infrastructure capacity.
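A toy cost model makes the threshold trade-off concrete. The author mix below is invented for illustration. The model counts only feed-cache writes and the number of authors left for read-time merging; it ignores storage, queueing, and ranking cost.

```python
def threshold_cost(authors: list[tuple[int, int]], threshold: int) -> tuple[int, int]:
    """authors: (follower_count, posts_per_day) pairs.

    Returns (feed-cache writes per day, number of authors left to merge at read time).
    """
    writes_per_day = sum(f * p for f, p in authors if f < threshold)
    pulled_authors = sum(1 for f, _ in authors if f >= threshold)
    return writes_per_day, pulled_authors


authors = [(200, 2), (5_000, 1), (2_000_000, 3)]  # hypothetical: casual, mid-size, celebrity
for t in (1_000, 500_000, 10_000_000):
    print(t, threshold_cost(authors, t))
```

Raising the threshold from 500K to 10M removes the last read-time merge but adds 6M writes per day from a single account, which is the write-cost side of the trade-off described above.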
- Hybrid: fan-out on write for normal users (pre-computed, fast read), fan-out on read for celebrities.
- Feed cache in Redis: list of post_ids per user. Read = single cache lookup.
- Celebrity posts merged on the fly: only ~10-20 celebrity accounts per user to merge.
- Ranking happens at read time: score merged candidates, return top N.
- Feed latency: <200ms (cache read + merge + rank + hydrate).
Alternative Approaches
The three canonical approaches to feed generation each optimize for different constraints. Pure fan-out on write optimizes for read speed. Pure fan-out on read optimizes for write simplicity. The hybrid approach, used by every major platform, trades implementation complexity for the best of both worlds.
Fan-Out on Write (Push)
- When post created → write to every follower's feed cache
- Read is a single cache lookup โ O(1) and fast
- Write amplification: 1 post = N writes (N = follower count)
- Celebrity problem: 100M followers = 100M writes per post
- Storage cost: N copies of each post_id across all follower feeds
- Good for: mostly-equal follower counts, fast reads
Fan-Out on Read (Pull)
- When user requests feed → query all followed accounts' posts
- Read is expensive: N queries per feed (N = followed accounts)
- Write is simple: just store the post once
- No celebrity problem: celebrity post stored once regardless of followers
- Storage efficient: no duplication
- Good for: write-heavy, celebrity-heavy platforms
Reverse Chronological
- Sort by timestamp, newest first
- Deterministic: no ML model, no A/B testing needed
- Pre-computable: just prepend new posts to the list
- Problem: user misses important posts from close friends
- Users who follow 1000+ accounts: most posts never seen
- Used by: Twitter (optional "Latest"), Mastodon
ML-Ranked
- ML model scores each candidate: P(engagement | user, post)
- Features: author affinity, content type, recency, past interaction
- Higher engagement: users spend more time, see relevant content
- Requires ML inference infrastructure at read time (100ms budget)
- Controversial: filter bubbles, engagement over quality
- Used by: Facebook, Instagram, TikTok, LinkedIn
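To make P(engagement | user, post) concrete, here is a toy scorer over the features listed above. The logistic form and every weight are hypothetical, hand-set for illustration; production systems learn these from engagement data and use far richer features. It shows the key behavioral difference: an older post from a close friend can outrank a fresh post from a stranger, which reverse-chronological order never allows.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    post_id: int
    age_hours: float         # recency
    author_affinity: float   # 0..1: how closely this user interacts with the author
    is_video: bool           # content type
    past_interactions: int   # this user's prior likes/comments on the author's posts


# Hypothetical hand-set weights; a real system learns these from engagement logs.
WEIGHTS = {"bias": -2.0, "affinity": 3.0, "video": 0.5, "interactions": 0.1, "age": -0.05}


def p_engagement(c: Candidate) -> float:
    z = (WEIGHTS["bias"]
         + WEIGHTS["affinity"] * c.author_affinity
         + WEIGHTS["video"] * c.is_video
         + WEIGHTS["interactions"] * min(c.past_interactions, 20)  # capped so it can't dominate
         + WEIGHTS["age"] * c.age_hours)
    return 1.0 / (1.0 + math.exp(-z))  # logistic link: maps the score into (0, 1)


def ranked(cands: list[Candidate]) -> list[Candidate]:
    return sorted(cands, key=p_engagement, reverse=True)


def chronological(cands: list[Candidate]) -> list[Candidate]:
    return sorted(cands, key=lambda c: c.age_hours)  # youngest first


close_friend = Candidate(1, age_hours=10, author_affinity=0.9, is_video=False, past_interactions=15)
stranger = Candidate(2, age_hours=0.5, author_affinity=0.05, is_video=False, past_interactions=0)
```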
Ranked feeds increase engagement by 2-5x over chronological. But they require an ML scoring service that can rank hundreds of candidates in under 100ms per request. This is a massive infrastructure investment: Facebook's ranking system uses hundreds of features and multiple ML models in cascade, and serves billions of predictions per day. Start chronological; add ranking when you have the data and infrastructure.
- Fan-out on write: fast reads, expensive writes. Breaks for celebrities.
- Fan-out on read: simple writes, slow reads. No celebrity problem.
- Hybrid: fan-out on write for normal, on read for celebrities. Production standard.
- Ranked vs chronological: ranking increases engagement 2-5x but needs ML infrastructure.
What Real Companies Did
Every major social platform has published papers or talks about their feed architecture. The common theme: they all started simple (chronological, fan-out on write) and evolved toward hybrid systems with ML ranking as they scaled. Nobody ships a perfect feed system on day one.
Facebook / Meta
- Fan-out on write for friends (symmetric, max ~5K)
- Multi-stage ranking: coarse filter → fine rank → final reorder
- 1000+ features per candidate post for ML scoring
- TAO: distributed social graph store for follow relationships
- Aggregator: fetches ~2000 candidates, ranks, returns 20-50
Twitter / X
- Hybrid: fan-out on write for users with <threshold followers
- Timeline mixer: merges pre-computed + celebrity + algorithmic
- Manhattan: custom distributed key-value store for timelines
- GraphJet: real-time recommendation engine for "For You"
- ~400K fan-out operations/sec at peak
Instagram
- Shifted from chronological to ranked feed (2016)
- ML model predicts: P(like), P(comment), P(share), P(save)
- Weighted combo of predictions → final rank score
- Cassandra for feed storage (high write throughput)
- Engagement increased significantly after ranking launch
TikTok (For You Page)
- Pure fan-out on read + recommendation (no follow-based feed)
- Content-based: rank ALL content, not just followed accounts
- Watch time is the primary signal (not likes/follows)
- Massive candidate retrieval → multi-stage ranking + filtering
- Cold start: new users get personalized feed in <10 videos
- Facebook: fan-out on write (friends), multi-stage ML ranking with 1000+ features.
- Twitter: hybrid fan-out, Manhattan KV store, GraphJet for recommendations.
- Instagram: chronological → ranked (2016). ML predicting multiple engagement types.
- TikTok: pure recommendation (no follow-based), watch time as primary signal.
Best Practices Extracted
Feed systems teach patterns that apply to any system doing personalized content assembly: recommendation engines, email inboxes, notification centers, and content discovery pages. The principles of pre-computation, tiered ranking, and hybrid push/pull apply far beyond social media.
Tiered Ranking
- Stage 1: Candidate retrieval (thousands → hundreds)
- Stage 2: Coarse ranking (hundreds → tens, light model)
- Stage 3: Fine ranking (tens → final order, heavy model)
- Stage 4: Policy filters (remove duplicates, enforce diversity)
- Transfers to: search, recommendations, ad selection
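The cascade can be sketched as a function that applies progressively more expensive scorers to progressively smaller sets. The scorers are hypothetical caller-supplied stand-ins. The diversity rule here, a per-author cap, is one simple example of a policy filter, not the only option.

```python
from collections import Counter
from typing import Callable

Item = tuple[int, str]  # (post_id, author_id)


def tiered_rank(candidates: list[Item],
                coarse_score: Callable[[Item], float],
                fine_score: Callable[[Item], float],
                coarse_k: int = 100,
                final_k: int = 20,
                max_per_author: int = 2) -> list[Item]:
    # Stage 1 (retrieval) has already happened: `candidates` is the retrieved pool.
    # Stage 2: cheap model over everything, keep the top coarse_k.
    coarse = sorted(candidates, key=coarse_score, reverse=True)[:coarse_k]
    # Stage 3: expensive model only over the survivors.
    fine = sorted(coarse, key=fine_score, reverse=True)
    # Stage 4: policy filters: drop duplicate posts and cap posts per author (diversity).
    seen: set[int] = set()
    per_author: Counter = Counter()
    result: list[Item] = []
    for post_id, author_id in fine:
        if post_id in seen or per_author[author_id] >= max_per_author:
            continue
        seen.add(post_id)
        per_author[author_id] += 1
        result.append((post_id, author_id))
        if len(result) == final_k:
            break
    return result


pool = [(1, "a"), (2, "a"), (3, "a"), (4, "b"), (5, "c"), (5, "c")]  # note the duplicate post 5
affinity = {"a": 3, "b": 2, "c": 1}  # hypothetical per-author signal for the heavy model
top = tiered_rank(pool, coarse_score=lambda c: c[0],
                  fine_score=lambda c: affinity[c[1]] + c[0] / 10,
                  coarse_k=5, final_k=3, max_per_author=1)
```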
Feed Cache Design
- Store only post_ids in feed cache (not full content)
- Hydrate content separately (post store lookup)
- Cap feed length: keep last 500-1000 post_ids per user
- Evict oldest when cap reached (FIFO)
- Transfers to: any personalized list/inbox system
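Below is a minimal in-memory sketch of such a cache, assuming one bounded deque per user. With Redis, the same shape is commonly built with `LPUSH` followed by `LTRIM key 0 cap-1`. The `page` method also shows the cursor-based pagination chosen earlier: the client sends the last post_id it saw, so new posts arriving at the head do not shift or duplicate items on the next page. `FeedCache` and its methods are hypothetical names.

```python
from collections import deque
from typing import Optional


class FeedCache:
    """Per-user list of post_ids (not content), newest first, capped at `cap` entries."""

    def __init__(self, cap: int = 1000):
        self.cap = cap
        self._feeds: dict[int, deque] = {}

    def push(self, user_id: int, post_id: int) -> None:
        feed = self._feeds.setdefault(user_id, deque(maxlen=self.cap))
        # appendleft on a full bounded deque discards from the right end: the oldest id (FIFO).
        feed.appendleft(post_id)

    def page(self, user_id: int, cursor: Optional[int] = None, limit: int = 20):
        """Cursor = last post_id the client already has. Returns (items, next_cursor)."""
        feed = list(self._feeds.get(user_id, ()))
        if cursor is None:
            start = 0
        elif cursor in feed:
            start = feed.index(cursor) + 1
        else:
            return [], None  # cursor aged out of the capped list: end of scroll
        items = feed[start:start + limit]
        return items, (items[-1] if items else None)


cache = FeedCache(cap=4)
for pid in range(1, 6):
    cache.push(7, pid)                       # id 1 is evicted: [5, 4, 3, 2]
first_page, cursor = cache.page(7, limit=2)  # [5, 4], cursor = 4
cache.push(7, 6)                             # new post at the head: [6, 5, 4, 3]
second_page, _ = cache.page(7, cursor=cursor, limit=2)
```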
Selective Fan-Out
- Fan out only to active users (seen in last 7 days)
- Inactive users: build feed on demand when they return
- Saves 40-60% of fan-out writes (many users dormant)
- Trade-off: returning users get slightly stale first feed
- Transfers to: any event distribution with inactive subscribers
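Selective fan-out can be sketched as a filter in front of the push, plus an on-demand rebuild for returning users. The 7-day window comes from the answers above. `last_seen` and the `(created_at, post_id)` post tuples are hypothetical data shapes chosen for the example.

```python
ACTIVE_WINDOW_SECONDS = 7 * 24 * 60 * 60  # "active" = seen in the last 7 days


def fan_out_targets(follower_ids, last_seen, now):
    """Split followers into active (push now) and dormant (skip, rebuild later)."""
    active, dormant = [], []
    for follower_id in follower_ids:
        seen_at = last_seen.get(follower_id)  # None = never seen
        if seen_at is not None and now - seen_at <= ACTIVE_WINDOW_SECONDS:
            active.append(follower_id)
        else:
            dormant.append(follower_id)
    return active, dormant


def rebuild_on_return(following, recent_posts_by_author, n=20):
    """Dormant user returns: assemble the first feed on demand from (created_at, post_id) pairs."""
    posts = [p for author in following for p in recent_posts_by_author.get(author, [])]
    posts.sort(reverse=True)  # newest created_at first
    return [post_id for _, post_id in posts[:n]]


now = 1_000_000
active, dormant = fan_out_targets([1, 2, 3], {1: now - 3_600, 2: now - 8 * 86_400}, now)
first_feed = rebuild_on_return([10, 11], {10: [(100, 501), (300, 503)], 11: [(200, 502)]})
```

The rebuild is the "slightly stale first feed" trade-off: it is a one-off pull for the returning user, after which normal pushes resume.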
Store post_ids, not content, in feed caches. A post might get edited, deleted, or enriched with engagement counts. If you store content in the feed cache, every edit requires updating millions of copies. Storing only IDs means the post lives in one canonical location; the feed cache is just an ordered list of pointers. Hydrate at read time. This separation is the single most important feed cache design decision.
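This small example shows why storing only IDs pays off. Plain dicts stand in (hypothetically) for the canonical post store and for two users' feed caches. One write to the store changes what every feed shows, and a deleted post simply drops out at hydration time without touching any cache. A real service would batch this lookup as a multi-get.

```python
# Hypothetical stand-ins: one canonical copy of each post, and feeds holding only post_ids.
post_store = {1: {"text": "hello"}, 2: {"text": "draft"}, 3: {"text": "bye"}}
feeds = {"alice": [3, 2, 1], "bob": [2, 1]}


def hydrate(post_ids, store):
    """Resolve ids against the canonical store at read time; ids of deleted posts are skipped."""
    return [{"post_id": pid, **store[pid]} for pid in post_ids if pid in store]


post_store[2]["text"] = "edited"  # one write; every feed that points at post 2 sees it
del post_store[3]                 # deletion: no feed cache needs to be touched

print(hydrate(feeds["alice"], post_store))
```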
- Tiered ranking: retrieve → coarse rank → fine rank → policy filter. Each stage reduces candidates.
- IDs not content: feed cache stores post_ids only. Hydrate separately.
- Selective fan-out: skip inactive users (not seen in the last 7 days) and build their feed on demand when they return. Saves 40-60% of write volume.
- Feed cap: 500-1000 items max per user. FIFO eviction.
What Could Go Wrong
Feed failures are subtle: users don't see errors, they just see stale content, missing posts from close friends, or an empty feed. These are worse than explicit errors because users don't report them; they just gradually disengage. Every failure below has happened at major social platforms and took weeks to detect because the symptom is "fewer users opening the app," not "500 errors."
Celebrity Fan-Out Storm
- Celebrity posts without threshold → 100M writes per post
- Fan-out service queue grows to hours of lag
- Normal users' posts delayed because queue is full of celebrity fan-outs
- Fix: celebrity threshold (skip fan-out for high-follower accounts), separate queues for normal vs high-follower fan-outs.
Feed Staleness
- Feed cache not updated for some users (fan-out worker fell behind)
- User sees 12-hour-old content → thinks platform is dead
- Silent failure: no errors, just stale data. Hard to detect.
- Fix: monitor fan-out lag per percentile. Alert on p99 > 5min. TTL on feed cache forcing refresh.
What to monitor for fan-out lag:
- fan_out_lag_p50: median time from post creation to a follower's cache being updated. Alert if > 10s.
- fan_out_lag_p99: tail latency, i.e. how long it takes to reach the slowest 1% of followers. Alert if > 2min.
- fan_out_queue_depth: number of pending fan-out jobs. Alert if growing (workers are falling behind).
- feed_cache_hit_rate: should be > 99%. A drop indicates fan-out is not keeping up or the cache is evicting entries.
These metrics are the difference between detecting staleness in minutes vs discovering it from user complaints days later.
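These checks can be sketched as a function over raw samples. Assumptions: `lag_seconds` holds per-follower delivery lags (post creation to cache write), percentiles use the nearest-rank method, and "growing" queue depth means strictly increasing across the sampled window. The thresholds are the ones listed above. All names are hypothetical, not any particular monitoring system's API.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile (p in 0-100) of a non-empty list."""
    ordered = sorted(samples)
    k = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[k - 1]


def fan_out_alerts(lag_seconds, queue_depths, cache_hits, cache_lookups):
    alerts = []
    if percentile(lag_seconds, 50) > 10:
        alerts.append("fan_out_lag_p50 > 10s")
    if percentile(lag_seconds, 99) > 120:
        alerts.append("fan_out_lag_p99 > 2min")
    # Assumed heuristic: depth strictly increasing over the window means workers are behind.
    if len(queue_depths) >= 2 and all(b > a for a, b in zip(queue_depths, queue_depths[1:])):
        alerts.append("fan_out_queue_depth growing")
    if cache_lookups and cache_hits / cache_lookups < 0.99:
        alerts.append("feed_cache_hit_rate < 99%")
    return alerts


lags = [1.0] * 98 + [200.0, 300.0]  # median is fine, but the tail is slow
alerts = fan_out_alerts(lags, queue_depths=[10, 20, 35], cache_hits=995, cache_lookups=1000)
```

The example shows exactly the silent case described above: median lag looks healthy while the slowest followers wait minutes.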
Ranking Model Degradation
- ML model starts promoting low-quality content (engagement bait)
- Or: model bug causes same 5 posts shown repeatedly
- User engagement drops 20% over days → slow detection
- Fix: diversity constraints in ranking, content freshness signals, A/B testing all model changes, kill-switch to revert to chronological.
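The kill-switch can be sketched as a guarded code path: rank when the flag is on and the model responds, otherwise fall back to reverse-chronological order. `ranking_enabled` and `model_score` are hypothetical stand-ins for a feature flag and the ranking service.

```python
from typing import Callable

Post = tuple[int, int]  # (post_id, created_at)


def reverse_chronological(candidates: list[Post], n: int) -> list[Post]:
    return sorted(candidates, key=lambda p: p[1], reverse=True)[:n]


def build_feed(candidates: list[Post],
               model_score: Callable[[Post], float],
               ranking_enabled: bool,
               n: int = 20) -> tuple[list[Post], str]:
    if ranking_enabled:
        try:
            return sorted(candidates, key=model_score, reverse=True)[:n], "ranked"
        except Exception:  # model or scoring-service failure: degrade, don't fail the request
            pass
    return reverse_chronological(candidates, n), "chronological"


def prefer_old(p: Post) -> float:
    """Deliberately bad demo model: ranks older posts higher."""
    return -p[1]


posts = [(1, 100), (2, 300), (3, 200)]
```

Catching every exception is deliberate in this sketch: a broken model should degrade the feed, not fail the request.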
Cold Start (New User Empty Feed)
- New user follows 0 accounts → empty feed → leaves immediately
- Or: user follows accounts but no pre-computed feed exists yet
- Fan-out hasn't run yet → cache is empty → blank screen
- Fix: onboarding follows (suggest popular accounts), trending/explore content as fallback, immediate fan-out for new follows.
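Two of these fixes can be sketched together: backfill the cache the moment a user follows someone, and fall back to trending content whenever the personal feed is still empty. The helper names and data shapes are hypothetical. The sketch also assumes post_ids increase with creation time, so sorting IDs in descending order yields newest first.

```python
def on_follow(user_id, author_id, feed_cache, recent_ids_by_author, backfill=10):
    """Immediate fan-out for a new follow: backfill the author's latest post_ids into the cache."""
    existing = feed_cache.get(user_id, [])
    new_ids = recent_ids_by_author.get(author_id, [])[:backfill]
    # Assumes time-ordered post_ids, so sorting ids descending = newest first.
    feed_cache[user_id] = sorted(set(existing) | set(new_ids), reverse=True)


def home_feed(user_id, feed_cache, trending_ids, n=20):
    """Never return a blank screen: fall back to trending when the personal feed is empty."""
    cached = feed_cache.get(user_id)
    if cached:
        return cached[:n], "personal"
    return trending_ids[:n], "trending"


cache: dict[int, list[int]] = {}
trending = [900, 901]
before = home_feed(1, cache, trending)  # brand-new user, no follows yet
on_follow(1, 42, cache, {42: [30, 20, 10]})
after = home_feed(1, cache, trending)
```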
Feed quality problems are invisible to traditional monitoring. No 500 errors. No latency spikes. The system appears healthy, but users see stale, irrelevant, or repetitive content and silently leave. You need engagement metrics (time spent, scroll depth, posts seen) as the real health signal, not just infrastructure metrics. If average scroll depth drops 15%, something is wrong with feed quality, even if all servers are green.
- Celebrity storm: 100M fan-out writes. Fix: threshold + separate queues.
- Staleness: silent failure โ stale feeds with no errors. Fix: lag monitoring + TTL.
- Model degradation: bad ranking causes slow engagement drop. Fix: diversity rules + kill-switch.
- Cold start: new user gets empty feed. Fix: trending fallback + immediate fan-out.
- Principle: engagement metrics are the real feed health signal, not server metrics.