System Design · Case Studies

Case Study: Video Streaming

Design, trade-offs, and alternatives for a video streaming platform at scale.

01
Chapter One

Problem Statement

What We Are Building

A video streaming platform handles the full lifecycle: upload, transcode, store, and deliver video to millions of concurrent viewers worldwide. The core challenge is not serving a single video; it is managing petabytes of content, transcoding every upload into dozens of formats and resolutions, and serving billions of segments per day from CDN edge servers close to users. Video is the most bandwidth-intensive workload on the internet: industry traffic reports attribute the majority of all internet traffic to video, with some forecasts putting the share near 80%.

Scale Requirements

Traffic & Scale

  • 500 hours of video uploaded/minute
  • 1B hours watched/day
  • 100M concurrent viewers at peak
  • Average video: 5 min, transcoded to 6 resolutions × 2 codecs = 12 renditions

Requirements

  • Playback start: <2 seconds (time to first frame)
  • Buffering ratio: <0.5% of playback time
  • Upload-to-playable: <10 minutes for standard video
  • Storage: growing at ~1 PB/day (raw + transcoded)
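
The scale figures above imply a few derived numbers that are worth working out before designing anything. The sketch below is plain arithmetic on the stated requirements; the 3 Mbps average playback bitrate is an assumption introduced here for illustration and does not come from the requirements.

```python
# Back-of-envelope figures derived from the stated scale requirements.
UPLOAD_HOURS_PER_MINUTE = 500
WATCH_HOURS_PER_DAY = 1_000_000_000
STORAGE_GROWTH_PB_PER_DAY = 1.0
ASSUMED_AVG_BITRATE_MBPS = 3.0  # assumption for illustration only

MINUTES_PER_DAY = 24 * 60


def uploaded_hours_per_day() -> int:
    """Hours of new video arriving every day."""
    return UPLOAD_HOURS_PER_MINUTE * MINUTES_PER_DAY


def average_concurrent_viewers() -> float:
    """One always-on viewer accounts for 24 watch-hours per day."""
    return WATCH_HOURS_PER_DAY / 24


def storage_gb_per_uploaded_hour() -> float:
    """Raw + renditions stored per uploaded hour (1 PB = 1,000,000 GB)."""
    return STORAGE_GROWTH_PB_PER_DAY * 1_000_000 / uploaded_hours_per_day()


def average_egress_tbps() -> float:
    """Average delivery bandwidth at the assumed bitrate (Mbps -> Tbps)."""
    return average_concurrent_viewers() * ASSUMED_AVG_BITRATE_MBPS / 1_000_000


if __name__ == "__main__":
    print(f"uploaded hours/day:       {uploaded_hours_per_day():,}")
    print(f"average concurrent:       {average_concurrent_viewers():,.0f}")
    print(f"storage per uploaded hr:  {storage_gb_per_uploaded_hour():.2f} GB")
    print(f"average egress:           {average_egress_tbps():.0f} Tbps")
```

The average of roughly 42M concurrent viewers sits comfortably below the stated 100M peak, which suggests the peak-to-average ratio the CDN has to absorb is a little over 2×.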

Video streaming consists of two separate systems: the upload/transcode pipeline and the playback delivery system. Upload is a batch processing problem (compute-heavy, latency-tolerant). Playback is a CDN delivery problem (bandwidth-heavy, latency-critical). Designing them as one system is the most common mistake; they have nothing in common except the stored video files.

Chapter 1 Summary
  • 500 hours uploaded/minute. 1B hours watched/day. 100M concurrent viewers.
  • Two separate systems: upload pipeline (batch) and playback delivery (real-time CDN).
  • Every video transcoded to 6+ resolutions × multiple codecs = dozens of renditions.
  • Playback start under 2 seconds. Buffering below 0.5%. Storage growing at ~1 PB/day.
02
Chapter Two

Questions to Ask

Clarifying Before Designing

A YouTube-style on-demand platform has completely different requirements from a Twitch-style live streaming service. On-demand allows pre-transcoding and heavy CDN caching. Live streaming requires real-time encoding with sub-5-second latency. The questions below determine whether you are building a pipeline or a real-time system, or both.


Content Type

  • On-demand (pre-recorded) or live streaming?
  • Short-form (TikTok, 15s-3min) or long-form (Netflix, 2hr)?
  • User-generated or professionally produced?
  • Support for 4K? HDR? Spatial audio?

Client & Network

  • Devices: web, mobile, smart TV, game console?
  • Adaptive bitrate required? (varying network conditions)
  • Offline download support?
  • DRM required? (content protection)

Platform Features

  • Recommendations engine needed?
  • Content moderation (automated + manual)?
  • Monetization: ads, subscriptions, or hybrid?
  • Analytics: view counts, watch time, engagement?

Adaptive bitrate (ABR) streaming is non-negotiable for any production video platform. Users on 5G get 4K. Users on 3G get 360p. The player switches mid-stream as bandwidth fluctuates. Without ABR, you either waste bandwidth (serving 4K to a phone on a slow network, which causes constant buffering) or waste quality (serving 360p to everyone). ABR is the single most impactful feature for viewer experience.
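
A minimal sketch of the selection step the player performs: pick the highest rendition whose bitrate fits within a safety margin of the measured bandwidth. The ladder below is hypothetical apart from the 6 quality levels; the bitrates, the `Rendition` type, and the 0.8 safety factor are assumptions for illustration, and real players use more elaborate heuristics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    bitrate_kbps: int


# Hypothetical bitrate ladder, lowest quality first.
LADDER = [
    Rendition("240p", 400),
    Rendition("360p", 800),
    Rendition("480p", 1500),
    Rendition("720p", 2800),
    Rendition("1080p", 4500),
    Rendition("4K", 16000),
]


def select_rendition(measured_kbps: float, safety: float = 0.8) -> Rendition:
    """Return the best rendition that fits within `safety` of the bandwidth.

    Falls back to the lowest rendition when nothing fits, so playback
    can always continue at some quality.
    """
    budget = measured_kbps * safety
    affordable = [r for r in LADDER if r.bitrate_kbps <= budget]
    if not affordable:
        return LADDER[0]
    return max(affordable, key=lambda r: r.bitrate_kbps)


if __name__ == "__main__":
    for kbps in (100, 2000, 5000, 25000):
        print(kbps, "kbps ->", select_rendition(kbps).name)
```

The safety factor leaves headroom so that a small dip in throughput does not immediately drain the buffer.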

For This Case Study, Our Answers Are:

  • Content type: on-demand, user-generated (YouTube-style)
  • Max video length: up to 4 hours
  • Client platforms: web, mobile, smart TV
  • Adaptive bitrate: yes, 6 quality levels (240p to 4K)
  • Live streaming: no (out of scope for this design)
  • DRM: no (public UGC content, no paid licensed content)
  • Offline download: no
  • Transcoding target: <10 minutes upload-to-playable for videos under 30 min
  • Storage: grows at ~1 PB/day (raw + all renditions)
  • Moderation: automated scanning before publish (out of scope for this design)
Chapter 2 Summary
  • On-demand vs live: fundamentally different pipelines (batch vs real-time).
  • Adaptive bitrate: non-negotiable. Player selects quality based on current bandwidth.
  • DRM: required for licensed content (Widevine, FairPlay, PlayReady).
  • Content moderation: must scan before publishing (copyright, safety).
  • Short-form vs long-form changes transcoding priority and caching strategy.
03
Chapter Three

Naive Design

Single-Server Monolith

The simplest design: a single server accepts video uploads, transcodes them synchronously using FFmpeg, stores the output files on local disk, and serves them directly via HTTP. The user uploads a 1GB file, waits 30 minutes for transcoding, and gets a single-resolution MP4. Playback is a direct file download: no adaptive bitrate, no CDN, no chunking. Works for a personal project. Collapses immediately under real load.

Naive Design: Monolithic Upload + Direct Serve
[Figure: Client 1 uploads to a single monolith server, which runs the FFmpeg transcode (30 min, thread blocked), serves single-resolution files over HTTP, and stores single MP4 files on local disk. Client 2 is queued because the server is busy transcoding for Client 1.]
What breaks at scale: sync transcode blocks the upload thread. No CDN = every view served from origin. Single resolution = bad on mobile. Local disk = no redundancy. CPU saturated: 1 upload = 1 CPU × 30 min, so 10 concurrent uploads keep the server at 100% for hours. Client 2 waits in the queue, and upload-to-playable time becomes unpredictable.

What Works

  • Simple: FFmpeg CLI, single server, no infrastructure
  • Works for personal/small projects (<100 videos)
  • No external dependencies
  • Easy to debug: everything on one machine

What Breaks

  • Sync transcoding blocks threads: 1 upload = 1 CPU for 30min
  • Single resolution: 4K to mobile = buffer hell, 360p to TV = unwatchable
  • No CDN: every viewer streams from origin server → bandwidth exhausted
  • No chunking: must download entire file before seeking
  • Local disk: server dies = all videos lost
Chapter 3 Summary
  • Monolith: upload, transcode, serve all on one server. Blocks on transcode.
  • Single resolution: terrible experience on diverse devices and networks.
  • No CDN: origin serves all traffic. 1 viral video = server down.
  • No chunking: no seeking, no adaptive bitrate, no resume on network change.
04
Chapter Four

Refined Design

Upload Pipeline + CDN Delivery

The refined design splits into two independent systems. The upload pipeline accepts raw video, stores it in object storage, and queues a transcoding job; a fleet of workers then produces multiple resolutions and codecs. The playback system serves HLS/DASH manifests and video segments from global CDN edge servers. The origin storage is rarely hit: 95%+ of playback traffic is served from CDN cache. These two systems share nothing except the storage layer.

Refined Design: Upload Pipeline + CDN Playback
[Figure: two lanes joined by a shared Segment Store.]
  • Upload pipeline: Creator → Upload API (chunked upload) → Object Store (S3, raw video) → Job Queue (SQS / Kafka) → Transcode Fleet (FFmpeg workers, GPU or CPU) → Segment Store (S3, HLS/DASH segments, 12 renditions). A Metadata DB (title, status, URLs) moves each video from "status: processing" to "status: ready".
  • Playback delivery: Viewer → CDN Edge (global, 95%+ cache hit rate) → stream HLS segments at adaptive quality (4K, 1080p, 480p; the player selects per segment). The Segment Store acts as the CDN origin. The Video API + Manifest service (resolution list, DRM tokens) looks up video URLs.
  • The Segment Store is shared: the upload pipeline writes, the CDN reads. One storage layer connects both systems.

Upload Pipeline

  • Chunked upload: client sends video in 5MB chunks (resumable)
  • Raw storage: S3 stores original file immediately
  • Transcode queue: job dispatched to worker fleet
  • Workers: produce 6 resolutions × 2 codecs = 12 renditions
  • Output: HLS/DASH segments (2-10s each) + manifest file
  • Metadata update: video status → "ready" in database
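
The fan-out step in the list above can be sketched as a small job planner: one upload becomes 12 independent transcode jobs, and the video flips to "ready" only when all of them finish. The codec names (`h264`, `vp9`), the `segments/...` key layout, and the function names are hypothetical; the text only specifies 6 resolutions × 2 codecs.

```python
from itertools import product

RESOLUTIONS = ["240p", "360p", "480p", "720p", "1080p", "4k"]
CODECS = ["h264", "vp9"]  # assumption: the text does not name the two codecs


def plan_transcode_jobs(video_id):
    """Return one job description per (resolution, codec) pair."""
    return [
        {
            "video_id": video_id,
            "resolution": resolution,
            "codec": codec,
            # Hypothetical object-store prefix for this rendition's segments.
            "output_prefix": f"segments/{video_id}/{codec}/{resolution}/",
        }
        for resolution, codec in product(RESOLUTIONS, CODECS)
    ]


def video_status(jobs_done, jobs_total):
    """Metadata status: 'ready' only once every rendition exists."""
    return "ready" if jobs_done == jobs_total else "processing"


if __name__ == "__main__":
    jobs = plan_transcode_jobs("abc123")
    print(len(jobs), "jobs, first:", jobs[0])
    print(video_status(11, len(jobs)), "->", video_status(12, len(jobs)))
```

Because each job is independent, the queue can hand them to any worker, which is what lets the transcode fleet scale horizontally.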

Playback Delivery

  • Manifest request: player gets resolution list + segment URLs
  • Segment fetch: player requests 2-10s video chunks from CDN
  • ABR logic: player measures bandwidth, selects resolution per segment
  • CDN cache: 95%+ hit rate, so origin is rarely touched
  • DRM (when enabled; out of scope for this design): license server issues decryption keys per session
  • Result: playback starts in <2s, quality adapts seamlessly

HLS/DASH segment-based streaming is the key architectural decision. Instead of serving a single giant file, the video is split into 2-10 second segments at each quality level. The player requests segments sequentially, and can switch quality level at any segment boundary. This enables adaptive bitrate, seeking without downloading everything, and perfect CDN cacheability (each segment is an independent, cacheable object).

HLS Segment Structure: What the Player Actually Requests
Master manifest (m3u8), listing one variant playlist per quality level:
  #EXTM3U
  #EXT-X-STREAM-INF:BANDWIDTH=4500000,RESOLUTION=1920x1080
  1080p/playlist.m3u8
  #EXT-X-STREAM-INF:BANDWIDTH=1500000,RESOLUTION=854x480
  480p/playlist.m3u8
Each variant playlist then lists that quality level's segments in order (1080p/seg_001.ts, 1080p/seg_002.ts, ...), which tells the player where all segments are.
Timeline (one quality level):
  seg_001.ts   0s to 6s
  seg_002.ts   6s to 12s
  seg_003.ts   12s to 18s
  ...
All qualities (parallel tracks):
  1080p: seg_001_1080  seg_002_1080  seg_003_1080  ...
  480p:  seg_001_480   seg_002_480   seg_003_480   ...
  240p:  seg_001_240   seg_002_240   seg_003_240   ...
The player can switch quality at any segment boundary (ABR).
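
As a rough illustration of the manifest-plus-segments layout, the sketch below builds a master playlist and maps a seek position to a segment file. It assumes 6-second segments as in the timeline, and that each variant entry points at a per-quality media playlist; the `playlist.m3u8` file names and both function names are hypothetical.

```python
SEGMENT_SECONDS = 6  # matches the 0s-6s, 6s-12s timeline above


def build_master_playlist(variants):
    """Render a master playlist from a list of variant descriptions.

    Each variant is a dict with 'bandwidth' (bits/s), 'resolution'
    (e.g. '1920x1080') and 'uri' (path to that quality's media playlist).
    """
    lines = ["#EXTM3U"]
    for v in variants:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={v['bandwidth']},"
            f"RESOLUTION={v['resolution']}"
        )
        lines.append(v["uri"])
    return "\n".join(lines) + "\n"


def segment_for_time(seconds, segment_seconds=SEGMENT_SECONDS):
    """Return the segment file that contains the given playback position."""
    index = int(seconds // segment_seconds) + 1  # segments are 1-based
    return f"seg_{index:03d}.ts"


if __name__ == "__main__":
    print(build_master_playlist([
        {"bandwidth": 4500000, "resolution": "1920x1080",
         "uri": "1080p/playlist.m3u8"},
        {"bandwidth": 1500000, "resolution": "854x480",
         "uri": "480p/playlist.m3u8"},
    ]))
    print("seek to 13s ->", segment_for_time(13.0))
```

Seeking only needs this index calculation plus one segment request, which is why the player never has to download the whole file.
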
Chapter 4 Summary
  • Upload pipeline: chunked upload → S3 → job queue → transcode fleet → segments to S3.
  • Playback: manifest + segments served from CDN edge. 95%+ cache hit rate.
  • HLS/DASH segments enable ABR, seeking, and CDN cacheability.
  • Each video produces 12+ renditions: 6 resolutions × 2 codecs. Stored as thousands of small segments.
  • Upload and playback systems share only the storage layer, so they scale independently.
05
Chapter Five

Alternative Approaches

Transcoding & Delivery Strategies

The biggest trade-offs in video systems center on when to transcode (ahead of time vs on demand) and how to deliver (full CDN vs origin-pull vs peer-to-peer). Each approach trades off between storage cost, compute cost, latency, and viewer experience.

Eager Transcoding (Pre-transcode All)
  • Transcode every video into all resolutions at upload time
  • Playback is instant: all renditions are pre-generated
  • High storage cost: 12+ renditions per video, most never watched
  • Upload-to-playable latency: 10-30 minutes
  • Wasted compute: by some estimates, 98% of YouTube videos have <100 views
  • Used by: YouTube (for popular content), Netflix

Lazy Transcoding (On-Demand)
  • Store raw video. Transcode when the first viewer requests a resolution
  • Save storage: only produce renditions that are actually requested
  • First viewer waits for transcode (seconds to minutes)
  • Cache transcoded segments: subsequent viewers get the cached version
  • Complex: must handle concurrent first-view requests gracefully
  • Used by: Cloudflare Stream, some UGC platforms
Transcoding Strategy: Eager vs Lazy Decision Flow
[Figure: decision flow after a video is uploaded, branching on expected views (creator tier / history).]
  • High (>10K): eager. Transcode all resolutions immediately. Store all 12 renditions.
  • Low / unknown: lazy. Transcode on first request, then cache. Store raw only until first play.
  • Hybrid approach: produce 1 fast rendition (480p) immediately, schedule the rest based on views.
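
The decision flow can be expressed as a small policy function. This is a sketch under stated assumptions: the 10K threshold and the 480p fast rendition come from the flow above, while the function names, the `hybrid` flag, and the use of `None` for "unknown" expected views are hypothetical.

```python
from enum import Enum


class Strategy(Enum):
    EAGER = "eager"
    LAZY = "lazy"


EAGER_VIEW_THRESHOLD = 10_000  # "High (>10K)" in the decision flow
FAST_RENDITION = "480p"        # the hybrid approach's immediate rendition
ALL_RESOLUTIONS = ["240p", "360p", "480p", "720p", "1080p", "4K"]


def choose_strategy(expected_views):
    """Eager for videos expected to be popular, lazy otherwise.

    `expected_views` may be None when there is no creator history.
    """
    if expected_views is not None and expected_views > EAGER_VIEW_THRESHOLD:
        return Strategy.EAGER
    return Strategy.LAZY


def renditions_at_upload(expected_views, hybrid=True):
    """Resolutions to transcode immediately after upload."""
    if choose_strategy(expected_views) is Strategy.EAGER:
        return list(ALL_RESOLUTIONS)
    # Lazy: nothing up front, or one fast rendition in the hybrid variant.
    return [FAST_RENDITION] if hybrid else []


if __name__ == "__main__":
    print(renditions_at_upload(50_000))
    print(renditions_at_upload(None))
    print(renditions_at_upload(200, hybrid=False))
```

The hybrid variant keeps upload-to-playable time short for every video while deferring most of the compute until there is evidence the video will be watched.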
HLS (Apple)
  • HTTP Live Streaming: developed by Apple (specification published as RFC 8216)
  • M3U8 playlist + .ts segments (MPEG-TS containers); newer versions also support fMP4
  • Native support on iOS/Safari, no plugin needed
  • Typically 6-10 second segments (longer segments add latency)
  • Dominant for mobile and Apple ecosystem
  • Used by: YouTube, Twitch, most streaming services

DASH (MPEG Standard)
  • Dynamic Adaptive Streaming over HTTP: open standard (ISO)
  • MPD manifest + .mp4 segments (fMP4 containers)
  • Codec-agnostic: supports any codec in MP4
  • Lower latency achievable with CMAF (1-2s segments)
  • Better DRM integration (Common Encryption)
  • Used by: Netflix, Disney+, most premium services

CMAF (Common Media Application Format) solves the dual-protocol storage problem. Traditional HLS uses .ts containers and DASH uses .fmp4 containers, requiring two full sets of segments per video. CMAF standardizes on fragmented MP4 (.cmaf or .m4s) containers that both HLS and DASH can reference with their respective manifests. The video data is stored once; only the manifest differs between protocols. For a platform serving both Apple and non-Apple devices, CMAF approximately halves segment storage costs.

HLS vs DASH vs CMAF: Quick Comparison
                 | HLS   | DASH  | CMAF (both)
  Manifest       | .m3u8 | .mpd  | n/a
  Segments       | .ts   | .fmp4 | .cmaf / .m4s
  Apple required | ✓     | ✗     | ✓
  DRM            | Basic | Full  | Full

Most production platforms use both HLS and DASH: HLS for Apple devices (mandatory), DASH for everything else. With CMAF, the actual video segments can be identical between HLS and DASH; only the manifest format differs. This removes the need to store a second, protocol-specific copy of every segment.

Chapter 5 Summary
  • Eager transcoding: instant playback, high storage + compute cost. Good for popular content.
  • Lazy transcoding: transcode on first view, save storage. First viewer pays latency penalty.
  • HLS: Apple's protocol, dominant on mobile. M3U8 + TS segments.
  • DASH: open standard, codec-agnostic, better DRM. MPD + fMP4 segments.
  • CMAF: shared segment format for both HLS and DASH. Halves storage duplication.
06
Chapter Six

What Real Companies Did

Production Video Platforms

YouTube, Netflix, and Twitch represent three fundamentally different video architectures: UGC on-demand, premium on-demand, and live streaming. Each made different trade-offs on transcoding, CDN strategy, and content delivery based on their content model and business constraints.


YouTube

  • 500+ hours uploaded/minute, 1B hours watched/day
  • Custom transcoding pipeline: "Borg" scheduled workers
  • VP9/AV1 codecs: royalty-free, around 30% (or more) better compression than H.264
  • Hybrid CDN: Google's private network + ISP caches (Google Global Cache)
  • Tiered transcoding: popular videos get more resolutions over time

Netflix

  • 15,000+ titles, 230M+ subscribers, tens of millions of concurrent streams at peak
  • Per-title encoding: each title gets custom bitrate ladder
  • Open Connect: custom CDN appliances in 1000+ ISP locations
  • Preposition content: popular titles pushed to edge before demand
  • Published VMAF quality metric for automated quality scoring

Twitch

  • Live streaming: <5s glass-to-glass latency requirement
  • Real-time transcoding: encode while streaming (no pre-transcode)
  • Multi-quality live: transcode to 3-4 resolutions in real-time
  • Chat + video sync: timestamp alignment between streams
  • Partners get transcoding; small streamers get source-only

TikTok

  • Short-form: 15s-10min videos, optimized for mobile-first
  • Aggressive transcoding: videos ready in seconds (short = fast)
  • ByteDance CDN: own global edge network (like Netflix Open Connect)
  • Preload: prefetch next 2-3 videos while current is playing
  • Codec: H.265/HEVC for mobile bandwidth efficiency
Production Video Platforms: Comparison
  Company | Content Type      | Codec               | CDN Strategy                          | Special Pattern
  YouTube | UGC on-demand     | VP9 / AV1           | Google Global Cache (ISP-embedded)    | Tiered transcoding by popularity
  Netflix | Premium on-demand | H.264 / H.265 / AV1 | Open Connect (own appliances in ISPs) | Per-title custom bitrate ladder
  Twitch  | Live streaming    | H.264 (real-time)   | AWS CloudFront                        | Real-time transcode, partners only
  TikTok  | Short-form UGC    | H.265 / HEVC        | ByteDance CDN (own)                   | Preload next 2-3 videos, scroll-optimized
Chapter 6 Summary
  • YouTube: tiered transcoding, VP9/AV1 codecs, Google Global Cache at ISPs.
  • Netflix: per-title encoding (custom bitrate per video), Open Connect CDN in ISPs.
  • Twitch: real-time transcoding during live stream, <5s latency.
  • TikTok: aggressive fast transcode for short-form, video preloading for seamless scroll.
07
Chapter Seven

Best Practices Extracted

Transferable Lessons

Video streaming teaches patterns that transfer to any system dealing with large files, compute-heavy processing pipelines, and global low-latency delivery. The CDN strategy alone applies to any content-heavy application, from image-heavy e-commerce to software distribution.


Chunked Upload

  • Split large files into 5-10MB chunks
  • Upload each chunk independently (resumable)
  • Server assembles after all chunks received
  • Network drop? Resume from last successful chunk
  • Transfers to: any large file upload (cloud storage, backups)
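
The chunked upload steps above can be sketched as a splitter plus a server-side session that tracks which chunks have arrived. The `UploadSession` class and its method names are hypothetical, and the session keeps chunks in memory purely for illustration; a real service would write each chunk to object storage.

```python
CHUNK_SIZE = 5 * 1024 * 1024  # 5MB, the lower end of the suggested range


def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Split a byte string into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class UploadSession:
    """Tracks received chunks so an interrupted upload can resume."""

    def __init__(self, total_chunks):
        self.total_chunks = total_chunks
        self._received = {}

    def put_chunk(self, index, payload):
        """Store one chunk. Re-sending the same index is harmless."""
        if not 0 <= index < self.total_chunks:
            raise IndexError(f"chunk index {index} out of range")
        self._received[index] = payload

    def missing(self):
        """Indices the client still has to send (used to resume)."""
        return [i for i in range(self.total_chunks) if i not in self._received]

    def assemble(self):
        """Concatenate chunks in order once all of them have arrived."""
        if self.missing():
            raise ValueError(f"missing chunks: {self.missing()}")
        return b"".join(self._received[i] for i in range(self.total_chunks))


if __name__ == "__main__":
    data = bytes(range(10))
    chunks = split_into_chunks(data, chunk_size=4)
    session = UploadSession(len(chunks))
    session.put_chunk(0, chunks[0])
    session.put_chunk(2, chunks[2])      # chunk 1 lost to a network drop
    print("resume from:", session.missing())
    session.put_chunk(1, chunks[1])
    print("intact:", session.assemble() == data)
```

After a network drop the client asks for `missing()` and resends only those chunks rather than restarting the whole file.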

Bitrate Ladder

  • Pre-define quality levels: 240p, 360p, 480p, 720p, 1080p, 4K
  • Each level has target bitrate (e.g., 1080p @ 4.5Mbps)
  • Per-title optimization: simple content needs lower bitrate
  • Test with VMAF/SSIM quality metrics, not just bitrate
  • Transfers to: any adaptive media delivery

CDN Tiering

  • Layer 1: CDN edge PoP (closest to user)
  • Layer 2: CDN regional hub (aggregates misses)
  • Layer 3: Origin storage (S3, only for first-request)
  • Popular content stays hot at edge. Long-tail: regional hub.
  • Transfers to: any globally distributed content delivery
CDN Tiering: Edge → Regional → Origin
[Figure: a request from a user in Tokyo falls through to the next tier on each miss.]
  • CDN Edge PoP (Tokyo): ~5ms, 95% served here
  • CDN Regional Hub (Asia): ~30ms, 4% served here
  • Origin S3: ~150ms, 1% reaches here
Popular content stays hot at edge. Long-tail: regional hub. Origin: rare cold start.
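
The tier figures can be combined into an expected latency and an origin load estimate. This is a simple weighted average that treats each latency as the total time to serve a request from that tier, which is an assumption about how to read the figure; the function names are hypothetical.

```python
# (tier name, share of requests served there, latency in ms)
TIERS = [
    ("edge", 0.95, 5.0),
    ("regional", 0.04, 30.0),
    ("origin", 0.01, 150.0),
]


def expected_latency_ms(tiers=TIERS):
    """Request-weighted average latency across all tiers."""
    return sum(share * latency for _, share, latency in tiers)


def origin_requests_per_second(total_rps, tiers=TIERS):
    """Requests per second that fall through every cache to the origin."""
    origin_share = next(share for name, share, _ in tiers if name == "origin")
    return total_rps * origin_share


if __name__ == "__main__":
    print(f"expected latency: {expected_latency_ms():.2f} ms")
    print(f"origin load at 1M rps: {origin_requests_per_second(1_000_000):,.0f} rps")
```

The average works out to 0.95 × 5 + 0.04 × 30 + 0.01 × 150 = 7.45 ms, so even a 1% origin share contributes a noticeable part of the total; this is why shaving the miss rate matters more than shaving edge latency.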

Per-title encoding is Netflix's most impactful optimization. A cartoon needs far less bitrate than an action movie at the same perceptual quality. Instead of a fixed bitrate ladder for all content, analyze each title and produce a custom ladder. Result: roughly 20% bandwidth savings with no perceptible quality loss. This principle, adapting processing to content characteristics, transfers to image compression, audio encoding, and data serialization.

Chapter 7 Summary
  • Chunked upload: resumable, fault-tolerant large file transfer.
  • Bitrate ladder: pre-defined quality levels. Per-title optimization for bandwidth savings.
  • CDN tiering: edge → regional → origin. Minimize origin load.
  • CMAF: shared segments for HLS + DASH. Halves storage for dual-protocol support.
  • Per-title encoding: analyze content complexity, generate custom bitrate ladder. ~20% bandwidth savings.
08
Chapter Eight

What Could Go Wrong

Common Failure Patterns

Video failures are among the most frustrating user experiences: buffering during a climactic scene, quality suddenly dropping to pixel soup, or a live stream freezing for thousands of concurrent viewers. These failures have distinct root causes and well-established fixes that separate amateur video platforms from production-grade ones.


Rebuffering Spiral

  • Player requests 1080p segment but network can only handle 480p
  • Download takes too long → buffer empties → stall
  • Player drops to 480p but buffer already empty → another stall
  • Many viewers leave within about 5 seconds of the first buffering event
  • Fix: aggressive ABR. Switch quality proactively when bandwidth drops. Start playback at lower quality, then ramp up. Buffer 3+ segments ahead.
Rebuffering Spiral: Reactive vs Proactive ABR
[Figure: two playback timelines.]
  • Reactive ABR (too slow), t=0 to t=30s: 1080p → STALL 3s → 480p → STALL 4s → 360p → viewer leaves
  • Proactive ABR (start low, ramp up), t=0 to t=25s: 360p → 480p → 720p → 1080p (stable) → 4K
No stalls: buffer always > 3 segments ahead. Smooth ramp to max quality.
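
One way to express the proactive policy is a per-segment decision that looks at both throughput and buffer depth: drop immediately when the current level is no longer sustainable, and step up one level at a time only while the buffer is healthy. The ladder bitrates, the 0.8 safety factor, and the function name are assumptions for illustration; the 3-segment buffer threshold comes from the fix described above.

```python
# Hypothetical ladder in kbps, lowest quality first (240p ... 4K).
LADDER_KBPS = [400, 800, 1500, 2800, 4500, 16000]


def next_quality(current, buffer_segments, throughput_kbps,
                 ladder=LADDER_KBPS, min_buffer=3, safety=0.8):
    """Return the ladder index to use for the next segment.

    - Step down straight to the highest sustainable level as soon as
      throughput can no longer carry the current one (proactive drop).
    - Step up by one level only when the buffer holds at least
      `min_buffer` segments and there is bandwidth headroom.
    - Otherwise stay at the current level.
    """
    budget = throughput_kbps * safety
    sustainable = 0
    for index, bitrate in enumerate(ladder):
        if bitrate <= budget:
            sustainable = index
    if sustainable < current:
        return sustainable
    if sustainable > current and buffer_segments >= min_buffer:
        return current + 1
    return current


if __name__ == "__main__":
    level = 0  # start playback at the lowest quality
    for buffered in (1, 3, 4, 5, 5):
        level = next_quality(level, buffered, throughput_kbps=10_000)
        print("buffer", buffered, "-> level", level)
```

Starting at index 0 gives a fast first frame, and the one-step ramp avoids overshooting before the throughput estimate has settled.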

Viral Video CDN Miss Storm

  • New video goes viral → millions of simultaneous first views
  • CDN cache cold → all requests hit origin at once
  • Origin overwhelmed → 503 errors → playback fails globally
  • Fix: origin shield (single CDN node absorbs all misses), request coalescing (one origin fetch serves queued requests), pre-warm CDN for anticipated viral content.
Origin Shield: Preventing Cache Miss Storms
[Figure: two request paths.]
  • Without origin shield: 1M viewers → CDN Edge → Origin S3. All misses reach the origin, which is overloaded: 503 errors, playback fails.
  • With origin shield: 1M viewers → CDN Edge → Origin Shield (coalesced) → Origin S3 (1 request). Origin stays healthy.
The shield coalesces 1M edge misses into 1 origin fetch per segment.
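
Request coalescing can be sketched in a few lines: the first request for a missing key becomes the leader and fetches from origin, while every concurrent request for the same key waits on the leader's result. The `CoalescingCache` class is a hypothetical in-process illustration of the idea, not how any particular CDN implements its shield.

```python
import threading
from concurrent.futures import Future


class CoalescingCache:
    """Cache that allows only one in-flight origin fetch per key."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin   # callable: key -> value
        self._lock = threading.Lock()
        self._cache = {}
        self._inflight = {}               # key -> Future shared by waiters

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            future = self._inflight.get(key)
            leader = future is None
            if leader:
                future = Future()
                self._inflight[key] = future
        if not leader:
            # Followers block here until the leader publishes a result.
            return future.result()
        try:
            value = self._fetch(key)
        except Exception as exc:
            with self._lock:
                self._inflight.pop(key, None)
            future.set_exception(exc)     # wake followers with the error
            raise
        with self._lock:
            self._cache[key] = value
            self._inflight.pop(key, None)
        future.set_result(value)
        return value
```

With this in place, any number of simultaneous misses for `seg_001.ts` result in a single call to the origin fetch function; later requests are served from the cache.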

Transcode Queue Backlog

  • Spike in uploads → transcode queue grows faster than it is consumed
  • Upload-to-playable time grows from 10min to 2 hours
  • Creators frustrated: video "processing" for hours
  • Fix: auto-scaling transcode fleet (spot instances for cost), priority queue (paying creators first), fast-path: produce one resolution immediately, rest async.
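
The priority queue and fast-path ideas in the fix can be sketched with a heap: every upload's first rendition jumps to the front, and the remaining renditions are ordered by creator tier. The priority levels, the 480p fast rendition, and the sizing helper are assumptions for illustration.

```python
import heapq
import itertools
import math

FAST_PATH, PAID, STANDARD = 0, 1, 2  # lower number = served first


class TranscodeQueue:
    """Priority queue of (video, rendition) jobs, FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order

    def push(self, video_id, rendition, priority):
        heapq.heappush(
            self._heap, (priority, next(self._order), video_id, rendition)
        )

    def pop(self):
        _, _, video_id, rendition = heapq.heappop(self._heap)
        return video_id, rendition

    def __len__(self):
        return len(self._heap)


def enqueue_upload(queue, video_id, paying_creator, renditions,
                   fast_rendition="480p"):
    """Fast-path one rendition; queue the rest by creator tier."""
    queue.push(video_id, fast_rendition, FAST_PATH)
    rest = PAID if paying_creator else STANDARD
    for rendition in renditions:
        if rendition != fast_rendition:
            queue.push(video_id, rendition, rest)


def workers_needed(backlog_jobs, avg_job_minutes, target_drain_minutes):
    """Fleet size required to drain the backlog within the target time."""
    return math.ceil(backlog_jobs * avg_job_minutes / target_drain_minutes)
```

An auto-scaler could call `workers_needed` with the current queue length to decide how many workers to add during an upload spike.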

DRM Key Delivery Failure

  • Player needs decryption key before playback can start
  • License server overloaded → key delivery times out
  • Playback blocked on DRM even though video segments are cached
  • Fix: cache DRM licenses on CDN edge (short TTL), pre-fetch license during manifest load, graceful degradation to lower DRM tier.

The first 5 seconds determine if a viewer stays or leaves. Some industry studies report that about 25% of viewers abandon after one buffering event, and that startup delay beyond 2 seconds causes significant drop-off. Optimizing time-to-first-frame (start at low quality, upgrade quickly) and preventing the first buffer stall are more important than peak quality. A smooth 720p experience beats a stuttering 4K one every time.

Chapter 8 Summary
  • Rebuffering: aggressive ABR, start low + ramp up, 3+ segment buffer ahead.
  • CDN miss storm: origin shield + request coalescing + pre-warm for viral content.
  • Transcode backlog: auto-scale workers, priority queues, fast-path single resolution.
  • DRM failure: cache licenses at edge, pre-fetch during manifest load.
  • Principle: smooth playback at any quality beats stuttering at high quality.