System Design · Case Studies

Case Study: Video Streaming

Design, trade-offs, and alternatives for a video streaming platform at scale.

Chapter One

Problem Statement

What We Are Building

A video streaming platform handles the full lifecycle: upload, transcode, store, and deliver video to millions of concurrent viewers worldwide. The core challenge is not serving a single video; it is managing petabytes of content, transcoding every upload into dozens of formats and resolutions, and serving billions of segments per day from CDN edge servers close to users. Video is the most bandwidth-intensive workload on the internet; industry estimates commonly put it at well over half of all internet traffic, with some forecasts above 80%.

Scale Requirements

Traffic & Scale

500 hours of video uploaded/minute
1B hours watched/day
100M concurrent viewers at peak
Average video: 5 min, transcoded to 6 resolutions × 2 codecs = 12 renditions

Requirements

Playback start: <2 seconds (time to first frame)
Buffering ratio: <0.5% of playback time
Upload-to-playable: <10 minutes for standard video
Storage: growing at ~1 PB/day (raw + transcoded)
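The scale figures above can be cross-checked with a few lines of arithmetic. This Python sketch assumes the 12-rendition count from the refined design (6 resolutions × 2 codecs) and decimal petabytes; it derives daily upload volume, videos and rendition jobs per day, the peak-to-average concurrency ratio, and the average bitrate budget implied by ~1 PB/day of storage.

```python
UPLOAD_HOURS_PER_MIN = 500
WATCH_HOURS_PER_DAY = 1_000_000_000
PEAK_CONCURRENT = 100_000_000
AVG_VIDEO_MIN = 5
RENDITIONS_PER_VIDEO = 12        # assumption: 6 resolutions x 2 codecs
STORAGE_BYTES_PER_DAY = 1e15     # ~1 PB/day, decimal petabyte


def scale_estimates() -> dict:
    upload_hours_per_day = UPLOAD_HOURS_PER_MIN * 60 * 24
    videos_per_day = upload_hours_per_day * 60 / AVG_VIDEO_MIN
    avg_concurrent = WATCH_HOURS_PER_DAY / 24  # watch-hours spread evenly over a day
    return {
        "upload_hours_per_day": upload_hours_per_day,
        "videos_per_day": videos_per_day,
        "rendition_jobs_per_day": videos_per_day * RENDITIONS_PER_VIDEO,
        "avg_concurrent_viewers": avg_concurrent,
        "peak_to_avg_ratio": PEAK_CONCURRENT / avg_concurrent,
        # Stored bytes per uploaded hour, expressed as an average bitrate (Mbps)
        # across the original plus every rendition.
        "implied_storage_mbps": STORAGE_BYTES_PER_DAY / upload_hours_per_day * 8 / 3600 / 1e6,
    }


for name, value in scale_estimates().items():
    print(f"{name}: {value:,.1f}")
```

With these inputs, peak concurrency is about 2.4× the daily average (~41.7M viewers), and ~1 PB/day works out to roughly 3.1 Mbps of stored data per uploaded hour across the original and all renditions, a useful sanity check when choosing the bitrate ladder.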

Video streaming has two completely separate systems: the upload/transcode pipeline and the playback delivery system. Upload is a batch processing problem (compute-heavy, latency-tolerant). Playback is a CDN delivery problem (bandwidth-heavy, latency-critical). Designing them as one system is a common mistake; they have nothing in common except the stored video files.

📋 Chapter 1 — Summary

500 hours uploaded/minute. 1B hours watched/day. 100M concurrent viewers.
Two separate systems: upload pipeline (batch) and playback delivery (real-time CDN).
Every video transcoded to 6+ resolutions × multiple codecs = dozens of renditions.
Playback start under 2 seconds. Buffering below 0.5%. Storage growing at ~1 PB/day.

Chapter Two

Questions to Ask

Clarifying Before Designing

A YouTube-style on-demand platform has completely different requirements from a Twitch-style live streaming service. On-demand allows pre-transcoding and heavy CDN caching. Live streaming requires real-time encoding with sub-5-second latency. The questions below determine whether you are building a pipeline or a real-time system — or both.

🎬

Content Type

On-demand (pre-recorded) or live streaming?
Short-form (TikTok, 15s to a few minutes) or long-form (Netflix, 2hr)?
User-generated or professionally produced?
Support for 4K? HDR? Spatial audio?

📱

Client & Network

Devices: web, mobile, smart TV, game console?
Adaptive bitrate required? (varying network conditions)
Offline download support?
DRM required? (content protection)

📊

Platform Features

Recommendations engine needed?
Content moderation (automated + manual)?
Monetization: ads, subscriptions, or hybrid?
Analytics: view counts, watch time, engagement?

Adaptive bitrate streaming is non-negotiable for any production video platform. Users on 5G get 4K. Users on 3G get 360p. The player switches mid-stream as bandwidth fluctuates. Without ABR, you either waste bandwidth (serving 4K to mobile on slow network → constant buffering) or waste quality (serving 360p to everyone). ABR is the single most impactful feature for viewer experience.

For This Case Study, Our Answers Are:

Content type: on-demand, user-generated (YouTube-style)
Max video length: up to 4 hours
Client platforms: web, mobile, smart TV
Adaptive bitrate: yes — 6 quality levels (240p to 4K)
Live streaming: no (out of scope for this design)
DRM: no (public UGC content, no paid licensed content)
Offline download: no
Transcoding target: <10 minutes upload-to-playable for videos under 30 min
Storage: growing at ~1 PB/day (raw + all renditions)
Moderation: automated scanning before publish (out of scope for this design)

📋 Chapter 2 — Summary

On-demand vs live: fundamentally different pipelines (batch vs real-time).
Adaptive bitrate: non-negotiable. Player selects quality based on current bandwidth.
DRM: required for licensed content (Widevine, FairPlay, PlayReady).
Content moderation: must scan before publishing (copyright, safety).
Short-form vs long-form changes transcoding priority and caching strategy.

Chapter Three

Naive Design

Single-Server Monolith

The simplest design: a single server accepts video uploads, transcodes them synchronously using FFmpeg, stores the output files on local disk, and serves them directly via HTTP. The user uploads a 1GB file, waits 30 minutes for transcoding, and gets a single-resolution MP4. Playback is a direct file download — no adaptive bitrate, no CDN, no chunking. Works for a personal project. Collapses immediately under real load.

Naive Design — Monolithic Upload + Direct Serve

✅

What Works

Simple — FFmpeg CLI, single server, no infrastructure
Works for personal/small projects (<100 videos)
No external dependencies
Easy to debug — everything on one machine

💥

What Breaks

Sync transcoding blocks threads — 1 upload = 1 CPU for 30min
Single resolution: 4K to mobile = buffer hell, 360p to TV = unwatchable
No CDN: every viewer streams from origin server → bandwidth exhausted
No chunking: must download entire file before seeking
Local disk: server dies = all videos lost

📋 Chapter 3 — Summary

Monolith: upload, transcode, serve all on one server. Blocks on transcode.
Single resolution: terrible experience on diverse devices and networks.
No CDN: origin serves all traffic. 1 viral video = server down.
No chunking: no seeking, no adaptive bitrate, no resume on network change.

Chapter Four

Refined Design

Upload Pipeline + CDN Delivery

The refined design splits into two independent systems. The upload pipeline accepts raw video, stores it in object storage, queues a transcoding job, and a fleet of workers produces multiple resolutions and codecs. The playback system serves HLS/DASH manifests and video segments from global CDN edge servers. The origin storage is rarely hit — 95%+ of playback traffic is served from CDN cache. These two systems share nothing except the storage layer.

Refined Design — Upload Pipeline + CDN Playback

⬆️

Upload Pipeline

Chunked upload: client sends video in 5MB chunks (resumable)
Raw storage: S3 stores original file immediately
Transcode queue: job dispatched to worker fleet
Workers: produce 6 resolutions × 2 codecs = 12 renditions
Output: HLS/DASH segments (2-10s each) + manifest file
Metadata update: video status → "ready" in database
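The fan-out from one upload to twelve independent transcode jobs can be sketched with the standard library. The codec pair (H.264 and VP9), the `TranscodeJob` record, the S3-style URI, and the in-process `queue.Queue` are hypothetical stand-ins for a real job queue and worker fleet; the point is that each rendition is its own job, and the video flips to "ready" only when all of them complete.

```python
import queue
from dataclasses import dataclass
from itertools import product

RESOLUTIONS = ["240p", "360p", "480p", "720p", "1080p", "2160p"]  # 2160p = 4K
CODECS = ["h264", "vp9"]  # hypothetical codec pair


@dataclass(frozen=True)
class TranscodeJob:
    video_id: str
    resolution: str
    codec: str
    source_uri: str


def fan_out(video_id: str, source_uri: str) -> list:
    """One job per (resolution, codec) rendition, so workers can run them in parallel."""
    return [TranscodeJob(video_id, res, codec, source_uri)
            for res, codec in product(RESOLUTIONS, CODECS)]


class RenditionTracker:
    """Records finished renditions; reports True once every expected one is done."""

    def __init__(self, expected: int):
        self.expected = expected
        self.done = set()

    def complete(self, job: TranscodeJob) -> bool:
        self.done.add((job.resolution, job.codec))  # idempotent on retries
        return len(self.done) == self.expected


jobs = queue.Queue()
for job in fan_out("v123", "s3://raw-uploads/v123.mp4"):  # hypothetical bucket
    jobs.put(job)

tracker = RenditionTracker(expected=len(RESOLUTIONS) * len(CODECS))
status = "processing"
while not jobs.empty():
    job = jobs.get()
    # A real worker would invoke the encoder and write segments to storage here.
    if tracker.complete(job):
        status = "ready"  # metadata update: video status -> "ready"
print(tracker.expected, status)
```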

▶️

Playback Delivery

Manifest request: player gets resolution list + segment URLs
Segment fetch: player requests 2-10s video chunks from CDN
ABR logic: player measures bandwidth, selects resolution per segment
CDN cache: 95%+ hit rate — origin rarely touched
DRM (licensed content only; not needed for this design's public UGC): license server issues decryption keys per session
Result: playback starts in <2s, quality adapts seamlessly
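The per-segment ABR decision in the playback list can be illustrated with a throughput-based rule: smooth measured download speeds with an exponentially weighted moving average (EWMA), then pick the highest rendition whose bitrate fits under a safety margin. This is a simplified sketch, not any particular player's algorithm; the ladder bitrates (other than 1080p @ 4.5 Mbps, which this case study uses later), the smoothing factor `alpha`, and the 0.8 safety factor are assumptions.

```python
# Hypothetical ladder (kbps). Only 1080p @ 4.5 Mbps comes from the text.
LADDER_KBPS = {"240p": 400, "360p": 800, "480p": 1_400, "720p": 2_800, "1080p": 4_500, "2160p": 16_000}


class ThroughputEstimator:
    """EWMA of measured segment download throughput, in kbps."""

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha  # weight of the newest sample
        self.kbps = None

    def observe(self, segment_bytes: int, seconds: float) -> float:
        sample = segment_bytes * 8 / 1000 / seconds
        if self.kbps is None:
            self.kbps = sample
        else:
            self.kbps = self.alpha * sample + (1 - self.alpha) * self.kbps
        return self.kbps


def pick_rendition(estimate_kbps: float, safety: float = 0.8) -> str:
    """Highest rendition whose bitrate fits within a safety margin of the estimate."""
    budget = estimate_kbps * safety
    affordable = [r for r, kbps in LADDER_KBPS.items() if kbps <= budget]
    if not affordable:
        return min(LADDER_KBPS, key=LADDER_KBPS.get)  # nothing fits: lowest rung
    return max(affordable, key=LADDER_KBPS.get)


est = ThroughputEstimator()
print(pick_rendition(est.observe(3_000_000, 4.0)))  # 6,000 kbps measured -> 1080p
print(pick_rendition(est.observe(1_000_000, 4.0)))  # EWMA falls to ~4,800 kbps -> 720p
```

The safety factor keeps the chosen bitrate below measured throughput, so each segment downloads faster than it plays; that gap is what keeps the buffer from draining.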

HLS/DASH segment-based streaming is the key architectural decision. Instead of serving a single giant file, the video is split into 2-10 second segments at each quality level. The player requests segments sequentially, and can switch quality level at any segment boundary. This enables adaptive bitrate, seeking without downloading everything, and perfect CDN cacheability (each segment is an independent, cacheable object).

HLS Segment Structure — What the Player Actually Requests
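An HLS stream is two levels of plain-text playlists: a master (multivariant) playlist with one entry per rendition, and a media playlist per rendition listing its segments. The sketch below generates both for a VOD asset. The URIs, bandwidth values, and segment durations are hypothetical; the tags (`#EXT-X-STREAM-INF`, `#EXTINF`, `#EXT-X-TARGETDURATION`, `#EXT-X-ENDLIST`) are standard HLS tags from RFC 8216.

```python
import math

VARIANTS = [  # hypothetical renditions: peak bits/s, WxH, media playlist URI
    {"bandwidth": 800_000, "resolution": "640x360", "uri": "360p/index.m3u8"},
    {"bandwidth": 2_800_000, "resolution": "1280x720", "uri": "720p/index.m3u8"},
    {"bandwidth": 5_000_000, "resolution": "1920x1080", "uri": "1080p/index.m3u8"},
]


def master_playlist(variants) -> str:
    """Top-level playlist: one EXT-X-STREAM-INF entry per rendition."""
    lines = ["#EXTM3U"]
    for v in variants:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={v['bandwidth']},RESOLUTION={v['resolution']}")
        lines.append(v["uri"])
    return "\n".join(lines) + "\n"


def media_playlist(durations, uri_pattern="seg_{:05d}.ts") -> str:
    """Per-rendition playlist listing every segment and its duration in seconds."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",  # version 3 allows decimal EXTINF durations
        # Must be >= every segment duration rounded to the nearest integer; ceil is safe.
        f"#EXT-X-TARGETDURATION:{math.ceil(max(durations))}",
        "#EXT-X-MEDIA-SEQUENCE:0",
        "#EXT-X-PLAYLIST-TYPE:VOD",
    ]
    for i, seconds in enumerate(durations):
        lines.append(f"#EXTINF:{seconds:.3f},")
        lines.append(uri_pattern.format(i))
    lines.append("#EXT-X-ENDLIST")  # VOD: no more segments will be added
    return "\n".join(lines) + "\n"


print(master_playlist(VARIANTS))
print(media_playlist([6.0, 6.0, 6.0, 4.2]))
```

The player fetches the master playlist once, picks a rendition, then fetches that rendition's media playlist and walks its segments; switching quality means moving to a different media playlist at a segment boundary.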

📋 Chapter 4 — Summary

Upload pipeline: chunked upload → S3 → job queue → transcode fleet → segments to S3.
Playback: manifest + segments served from CDN edge. 95%+ cache hit rate.
HLS/DASH segments enable ABR, seeking, and CDN cacheability.
Each video produces 12+ renditions: 6 resolutions × 2 codecs. Stored as thousands of small segments.
Upload and playback systems share only the storage layer — independent scaling.

Chapter Five

Alternative Approaches

Transcoding & Delivery Strategies

The biggest trade-offs in video systems center on when to transcode (ahead of time vs on demand) and how to deliver (full CDN vs origin-pull vs peer-to-peer). Each approach trades off between storage cost, compute cost, latency, and viewer experience.

Eager Transcoding (Pre-transcode All)

Lazy Transcoding (On-Demand)

Transcode every video into all resolutions at upload time
Playback is instant — all renditions pre-generated
High storage cost: 12+ renditions per video, most never watched
Upload-to-playable latency: 10-30 minutes
Wasted compute: most UGC videos sit in a long tail with very few views, so many renditions are never watched
Used by: YouTube (for popular content), Netflix

Store raw video. Transcode when first viewer requests a resolution
Save storage: only produce renditions that are actually requested
First viewer waits for transcode (seconds to minutes)
Cache transcoded segments — subsequent viewers get cached version
Complex: must handle concurrent first-view requests gracefully
Used by: Cloudflare Stream, some UGC platforms

Transcoding Strategy: Eager vs Lazy Decision Flow

HLS (Apple)

DASH (MPEG Standard)

HTTP Live Streaming: developed by Apple, published as RFC 8216
M3U8 playlists + .ts segments (MPEG-TS containers); fMP4 segments are also supported
Native support on iOS/Safari, no plugin needed
Apple recommends 6-second segments; Low-Latency HLS adds shorter partial segments
Dominant for mobile and Apple ecosystem
Used by: YouTube, Twitch, most streaming services

Dynamic Adaptive Streaming over HTTP — open standard (ISO)
MPD manifest + .mp4 segments (fMP4 containers)
Codec-agnostic — supports any codec in MP4
Lower latency achievable with CMAF (1-2s segments)
Better DRM integration (Common Encryption)
Used by: Netflix, Disney+, most premium services

CMAF (Common Media Application Format) solves the dual-protocol storage problem. Traditional HLS uses MPEG-TS (.ts) containers and DASH typically uses fragmented MP4 (fMP4) containers, requiring two full sets of segments per video. CMAF standardizes on fragmented MP4 (.cmaf or .m4s) containers that both HLS and DASH can reference with their respective manifests. The video data is stored once; only the manifest differs between protocols. For a platform serving both Apple and non-Apple devices, CMAF approximately halves segment storage costs.

HLS vs DASH vs CMAF — Quick Comparison

Most production platforms support both HLS and DASH: HLS for Apple devices, where it is effectively mandatory, and DASH for most other clients. With CMAF, the same segments serve both protocols and only the manifests differ, which removes the duplicate copy of every segment and roughly halves segment storage.

📋 Chapter 5 — Summary

Eager transcoding: instant playback, high storage + compute cost. Good for popular content.
Lazy transcoding: transcode on first view, save storage. First viewer pays latency penalty.
HLS: Apple's protocol, dominant on mobile. M3U8 + TS segments.
DASH: open standard, codec-agnostic, better DRM. MPD + fMP4 segments.
CMAF: shared segment format for both HLS and DASH. Halves storage duplication.

Chapter Six

What Real Companies Did

Production Video Platforms

YouTube, Netflix, and Twitch represent three fundamentally different video architectures: UGC on-demand, premium on-demand, and live streaming. Each made different trade-offs on transcoding, CDN strategy, and content delivery based on their content model and business constraints.

▶️

YouTube

500+ hours uploaded/minute, 1B hours watched/day
Custom transcoding pipeline: workers scheduled on Borg, Google's cluster manager
VP9/AV1 codecs: royalty-free, with commonly cited bitrate savings of ~30% or more over H.264 at similar quality
Hybrid CDN: Google's private network + ISP caches (Google Global Cache)
Tiered transcoding: popular videos get more resolutions over time

🎬

Netflix

15,000+ titles, 230M+ subscribers
Per-title encoding: each title gets custom bitrate ladder
Open Connect: custom CDN appliances in 1000+ ISP locations
Preposition content: popular titles pushed to edge before demand
Developed and open-sourced VMAF, a perceptual quality metric used for automated quality scoring

🎮

Twitch

Live streaming: <5s glass-to-glass latency requirement
Real-time transcoding: encode while streaming (no pre-transcode)
Multi-quality live: transcode to 3-4 resolutions in real-time
Chat + video sync: timestamp alignment between streams
Partners are guaranteed transcoding; smaller channels get it only when capacity allows, otherwise source quality only

📱

TikTok

Short-form: 15s-10min videos, optimized for mobile-first
Aggressive transcoding: videos ready in seconds (short = fast)
ByteDance CDN: own global edge network (like Netflix Open Connect)
Preload: prefetch next 2-3 videos while current is playing
Codec: H.265/HEVC for mobile bandwidth efficiency

Production Video Platforms — Comparison

📋 Chapter 6 — Summary

YouTube: tiered transcoding, VP9/AV1 codecs, Google Global Cache at ISPs.
Netflix: per-title encoding (custom bitrate per video), Open Connect CDN in ISPs.
Twitch: real-time transcoding during live stream, <5s latency.
TikTok: aggressive fast transcode for short-form, video preloading for seamless scroll.

Chapter Seven

Best Practices Extracted

Transferable Lessons

Video streaming teaches patterns that transfer to any system dealing with large files, compute-heavy processing pipelines, and global low-latency delivery. The CDN strategy alone applies to any content-heavy application — from image-heavy e-commerce to software distribution.

🔄

Chunked Upload

Split large files into 5-10MB chunks
Upload each chunk independently (resumable)
Server assembles after all chunks received
Network drop? Resume from last successful chunk
Transfers to: any large file upload (cloud storage, backups)
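The chunked, resumable upload can be sketched in memory. `UploadSession` stands in for server-side state (in practice, something like S3 multipart upload, whose parts must be at least 5 MB except the last); the class, its method names, and the simulated network drop are hypothetical. The client asks which chunks are missing and sends only those, so a retry after a failure resumes instead of restarting.

```python
import hashlib
import math
from typing import Optional

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB chunks, as in the text


class UploadSession:
    """Server-side state for one resumable upload (in-memory stand-in for object storage)."""

    def __init__(self, total_size: int, chunk_size: int = CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.num_chunks = math.ceil(total_size / chunk_size)
        self.chunks = {}  # chunk index -> bytes

    def put_chunk(self, index: int, data: bytes) -> None:
        self.chunks[index] = data  # idempotent: re-sending a chunk just overwrites it

    def missing(self) -> list:
        return [i for i in range(self.num_chunks) if i not in self.chunks]

    def assemble(self) -> bytes:
        if self.missing():
            raise ValueError("upload incomplete")
        return b"".join(self.chunks[i] for i in range(self.num_chunks))


def upload(data: bytes, session: UploadSession, fail_at: Optional[int] = None) -> None:
    """Client: send only the chunks the server lacks. fail_at simulates a network drop."""
    for i in session.missing():
        if i == fail_at:
            raise ConnectionError(f"network dropped before chunk {i}")
        start = i * session.chunk_size
        session.put_chunk(i, data[start:start + session.chunk_size])


data = bytes(range(256)) * 1000                            # 256,000-byte stand-in for a video
session = UploadSession(len(data), chunk_size=64 * 1024)  # small chunks for the demo: 4 chunks
try:
    upload(data, session, fail_at=2)                      # drops after chunks 0 and 1
except ConnectionError as err:
    print(err)
print(session.missing())                                  # [2, 3]
upload(data, session)                                     # resume: sends only chunks 2 and 3
print(hashlib.sha256(session.assemble()).hexdigest() == hashlib.sha256(data).hexdigest())  # True
```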

📊

Bitrate Ladder

Pre-define quality levels: 240p, 360p, 480p, 720p, 1080p, 4K
Each level has target bitrate (e.g., 1080p @ 4.5Mbps)
Per-title optimization: simple content needs lower bitrate
Test with VMAF/SSIM quality metrics — not just bitrate
Transfers to: any adaptive media delivery
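To see what a bitrate ladder costs, convert each rung's target bitrate into storage per hour of video. Only the 1080p @ 4.5 Mbps figure comes from the text; the other rungs are hypothetical, illustrative values. The arithmetic shows that the top rung dominates: with these numbers, the 4K rendition alone accounts for about 62% of the ladder's storage.

```python
# Hypothetical ladder (Mbps); only 1080p @ 4.5 Mbps is taken from the text.
LADDER_MBPS = {"240p": 0.4, "360p": 0.8, "480p": 1.4, "720p": 2.8, "1080p": 4.5, "2160p": 16.0}


def gb_per_hour(mbps: float) -> float:
    """Megabits per second -> gigabytes per hour of video (decimal units)."""
    return mbps * 3600 / 8 / 1000


per_rung = {rung: gb_per_hour(mbps) for rung, mbps in LADDER_MBPS.items()}
total = sum(per_rung.values())
for rung, gb in per_rung.items():
    print(f"{rung:>6}: {gb:6.2f} GB/h  ({gb / total:6.1%} of the ladder)")
print(f" total: {total:6.2f} GB/h per codec")
```

Each additional codec stores another full ladder (somewhat smaller for a more efficient codec), which is why generating the top rungs only for videos that earn the views pays off.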

🌐

CDN Tiering

Layer 1: CDN edge PoP (closest to user)
Layer 2: CDN regional hub (aggregates misses)
Layer 3: Origin storage (S3, only for first-request)
Popular content stays hot at edge. Long-tail: regional hub.
Transfers to: any globally distributed content delivery

CDN Tiering: Edge → Regional → Origin
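The edge → regional → origin hierarchy can be modeled as a chain of caches in which each miss falls through to the next tier. This sketch uses count-based LRU caches for simplicity (real CDNs evict by bytes and use more elaborate policies); the capacities and request pattern are hypothetical. It shows the key property: when several edges miss on the same segment, the regional hub absorbs the duplicates and the origin sees each object once.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used


class Origin:
    def __init__(self):
        self.fetches = 0

    def fetch(self, key):
        self.fetches += 1
        return f"bytes of {key}"


class Tier:
    """A cache that falls back to an upstream tier (or the origin) on a miss."""

    def __init__(self, capacity: int, upstream):
        self.cache = LRUCache(capacity)
        self.upstream = upstream
        self.hits = self.misses = 0

    def fetch(self, key):
        value = self.cache.get(key)
        if value is not None:
            self.hits += 1
            return value
        self.misses += 1
        value = self.upstream.fetch(key)
        self.cache.put(key, value)
        return value


origin = Origin()
regional = Tier(capacity=1_000, upstream=origin)
edges = [Tier(capacity=10, upstream=regional) for _ in range(3)]

popular = [f"v42/720p/seg_{i:05d}.ts" for i in range(5)]
for _ in range(100):        # 100 rounds of viewers
    for edge in edges:      # three edge PoPs
        for key in popular:
            edge.fetch(key)

print(sum(e.hits for e in edges), regional.hits, origin.fetches)  # 1485 10 5
```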

Per-title encoding is among Netflix's most impactful optimizations. A cartoon needs far less bitrate than an action movie at the same perceptual quality. Instead of applying one fixed bitrate ladder to all content, analyze each title and produce a custom ladder. Netflix reported roughly 20% bandwidth savings at the same quality when it introduced the approach. This principle of adapting processing to content characteristics transfers to image compression, audio encoding, and data serialization.
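A simplified sketch of the per-title idea: given trial encodes of one title at several bitrates per resolution, each scored with VMAF, keep the cheapest bitrate that meets a quality target. All scores below are made-up illustrative numbers, and the 90 VMAF target is an assumption. Netflix's published approach is more thorough (it searches resolution and bitrate jointly and picks points on the convex hull of the quality-versus-bitrate curve); this per-resolution threshold search only shows why a simple title ends up with a cheaper ladder than a complex one.

```python
# Hypothetical trial-encode results: resolution -> [(bitrate_kbps, VMAF score), ...]
TRIALS = {
    "cartoon": {
        "720p": [(1_000, 88.0), (1_500, 92.5), (2_500, 94.0)],
        "1080p": [(1_500, 89.0), (2_500, 94.5), (4_500, 96.0)],
    },
    "action": {
        "720p": [(1_000, 74.0), (1_500, 81.0), (2_500, 90.5)],
        "1080p": [(1_500, 72.0), (2_500, 84.0), (4_500, 93.5)],
    },
}


def per_title_ladder(trials: dict, target_vmaf: float = 90.0) -> dict:
    """For each resolution, keep the cheapest trial bitrate that meets the quality target."""
    ladder = {}
    for resolution, points in trials.items():
        passing = [kbps for kbps, vmaf in points if vmaf >= target_vmaf]
        # If no trial reaches the target, fall back to the highest bitrate tried.
        ladder[resolution] = min(passing) if passing else max(kbps for kbps, _ in points)
    return ladder


for title, trials in TRIALS.items():
    ladder = per_title_ladder(trials)
    print(title, ladder, "total kbps:", sum(ladder.values()))
```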

📋 Chapter 7 — Summary

Chunked upload: resumable, fault-tolerant large file transfer.
Bitrate ladder: pre-defined quality levels. Per-title optimization for bandwidth savings.
CDN tiering: edge → regional → origin. Minimize origin load.
CMAF: shared segments for HLS + DASH. Halves storage for dual-protocol support.
Per-title encoding: analyze content complexity, generate custom bitrate ladder. ~20% bandwidth savings.

Chapter Eight

What Could Go Wrong

Common Failure Patterns

Video failures are among the most frustrating user experiences — buffering during a climactic scene, quality suddenly dropping to pixel soup, or a live stream freezing for thousands of concurrent viewers. These failures have distinct root causes and well-established fixes that separate amateur video platforms from production-grade ones.

⏳

Rebuffering Spiral

Player requests 1080p segment but network can only handle 480p
Download takes too long → buffer empties → stall
Player drops to 480p but buffer already empty → another stall
Many viewers abandon within seconds of the first buffering event
Fix: proactive ABR that switches quality down as soon as bandwidth drops. Start playback at lower quality and ramp up. Keep 3+ segments buffered ahead.

Rebuffering Spiral: Reactive vs Proactive ABR
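One way to make ABR proactive is to let the buffer level, not just the last throughput sample, drive the choice, in the spirit of the buffer-based approach (BBA) published by Netflix researchers. Below a reservoir the player always fetches the lowest rung, so a draining buffer triggers an immediate downswitch and startup begins at low quality; above the reservoir, quality ramps up with the buffer. The reservoir and cushion sizes and the ladder are hypothetical, and production players typically blend buffer and throughput signals.

```python
LADDER = ["240p", "360p", "480p", "720p", "1080p", "2160p"]  # lowest to highest


def buffer_based_choice(buffer_s: float, reservoir_s: float = 5.0, cushion_s: float = 20.0) -> str:
    """Map seconds of buffered video to a rendition (simplified buffer-based ABR).

    At or below the reservoir: lowest rung, to refill the buffer quickly.
    At or above reservoir + cushion: highest rung.
    In between: quality scales linearly with the buffer level.
    """
    if buffer_s <= reservoir_s:
        return LADDER[0]
    if buffer_s >= reservoir_s + cushion_s:
        return LADDER[-1]
    fraction = (buffer_s - reservoir_s) / cushion_s
    return LADDER[int(fraction * (len(LADDER) - 1))]


for buffered in (0, 10, 15, 24.9, 30):
    print(f"{buffered:>5}s buffered -> {buffer_based_choice(buffered)}")
```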

🔥

Viral Video CDN Miss Storm

New video goes viral → millions of simultaneous first views
CDN cache cold → all requests hit origin at once
Origin overwhelmed → 503 errors → playback fails globally
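Fix: origin shield (a designated mid-tier cache through which all edge misses pass, so the origin sees roughly one request per object), request coalescing (one origin fetch serves all queued requests for the same object), and pre-warming the CDN for anticipated viral content.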

Origin Shield: Preventing Cache Miss Storms
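Request coalescing (also called request collapsing or single-flight) can be sketched with a lock and one event per in-flight key: the first request for a missing object becomes the leader and fetches from the origin, while concurrent requests for the same key wait and reuse its result. The `Coalescer` class and the simulated origin are hypothetical; an origin shield applies the same idea at the CDN tier in front of the origin.

```python
import threading
import time


class Coalescer:
    """Single-flight cache: concurrent misses for the same key share one upstream fetch."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._lock = threading.Lock()
        self._cache = {}
        self._inflight = {}  # key -> (Event, result holder dict)

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            entry = self._inflight.get(key)
            is_leader = entry is None
            if is_leader:
                entry = (threading.Event(), {})
                self._inflight[key] = entry
        done, holder = entry
        if is_leader:
            try:
                holder["value"] = self._fetch(key)  # the only upstream call for this key
                with self._lock:
                    self._cache[key] = holder["value"]
            except Exception as exc:  # share the failure with followers; nothing is cached
                holder["error"] = exc
            finally:
                with self._lock:
                    del self._inflight[key]
                done.set()  # wake every waiting follower
        else:
            done.wait()
        if "error" in holder:
            raise holder["error"]
        return holder["value"]


origin_calls = 0


def slow_origin_fetch(key):
    global origin_calls
    origin_calls += 1
    time.sleep(0.1)  # simulate a slow origin read
    return f"bytes of {key}"


shield = Coalescer(slow_origin_fetch)
threads = [threading.Thread(target=shield.get, args=("v42/1080p/seg_00001.ts",)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(origin_calls)  # 1: fifty concurrent misses, one origin fetch
```

Because the leader writes the cache before clearing its in-flight entry (both under the lock), any later request finds either the in-flight entry or the cached value, so the origin is hit exactly once per key.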

💾

Transcode Queue Backlog

Spike in uploads → transcode queue grows faster than consumed
Upload-to-playable time grows from 10min to 2 hours
Creators frustrated — video "processing" for hours
Fix: auto-scaling transcode fleet (spot instances for cost), priority queue (paying creators first), fast-path: produce one resolution immediately, rest async.
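The priority-queue and fast-path fixes can be combined in one scheduler: each upload enqueues its first rendition at the highest priority so the video becomes playable quickly, and the remaining renditions at a tier that depends on the creator. The three priority tiers, the `TranscodeQueue` class, and the choice of which rendition counts as "first" are hypothetical; Python's `heapq` provides the ordering, with a counter as a FIFO tie-breaker.

```python
import heapq
import itertools

# Hypothetical priority tiers: lower value is dequeued first.
FAST_PATH, PAID, FREE = 0, 1, 2


class TranscodeQueue:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # FIFO tie-breaker within a tier

    def _push(self, priority, video_id, rendition):
        heapq.heappush(self._heap, (priority, next(self._seq), video_id, rendition))

    def submit_upload(self, video_id, paying, renditions):
        """First rendition takes the fast path; the rest wait in the creator's tier."""
        first, *rest = renditions
        self._push(FAST_PATH, video_id, first)
        tier = PAID if paying else FREE
        for rendition in rest:
            self._push(tier, video_id, rendition)

    def next_job(self):
        _, _, video_id, rendition = heapq.heappop(self._heap)
        return video_id, rendition

    def __len__(self):
        return len(self._heap)


q = TranscodeQueue()
q.submit_upload("free-video", paying=False, renditions=["480p", "240p", "720p"])
q.submit_upload("paid-video", paying=True, renditions=["480p", "1080p"])
order = [q.next_job() for _ in range(len(q))]
print(order)  # both fast-path jobs first, then paid, then free
```

Strict priorities can starve free-tier jobs under sustained paid load; an aging bonus for long-waiting jobs, or a reserved share of workers per tier, is a common mitigation, while auto-scaling the fleet on queue depth addresses the backlog itself.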

🔓

DRM Key Delivery Failure

Player needs decryption key before playback can start
License server overloaded → key delivery times out
Playback blocked on DRM even though video segments are cached
Fix: cache DRM licenses on CDN edge (short TTL), pre-fetch license during manifest load, graceful degradation to lower DRM tier.
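Caching licenses at the edge with a short TTL can be sketched as a small time-bounded cache in front of the license server. Whether a license response may be reused across requests depends on the DRM system and the content's key policy (licenses are often bound to a device or session), so treat this as a sketch under that assumption. `TTLCache`, `get_license`, and the fake license server are hypothetical; the injectable clock keeps the example deterministic.

```python
import time


class TTLCache:
    """Tiny time-bounded cache: entries expire ttl_s seconds after insertion."""

    def __init__(self, ttl_s, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._items = {}  # key -> (value, expires_at)

    def get(self, key):
        item = self._items.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self.clock() >= expires_at:
            del self._items[key]  # expired: force a fresh license fetch
            return None
        return value

    def put(self, key, value):
        self._items[key] = (value, self.clock() + self.ttl_s)


def get_license(cache, key_id, license_server):
    """Serve from the edge cache when possible; fall back to the license server."""
    lic = cache.get(key_id)
    if lic is None:
        lic = license_server(key_id)
        cache.put(key_id, lic)
    return lic


server_calls = []


def fake_license_server(key_id):
    server_calls.append(key_id)
    return f"license-for-{key_id}"


now = [0.0]  # fake clock for a deterministic demo
cache = TTLCache(ttl_s=60, clock=lambda: now[0])
get_license(cache, "kid-1", fake_license_server)
get_license(cache, "kid-1", fake_license_server)  # cache hit
now[0] = 61.0
get_license(cache, "kid-1", fake_license_server)  # expired -> refetch
print(len(server_calls))  # 2
```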

The first few seconds determine whether a viewer stays or leaves. Large-scale measurement studies have found that viewers start abandoning once startup delay exceeds about 2 seconds, and that buffering events sharply reduce engagement. Optimizing time-to-first-frame (start at low quality, upgrade quickly) and preventing the first buffer stall matter more than peak quality. A smooth 720p experience beats a stuttering 4K one every time.

📋 Chapter 8 — Summary

Rebuffering: aggressive ABR, start low + ramp up, 3+ segment buffer ahead.
CDN miss storm: origin shield + request coalescing + pre-warm for viral content.
Transcode backlog: auto-scale workers, priority queues, fast-path single resolution.
DRM failure: cache licenses at edge, pre-fetch during manifest load.
Principle: smooth playback at any quality beats stuttering at high quality.
