Case Study: Video Streaming
Design, trade-offs, and alternatives for a video streaming platform at scale.
Problem Statement
A video streaming platform handles the full lifecycle: upload, transcode, store, and deliver video to millions of concurrent viewers worldwide. The core challenge is not serving a single video; it is managing petabytes of content, transcoding every upload into dozens of formats and resolutions, and serving billions of segments per day from CDN edge servers close to users. Video is the most bandwidth-intensive workload on the internet: industry forecasts have put video at over 80% of all internet traffic.
Traffic & Scale
- 500 hours of video uploaded/minute
- 1B hours watched/day
- 100M concurrent viewers at peak
- Average video: 5 min, transcoded to 6 resolutions × 2 codecs = 12 renditions
Requirements
- Playback start: <2 seconds (time to first frame)
- Buffering ratio: <0.5% of playback time
- Upload-to-playable: <10 minutes for standard video
- Storage: growing at ~1 PB/day (raw + transcoded)
Video streaming has two completely separate systems: the upload/transcode pipeline and the playback delivery system. Upload is a batch processing problem (compute-heavy, latency-tolerant). Playback is a CDN delivery problem (bandwidth-heavy, latency-critical). Designing them as one system is the most common mistake: they have nothing in common except the stored video files.
- 500 hours uploaded/minute. 1B hours watched/day. 100M concurrent viewers.
- Two separate systems: upload pipeline (batch) and playback delivery (real-time CDN).
- Every video transcoded to 6+ resolutions × multiple codecs = dozens of renditions.
- Playback start under 2 seconds. Buffering below 0.5%. Storage growing at ~1 PB/day.
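The traffic figures above can be sanity-checked with a little arithmetic. The sketch below derives daily upload volume, average concurrency, and daily segment requests from the stated numbers. The 4-second segment duration is an assumption (the design later uses 2-10 second segments), and the function names are illustrative only.

```python
# Back-of-envelope figures derived from the stated traffic numbers.
UPLOAD_HOURS_PER_MIN = 500
WATCH_HOURS_PER_DAY = 1_000_000_000
PEAK_CONCURRENT_VIEWERS = 100_000_000


def uploaded_hours_per_day(hours_per_min: float = UPLOAD_HOURS_PER_MIN) -> float:
    """Hours of new video arriving per day."""
    return hours_per_min * 60 * 24


def average_concurrent_viewers(watch_hours: float = WATCH_HOURS_PER_DAY) -> float:
    """One viewer-hour occupies one viewer for one hour, so dividing the
    daily watch hours by 24 gives the average number of simultaneous viewers."""
    return watch_hours / 24


def peak_to_average_ratio() -> float:
    """How much higher the stated peak is than the daily average."""
    return PEAK_CONCURRENT_VIEWERS / average_concurrent_viewers()


def segment_requests_per_day(segment_seconds: float,
                             watch_hours: float = WATCH_HOURS_PER_DAY) -> float:
    """Video segment fetches per day for a given segment duration (assumed)."""
    return watch_hours * 3600 / segment_seconds


if __name__ == "__main__":
    print(uploaded_hours_per_day())        # hours uploaded per day
    print(average_concurrent_viewers())    # average simultaneous viewers
    print(peak_to_average_ratio())         # peak vs. average concurrency
    print(segment_requests_per_day(4))     # segment fetches/day at 4 s segments
```

Under these assumptions the stated peak of 100M viewers is about 2.4 times the daily average, and segment requests run to hundreds of billions per day, which is why nearly all of them have to be answered from CDN cache.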
Questions to Ask
A YouTube-style on-demand platform has completely different requirements from a Twitch-style live streaming service. On-demand allows pre-transcoding and heavy CDN caching. Live streaming requires real-time encoding with sub-5-second latency. The questions below determine whether you are building a pipeline or a real-time system, or both.
Content Type
- On-demand (pre-recorded) or live streaming?
- Short-form (TikTok, 15s-3min) or long-form (Netflix, 2hr)?
- User-generated or professionally produced?
- Support for 4K? HDR? Spatial audio?
Client & Network
- Devices: web, mobile, smart TV, game console?
- Adaptive bitrate required? (varying network conditions)
- Offline download support?
- DRM required? (content protection)
Platform Features
- Recommendations engine needed?
- Content moderation (automated + manual)?
- Monetization: ads, subscriptions, or hybrid?
- Analytics: view counts, watch time, engagement?
Adaptive bitrate streaming is non-negotiable for any production video platform. Users on 5G get 4K. Users on 3G get 360p. The player switches mid-stream as bandwidth fluctuates. Without ABR, you either waste bandwidth (serving 4K to a phone on a slow network, which causes constant buffering) or waste quality (serving 360p to everyone). ABR is the single most impactful feature for viewer experience.
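A minimal sketch of throughput-based ABR selection follows. It assumes a hypothetical six-step ladder; only the 1080p figure of 4.5 Mbps comes from this case study, and the other bitrates and the 0.8 safety factor are illustrative. Real players also weigh buffer level and throughput history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    bitrate_kbps: int


# Hypothetical ladder; bitrates other than 1080p are illustrative assumptions.
LADDER = [
    Rendition("240p", 400),
    Rendition("360p", 800),
    Rendition("480p", 1400),
    Rendition("720p", 2800),
    Rendition("1080p", 4500),
    Rendition("4K", 16000),
]


def select_rendition(measured_kbps: float,
                     ladder: list[Rendition] = LADDER,
                     safety: float = 0.8) -> Rendition:
    """Pick the highest rendition whose bitrate fits within a safety
    margin of the measured bandwidth; fall back to the lowest one."""
    budget = measured_kbps * safety
    candidates = [r for r in ladder if r.bitrate_kbps <= budget]
    if not candidates:
        return min(ladder, key=lambda r: r.bitrate_kbps)
    return max(candidates, key=lambda r: r.bitrate_kbps)


if __name__ == "__main__":
    for kbps in (100, 1200, 6000, 25000):
        print(kbps, "->", select_rendition(kbps).name)
```

The player would call something like `select_rendition` before each segment request, so quality can change at every segment boundary.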
For This Case Study, Our Answers Are:
- Content type: on-demand, user-generated (YouTube-style)
- Max video length: up to 4 hours
- Formats: web, mobile, smart TV
- Adaptive bitrate: yes, 6 quality levels (240p to 4K)
- Live streaming: no (out of scope for this design)
- DRM: no (public UGC content, no paid licensed content)
- Offline download: no
- Transcoding target: <10 minutes upload-to-playable for videos under 30 min
- Storage: growing at ~1 PB/day (raw + all renditions)
- Moderation: automated scanning before publish (out of scope for this design)
- On-demand vs live: fundamentally different pipelines (batch vs real-time).
- Adaptive bitrate: non-negotiable. Player selects quality based on current bandwidth.
- DRM: required for licensed content (Widevine, FairPlay, PlayReady).
- Content moderation: must scan before publishing (copyright, safety).
- Short-form vs long-form changes transcoding priority and caching strategy.
Naive Design
The simplest design: a single server accepts video uploads, transcodes them synchronously using FFmpeg, stores the output files on local disk, and serves them directly via HTTP. The user uploads a 1GB file, waits 30 minutes for transcoding, and gets a single-resolution MP4. Playback is a direct file download: no adaptive bitrate, no CDN, no chunking. Works for a personal project. Collapses immediately under real load.
What Works
- Simple: FFmpeg CLI, single server, no infrastructure
- Works for personal/small projects (<100 videos)
- No external dependencies
- Easy to debug: everything on one machine
What Breaks
- Sync transcoding blocks threads: 1 upload = 1 CPU for 30 min
- Single resolution: 4K to mobile = buffer hell, 360p to TV = unwatchable
- No CDN: every viewer streams from the origin server, so its bandwidth is exhausted
- No chunking: must download entire file before seeking
- Local disk: server dies = all videos lost
- Monolith: upload, transcode, serve all on one server. Blocks on transcode.
- Single resolution: terrible experience on diverse devices and networks.
- No CDN: origin serves all traffic. 1 viral video = server down.
- No chunking: no seeking, no adaptive bitrate, no resume on network change.
Refined Design
The refined design splits into two independent systems. The upload pipeline accepts raw video, stores it in object storage, queues a transcoding job, and a fleet of workers produces multiple resolutions and codecs. The playback system serves HLS/DASH manifests and video segments from global CDN edge servers. The origin storage is rarely hit: 95%+ of playback traffic is served from CDN cache. These two systems share nothing except the storage layer.
Upload Pipeline
- Chunked upload: client sends video in 5MB chunks (resumable)
- Raw storage: S3 stores original file immediately
- Transcode queue: job dispatched to worker fleet
- Workers: produce 6 resolutions × 2 codecs = 12 renditions
- Output: HLS/DASH segments (2-10s each) + manifest file
- Metadata update: video status → "ready" in database
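The fan-out from one upload to 12 rendition jobs can be sketched as below. The codec pair (H.264 and VP9), the job dictionary shape, and the output prefix layout are assumptions for illustration; the case study only states "6 resolutions × 2 codecs".

```python
from itertools import product

RESOLUTIONS = ["240p", "360p", "480p", "720p", "1080p", "4K"]
CODECS = ["h264", "vp9"]  # assumed pair; the design does not name the codecs


def build_transcode_jobs(video_id: str) -> list[dict]:
    """Create one independent job per (resolution, codec) pair so that
    workers can process renditions in parallel."""
    return [
        {
            "video_id": video_id,
            "resolution": resolution,
            "codec": codec,
            # Hypothetical object-storage prefix for the rendition's segments.
            "output_prefix": f"{video_id}/{codec}/{resolution}/",
        }
        for resolution, codec in product(RESOLUTIONS, CODECS)
    ]


if __name__ == "__main__":
    jobs = build_transcode_jobs("abc123")
    print(len(jobs), "jobs; first:", jobs[0])
```

Keeping each rendition as its own job means a failed 4K encode can be retried without redoing the other eleven.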
Playback Delivery
- Manifest request: player gets resolution list + segment URLs
- Segment fetch: player requests 2-10s video chunks from CDN
- ABR logic: player measures bandwidth, selects resolution per segment
- CDN cache: 95%+ hit rate, so the origin is rarely touched
- DRM (only when content is protected; not used in this design): license server issues decryption keys per session
- Result: playback starts in <2s, quality adapts seamlessly
HLS/DASH segment-based streaming is the key architectural decision. Instead of serving a single giant file, the video is split into 2-10 second segments at each quality level. The player requests segments sequentially, and can switch quality level at any segment boundary. This enables adaptive bitrate, seeking without downloading everything, and perfect CDN cacheability (each segment is an independent, cacheable object).
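To make the manifest side concrete, here is a rough sketch that renders an HLS master playlist listing one variant per quality level. `#EXTM3U` and `#EXT-X-STREAM-INF` with `BANDWIDTH` and `RESOLUTION` are standard HLS tags; the variant values and URIs are made up, and a production playlist would normally also carry a `CODECS` attribute.

```python
def master_playlist(variants: list[dict]) -> str:
    """Render a minimal HLS master playlist.

    Each variant dict needs: bandwidth_bps, width, height, uri.
    """
    lines = ["#EXTM3U"]
    for v in variants:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={v['bandwidth_bps']},"
            f"RESOLUTION={v['width']}x{v['height']}"
        )
        lines.append(v["uri"])  # media playlist for this quality level
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative variants; bitrates and paths are assumptions.
    print(master_playlist([
        {"bandwidth_bps": 800_000, "width": 640, "height": 360,
         "uri": "360p/index.m3u8"},
        {"bandwidth_bps": 4_500_000, "width": 1920, "height": 1080,
         "uri": "1080p/index.m3u8"},
    ]))
```

The player reads this file once, then fetches the media playlist and segments for whichever variant its ABR logic picks.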
- Upload pipeline: chunked upload → S3 → job queue → transcode fleet → segments to S3.
- Playback: manifest + segments served from CDN edge. 95%+ cache hit rate.
- HLS/DASH segments enable ABR, seeking, and CDN cacheability.
- Each video produces 12+ renditions: 6 resolutions × 2 codecs. Stored as thousands of small segments.
- Upload and playback systems share only the storage layer, so they scale independently.
Alternative Approaches
The biggest trade-offs in video systems center on when to transcode (ahead of time vs on demand) and how to deliver (full CDN vs origin-pull vs peer-to-peer). Each approach trades off between storage cost, compute cost, latency, and viewer experience.
Eager Transcoding
- Transcode every video into all resolutions at upload time
- Playback is instant once processing finishes: all renditions are pre-generated
- High storage cost: 12+ renditions per video, most never watched
- Upload-to-playable latency: 10-30 minutes
- Wasted compute: 98% of YouTube videos have <100 views
- Used by: YouTube (for popular content), Netflix
Lazy Transcoding
- Store raw video. Transcode when the first viewer requests a resolution
- Save storage: only produce renditions that are actually requested
- First viewer waits for transcode (seconds to minutes)
- Cache transcoded segments, so subsequent viewers get the cached version
- Complex: must handle concurrent first-view requests gracefully
- Used by: Cloudflare Stream, some UGC platforms
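One way to handle concurrent first-view requests in lazy transcoding is a "single-flight" cache: the first request for a rendition runs the transcode and every other request for the same key waits for that result. The class below is an in-process sketch. `LazyRenditionCache` and the `transcode` callable are hypothetical names, and a real system would coordinate across machines rather than threads.

```python
import threading


class LazyRenditionCache:
    """Transcode on first request; concurrent requests for the same key
    wait for the in-flight transcode instead of starting another one."""

    def __init__(self, transcode):
        self._transcode = transcode   # hypothetical callable: key -> result
        self._lock = threading.Lock()
        self._ready = {}              # key -> finished result
        self._in_flight = {}          # key -> threading.Event

    def get(self, key):
        with self._lock:
            if key in self._ready:
                return self._ready[key]
            event = self._in_flight.get(key)
            leader = event is None
            if leader:
                event = threading.Event()
                self._in_flight[key] = event
        if leader:
            try:
                result = self._transcode(key)
                with self._lock:
                    self._ready[key] = result
            finally:
                # Wake waiters whether the transcode succeeded or failed.
                with self._lock:
                    self._in_flight.pop(key, None)
                event.set()
            return result
        event.wait()
        with self._lock:
            if key in self._ready:
                return self._ready[key]
        # The leader failed; retry, possibly becoming the new leader.
        return self.get(key)
```

With this shape, a burst of first views for one rendition costs one transcode rather than one per viewer.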
HLS
- HTTP Live Streaming: developed by Apple and published as informational RFC 8216
- M3U8 playlist + .ts segments (MPEG-TS containers); newer versions also allow fMP4 segments
- Native support on iOS/Safari, no plugin needed
- Segment duration drives latency: with 2-10 second segments, typical end-to-end latency is roughly 6-30 seconds
- Dominant for mobile and Apple ecosystem
- Used by: YouTube, Twitch, most streaming services
DASH
- Dynamic Adaptive Streaming over HTTP: open standard (ISO/IEC 23009-1)
- MPD manifest + .mp4 segments (fMP4 containers)
- Codec-agnostic: supports any codec that can be carried in MP4
- Lower latency achievable with CMAF (1-2s segments)
- Better DRM integration (Common Encryption)
- Used by: Netflix, Disney+, most premium services
CMAF (Common Media Application Format) solves the dual-protocol storage problem. Traditional HLS uses MPEG-TS (.ts) containers and DASH uses fragmented MP4 (fMP4) containers, which requires two full sets of segments per video. CMAF standardizes on fragmented MP4 segments (commonly stored as .m4s files) that both HLS and DASH can reference from their respective manifests. The video data is stored once; only the manifest differs between protocols. For a platform serving both Apple and non-Apple devices, CMAF approximately halves segment storage costs.
Most production platforms use both HLS and DASH: HLS for Apple devices (mandatory) and DASH for everything else. With CMAF (Common Media Application Format), the actual video segments can be identical between HLS and DASH; only the manifest format differs. This removes the duplicate set of segments that dual-protocol delivery would otherwise have to store.
- Eager transcoding: instant playback, high storage + compute cost. Good for popular content.
- Lazy transcoding: transcode on first view, save storage. First viewer pays latency penalty.
- HLS: Apple's protocol, dominant on mobile. M3U8 + TS segments.
- DASH: open standard, codec-agnostic, better DRM. MPD + fMP4 segments.
- CMAF: shared segment format for both HLS and DASH. Halves storage duplication.
What Real Companies Did
YouTube, Netflix, Twitch, and TikTok represent four fundamentally different video architectures: UGC on-demand, premium on-demand, live streaming, and short-form mobile video. Each made different trade-offs on transcoding, CDN strategy, and content delivery based on their content model and business constraints.
YouTube
- 500+ hours uploaded/minute, 1B hours watched/day
- Custom transcoding pipeline: "Borg" scheduled workers
- VP9/AV1 codecs: royalty-free, with roughly 30% better compression than H.264
- Hybrid CDN: Google's private network + ISP caches (Google Global Cache)
- Tiered transcoding: popular videos get more resolutions over time
Netflix
- 15,000+ titles, 230M+ subscribers, 100M+ concurrent streams
- Per-title encoding: each title gets custom bitrate ladder
- Open Connect: custom CDN appliances in 1000+ ISP locations
- Pre-positioned content: popular titles pushed to edge before demand
- Published VMAF quality metric for automated quality scoring
Twitch
- Live streaming: <5s glass-to-glass latency requirement
- Real-time transcoding: encode while streaming (no pre-transcode)
- Multi-quality live: transcode to 3-4 resolutions in real-time
- Chat + video sync: timestamp alignment between streams
- Partners get transcoding; small streamers get source-only
TikTok
- Short-form: 15s-10min videos, optimized for mobile-first
- Aggressive transcoding: videos ready in seconds (short = fast)
- ByteDance CDN: own global edge network (like Netflix Open Connect)
- Preload: prefetch next 2-3 videos while current is playing
- Codec: H.265/HEVC for mobile bandwidth efficiency
- YouTube: tiered transcoding, VP9/AV1 codecs, Google Global Cache at ISPs.
- Netflix: per-title encoding (custom bitrate per video), Open Connect CDN in ISPs.
- Twitch: real-time transcoding during live stream, <5s latency.
- TikTok: aggressive fast transcode for short-form, video preloading for seamless scroll.
Best Practices Extracted
Video streaming teaches patterns that transfer to any system dealing with large files, compute-heavy processing pipelines, and global low-latency delivery. The CDN strategy alone applies to any content-heavy application, from image-heavy e-commerce to software distribution.
Chunked Upload
- Split large files into 5-10MB chunks
- Upload each chunk independently (resumable)
- Server assembles after all chunks received
- Network drop? Resume from last successful chunk
- Transfers to: any large file upload (cloud storage, backups)
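The chunking and resume logic can be sketched in a few lines. The helper names and the idea of tracking received chunks by index are illustrative assumptions; the 5 MB default matches the chunk size used in this design.

```python
def chunk_ranges(file_size: int,
                 chunk_size: int = 5 * 1024 * 1024) -> list[tuple[int, int]]:
    """Return (start, end_exclusive) byte ranges that cover the file."""
    return [
        (start, min(start + chunk_size, file_size))
        for start in range(0, file_size, chunk_size)
    ]


def missing_chunks(file_size: int,
                   received: set[int],
                   chunk_size: int = 5 * 1024 * 1024) -> list[int]:
    """Indexes of chunks the client still has to send after a network drop."""
    total = len(chunk_ranges(file_size, chunk_size))
    return [index for index in range(total) if index not in received]


if __name__ == "__main__":
    size = 12 * 1024 * 1024                    # a 12 MB upload
    print(chunk_ranges(size))                  # three ranges, the last one short
    print(missing_chunks(size, received={0, 2}))
```

On reconnect, the client would ask the server which chunk indexes it already holds and resend only the rest.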
Bitrate Ladder
- Pre-define quality levels: 240p, 360p, 480p, 720p, 1080p, 4K
- Each level has target bitrate (e.g., 1080p @ 4.5Mbps)
- Per-title optimization: simple content needs lower bitrate
- Test with VMAF/SSIM quality metrics, not just bitrate
- Transfers to: any adaptive media delivery
CDN Tiering
- Layer 1: CDN edge PoP (closest to user)
- Layer 2: CDN regional hub (aggregates misses)
- Layer 3: Origin storage (S3, only for first-request)
- Popular content stays hot at edge. Long-tail: regional hub.
- Transfers to: any globally distributed content delivery
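The three-layer lookup can be modeled with a toy cache hierarchy. This is a sketch only: `TieredCache`, the PoP names, and the absence of eviction and TTLs are simplifying assumptions.

```python
class TieredCache:
    """Edge PoP -> regional hub -> origin lookup, filling caches on the way back."""

    def __init__(self, origin_fetch):
        self._origin_fetch = origin_fetch  # hypothetical callable: key -> bytes
        self.edges = {}                    # PoP name -> {key: value}
        self.regional = {}                 # shared regional hub cache
        self.origin_fetches = 0

    def get(self, pop: str, key: str):
        """Return (value, layer_that_served_it)."""
        edge = self.edges.setdefault(pop, {})
        if key in edge:
            return edge[key], "edge"
        if key in self.regional:
            edge[key] = self.regional[key]   # warm this PoP for next time
            return edge[key], "regional"
        self.origin_fetches += 1
        value = self._origin_fetch(key)
        self.regional[key] = value
        edge[key] = value
        return value, "origin"


if __name__ == "__main__":
    cdn = TieredCache(lambda key: "bytes-of-" + key)
    print(cdn.get("fra", "v1/seg1"))   # first request goes to origin
    print(cdn.get("fra", "v1/seg1"))   # same PoP: edge hit
    print(cdn.get("ams", "v1/seg1"))   # other PoP: regional hit
    print(cdn.origin_fetches)
```

The regional layer is what keeps a miss at a second PoP from turning into a second origin fetch.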
Per-title encoding is Netflix's most impactful optimization. A cartoon needs far less bitrate than an action movie at the same perceptual quality. Instead of a fixed bitrate ladder for all content, analyze each title and produce a custom ladder. Result: roughly 20% bandwidth savings with no visible quality loss. This principle of adapting processing to content characteristics transfers to image compression, audio encoding, and data serialization.
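The idea can be illustrated with a deliberately simplified sketch. It is not Netflix's method: real per-title encoding runs trial encodes and chooses ladder points from measured quality (for example VMAF). Here a single hypothetical `complexity` score, with 1.0 meaning average content, scales an assumed base ladder.

```python
# Assumed base ladder in kbps; only the 1080p value appears in this case study.
BASE_LADDER_KBPS = {
    "240p": 400,
    "360p": 800,
    "480p": 1400,
    "720p": 2800,
    "1080p": 4500,
    "4K": 16000,
}


def per_title_ladder(complexity: float,
                     base: dict[str, int] = BASE_LADDER_KBPS,
                     floor: float = 0.5,
                     ceiling: float = 1.5) -> dict[str, int]:
    """Scale every rung by a clamped complexity factor.

    complexity < 1.0: simple content (e.g. flat animation) gets fewer bits.
    complexity > 1.0: busy content (e.g. fast action) gets more bits.
    """
    factor = min(max(complexity, floor), ceiling)
    return {name: round(kbps * factor) for name, kbps in base.items()}


if __name__ == "__main__":
    print(per_title_ladder(0.8))   # simpler-than-average title
    print(per_title_ladder(1.3))   # more complex title
```

The clamp keeps a bad complexity estimate from producing an unwatchable or wastefully large ladder.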
- Chunked upload: resumable, fault-tolerant large file transfer.
- Bitrate ladder: pre-defined quality levels. Per-title optimization for bandwidth savings.
- CDN tiering: edge → regional → origin. Minimize origin load.
- CMAF: shared segments for HLS + DASH. Halves storage for dual-protocol support.
- Per-title encoding: analyze content complexity, generate custom bitrate ladder. ~20% bandwidth savings.
What Could Go Wrong
Video failures are among the most frustrating user experiences: buffering during a climactic scene, quality suddenly dropping to pixel soup, or a live stream freezing for thousands of concurrent viewers. These failures have distinct root causes and well-established fixes that separate amateur video platforms from production-grade ones.
Rebuffering Spiral
- Player requests 1080p segment but network can only handle 480p
- Download takes too long → buffer empties → stall
- Player drops to 480p but the buffer is already empty → another stall
- Many viewers leave within 5 seconds of the first buffer event
- Fix: aggressive ABR. Switch quality proactively when bandwidth drops, start playback at lower quality and ramp up, and buffer 3+ segments ahead.
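The spiral is easy to reproduce with a toy buffer model: while a segment downloads, playback drains the buffer in real time, and any shortfall is a stall. The model below assumes constant bandwidth, a fixed rendition, and a startup buffer of 3 segments; all parameter values are illustrative.

```python
def total_stall_seconds(rendition_kbps: float,
                        bandwidth_kbps: float,
                        segment_seconds: float = 4.0,
                        segments: int = 10,
                        startup_segments: int = 3) -> float:
    """Seconds of stalling while playing `segments` segments at one quality.

    Startup segments are downloaded before playback begins, so the time
    spent fetching them counts as startup delay, not as a stall.
    """
    download_time = rendition_kbps * segment_seconds / bandwidth_kbps
    buffer_s = startup_segments * segment_seconds
    stall_s = 0.0
    for _ in range(segments - startup_segments):
        buffer_s -= download_time        # playback drains while we download
        if buffer_s < 0:
            stall_s += -buffer_s         # buffer ran dry: viewer sees a spinner
            buffer_s = 0.0
        buffer_s += segment_seconds      # downloaded segment joins the buffer
    return stall_s


if __name__ == "__main__":
    print(total_stall_seconds(4500, 1500))  # 1080p on a 1.5 Mbps link
    print(total_stall_seconds(1400, 1500))  # 480p on the same link
```

In this model a 1080p stream on a 1.5 Mbps link stalls repeatedly once the startup buffer is spent, while 480p on the same link never stalls.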
Viral Video CDN Miss Storm
- New video goes viral → millions of simultaneous first views
- CDN cache cold → all requests hit origin at once
- Origin overwhelmed → 503 errors → playback fails globally
- Fix: origin shield (single CDN node absorbs all misses), request coalescing (one origin fetch serves queued requests), pre-warm CDN for anticipated viral content.
Transcode Queue Backlog
- Spike in uploads → transcode queue grows faster than it is consumed
- Upload-to-playable time grows from 10min to 2 hours
- Creators frustrated: video "processing" for hours
- Fix: auto-scaling transcode fleet (spot instances for cost), priority queue (paying creators first), fast-path: produce one resolution immediately, rest async.
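The priority queue and fast path can be combined in one structure. The sketch assumes lower numbers mean higher priority, that the fast-path rendition is 360p, and that `TranscodeQueue` is an in-memory stand-in for a real job queue.

```python
import heapq
import itertools


class TranscodeQueue:
    """Priority queue where each video's fast-path rendition jumps ahead."""

    FAST_PATH_PRIORITY = 0

    def __init__(self):
        self._heap = []
        # Monotonic counter keeps FIFO order among jobs of equal priority.
        self._counter = itertools.count()

    def submit(self, video_id: str, resolutions: list[str],
               creator_priority: int = 2, fast_path: str = "360p") -> None:
        """Queue one job per resolution; the fast-path one gets top priority."""
        for resolution in resolutions:
            priority = (self.FAST_PATH_PRIORITY if resolution == fast_path
                        else creator_priority)
            heapq.heappush(
                self._heap,
                (priority, next(self._counter), video_id, resolution),
            )

    def pop(self) -> tuple[str, str]:
        """Return the next (video_id, resolution) a worker should process."""
        _, _, video_id, resolution = heapq.heappop(self._heap)
        return video_id, resolution

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    queue = TranscodeQueue()
    queue.submit("free-user-video", ["1080p", "360p"], creator_priority=2)
    queue.submit("paying-creator-video", ["1080p", "360p"], creator_priority=1)
    while len(queue):
        print(queue.pop())
```

Every video becomes playable at one resolution first, and the remaining renditions are processed in creator-priority order as workers free up.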
DRM Key Delivery Failure
- Player needs decryption key before playback can start
- License server overloaded → key delivery times out
- Playback blocked on DRM even though video segments are cached
- Fix: cache DRM licenses on CDN edge (short TTL), pre-fetch license during manifest load, graceful degradation to lower DRM tier.
The first 5 seconds determine whether a viewer stays or leaves. Some industry studies report that about 25% of viewers abandon after one buffering event, and that startup delay beyond 2 seconds causes significant drop-off. Optimizing time-to-first-frame (start at low quality, upgrade quickly) and preventing the first buffer stall are more important than peak quality. A smooth 720p experience beats a stuttering 4K one every time.
- Rebuffering: aggressive ABR, start low + ramp up, 3+ segment buffer ahead.
- CDN miss storm: origin shield + request coalescing + pre-warm for viral content.
- Transcode backlog: auto-scale workers, priority queues, fast-path single resolution.
- DRM failure: cache licenses at edge, pre-fetch during manifest load.
- Principle: smooth playback at any quality beats stuttering at high quality.