
Case Study: URL Shortener

Design, trade-offs, and alternatives for a URL shortener at scale.

Chapter One

Problem Statement

What We Are Building

A URL shortener takes a long URL and produces a short alias (e.g., short.ly/abc123) that redirects to the original. Sounds trivial — until you need to handle billions of URLs, serve redirects in under 10ms, never return a broken link, and provide analytics on every click. The simplicity of the interface hides meaningful distributed systems challenges: unique ID generation at scale, read-heavy caching, and geographic latency optimization.

Traffic

100M URLs created/month (~40 writes/sec)
10B redirects/month (~4,000 reads/sec)
Read:Write ratio = 100:1
Peak: 10x average → 40,000 reads/sec

Requirements

Latency SLA: redirect <10ms (p99)
Availability: 99.99% (52 min downtime/year)
Storage: 5 years retention → 6B URLs
Short URL length: 7 characters (Base62 = 3.5T combinations)
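The traffic and keyspace figures above follow from simple arithmetic. A quick check in Python, assuming a 30-day month (the variable names are illustrative only):

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month

writes_per_sec = 100_000_000 / SECONDS_PER_MONTH        # ~39, rounded to ~40 in the text
reads_per_sec = 10_000_000_000 / SECONDS_PER_MONTH      # ~3,858, rounded to ~4,000
peak_reads_per_sec = 10 * reads_per_sec                 # 10x peak factor

keyspace = 62 ** 7                                      # 7-character Base62 codes
urls_in_5_years = 100_000_000 * 12 * 5                  # 6 billion
keyspace_headroom = keyspace / urls_in_5_years          # how many codes exist per stored URL

downtime_min_per_year = (1 - 0.9999) * 365 * 24 * 60    # 99.99% availability budget

print(f"{writes_per_sec:.0f} writes/s, {reads_per_sec:.0f} reads/s")
print(f"keyspace {keyspace:,} ({keyspace_headroom:.0f}x the 5-year URL count)")
print(f"{downtime_min_per_year:.1f} minutes of downtime per year")
```

The keyspace is several hundred times larger than the five-year URL count, which is why 7 characters is comfortable for counter-based IDs.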

Authentication requirement: is URL creation anonymous or does it require a registered account? Anonymous creation is simpler but exposes the system to spam and abuse at scale. Authenticated creation enables per-user analytics, rate limiting by user identity, and custom alias namespace management. This choice affects whether the system needs an auth service, a user database, and session management — significant additional infrastructure.

Storage & Traffic Estimation

URL Records

100M new URLs/month × 12 months × 5 years = 6B URLs
Per URL: short code (7 bytes) + long URL (avg 100 bytes) + timestamp (8 bytes) + metadata (user_id, expiry, click count: ~50 bytes) = ~165 bytes
Total: 6B × 165 bytes = ~1TB for URL records

Analytics Events

10B clicks/month × 60 months = 600B click events
Per click event: ~50 bytes (timestamp, geo, referrer, device)
Total: 600B × 50 bytes = ~30TB for analytics
Requires separate store (ClickHouse, Redshift) — not the operational database
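The storage estimates can be reproduced the same way. This sketch uses the per-record sizes stated above and decimal terabytes (1 TB = 10^12 bytes); the names are illustrative:

```python
# Per-record size: short code + long URL + timestamp + metadata
URL_RECORD_BYTES = 7 + 100 + 8 + 50          # 165 bytes
CLICK_EVENT_BYTES = 50

total_urls = 100_000_000 * 12 * 5            # 6 billion URLs over 5 years
total_clicks = 10_000_000_000 * 60           # 600 billion clicks over 60 months

url_storage_tb = total_urls * URL_RECORD_BYTES / 1e12
analytics_tb = total_clicks * CLICK_EVENT_BYTES / 1e12

print(f"URL records: {url_storage_tb:.2f} TB, analytics: {analytics_tb:.0f} TB")
```

These are raw payload sizes; indexes, replication, and storage-engine overhead would add to them.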

This is a read-dominated system. 100:1 read-to-write ratio means caching strategy is the most important design decision. If you optimize writes but ignore reads, you have designed for 1% of the traffic.

📋 Chapter 1 — Summary

URL shortener: simple interface, real distributed systems challenges at scale.
100:1 read:write ratio — caching dominates the design.
6B URLs over 5 years. 7-char Base62 gives 3.5T combinations — no collision pressure.
Redirect latency target: <10ms p99. Availability: 99.99%.
Storage: 6B URLs × 165 bytes = ~1TB for URL records. Analytics events: separate store needed — 30TB over 5 years.

Chapter Two

Questions to Ask

Clarifying Before Designing

A senior engineer does not jump to architecture diagrams. They ask questions that reveal hidden requirements, constraints, and trade-offs. Each question below changes the design in meaningful ways — get them wrong, and you build the wrong system.

📊

Traffic Pattern

What is the read:write ratio?
Are there traffic spikes (viral links)?
Geographic distribution of users?
Is there a burst pattern (marketing campaigns)?

🔗

URL Behavior

Custom aliases allowed? (branding)
Do URLs expire? (TTL or forever?)
Same long URL → same short URL? (deduplication)
Maximum URL length accepted?

📈

Analytics & Features

Click analytics required? (who, when, where)
Real-time or batch analytics?
Rate limiting per user?
Abuse detection (phishing URLs)?

🛡️

Security & Access Control

Anonymous creation or registered users only?
What is the maximum creation rate per IP or user?
Is URL scanning (phishing, malware) required at creation time?
Should creators be able to delete or update their URLs?
Is there a dashboard for creators to see their URL analytics?
Are there geographic restrictions on redirect destinations?

The deduplication question changes the entire ID generation strategy. If the same long URL must always map to the same short URL, every write needs a lookup first (expensive at scale). If each creation generates a new short URL regardless, the write path is much simpler: just generate a unique ID.

SLA degradation behavior: what is the acceptable behavior when the system is degraded? Should it serve stale cached redirects rather than returning errors, or fail fast with 503? This determines whether the system should be designed for graceful degradation (serve from cache even if the database is down) or strict consistency (refuse to redirect if mapping cannot be verified). Most URL shorteners choose graceful degradation — a stale redirect is better than no redirect.

For This Case Study, Our Answers Are:

Read:write ratio: 100:1 (read-dominated — caching is the primary design concern)
Custom aliases: yes — must check uniqueness on write
URL expiry: tiered — anonymous: 30 days, registered: 1 year, paid: permanent
Deduplication: no — each creation generates a new short URL (simplifies write path)
Analytics: yes — async, eventual consistency acceptable, separate store
Geographic distribution: global users — CDN required, DB origin in one region initially
Degradation behavior: graceful — serve stale cache rather than error
Creation rate limit: yes — 10 URLs/minute per IP (anonymous), higher for authenticated
Abuse scanning: yes (scan at creation, falling back to an async scan when the scanner is rate-limited), interstitial for flagged URLs
Redirect type: 301 with 24h max-age Cache-Control (not indefinite)

📋 Chapter 2 — Summary

Ask about read:write ratio — it determines caching strategy.
Custom aliases add uniqueness checking complexity.
Expiry/TTL changes storage planning and cache invalidation.
Deduplication (same URL → same short) forces lookup-on-write.
Analytics can be async — don't block the redirect path.
Security questions: anonymous vs authenticated creation, scanning requirements, creator management, geographic restrictions.
Degradation behavior: graceful (serve stale cache) vs strict (fail fast). Most URL shorteners choose graceful.

Chapter Three

Naive Design

Single Server + Auto-Increment IDs

The simplest design: a single web server with a relational database. Auto-increment ID, Base62 encode it, store the mapping. On redirect, look up the short code in the database and return a 302 (temporary) redirect — the default in most frameworks. This works perfectly at low scale — and breaks in predictable ways as traffic grows. (Notice the redirect type: we will revisit this choice in Chapter 4, where switching to 301 unlocks CDN caching and eliminates 70%+ of server load.)
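The "auto-increment ID, Base62 encode it" step can be sketched as follows. The alphabet ordering (digits, then lowercase, then uppercase) is an assumption; any fixed ordering of the 62 symbols works as long as encoding and decoding agree.

```python
import string

# Assumed alphabet order: 0-9, a-z, A-Z (62 symbols)
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(n: int) -> str:
    """Encode a non-negative integer (e.g. an auto-increment ID) as Base62."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def base62_decode(code: str) -> int:
    """Inverse of base62_encode."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Because consecutive IDs produce consecutive codes, this encoding by itself does nothing to hide the sequence, which is the enumeration risk discussed in this chapter.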

Naive Design — Single Server + SQL Database

✅

What Works

Simple to implement and reason about
Auto-increment guarantees uniqueness
Transactional consistency (SQL)
Great for MVP / <1M URLs

💥

What Breaks

Single server = no fault tolerance
Every redirect = DB query (no caching)
Auto-increment = predictable IDs (security risk) — an attacker can increment the ID to enumerate every URL in the system, exposing private or unlisted links
Single DB = write bottleneck at 10K+ writes/sec
No geographic distribution = high latency globally
A 302 redirect is not cached by default, so every request comes back to the server: at 4,000 requests/second, essentially nothing is served from the CDN or browser cache. Switching to 301 in the refined design offloads 70%+ of traffic to the CDN without any other changes, the largest single-step reduction in server load in the entire system.

📋 Chapter 3 — Summary

Naive: single server, PostgreSQL, auto-increment → Base62.
Works for MVP. Fails at scale on reads (no cache), writes (single DB), and availability (SPOF).
Predictable IDs are a security risk — users can enumerate URLs.
Every redirect hitting the database is the #1 problem to solve.

Chapter Four

Refined Design

A System That Actually Scales

The refined design addresses every failure mode: a distributed ID generator avoids single-point bottlenecks, a multi-layer cache handles the 100:1 read ratio, database sharding distributes writes, and CDN edge caching brings redirects close to users. The redirect path should almost never hit the database in steady state.

Refined Design — Distributed URL Shortener

📖

Read Path (Redirect)

Layer 1: CDN caches 301 redirects at edge (~70% hit rate)
Layer 2: Redis cache for hot URLs (~25% of total traffic, about five of every six CDN misses)
Layer 3: Database lookup (only ~5% of total traffic)
Result: 95%+ redirects never hit the database
Use 301 (permanent) for CDN cacheability
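The origin side of this read path (what runs after a CDN miss) is a cache-aside lookup. A minimal sketch, using plain dictionaries as stand-ins for Redis and the sharded database; `resolve` and both stores are hypothetical names, not part of any real client library:

```python
from typing import Optional

# Stand-ins for the real stores (assumptions for illustration)
redis_cache: dict[str, str] = {}
database: dict[str, str] = {"abc1234": "https://example.com/long/path"}


def resolve(short_code: str) -> Optional[str]:
    """Return the long URL for a short code, or None if it does not exist."""
    long_url = redis_cache.get(short_code)       # Layer 2: cache
    if long_url is not None:
        return long_url
    long_url = database.get(short_code)          # Layer 3: database
    if long_url is not None:
        redis_cache[short_code] = long_url       # populate cache for next time
    return long_url
```

A real implementation would also set a TTL on the cache entry and might cache negative results briefly so that repeated lookups of nonexistent codes do not reach the database.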

✍️

Write Path (Create)

Generate unique ID (Snowflake or pre-allocated ranges)
Base62 encode the ID → short code
Write to sharded database (shard by hash of the short code; sharding by first character would concentrate counter-based codes on one shard)
Warm Redis cache with new mapping
Return short URL to client

Cache warming on write: when a new short URL is created, immediately write it to Redis with a default TTL (24 hours in this design); URLs that prove hot are later promoted to no TTL based on their access pattern. Without warming, the first redirect to a new URL is a cold miss: a Redis lookup that fails, then a database read. Pre-warming removes this first-request latency spike, which matters for newly created URLs that are shared immediately (social media posts, email campaigns). It costs one additional Redis write per URL creation, negligible compared to the benefit.
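The write path, including the cache-warming step, might look roughly like this. Everything here is a stand-in: `_next_id` replaces a Snowflake or range-based generator, and the two dictionaries replace the sharded database and Redis.

```python
import itertools
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
DEFAULT_TTL_SECONDS = 24 * 3600           # 24-hour TTL for newly created URLs

_next_id = itertools.count(1_000_000)     # stand-in for a distributed ID generator
database: dict[str, str] = {}             # stand-in for the sharded database
cache: dict[str, tuple[str, int]] = {}    # stand-in for Redis: code -> (url, ttl)


def base62(n: int) -> str:
    """Base62-encode a non-negative integer."""
    out = ""
    while True:
        n, rem = divmod(n, 62)
        out = ALPHABET[rem] + out
        if n == 0:
            return out


def create_short_url(long_url: str) -> str:
    """Generate an ID, persist the mapping, then warm the cache."""
    code = base62(next(_next_id))
    database[code] = long_url                        # durable write first
    cache[code] = (long_url, DEFAULT_TTL_SECONDS)    # warm cache so the first redirect hits
    return code
```

Writing to the database before the cache means a failed database write never leaves a cached mapping that does not exist durably.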

301 vs 302 Redirect — Impact on Server Load

The key insight: 301 vs 302 redirect. A 301 (permanent) redirect lets browsers and CDNs cache the mapping indefinitely — reducing your server load by 70%+. A 302 (temporary) forces every request back to your server — useful only if you need to change the destination later or track every click. Most URL shorteners use 301 + async analytics.

The 301 cache invalidation problem: once a browser or CDN caches a 301 redirect, you cannot reliably update the destination. If a URL owner wants to change where their short URL points, or if a URL is flagged as malicious and needs to be deactivated, CDNs and browsers may continue serving the cached 301 for days or indefinitely. Mitigation: set a Cache-Control max-age on 301 responses (e.g., max-age=86400 for 24 hours) rather than allowing indefinite caching. This balances caching benefit with the ability to update or deactivate URLs within a reasonable timeframe. Twitter uses 302 specifically to retain this control — every click comes back to their servers.
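The mitigation amounts to one response header. A framework-neutral sketch that returns the status code and headers for a redirect; the function name and the `public` directive are assumptions, and a real handler would use its web framework's response object:

```python
def redirect_response(long_url: str, max_age_seconds: int = 86_400) -> tuple[int, dict[str, str]]:
    """Build a 301 redirect that caches may keep for at most max_age_seconds."""
    headers = {
        "Location": long_url,
        # Bounded lifetime: caches must re-fetch after 24 hours by default
        "Cache-Control": f"public, max-age={max_age_seconds}",
    }
    return 301, headers
```

With this header, a deactivated or updated short URL stops being served from well-behaved caches within a day rather than indefinitely.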

Geographic database placement: CDN handles read latency globally (users get served from the nearest PoP). But on a cache miss, the request falls through to the origin database. If all database shards are in one region (us-east-1), a user in Tokyo who misses the CDN cache still experiences 150-200ms of round-trip latency to the database before receiving their redirect. At 5% cache miss rate with 4,000 reads/second, that is 200 requests/second hitting the origin from potentially far-away locations. Solutions: read replicas in multiple regions (replication lag is acceptable since URL mappings rarely change), or a globally distributed database (PlanetScale, CockroachDB). Most URL shorteners at moderate scale accept the origin latency for cache misses and invest in CDN cache hit rate rather than multi-region database replication.

📋 Chapter 4 — Summary

CDN edge caching (301) handles 70%+ of redirects without hitting origin.
Redis handles hot URLs for the remaining cache misses.
Only ~5% of reads reach the database — acceptable load.
Snowflake IDs or pre-allocated ranges avoid single-point ID generation bottleneck.
Database sharded by short code hash for write distribution.
Analytics decoupled via Kafka — never blocks the redirect path.
Cache warming on write: pre-populate Redis on creation to eliminate first-redirect cold miss.
301 invalidation problem: set max-age (e.g. 86400) not indefinite. Cannot reliably update cached 301s at CDN or browser level.

Chapter Five

Alternative Approaches

Two Valid Paths to Short IDs

The central design decision in a URL shortener is how to generate the short code. Two dominant approaches exist, each with distinct trade-offs. Neither is universally better — the choice depends on whether you need deduplication, predictability, and how you handle collisions.

Base62 Counter-Based

Generate unique counter → Base62 encode
Guaranteed unique: no collisions ever
Sequential → predictable (enumerate URLs)
Counter is a distributed systems challenge
No deduplication: same URL → different short codes
Used by: Bitly (with Snowflake IDs)

Hash-Based (MD5/SHA)

Hash the long URL → take first 7 chars of Base62(hash)
Same input → same output (natural deduplication)
Collision risk: must check DB + retry on conflict
No sequential pattern → unpredictable (secure)
Write path slower (hash + check + possible retry)
Used by: TinyURL (with collision handling)
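The hash-based flow (hash, check, retry on conflict) can be sketched as below. Two choices here are assumptions rather than anything the services above are known to do: the hash is reduced modulo 62^7 instead of truncating the encoded string (this keeps the 7 characters uniformly distributed), and a retry counter is mixed into the hash input on collision. The `store` dictionary stands in for the database.

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
CODE_LENGTH = 7
store: dict[str, str] = {}   # stand-in for the database: code -> long URL


def hash_code(long_url: str, attempt: int = 0) -> str:
    """Derive a 7-character Base62 code from the URL and a retry counter."""
    digest = hashlib.sha256(f"{long_url}#{attempt}".encode()).digest()
    n = int.from_bytes(digest[:8], "big") % (62 ** CODE_LENGTH)
    chars = []
    for _ in range(CODE_LENGTH):          # fixed width, zero-padded
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def shorten(long_url: str, max_attempts: int = 5) -> str:
    """Return a code for long_url, reusing an existing one when possible."""
    for attempt in range(max_attempts):
        code = hash_code(long_url, attempt)
        existing = store.get(code)
        if existing is None:
            store[code] = long_url
            return code
        if existing == long_url:
            return code                   # natural deduplication
        # A different URL owns this code: collision, try the next attempt
    raise RuntimeError("no free code found after retries")
```

The check-then-write in `shorten` is not atomic; a real database would need a unique constraint or conditional insert so that two concurrent writers cannot claim the same code.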

ID Generation Strategies — Trade-off Comparison

Pre-Allocated Ranges

Central service allocates ID ranges to app servers
Each server generates IDs within its range locally
No coordination needed for individual writes
Range exhaustion: request new range (rare)
Gap in IDs if server crashes mid-range (acceptable)
Used by: Twitter Snowflake variant

Random Generation + Bloom Filter

Generate random 7-char string
Check Bloom filter for probable collision
If no collision → write to DB
If collision → regenerate (probabilistically rare)
Bloom filter: ~10.8 GB for 6B entries at 0.1% FP rate
Formula: m = −(n·ln p) / (ln 2)², k = (m/n)·ln 2 → for n=6B, p=0.001: m ≈ 86B bits (~10.8 GB), k ≈ 10 hash functions
Tradeoff: space for speed, false positives = unnecessary retries
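Evaluating the Bloom filter sizing formula for n = 6B and p = 0.001 gives roughly 86 billion bits, about 10.8 GB, with 10 hash functions. A short check (the function name is illustrative):

```python
import math


def bloom_parameters(n: int, p: float) -> tuple[int, int]:
    """Optimal bit count m and hash count k for n items at false-positive rate p."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = round((m / n) * math.log(2))
    return m, k


m_bits, k = bloom_parameters(6_000_000_000, 0.001)
size_gb = m_bits / 8 / 1e9   # bits -> bytes -> decimal gigabytes

print(f"m = {m_bits:,} bits ({size_gb:.1f} GB), k = {k}")
```

At about 14.4 bits per entry, the filter is too large for a small cache node but still fits in memory on a single large instance.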

Counter-based is simpler and collision-free. Hash-based gives natural deduplication. In practice, most production URL shorteners use counter-based (Snowflake/range allocation) because collision handling adds write-path complexity that isn't worth it at scale — especially when deduplication isn't a hard requirement.

Recommendation for new systems: use a Snowflake-style distributed ID generator (Twitter Snowflake, Instagram's ID generator, or ULID) with Base62 encoding. Note that a full 64-bit ID encodes to 11 Base62 characters, so the 7-character target (62^7 ≈ 3.5T, just under 42 bits) requires a narrower ID layout or range-allocated counters. Implement hash-based generation only if deduplication is a stated product requirement from the business, not as a default choice. Hash-based deduplication adds write-path complexity and collision retry logic that is not worth the cost unless users explicitly expect the same URL to always produce the same short code.

📋 Chapter 5 — Summary

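Counter-based: guaranteed unique, simple, sequential (predictable). Snowflake-style IDs are time-ordered rather than strictly sequential, which makes them harder (though not impossible) to guess; they are not random.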
Hash-based: natural dedup, collision risk on write, unpredictable.
Pre-allocated ranges: distributed counters without coordination. Best for high write throughput.
Production choice: counter-based (Snowflake) with Base62 encoding dominates.
Recommendation: Snowflake or ULID for new systems. Hash-based only if deduplication is a stated product requirement.

Chapter Six

What Real Companies Did

Production Decisions at Scale

Theory is good. Real-world decisions under real-world constraints are better. Here is how actual URL shortening services solved the key challenges — each making different trade-offs based on their specific requirements.

🔗

Bitly

Processes 10B+ clicks/month
Snowflake-like ID generation (counter-based)
Heavy CDN caching with 301 redirects
Real-time analytics pipeline (Kafka + Spark)
Multi-region deployment for low-latency redirects

🔗

TinyURL

Founded 2002, one of the first URL shorteners — billions of URLs stored over 22+ years of operation
Hash-based with collision handling
MySQL backend with multiple read replicas
Custom aliases supported (check uniqueness on write)
No expiry by default — URLs live forever
Exact traffic numbers not publicly disclosed but continues to operate at significant scale with a small engineering team — a testament to operational simplicity of a well-designed URL shortener

📸

Instagram (ID generation)

Custom Snowflake: 41-bit timestamp + 13-bit shard + 10-bit sequence
Generates ~1K unique IDs/ms per shard
No coordination between shards
Time-ordered: IDs are roughly sortable by creation time
Published approach as reference architecture
Was generating ~25M new IDs/day at publish time (2012), growing to billions/day by Facebook acquisition. Adopted by dozens of large-scale systems beyond URL shortening.

🐦

Twitter (t.co)

Auto-shortens all URLs in tweets
Snowflake IDs: 64-bit unique across all servers
Used for link wrapping (analytics + safety scanning)
Handles extreme burst traffic (viral tweets)
302 redirect (temporary) — forces every click back through Twitter's servers so they can log analytics, scan for malware, and update link destinations without cache invalidation

Google URL Shortener (goo.gl) was shut down in 2019 after being available since 2009. Google's public reason: the service was increasingly used for spam and phishing, and the cost of abuse prevention outweighed the benefits of operating a free public URL shortener. All existing goo.gl URLs continued to redirect after shutdown but no new URLs could be created. This is instructive: abuse prevention is not an afterthought — it is a core operational cost that can determine whether a service is viable to run at all.

Production URL Shorteners — Comparison

Service	ID Strategy	Redirect	Expiry	Special Pattern
Bitly	Snowflake (counter)	301	Optional (paid)	Real-time analytics, multi-region
TinyURL	Hash + collision	301	Never	Custom aliases, MySQL + replicas
Instagram	Custom Snowflake (41+13+10)	N/A	N/A	1K IDs/ms/shard, time-ordered
Twitter t.co	Snowflake	302	No expiry	302 for per-click tracking + malware scan
Google goo.gl	Not disclosed	Not disclosed	Redirects continue	Shutdown 2019: abuse cost > value

📋 Chapter 6 — Summary

Bitly: counter-based IDs, CDN-first, real-time analytics at 10B clicks/month.
TinyURL: hash-based, MySQL, URLs never expire.
Instagram: custom Snowflake (timestamp+shard+sequence) — no coordination.
Twitter: Snowflake IDs, 302 redirects for per-click tracking, burst-tolerant.
Google goo.gl shutdown (2019): abuse prevention cost exceeded service value. Abuse is a core operational cost, not an afterthought.

Chapter Seven

Best Practices Extracted

Transferable Lessons

The URL shortener is a simple system, but its patterns transfer to almost every read-heavy service you will build. These are not URL-shortener-specific lessons — they are architectural principles exposed clearly because the domain is simple enough to see them.

🆔

Distributed ID Generation

Never use auto-increment across shards
Snowflake pattern: time + machine + sequence
Pre-allocate ranges for zero-coordination writes
Transfers to: any system needing globally unique IDs
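Pre-allocated ranges can be sketched with two small classes. `RangeAllocator` is a hypothetical in-process stand-in for the central service (in practice a database row or a coordination service would hold the counter), and `LocalIdGenerator` is what each app server would run.

```python
import threading


class RangeAllocator:
    """Hands out disjoint ID blocks; stand-in for the central allocation service."""

    def __init__(self, block_size: int = 1000):
        self._next_start = 0
        self._block_size = block_size
        self._lock = threading.Lock()

    def allocate(self) -> range:
        with self._lock:
            start = self._next_start
            self._next_start += self._block_size
        return range(start, start + self._block_size)


class LocalIdGenerator:
    """Per-server generator: contacts the allocator only when its block runs out."""

    def __init__(self, allocator: RangeAllocator):
        self._allocator = allocator
        self._ids = iter(())          # empty until the first block is fetched
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            try:
                return next(self._ids)
            except StopIteration:
                self._ids = iter(self._allocator.allocate())
                return next(self._ids)
```

If a server crashes with part of its block unused, those IDs are simply never issued, which is the acceptable gap mentioned in Chapter 5.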

⚡

Cache-First Read Path

For read-heavy systems: cache IS the system
Multi-layer: CDN → app cache → DB
95%+ reads should never reach the database
Transfers to: any 100:1 read:write service

📊

Async Analytics

Never block user-facing path for analytics
Fire event → process async (Kafka/SQS)
Accept eventual consistency for analytics data
Transfers to: any system with click/view tracking

🛡️

Abuse Prevention

Scan destination URLs against phishing/malware blocklists on creation
Rate-limit creation per IP/user to prevent spam campaigns
Add interstitial warning page for flagged URLs
URL scanning implementation: integrate with Google Safe Browsing API or VirusTotal API at creation time. The scan adds 100-500ms to the write path — acceptable since URL creation is not latency sensitive
For URLs that cannot be scanned synchronously (API rate limits), accept the URL but flag it for async scanning and show a warning interstitial on first redirect until scanning completes
Blocklist maintenance: maintain an internal blocklist of domains used in past abuse campaigns. Share threat intelligence with other URL shorteners where possible
Creator accountability: authenticated URL creation enables banning abusive accounts, retroactively invalidating all URLs created by a banned account, and providing abuse reporting tied to creator identity
Transfers to: any user-generated-content system accepting external links
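Per-IP creation limits, such as the 10 URLs/minute chosen in Chapter 2, can be enforced with a sliding-window counter. This is an in-memory sketch with hypothetical names; a multi-server deployment would keep the counters in a shared store such as Redis.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` events per key within the last `window_seconds`."""

    def __init__(self, limit: int = 10, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self._clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._limit:
            return False
        hits.append(now)
        return True
```

The key can be an IP address for anonymous creation or a user ID for authenticated creation, with a higher `limit` for the latter.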

⏰

URL Expiry & Storage Management

Not all URLs should live forever — offer tiered lifetimes
Anonymous URLs: expire after 30 days
Registered users: 1-year URLs
Paid plans: permanent URLs
Implement soft deletion — mark as expired, keep record 90 days post-expiry for recovery
Serve a clean "this URL has expired" page rather than a 404
Background jobs handle actual deletion to prevent unbounded DB growth
Transfers to: any system with user-generated content and storage cost management
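The tiered lifetimes and the 90-day soft-deletion window translate into a small state function. The tier names and the three states ("active", "expired", "purgeable") are illustrative labels, not part of the original design.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

TIER_LIFETIME = {
    "anonymous": timedelta(days=30),
    "registered": timedelta(days=365),
    "paid": None,                       # permanent
}
RECOVERY_WINDOW = timedelta(days=90)    # soft-deleted records kept this long


def expires_at(tier: str, created_at: datetime) -> Optional[datetime]:
    """Expiry time for a URL, or None if it never expires."""
    lifetime = TIER_LIFETIME[tier]
    return None if lifetime is None else created_at + lifetime


def url_state(tier: str, created_at: datetime, now: datetime) -> str:
    expiry = expires_at(tier, created_at)
    if expiry is None or now < expiry:
        return "active"       # redirect normally
    if now < expiry + RECOVERY_WINDOW:
        return "expired"      # show the "this URL has expired" page; still recoverable
    return "purgeable"        # background job may delete the record
```

For example, `url_state("anonymous", created, datetime.now(timezone.utc))` tells the redirect handler whether to redirect or show the expiry page.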

The URL shortener teaches: separate the hot path from everything else. The redirect (hot path) must be fast — CDN, cache, done. Analytics, abuse detection, expiry — all happen asynchronously. This pattern applies to every latency-sensitive system: payment confirmation pages, API responses, search results.

Hot Path vs Cold Path — Isolation Pattern

📋 Chapter 7 — Summary

ID generation: Snowflake or range-based. Never auto-increment across distributed nodes.
Cache-first: for read-heavy systems, the cache layer IS your primary serving infrastructure.
Async everything non-critical: analytics, abuse scanning, notifications — off the hot path.
301 vs 302: infrastructure-level caching decision with massive cost implications.
URL expiry tiers: anonymous 30 days, registered 1 year, paid permanent. Soft deletion preserves records 90 days post-expiry.
Abuse prevention: sync scan at creation (Google Safe Browsing), blocklist maintenance, creator accountability via authentication.

Chapter Eight

What Could Go Wrong

Common Failure Patterns

Every system has failure modes. The ones that catch teams off guard are not the obvious ones (server crash) but the subtle ones (cache stampede after a viral link, hash collisions under load, analytics pipeline backpressure affecting the write path). These are the mistakes real teams have made — learn from them without repeating them.

💥

Cache Stampede

Viral link expires from cache
1000s of simultaneous requests all hit DB
DB overwhelmed → cascade failure
Fix: lock on cache miss (only 1 request refills), stale-while-revalidate, no TTL for popular URLs
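The "only 1 request refills" fix is often called single-flight. A minimal in-process sketch, assuming a `loader` callable that performs the database read; the class name is hypothetical, and across multiple app servers the per-key lock would need to live in a shared store:

```python
import threading
from typing import Callable, Optional


class SingleFlightCache:
    """On a miss, one caller loads the value while others wait for the result."""

    def __init__(self, loader: Callable[[str], Optional[str]]):
        self._loader = loader
        self._values: dict[str, str] = {}
        self._locks: dict[str, threading.Lock] = {}
        self._guard = threading.Lock()

    def get(self, key: str) -> Optional[str]:
        value = self._values.get(key)
        if value is not None:
            return value                              # fast path: cache hit
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            value = self._values.get(key)             # re-check after waiting
            if value is None:
                value = self._loader(key)             # only one thread gets here per miss
                if value is not None:
                    self._values[key] = value
        return value
```

This sketch never evicts entries or per-key locks; a production version would add TTLs and clean up locks for keys that are no longer contended.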

🔄

Redirect Loops

Short URL A → URL B → short URL A (circular)
Browser loops infinitely, bad user experience
Loops can form later: B is shortened after A already points to it, creating a cycle that didn't exist at A's creation time
Fix: reject destination URLs on your own domain, follow redirects at creation time (max 5 hops), and periodically scan for newly formed cycles
Periodic scan: a background job runs daily and checks a sample of URLs against the current state of all short URLs to catch newly formed cycles. Flag detected cycles for manual review rather than automatically deactivating — a false positive deactivation of a legitimate URL is a worse outcome than a brief redirect loop for a few users
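The creation-time checks (reject own-domain destinations, follow at most 5 hops) can be sketched as one function. `next_hop` is a hypothetical callable that returns the redirect target of a URL or `None` if it does not redirect; in production it would issue an HTTP request, while here it is injected so the logic can be exercised without network access. `short.ly` is the example domain from Chapter 1.

```python
from typing import Callable, Optional
from urllib.parse import urlparse

OWN_DOMAIN = "short.ly"
MAX_HOPS = 5


def is_safe_destination(url: str, next_hop: Callable[[str], Optional[str]]) -> bool:
    """Reject URLs that lead back to our own domain, loop, or redirect too many times."""
    seen = set()
    current = url
    for _ in range(MAX_HOPS + 1):          # the URL itself plus up to MAX_HOPS redirects
        host = (urlparse(current).hostname or "").lower()
        if host == OWN_DOMAIN or host.endswith("." + OWN_DOMAIN):
            return False                   # points at our own shortener
        if current in seen:
            return False                   # external redirect cycle
        seen.add(current)
        target = next_hop(current)
        if target is None:
            return True                    # chain ends at a normal page
        current = target
    return False                           # still redirecting after MAX_HOPS
```

This only covers cycles that exist at creation time; cycles formed later still need the periodic scan described above.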

⚠️

Hash Collision Under Load

High write volume + hash-based IDs → collision rate spikes
Retry storm: collision → retry → more collisions
Write latency degrades exponentially
Fix: use counter-based IDs instead, or append a nonce (such as a timestamp) to the hash input on retry, accepting that this gives up natural deduplication

📉

Analytics Backpressure

Analytics queue fills up (Kafka lag)
If tightly coupled: the redirect handler blocks waiting to enqueue the click event
Redirect latency spikes from unrelated analytics issue
Fix: fire-and-forget to queue (never block), overflow to dead letter queue
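Fire-and-forget with overflow handling can be sketched with a bounded in-process buffer. This is an illustration of the pattern, not a Kafka client: `ClickBuffer` and `handle_redirect` are hypothetical names, and a separate worker (not shown) would drain `events` into the real queue.

```python
import queue
from collections import deque


class ClickBuffer:
    """Bounded buffer for click events; never blocks the caller."""

    def __init__(self, maxsize: int = 10_000, dead_letter_size: int = 1_000):
        self.events: queue.Queue = queue.Queue(maxsize=maxsize)
        self.dead_letter: deque = deque(maxlen=dead_letter_size)  # overflow, also bounded

    def record(self, event: dict) -> bool:
        try:
            self.events.put_nowait(event)      # returns immediately
            return True
        except queue.Full:
            self.dead_letter.append(event)     # keep for later inspection or replay
            return False


def handle_redirect(buffer: ClickBuffer, short_code: str, long_url: str) -> tuple[int, str]:
    buffer.record({"code": short_code})        # result ignored: analytics never blocks
    return 301, long_url
```

The redirect returns the same response whether or not the event was accepted, which is what keeps an analytics outage from becoming a redirect outage.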

🔥

DB Shard Hotspot

One viral short URL concentrates all reads on a single shard
Shard-by-key means popular URLs don't spread across nodes
Shard overwhelmed while others sit idle
Fix: cache-first architecture absorbs hot-key traffic before it reaches DB; replicate hot shards read-only; use consistent hashing with virtual nodes for better distribution

🕐

Clock Skew in Snowflake IDs

Snowflake IDs embed a timestamp — depends on accurate clocks
If a server's clock moves backward (NTP correction, VM migration, hardware issue), the ID generator may produce IDs with a timestamp earlier than recently-generated IDs
Breaks monotonic ordering guarantee — could generate duplicate ID if same timestamp + machine + sequence combination is used twice
Fix: ID generator libraries detect clock drift and either wait until the clock catches up, throw an exception, or use the last known timestamp plus sequence increment rather than the current (backward) clock
Ensure NTP is configured and monitored on all application servers. Alert on clock drift exceeding 100ms
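The "throw an exception" variant of the clock-drift fix can be sketched as below. The layout (41-bit timestamp, 10-bit machine ID, 12-bit sequence) follows the commonly described Snowflake format; the epoch value is an arbitrary assumption, and the clock is injectable so the backward-clock case can be demonstrated.

```python
import threading
import time
from typing import Callable

EPOCH_MS = 1_600_000_000_000   # arbitrary custom epoch (assumption)


class ClockMovedBackwards(RuntimeError):
    pass


class SnowflakeGenerator:
    def __init__(self, machine_id: int,
                 clock: Callable[[], int] = lambda: int(time.time() * 1000)):
        if not 0 <= machine_id < 1024:
            raise ValueError("machine_id must fit in 10 bits")
        self._machine_id = machine_id
        self._clock = clock
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Refuse to issue IDs rather than risk a duplicate
                raise ClockMovedBackwards(f"clock went back {self._last_ms - now} ms")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:            # 4096 IDs used this millisecond
                    while now <= self._last_ms:    # wait for the next millisecond
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return ((now - EPOCH_MS) << 22) | (self._machine_id << 12) | self._sequence
```

Raising is the strictest option; waiting for the clock to catch up trades availability of ID generation for fewer errors surfaced to callers.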

☠️

Cache Poisoning

If an attacker or malfunctioning client can trigger a cache write with incorrect data, every user who hits the cached entry receives the wrong redirect
Users are redirected to an attacker-controlled URL even though the database contains the correct destination
Fix: application servers never accept external input that directly populates the cache. The cache is only populated from database reads or from the write path under the application's control
Validate cache entries against the database on suspicious patterns (multiple cache misses followed by a sudden cache hit for a URL that had no recent write)

The most dangerous failures are cross-concern contamination. Analytics should never affect redirects. Cache expiry should never cause DB overload. Abuse detection should never slow down legitimate users. Isolate concerns — bulkhead pattern at system level.

📋 Chapter 8 — Summary

Cache stampede: viral link + cache expiry = DB flood. Fix: locking, stale-while-revalidate.
Redirect loops: circular references. Fix: follow chain at creation, max-hop limit.
Collision storms: hash-based under high load. Fix: counter-based IDs or append nonce.
Analytics backpressure: async queue overflow blocking writes. Fix: fire-and-forget, dead letter.
DB shard hotspot: viral URL overwhelms one shard. Fix: cache absorbs hot keys, replicate hot shards, consistent hashing with vnodes.
Clock skew in Snowflake: backward NTP correction can produce duplicate IDs. Monitor clock drift, alert above 100ms.
Cache poisoning: only populate cache from controlled write path, never from external input. Validate on suspicious patterns.
Principle: isolate concerns — never let non-critical paths degrade the critical path.
