Case Study: URL Shortener
Design, trade-offs, and alternatives for a URL shortener at scale.
Problem Statement
A URL shortener takes a long URL and produces a short alias (e.g., short.ly/abc123) that redirects to the original. It sounds trivial until you need to handle billions of URLs, serve redirects in under 10ms, never return a broken link, and provide analytics on every click. The simplicity of the interface hides meaningful distributed systems challenges: unique ID generation at scale, read-heavy caching, and geographic latency optimization.
Traffic
- 100M URLs created/month (~40 writes/sec)
- 10B redirects/month (~4,000 reads/sec)
- Read:Write ratio = 100:1
- Peak: 10x average → 40,000 reads/sec
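The per-second figures above follow from dividing monthly volume by the number of seconds in a month. A quick check, assuming a 30-day month (the text does not state the month length, so that is an assumption):

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def per_second(events_per_month: float) -> float:
    """Average events per second for a given monthly volume."""
    return events_per_month / SECONDS_PER_MONTH


writes_per_sec = per_second(100e6)        # about 38.6, rounded to ~40 in the text
reads_per_sec = per_second(10e9)          # about 3,858, rounded to ~4,000
peak_reads_per_sec = 10 * reads_per_sec   # about 38,580, rounded to ~40,000

print(f"{writes_per_sec:.1f} writes/s, {reads_per_sec:.0f} reads/s, "
      f"{peak_reads_per_sec:.0f} peak reads/s")
```

The rounded numbers in the list are slightly generous, which is the safe direction for capacity planning.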
Requirements
- Latency SLA: redirect <10ms (p99)
- Availability: 99.99% (52 min downtime/year)
- Storage: 5 years retention → 6B URLs
- Short URL length: 7 characters (Base62 = 3.5T combinations)
Authentication requirement: is URL creation anonymous or does it require a registered account? Anonymous creation is simpler but exposes the system to spam and abuse at scale. Authenticated creation enables per-user analytics, rate limiting by user identity, and custom alias namespace management. This choice affects whether the system needs an auth service, a user database, and session management, all of which is significant additional infrastructure.
URL Records
- 100M new URLs/month × 12 months × 5 years = 6B URLs
- Per URL: short code (7 bytes) + long URL (avg 100 bytes) + timestamp (8 bytes) + metadata (user_id, expiry, click count: ~50 bytes) = ~165 bytes
- Total: 6B × 165 bytes = ~1TB for URL records
Analytics Events
- 10B clicks/month × 60 months = 600B click events
- Per click event: ~50 bytes (timestamp, geo, referrer, device)
- Total: 600B × 50 bytes = ~30TB for analytics
- Requires a separate store (ClickHouse, Redshift), not the operational database
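The storage totals and the Base62 keyspace can be reproduced with a few lines of arithmetic. This sketch assumes decimal units (1 TB = 10^12 bytes); the variable names are illustrative only.

```python
TB = 10 ** 12  # decimal terabyte (assumption about how the text rounds)

# Keyspace of a 7-character Base62 code.
keyspace = 62 ** 7

# URL records: 100M/month for 5 years, ~165 bytes each.
total_urls = 100_000_000 * 12 * 5
bytes_per_url = 7 + 100 + 8 + 50
url_storage_bytes = total_urls * bytes_per_url

# Click events: 10B/month for 60 months, ~50 bytes each.
total_clicks = 10_000_000_000 * 60
analytics_bytes = total_clicks * 50

print(f"keyspace:       {keyspace:,} codes")
print(f"URL records:    {url_storage_bytes / TB:.2f} TB")
print(f"analytics:      {analytics_bytes / TB:.0f} TB")
print(f"keyspace usage: {total_urls / keyspace:.4%}")
```

The last line shows why collision pressure is low: 6B URLs occupy well under 1% of the 7-character keyspace.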
This is a read-dominated system. 100:1 read-to-write ratio means caching strategy is the most important design decision. If you optimize writes but ignore reads, you have designed for 1% of the traffic.
- URL shortener: simple interface, real distributed systems challenges at scale.
- 100:1 read:write ratio → caching dominates the design.
- 6B URLs over 5 years. 7-char Base62 gives 3.5T combinations, so there is no collision pressure.
- Redirect latency target: <10ms p99. Availability: 99.99%.
- Storage: 6B URLs × 165 bytes = ~1TB for URL records. Analytics events need a separate store: ~30TB over 5 years.
Questions to Ask
A senior engineer does not jump to architecture diagrams. They ask questions that reveal hidden requirements, constraints, and trade-offs. Each question below changes the design in meaningful ways. Get them wrong, and you build the wrong system.
Traffic Pattern
- What is the read:write ratio?
- Are there traffic spikes (viral links)?
- Geographic distribution of users?
- Is there a burst pattern (marketing campaigns)?
URL Behavior
- Custom aliases allowed? (branding)
- Do URLs expire? (TTL or forever?)
- Same long URL → same short URL? (deduplication)
- Maximum URL length accepted?
Analytics & Features
- Click analytics required? (who, when, where)
- Real-time or batch analytics?
- Rate limiting per user?
- Abuse detection (phishing URLs)?
Security & Access Control
- Anonymous creation or registered users only?
- What is the maximum creation rate per IP or user?
- Is URL scanning (phishing, malware) required at creation time?
- Should creators be able to delete or update their URLs?
- Is there a dashboard for creators to see their URL analytics?
- Are there geographic restrictions on redirect destinations?
The deduplication question changes the entire ID generation strategy. If the same long URL always maps to the same short URL, you need a lookup on write (expensive at scale). If each creation generates a new short URL regardless, the write path is much simpler: just generate a unique ID.
SLA degradation behavior: what is the acceptable behavior when the system is degraded? Should it serve stale cached redirects rather than returning errors, or fail fast with 503? This determines whether the system should be designed for graceful degradation (serve from cache even if the database is down) or strict consistency (refuse to redirect if the mapping cannot be verified). Most URL shorteners choose graceful degradation: a stale redirect is better than no redirect.
For This Case Study, Our Answers Are:
- Read:write ratio: 100:1 (read-dominated, so caching is the primary design concern)
- Custom aliases: yes, so uniqueness must be checked on write
- URL expiry: tiered (anonymous: 30 days, registered: 1 year, paid: permanent)
- Deduplication: no, each creation generates a new short URL (simplifies write path)
- Analytics: yes (async, eventual consistency acceptable, separate store)
- Geographic distribution: global users, so a CDN is required; DB origin in one region initially
- Degradation behavior: graceful (serve stale cache rather than error)
- Creation rate limit: yes, 10 URLs/minute per IP (anonymous), higher for authenticated
- Abuse scanning: yes, scan at creation (with async scanning as a fallback when the scanner is unavailable or rate limited), interstitial for flagged URLs
- Redirect type: 301 with 24h max-age Cache-Control (not indefinite)
- Ask about read:write ratio: it determines caching strategy.
- Custom aliases add uniqueness checking complexity.
- Expiry/TTL changes storage planning and cache invalidation.
- Deduplication (same URL → same short) forces lookup-on-write.
- Analytics can be async, so don't block the redirect path.
- Security questions: anonymous vs authenticated creation, scanning requirements, creator management, geographic restrictions.
- Degradation behavior: graceful (serve stale cache) vs strict (fail fast). Most URL shorteners choose graceful.
Naive Design
The simplest design: a single web server with a relational database. Auto-increment ID, Base62 encode it, store the mapping. On redirect, look up the short code in the database and return a 302 (temporary) redirect, the default in most frameworks. This works perfectly at low scale and breaks in predictable ways as traffic grows. (Notice the redirect type: we will revisit this choice in Chapter 4, where switching to 301 unlocks CDN caching and eliminates 70%+ of server load.)
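The encode step of the naive design can be sketched as follows. The alphabet ordering (digits, lowercase, uppercase) is an assumption; any fixed ordering of 62 distinct symbols works as long as encoding and decoding agree. The `next_code` helper is hypothetical and exists only to show how easily sequential IDs can be enumerated.

```python
import string

# Assumed alphabet order: 0-9, a-z, A-Z.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def base62_encode(n: int) -> str:
    """Encode a non-negative integer ID as a Base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, remainder = divmod(n, BASE)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def base62_decode(code: str) -> int:
    """Inverse of base62_encode."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


def next_code(code: str) -> str:
    """What an attacker does with sequential IDs: decode, add one, re-encode."""
    return base62_encode(base62_decode(code) + 1)
```

Because auto-increment IDs are consecutive, calling `next_code` in a loop walks every short URL in creation order, which is the enumeration risk the naive design carries.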
What Works
- Simple to implement and reason about
- Auto-increment guarantees uniqueness
- Transactional consistency (SQL)
- Great for MVP / <1M URLs
What Breaks
- Single server = no fault tolerance
- Every redirect = DB query (no caching)
- Auto-increment = predictable IDs (security risk): an attacker can increment the ID to enumerate every URL in the system, exposing private or unlisted links
- Single DB = write bottleneck at 10K+ writes/sec
- No geographic distribution = high latency globally
- 302 redirect forces every request back to the server: at 4,000 requests/second, no request is ever cached at the CDN or browser level. Switching to 301 in the refined design immediately offloads 70%+ of traffic to the CDN without any other changes. This single decision has the largest single-step impact on server load in the entire system.
- Naive: single server, PostgreSQL, auto-increment → Base62.
- Works for MVP. Fails at scale on reads (no cache), writes (single DB), and availability (SPOF).
- Predictable IDs are a security risk: users can enumerate URLs.
- Every redirect hitting the database is the #1 problem to solve.
Refined Design
The refined design addresses every failure mode: a distributed ID generator avoids single-point bottlenecks, a multi-layer cache handles the 100:1 read ratio, database sharding distributes writes, and CDN edge caching brings redirects close to users. The redirect path should almost never hit the database in steady state.
Read Path (Redirect)
- Layer 1: CDN caches 301 redirects at edge (~70% hit rate)
- Layer 2: Redis cache for hot URLs (~25% of total traffic, i.e. most of what the CDN misses)
- Layer 3: Database lookup (only ~5% of total traffic)
- Result: 95%+ redirects never hit the database
- Use 301 (permanent) for CDN cacheability
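The origin side of this read path is a cache-aside lookup: check the cache, fall back to the database, and populate the cache on a miss. A minimal sketch follows, with plain dictionaries standing in for Redis and the sharded database. The `RedirectResolver` class and its fields are hypothetical names, and the CDN layer is not modeled because it sits in front of the application.

```python
from typing import Dict, Optional


class RedirectResolver:
    """Cache-aside lookup for the redirect path (illustrative stand-ins only)."""

    def __init__(self, cache: Dict[str, str], database: Dict[str, str]):
        self.cache = cache        # stand-in for Redis
        self.database = database  # stand-in for the sharded database
        self.db_reads = 0         # counts how often we fall through to the DB

    def resolve(self, code: str) -> Optional[str]:
        url = self.cache.get(code)
        if url is not None:
            return url            # cache hit: no database work
        self.db_reads += 1
        url = self.database.get(code)
        if url is not None:
            self.cache[code] = url  # populate on miss so the next read is a hit
        return url


resolver = RedirectResolver(cache={}, database={"abc123": "https://example.com/long"})
first = resolver.resolve("abc123")   # miss, reads the database
second = resolver.resolve("abc123")  # hit, served from the cache
```

A production version would also need TTLs, negative caching for unknown codes, and protection against concurrent misses on a hot key.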
Write Path (Create)
- Generate unique ID (Snowflake or pre-allocated ranges)
- Base62 encode the ID → short code
- Write to sharded database (shard by first char or hash)
- Warm Redis cache with new mapping
- Return short URL to client
Cache warming on write: when a new short URL is created, immediately write it to Redis under the TTL policy (hot URLs: no TTL; cold URLs: 24-hour TTL, based on access pattern). The first redirect to a new URL would otherwise be a cold miss: a Redis miss followed by a database read. Pre-warming eliminates this first-request latency spike, which matters for newly created URLs that are immediately shared (social media posts, email campaigns). Cache warming costs one additional Redis write per URL creation, which is negligible compared to the benefit.
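The write path with cache warming can be sketched as below. Dictionaries stand in for the database and Redis, and `itertools.count` is a placeholder for a Snowflake or range-based ID generator; `ShortenerWritePath` and `encode_base62` are hypothetical names, and TTL handling is omitted.

```python
import itertools
import string

ALPHABET = string.digits + string.ascii_letters  # assumed order: 0-9, a-z, A-Z


def encode_base62(n: int) -> str:
    """Base62 encode a positive integer ID."""
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars)) or ALPHABET[0]


class ShortenerWritePath:
    def __init__(self, domain: str = "short.ly"):
        self.domain = domain
        self._ids = itertools.count(1)  # placeholder for a distributed ID generator
        self.database = {}              # stand-in for the sharded database
        self.cache = {}                 # stand-in for Redis

    def create(self, long_url: str) -> str:
        code = encode_base62(next(self._ids))
        self.database[code] = long_url  # durable write first
        self.cache[code] = long_url     # then warm the cache for the first redirect
        return f"https://{self.domain}/{code}"


service = ShortenerWritePath()
short_url = service.create("https://example.com/some/long/path")
```

Writing to the database before the cache means a failed database write never leaves a cached mapping that was not persisted.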
The key insight: 301 vs 302 redirect. A 301 (permanent) redirect lets browsers and CDNs cache the mapping indefinitely, reducing your server load by 70%+. A 302 (temporary) forces every request back to your server, which is useful only if you need to change the destination later or track every click. Most URL shorteners use 301 + async analytics.
The 301 cache invalidation problem: once a browser or CDN caches a 301 redirect, you cannot reliably update the destination. If a URL owner wants to change where their short URL points, or if a URL is flagged as malicious and needs to be deactivated, CDNs and browsers may continue serving the cached 301 for days or indefinitely. Mitigation: set a Cache-Control max-age on 301 responses (e.g., max-age=86400 for 24 hours) rather than allowing indefinite caching. This balances caching benefit with the ability to update or deactivate URLs within a reasonable timeframe. Twitter uses 302 specifically to retain this control: every click comes back to their servers.
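A bounded-lifetime 301 comes down to two response headers. The sketch below is framework-agnostic and returns a status code plus a header dictionary; `build_redirect` is a hypothetical helper, and the `public` directive is an assumption added so that shared caches such as a CDN may store the response.

```python
from typing import Dict, Tuple


def build_redirect(long_url: str, max_age_seconds: int = 86400) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a 301 that caches expire after max_age_seconds."""
    if max_age_seconds < 0:
        raise ValueError("max_age_seconds must be non-negative")
    headers = {
        "Location": long_url,
        # Bounded lifetime: after max-age, caches must revalidate or refetch.
        "Cache-Control": f"public, max-age={max_age_seconds}",
    }
    return 301, headers


status, headers = build_redirect("https://example.com/landing")
```

With a 24-hour max-age, a deactivated or updated link can keep redirecting to the old destination for up to a day, which is the trade-off the paragraph above describes.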
Geographic database placement: CDN handles read latency globally (users get served from the nearest PoP). But on a cache miss, the request falls through to the origin database. If all database shards are in one region (us-east-1), a user in Tokyo who misses the CDN cache still experiences 150-200ms of round-trip latency to the database before receiving their redirect. At 5% cache miss rate with 4,000 reads/second, that is 200 requests/second hitting the origin from potentially far-away locations. Solutions: read replicas in multiple regions (replication lag is acceptable since URL mappings rarely change), or a globally distributed database (PlanetScale, CockroachDB). Most URL shorteners at moderate scale accept the origin latency for cache misses and invest in CDN cache hit rate rather than multi-region database replication.
- CDN edge caching (301) handles 70%+ of redirects without hitting origin.
- Redis handles hot URLs for the remaining cache misses.
- Only ~5% of reads reach the database, an acceptable load.
- Snowflake IDs or pre-allocated ranges avoid single-point ID generation bottleneck.
- Database sharded by short code hash for write distribution.
- Analytics decoupled via Kafka, so it never blocks the redirect path.
- Cache warming on write: pre-populate Redis on creation to eliminate first-redirect cold miss.
- 301 invalidation problem: set max-age (e.g. 86400) not indefinite. Cannot reliably update cached 301s at CDN or browser level.
Alternative Approaches
The central design decision in a URL shortener is how to generate the short code. Two dominant approaches exist, counter-based and hash-based, each with distinct trade-offs; pre-allocated ranges and random codes checked against a Bloom filter are common variants. Neither dominant approach is universally better. The choice depends on whether you need deduplication, predictability, and how you handle collisions.
Counter-Based
- Generate unique counter → Base62 encode
- Guaranteed unique: no collisions ever
- Sequential → predictable (enumerate URLs)
- Counter is a distributed systems challenge
- No deduplication: same URL → different short codes
- Used by: Bitly (with Snowflake IDs)
Hash-Based
- Hash the long URL → take first 7 chars of Base62(hash)
- Same input → same output (natural deduplication)
- Collision risk: must check DB + retry on conflict
- No sequential pattern → unpredictable (secure)
- Write path slower (hash + check + possible retry)
- Used by: TinyURL (with collision handling)
Pre-Allocated Ranges
- Central service allocates ID ranges to app servers
- Each server generates IDs within its range locally
- No coordination needed for individual writes
- Range exhaustion: request new range (rare)
- Gap in IDs if server crashes mid-range (acceptable)
- Used by: Twitter Snowflake variant
Random Code + Bloom Filter
- Generate random 7-char string
- Check Bloom filter for probable collision
- If no collision → write to DB
- If collision → regenerate (probabilistically rare)
- Bloom filter: ~10.8GB for 6B entries at 0.1% FP rate (about 14.4 bits per entry)
- Formula: m = −(n·ln p) / (ln 2)², k = (m/n)·ln 2. For n=6B, p=0.001: m ≈ 86B bits (~10.8 GB), k ≈ 10 hash functions
- Tradeoff: space for speed, false positives = unnecessary retries
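The Bloom filter sizing can be checked by evaluating the standard formulas directly. For n = 6B entries and p = 0.001, they give roughly 86 billion bits, or about 10.8 GB in decimal units, with 10 hash functions. The function name `bloom_parameters` is illustrative.

```python
import math
from typing import Tuple


def bloom_parameters(n: int, p: float) -> Tuple[int, int]:
    """Optimal bit count m and hash count k for n entries at false-positive rate p."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = round((m / n) * math.log(2))
    return m, k


m_bits, k_hashes = bloom_parameters(6_000_000_000, 0.001)
size_gb = m_bits / 8 / 1e9        # decimal gigabytes
bits_per_entry = m_bits / 6_000_000_000

print(f"m = {m_bits:,} bits ({size_gb:.1f} GB), "
      f"{bits_per_entry:.1f} bits/entry, k = {k_hashes}")
```

At this size the filter still fits in the memory of a single large server, but it is an order of magnitude larger than the URL working set most caches would hold, which is worth weighing against simply checking the database.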
Counter-based is simpler and collision-free. Hash-based gives natural deduplication. In practice, most production URL shorteners use counter-based generation (Snowflake/range allocation) because collision handling adds write-path complexity that isn't worth it at scale, especially when deduplication isn't a hard requirement.
Recommendation for new systems: use a distributed ID generator (Twitter Snowflake, Instagram's ID generator, or ULID) with Base62 encoding. Implement hash-based generation only if deduplication is a stated product requirement from the business, not as a default choice. Hash-based deduplication adds write-path complexity and collision retry logic that is not worth the cost unless users explicitly expect the same URL to always produce the same short code.
- Counter-based: guaranteed unique, simple, sequential (predictable). Use Snowflake-style IDs to make codes harder to enumerate.
- Hash-based: natural dedup, collision risk on write, unpredictable.
- Pre-allocated ranges: distributed counters without coordination. Best for high write throughput.
- Production choice: counter-based (Snowflake) with Base62 encoding dominates.
- Recommendation: Snowflake or ULID for new systems. Hash-based only if deduplication is a stated product requirement.
What Real Companies Did
Theory is good. Real-world decisions under real-world constraints are better. Here is how actual URL shortening services solved the key challenges, each making different trade-offs based on their specific requirements.
Bitly
- Processes 10B+ clicks/month
- Snowflake-like ID generation (counter-based)
- Heavy CDN caching with 301 redirects
- Real-time analytics pipeline (Kafka + Spark)
- Multi-region deployment for low-latency redirects
TinyURL
- Founded 2002, one of the first URL shorteners, with billions of URLs stored over 22+ years of operation
- Hash-based with collision handling
- MySQL backend with multiple read replicas
- Custom aliases supported (check uniqueness on write)
- No expiry by default: URLs live forever
- Exact traffic numbers not publicly disclosed, but it continues to operate at significant scale with a small engineering team, a testament to the operational simplicity of a well-designed URL shortener
Instagram (ID generation)
- Custom Snowflake: 41-bit timestamp + 13-bit shard + 10-bit sequence
- Generates ~1K unique IDs/ms per shard
- No coordination between shards
- Time-ordered: IDs are roughly sortable by creation time
- Published approach as reference architecture
- Was generating ~25M new IDs/day at publish time (2012), growing to billions/day by Facebook acquisition. Adopted by dozens of large-scale systems beyond URL shortening.
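The 41 + 13 + 10 bit layout described above can be illustrated with plain bit shifts. This is a sketch of the layout only, not Instagram's implementation; the custom epoch value and the function names are hypothetical.

```python
from typing import Tuple

TIMESTAMP_BITS = 41
SHARD_BITS = 13
SEQUENCE_BITS = 10
CUSTOM_EPOCH_MS = 1_293_840_000_000  # hypothetical custom epoch, in milliseconds


def compose_id(timestamp_ms: int, shard_id: int, sequence: int) -> int:
    """Pack (time since epoch, shard, sequence) into one 64-bit integer."""
    elapsed = timestamp_ms - CUSTOM_EPOCH_MS
    if not 0 <= elapsed < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp outside the 41-bit range")
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id outside the 13-bit range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence outside the 10-bit range")
    return (elapsed << (SHARD_BITS + SEQUENCE_BITS)) | (shard_id << SEQUENCE_BITS) | sequence


def decompose_id(value: int) -> Tuple[int, int, int]:
    """Recover (timestamp_ms, shard_id, sequence) from a packed ID."""
    sequence = value & ((1 << SEQUENCE_BITS) - 1)
    shard_id = (value >> SEQUENCE_BITS) & ((1 << SHARD_BITS) - 1)
    elapsed = value >> (SHARD_BITS + SEQUENCE_BITS)
    return elapsed + CUSTOM_EPOCH_MS, shard_id, sequence


example_id = compose_id(CUSTOM_EPOCH_MS + 1000, shard_id=5, sequence=7)
```

The 10-bit sequence is what caps generation at 1,024 IDs per millisecond per shard, and placing the timestamp in the high bits is what makes IDs sort roughly by creation time.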
Twitter (t.co)
- Auto-shortens all URLs in tweets
- Snowflake IDs: 64-bit unique across all servers
- Used for link wrapping (analytics + safety scanning)
- Handles extreme burst traffic (viral tweets)
- 302 redirect (temporary): forces every click back through Twitter's servers so they can log analytics, scan for malware, and update link destinations without cache invalidation
Google URL Shortener (goo.gl) was shut down in 2019 after being available since 2009. Google's announcement framed the shutdown as a shift toward Firebase Dynamic Links; a commonly cited contributing factor is that the service was increasingly used for spam and phishing, so the cost of abuse prevention weighed against the benefits of operating a free public URL shortener. All existing goo.gl URLs continued to redirect after shutdown but no new URLs could be created. This is instructive: abuse prevention is not an afterthought. It is a core operational cost that can determine whether a service is viable to run at all.
| Service | ID Strategy | Redirect | Expiry | Special Pattern |
|---|---|---|---|---|
| Bitly | Snowflake (counter) | 301 | Optional (paid) | Real-time analytics, multi-region |
| TinyURL | Hash + collision | 301 | Never | Custom aliases, MySQL + replicas |
| Instagram | Custom Snowflake (41+13+10) | N/A | N/A | 1K IDs/ms/shard, time-ordered |
| Twitter t.co | Snowflake | 302 | No expiry | 302 for per-click tracking + malware scan |
| Google goo.gl | Not disclosed | Not disclosed | Redirects continue | Shutdown 2019: abuse cost > value |
- Bitly: counter-based IDs, CDN-first, real-time analytics at 10B clicks/month.
- TinyURL: hash-based, MySQL, URLs never expire.
- Instagram: custom Snowflake (timestamp+shard+sequence), no coordination.
- Twitter: Snowflake IDs, 302 redirects for per-click tracking, burst-tolerant.
- Google goo.gl shutdown (2019): abuse prevention cost exceeded service value. Abuse is a core operational cost, not an afterthought.
Best Practices Extracted
The URL shortener is a simple system, but its patterns transfer to almost every read-heavy service you will build. These are not URL-shortener-specific lessons. They are architectural principles exposed clearly because the domain is simple enough to see them.
Distributed ID Generation
- Never use auto-increment across shards
- Snowflake pattern: time + machine + sequence
- Pre-allocate ranges for zero-coordination writes
- Transfers to: any system needing globally unique IDs
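Range pre-allocation can be sketched with two small classes: a central allocator that hands out blocks, and a per-server generator that consumes its block locally. Both class names and the block size are hypothetical; in a real deployment the allocator would be backed by a durable store rather than an in-process counter.

```python
import threading


class RangeAllocator:
    """Central service: hands out non-overlapping ID blocks."""

    def __init__(self, block_size: int = 1000):
        self.block_size = block_size
        self._next_start = 1
        self._lock = threading.Lock()

    def allocate(self) -> range:
        with self._lock:  # the only point of coordination
            start = self._next_start
            self._next_start += self.block_size
        return range(start, start + self.block_size)


class LocalIdGenerator:
    """Per-server generator: no coordination until its block runs out.

    Not thread-safe on its own; a real server would guard next_id with a lock.
    """

    def __init__(self, allocator: RangeAllocator):
        self._allocator = allocator
        self._ids = iter(())  # empty until the first block is fetched

    def next_id(self) -> int:
        try:
            return next(self._ids)
        except StopIteration:
            self._ids = iter(self._allocator.allocate())  # rare: fetch a new block
            return next(self._ids)


allocator = RangeAllocator(block_size=1000)
server_a = LocalIdGenerator(allocator)
server_b = LocalIdGenerator(allocator)
```

If a server crashes mid-block, the unused IDs in that block are simply never issued, which leaves gaps but never duplicates.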
Cache-First Read Path
- For read-heavy systems: cache IS the system
- Multi-layer: CDN → app cache → DB
- 95%+ reads should never reach the database
- Transfers to: any 100:1 read:write service
Async Analytics
- Never block user-facing path for analytics
- Fire event → process async (Kafka/SQS)
- Accept eventual consistency for analytics data
- Transfers to: any system with click/view tracking
Abuse Prevention
- Scan destination URLs against phishing/malware blocklists on creation
- Rate-limit creation per IP/user to prevent spam campaigns
- Add interstitial warning page for flagged URLs
- URL scanning implementation: integrate with Google Safe Browsing API or VirusTotal API at creation time. The scan adds 100-500ms to the write path โ acceptable since URL creation is not latency sensitive
- For URLs that cannot be scanned synchronously (API rate limits), accept the URL but flag it for async scanning and show a warning interstitial on first redirect until scanning completes
- Blocklist maintenance: maintain an internal blocklist of domains used in past abuse campaigns. Share threat intelligence with other URL shorteners where possible
- Creator accountability: authenticated URL creation enables banning abusive accounts, retroactively invalidating all URLs created by a banned account, and providing abuse reporting tied to creator identity
- Transfers to: any user-generated-content system accepting external links
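Per-IP rate limiting on creation can be sketched with a sliding-window counter. The limit of 10 requests per 60 seconds mirrors the anonymous tier chosen earlier in this case study; `SlidingWindowLimiter` is a hypothetical name, and the in-memory dictionary would be replaced by a shared store such as Redis when more than one application server is involved. The clock is passed in explicitly so the behavior is easy to test.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` events per key within any `window_seconds` span."""

    def __init__(self, limit: int = 10, window_seconds: float = 60.0):
        self.limit = limit
        self.window_seconds = window_seconds
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: float) -> bool:
        timestamps = self._events[key]
        # Drop events that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True


limiter = SlidingWindowLimiter(limit=10, window_seconds=60.0)
results = [limiter.allow("203.0.113.7", now=0.0) for _ in range(11)]
```

Here the first ten calls succeed and the eleventh is rejected; once 60 seconds have passed, the same IP is allowed again.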
URL Expiry & Storage Management
- Not all URLs should live forever, so offer tiered lifetimes
- Anonymous URLs: expire after 30 days
- Registered users: 1-year URLs
- Paid plans: permanent URLs
- Implement soft deletion: mark as expired, keep record 90 days post-expiry for recovery
- Serve a clean "this URL has expired" page rather than a 404
- Background jobs handle actual deletion to prevent unbounded DB growth
- Transfers to: any system with user-generated content and storage cost management
The URL shortener teaches: separate the hot path from everything else. The redirect (hot path) must be fast: CDN, cache, done. Analytics, abuse detection, and expiry all happen asynchronously. This pattern applies to every latency-sensitive system: payment confirmation pages, API responses, search results.
- ID generation: Snowflake or range-based. Never auto-increment across distributed nodes.
- Cache-first: for read-heavy systems, the cache layer IS your primary serving infrastructure.
- Async everything non-critical: analytics, abuse scanning, notifications all stay off the hot path.
- 301 vs 302: infrastructure-level caching decision with massive cost implications.
- URL expiry tiers: anonymous 30 days, registered 1 year, paid permanent. Soft deletion preserves records 90 days post-expiry.
- Abuse prevention: sync scan at creation (Google Safe Browsing), blocklist maintenance, creator accountability via authentication.
What Could Go Wrong
Every system has failure modes. The ones that catch teams off guard are not the obvious ones (server crash) but the subtle ones (cache stampede after a viral link, hash collisions under load, analytics pipeline backpressure affecting the write path). These are the mistakes real teams have made. Learn from them without repeating them.
Cache Stampede
- Viral link expires from cache
- 1000s of simultaneous requests all hit DB
- DB overwhelmed → cascade failure
- Fix: lock on cache miss (only 1 request refills), stale-while-revalidate, no TTL for popular URLs
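The "only one request refills" fix can be sketched with a per-key lock and a second cache check after acquiring it. `SingleFlightCache` is a hypothetical in-process illustration; across multiple servers the same idea is usually implemented with a distributed lock or request coalescing at the cache layer.

```python
import threading
from typing import Callable, Dict


class SingleFlightCache:
    """On a miss, one caller loads the value; concurrent callers wait and reuse it."""

    def __init__(self, loader: Callable[[str], str]):
        self._loader = loader                      # e.g. the database lookup
        self._cache: Dict[str, str] = {}
        self._locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()             # protects the lock table
        self.loads = 0                             # loader calls, for observation

    def get(self, key: str) -> str:
        value = self._cache.get(key)
        if value is not None:
            return value                           # fast path: no locking on a hit
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            value = self._cache.get(key)           # re-check: another caller may have filled it
            if value is None:
                self.loads += 1
                value = self._loader(key)
                self._cache[key] = value
        return value


def slow_database_lookup(code: str) -> str:
    threading.Event().wait(0.05)                   # simulate a 50 ms database read
    return "https://example.com/" + code


cache = SingleFlightCache(slow_database_lookup)
threads = [threading.Thread(target=cache.get, args=("abc123",)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Fifty concurrent misses on the same key result in a single loader call; without the re-check inside the lock, every waiting thread would hit the database in turn.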
Redirect Loops
- Short URL A → URL B → short URL A (circular)
- Browser loops infinitely, bad user experience
- Loops can form later: B is shortened after A already points to it, creating a cycle that didn't exist at A's creation time
- Fix: reject destination URLs on your own domain, follow redirects at creation time (max 5 hops), and periodically scan for newly formed cycles
- Periodic scan: a background job runs daily and checks a sample of URLs against the current state of all short URLs to catch newly formed cycles. Flag detected cycles for manual review rather than automatically deactivating โ a false positive deactivation of a legitimate URL is a worse outcome than a brief redirect loop for a few users
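The creation-time checks (reject our own domain, follow at most 5 hops, detect cycles) can be sketched as below. To keep the example self-contained, the network call is abstracted behind a `resolve_next` callback that returns the next redirect target or `None`; in production that callback would issue an HTTP request without following redirects. The function name and return shape are hypothetical.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urlsplit

SHORT_DOMAIN = "short.ly"
MAX_HOPS = 5


def validate_destination(
    url: str, resolve_next: Callable[[str], Optional[str]]
) -> Tuple[bool, str]:
    """Return (accepted, reason) after walking the destination's redirect chain."""
    seen = set()
    current = url
    for _ in range(MAX_HOPS + 1):  # the original URL plus up to MAX_HOPS redirects
        host = (urlsplit(current).hostname or "").lower()
        if host == SHORT_DOMAIN or host.endswith("." + SHORT_DOMAIN):
            return False, "destination resolves to our own short domain"
        if current in seen:
            return False, "redirect cycle detected"
        seen.add(current)
        next_url = resolve_next(current)
        if next_url is None:
            return True, "ok"      # chain ended at a non-redirecting page
        current = next_url
    return False, "too many redirect hops"


# A fake redirect table standing in for real HTTP lookups.
hops = {
    "https://a.example/1": "https://b.example/2",
    "https://b.example/2": "https://short.ly/abc123",
}
accepted, reason = validate_destination("https://a.example/1", hops.get)
```

This catches loops that exist at creation time; loops formed later still need the periodic scan described above.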
Hash Collision Under Load
- High write volume + hash-based IDs → collision rate spikes
- Retry storm: collision → retry → more collisions
- Write latency degrades exponentially
- Fix: use counter-based IDs instead, or append timestamp to hash input
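If hash-based codes are kept, a bounded retry with a varying hash input limits the damage from collisions. The sketch below uses an attempt counter as the extra input instead of a timestamp, which is a substitution made here so that the output stays deterministic; it also reduces the hash modulo 62^7 rather than truncating a Base62 string. A dictionary stands in for a database table with a unique constraint on the code, and all function names are hypothetical.

```python
import hashlib
import string
from typing import Dict

ALPHABET = string.digits + string.ascii_letters  # assumed order: 0-9, a-z, A-Z
CODE_LENGTH = 7


def hash_code(long_url: str, attempt: int = 0) -> str:
    """Derive a 7-character code; attempt > 0 mixes in a nonce to escape a collision."""
    material = long_url if attempt == 0 else f"{long_url}#{attempt}"
    digest = hashlib.sha256(material.encode("utf-8")).digest()
    n = int.from_bytes(digest[:8], "big") % (62 ** CODE_LENGTH)
    chars = []
    for _ in range(CODE_LENGTH):  # fixed width, so short values are zero-padded
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def create_short_code(long_url: str, store: Dict[str, str], max_attempts: int = 5) -> str:
    """Insert-if-absent with a bounded number of retries."""
    for attempt in range(max_attempts):
        code = hash_code(long_url, attempt)
        existing = store.setdefault(code, long_url)  # stand-in for INSERT with unique key
        if existing == long_url:
            return code  # newly inserted, or the same URL was already stored (dedup)
    raise RuntimeError("could not find a free code within the retry budget")


store: Dict[str, str] = {}
first = create_short_code("https://example.com/article", store)
second = create_short_code("https://example.com/article", store)
```

Capping the attempts is what prevents a retry storm: a request that keeps colliding fails fast instead of adding more load to the write path.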
Analytics Backpressure
- Analytics queue fills up (Kafka lag)
- If tightly coupled: write path blocks waiting for queue
- Redirect latency spikes from unrelated analytics issue
- Fix: fire-and-forget to queue (never block), overflow to dead letter queue
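Fire-and-forget means the redirect handler enqueues the click without ever waiting for space. A minimal in-process sketch follows, with a bounded `queue.Queue` standing in for the producer buffer in front of Kafka and a bounded deque standing in for a dead letter queue. `ClickRecorder` and its fields are hypothetical names.

```python
import queue
from collections import deque
from typing import Deque, Dict


class ClickRecorder:
    """Non-blocking click recording: a full queue never delays the redirect."""

    def __init__(self, max_pending: int = 10_000, max_dead_letters: int = 1_000):
        self._pending: "queue.Queue[Dict[str, str]]" = queue.Queue(maxsize=max_pending)
        # Overflow buffer; the oldest entries are discarded once it is full.
        self.dead_letters: Deque[Dict[str, str]] = deque(maxlen=max_dead_letters)
        self.overflowed = 0

    def record(self, event: Dict[str, str]) -> bool:
        """Return True if queued, False if diverted to the dead letter buffer."""
        try:
            self._pending.put_nowait(event)  # raises queue.Full instead of blocking
            return True
        except queue.Full:
            self.overflowed += 1
            self.dead_letters.append(event)
            return False

    def pending_count(self) -> int:
        return self._pending.qsize()


recorder = ClickRecorder(max_pending=2)
outcomes = [recorder.record({"code": "abc123", "n": str(i)}) for i in range(3)]
```

With a capacity of two, the third event is diverted rather than queued, and the caller returns immediately either way. Analytics may lose data under sustained backpressure, but redirect latency is unaffected.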
DB Shard Hotspot
- One viral short URL concentrates all reads on a single shard
- Shard-by-key means popular URLs don't spread across nodes
- Shard overwhelmed while others sit idle
- Fix: cache-first architecture absorbs hot-key traffic before it reaches DB; replicate hot shards read-only; use consistent hashing with virtual nodes for better distribution
Clock Skew in Snowflake IDs
- Snowflake IDs embed a timestamp, so they depend on accurate clocks
- If a server's clock moves backward (NTP correction, VM migration, hardware issue), the ID generator may produce IDs with a timestamp earlier than recently-generated IDs
- Breaks the monotonic ordering guarantee and could generate a duplicate ID if the same timestamp + machine + sequence combination is used twice
- Fix: ID generator libraries detect clock drift and either wait until the clock catches up, throw an exception, or use the last known timestamp plus sequence increment rather than the current (backward) clock
- Ensure NTP is configured and monitored on all application servers. Alert on clock drift exceeding 100ms
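Two of the strategies above (reuse the last known timestamp for small backward steps, raise an exception for large ones) can be combined in one generator. This sketch uses a 41 + 10 + 12 bit layout; the epoch, the 5 ms tolerance, and the class names are hypothetical, and the clock is injectable so that backward jumps can be simulated.

```python
import threading
import time
from typing import Callable, Optional

EPOCH_MS = 1_600_000_000_000  # hypothetical custom epoch
MACHINE_BITS = 10
SEQUENCE_BITS = 12
SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1


class ClockMovedBackwards(RuntimeError):
    pass


class SnowflakeGenerator:
    def __init__(self, machine_id: int,
                 clock: Optional[Callable[[], int]] = None,
                 max_backward_ms: int = 5):
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id out of range")
        self._machine_id = machine_id
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._max_backward_ms = max_backward_ms
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                if self._last_ms - now > self._max_backward_ms:
                    raise ClockMovedBackwards(
                        f"clock went back {self._last_ms - now} ms; refusing to issue IDs")
                now = self._last_ms          # small step back: keep the last known timestamp
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & SEQUENCE_MASK
                if self._sequence == 0:
                    now += 1                 # sequence exhausted: advance logically by 1 ms
            else:
                self._sequence = 0
            self._last_ms = now
            return (((now - EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
                    | (self._machine_id << SEQUENCE_BITS)
                    | self._sequence)
```

Advancing the timestamp logically on sequence exhaustion is a simplification; many implementations instead wait for the real clock to reach the next millisecond.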
Cache Poisoning
- If an attacker or malfunctioning client can trigger a cache write with incorrect data, every user who hits the cached entry receives the wrong redirect
- Users are redirected to an attacker-controlled URL even though the database contains the correct destination
- Fix: application servers never accept external input that directly populates the cache. The cache is only populated from database reads or from the write path under the application's control
- Validate cache entries against the database on suspicious patterns (multiple cache misses followed by a sudden cache hit for a URL that had no recent write)
The most dangerous failures are cross-concern contamination. Analytics should never affect redirects. Cache expiry should never cause DB overload. Abuse detection should never slow down legitimate users. Isolate concerns: apply the bulkhead pattern at system level.
- Cache stampede: viral link + cache expiry = DB flood. Fix: locking, stale-while-revalidate.
- Redirect loops: circular references. Fix: follow chain at creation, max-hop limit.
- Collision storms: hash-based under high load. Fix: counter-based IDs or append nonce.
- Analytics backpressure: async queue overflow blocking writes. Fix: fire-and-forget, dead letter.
- DB shard hotspot: viral URL overwhelms one shard. Fix: cache absorbs hot keys, replicate hot shards, consistent hashing with vnodes.
- Clock skew in Snowflake: backward NTP correction can produce duplicate IDs. Monitor clock drift, alert above 100ms.
- Cache poisoning: only populate cache from controlled write path, never from external input. Validate on suspicious patterns.
- Principle: isolate concerns, and never let non-critical paths degrade the critical path.