Case Study: Ride Sharing
Design, trade-offs, and alternatives for a ride-sharing platform at scale.
Problem Statement
A ride-sharing platform matches riders requesting trips with nearby drivers in real-time. The core challenge is a spatial-temporal matching problem: given a rider at location (lat, lng) requesting a ride now, find the best available driver within a few kilometers β considering driver location, direction of travel, ETA, and driver rating β all within 3-5 seconds. Meanwhile, drivers' locations update every 3-5 seconds, creating a constantly moving dataset of millions of points that must be queried geospatially at thousands of times per second.
Traffic & Scale
- 100M monthly active riders
- 5M active drivers (concurrent online at peak)
- 20M rides/day (~230 matches/sec)
- Location updates: 5M drivers Γ every 4s = 1.25M updates/sec
Requirements
- Match latency: driver found in <5 seconds
- ETA accuracy: Β±2 minutes at time of match
- Location freshness: driver position <5 seconds stale
- Availability: 99.99% (rides are time-critical)
Ride-sharing is a real-time geospatial matching system with strict time constraints. Unlike a restaurant search (where results can be cached), driver positions are moving constantly. The index is never stable. And unlike a chat system (where latency of 200ms vs 500ms is acceptable), a 30-second delay in matching means the rider leaves and opens a competitor's app. Speed of matching directly equals revenue.
- 5M concurrent drivers sending location updates every 4 seconds = 1.25M updates/sec.
- Match a rider to nearest driver in <5 seconds with accurate ETA.
- Geospatial index is constantly moving β never cacheable.
- 20M rides/day. Matching speed = revenue. Downtime = riders switch to competitor.
Questions to Ask
A basic taxi dispatch system and a full Uber-like platform differ enormously. The questions below determine whether you need surge pricing, ride pooling (shared rides), scheduled rides, multi-modal transport, and how sophisticated the matching algorithm needs to be.
Matching Model
- Single rider β single driver? Or ride pooling?
- Match by nearest or by optimal ETA?
- Driver can reject? Or auto-assigned?
- Scheduled rides (future) or real-time only?
Pricing & Economics
- Fixed pricing or dynamic (surge)?
- Upfront fare estimate before ride?
- Multiple vehicle types (economy, premium, XL)?
- Tips, cancellation fees, toll handling?
Geography & Routing
- Urban only or suburban/rural too?
- ETA based on straight-line or real road routing?
- Traffic-aware ETA?
- Multi-city / global deployment?
Ride pooling changes the matching problem from O(N) to NP-hard. Single matching is: find nearest available driver. Pooled matching is: given multiple riders going in similar directions, find a driver whose route can serve all of them with acceptable detour. This is a constraint optimization problem that requires fundamentally different algorithms (not just nearest-neighbor).
For This Case Study, Our Answers Are:
- Ride type: single rider β single driver (no pooling β keeps matching O(N))
- Match strategy: nearest driver by ETA (not pure distance)
- Driver rejection: yes β driver can reject, 15s timeout, cascade to next
- Scheduled rides: no β real-time only
- Pricing: dynamic surge pricing per geographic zone
- Fare estimate: yes β shown upfront before rider confirms
- Vehicle types: 3 tiers (economy, comfort, premium) β separate matching pools
- ETA: road-routing based (not straight-line), traffic-aware
- Geographic scope: multi-city global deployment β partitioned by city
- Location freshness SLA: driver position must be <5 seconds stale at time of match
- Single vs pooled rides: pooling is exponentially harder to match optimally.
- Surge pricing requires real-time supply/demand tracking per geographic zone.
- ETA accuracy depends on road routing + traffic data β not just distance.
- Driver rejection adds timeout handling and fallback to next-best driver.
Naive Design
The simplest design: store all driver locations in a PostgreSQL table with PostGIS. When a rider requests a ride, run a spatial query: SELECT * FROM drivers WHERE status='available' ORDER BY ST_Distance(location, rider_point) LIMIT 10. This scans the table, computes distances, sorts, and returns nearest drivers. Works for 100 drivers. With 5M active drivers updating every 4 seconds, the table is being written 1.25M times/sec while being queried for nearest neighbors β the database cannot handle both simultaneously.
What Works
- Simple β one table, one query, standard SQL
- PostGIS spatial indexes handle small datasets well
- ACID guarantees on driver status
- Fine for <10K drivers in a single city
What Breaks
- 1.25M location writes/sec β write throughput exceeded
- Spatial index constantly rebuilt as drivers move β index thrashing
- 230 match queries/sec each scanning millions of rows
- Single database = SPOF + no geographic distribution
- ETA not computable from SQL (needs routing engine)
- SQL + PostGIS: works for small scale. Fails at 1.25M writes/sec + spatial queries.
- Index thrashing: spatial index rebuilt constantly as every driver moves.
- No separation between hot path (matching) and warm path (location ingestion).
Refined Design
The refined design separates concerns: a high-throughput location service ingests 1.25M updates/sec into an in-memory geospatial index (geohash-based grid or S2 cells). A separate matching service queries this index to find nearby candidates, then scores them by ETA (via routing service). A dispatch service sends the match offer to the selected driver and handles accept/reject/timeout. The ride lifecycle (request β match β pickup β trip β dropoff) is managed as a state machine.
Location Ingestion
- Drivers send GPS every 3-5 seconds via WebSocket/MQTT
- Location service writes to in-memory geospatial index
- Index partitioned by geographic region (city/zone)
- S2 cells or geohash for spatial bucketing
- Old locations expire (TTL 30s) β no explicit delete needed
Matching Flow
- 1. Rider requests ride β matching service receives (lat, lng, type)
- 2. Query geo index: all available drivers within 3km radius
- 3. For top-10 candidates: compute ETA via routing service
- 4. Score: ETA Γ 0.7 + rating Γ 0.2 + direction Γ 0.1
- 5. Dispatch to best driver. 15s timeout β try next.
The geospatial index must be in-memory and partitioned by city/region. You cannot query a global index for "nearest drivers in San Francisco" when the index also contains drivers in Tokyo. Partition by city. Within a city, use S2 cells (Google's spherical geometry library) or geohash prefixes to bucket drivers into geographic tiles. Querying "nearby" = check the rider's cell + adjacent cells. This is O(1) per query regardless of total driver count.
- Location service: 1.25M updates/sec into in-memory geospatial index (S2 cells).
- Matching: query nearby cells β get candidates β score by ETA + rating β dispatch.
- Partitioned by city/region: each city is an independent matching domain.
- Dispatch with timeout: 15s to accept, then cascade to next driver.
- Match latency target: <3 seconds from request to driver notification.
Alternative Approaches
- Match each rider to their nearest available driver immediately
- Simple, fast β <1 second per match
- Locally optimal: best for THIS rider right now
- Globally suboptimal: may leave another rider with no drivers nearby
- Good for: low demand, or when supply >> demand
- Used by: early Uber, lower-traffic markets
- Collect ride requests for 2-5 seconds, batch match all at once
- Optimization: minimize total wait time across ALL riders in batch
- Globally optimal: better aggregate experience
- Higher latency: rider waits 2-5 seconds before matching starts
- Complex: Hungarian algorithm or linear programming
- Used by: Uber (high-demand), Lyft (peak hours)
- Encode lat/lng into string prefix. Nearby = shared prefix.
- Simple to implement β Redis sorted sets by geohash
- Non-uniform cell sizes near poles (distortion)
- Edge cases at cell boundaries (neighbors need special handling)
- Good for: most applications, simple infrastructure
- Used by: MongoDB geospatial, Elasticsearch
- Spherical geometry β divides earth into hierarchical cells
- Uniform coverage: no distortion at poles or equator
- Precise containment and intersection queries
- More complex to implement β requires S2 library
- Good for: precision-critical, global-scale geospatial
- Used by: Google Maps, Uber H3 (Hexagonal variant)
Uber's switch from geohash to H3 hexagonal cells solved a structural problem. Rectangular geohash cells have irregular adjacency β a cell at the edge of "9q8y" is much closer to the center of "9q8z" than to the center of its own cell. Hexagonal grids (H3) have a key property: every cell's center is equidistant from all 6 neighboring cell centers. This eliminates boundary artifacts and makes surge pricing per-cell more accurate β a hexagon's area doesn't abruptly end at a street corner.
- Greedy: fast, simple, locally optimal. Good for low demand.
- Batch matching: globally optimal, higher latency. Good for high demand.
- Geohash: simple, Redis-friendly. Non-uniform near poles.
- S2/H3: uniform cells, precision. Used by Uber and Google at global scale.
What Real Companies Did
Uber
- H3: custom hexagonal grid system for geospatial indexing
- DISCO: dispatch optimization (batch matching at scale)
- Ringpop: consistent hashing for partitioning by city
- Schemaless: custom distributed database for trip data
- Surge pricing computed per H3 hexagon cell
Lyft
- S2 cells for geospatial indexing
- Custom matching with ML-based scoring
- Shared rides: constraint optimization for route compatibility
- Envoy service mesh for inter-service communication
- Predictive positioning: suggest drivers where demand will appear
Grab (Southeast Asia)
- Multi-modal: cars, bikes, tuk-tuks β different matching
- Optimized for dense urban + rural (varied road quality)
- Real-time traffic from driver GPS data (crowdsourced)
- Kafka for all location event streaming
- GrabPlatform: shared infrastructure across food/ride/deliver
DiDi (China)
- 600M users, 30M rides/day (peak)
- AI-based demand prediction per zone (15-min windows)
- Order dispatch: 3-second batch windows, global optimization
- Handles extreme density: 100K+ concurrent drivers in single city
- Real-time repositioning suggestions for idle drivers
- Uber: H3 hexagonal grid, DISCO batch optimization, Ringpop partitioning.
- Lyft: S2 cells, ML-based matching, predictive driver positioning.
- Grab: multi-modal, crowdsourced traffic, shared platform across verticals.
- DiDi: 30M rides/day, AI demand prediction, 3-second batch dispatch.
Best Practices Extracted
Geospatial Partitioning
- Partition data by geographic region (city/zone)
- Each partition serves independently β no cross-city queries
- Scale horizontally: add servers per city as demand grows
- Isolation: outage in one city doesn't affect others
- Transfers to: delivery, food ordering, fleet management
State Machine Lifecycle
- Ride states: requested β matched β pickup β in_transit β completed
- Each transition has timeout + fallback
- Audit log of every state change (dispute resolution)
- Exactly-once state transitions (idempotent)
- Transfers to: order fulfillment, workflow engines
Supply/Demand Balancing
- Real-time supply (available drivers) per zone
- Predicted demand (ride requests) per zone per time window
- Imbalance β surge pricing (demand > supply) or repositioning (supply > demand)
- Heat maps for driver repositioning guidance
- Transfers to: any marketplace with dynamic pricing
- Geo-partition: data and compute partitioned by city. Independent scaling and failure isolation.
- State machine: ride lifecycle with timeouts and fallbacks at every transition.
- Supply/demand: real-time tracking per zone enables surge pricing and repositioning.
- Simultaneous offers: send dispatch to top-3 drivers at once. First to accept wins. Eliminates cascade rejection delay.
What Could Go Wrong
Ghost Drivers
- Driver app crashes β no location updates β still shows as available
- Rider matched to "ghost" β dispatch sent but never accepted
- 15s timeout β retry β another ghost β 45 seconds wasted
- Fix: TTL on location records (30s). No update = marked offline. Heartbeat requirement for "available" status.
Dispatch Cascade Failure
- Best driver rejects β send to 2nd β they reject β 3rd...
- 5 rejections = 75 seconds elapsed β rider cancels
- All nearby drivers busy rejecting offers from other riders
- Fix: batch matching (assign optimally in one shot), timeout reduction (10s), simultaneous offers to multiple drivers (first to accept wins).
GPS Drift and Inaccuracy
- Urban canyons: GPS shows driver in wrong street (50-100m off)
- ETA computed to wrong location β driver arrives but rider doesn't see them
- Matching selects "nearby" driver who is actually on the highway above
- Fix: snap GPS to road network, use multi-point averaging, consider altitude/road level.
Surge Pricing Oscillation
- Demand spike β surge 3x β riders stop requesting β demand drops
- Surge removed β riders flood back β surge again
- Oscillating surge confuses both riders and drivers
- Fix: smooth surge transitions (ramp up/down over minutes, not seconds), minimum surge duration, demand prediction (anticipate vs react).
The most damaging failure is matching a rider to a ghost driver. The rider waits, nothing happens, and they leave for a competitor. This single failure mode has caused more user churn than any other issue in ride-sharing. The fix is simple: aggressive TTL on driver availability. If no heartbeat in 30 seconds, the driver is dead to the matching system. Better to have fewer candidates than to match to a ghost.
- Ghost drivers: TTL on availability. No heartbeat = offline. Never match to stale data.
- Cascade rejection: batch matching or simultaneous offers. Don't serial-dispatch.
- GPS drift: snap to road network. Don't trust raw GPS in urban environments.
- Surge oscillation: smooth transitions + minimum duration. Predict, don't react.