System Design Notes All designs

Location & Matching

Uber — Ride Hailing

A rider enters pickup + destination, gets a fare estimate, and requests a ride; the system matches a nearby driver who accepts. Hard problems: absorbing a firehose of driver location writes, low-latency proximity search, and consistent 1:1 matching so a driver is never double-booked.

Requirements

Functional

Non-functional

CAP stance

The matching path chooses CP (fail-and-retry rather than double-assign). Fare-estimate, location-ingest, and notifications choose AP.

Scale & back-of-the-envelope

Quantity Value Notes
Total / active drivers 6M / 3M concurrent online
Location interval every 5 s each active driver pings
Location writes ~600k/s 3M ÷ 5 — the dominant write load
Location DB throughput 100k–1M TPS proximity reads + writes

The 600k writes/s must never touch the transactional DB — it lands in an in-memory geospatial store (Redis, ~300 MB hot state). Reads push toward 1M TPS → shard by geography. Matching is regionally bursty → shard/scale per region and queue requests.

API design

POST  /ride/fare-estimate   { source, destination }   -> { rideId, fare, ETA }   # creates Ride
PATCH /ride/request         { rideId }                # enqueues for matching
POST  /location/update      { lat, long }             -> 202   # async, ~every 5s
PATCH /ride/driver/accept   { rideId, accept }
PATCH /ride/driver/update   { rideId, status }        -> next nav target | null

/location/update returns 202 immediately and is processed asynchronously — it must never block on the transactional DB.

High-level design

flowchart LR
    Rider["Rider App"]
    Driver["Driver App"]
    GW["API Gateway: auth, SSL, rate limit"]
    RS["Ride Service: fare estimation"]
    MAP["3rd Party Mapping"]
    Q["Ride Request Queue"]
    RMS["Ride Matching Service"]
    LS["Location Service"]
    LDB[("Location DB - Redis geohash")]
    PDB[("Primary DB - DynamoDB")]
    LOCK[("Driver Lock - TTL")]
    NOTIF["Notification - APN/FCM"]
    Rider -->|fare est / request| GW
    Driver -->|location / accept| GW
    GW -->|fare, accept| RS
    GW -->|requestRide| Q
    GW -->|updateLocation| LS
    RS -->|ETA, route| MAP
    RS -->|read/write ride| PDB
    Q -->|dequeue| RMS
    RMS -->|getDriverLocations| LDB
    RMS -->|lock driverId| LOCK
    RMS -->|notify driver| NOTIF
    LS -->|write location| LDB
    NOTIF -->|push| Driver
      
sequenceDiagram
    participant R as Rider
    participant RS as Ride Service
    participant Q as Request Queue
    participant RMS as Matching Service
    participant LDB as Location DB
    participant LK as Driver Lock
    participant D as Driver
    R->>RS: POST /ride/fare-estimate
    RS-->>R: fare, ETA, rideId
    R->>Q: PATCH /ride/request
    Q->>RMS: dequeue ride
    RMS->>LDB: getDriverLocations(geo)
    LDB-->>RMS: nearby drivers ranked
    loop until match or 1 min timeout
        RMS->>LK: lock(driverId, TTL)
        RMS->>D: notify ride offer
        D->>RMS: accept
    end
    RMS-->>R: driver assigned, ETA
      

Deep dive · geospatial indexing

Scanning all 3M drivers per query is O(n) and impossible at 1M TPS. A spatial index turns proximity search into a key lookup over a few cells.

Criterion Geohash (chosen) Quadtree H3
Implementation Simplest (Redis built-in) Complex Moderate (library)
Density adaptivity Fixed grid Adaptive Multi-resolution
Write cost (600k/s) Cheapest Expensive (rebalance) Cheap
Neighbor accuracy Corner anomalies Good Best (6 equal)

Decision: Geohash on Redis (GEOADD/GEOSEARCH) — meets 1M TPS with simple ops and easy geographic sharding. Query the rider's cell + its 8 neighbors and re-rank by true distance/ETA to remove boundary effects. Production Uber uses H3 (hexagons → 6 equidistant neighbors) — the strongest "evolve the design" answer.

Deep dive · high-frequency location writes

~600k writes/s is the system's heartbeat:

Trade-off: in-memory + async = blistering throughput but weaker durability — acceptable because live position is ephemeral and self-healing; only committed ride state needs DynamoDB durability.

Deep dive · matching & avoiding double-booking

A region-partitioned queue absorbs surge bursts and decouples request latency from matching. The offer loop reserves one driver at a time with a TTL lock:

while (noMatch) {
  driver = nextDriver        // best-ranked candidate
  lock(driver)               // conditional PutItem, with TTL
  sendNotification(driver)   // push the offer
  wait(10s)                  // accept -> break; decline/timeout -> next
}  // bounded by the < 1 min match-or-fail SLA
flowchart TD
    A["Dequeue ride request"] --> B["getDriverLocations from Redis"]
    B --> C{"Drivers found?"}
    C -->|No| F["Mark ride Failed"]
    C -->|Yes| D["driver = nextDriver"]
    D --> E{"Acquire driver lock - TTL?"}
    E -->|Lock held| L
    E -->|Lock acquired| G["sendNotification(driver)"]
    G --> H["wait 10s"]
    H --> I{"Accepted?"}
    I -->|Yes| J["Assign ride, status = matched"]
    I -->|No or timeout| K["Lock auto-expires via TTL"]
    K --> L{"More drivers and within 1 min?"}
    L -->|Yes| D
    L -->|No| F
    J --> M["Driver status = in_ride"]
      

Two invariants: never send more than one request per ride, and never send a driver more than one request at a time. A Driver Lock keyed by driverId with TTL in DynamoDB acquired via conditional write (attribute_not_exists) gives atomic compare-and-set; the TTL auto-releases if the matcher crashes or the window lapses (no stranded driver). On accept, a TransactWriteItems commits ride + driver status together. Sequential + lock trades a little latency for the 1:1 correctness the NFR demands (vs broadcast, which reintroduces the double-accept race).

Data model & ride lifecycle

stateDiagram-v2
    [*] --> FareEstimated
    FareEstimated --> Requested: rider confirms
    Requested --> Matching: enqueued
    Matching --> Matched: driver accepts
    Matching --> Failed: timeout or no driver
    Matched --> EnRoute: heading to pickup
    EnRoute --> InRide: pickedup
    InRide --> Completed: droppedoff
    Completed --> [*]
    Failed --> [*]
      
Primary DB (DynamoDB)
  Ride    id PK; riderId, driverId?, fare, ETA, source, destination, status
  Driver  id PK; status: available | in_ride | offline
  DriverLock  driverId PK; rideId, ttl (epoch -> DynamoDB TTL auto-delete)
Location DB (Redis)
  GEOADD region driverId long lat   # per-key TTL; GEOSEARCH BYRADIUS

Surge pricing (extension)

A pricing component watches per-cell demand:supply (open requests vs available drivers in Redis) and applies a multiplier on the fare-estimate path, so the rider sees the surged price before requesting — and it naturally throttles demand during peaks.