System Design Notes

Commerce & Booking

Ticketmaster — Ticketing

Browse and search events, view a seat map, and buy tickets at ~100M DAU. The hard part is selling a finite, contended inventory of seats correctly during massive on-sale surges without ever double-booking, while keeping browse/search fast and always available.

Requirements

Functional

Non-functional

Two systems glued together

Discovery (search + view) is AP — Elasticsearch + Redis + CDN. Transaction (reserve + book) is CP — Postgres source of truth + Redis lock. Designing them separately keeps a flash-sale write storm from taking down search.

Scale & back-of-the-envelope

API design

GET  /search?term&location&type&date        -> Partial<Event>[]
GET  /events/{id}                           -> Event & Venue & Performer & Ticket[]
POST /bookings/reserve   { ticketId }       -> 200 { reservationId, expiresAt } | 409
POST /bookings/confirm   { ticketId, paymentDetails }  -> 200 { orderId } | 402 | 410

Both reserve and confirm are idempotent, keyed by a client-supplied idempotency key, because clients will inevitably retry during a surge. Both sit behind the waiting queue and gateway rate limiting.
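A minimal sketch of how an idempotency key could make reserve safe to retry. The `ReserveHandler` class, the in-memory dictionaries, and the response shape are illustrative assumptions, not the design's actual implementation; a real service would keep the key-to-response mapping in a shared store with an expiry.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    status: int
    body: dict


class ReserveHandler:
    """Hypothetical handler: replays the first response for a repeated key."""

    def __init__(self) -> None:
        self._responses: dict[str, Response] = {}  # idempotency key -> first response
        self._held: set[str] = set()               # ticket ids currently held

    def reserve(self, idempotency_key: str, ticket_id: str) -> Response:
        # A retry with the same key gets the original outcome,
        # not a second reservation attempt.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]

        if ticket_id in self._held:
            response = Response(409, {"error": "seat unavailable"})
        else:
            self._held.add(ticket_id)
            response = Response(200, {"reservationId": str(uuid.uuid4())})

        self._responses[idempotency_key] = response
        return response
```

With this shape, a client that times out and resends the same key receives the same `reservationId` rather than a 409 caused by its own earlier request.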

High-level design

flowchart LR
    Client["Client"] --> CDN["CDN"]
    CDN --> GW["API Gateway"]
    GW --> SRCH["Search Service"]
    GW --> EC["Event CRUD Service"]
    GW --> VQ["Virtual Waiting Queue"]
    SRCH --> ES["Elasticsearch"]
    SRCH --> CACHE["Redis Cache"]
    EC --> CACHE
    EC --> DB[("Postgres")]
    DB -->|CDC| ES
    VQ --> BK["Booking Service"]
    BK --> LOCK["Ticket Lock Redis TTL 10m"]
    BK --> DB
    BK --> STRIPE["Stripe"]
      

The AP discovery plane (CDN → Redis → Elasticsearch, fed by CDC from Postgres) serves search + view; the CP transaction plane (queue → Redis lock → Postgres → Stripe) serves booking.

Deep dive · seat hold (distributed lock + TTL)

Thousands of users can target the same seat in the same second, so a naive read-then-write races. On reserve(ticketId) the Booking Service writes a Redis lock {ticketId: userId} NX EX 600: a per-seat lock whose 10-minute TTL bounds the checkout window and expires on its own (no cron sweeper) if the buyer doesn't confirm.

sequenceDiagram
    participant U as User
    participant B as Booking Service
    participant L as Lock Redis
    participant DB as Postgres
    participant S as Stripe
    U->>B: reserve(ticketId)
    B->>L: SET lock ticketId=userId NX EX 600
    alt lock acquired
        L-->>B: OK
        B->>DB: UPDATE status=reserved WHERE available
        B-->>U: held, 10 min countdown
    else lock taken
        L-->>B: nil
        B-->>U: 409 seat unavailable
    end
    U->>B: confirm(ticketId, paymentDetails)
    B->>S: charge(paymentDetails)
    S-->>B: success
    B->>DB: UPDATE status=booked WHERE reserved
    B->>L: DEL lock if owner
    B-->>U: booking confirmed
      

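The lock value is the userId, so only the owner can release it. A Lua compare-and-delete makes the ownership check and the delete atomic, which avoids deleting a lock that expired and was re-acquired by another user. If the TTL is too short, users lose seats mid-payment; if it is too long, abandoned holds starve inventory. The Redis lock sheds contention, but the DB conditional update is the final arbiter.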
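The hold semantics can be sketched without a Redis server. The class below is an in-memory stand-in (an assumption for illustration) that mimics `SET key value NX EX ttl` and an owner-checked release; `InMemorySeatLocks`, its method names, and the injectable clock are hypothetical. The Lua string shows the commonly used compare-and-delete script for reference only and is not executed here.

```python
import time
from typing import Callable, Optional

# Reference only: a script a deployment might run via EVAL so that the
# ownership check and the delete happen atomically on the Redis side.
RELEASE_IF_OWNER_LUA = """
if redis.call('GET', KEYS[1]) == ARGV[1] then
    return redis.call('DEL', KEYS[1])
end
return 0
"""


class InMemorySeatLocks:
    """Stand-in for Redis: per-seat lock with TTL and owner-only release."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # ticket_id -> (owner user_id, expiry timestamp)
        self._locks: dict[str, tuple[str, float]] = {}

    def _owner(self, ticket_id: str) -> Optional[str]:
        entry = self._locks.get(ticket_id)
        if entry is None:
            return None
        owner, expires_at = entry
        if self._clock() >= expires_at:
            # Lazily drop an expired hold, like a TTL firing.
            del self._locks[ticket_id]
            return None
        return owner

    def acquire(self, ticket_id: str, user_id: str, ttl_seconds: float = 600) -> bool:
        # NX semantics: succeed only if nobody currently holds the seat.
        if self._owner(ticket_id) is not None:
            return False
        self._locks[ticket_id] = (user_id, self._clock() + ttl_seconds)
        return True

    def release(self, ticket_id: str, user_id: str) -> bool:
        # Compare-and-delete: only the current owner may release.
        if self._owner(ticket_id) != user_id:
            return False
        del self._locks[ticket_id]
        return True
```

Injecting the clock makes the expiry path easy to exercise: after the TTL passes, a second user can acquire the seat, and a late release from the original holder is refused instead of deleting the new owner's lock.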

Deep dive · strong consistency for booking

The database guarantees correctness at the moment of truth via a conditional compare-and-set inside a transaction:

BEGIN;
UPDATE tickets SET status='booked', user_id=:userId
 WHERE id=:ticketId AND status='reserved' AND user_id=:userId;
-- affected rows must equal the number of seats requested (1 for this single-ticket form); otherwise ROLLBACK and refund
COMMIT;

Row-level locking plus the WHERE status=... predicate means at most one writer wins for a given seat, independent of Redis. The payment flow is a saga: reserve → charge → mark booked. If Stripe succeeds but the DB write fails, reconcile via an outbox and idempotent retries; if the hold expired in the meantime, void or refund the charge. Partition tickets by eventId so unrelated events don't contend.
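The conditional update can be tried end to end with SQLite from the Python standard library standing in for Postgres (an assumption made so the sketch runs without a server). The table layout, `make_db`, and `confirm_booking` are hypothetical names; the point is only that the affected-row count decides whether the booking won.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory tickets table with one seat held by 'alice'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tickets ("
        "id INTEGER PRIMARY KEY, status TEXT NOT NULL, user_id TEXT)"
    )
    conn.execute("INSERT INTO tickets VALUES (1, 'reserved', 'alice')")
    conn.commit()
    return conn


def confirm_booking(conn: sqlite3.Connection, ticket_id: int, user_id: str) -> bool:
    """Flip reserved -> booked only if this user still holds the seat."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE tickets SET status = 'booked' "
            "WHERE id = ? AND status = 'reserved' AND user_id = ?",
            (ticket_id, user_id),
        )
        # Zero affected rows means the predicate failed: the seat is not
        # reserved by this user, so the caller should void or refund.
        return cur.rowcount == 1
```

A caller would treat `False` as the signal to void or refund the charge, and a repeated confirm for an already booked seat also returns `False` because the status predicate no longer matches.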

Deep dive · virtual waiting queue

6,600+ reserve QPS against one event would melt the lock/DB and reward the fastest bot. A waiting queue between the gateway and Booking Service converts a write storm into a controlled trickle.

flowchart TD
    U["User clicks Buy"] --> GW["API Gateway"]
    GW --> Q["Virtual Waiting Queue"]
    Q --> POS["Assign position + token"]
    POS --> ADMIT{"Admitted?"}
    ADMIT -->|no, stay queued| POS
    ADMIT -->|yes, throttled admit| BK["Booking Service"]
    BK --> LOCK["Acquire seat lock + reserve"]
      

Users enqueue (a Redis sorted set scored by enqueue time, or a Kafka partition per event) and receive a token plus a live position; a dispatcher admits N users/sec, with N sized to remaining inventory and lock/DB capacity. This provides fairness (FIFO, or a randomized lottery to defeat refresh-spam), backpressure for the booking path, and a visible place in line for users rather than repeated errors. Slightly over-admit, since some people abandon checkout.
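A rough sketch of the FIFO variant, using an in-memory heap ordered by enqueue time as a stand-in for a Redis sorted set (where `ZADD`, `ZRANK`, and `ZPOPMIN` would play the roles of enqueue, position, and admit). `WaitingQueue` and its methods are hypothetical names, and the dispatcher that calls `admit` on a timer is left out.

```python
import heapq
import itertools
import uuid
from typing import Optional


class WaitingQueue:
    """In-memory FIFO waiting room: earliest enqueue time is admitted first."""

    def __init__(self) -> None:
        # Heap of (enqueue_time, sequence, token); the sequence breaks ties
        # so users with identical timestamps keep arrival order.
        self._heap: list[tuple[float, int, str]] = []
        self._seq = itertools.count()

    def enqueue(self, enqueue_time: float) -> str:
        token = uuid.uuid4().hex
        heapq.heappush(self._heap, (enqueue_time, next(self._seq), token))
        return token

    def position(self, token: str) -> Optional[int]:
        # 1-based place in line, or None if the token is not waiting.
        for index, (_, _, waiting) in enumerate(sorted(self._heap), start=1):
            if waiting == token:
                return index
        return None

    def admit(self, batch_size: int) -> list[str]:
        # Called by a dispatcher once per tick; batch_size is the throttle.
        admitted: list[str] = []
        while self._heap and len(admitted) < batch_size:
            admitted.append(heapq.heappop(self._heap)[2])
        return admitted
```

The batch size passed to `admit` is where the sizing decision lives: it would be derived from remaining inventory and lock/DB capacity, padded slightly for expected checkout abandonment. A lottery variant could assign a random score at enqueue time instead of the timestamp.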