Commerce & Booking
Ticketmaster — Ticketing
Browse and search events, view a seat map, and buy tickets at ~100M DAU. The hard part is selling a finite, contended inventory of seats correctly during massive on-sale surges without ever double-booking, while keeping browse/search fast and always available.
Requirements
Functional
- Book tickets (reserve + pay for specific seats); view an event (details + seat map); search events by term, location, type, date.
Non-functional
- Strong consistency for booking (a seat sold at most once) & high availability for search/viewing.
- Read ≫ write; scale to handle surges concentrated on a single hot event.
Two systems glued together
Discovery (search + view) is AP — Elasticsearch + Redis + CDN. Transaction (reserve + book) is CP — Postgres source of truth + Redis lock. Designing them separately keeps a flash-sale write storm from taking down search.
Scale & back-of-the-envelope
- ~100M DAU → ~1B reads/day (~12k QPS avg, ~100k+ peak); ~1M bookings/day (~12 QPS avg — trivial).
- The challenge is per-event contention: a 60k-seat stadium with ~2M fans buying in 5 min ≈ ~6,600 reserve QPS all on one event (~33 buyers per seat) → motivates the waiting queue + per-seat lock.
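These estimates come from plain division: daily volume over 86,400 s, and fans over the sale window and over the seat count. The quick Python check below uses only the section's own inputs.

```python
# Back-of-the-envelope numbers from the section, recomputed.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def avg_qps(requests_per_day: float) -> float:
    """Average requests/sec for a given daily volume."""
    return requests_per_day / SECONDS_PER_DAY


def surge(fans: int, seats: int, window_s: int) -> tuple[float, float]:
    """Return (reserve QPS on one event, buyers per seat) for an on-sale surge."""
    return fans / window_s, fans / seats


read_qps = avg_qps(1_000_000_000)   # ~11,574 -> "~12k QPS avg"
booking_qps = avg_qps(1_000_000)    # ~11.6   -> "~12 QPS avg"
reserve_qps, buyers_per_seat = surge(fans=2_000_000, seats=60_000, window_s=5 * 60)

print(f"reads: {read_qps:,.0f} QPS, bookings: {booking_qps:.1f} QPS")
print(f"surge: {reserve_qps:,.0f} reserve QPS, {buyers_per_seat:.1f} buyers/seat")
```

The average load is trivial. The hard number is the roughly 6.7k QPS aimed at one event's rows.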
API design
GET /search?term&location&type&date -> Partial<Event>[]
GET /events/{id} -> Event & Venue & Performer & Ticket[]
POST /bookings/reserve { ticketId } -> 200 { reservationId, expiresAt } | 409
POST /bookings/confirm { ticketId, paymentDetails } -> 200 { orderId } | 402 | 410
reserve/confirm are idempotent (via a client-supplied idempotency key) because retries are guaranteed during a surge. Both sit behind the waiting queue and gateway rate limiting.
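The following minimal sketch shows server-side idempotency for reserve/confirm. It assumes the client sends one unique key per logical request and that the server replays the first stored response on any retry. `IdempotentHandler` and its in-memory map are hypothetical stand-ins. A real deployment would persist keys in a shared store (for example Redis or a Postgres table) with an expiry, scoped per user and endpoint.

```python
import threading
from typing import Callable, Dict, List, Tuple

Response = Tuple[int, dict]  # (HTTP status, JSON body)


class IdempotentHandler:
    """Runs a handler at most once per idempotency key and replays its response."""

    def __init__(self, handler: Callable[[dict], Response]):
        self._handler = handler
        self._responses: Dict[str, Response] = {}
        # One global lock keeps the sketch simple; a real service would lock per key.
        self._lock = threading.Lock()

    def handle(self, idempotency_key: str, body: dict) -> Response:
        with self._lock:
            cached = self._responses.get(idempotency_key)
            if cached is not None:
                return cached  # retry: replay the stored outcome, never re-execute
            response = self._handler(body)
            # Failures such as 409 are stored too, so every retry sees the same outcome.
            self._responses[idempotency_key] = response
            return response


# Usage: a hypothetical reserve handler that records how often it really runs.
calls: List[int] = []


def reserve(body: dict) -> Response:
    calls.append(body["ticketId"])
    return 200, {"reservationId": f"r-{len(calls)}"}


api = IdempotentHandler(reserve)
first = api.handle("key-abc", {"ticketId": 42})
retry = api.handle("key-abc", {"ticketId": 42})  # network retry with the same key
```

Caching the first response, rather than re-running the handler, is what makes a retry storm harmless. The downstream lock and DB see each logical request once.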
High-level design
flowchart LR
Client["Client"] --> CDN["CDN"]
CDN --> GW["API Gateway"]
GW --> SRCH["Search Service"]
GW --> EC["Event CRUD Service"]
GW --> VQ["Virtual Waiting Queue"]
SRCH --> ES["Elasticsearch"]
SRCH --> CACHE["Redis Cache"]
EC --> CACHE
EC --> DB[("Postgres")]
DB -->|CDC| ES
VQ --> BK["Booking Service"]
BK --> LOCK["Ticket Lock Redis TTL 10m"]
BK --> DB
BK --> STRIPE["Stripe"]
The AP discovery plane (CDN → Redis → Elasticsearch, fed by CDC from Postgres) serves search + view; the CP transaction plane (queue → Redis lock → Postgres → Stripe) serves booking.
Deep dive · seat hold (distributed lock + TTL)
Thousands of buyers target the same seat in the same second, so a naive read-then-write races. On reserve(ticketId), the Booking Service writes a Redis lock with SET lock:{ticketId} {userId} NX EX 600. This is a per-seat lock with a short TTL. It gives a bounded checkout window and auto-expires (no cron sweeper) if the buyer doesn't confirm.
sequenceDiagram
participant U as User
participant B as Booking Service
participant L as Lock Redis
participant DB as Postgres
participant S as Stripe
U->>B: reserve(ticketId)
B->>L: SET lock ticketId=userId NX EX 600
alt lock acquired
L-->>B: OK
B->>DB: UPDATE status=reserved WHERE available
B-->>U: held, 10 min countdown
else lock taken
L-->>B: nil
B-->>U: 409 seat unavailable
end
U->>B: confirm(ticketId, paymentDetails)
B->>S: charge(paymentDetails)
S-->>B: success
B->>DB: UPDATE status=booked WHERE reserved
B->>L: DEL lock if owner
B-->>U: booking confirmed
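The lock value is the userId, so only the owner can release it. A Lua compare-and-delete makes the ownership check and the DEL one atomic step, so a holder whose lock has expired cannot delete a lock someone else has since acquired. If the TTL is too short, users lose seats mid-payment; if it is too long, abandoned holds starve inventory. The Redis lock sheds contention, but the DB conditional update is the final arbiter.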
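The hold-and-release logic can be sketched against an in-memory stand-in for Redis, so it runs without a server. `FakeLockStore`, `set_nx_ex`, and `release_if_owner` are hypothetical names that mirror `SET key value NX EX ttl` and the Lua compare-and-delete. The injectable clock lets the example expire a hold without sleeping.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Lua compare-and-delete (the release pattern from the Redis SET docs): delete the
# key only if it still holds our value, atomically on the server.
RELEASE_LUA = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""


class FakeLockStore:
    """In-memory stand-in for Redis SET NX EX plus the Lua release script."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def _live(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:  # TTL elapsed: behave as if Redis expired it
            del self._data[key]
            return None
        return value

    def set_nx_ex(self, key: str, value: str, ttl_s: float) -> bool:
        """Like SET key value NX EX ttl: succeed only if the key is absent."""
        if self._live(key) is not None:
            return False
        self._data[key] = (value, self._clock() + ttl_s)
        return True

    def release_if_owner(self, key: str, value: str) -> bool:
        """Like RELEASE_LUA: delete only when the stored value is ours."""
        if self._live(key) == value:
            del self._data[key]
            return True
        return False


now = [0.0]  # manual clock so the example doesn't sleep
store = FakeLockStore(clock=lambda: now[0])
key = "lock:ticket-42"
alice_first = store.set_nx_ex(key, "alice", 600)           # seat held
bob_blocked = store.set_nx_ex(key, "bob", 600)             # bob gets 409
now[0] = 601.0                                             # alice's 10-min hold lapses
bob_after_expiry = store.set_nx_ex(key, "bob", 600)        # bob re-acquires
alice_late_release = store.release_if_owner(key, "alice")  # no-op: bob's lock survives
bob_release = store.release_if_owner(key, "bob")
```

Against a real server with redis-py, acquisition would be `r.set(key, user_id, nx=True, ex=600)`, and release would run `RELEASE_LUA` through `r.register_script`.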
Deep dive · strong consistency for booking
The database guarantees correctness at the moment of truth via a conditional compare-and-set inside a transaction:
BEGIN;
UPDATE tickets SET status='booked'
 WHERE id = ANY(:ticketIds) AND status='reserved' AND user_id=:userId;
-- application checks the affected-row count:
--   = number of seats requested → COMMIT;
--   otherwise                   → ROLLBACK; then void/refund the charge
COMMIT;
Row-level locking + the WHERE status=... predicate means exactly one writer wins, independent of Redis.
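To see why the predicate alone is enough, here is a sketch that uses Python's built-in `sqlite3` as a stand-in for Postgres. SQLite has no `= ANY(array)`, so the sketch uses `IN (...)`. `reserve` and `book` are hypothetical helpers. Each runs one conditional UPDATE and commits only if the affected-row count equals the number of seats requested. The calls here are sequential. In Postgres under READ COMMITTED, a concurrent UPDATE on the same row waits for the first transaction and then re-checks the WHERE clause against the committed row, so the loser matches 0 rows.

```python
import sqlite3

# SQLite stands in for Postgres; the point is the compare-and-set predicate.
conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit BEGIN/COMMIT
conn.execute("""
    CREATE TABLE tickets (
        id INTEGER PRIMARY KEY,
        event_id INTEGER NOT NULL,
        seat TEXT NOT NULL,
        status TEXT NOT NULL CHECK (status IN ('available', 'reserved', 'booked')),
        user_id TEXT,
        UNIQUE (event_id, seat)
    )""")
conn.executemany(
    "INSERT INTO tickets (id, event_id, seat, status) VALUES (?, 1, ?, 'available')",
    [(1, "A1"), (2, "A2")],
)


def _cas(sql: str, params: list, expected_rows: int) -> bool:
    """Run one conditional UPDATE; commit only if every requested row matched."""
    conn.execute("BEGIN IMMEDIATE")  # take SQLite's write lock up front
    cur = conn.execute(sql, params)
    if cur.rowcount != expected_rows:  # someone else owns at least one seat
        conn.execute("ROLLBACK")
        return False
    conn.execute("COMMIT")
    return True


def reserve(user_id: str, ticket_ids: list[int]) -> bool:
    ph = ",".join("?" * len(ticket_ids))
    return _cas(
        f"UPDATE tickets SET status = 'reserved', user_id = ? "
        f"WHERE id IN ({ph}) AND status = 'available'",
        [user_id, *ticket_ids], len(ticket_ids))


def book(user_id: str, ticket_ids: list[int]) -> bool:
    ph = ",".join("?" * len(ticket_ids))
    return _cas(
        f"UPDATE tickets SET status = 'booked' "
        f"WHERE id IN ({ph}) AND status = 'reserved' AND user_id = ?",
        [*ticket_ids, user_id], len(ticket_ids))


alice_ok = reserve("alice", [1, 2])
bob_ok = reserve("bob", [2])          # seat 2 is no longer 'available'
bob_book = book("bob", [1, 2])        # not bob's hold: 0 rows match
alice_book = book("alice", [1, 2])
statuses = [row[0] for row in conn.execute("SELECT status FROM tickets ORDER BY id")]
```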
Payment saga: reserve → charge → mark booked. If Stripe succeeds but the DB write fails, reconcile via an outbox/idempotent retry; if the hold has expired, void/refund the charge. Partition tickets by eventId so unrelated events don't contend.
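The following minimal sketch covers the saga's failure handling. The payment gateway and DB write are injected as callables, so no real Stripe client is used; all three are hypothetical. The reservation id doubles as the payment idempotency key, so a retried confirm cannot double-charge. An in-memory `outbox` list stands in for a durable retry record, and `TransientDbError` is an assumed exception type for retryable DB failures. A CAS that matches 0 rows (the hold expired) triggers a refund.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


class TransientDbError(Exception):
    """Hypothetical: a retryable DB failure (timeout, failover)."""


@dataclass
class PaymentSaga:
    charge: Callable[[str], str]        # idempotency key -> charge id
    refund: Callable[[str], None]       # charge id -> None
    mark_booked: Callable[[str], bool]  # reservation id -> True if the CAS updated rows
    outbox: List[Tuple[str, str]] = field(default_factory=list)  # paid, not yet recorded

    def confirm(self, reservation_id: str) -> str:
        charge_id = self.charge(reservation_id)  # same key on retry -> no double charge
        return self._finish(reservation_id, charge_id)

    def _finish(self, reservation_id: str, charge_id: str) -> str:
        try:
            booked = self.mark_booked(reservation_id)
        except TransientDbError:
            self.outbox.append((reservation_id, charge_id))  # reconcile later
            return "pending"
        if not booked:  # CAS matched 0 rows: hold expired or was taken over
            self.refund(charge_id)
            return "refunded"
        return "booked"

    def reconcile(self) -> List[str]:
        """Retry every outbox entry; failures re-enter the outbox for the next pass."""
        pending, self.outbox = self.outbox, []
        return [self._finish(r, c) for r, c in pending]


charges: List[str] = []
refunds: List[str] = []
db_attempts = {"r1": 0}


def charge(key: str) -> str:
    charges.append(key)
    return f"ch_{key}"


def refund(charge_id: str) -> None:
    refunds.append(charge_id)


def mark_booked(reservation_id: str) -> bool:
    if reservation_id == "r1":  # first write fails transiently, retry succeeds
        db_attempts["r1"] += 1
        if db_attempts["r1"] == 1:
            raise TransientDbError()
        return True
    return False  # r2: hold already expired


saga = PaymentSaga(charge, refund, mark_booked)
r1 = saga.confirm("r1")
r2 = saga.confirm("r2")
retried = saga.reconcile()
```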
Deep dive · virtual waiting queue
6,600+ reserve QPS against one event would melt the lock/DB and reward the fastest bot. A waiting queue between the gateway and Booking Service converts a write storm into a controlled trickle.
flowchart TD
U["User clicks Buy"] --> GW["API Gateway"]
GW --> Q["Virtual Waiting Queue"]
Q --> POS["Assign position + token"]
POS --> ADMIT{"Admitted?"}
ADMIT -->|no, stay queued| POS
ADMIT -->|yes, throttled admit| BK["Booking Service"]
BK --> LOCK["Acquire seat lock + reserve"]
Users enqueue (a Redis sorted set scored by enqueue time, or a Kafka partition per event) and receive a token plus a live position. A dispatcher admits N users/sec, sized to remaining inventory and lock/DB capacity. This provides fairness (FIFO, or a randomized lottery to defeat refresh-spam), backpressure, and a good UX. Admit slightly more users than there are seats left, since some abandon checkout.
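The following sketch of the queue and dispatcher assumes an in-process heap in place of a Redis sorted set. The comments map each method to the analogous `ZADD`/`ZRANK`/`ZPOPMIN` call. `WaitingQueue`, `batch_size`, and the 1.1 over-admit factor are hypothetical. Lottery mode scores users randomly instead of by arrival.

```python
import heapq
import itertools
import random
import secrets
from typing import Dict, List, Optional, Tuple


def batch_size(capacity_per_tick: int, unsold_seats: int, in_checkout: int,
               over_admit: float = 1.1) -> int:
    """Hypothetical sizing rule: keep shoppers in checkout near over_admit x unsold
    seats, and never admit more per tick than the lock/DB can absorb."""
    target_in_checkout = int(unsold_seats * over_admit)
    return max(0, min(capacity_per_tick, target_in_checkout - in_checkout))


class WaitingQueue:
    """Per-event queue ordered by score, modeled on a Redis sorted set."""

    def __init__(self, lottery: bool = False, rng: Optional[random.Random] = None):
        self._lottery = lottery                 # random scores defeat refresh-spam
        self._rng = rng or random.Random()
        self._seq = itertools.count()           # arrival order, also the heap tiebreak
        self._heap: List[Tuple[float, int, str]] = []  # (score, seq, user_id)
        self.tokens: Dict[str, str] = {}        # user_id -> opaque queue token
        self.admitted: Dict[str, str] = {}      # user_id -> token Booking will accept

    def enqueue(self, user_id: str) -> str:
        """Like ZADD NX: re-joining (e.g. a page refresh) keeps the original place."""
        if user_id in self.tokens:
            return self.tokens[user_id]
        seq = next(self._seq)
        score = self._rng.random() if self._lottery else float(seq)
        heapq.heappush(self._heap, (score, seq, user_id))
        self.tokens[user_id] = secrets.token_urlsafe(16)
        return self.tokens[user_id]

    def position(self, user_id: str) -> Optional[int]:
        """1-based rank, like ZRANK + 1 (a full sort here; O(log n) in Redis)."""
        for rank, (_, _, uid) in enumerate(sorted(self._heap), start=1):
            if uid == user_id:
                return rank
        return None  # never joined, or already admitted

    def admit(self, n: int) -> List[str]:
        """Dispatcher tick: pop the n lowest scores, like ZPOPMIN key n."""
        batch: List[str] = []
        for _ in range(min(n, len(self._heap))):
            _, _, user_id = heapq.heappop(self._heap)
            self.admitted[user_id] = self.tokens[user_id]
            batch.append(user_id)
        return batch


q = WaitingQueue()  # FIFO mode
for user in ["u1", "u2", "u3"]:
    q.enqueue(user)
before = q.position("u3")
first_batch = q.admit(batch_size(capacity_per_tick=2, unsold_seats=100, in_checkout=0))
after = q.position("u3")
```

The Booking Service would then accept a reserve call only if the presented token matches `admitted[user_id]`, which is what turns the storm into a metered trickle.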
Deep dive · search & seat-map freshness
Search Service → Elasticsearch with a Redis query cache for hot terms (playoff: [...], swift: [...]) and CDC from Postgres to keep the index fresh. The seat map is treated as best-effort / eventually consistent, served from Redis/CDN with optional pub/sub deltas for hot events. The authoritative conflict point is the reserve call, which keeps the read plane highly available even while inventory churns.
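The following cache-aside sketch shows the hot-term query cache, keyed like `querycache:{q}`. The normalization rule, the 30 s TTL, and `fake_es` (standing in for the Elasticsearch query) are assumptions. The short TTL is what bounds staleness in an eventually consistent read plane.

```python
import time
from typing import Callable, Dict, List, Tuple

SearchBackend = Callable[[str, Dict[str, str]], List[int]]  # (term, filters) -> event ids


class QueryCache:
    """Cache-aside: a fresh hit returns cached event ids; a miss queries the backend and stores."""

    def __init__(self, backend: SearchBackend, ttl_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._backend = backend
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[str, Tuple[List[int], float]] = {}  # key -> (ids, expires_at)

    @staticmethod
    def key(term: str, **filters: str) -> str:
        """Normalize so 'Swift ' and 'swift' share one entry; sort filters for stability."""
        parts = [term.strip().lower()] + [f"{k}={v}" for k, v in sorted(filters.items())]
        return "querycache:" + "|".join(parts)

    def search(self, term: str, **filters: str) -> List[int]:
        cache_key = self.key(term, **filters)
        now = self._clock()
        entry = self._entries.get(cache_key)
        if entry is not None and entry[1] > now:
            return entry[0]  # fresh hit
        event_ids = self._backend(term.strip().lower(), filters)  # miss or expired
        self._entries[cache_key] = (event_ids, now + self._ttl_s)
        return event_ids


calls: List[str] = []


def fake_es(term: str, filters: Dict[str, str]) -> List[int]:
    """Hypothetical stand-in for an Elasticsearch query."""
    calls.append(term)
    return [101, 102] if term == "swift" else []


now = [0.0]
cache = QueryCache(fake_es, ttl_s=30, clock=lambda: now[0])
a = cache.search("Swift ", location="NYC")  # miss -> backend
b = cache.search("swift", location="NYC")   # same normalized key -> hit
now[0] = 31.0
c = cache.search("swift", location="NYC")   # TTL elapsed -> backend again
```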
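- tickets (Postgres, PARTITION BY HASH(event_id)): id, event_id, seat, price, status (available|reserved|booked), user_id; PRIMARY KEY (event_id, id), UNIQUE (event_id, seat). Postgres requires every unique key on a partitioned table to include the partition column, so the key is (event_id, id) rather than id alone.
- Redis: lock:{ticketId}=userId EX ttl; querycache:{q}=[eventIds]; queue:{eventId}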
Guiding principle
Split into a highly-available discovery plane (CDN → Redis → Elasticsearch via CDC) and a strongly-consistent transaction plane (queue → Redis lock with TTL → Postgres compare-and-set → Stripe): strong consistency for booking, high availability for everything else, graceful behavior under surges.