System Design Notes

Core Systems

LeetCode — Online Judge

Browse problems, write a solution in an in-browser editor, submit, and get feedback after the code runs against hidden tests inside secure, isolated runtimes — while powering timed contests (90 min, 10 problems, up to 100k users) with a fresh live leaderboard.

Requirements

Functional

Non-functional

Scale & back-of-the-envelope

Metric | Value | Derivation / notes
DAU / accounts | 100k / 5M | given
Problems | ~3k | static, cache aggressively
Contest | 90 min, 10 problems | given
Submissions/contest | ~3M | 100k × 10 × ~3 attempts; ~3M / 5,400 s ≈ 556/s average
Start burst | several k/s | everyone attacks problem #1
Leaderboard reads | ~20k/s | 100k clients polling every ~5 s; serve from cache

The bottlenecks are the execution fleet (CPU-bound, running untrusted code) and the leaderboard read fan-out, rather than raw storage.
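The figures in the table can be re-derived with a few lines of arithmetic. This is only a sketch of the estimate: the constant names are illustrative, and the three attempts per problem is the rough average assumed in the table, not a measured value.

```python
# Back-of-the-envelope figures for one contest (inputs taken from the table).
CONTEST_USERS = 100_000
PROBLEMS = 10
ATTEMPTS_PER_PROBLEM = 3       # rough average assumed in the table
CONTEST_SECONDS = 90 * 60      # 90 minutes = 5,400 s
POLL_INTERVAL_SECONDS = 5      # each client polls the leaderboard every ~5 s

submissions = CONTEST_USERS * PROBLEMS * ATTEMPTS_PER_PROBLEM
avg_submit_rate = submissions / CONTEST_SECONDS
leaderboard_read_rate = CONTEST_USERS / POLL_INTERVAL_SECONDS

print(f"submissions per contest: {submissions:,}")
print(f"average submit rate:     {avg_submit_rate:.0f}/s")
print(f"leaderboard reads:       {leaderboard_read_rate:,.0f}/s")
```

The average hides the start burst: the execution fleet has to be sized for the first minutes of the contest, not for the 90-minute mean.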

API design

GET  /problems?category&difficulty&page&size   -> Problem[]
GET  /problems/:id                             -> Problem (stubs, samples, description)
POST /problems/:id   { code, language, competitionId? }
                                               -> 200 { submissionId }   // async
GET  /problems/:id/submission/:submissionId    -> Submission (poll or push)
GET  /leaderboard/:competitionId?page&size     -> Leaderboard

Submit returns immediately with a submissionId; execution is asynchronous (poll or upgrade to WebSocket/SSE). completionTime is stamped server-side for a fair tie-break.
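A minimal sketch of that submit-then-poll contract follows. It is not the actual server code: the in-memory dict and `queue.Queue` stand in for the database and SQS, and the function names (`submit`, `record_result`, `get_submission`) are hypothetical.

```python
import queue
import time
import uuid

submissions = {}             # stand-in for the Submissions table
work_queue = queue.Queue()   # stand-in for the SQS queue


def submit(problem_id, user_id, code, language, competition_id=None):
    """Persist a pending submission, enqueue its id, and return immediately."""
    submission_id = str(uuid.uuid4())
    submissions[submission_id] = {
        "problemId": problem_id,
        "userId": user_id,
        "code": code,
        "language": language,
        "competitionId": competition_id,
        "status": "pending",
        "passed": None,
        "completionTime": None,
    }
    work_queue.put(submission_id)
    return {"submissionId": submission_id}


def record_result(submission_id, passed):
    """Called once execution finishes; the timestamp is set on the server."""
    row = submissions[submission_id]
    row["status"] = "done"
    row["passed"] = passed
    row["completionTime"] = time.time()   # server clock, never the client's


def get_submission(submission_id):
    """What the polling endpoint would return."""
    row = submissions[submission_id]
    return {key: row[key] for key in ("status", "passed", "completionTime")}
```

A client would call `submit(...)`, keep the returned `submissionId`, and poll `get_submission` until `status` changes from `"pending"` to `"done"`.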

High-level design

A stateless Primary Server writes a pending submission and enqueues it on SQS; a Worker runs the code in the correct per-language Docker runtime and writes the result back; the DB streams changes via CDC into a Redis sorted set that backs the leaderboard.

flowchart LR
    Client["Client (Monaco IDE)"]
    LB["API Gateway / LB"]
    Server["Primary Server"]
    Queue["AWS SQS"]
    Worker["Worker (sandbox host)"]
    DB[("Database")]
    Redis[("Redis Sorted Set")]
    subgraph Runtimes["Docker Containers"]
        Java["Java Runtime"]
        Py["Python Runtime"]
        JS["JavaScript Runtime"]
    end
    Client -->|"GET problems / POST submit / GET leaderboard"| LB
    LB --> Server
    Server -->|"read / write"| DB
    Server -->|"enqueue"| Queue
    Queue --> Worker
    Worker -->|"run code"| Runtimes
    Runtimes -->|"stdout / exit"| Worker
    Worker -->|"write result"| Server
    DB -->|"CDC"| Redis
    Server -->|"read rank"| Redis
      
sequenceDiagram
    participant C as Client
    participant S as Primary Server
    participant Q as SQS Queue
    participant W as Worker
    participant R as Runtime Sandbox
    participant DB as Database
    participant Z as Redis Leaderboard
    C->>S: POST submit code and language
    S->>DB: insert Submission pending
    S->>Q: enqueue submissionId
    S-->>C: 200 submissionId
    Q->>W: deliver message
    W->>R: run code against test cases
    R-->>W: results pass fail and runtime
    W->>S: write submission result
    S->>DB: update Submission passed
    DB->>Z: CDC then ZADD score
    C->>S: GET submission status poll
    S-->>C: Submission result
      

Deep dive · secure sandboxing (the core problem)

Running untrusted code safely is the defining challenge. The runtime checklist:

flowchart TD
    A["Untrusted submission"] --> B["Worker host"]
    B --> C["Sandbox boundary (Firecracker microVM / gVisor)"]
    C --> D["Container read-only FS"]
    D --> E["Language runtime"]
    E --> F["Resource limits"]
    F --> G["CPU + memory cgroups"]
    F --> H["Wall-clock timeout"]
    F --> I["Network isolation VPC"]
    F --> J["Seccomp blocks syscalls"]
      
Approach | Isolation | Startup | Notes
Process + ulimit + seccomp | Weak | ~ms | Shares the host kernel; one escape = total compromise
Docker container | Medium | ~10–100ms | Baseline shown in the diagrams; the shared kernel is a residual risk
gVisor (user-space kernel) | High | ~100–300ms | Shrinks the kernel attack surface
Firecracker microVM | Very high | ~125ms | Per-execution VM (Lambda model)

Recommended: per-language Docker images run inside gVisor or Firecracker for a true kernel boundary; use one-shot ephemeral runners (fresh sandbox per submission) with a warm pool to hide boot latency.
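As an illustration of the checklist, the sketch below builds a `docker run` command line that applies those limits. It only assembles the argument list and does not start a container. The image name, mount path, and default limits are assumptions, and `--runtime runsc` works only on hosts where gVisor is installed and registered with Docker.

```python
def sandbox_command(image, submission_dir, memory="256m", cpus="1.0",
                    runtime="runsc"):
    """Build a `docker run` argv for one throwaway, locked-down execution."""
    return [
        "docker", "run", "--rm",                  # container removed after the run
        "--runtime", runtime,                     # gVisor's runsc (assumed installed)
        "--network", "none",                      # no network access
        "--read-only",                            # read-only root filesystem
        "--tmpfs", "/tmp:rw,size=64m",            # small writable scratch space
        "--memory", memory,                       # memory cgroup limit
        "--cpus", cpus,                           # CPU cgroup limit
        "--pids-limit", "64",                     # caps process count (fork bombs)
        "--cap-drop", "ALL",                      # drop all Linux capabilities
        "--security-opt", "no-new-privileges",    # block privilege escalation
        "--volume", f"{submission_dir}:/submission:ro",
        image,
    ]


# Hypothetical image name and path, for illustration only.
cmd = sandbox_command("judge-python:latest", "/srv/submissions/123")
print(" ".join(cmd))
```

These flags do not express a wall-clock timeout; the worker would have to enforce one itself and kill the container when it expires.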

Deep dive · submission queue & worker fleet

The SQS queue decouples accepting a submission from executing it — the Primary Server stays fast even when execution is saturated, and the queue absorbs bursts.
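One consequence worth sketching: SQS standard queues deliver at least once, so a worker may see the same submission twice. The loop below is a simplified, in-memory illustration of a worker that skips submissions whose result is already recorded. `run_in_sandbox` is a placeholder, and `queue.Queue` again stands in for SQS.

```python
import queue


def run_in_sandbox(code, language):
    """Placeholder for the real sandboxed run; always reports a pass here."""
    return True


def drain(work_queue, submissions):
    """Process queued submission ids; return how many were actually executed."""
    executed = 0
    while True:
        try:
            submission_id = work_queue.get_nowait()
        except queue.Empty:
            return executed
        row = submissions[submission_id]
        if row["status"] != "pending":
            continue   # duplicate delivery: the result is already recorded
        row["passed"] = run_in_sandbox(row["code"], row["language"])
        row["status"] = "done"
        executed += 1


# The same id delivered twice is executed only once.
submissions = {"s1": {"code": "print(1)", "language": "python",
                      "status": "pending", "passed": None}}
work_queue = queue.Queue()
work_queue.put("s1")
work_queue.put("s1")
print(drain(work_queue, submissions))
```

Because workers only pull what they can handle, queue depth becomes a natural autoscaling signal for the fleet.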

Deep dive · live contest leaderboard

Rank by problems solved (desc), tie-break by fastest completion (asc). Computing this with a SQL GROUP BY on every read is too slow at 100k pollers:

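SELECT userId, COUNT(*) AS solved, MAX(firstPass) AS t
FROM (
    -- one row per solved problem, so repeat passes are not double-counted
    SELECT userId, problemId, MIN(completionTime) AS firstPass
    FROM Submissions
    WHERE competitionId = ? AND passed = TRUE
    GROUP BY userId, problemId
) AS solvedProblems
GROUP BY userId
ORDER BY solved DESC, t ASC;     -- correct but re-scans on every read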

Instead, materialize into a Redis sorted set leaderboard:competitionId, kept fresh by CDC from the DB (so the DB stays the source of truth and there's no dual-write risk). Encode both sort keys into one score so problem count dominates time:

score = passedProblems * 1e7 - lastPassTimeInSeconds   // seconds since contest start: at most 5400, far below 1e7
ZADD     leaderboard:competitionId  <score>  <userId>
ZREVRANGE leaderboard:competitionId 0 49 WITHSCORES    // O(log N + page)
flowchart LR
    W["Worker"] -->|"passed submission"| DB[("Submissions DB")]
    DB -->|"CDC stream"| P["Leaderboard Consumer"]
    P -->|"ZADD score"| Z[("Redis Sorted Set")]
    C["Client"] -->|"GET leaderboard paged"| API["Primary Server"]
    API -->|"ZREVRANGE"| Z
    Z -->|"top N and rank"| API
      

At 100k viewers, prefer server push (WebSocket/SSE) of the top-N + the viewer's own rank over naive polling; cache leaderboard pages with a short TTL (deep pages are near-static).
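The composite score described above can be checked with a small sketch. It models the sorted set as a plain dict and the page read as a sort, so it shows the ordering the encoding produces rather than real Redis calls. The function names and sample users are hypothetical.

```python
SCALE = 10_000_000   # 1e7, far larger than the 5,400 s contest length


def encode_score(solved, last_pass_seconds):
    """More problems always wins; among equals, the earlier finisher scores higher."""
    return solved * SCALE - last_pass_seconds


def decode_score(score):
    """Recover (solved, last_pass_seconds); valid while 0 <= seconds < SCALE."""
    solved = -(-score // SCALE)           # ceiling division
    return solved, solved * SCALE - score


def top_n(scores, n):
    """Highest score first, mirroring what ZREVRANGE 0 n-1 returns."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [user for user, _ in ranked[:n]]


scores = {
    "alice": encode_score(3, 1200),   # 3 solved, last pass at 20:00
    "bob": encode_score(3, 900),      # 3 solved, last pass at 15:00
    "carol": encode_score(2, 300),    # 2 solved, last pass at 05:00
}
print(top_n(scores, 3))               # ['bob', 'alice', 'carol']
print(decode_score(scores["alice"]))  # (3, 1200)
```

Redis stores scores as double-precision floats, which represent integers exactly up to 2^53, so values on the order of 10 × 1e7 lose no precision.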

Deep dive · thundering herd at contest start

Data model

Entity | Key fields
Problem | id (PK), name, difficulty, category, codeStubs[], testCases[] (hidden), description
Submission | competitionId (partition key), problemId, userId, testCaseResults, passed, error?, runTime, completionTime (server-stamped)
Competition | id, startTime, endTime (start+90m), problems[]
Leaderboard (Redis) | key leaderboard:competitionId, ZSET, element userId, composite score

Shard submissions by competitionId for the write-heavy contest workload; problems are small, read-mostly, cache-friendly; Redis holds the hot derived leaderboard.
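For concreteness, two of the entities could be written as Python dataclasses along the lines below. The field types, the millisecond unit for the run time, and the snake_case names are assumptions made for this sketch; only the field list comes from the table.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

CONTEST_LENGTH = timedelta(minutes=90)


@dataclass
class Competition:
    id: str
    start_time: datetime
    problems: list[str] = field(default_factory=list)   # problem ids

    @property
    def end_time(self) -> datetime:
        # endTime is derived (start + 90 min) rather than stored separately
        return self.start_time + CONTEST_LENGTH


@dataclass
class Submission:
    competition_id: Optional[str]      # partition key; None outside contests
    problem_id: str
    user_id: str
    test_case_results: list[bool]
    passed: bool
    run_time_ms: float                 # unit is an assumption
    completion_time: datetime          # stamped by the server
    error: Optional[str] = None


contest = Competition("weekly-1", datetime(2024, 1, 1, 12, 0))
print(contest.end_time)
```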

Summary

Stateless fast API tier → all untrusted execution behind SQS into an autoscaling, strongly-isolated worker fleet → near-real-time leaderboard from a Redis sorted set fed by CDC. Satisfies the four pillars: availability >> consistency, secure isolation, 100k-user scale, fresh leaderboard.