Core Systems

LeetCode — Online Judge

Browse problems, write a solution in an in-browser editor, submit, and get feedback after the code runs against hidden tests inside secure, isolated runtimes — while powering timed contests (90 min, 10 problems, up to 100k users) with a fresh live leaderboard.

Requirements

Functional

View a list of problems; view a problem & code a solution in-browser.
Submit a solution & get feedback (pass/fail per test, runtime, errors).
Support competitions with a live leaderboard.

Non-functional

Availability >> consistency — the leaderboard may be slightly stale.
Security & isolation running untrusted code (hard requirement).
Scale to 100k concurrent contest users; fresh, near-real-time leaderboard.
Low-latency feedback (seconds); fairness during contests.

Scale & back-of-the-envelope

Metric	Value	Derived
DAU / accounts	100k / 5M	given
Problems	~3k	static, cache aggressively
Contest	90 min, 10 problems	given
Submissions/contest	~3M	100k × 10 × ~3 → ~555/s avg
Start burst	several k/s	everyone attacks problem #1
Leaderboard reads	~20k/s	100k polling every ~5s → serve from cache

The bottleneck is the execution fleet (CPU-bound, untrusted) and the leaderboard read fan-out — not raw storage.

API design

GET  /problems?category&difficulty&page&size   -> Problem[]
GET  /problems/:id                             -> Problem (stubs, samples, description)
POST /problems/:id   { code, language, competitionId? }
                                               -> 200 { submissionId }   // async
GET  /problems/:id/submission/:submissionId    -> Submission (poll or push)
GET  /leaderboard/:competitionId?page&size     -> Leaderboard

Submit returns immediately with a submissionId; execution is asynchronous (poll or upgrade to WebSocket/SSE). completionTime is stamped server-side for a fair tie-break.

High-level design

A stateless Primary Server writes a pending submission and enqueues it on SQS; a Worker runs the code in the correct per-language Docker runtime and writes the result back; the DB streams changes via CDC into a Redis sorted set that backs the leaderboard.

flowchart LR
    Client["Client (Monaco IDE)"]
    LB["API Gateway / LB"]
    Server["Primary Server"]
    Queue["AWS SQS"]
    Worker["Worker (sandbox host)"]
    DB[("Database")]
    Redis[("Redis Sorted Set")]
    subgraph Runtimes["Docker Containers"]
        Java["Java Runtime"]
        Py["Python Runtime"]
        JS["JavaScript Runtime"]
    end
    Client -->|"GET problems / POST submit / GET leaderboard"| LB
    LB --> Server
    Server -->|"read / write"| DB
    Server -->|"enqueue"| Queue
    Queue --> Worker
    Worker -->|"run code"| Runtimes
    Runtimes -->|"stdout / exit"| Worker
    Worker -->|"write result"| Server
    DB -->|"CDC"| Redis
    Server -->|"read rank"| Redis

sequenceDiagram
    participant C as Client
    participant S as Primary Server
    participant Q as SQS Queue
    participant W as Worker
    participant R as Runtime Sandbox
    participant DB as Database
    participant Z as Redis Leaderboard
    C->>S: POST submit code and language
    S->>DB: insert Submission pending
    S->>Q: enqueue submissionId
    S-->>C: 200 submissionId
    Q->>W: deliver message
    W->>R: run code against test cases
    R-->>W: results pass fail and runtime
    W->>S: write submission result
    S->>DB: update Submission passed
    DB->>Z: CDC then ZADD score
    C->>S: GET submission status poll
    S-->>C: Submission result

Deep dive · secure sandboxing (the core problem)

Running untrusted code safely is the defining challenge. The runtime checklist:

Explicit timeout per execution (wall-clock + CPU-time) to kill infinite loops.
CPU & memory bounds via cgroups — OOM-kill memory bombs, cap CPU vs fork bombs.
Read-only filesystem; writes only to an ephemeral /tmp.
Network isolation (VPC, no egress) — code can't phone home or exfiltrate test data.
No system calls — seccomp-BPF allow-list, dropped Linux capabilities.

flowchart TD
    A["Untrusted submission"] --> B["Worker host"]
    B --> C["microVM (Firecracker / gVisor)"]
    C --> D["Container read-only FS"]
    D --> E["Language runtime"]
    E --> F["Resource limits"]
    F --> G["CPU + memory cgroups"]
    F --> H["Wall-clock timeout"]
    F --> I["Network isolation VPC"]
    F --> J["Seccomp blocks syscalls"]

Approach	Isolation	Startup	Notes
Process + ulimit + seccomp	Weak	~ms	Shares kernel; one escape = total compromise
Docker container	Medium	~10–100ms	Drawn baseline; shared kernel is residual risk
gVisor (user-space kernel)	High	~100–300ms	Shrinks kernel attack surface
Firecracker microVM	Very high	~125ms	Per-execution VM (Lambda model)

Recommended: per-language Docker images run inside gVisor or Firecracker for a true kernel boundary; use one-shot ephemeral runners (fresh sandbox per submission) with a warm pool to hide boot latency.

Deep dive · submission queue & worker fleet

The SQS queue decouples accepting a submission from executing it — the Primary Server stays fast even when execution is saturated, and the queue absorbs bursts.

Visibility timeout ≈ max exec time + margin; a dead worker's message reappears.
Idempotency on submissionId — re-delivery must not double-count the leaderboard.
DLQ for poison payloads; autoscale workers on queue depth + CPU.
Fairness: per-user concurrency caps + rate limits; separate contest vs practice queues so a casual run never delays a contestant.

Deep dive · live contest leaderboard

Rank by problems solved (desc), tie-break by fastest completion (asc). Computing this with a SQL GROUP BY on every read is too slow at 100k pollers:

SELECT userId, COUNT(*) AS solved, MIN(submittedAt) AS t
FROM Submissions
WHERE competitionId = ? AND passed = TRUE
GROUP BY userId
ORDER BY solved DESC, t ASC;     -- correct but re-scans on every read

Instead, materialize into a Redis sorted set leaderboard:competitionId, kept fresh by CDC from the DB (so the DB stays the source of truth and there's no dual-write risk). Encode both sort keys into one score so problem count dominates time:

score = passedProblems * 1e7 - lastPassTimeInSeconds   // 5400s << 1e7
ZADD     leaderboard:competitionId  <score>  <userId>
ZREVRANGE leaderboard:competitionId 0 49 WITHSCORES    // O(log N + page)

flowchart LR
    W["Worker"] -->|"passed submission"| DB[("Submissions DB")]
    DB -->|"CDC stream"| P["Leaderboard Consumer"]
    P -->|"ZADD score"| Z[("Redis Sorted Set")]
    C["Client"] -->|"GET leaderboard paged"| API["Primary Server"]
    API -->|"ZREVRANGE"| Z
    Z -->|"top N and rank"| API

At 100k viewers, prefer server push (WebSocket/SSE) of the top-N + the viewer's own rank over naive polling; cache leaderboard pages with a short TTL (deep pages are near-static).

Deep dive · thundering herd at contest start

Pre-warm the fleet and a pool of sandboxes before T=0 using registration counts (predictive autoscaling).
Queue as shock absorber — SQS accepts the burst instantly; results stream back as capacity allows.
Edge-cache static content — all 100k load the same ~10 statements + sample tests → CDN/Redis; pre-load worker images with hidden tests.
Backpressure & fairness — per-user rate limits; jittered client refresh; multiple queue shards.

Data model

Entity	Key fields
Problem	`id` (PK), name, difficulty, category, codeStubs[], testCases[] (hidden), description
Submission	`competitionId` (partition key), problemId, userId, testCaseResults, `passed`, error?, runTime, `completionTime` (server-stamped)
Competition	`id`, startTime, endTime (start+90m), problems[]
Leaderboard (Redis)	key `leaderboard:competitionId`, ZSET, element userId, composite score

Shard submissions by competitionId for the write-heavy contest workload; problems are small, read-mostly, cache-friendly; Redis holds the hot derived leaderboard.

Summary

Stateless fast API tier → all untrusted execution behind SQS into an autoscaling, strongly-isolated worker fleet → near-real-time leaderboard from a Redis sorted set fed by CDC. Satisfies the four pillars: availability >> consistency, secure isolation, 100k-user scale, fresh leaderboard.