Core Systems
LeetCode — Online Judge
Browse problems, write a solution in an in-browser editor, submit, and get feedback after the code runs against hidden tests inside secure, isolated runtimes — while powering timed contests (90 min, 10 problems, up to 100k users) with a fresh live leaderboard.
Requirements
Functional
- View a list of problems; view a problem & code a solution in-browser.
- Submit a solution & get feedback (pass/fail per test, runtime, errors).
- Support competitions with a live leaderboard.
Non-functional
- Availability >> consistency — the leaderboard may be slightly stale.
- Security & isolation running untrusted code (hard requirement).
- Scale to 100k concurrent contest users; fresh, near-real-time leaderboard.
- Low-latency feedback (seconds); fairness during contests.
Scale & back-of-the-envelope
| Metric | Value | Derived |
|---|---|---|
| DAU / accounts | 100k / 5M | given |
| Problems | ~3k | static, cache aggressively |
| Contest | 90 min, 10 problems | given |
| Submissions/contest | ~3M | 100k × 10 × ~3 → ~555/s avg |
| Start burst | several k/s | everyone attacks problem #1 |
| Leaderboard reads | ~20k/s | 100k polling every ~5s → serve from cache |
The bottleneck is the execution fleet (CPU-bound, untrusted) and the leaderboard read fan-out — not raw storage.
API design
GET /problems?category&difficulty&page&size -> Problem[]
GET /problems/:id -> Problem (stubs, samples, description)
POST /problems/:id { code, language, competitionId? }
-> 200 { submissionId } // async
GET /problems/:id/submission/:submissionId -> Submission (poll or push)
GET /leaderboard/:competitionId?page&size -> Leaderboard
Submit returns immediately with a submissionId; execution
is asynchronous (poll or upgrade to WebSocket/SSE).
completionTime is stamped
server-side for a fair tie-break.
High-level design
A stateless Primary Server writes a pending submission and enqueues it on SQS; a Worker runs the code in the correct per-language Docker runtime and writes the result back; the DB streams changes via CDC into a Redis sorted set that backs the leaderboard.
flowchart LR
Client["Client (Monaco IDE)"]
LB["API Gateway / LB"]
Server["Primary Server"]
Queue["AWS SQS"]
Worker["Worker (sandbox host)"]
DB[("Database")]
Redis[("Redis Sorted Set")]
subgraph Runtimes["Docker Containers"]
Java["Java Runtime"]
Py["Python Runtime"]
JS["JavaScript Runtime"]
end
Client -->|"GET problems / POST submit / GET leaderboard"| LB
LB --> Server
Server -->|"read / write"| DB
Server -->|"enqueue"| Queue
Queue --> Worker
Worker -->|"run code"| Runtimes
Runtimes -->|"stdout / exit"| Worker
Worker -->|"write result"| Server
DB -->|"CDC"| Redis
Server -->|"read rank"| Redis
sequenceDiagram
participant C as Client
participant S as Primary Server
participant Q as SQS Queue
participant W as Worker
participant R as Runtime Sandbox
participant DB as Database
participant Z as Redis Leaderboard
C->>S: POST submit code and language
S->>DB: insert Submission pending
S->>Q: enqueue submissionId
S-->>C: 200 submissionId
Q->>W: deliver message
W->>R: run code against test cases
R-->>W: results pass fail and runtime
W->>S: write submission result
S->>DB: update Submission passed
DB->>Z: CDC then ZADD score
C->>S: GET submission status poll
S-->>C: Submission result
Deep dive · secure sandboxing (the core problem)
Running untrusted code safely is the defining challenge. The runtime checklist:
- Explicit timeout per execution (wall-clock + CPU-time) to kill infinite loops.
- CPU & memory bounds via cgroups — OOM-kill memory bombs, cap CPU vs fork bombs.
-
Read-only filesystem; writes only to an ephemeral
/tmp. - Network isolation (VPC, no egress) — code can't phone home or exfiltrate test data.
- No system calls — seccomp-BPF allow-list, dropped Linux capabilities.
flowchart TD
A["Untrusted submission"] --> B["Worker host"]
B --> C["microVM (Firecracker / gVisor)"]
C --> D["Container read-only FS"]
D --> E["Language runtime"]
E --> F["Resource limits"]
F --> G["CPU + memory cgroups"]
F --> H["Wall-clock timeout"]
F --> I["Network isolation VPC"]
F --> J["Seccomp blocks syscalls"]
| Approach | Isolation | Startup | Notes |
|---|---|---|---|
| Process + ulimit + seccomp | Weak | ~ms | Shares kernel; one escape = total compromise |
| Docker container | Medium | ~10–100ms | Drawn baseline; shared kernel is residual risk |
| gVisor (user-space kernel) | High | ~100–300ms | Shrinks kernel attack surface |
| Firecracker microVM | Very high | ~125ms | Per-execution VM (Lambda model) |
Recommended: per-language Docker images run inside gVisor or Firecracker for a true kernel boundary; use one-shot ephemeral runners (fresh sandbox per submission) with a warm pool to hide boot latency.
Deep dive · submission queue & worker fleet
The SQS queue decouples accepting a submission from executing it — the Primary Server stays fast even when execution is saturated, and the queue absorbs bursts.
- Visibility timeout ≈ max exec time + margin; a dead worker's message reappears.
-
Idempotency on
submissionId— re-delivery must not double-count the leaderboard. - DLQ for poison payloads; autoscale workers on queue depth + CPU.
- Fairness: per-user concurrency caps + rate limits; separate contest vs practice queues so a casual run never delays a contestant.
Deep dive · live contest leaderboard
Rank by problems solved (desc), tie-break by
fastest completion (asc). Computing this with a SQL
GROUP BY on every read is too slow at 100k pollers:
SELECT userId, COUNT(*) AS solved, MIN(submittedAt) AS t
FROM Submissions
WHERE competitionId = ? AND passed = TRUE
GROUP BY userId
ORDER BY solved DESC, t ASC; -- correct but re-scans on every read
Instead, materialize into a Redis sorted set
leaderboard:competitionId, kept fresh by
CDC from the DB (so the DB stays the source of truth
and there's no dual-write risk). Encode both sort keys into one score
so problem count dominates time:
score = passedProblems * 1e7 - lastPassTimeInSeconds // 5400s << 1e7
ZADD leaderboard:competitionId <score> <userId>
ZREVRANGE leaderboard:competitionId 0 49 WITHSCORES // O(log N + page)
flowchart LR
W["Worker"] -->|"passed submission"| DB[("Submissions DB")]
DB -->|"CDC stream"| P["Leaderboard Consumer"]
P -->|"ZADD score"| Z[("Redis Sorted Set")]
C["Client"] -->|"GET leaderboard paged"| API["Primary Server"]
API -->|"ZREVRANGE"| Z
Z -->|"top N and rank"| API
At 100k viewers, prefer server push (WebSocket/SSE) of the top-N + the viewer's own rank over naive polling; cache leaderboard pages with a short TTL (deep pages are near-static).
Deep dive · thundering herd at contest start
- Pre-warm the fleet and a pool of sandboxes before T=0 using registration counts (predictive autoscaling).
- Queue as shock absorber — SQS accepts the burst instantly; results stream back as capacity allows.
- Edge-cache static content — all 100k load the same ~10 statements + sample tests → CDN/Redis; pre-load worker images with hidden tests.
- Backpressure & fairness — per-user rate limits; jittered client refresh; multiple queue shards.
Data model
| Entity | Key fields |
|---|---|
| Problem |
id (PK), name, difficulty, category, codeStubs[],
testCases[] (hidden), description
|
| Submission |
competitionId (partition key), problemId, userId,
testCaseResults, passed, error?, runTime,
completionTime (server-stamped)
|
| Competition |
id, startTime, endTime (start+90m), problems[]
|
| Leaderboard (Redis) |
key leaderboard:competitionId, ZSET, element
userId, composite score
|
Shard submissions by competitionId for the write-heavy
contest workload; problems are small, read-mostly, cache-friendly;
Redis holds the hot derived leaderboard.
Summary
Stateless fast API tier → all untrusted execution behind SQS into an autoscaling, strongly-isolated worker fleet → near-real-time leaderboard from a Redis sorted set fed by CDC. Satisfies the four pillars: availability >> consistency, secure isolation, 100k-user scale, fresh leaderboard.