Fundamentals
Interview Toolkit & Cheat Sheet
The cross-cutting building blocks every design on this site leans on — a repeatable delivery framework, the non-functional qualities you're scored on, CAP, clean REST modeling, and when to reach for PostgreSQL vs DynamoDB. Behavioral prep now has its own dedicated pages.
The delivery framework
Almost every design here follows the same six steps. The first half nails what the system does; the second half defends how it holds up under load and failure.
flowchart LR
R["1 Requirements"] --> E["2 Core Entities"]
E --> A["3 API / Interface"]
A --> D["4 Data Flow (optional)"]
D --> H["5 High-Level Design"]
H --> DD["6 Deep Dives"]
- Requirements — functional (features) + non-functional (qualities); state what's out of scope.
- Core entities — the nouns the system stores (User, Post, Ride…).
- API / interface — the contract; REST resources or event/stream shapes.
- Data flow — optional; trace a request end-to-end for infra-heavy problems.
- High-level design — the boxes-and-arrows that satisfy the functional requirements.
- Deep dives — the hard problems and trade-offs that satisfy the non-functional requirements.
Mental model
High-level design = "does it work?" Deep dives = "does it still work at 100× scale, during a failure, and under contention?" Spend your interview minutes in proportion to where the difficulty actually lies.
Functional vs non-functional requirements
Functional = what the system does (post a tweet, book a seat). Non-functional = the qualities it must exhibit. Pick the 2–3 that actually dominate the problem and let them drive the deep dives — don't recite the whole list.
| Quality | What you're really being asked |
|---|---|
| Scalability | Does it hold up as users/data/QPS grow by orders of magnitude? |
| Availability | Uptime target (how many 9s); graceful behavior during failures. |
| Operational characteristics | Latency, throughput, monitoring, deploy/rollback — running it in production. |
| Security | AuthN/Z, encryption in transit/at rest, input validation, rate limiting. |
| Testability | Can components be verified in isolation and end-to-end? |
| Usability | Is the API/UX clear and hard to misuse? |
| Extensibility | Can new features/entities be added without a rewrite? |
| Portability | Can it move across environments/clouds without deep coupling? |
Quantify them: "p99 read < 200 ms," "99.99% availability," "eventual consistency within 1 minute." A number turns a buzzword into a design constraint that justifies caching, replication, or async pipelines.
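To see why a number behaves like a constraint, it helps to convert an availability target into a downtime budget. The sketch below is plain arithmetic; the function name `downtime_budget_seconds` is illustrative, and the year length ignores leap years.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # simplification: ignores leap years


def downtime_budget_seconds(availability_pct: float,
                            period_seconds: int = SECONDS_PER_YEAR) -> float:
    """Return the downtime allowed in a period for a given availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return period_seconds * (1 - availability_pct / 100)


# Print the yearly budget for the usual "number of nines" targets.
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_budget_seconds(target) / 60
    print(f"{target}% -> {minutes:,.1f} minutes of downtime per year")
```

A 99.99% target leaves roughly 52.6 minutes per year, which is the kind of figure that justifies redundancy and automated failover rather than manual recovery.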
CAP theorem
Brewer's theorem: a distributed data store can guarantee at most two of three — Consistency, Availability, Partition tolerance. Networks partition in the real world, so P is non-negotiable — which means under a partition you must choose between consistency and availability.
flowchart TD
P{"Network partition happens"}
P -->|"choose Consistency"| CP["CP: reject/block until consistent"]
P -->|"choose Availability"| AP["AP: always answer, may be stale"]
CP --> CPe["Ticketmaster booking, Auction bids, Uber matching"]
AP --> APe["News Feed, Yelp search, Web Crawler, Tinder stack"]
| Term | Meaning | Example on this site |
|---|---|---|
| Consistency | Every read sees the most recent write | Auction highest bid, seat booking |
| Availability | Every request gets a (non-error) response | Feed reads, search, redirects |
| Partition tolerance | Works despite dropped/delayed network messages | Any multi-node system |
In practice you choose per data path, not per system: Ticketmaster runs a CP transaction plane (booking) alongside an AP discovery plane (search). Stating "availability >> consistency" (or the reverse) up front is the single most useful NFR call you can make. PACELC extends CAP: if there is a Partition, trade Availability against Consistency; Else (normal operation) you still trade Latency against Consistency.
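The per-path choice can be illustrated with a toy model. This is only a sketch, not how any real datastore is implemented: `Replica`, `Unavailable`, and the `mode` argument are hypothetical names, and the "partition" is a boolean flag.

```python
class Unavailable(Exception):
    """Raised when a CP read path refuses to answer during a partition."""


class Replica:
    """A single replica that may be cut off from the primary."""

    def __init__(self, value):
        self.value = value        # last value this replica has seen
        self.partitioned = False  # True while the replica cannot reach the primary

    def read(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        if mode == "CP" and self.partitioned:
            # CP: refuse rather than risk returning a stale value.
            raise Unavailable("cannot confirm the latest write during a partition")
        # AP (or no partition): answer from local state, which may be stale.
        return self.value


replica = Replica(value=100)   # e.g. the highest bid this replica knows about
replica.partitioned = True     # the primary may have accepted newer bids since

print(replica.read("AP"))      # answers with the local, possibly stale value
try:
    replica.read("CP")
except Unavailable as err:
    print("CP path rejected the read:", err)
```

The same replica serves both modes, which mirrors the point above: a feed read can take the AP branch while a bid or booking read takes the CP branch.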
REST resource modeling
Clean REST falls out of your core entities. Resources are your core entities, named with plural nouns; the HTTP method is the verb — never put verbs in the path.
GET /events # get all events
GET /events/{id} # get a specific event
GET /venues/{id} # get a specific venue
GET /events/{id}/tickets # available tickets for an event
POST /events/{id}/bookings # create a booking for an event
GET /bookings/{id} # get a specific booking
# NOT:
POST /events/create <- no verbs in the path!
- Methods carry intent: GET (read), POST (create), PUT (idempotent full replace), PATCH (partial update), DELETE (remove). POST is not idempotent by default, so accept a client-supplied idempotency key to make retried writes safe.
- Nesting shows ownership (/events/{id}/bookings), but don't nest more than ~2 levels.
- Pagination: prefer cursor/keyset over OFFSET for large or live lists. Offsets re-scan skipped rows and shift when new items arrive.
- Identity comes from the token, never the body: never trust a client-supplied userId.
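Two of the points above, keyset pagination and idempotent creates, can be sketched in a few lines. The example assumes an `events` table keyed by an integer `id` and uses an in-memory dictionary as a stand-in for a durable idempotency store; `list_events`, `create_booking`, and their parameters are illustrative names, not part of any framework.

```python
import sqlite3


def list_events(conn, after_id=0, limit=2):
    """Keyset pagination: seek past the cursor instead of skipping with OFFSET."""
    rows = conn.execute(
        "SELECT id, name FROM events WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, limit),
    ).fetchall()
    # A full page suggests more rows may follow; hand back the last id as the cursor.
    next_cursor = rows[-1][0] if len(rows) == limit else None
    return rows, next_cursor


_bookings_by_key = {}  # (user, idempotency key) -> booking; stand-in for a durable store


def create_booking(token_user_id, event_id, idempotency_key):
    """Sketch of POST /events/{id}/bookings: a retry with the same key returns the original."""
    key = (token_user_id, idempotency_key)
    if key not in _bookings_by_key:
        _bookings_by_key[key] = {
            "id": len(_bookings_by_key) + 1,
            "user_id": token_user_id,  # taken from the verified token, never the request body
            "event_id": event_id,
        }
    return _bookings_by_key[key]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO events (name) VALUES (?)", [("A",), ("B",), ("C",)])

page1, cursor = list_events(conn)                 # first page
page2, end = list_events(conn, after_id=cursor)   # resume after the last id seen

first = create_booking("user-1", event_id=7, idempotency_key="abc")
retry = create_booking("user-1", event_id=7, idempotency_key="abc")
print(page1, page2, first is retry)
```

Because the query seeks on `id > cursor`, rows inserted while the client is paging do not shift earlier pages. One known wrinkle of this simple cursor rule is that a final page that happens to be exactly `limit` rows long yields one extra, empty request.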
PostgreSQL — when & why
The default relational choice across these designs (YouTube metadata, Yelp, Ticketmaster, auctions). An object-relational, open-source database with broad SQL compliance.
- ACID transactions: the reason to pick it for money/inventory correctness (bookings, bids).
- MVCC concurrency: readers don't block writers; needs VACUUM to reclaim dead tuples.
- Extensible & rich types: JSON/JSONB, arrays, PostGIS (geo), full-text (tsvector + GIN), custom types/extensions.
- Indexing: B-tree default, plus GIN/GiST/BRIN for full-text, geo, and ranges.
Reach for it when you need transactions, ad-hoc queries/joins, ranges, or a single store that does geo + full-text + ACID. Scale reads with replicas; scale writes by sharding on a high-cardinality key. Trade-off: horizontal write scaling is manual versus a natively partitioned NoSQL store.
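The transactional booking case can be sketched as a conditional update plus an insert inside one transaction. To keep the example runnable without a server it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL; the SQL is ordinary, but the `tickets` and `bookings` tables and the `book_ticket` helper are assumptions made for illustration.

```python
import sqlite3


def book_ticket(conn, ticket_id, user_id):
    """Claim a ticket and record the booking atomically; return True on success."""
    with conn:  # commits on normal exit, rolls back if an exception escapes
        claimed = conn.execute(
            "UPDATE tickets SET status = 'booked' "
            "WHERE id = ? AND status = 'available'",
            (ticket_id,),
        ).rowcount
        if claimed == 0:
            return False  # the ticket was already taken; nothing was changed
        conn.execute(
            "INSERT INTO bookings (ticket_id, user_id) VALUES (?, ?)",
            (ticket_id, user_id),
        )
        return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("CREATE TABLE bookings (ticket_id INTEGER UNIQUE, user_id TEXT)")
conn.execute("INSERT INTO tickets (id, status) VALUES (1, 'available')")
conn.commit()

print(book_ticket(conn, 1, "alice"))  # first claim
print(book_ticket(conn, 1, "bob"))    # second claim on the same ticket
```

The `WHERE status = 'available'` predicate is what makes the claim safe: the update and the booking row either both land or neither does, and a second buyer sees zero rows updated instead of double-booking the seat.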
DynamoDB & TTL — when & why
The default when access is key-based at massive, predictable scale (Dropbox metadata, FB feed tables, Uber rides). A managed key-value and document store, partitioned by key.
- Partition key (+ optional sort key): design the table around your queries; single-digit-ms point reads; horizontal scale is automatic.
- Conditional writes: atomic compare-and-set (attribute_not_exists, ConditionExpression), the basis for race-free locks and "beat the highest bid."
- TTL auto-expiry: set an epoch-seconds attribute and DynamoDB deletes the item automatically in the background. No cron sweeper needed.
TTL is a system-design primitive
The auto-expiring item powers several designs on this site: Uber/Ticketmaster locks (a seat/driver hold that releases itself if checkout is abandoned) and WhatsApp inbox cleanup. "Write a row with a TTL" replaces a whole background-deletion service. Note that TTL deletion is eventual: AWS documents it as typically happening within a few days of expiry, not at the exact timestamp, so guard the hot path with a condition or filter on the expiry attribute too.
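A seat hold that combines a conditional write with a TTL attribute might look like the following sketch. It only builds the request parameters, so it runs without AWS credentials. The table name `seat_holds`, the attributes `seat_id`, `held_by`, and `expires_at`, and the ten-minute window are all assumptions; `expires_at` is presumed to be the attribute configured as the table's TTL.

```python
import time
from typing import Optional

HOLD_SECONDS = 600  # hypothetical 10-minute checkout window


def seat_hold_request(seat_id: str, user_id: str, now: Optional[int] = None) -> dict:
    """Build put_item parameters that take a hold only if the seat is free."""
    now = int(time.time()) if now is None else now
    return {
        "TableName": "seat_holds",  # hypothetical table name
        "Item": {
            "seat_id": {"S": seat_id},                      # partition key
            "held_by": {"S": user_id},
            "expires_at": {"N": str(now + HOLD_SECONDS)},   # epoch seconds, read by TTL
        },
        # Succeed when no hold exists, or when the existing hold has expired
        # but the background TTL process has not deleted it yet.
        "ConditionExpression": "attribute_not_exists(seat_id) OR expires_at < :now",
        "ExpressionAttributeValues": {":now": {"N": str(now)}},
    }


params = seat_hold_request("A12", "user-1", now=1_700_000_000)
print(params["ConditionExpression"])
# With boto3 installed and credentials configured, the call would be:
#   boto3.client("dynamodb").put_item(**params)
# A failed condition surfaces as a ConditionalCheckFailedException.
```

The `expires_at < :now` branch is the hot-path guard: the hold is treated as released the moment it expires, regardless of when the background deletion actually removes the item.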
Trade-off: no rich joins/ad-hoc queries (model for known access patterns), and you design around hot partitions. Pick it for write-heavy, key-addressable workloads where you'd otherwise shard SQL by hand.
Behavioral interview
Senior/staff loops grade behavioral signal as heavily as the technical rounds. That material now has its own dedicated pages: