Fundamentals

Interview Toolkit & Cheat Sheet

The cross-cutting building blocks every design on this site leans on — a repeatable delivery framework, the non-functional qualities you're scored on, CAP, clean REST modeling, and when to reach for PostgreSQL vs DynamoDB. Behavioral prep now has its own dedicated pages.

The delivery framework

Almost every design here follows the same six steps. The first half nails what the system does; the second half defends how it holds up under load and failure.

flowchart LR
    R["1 Requirements"] --> E["2 Core Entities"]
    E --> A["3 API / Interface"]
    A --> D["4 Data Flow (optional)"]
    D --> H["5 High-Level Design"]
    H --> DD["6 Deep Dives"]

Requirements — functional (features) + non-functional (qualities); state what's out of scope.
Core entities — the nouns the system stores (User, Post, Ride…).
API / interface — the contract; REST resources or event/stream shapes.
Data flow — optional; trace a request end-to-end for infra-heavy problems.
High-level design — the boxes-and-arrows that satisfy the functional requirements.
Deep dives — the hard problems and trade-offs that satisfy the non-functional requirements.

Mental model

High-level design = "does it work?" Deep dives = "does it still work at 100× scale, during a failure, and under contention?" Spend your interview minutes in proportion to where the difficulty actually is.

Functional vs non-functional requirements

Functional = what the system does (post a tweet, book a seat). Non-functional = the qualities it must exhibit. Pick the 2–3 that actually dominate the problem and let them drive the deep dives — don't recite the whole list.

Quality	What you're really being asked
Scalability	Does it hold up as users/data/QPS grow by orders of magnitude?
Availability	Uptime target (how many 9s); graceful behavior during failures.
Operational characteristics	Latency, throughput, monitoring, deploy/rollback — running it in production.
Security	AuthN/Z, encryption in transit/at rest, input validation, rate limiting.
Testability	Can components be verified in isolation and end-to-end?
Usability	Is the API/UX clear and hard to misuse?
Extensibility	Can new features/entities be added without a rewrite?
Portability	Can it move across environments/clouds without deep coupling?

Quantify them: "p99 read < 200 ms," "99.99% availability," "eventual consistency within 1 minute." A number turns a buzzword into a design constraint that justifies caching, replication, or async pipelines.
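To make an availability target concrete, it helps to convert the number of 9s into a downtime budget. The sketch below is only illustrative arithmetic; the function name `downtime_budget_seconds` and the 365-day year are assumptions, not part of any standard.

```python
def downtime_budget_seconds(availability_pct: float, period_days: float = 365.0) -> float:
    """Allowed downtime, in seconds, for an availability target over a period."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    period_seconds = period_days * 24 * 60 * 60
    # The unavailable fraction of the period is the budget.
    return period_seconds * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_budget_seconds(target) / 60
        print(f"{target}% -> {minutes:.1f} min/year")
```

Under these assumptions, 99.99% leaves about 52.6 minutes of downtime per year, while 99.9% leaves about 8.8 hours. That gap is usually what justifies extra replicas or automated failover.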

CAP theorem

Brewer's theorem: a distributed data store can guarantee at most two of three — Consistency, Availability, Partition tolerance. Networks partition in the real world, so P is non-negotiable — which means under a partition you must choose between consistency and availability.

flowchart TD
    P{"Network partition happens"}
    P -->|"choose Consistency"| CP["CP: reject/block until consistent"]
    P -->|"choose Availability"| AP["AP: always answer, may be stale"]
    CP --> CPe["Ticketmaster booking, Auction bids, Uber matching"]
    AP --> APe["News Feed, Yelp search, Web Crawler, Tinder stack"]

Term	Meaning	Example on this site
Consistency	Every read sees the most recent write	Auction highest bid, seat booking
Availability	Every request gets a (non-error) response	Feed reads, search, redirects
Partition tolerance	Works despite dropped/delayed network messages	Any multi-node system

In practice you choose per data path, not per system: Ticketmaster runs a CP transaction plane (booking) alongside an AP discovery plane (search). "Availability >> consistency" or the reverse is the single most useful NFR to state up front. PACELC extends this: if there is a Partition, you trade Availability vs Consistency; Else (normal operation), you still trade Latency vs Consistency.
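The CP versus AP choice can be illustrated with a toy replica that has lost contact with its primary. This is a deliberately simplified model, not a real replication protocol; the `Replica` class, the `partitioned` flag, and `PartitionedError` are all hypothetical names introduced for the example.

```python
class PartitionedError(Exception):
    """Raised by a CP read when the replica cannot confirm it is up to date."""


class Replica:
    def __init__(self, value: str) -> None:
        self.value = value          # last value this replica has seen
        self.partitioned = False    # True when cut off from the primary

    def read_cp(self) -> str:
        # CP: refuse to answer rather than risk returning stale data.
        if self.partitioned:
            raise PartitionedError("cannot reach primary; refusing to serve")
        return self.value

    def read_ap(self) -> str:
        # AP: always answer with the local copy, even if it may be stale.
        return self.value


replica = Replica(value="seat 12A: available")
replica.partitioned = True  # the primary may have sold 12A since; we cannot know
```

During the partition, `replica.read_ap()` still returns the possibly stale "available" answer, which is acceptable for a feed or search result. `replica.read_cp()` raises instead, which is what a booking path wants.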

REST resource modeling

Clean REST falls out of your core entities. Resources are your core entities, named with plural nouns; the HTTP method is the verb — never put verbs in the path.

GET  /events              # get all events
GET  /events/{id}         # get a specific event
GET  /venues/{id}         # get a specific venue
GET  /events/{id}/tickets # available tickets for an event
POST /events/{id}/bookings  # create a booking for an event
GET  /bookings/{id}       # get a specific booking

# NOT:
POST /events/create       <-  no verbs in the path!

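Methods carry intent: GET (read), POST (create), PUT (full replace), PATCH (partial update), DELETE. GET, PUT, and DELETE are idempotent by definition; POST and PATCH are not, so make those writes idempotent with a client-supplied key (for example an Idempotency-Key header) so retries are safe.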
Nesting shows ownership (/events/{id}/bookings), but don't nest more than ~2 levels.
Pagination: prefer cursor/keyset over OFFSET for large or live lists — offsets re-scan skipped rows and shift when new items arrive.
Identity from the token, never the body — never trust a client-supplied userId.
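A minimal sketch of the idempotency-key idea, assuming an in-memory handler behind POST /events/{id}/bookings. `BookingService`, `create_booking`, and the two dictionaries are hypothetical. A real service would persist the key under a unique constraint so that concurrent retries cannot both insert.

```python
import uuid


class BookingService:
    """In-memory stand-in for the handler behind POST /events/{id}/bookings."""

    def __init__(self) -> None:
        self._bookings: dict[str, dict] = {}            # booking_id -> booking
        self._by_key: dict[tuple[str, str], str] = {}   # (user, key) -> booking_id

    def create_booking(self, user_id: str, event_id: str, idempotency_key: str) -> dict:
        # user_id is assumed to come from the verified auth token, not the body.
        seen = self._by_key.get((user_id, idempotency_key))
        if seen is not None:
            return self._bookings[seen]  # retry: replay the original result
        booking_id = str(uuid.uuid4())
        booking = {"id": booking_id, "event_id": event_id, "user_id": user_id}
        self._bookings[booking_id] = booking
        self._by_key[(user_id, idempotency_key)] = booking_id
        return booking
```

Scoping the key by user, as this sketch does, stops one client's key from colliding with another's. A retried request returns the same booking rather than creating a second one.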

PostgreSQL — when & why

The default relational choice across these designs (YouTube metadata, Yelp, Ticketmaster, auctions). An object-relational, open-source database with broad SQL compliance.

ACID transactions — the reason to pick it for money/inventory correctness (bookings, bids).
MVCC concurrency — readers don't block writers; needs VACUUM to reclaim dead tuples.
Extensible & rich types — JSON/JSONB, arrays, PostGIS (geo), full-text (tsvector + GIN), custom types/extensions.
Indexing — B-tree default, plus GIN/GiST/BRIN for full-text, geo, and ranges.

Reach for it when you need transactions, ad-hoc queries/joins, ranges, or a single store that does geo + full-text + ACID. Scale reads with replicas; scale writes by sharding on a high-cardinality key. Trade-off: horizontal write scaling is manual, unlike in a natively partitioned NoSQL store.
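The transactional guarantee that matters for bookings can be sketched as a guarded UPDATE plus an INSERT in one transaction. To keep the example runnable anywhere, it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL. The `seats` and `bookings` tables and the `book_seat` function are assumptions made for illustration, and a PostgreSQL driver would use its own placeholder syntax.

```python
import sqlite3


def book_seat(conn: sqlite3.Connection, seat_id: int, user_id: str) -> bool:
    """Atomically claim a seat; returns False if it is already booked."""
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute(
            "UPDATE seats SET booked_by = ? WHERE id = ? AND booked_by IS NULL",
            (user_id, seat_id),
        )
        if cur.rowcount != 1:
            return False  # lost the race: the guard matched no row
        conn.execute(
            "INSERT INTO bookings (seat_id, user_id) VALUES (?, ?)",
            (seat_id, user_id),
        )
        return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seats (id INTEGER PRIMARY KEY, booked_by TEXT)")
conn.execute("CREATE TABLE bookings (seat_id INTEGER UNIQUE, user_id TEXT)")
conn.execute("INSERT INTO seats (id, booked_by) VALUES (1, NULL)")
conn.commit()
```

The `booked_by IS NULL` guard makes the claim a compare-and-set, so only one of two competing callers sees a row count of 1. The transaction ensures the seat update and the booking row are committed together or not at all.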

DynamoDB & TTL — when & why

The default when the access pattern is key-based at massive, predictable scale (Dropbox metadata, FB feed tables, Uber rides). A managed key-value / wide-column store partitioned by key.

Partition key (+ optional sort key) — design the table around your queries; single-digit-ms point reads; horizontal scale is automatic.
Conditional writes — atomic compare-and-set (attribute_not_exists, ConditionExpression) — the basis for race-free locks and "beat the highest bid."
TTL auto-expiry — set an epoch-seconds attribute and DynamoDB deletes the item automatically in the background. No cron sweeper needed.
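Conditional writes and TTL combine naturally into a self-expiring hold. The sketch below only builds the keyword arguments for a low-level `put_item` call and does not contact AWS. The table name `SeatHolds`, the attributes `seat_id`, `held_by`, and `expires_at`, and the helper `build_hold_request` are hypothetical, and the table is assumed to have TTL enabled on `expires_at`.

```python
import time


def build_hold_request(table: str, seat_id: str, user_id: str,
                       hold_seconds: int, now: int) -> dict:
    """Build put_item keyword arguments for a self-expiring seat hold."""
    return {
        "TableName": table,
        "Item": {
            "seat_id": {"S": seat_id},
            "held_by": {"S": user_id},
            # Epoch seconds; DynamoDB numbers are sent as strings.
            "expires_at": {"N": str(now + hold_seconds)},
        },
        # Succeed only if no hold exists, or the existing hold has expired
        # but has not been swept by TTL yet.
        "ConditionExpression": "attribute_not_exists(seat_id) OR expires_at < :now",
        "ExpressionAttributeValues": {":now": {"N": str(now)}},
    }


request = build_hold_request("SeatHolds", "12A", "alice",
                             hold_seconds=600, now=int(time.time()))
# With boto3 this would be sent as: boto3.client("dynamodb").put_item(**request)
```

If another caller already holds the seat and the hold has not expired, the write is rejected with a conditional-check failure, which the caller treats as "seat taken". The `expires_at < :now` clause lets a new hold succeed even when the background TTL sweep lags behind expiry.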

TTL is a system-design primitive

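The auto-expiring item powers several designs on this site: Uber/Ticketmaster locks (a seat/driver hold that releases itself if checkout is abandoned), and WhatsApp inbox cleanup. "Write a row with a TTL" replaces a whole background-deletion service. Note that TTL deletion is eventual: AWS documents that expired items are typically removed within a few days of their expiry time, not at the exact second, so an expired item can still be returned by reads. Guard the hot path by checking the expiry attribute yourself, in a filter on reads and a condition expression on writes.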
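The read-side guard can be illustrated with a toy store whose sweeper runs late. This is an in-memory model rather than DynamoDB itself, and `HoldStore`, `get_active`, and `sweep` are hypothetical names. The point is that readers treat an expired item as absent whether or not it has been physically deleted.

```python
from typing import Optional


class HoldStore:
    """Toy key-value store whose background sweeper may lag behind expiry."""

    def __init__(self) -> None:
        self._items: dict[str, tuple[str, int]] = {}  # key -> (holder, expires_at)

    def put(self, key: str, holder: str, expires_at: int) -> None:
        self._items[key] = (holder, expires_at)

    def get_active(self, key: str, now: int) -> Optional[str]:
        item = self._items.get(key)
        if item is None:
            return None
        holder, expires_at = item
        # Guard: an expired item counts as absent even if still stored.
        return holder if expires_at > now else None

    def sweep(self, now: int) -> int:
        """Delete expired items; stands in for the delayed TTL process."""
        expired = [k for k, (_, exp) in self._items.items() if exp <= now]
        for k in expired:
            del self._items[k]
        return len(expired)
```

Because correctness rests on `get_active` and not on `sweep`, a late sweeper costs only storage and never hands out a stale lock.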

Trade-off: no rich joins/ad-hoc queries (model for known access patterns), and you must choose keys that spread load to avoid hot partitions. Pick it for write-heavy, key-addressable workloads where you'd otherwise shard SQL by hand.

Behavioral interview

Senior/staff loops grade behavioral signal as heavily as the technical rounds. That material now has its own dedicated pages:

Behavioral Interview Framework

The STAR method, the five focus areas, building a story bank, and the prep strategy + anti-patterns.

Behavioral Questions & Answers

The exact question per focus area with a strategy and a worked example answer, plus a common-question cheat sheet.