Social & Feeds
Facebook Post Search
Full-text search over posts by keyword, sortable by likes or recency — building the inverted index, ingestion pipeline, ranking, and scatter-gather query layer ourselves (no Elasticsearch/Postgres FTS). The trick: keep two pre-sorted posting lists per keyword so the read path never sorts.
Requirements
Functional
- Create posts; like posts; search posts by keyword, sortable by likes and by time. (No filters, fuzzy matching, or personalization.)
Non-functional
- Low latency (~500 ms); high throughput; eventual consistency (~1 min) to become searchable.
- Priority: availability + low read latency over strict consistency — what makes async CDC → Kafka ingestion acceptable.
Scale & back-of-the-envelope
| Stream | Volume | Per second |
|---|---|---|
| Posts | 1B/day | ~10k/s |
| Likes | 10B/day | ~100k/s (10× posts — the dominant write problem) |
| Searches | 1B/day | ~10k/s |
| Retained posts | ~3.6T | ~3.6 PB raw → index must be sharded |
Like-count churn directly mutates the likes-sorted index → needs an in-memory counter tier; 10k searches/s with 500 ms → hot-query caching + pre-sorted lists are essential.
API design
POST /posts { content } -> 201 { postId }
PUT /posts/{id}/likes -> 200 { likeCount }
GET /search?query={kw}&sort={likes|time}&limit=20&cursor={opaque}
-> { results:[ {postId, content, likeCount, createdAt} ], nextCursor }
Multiple terms are AND-intersected; sort selects which
pre-sorted posting list to read.
No OFFSET — a cursor encodes the last
item's sort key ((likeCount, postId) or
(createdAt, postId)) so each shard resumes with a range
seek.
High-level design
A write path (create/like posts) and a read/ingestion path decoupled by CDC + Kafka — which buys the 1-minute eventual-consistency window.
flowchart LR
Client["Clients"] --> GW["API Gateway"]
GW --> PS["Post Service"]
GW --> SS["Search Service + Cache"]
GW --> LS["Likes Service"]
PS --> PDB[("Posts DB")]
LS --> LC{"Like Counts"}
LS --> LDB[("Likes DB")]
PDB -- CDC --> K{"Kafka"}
LC -- CDC --> K
LC --> S3[("Cold Storage S3")]
K --> ING["Ingestion Service"]
ING --> IDX[("Search Index")]
SS --> IDX
The index stores, per keyword, two pre-sorted posting lists:
like:1:{keyword} -> List<PostId> # ordered by like count desc
time:2:{keyword} -> List<PostId> # ordered by createdAt desc
Deep dive · inverted index & real-time updates
An inverted index maps each keyword → a posting list of post IDs. Two lists per keyword support both sort orders without sorting at query time.
sequenceDiagram
participant U as User
participant PS as Post Service
participant PDB as Posts DB
participant K as Kafka
participant IG as Ingestion Service
participant IDX as Search Index
U->>PS: POST /posts
PS->>PDB: Insert post row
PDB-->>K: CDC change event
K->>IG: Consume post event
IG->>IG: Tokenize content into keywords
IG->>IDX: Append postId to each keyword list
Note over IDX: lists kept sorted by time and by likes
Why CDC + Kafka, not dual-write? Decoupling (a slow
index never blocks post creation), replayability (rebuild the whole
index by replaying the log / S3), backpressure absorption, and no
"wrote-DB-but-crashed-before-index" inconsistency. Ingestion writes
immutable segments (LSM-style) that compact — turning
random updates into sequential writes; idempotent by
postId.
Deep dive · keeping like-counts fresh
The hardest part: 100k likes/s continuously reorder
the like:1:{keyword} lists. A like appends to Likes DB
and increments an in-memory
Like Counts counter (Redis), periodically persisted
to S3; count changes are CDC'd into the same Kafka and re-position the
post.
flowchart LR
Like["100k likes per second"] --> LS["Likes Service"]
LS --> LDB[("Likes DB")]
LS --> LC{"Like Counts counter"}
LC -- CDC --> K{"Kafka"}
LC --> S3[("Cold Storage S3")]
K --> ING["Ingestion Service"]
ING --> IDX[("Reorder like list")]
Don't index every like — reordering on every increment would melt the index. Batch/coalesce deltas over a window; only re-rank on bucket crossings (log-scaled bands — a post at 1M likes gaining 50 doesn't move, 10→100 does); keep a hot "top-K per keyword" updated aggressively and the long tail lazily. The 1-min freshness budget makes slightly stale ordering legal.
Deep dive · caching hot queries
At 10k searches/s, traffic is heavily skewed toward a few trending terms.
flowchart LR
Req["Search Request"] --> SS["Search Service"]
SS --> C{"Cache hit"}
C -- Yes --> Resp["Return cached page"]
C -- No --> RT["Scatter to index shards"]
RT --> SH1["Shard A"]
RT --> SH2["Shard B"]
SH1 --> MG["Gather and merge"]
SH2 --> MG
MG --> RK["Rank by likes or time"]
RK --> PG["Apply pagination cursor"]
PG --> WC["Write to cache"]
WC --> Resp
Cache key = (normalized query, sort, page/cursor); TTL ≤
the 1-min freshness budget so cached results never violate the
contract. Results aren't personalized → hot first pages are even
CDN-cacheable. Cache the expensive first page; deep pages are rare.
Data model
Posts DB id PK, content, authorId, createdDate
Likes DB (postId, likerId) PK, createdAt # composite PK => idempotent likes
Like Counts postId PK, likeCount # in-memory + S3 backup, CDC'd
Search Index like:1:{keyword} -> List<PostId> # ordered by likes desc
time:2:{keyword} -> List<PostId> # ordered by time desc
(immutable compacted segments, early-termination reads)
Consistency posture
Search is eventually consistent (~1 min); posts and like-orderings converge through CDC → Kafka → ingestion. This trade — availability + low read latency over strict consistency — is what makes async indexing, count batching, and short-TTL caching all valid.