Analytics & Streaming
Ad Click Aggregator
When a user clicks an ad, redirect them to the advertiser and aggregate per-ad click counts over time windows for near-real-time advertiser analytics. The central tension is real-time vs. accurate — resolved with a hybrid Lambda architecture: a streaming speed layer plus a batch layer that recomputes exact counts.
Requirements
Functional
- Users click an ad and get redirected; advertisers query click metrics over time at 1-minute granularity.
Non-functional
- Scalable to 10k clicks/sec; analytics queries < 1 s; fault tolerant.
- Data integrity (billing-grade — no lost/double-counted clicks); as real-time as possible; idempotent clicks.
Scale & back-of-the-envelope
- 10M active ads; 10k clicks/sec peak ≈ 864M/day if the peak were sustained; ~100–200 B/event → ~100 GB/day archived to S3 (~36 TB/yr).
- Stream shards: 10k ÷ 1,000 rec/s/shard → at least 10 shards (provision 20–50 for skew); 7-day retention for replay.
- Ingestion is write-heavy and skewed (a viral ad blows past one shard); reads are low-QPS but scan-heavy → pre-aggregate by (adId, minute) to hit sub-second queries.
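The estimates above can be checked with a few lines of arithmetic. This is only a sanity-check sketch: it assumes the 10k/s peak is sustained all day (so the daily figure is an upper bound) and uses decimal units (1 GB = 1e9 bytes). The variable names are illustrative.

```python
import math

PEAK_CLICKS_PER_SEC = 10_000
SECONDS_PER_DAY = 86_400
SHARD_WRITE_LIMIT = 1_000  # records/sec per shard, as stated above
EVENT_BYTES_LOW, EVENT_BYTES_HIGH = 100, 200

# Upper bound: assumes the peak rate holds for the whole day.
clicks_per_day = PEAK_CLICKS_PER_SEC * SECONDS_PER_DAY

# Daily archive volume in decimal gigabytes for the two event sizes.
gb_per_day_low = clicks_per_day * EVENT_BYTES_LOW / 1e9
gb_per_day_high = clicks_per_day * EVENT_BYTES_HIGH / 1e9

# Yearly archive, using the rounded ~100 GB/day figure.
tb_per_year = 100 * 365 / 1000

# Minimum shard count before any headroom for skew.
min_shards = math.ceil(PEAK_CLICKS_PER_SEC / SHARD_WRITE_LIMIT)

print(f"clicks/day:  {clicks_per_day:,}")
print(f"GB/day:      {gb_per_day_low:.1f} to {gb_per_day_high:.1f}")
print(f"TB/year:     {tb_per_year:.1f}")
print(f"min shards:  {min_shards}")
```

The ~100 GB/day figure sits near the low end of the 86–173 GB range, which is reasonable given that real traffic rarely stays at peak.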
API design
GET /click?adId={adId}&impressionId={impId}&signature={sig}
-> 302 Found Location: {redirectUrl} # the ad links HERE, not direct to advertiser
GET /api/v1/ads/{adId}/aggregates?from&to&granularity=minute
-> { series: [ { minute, clicks }, ... ] }
302, never 301
A 301 (permanent) is cached by the browser, so later clicks skip our
server → undercounting. 302/307 guarantees every
click hits the Click Processor. impressionId +
signature make the click verifiable and idempotent.
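As a rough illustration of how impressionId and signature make a click verifiable, the sketch below signs and checks a click URL with HMAC-SHA256 from the Python standard library. The secret, the `adId|impressionId` message layout, and the helper names are assumptions for this example, not details given in the design.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical secret; in practice it would come from a secrets manager.
SECRET_KEY = b"demo-secret"

def sign(ad_id: str, impression_id: str) -> str:
    """Hex HMAC-SHA256 over an assumed 'adId|impressionId' message."""
    message = f"{ad_id}|{impression_id}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def build_click_url(ad_id: str, impression_id: str) -> str:
    """Build the /click URL that the ad creative would link to."""
    query = urlencode({
        "adId": ad_id,
        "impressionId": impression_id,
        "signature": sign(ad_id, impression_id),
    })
    return f"/click?{query}"

def verify_click_url(url: str) -> bool:
    """Return True only if all parameters are present and the signature matches."""
    params = parse_qs(urlparse(url).query)
    try:
        ad_id = params["adId"][0]
        impression_id = params["impressionId"][0]
        signature = params["signature"][0]
    except KeyError:
        return False
    expected = sign(ad_id, impression_id)
    # Constant-time comparison avoids leaking the signature via timing.
    return hmac.compare_digest(expected.encode(), signature.encode())

click_url = build_click_url("42", "imp-1")
tampered_url = click_url.replace("adId=42", "adId=43")
print(verify_click_url(click_url), verify_click_url(tampered_url))
```

Because the signature covers both adId and impressionId, a client cannot reuse a valid impressionId against a different ad without failing verification.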
High-level design
The Click Processor validates (verify signature), dedups (Redis), looks up the redirect URL, issues the 302, and emits the click to a stream. Flink windows + aggregates into an OLAP store (speed layer); raw events also land in S3 for a daily Spark reconciliation (batch layer).
flowchart LR
U["User Browser"] -->|"GET /click?adId"| GW["API Gateway"]
GW -->|"302 redirect"| U
GW --> CP["Click Processor Svc"]
CP -->|"getRedirectUrl(adId)"| ADB[("Ad DB")]
CP -->|"dedup check"| R[("Redis")]
CP -->|"click event"| K["Kinesis (Kafka-equiv)"]
K --> F["Flink (stream agg)"]
F --> OLAP[("OLAP store")]
K --> S3[("S3 raw events")]
CRON["Cron Scheduler"] --> SP["Spark (map reduce)"]
S3 --> SP
SP --> RW["Reconciliation Worker"]
RW --> OLAP
ADV["Advertiser Browser"] --> QS["Query Svc"]
QS --> OLAP
Deep dive · stream aggregation with windowing
Flink keys by adId and maintains
1-minute tumbling windows on
event time (with watermarks), but
flushes partial results every 10 s so dashboards
update in near real time. The OLAP sink upserts the running count keyed
by (adId, minute); combined with Flink checkpointing this
gives effectively exactly-once results: a replay overwrites rather than
double-counts.
Flush trade-off: smaller interval = fresher but more OLAP write amplification; 10 s is the chosen balance. Event-time (not processing-time) keeps counts correct even when ingestion lags.
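The interaction between tumbling windows, partial flushes, and idempotent upserts can be sketched in plain Python. This is a toy model rather than Flink code: watermarks and checkpointing are not modeled, a dict stands in for the OLAP store, and the class and function names are made up for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 60

def window_start(event_time: int) -> int:
    """Start of the 1-minute tumbling window containing event_time (epoch seconds)."""
    return event_time - (event_time % WINDOW_SECONDS)

class MinuteAggregator:
    """Keeps a running count per (ad_id, window) and flushes totals, not deltas."""

    def __init__(self):
        self.counts = defaultdict(int)  # (ad_id, window_start) -> running count
        self.dirty = set()              # windows changed since the last flush

    def on_click(self, ad_id: str, event_time: int) -> None:
        key = (ad_id, window_start(event_time))
        self.counts[key] += 1
        self.dirty.add(key)

    def flush(self, olap: dict) -> None:
        # Upsert the running total so that re-flushing the same window
        # overwrites the row instead of adding to it.
        for key in self.dirty:
            olap[key] = self.counts[key]
        self.dirty.clear()

olap = {}
events = [("ad-1", 5), ("ad-1", 9), ("ad-1", 42), ("ad-1", 61)]

agg = MinuteAggregator()
for ad_id, ts in events[:2]:
    agg.on_click(ad_id, ts)
agg.flush(olap)                  # partial result for the still-open window
after_first_flush = dict(olap)

for ad_id, ts in events[2:]:
    agg.on_click(ad_id, ts)
agg.flush(olap)

# Simulated replay: reprocess every event from scratch into the same store.
replay = MinuteAggregator()
for ad_id, ts in events:
    replay.on_click(ad_id, ts)
replay.flush(olap)

print(after_first_flush, olap)
```

Because each flush writes the window's total, the replay leaves the store unchanged; a sink that wrote increments instead would double-count.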
Deep dive · idempotency / dedup
A single click must count once despite double-clicks,
retries, and replay. At impression time the Ad service generates a
signed impressionId (HMAC) embedded in the click URL; the
Click Processor verifies the signature, then does an atomic
SETNX impressionId in Redis — first
write wins, duplicates are dropped (the user is still redirected).
sequenceDiagram
participant U as User
participant CP as Click Processor
participant AD as Ad DB
participant R as Redis
participant K as Kinesis
U->>CP: GET /click adId impressionId sig
CP->>CP: verify signature
CP->>R: SETNX impressionId
alt new click
R-->>CP: stored (1)
CP->>AD: getRedirectUrl(adId)
CP->>K: emit click event
CP-->>U: 302 redirect
else duplicate
R-->>CP: exists (0)
CP-->>U: 302 redirect, drop event
end
A 10-minute Redis TTL bounds memory (10k/s × 600 s = 6M keys, which is
trivial). The batch layer dedups on eventId over the
full S3 archive as the durable backstop if a key expired or Redis lost
data.
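The first-write-wins step can be sketched without a running Redis by using an in-memory stand-in with the same "set if absent, with expiry" semantics (in Redis this corresponds to `SET key value NX EX ttl`). The class, the `click:` key prefix, and the handler signature are assumptions for illustration; the 600-second TTL is inferred from the memory arithmetic above.

```python
import time

DEDUP_TTL_SECONDS = 600  # inferred from the 10k/s × 600 s estimate

class InMemoryDedupStore:
    """Single-process stand-in for Redis set-if-absent with a TTL."""

    def __init__(self, clock=time.monotonic):
        self._expiry = {}    # key -> expiry timestamp
        self._clock = clock  # injectable so expiry can be simulated

    def set_if_absent(self, key: str, ttl_seconds: int) -> bool:
        now = self._clock()
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False  # key still live: this click is a duplicate
        self._expiry[key] = now + ttl_seconds
        return True       # first write wins

def handle_click(store, stream: list, ad_id: str, impression_id: str,
                 redirect_url: str) -> tuple:
    """Emit the click only once, but always redirect the user."""
    if store.set_if_absent(f"click:{impression_id}", DEDUP_TTL_SECONDS):
        stream.append({"adId": ad_id, "impressionId": impression_id})
    return (302, redirect_url)

now = [0.0]
store = InMemoryDedupStore(clock=lambda: now[0])
stream = []

first = handle_click(store, stream, "42", "imp-1", "https://example.com")
second = handle_click(store, stream, "42", "imp-1", "https://example.com")
emitted_before_expiry = len(stream)

now[0] = 601.0  # the dedup key has expired
third = handle_click(store, stream, "42", "imp-1", "https://example.com")
emitted_after_expiry = len(stream)

print(first, emitted_before_expiry, emitted_after_expiry)
```

The third click shows why the batch layer is still needed: once the key expires, a replayed click passes the online check and is only caught by the durable dedup.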
Deep dive · speed layer vs batch reconciliation
The streaming path is fast but approximate (drifts on failures, very-late events, hot-key skew). A batch path recomputes ground truth from the immutable S3 log and fixes the OLAP store.
flowchart LR
K["Kinesis Stream (7d retention)"] --> SPD["Speed layer: Flink"]
K --> S3[("S3 raw archive")]
SPD -->|"approx, realtime"| OLAP[("OLAP serving")]
S3 --> BAT["Batch layer: Spark daily"]
BAT --> RW["Reconciliation Worker"]
RW -->|"exact, corrected"| OLAP
OLAP --> DASH["Advertiser Dashboard"]
| | Kappa (stream-only) | Lambda (chosen) |
|---|---|---|
| Idea | One continuous stream | Batch (historical) + speed (real-time) |
| Pros | Simpler, one code path | Highest accuracy; real-time + correct |
| Cons | Correctness rides entirely on the stream | Two pipelines to maintain |
Billing-grade integrity demands the reconciling batch layer: real-time
numbers appear in seconds; the daily Spark job overwrites any
incorrect (adId, minute) rows so the dashboard
self-heals. Late/out-of-order events are absorbed by watermarks +
allowed lateness; the long tail is swept up by reconciliation.
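A minimal sketch of the reconciliation step, assuming each raw event carries `eventId`, `adId`, and an epoch-seconds `eventTime` (the last field name is hypothetical). It uses plain Python in place of Spark and a dict in place of the OLAP store; rows that exist only in the serving store are out of scope here.

```python
from collections import Counter

def recompute_counts(raw_events) -> Counter:
    """Exact per-(adId, minute) counts from the raw log, deduplicated on eventId."""
    seen = set()
    counts = Counter()
    for event in raw_events:
        if event["eventId"] in seen:
            continue
        seen.add(event["eventId"])
        minute = event["eventTime"] // 60 * 60
        counts[(event["adId"], minute)] += 1
    return counts

def reconcile(olap: dict, exact: dict) -> list:
    """Overwrite rows that disagree with the batch result; return the corrected keys."""
    corrected = []
    for key, count in exact.items():
        if olap.get(key) != count:
            olap[key] = count
            corrected.append(key)
    return sorted(corrected)

raw_events = [
    {"eventId": "e1", "adId": "ad-1", "eventTime": 5},
    {"eventId": "e2", "adId": "ad-1", "eventTime": 30},
    {"eventId": "e2", "adId": "ad-1", "eventTime": 30},   # duplicate delivery
    {"eventId": "e3", "adId": "ad-1", "eventTime": 65},
    {"eventId": "e4", "adId": "ad-1", "eventTime": 110},  # arrived too late for the stream
]

# Speed-layer view: counted the duplicate and missed the late event.
olap = {("ad-1", 0): 3, ("ad-1", 60): 1}

exact = recompute_counts(raw_events)
corrected = reconcile(olap, exact)
print(corrected, olap)
```

Writing only the rows that differ keeps the daily job's write volume proportional to the drift rather than to the full day's data.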
Deep dive · hot-ad partitioning
A viral ad can exceed a single shard's 1,000 rec/s and create a hot
key that overwhelms one Flink task. For hot ads only,
salt the key adId:0..N to spread load
across N shards/tasks, then sum the partial counts.
flowchart TD
HOT["Hot ad adId=42 (viral)"] --> SUF["Append suffix 0..N to key"]
SUF --> S0["shard key 42:0"]
SUF --> S1["shard key 42:1"]
SUF --> S2["shard key 42:N"]
S0 --> F0["Flink task 0"]
S1 --> F1["Flink task 1"]
S2 --> F2["Flink task N"]
F0 --> M["Merge partial counts"]
F1 --> M
F2 --> M
M --> OLAP[("OLAP: adId=42 total")]
Apply salting selectively (only detected hot keys) to avoid multiplying shard count for cold ads; Kinesis resharding + Flink rescaling handle sustained growth.
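Selective salting and the merge step might look like the sketch below. The hot-ad table, the bucket count, and the function names are hypothetical, and the merge assumes ad IDs never contain a colon.

```python
import random
from collections import Counter

# Hypothetical output of a hot-key detector: adId -> number of salt buckets.
HOT_AD_SALT_BUCKETS = {"42": 4}

def partition_key(ad_id: str, rng=random) -> str:
    """Cold ads keep their plain key; hot ads get a random salt suffix."""
    buckets = HOT_AD_SALT_BUCKETS.get(ad_id)
    if buckets is None:
        return ad_id
    return f"{ad_id}:{rng.randrange(buckets)}"

def merge_partials(partial_counts: dict) -> Counter:
    """Sum per-salt partial counts back into one total per adId."""
    totals = Counter()
    for key, count in partial_counts.items():
        ad_id = key.split(":", 1)[0]
        totals[ad_id] += count
    return totals

rng = random.Random(0)  # seeded only to make the demo repeatable
partials = Counter(partition_key("42", rng) for _ in range(1000))
partials.update(partition_key("7", rng) for _ in range(10))

totals = merge_partials(partials)
hot_keys = sorted(k for k in partials if k.startswith("42:"))
print(hot_keys, totals["42"], totals["7"])
```

Random salting spreads a hot ad's clicks roughly evenly across buckets, at the cost of one extra merge step before the total is written to the OLAP store.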
Key decisions recap
302 not 301; signed impressionId + Redis
SETNX (+ batch dedup on eventId); Kinesis
sharded by adId with adId:0..N salting;
Flink 1-min tumbling / 10-s flush / event-time / exactly-once via
checkpoints + idempotent upserts; OLAP keyed on
(adId, minute); hybrid Lambda for real-time
and accurate.