Core Systems
Bit.ly — URL Shortener
Convert long URLs into compact, unique short codes and redirect visitors back to the original destination in ~200 ms — at 100M DAU and 1B stored URLs, with optional custom aliases, expiration, and analytics.
Requirements
Functional
- Create a short URL from a long URL.
- Optionally support a custom alias (user-chosen short code).
- Optionally support an expiration time for the short URL.
- Redirect the user to the original URL when they visit the short link.
Non-functional
- Low redirect latency — target ~200 ms (the hot path).
- Scale — 100M DAU, 1B URLs.
- Uniqueness — every short code is globally unique.
- High availability; eventual consistency is acceptable on the write path.
- Heavily read-skewed (~100:1) — redirects dominate.
Below the line (deferred)
User accounts (assumed to exist), rich click analytics, and link editing are treated as extensions.
Scale & back-of-the-envelope
| Metric | Value | Derivation |
|---|---|---|
| Daily active users | 100M (10⁸) | given |
| Total URLs | 1B (10⁹) | given |
| Avg redirect QPS | ~1,000 RPS | 10⁸ ÷ 10⁵ s/day |
| Peak redirect QPS | ~10k–100k RPS | avg × 10–100 peak factor |
| Read : write | ~100 : 1 | reads dominate |
| Storage | ~0.5 TB | 1B rows × ~500 B |
Short-code length (base62 = 0-9 A-Z a-z)
62^5 ≈ 916M— too few for 1B URLs.-
62^6 ≈ 56.8B— comfortably sufficient → target 6–7 chars. 62^7 ≈ 3.5T— ample headroom.
Random generation is subject to the
birthday paradox (~expected collisions grow with
n²/2M), which is why a uniqueness strategy matters (see
deep dive).
Core entities
- Original URL — the long destination.
- Short URL — the generated code or custom alias that maps to the original.
- User — the creator/owner of a short URL.
API design
Shorten
POST /urls
{
"originalUrl": "https://example.com/very/long/path?x=1",
"alias": "my-link", // optional custom alias
"expirationTime": "2026-12-31T00:00:00Z" // optional
}
201 Created → { "shortUrl": "https://bit.ly/abc123" }
409 Conflict → requested custom alias already taken
Redirect
GET /{shortCode}
302 Found
Location: <originalUrl>
404 / 410 → unknown or expired code
Write endpoints sit behind auth + rate limiting to prevent abuse and keyspace scraping.
High-level design
A client hits an API Gateway (TLS, auth, rate
limiting, routing) that splits traffic into a
Read Service (redirects — the hot path) and a
Write Service (creating short URLs). The Read Service
is fronted by a Redis read-through LRU cache keyed
shortCode → longUrl; on a miss it falls back to the
database. The Write Service generates a unique code via an atomic
global counter, persists the mapping, and returns the
short URL. Redirects are returned as 302.
flowchart LR
Client["Client"]
GW["API Gateway"]
Read["Read Service"]
Write["Write Service"]
Cache["Redis Cache (LRU)"]
Counter["Global Counter (INCR)"]
DB[("Database")]
Client -->|"shorten / visit"| GW
GW --> Read
GW --> Write
Read -->|"read-through"| Cache
Read --> DB
Write --> DB
Write -->|"INCR"| Counter
GW -->|"302 redirect"| Client
Read/write split behind a gateway; Redis fronts the redirect hot path.
Deep dives
1 · Short-code generation
Codes must be fast, unique, and short. Four candidates:
| Approach | How | Verdict |
|---|---|---|
| Prefix of long URL | Leading chars | ❌ Not unique — shared prefixes collide. |
| Random → base62 | Random int, encode | ⚠️ Birthday-paradox collisions; needs a DB check + retry. |
| Hash → base62 | md5(url), truncate to 6 |
⚠️ Deterministic dedup, but truncation re-introduces collisions. |
| Counter + base62 ✅ | Atomic INCR, encode |
✅ Guaranteed unique, no check, fast. ❌ Predictable. |
Chosen: counter + base62, mitigating predictability
with a bijective scramble
(e.g. sqids/Hashids) so codes are non-sequential yet still unique
& reversible, plus
rate limiting and a "don't shorten private URLs"
caveat. Scale the counter with Redis INCR and
range/block leasing (each writer leases 1,000 IDs at
a time) to remove per-request contention.
flowchart TD
Start["Need short code"]
Q1{"Custom alias?"}
UseAlias["Use alias, verify unique"]
Gen["Counter + base62"]
Scramble["Bijective scramble (sqids)"]
Done["Store mapping"]
Start --> Q1
Q1 -->|yes| UseAlias
Q1 -->|no| Gen
Gen --> Scramble
Scramble --> Done
UseAlias --> Done
2 · Read path & caching (scaling the hot path)
Redirects carry ~100× the traffic of writes, so the path is cache-first. URL popularity follows a power law, so a modest Redis cache yields a high hit ratio and keeps most redirects in single-digit ms. On a miss, read the DB and populate the cache with a TTL aligned to the URL's expiration; add read replicas and optionally a CDN/edge layer.
sequenceDiagram
participant C as Client
participant R as Read Service
participant Cache as Redis
participant DB as Database
C->>R: GET /shortCode
R->>Cache: GET shortCode
alt cache hit
Cache-->>R: longUrl
else miss
Cache-->>R: nil
R->>DB: SELECT longUrl WHERE shortCode
DB-->>R: longUrl
R->>Cache: SET shortCode longUrl
end
R-->>C: 302 redirect to longUrl
3 · 301 vs 302 redirect
| 301 Permanent | 302 Found ✅ | |
|---|---|---|
| Browser caches redirect | Yes | No |
| Later clicks hit our service | No | Yes |
| Analytics / expiry | ❌ Lost / hard | ✅ Every click counted |
| Server load | Lower | Slightly higher |
Use 302 because expiration and analytics require every click to reach the service. Choose 301 only when raw redirect performance outweighs tracking.
4 · Expiration & analytics
Lazy deletion on read (now > expirationTime
→ 410 + evict from cache) plus a
background purge job. Analytics are emitted
asynchronously (fire-and-forget to Kafka → stream
processing) so they never block the redirect.
Data model
| Column | Type | Notes |
|---|---|---|
short_url |
VARCHAR(7) | Primary key — code or custom alias; redirect is a single point-read by PK |
long_url |
VARCHAR(2048) | Original destination |
expiration_time |
TIMESTAMP NULL | Optional expiry |
creation_time |
TIMESTAMP | When created |
user_id |
BIGINT | Owner (FK → users) |
Indexes: clustered PK on short_url (the
O(1) redirect lookup), secondary on user_id ("my links"),
optional on expiration_time for cleanup.
DB choice: the access pattern is a key-value lookup
by short code, so either relational (Postgres/MySQL with replicas,
shard by hash(short_code)) or a NoSQL KV store
(DynamoDB/Cassandra with native TTL) works. Shard by
short_code for even load.
Bottlenecks & scaling
| Concern | Risk | Mitigation |
|---|---|---|
| Read hot path | 10k–100k peak RPS | Redis read-through + LRU; DB read replicas; CDN/edge |
| Hot keys | Viral links dominate | Cache absorbs; replicate hot keys; edge caching |
| Counter | SPOF / contention |
Redis INCR + per-writer range leasing; replicate
|
| Predictable codes | Enumeration / scraping | Bijective scramble + rate limiting |
| Write scale | DB throughput | Shard by hash(short_code) |
| Expired data | Table bloat | Lazy delete + background purge; cache TTL |
Why split read & write services?
They scale independently — the Read Service scales out for the 100:1 read ratio and 200 ms target, while a Write outage never takes down redirects.