Storage & Media

Dropbox — File Storage & Sync

Upload, download, and automatically sync files across devices. The hard parts: moving very large files (up to 50 GB) reliably over flaky networks, keeping every device eventually consistent, and doing it with low latency. The key move is splitting the data plane (bytes in S3, uploaded directly via presigned URLs) from the control plane (metadata in DynamoDB).

Requirements

Functional

Upload a file; download a file; automatically sync files across devices.

Non-functional

Availability >> consistency — always-on and convergent; sync is eventually consistent.
Low-latency uploads/downloads; support files up to 50 GB → resumable uploads are mandatory.
High data integrity / sync accuracy — device A must faithfully match device B.
Durability via S3 (11 nines); we don't roll our own blob store.

Scale & back-of-the-envelope

A 50 GB file @ 100 Mbps ≈ 67 min of transfer at full line rate (4,000 s; about 72 min if GB means GiB). It will be interrupted, so a monolithic POST is unusable.
Chunk size 5 MB → a 50 GB file = 10,000 chunks, each tracked, fingerprinted, independently retryable.
The app tier must never proxy 50 GB bodies → direct-to-S3 via presigned URLs.
Naive change-polling (100M clients every 30 s) ≈ 3.3M req/s → adaptive polling + push.
Content-hash dedup recovers a large fraction of multi-exabyte raw storage.
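
The estimates above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes) and a link that sustains its full 100 Mbps; the function names are illustrative only.

```python
FILE_BYTES = 50 * 10**9        # 50 GB, decimal units (assumption)
FILE_BYTES_GIB = 50 * 2**30    # the same file if "GB" means GiB
LINK_BPS = 100 * 10**6         # 100 Mbps, assumed fully utilized
CHUNK_BYTES = 5 * 10**6        # 5 MB chunks
CLIENTS = 100_000_000          # polling clients
POLL_INTERVAL_S = 30           # naive fixed polling interval


def transfer_minutes(size_bytes: int, link_bps: int) -> float:
    """Ideal transfer time in minutes, ignoring protocol overhead."""
    return size_bytes * 8 / link_bps / 60


def chunk_count(size_bytes: int, chunk_bytes: int) -> int:
    """Number of chunks, rounding the last partial chunk up."""
    return -(-size_bytes // chunk_bytes)


def polling_rps(clients: int, interval_s: int) -> float:
    """Aggregate request rate if every client polls on a fixed interval."""
    return clients / interval_s


print(round(transfer_minutes(FILE_BYTES, LINK_BPS), 1))      # decimal GB
print(round(transfer_minutes(FILE_BYTES_GIB, LINK_BPS), 1))  # binary GiB
print(chunk_count(FILE_BYTES, CHUNK_BYTES))
print(round(polling_rps(CLIENTS, POLL_INTERVAL_S)))
```

Under these assumptions the transfer takes about 66.7 min (71.6 min for 50 GiB), the file splits into 10,000 chunks, and naive polling generates roughly 3.3M req/s.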

API design

POST /files                      # register metadata, get presigned chunk URLs
PUT  {presignedUrl}              # client -> S3 directly, one per 5MB chunk
POST /files/:id/chunks/:cid      # mark a chunk completed after S3 PUT
POST /files/:id/commit           # finalize: status started -> completed
GET  /files/:id                  # metadata + presigned GET URL
GET  /changes?since={cursor}     # delta sync: returns fileIds[]

Why presigned URLs? The 50 GB body never traverses our servers; S3 handles multipart storage and integrity checks, while retries are the client's job (re-PUT the failed chunk); each URL is short-lived and scoped to one object and method, so authorization stays in the control plane.
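
To make "short-lived and scoped" concrete, here is a toy signer and verifier built on the standard library. It is not S3's real signing scheme (AWS Signature Version 4); the host `blobs.example.com`, the secret, and both function names are hypothetical. It only illustrates the idea that the signature binds the HTTP method, the object path, and an expiry time.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-signing-key"  # hypothetical; real S3 signs with AWS credentials


def sign_url(method: str, path: str, expires_at: int, secret: bytes = SECRET) -> str:
    """Return a URL whose signature covers method, path, and expiry."""
    payload = f"{method}\n{path}\n{expires_at}".encode()
    sig = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires_at, "sig": sig})
    return f"https://blobs.example.com{path}?{query}"


def verify_url(method: str, url: str, now: int, secret: bytes = SECRET) -> bool:
    """Accept the request only if it is unexpired and the signature matches."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires_at = int(query["expires"][0])
        sig = query["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False  # short-lived: reject after expiry
    payload = f"{method}\n{parsed.path}\n{expires_at}".encode()
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)  # constant-time comparison


url = sign_url("PUT", "/chunks/abc123", expires_at=1_000)
print(verify_url("PUT", url, now=900))    # valid: right method, not expired
print(verify_url("GET", url, now=900))    # rejected: wrong method
print(verify_url("PUT", url, now=1_001))  # rejected: expired
```

Because the client never holds the signing key, it cannot widen the grant: changing the path, the method, or the expiry invalidates the signature.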

High-level design

The client (with a local DB + watched folder) talks to the gateway (auth, rate limiting, routing). The File Service issues presigned URLs and writes metadata to DynamoDB; the Sync Service computes "what changed since cursor." Bytes move directly between client and S3.

flowchart LR
    subgraph Device["Client Device"]
        App["Client App"]
        LF["Local Folder"]
    end
    App --> GW["LB and API Gateway"]
    GW -->|upload and getFile| FS["File Service"]
    GW -->|getChanges| SS["Sync Service"]
    FS --> SS
    FS -->|presigned URL| S3["Blob Store (S3)"]
    FS -->|write metadata| MD["File Metadata (DynamoDB)"]
    SS --> MD
    App -.->|direct bytes| S3
    S3 -.-> App

Deep dive · chunking + deduplication

The client splits files into 5 MB chunks, fingerprints each (hash(bytes)), and uploads directly to S3. Because a chunk's identity is its content hash, dedup and delta sync fall out for free: overwriting 1 byte in place in a 50 GB file re-uploads a single chunk.

flowchart TD
    F["File (up to 50GB)"] --> Split["Split into 5MB chunks"]
    Split --> H["Fingerprint each chunk hash(bytes)"]
    H --> Q{"Fingerprint already in S3?"}
    Q -->|Yes| Dedup["Skip upload, reuse object"]
    Q -->|No| Up["PUT chunk directly to S3"]
    Up --> Mark["Mark chunk completed"]
    Dedup --> Mark
    Mark --> Commit{"All chunks done?"}
    Commit -->|No| Q
    Commit -->|Yes| Done["status started to completed"]
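
The flow in the diagram can be sketched as follows. The source only says hash(bytes), so SHA-256 is an assumption here, as are the function names; `already_stored` stands in for whatever lookup answers "fingerprint already in S3?".

```python
import hashlib
from typing import Iterator, List, Set

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MiB, assumed chunk size


def iter_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield fixed-size slices; the last one may be shorter."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]


def fingerprints(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Ordered manifest of content hashes, one per chunk."""
    return [hashlib.sha256(c).hexdigest() for c in iter_chunks(data, chunk_size)]


def plan_upload(manifest: List[str], already_stored: Set[str]) -> List[str]:
    """Fingerprints that still need a PUT, in order, each listed once."""
    pending: List[str] = []
    seen = set(already_stored)
    for fp in manifest:
        if fp not in seen:
            pending.append(fp)
            seen.add(fp)  # a chunk repeated inside the file uploads once
    return pending


# Tiny demo with 8-byte chunks: the first and third chunks are identical.
demo = b"a" * 8 + b"b" * 8 + b"a" * 8
manifest = fingerprints(demo, chunk_size=8)
print(len(manifest), len(plan_upload(manifest, set())))
```

In the demo the manifest has three entries but only two uploads are planned, since the repeated chunk is deduplicated within the file itself.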

Why 5 MB? Small enough that a failed chunk is cheap to retry; large enough that a 50 GB file is "only" 10,000 chunks; and it lines up with S3 multipart limits (5 MiB minimum part size for all but the last part, at most 10,000 parts per upload). Trade-off: fixed-size chunking suffers the boundary-shift problem on mid-file inserts, because every chunk after the insert point gets a new fingerprint. Content-defined (Rabin) chunking fixes that at higher CPU cost; fixed-size wins for replace/append workloads.
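
The boundary-shift problem is easy to reproduce. The sketch below uses fixed-size chunking only (it does not implement Rabin chunking) and compares an in-place overwrite with a one-byte insert at the front of the file. SHA-256, the 64-byte chunk size, and the helper names are assumptions chosen to keep the demo small.

```python
import hashlib
from typing import List


def fixed_fingerprints(data: bytes, chunk_size: int) -> List[str]:
    """Content hash of each fixed-size chunk, in file order."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def chunks_to_upload(old: List[str], new: List[str]) -> int:
    """How many chunks of the new version are absent from the old one."""
    known = set(old)
    return sum(1 for fp in new if fp not in known)


CHUNK = 64
# 1 KiB of deterministic pseudo-random bytes: sixteen 64-byte chunks.
original = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(32))
before = fixed_fingerprints(original, CHUNK)

# Case 1: overwrite one byte in place; chunk boundaries do not move.
overwritten = bytearray(original)
overwritten[500] ^= 0xFF
after_overwrite = fixed_fingerprints(bytes(overwritten), CHUNK)

# Case 2: insert one byte at the front; every boundary shifts by one.
after_insert = fixed_fingerprints(b"\x00" + original, CHUNK)

print(chunks_to_upload(before, after_overwrite))  # 1
print(chunks_to_upload(before, after_insert))     # 17 (all chunks)
```

The overwrite costs one chunk, while the insert invalidates every chunk after the insertion point, which is the case content-defined chunking is meant to handle.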

Deep dive · metadata vs blob split

The most important decision: never store bytes in the database, never store queryable metadata in the blob store.

Concern	Blob Store (S3)	Metadata DB (DynamoDB)
Holds	Raw chunk bytes	FileId, chunk list, name, size, status
Access	Large sequential blob R/W	Tiny key-value lookups, frequent updates
Scale	Exabytes, cheap/GB	Millions of hot items, single-digit ms
Client path	Direct via presigned URL	Via File Service

DynamoDB fits because access is key-based, the workload needs predictable low latency at high QPS, and its tunable read consistency (eventually consistent by default, strongly consistent per request) matches "availability >> consistency."

Deep dive · sync & conflict resolution

Two change paths: remote changed → pull & replace; local changed → upload. Local changes are detected with native OS file-watch APIs (FSEvents on macOS, FileSystemWatcher on Windows) — no disk busy-polling.

sequenceDiagram
    participant W as OS Watcher
    participant C as Client App
    participant FS as File Service
    participant SS as Sync Service
    participant MD as Metadata DB
    W->>C: File changed
    C->>C: Diff to find changed chunks
    C->>FS: Upload only changed chunks
    loop Adaptive polling
        C->>SS: GET /changes?since=cursor
        SS->>MD: Query fileIds after cursor
        MD-->>SS: fileIds[]
        SS-->>C: fileIds[] + new cursor
    end
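
The polling loop in the diagram relies on a cursor that the server can compare against. A minimal in-memory sketch follows, assuming the cursor is a monotonically increasing sequence number; the source does not specify the cursor's form, and `ChangeLog` is a hypothetical stand-in for the metadata query.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ChangeLog:
    """In-memory stand-in for the query behind GET /changes?since=cursor."""
    entries: List[Tuple[int, str]] = field(default_factory=list)  # (seq, fileId)
    next_seq: int = 1

    def record(self, file_id: str) -> int:
        """Append a change and return the sequence number assigned to it."""
        seq = self.next_seq
        self.entries.append((seq, file_id))
        self.next_seq += 1
        return seq

    def changes_since(self, cursor: int, limit: int = 100) -> Tuple[List[str], int]:
        """Return (fileIds changed after cursor, new cursor)."""
        page = [(s, f) for s, f in self.entries if s > cursor][:limit]
        if not page:
            return [], cursor  # nothing new: cursor stays put
        # A file changed twice in one page is reported once, in first-seen order.
        file_ids = list(dict.fromkeys(f for _, f in page))
        return file_ids, page[-1][0]


log = ChangeLog()
for fid in ("a", "b", "a"):
    log.record(fid)
print(log.changes_since(0))  # (['a', 'b'], 3)
print(log.changes_since(3))  # ([], 3)
```

The client should persist the returned cursor only after it has applied the page, so a crash mid-apply replays the same changes instead of skipping them.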

Fast: adaptive polling (frequent when active, back off when idle) + delta sync of only changed chunks.
Consistent: a cursor on the folder ("seen up to X"), advanced only after apply; a periodic reconciliation pass compares fingerprint manifests to self-heal missed events.
Conflicts: don't block — keep both and surface a "conflicted copy". Last-writer-wins is simplest but destroys data; the conflicted-copy approach preserves integrity.
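
The polling and conflict rules above can be sketched as three small functions. The interval bounds (5 s to 300 s), the doubling factor, the three-way comparison against a last-synced base fingerprint, and the conflicted-copy naming pattern are all assumptions made for illustration, not details taken from the design.

```python
def next_poll_interval(current_s: float, saw_changes: bool,
                       min_s: float = 5.0, max_s: float = 300.0,
                       factor: float = 2.0) -> float:
    """Poll fast while changes arrive; back off exponentially when idle."""
    if saw_changes:
        return min_s
    return min(current_s * factor, max_s)


def resolve(base_fp: str, local_fp: str, remote_fp: str) -> str:
    """Three-way compare; base_fp is the version both sides last agreed on."""
    if local_fp == remote_fp:
        return "in_sync"
    if local_fp == base_fp:
        return "pull_remote"   # only the remote side changed
    if remote_fp == base_fp:
        return "push_local"    # only the local side changed
    return "keep_both"         # both changed: surface a conflicted copy


def conflicted_copy_name(filename: str, device: str, date: str) -> str:
    """Name for the extra copy, keeping the extension when there is one."""
    stem, dot, ext = filename.rpartition(".")
    if not dot or not stem:  # no extension, or a dotfile such as ".bashrc"
        return f"{filename} ({device}'s conflicted copy {date})"
    return f"{stem} ({device}'s conflicted copy {date}).{ext}"


print(next_poll_interval(5.0, saw_changes=False))   # 10.0
print(resolve("v1", "v2", "v3"))                    # keep_both
print(conflicted_copy_name("report.docx", "laptop", "2024-01-01"))
```

Keeping the base fingerprint is what lets the client tell "the other side changed" apart from "both sides changed"; without it, every mismatch would look like a conflict.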

Deep dive · large-file resumable upload

The file carries status: started and a chunks[] array of { fingerprint, status, s3Link }. Resume = ask the server which chunks are already completed and upload only the rest; a crash costs at most the chunks in flight at that moment (one 5 MB chunk per upload connection).

sequenceDiagram
    participant C as Client App
    participant FS as File Service
    participant MD as Metadata DB
    participant S3 as Blob Store
    C->>FS: POST /files (metadata + fingerprints)
    FS->>MD: Write metadata status=started
    FS->>FS: Sign presigned URLs for missing chunks (local, no S3 call)
    FS-->>C: Presigned URLs per pending chunk
    loop For each pending chunk
        C->>S3: PUT 5MB chunk bytes (direct)
        S3-->>C: 200 OK (ETag)
        C->>FS: Mark chunk completed
    end
    C->>FS: Commit upload
    FS->>MD: status started to completed
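
The resume behaviour in the sequence above can be simulated without any network. In this sketch `file_meta` mirrors the { fingerprint, status, s3Link } shape from the text, while `put_chunk` is a hypothetical callback standing in for the direct S3 PUT plus the "mark chunk completed" call.

```python
from typing import Callable, Dict, List


def pending_chunks(chunks: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Chunks the server has not yet recorded as completed."""
    return [c for c in chunks if c["status"] != "completed"]


def upload_file(file_meta: dict, put_chunk: Callable[[str], None]) -> bool:
    """Upload what is missing; return True once the file is committed.

    Safe to call again after a failure: completed chunks are skipped.
    """
    for chunk in pending_chunks(file_meta["chunks"]):
        try:
            put_chunk(chunk["fingerprint"])
        except ConnectionError:
            return False  # stop here; progress so far is already recorded
        chunk["status"] = "completed"
    file_meta["status"] = "completed"  # commit: started -> completed
    return True


# Demo: the third PUT fails once, then the upload is resumed.
calls: List[str] = []


def flaky_put(fingerprint: str) -> None:
    calls.append(fingerprint)
    if len(calls) == 3:
        raise ConnectionError("network dropped")


meta = {
    "status": "started",
    "chunks": [{"fingerprint": f"fp{i}", "status": "pending", "s3Link": ""}
               for i in range(5)],
}
first = upload_file(meta, flaky_put)
second = upload_file(meta, flaky_put)
print(first, second, calls)
```

The first attempt stops at the failed chunk and the second one re-sends only fp2 onward, so the failure costs a single repeated PUT.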

Exposing per-chunk completion state in our own metadata means resumability, parallelism, dedup, and delta sync all share one mechanism. CDN download: because chunks are content-addressed they are immutable and highly cacheable at the edge. Access still has to be authorized, so the client first obtains a short-lived signed URL after gateway auth; for cache hits to survive, the signature should be validated at the edge rather than treated as part of the cache key.

Data model

FileMetadata (DynamoDB)        # PK FileId; GSI on OwnerId, FolderId
  FileId, FolderId, Name, MimeType, Size, OwnerId, S3Link
  Status: started -> completed
  Chunks: [ { id=fingerprint, status, s3Link, updatedAt } ]   # dedup key

Folder   { Cursor }            # drives GET /changes?since=cursor
User     { UserId, ... }

A 50 GB file = 10,000 chunk entries. DynamoDB's 400 KB item limit leaves only about 40 bytes per entry at that count, less than one hex-encoded 256-bit fingerprint (64 characters), so large files must spill Chunks[] into a child table keyed by (FileId, chunkIndex). A block_ref(fingerprint → s3Link, refcount) table enables cross-user dedup + safe garbage collection.
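
Both points in the paragraph above can be sketched briefly: a rough size check for an inline Chunks[] list, and reference counting for the block_ref table. The per-entry byte estimate is a guess, the limit is treated as 400 * 1024 bytes, and `BlockRefs` is an in-memory stand-in; a real table would need atomic conditional updates rather than plain dictionary writes.

```python
from typing import Dict, List, Optional

ITEM_LIMIT_BYTES = 400 * 1024  # DynamoDB item size limit, taken as 400 KiB


def approx_entry_bytes(fingerprint_len: int = 64, s3_link_len: int = 80,
                       overhead: int = 40) -> int:
    """Rough size of one chunk entry; all three numbers are assumptions."""
    return fingerprint_len + s3_link_len + overhead


def fits_inline(chunk_count: int, entry_bytes: int) -> bool:
    """Would an inline Chunks[] list stay under the item limit?"""
    return chunk_count * entry_bytes < ITEM_LIMIT_BYTES


class BlockRefs:
    """fingerprint -> [s3Link, refcount], the block_ref table in miniature."""

    def __init__(self) -> None:
        self._rows: Dict[str, List] = {}

    def acquire(self, fingerprint: str, s3_link: str) -> bool:
        """Add a reference; True means the block is new and must be uploaded."""
        row = self._rows.get(fingerprint)
        if row is None:
            self._rows[fingerprint] = [s3_link, 1]
            return True
        row[1] += 1
        return False

    def release(self, fingerprint: str) -> Optional[str]:
        """Drop a reference; return the s3Link to delete once nobody uses it."""
        row = self._rows[fingerprint]  # KeyError for an unknown fingerprint
        row[1] -= 1
        if row[1] > 0:
            return None
        del self._rows[fingerprint]
        return row[0]


print(fits_inline(10_000, approx_entry_bytes()))  # False: must spill
refs = BlockRefs()
print(refs.acquire("fp1", "s3://bucket/fp1"))     # True: first owner uploads
print(refs.acquire("fp1", "s3://bucket/fp1"))     # False: second owner reuses
print(refs.release("fp1"), refs.release("fp1"))   # None, then the link to delete
```

Garbage collection deletes the S3 object only when `release` returns a link, which is what keeps one user's delete from removing a block another user still references.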

Why it scales

Data plane (S3 + CDN) and control plane (gateway → services → DynamoDB) scale independently. Chunking is the unifying primitive delivering resumability, delta sync, dedup, parallelism, and cache-friendliness all at once.