Designing a URL Shortener
A URL shortener converts a long URL into a short alias and redirects visitors to the original URL. This is a classic system design interview question because it touches capacity estimation, hashing, database design, caching, and horizontal scaling — all in one problem.
Requirements
Functional
- Given a long URL, generate a short unique alias (7 characters)
- Redirect
short.url/abc1234to the original long URL - Optional: custom aliases, link expiry
- Optional: analytics (click counts, referrer)
Non-Functional
- 100 million URLs created per day
- 10:1 read-to-write ratio → 1 billion redirects per day
- Redirect latency under 10ms (p99)
- 99.9% availability
- Links are immutable once created
Capacity Estimation
Writes: 100M / 86,400 ≈ 1,160 writes/sec
Reads: 1B / 86,400 ≈ 11,600 reads/sec
Storage: 100M URLs/day × 500 bytes × 365 days × 5 years ≈ 91 TB over 5 years
Bandwidth: 11,600 reads/sec × 500 bytes ≈ 5.6 MB/s read bandwidth
High-Level Design
Encoding Scheme
Base62
Base62 uses characters 0-9, A-Z, and a-z — 62 total characters. A 7-character Base62 string provides 627 ≈ 3.5 trillion combinations, far more than needed for 5 years of URLs.
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
def encode(num: int, length: int = 7) -> str:
result = []
while num:
result.append(BASE62[num % 62])
num //= 62
# Pad to desired length
while len(result) < length:
result.append(BASE62[0])
return "".join(reversed(result))
def decode(alias: str) -> int:
result = 0
for char in alias:
result = result * 62 + BASE62.index(char)
return result
Collision Handling
Strategy 1: Use a database auto-increment ID as the number to encode. Each row gets a unique ID, and Base62 of that ID is the alias. No collision possible.
Strategy 2: Hash the long URL (MD5/SHA256) and take the first 7 characters. Check the database for a collision. If found, append an incrementing suffix and retry.
Database Design
CREATE TABLE urls (
id BIGSERIAL PRIMARY KEY,
short_url VARCHAR(8) NOT NULL UNIQUE,
long_url TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
expires_at TIMESTAMPTZ,
click_count BIGINT NOT NULL DEFAULT 0
);
CREATE INDEX idx_urls_short ON urls (short_url);
CREATE INDEX idx_urls_expires ON urls (expires_at) WHERE expires_at IS NOT NULL;
The short_url column is indexed for O(1) redirect lookups. The partial index on expires_at makes cleaning up expired links efficient without scanning the full table.
Caching
With a 10:1 read-to-write ratio, caching is critical. The redirect service checks Redis before hitting the database.
- Cache key: the short alias (e.g.
abc1234) - Cache value: the long URL
- Eviction policy: LRU — popular links stay hot, rarely-accessed links expire naturally
- TTL: match the link's expiry date, or 24h for non-expiring links
- Cache miss: read from database, write to cache, return redirect
A cache hit ratio of 80%+ is realistic for popular URL shorteners — most traffic goes to a small set of viral links. With 11,600 reads/sec and 80% cache hit, only 2,320 reads/sec hit the database.
Scaling
Read Scaling
Add read replicas to PostgreSQL. The redirect service can read from any replica. The shortener service writes to the primary only. This is the first scaling step — read replicas can handle 10–50x the primary's read throughput.
Write Scaling
At 1,160 writes/sec, a single PostgreSQL primary is sufficient (PostgreSQL handles ~10,000 writes/sec for simple inserts). If you need more, shard by the short URL's first character — 62 shards, each handling ~19 writes/sec.
Redirect Service Scaling
The redirect service is stateless — scale horizontally behind the load balancer. Each instance reads from Redis. No shared state between instances.