Distributed Systems

Caching & CDNs

HTTP caching, reverse proxies, CDN architecture, and the hardest problem in computer science — cache invalidation.

Prerequisites: HTTP basics + Client-server model. That's it.
10 Chapters · 8+ Simulations · 0 Assumed Knowledge

Chapter 0: The Latency Tax

A user in Sydney requests your website. Your origin server is in Virginia. The request travels 16,000 km through undersea fiber: Sydney to Los Angeles (12,000 km), then across the US to Virginia (4,000 km). Round trip at the speed of light in fiber: 160 milliseconds. But light speed is the theoretical minimum — real networks add routing, congestion, and protocol overhead. The actual round trip: 280-350 milliseconds. For a single HTTP request.
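The 160 ms figure is just distance divided by speed. A minimal sketch of that arithmetic, assuming light in fiber travels at about 200,000 km/s (roughly two thirds of its vacuum speed); the function and constant names are made up for illustration:

```python
# Best-case propagation delay for the Sydney -> Virginia path.
# Assumption: light in optical fiber covers roughly 200,000 km per second.
FIBER_SPEED_KM_PER_S = 200_000


def round_trip_ms(distance_km: float, speed_km_per_s: float = FIBER_SPEED_KM_PER_S) -> float:
    """Round-trip time in milliseconds, ignoring routing, queuing, and protocol overhead."""
    return 2 * distance_km * 1000 / speed_km_per_s


path_km = 12_000 + 4_000  # Sydney -> Los Angeles, then Los Angeles -> Virginia
print(round_trip_ms(path_km))  # 160.0
```

Anything above this number in a real measurement is overhead added by the network, not by physics.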

Now multiply. A typical web page makes 50-100 HTTP requests: HTML, CSS, JavaScript, fonts, images, API calls. Even with HTTP/2 multiplexing, the browser needs at least 3-5 round trips to render the page (DNS lookup, TCP handshake, TLS handshake, HTML fetch, then asset fetches). That's over a second of latency, just waiting for packets to cross the ocean.

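Your competitor serves the same content from a server in Sydney. Their page loads in 200 milliseconds. A widely cited industry rule of thumb is that every 100ms of added latency costs roughly 1% of conversions. By that measure, a single extra trans-Pacific round trip (300ms or more) costs you 3-4% of your revenue, and a page that needs several of them loses more. You're paying that tax to physics.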

You can't beat the speed of light. But you can change WHERE the data is served from. Caching stores copies of data closer to the user — in the browser, in a nearby edge server, in a reverse proxy in your data center. The fastest request is the one that never crosses the ocean.

The Cache Hit Ratio: The Only Metric That Matters

// Cache hit ratio = fraction of requests served from cache
Hit ratio = cache_hits / (cache_hits + cache_misses)

// Example: 10,000 requests/sec, 95% hit ratio
Origin handles: 10,000 × 0.05 = 500 requests/sec
Cache handles: 10,000 × 0.95 = 9,500 requests/sec

// Without cache: origin needs to handle 10,000 req/sec (20x more!)
// At $0.001/request, cache saves: 9,500 × $0.001 × 86,400 sec/day = $820,800/day

A 95% cache hit ratio means your origin server handles 20x less traffic. A 99% ratio means 100x less. The difference between 95% and 99% is the difference between needing 10 origin servers and needing 2.
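The same arithmetic as a small script, so you can plug in your own numbers. This is a sketch; the function names are illustrative, and the per-request cost is the example figure from above, not a real price:

```python
def origin_rps(total_rps: float, hit_ratio: float) -> float:
    """Requests per second that miss the cache and reach the origin."""
    return total_rps * (1 - hit_ratio)


def load_reduction(hit_ratio: float) -> float:
    """How many times less traffic the origin sees compared with no cache."""
    return 1 / (1 - hit_ratio)


def daily_savings(total_rps: float, hit_ratio: float, cost_per_request: float) -> float:
    """Origin cost avoided per day, assuming a flat cost per origin request."""
    return total_rps * hit_ratio * cost_per_request * 86_400


for ratio in (0.90, 0.95, 0.99):
    print(f"{ratio:.0%} hit ratio: origin sees {origin_rps(10_000, ratio):,.0f} req/s "
          f"({load_reduction(ratio):.0f}x less)")
```

Note how nonlinear the payoff is: going from 95% to 99% removes only four more points of misses, but it cuts origin traffic by another factor of five.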

Latency With and Without Caching

Watch requests travel from Sydney to Virginia (no cache) vs. hitting a local edge server (cached). Click to send requests and compare latency.

Quick check: Your origin server handles 1000 requests/sec at maximum capacity. You add a cache with a 90% hit ratio. How many requests/sec can your system now handle?

Chapter 1: Cache Fundamentals

A cache is a high-speed storage layer that stores a subset of data, typically transient, so that future requests for that data are served faster. Caches exploit two properties of real-world access patterns:

| Property | What it means | Example |
|---|---|---|
| Temporal locality | Data accessed recently is likely to be accessed again soon | A trending news article gets millions of views in one hour |
| Spatial locality | Data near recently accessed data is likely to be accessed soon | After reading page 1, the user reads page 2 |

The Cache Hierarchy

In a web application, caches exist at every layer, from closest-to-user (fastest, smallest) to closest-to-origin (slowest, largest):

Browser Cache
In the user's browser. Stores static assets (JS, CSS, images). Fastest possible — zero network latency. Controlled by HTTP headers.
↓ miss
CDN Edge
Nearest edge server (e.g., Cloudflare Sydney). 5-50ms latency. Caches HTML, assets, and sometimes API responses.
↓ miss
Reverse Proxy
Nginx/Varnish in your data center. Sits in front of your application servers. Caches computed responses.
↓ miss
Application Cache
Redis/Memcached. Caches database query results, computed data, session state.
↓ miss
Database Cache
Query cache, buffer pool (pages in RAM). Internal to the DB.
↓ miss
Disk
Actual durable storage. The "source of truth." Slowest.

Eviction Policies

Caches have limited space. When full, they must choose what to evict. The eviction policy determines cache effectiveness.

| Policy | Evicts | Pros | Cons |
|---|---|---|---|
| LRU (Least Recently Used) | Item not accessed for the longest time | Simple, good for temporal locality | A one-off full scan can flush hot items |
| LFU (Least Frequently Used) | Item with fewest accesses | Keeps popular items | Slow to adapt; formerly popular items linger |
| FIFO (First In, First Out) | Oldest item regardless of access | Simplest to implement | Ignores access patterns |
| Random | Random item | No metadata overhead | Unpredictable performance |
| W-TinyLFU | Item with the lower estimated frequency, judged by a small recency window plus a frequency sketch | Often the best hit ratio in practice | Complex (used by the Caffeine library) |

LRU is king. In practice, LRU (or an approximation of it) is the default choice for most workloads. Redis uses an approximate LRU that, by default, samples 5 random keys and evicts the one with the oldest access time. This is O(1) per eviction and surprisingly close to true LRU in hit ratio. Most CDNs also use LRU or variants.
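To make the LRU row concrete, here is a minimal sketch of an exact LRU cache built on Python's `OrderedDict`. The class name, the `load` callback, and the `evicted` list are illustrative choices, not taken from any particular library. Production caches such as Redis approximate this behavior rather than tracking exact order.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest access first, newest last
        self.hits = 0
        self.misses = 0
        self.evicted = []  # kept only so the example can be inspected

    def get(self, key, load):
        """Return the cached value, calling load(key) on a miss."""
        if key in self._items:
            self.hits += 1
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        self.misses += 1
        value = load(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            old_key, _ = self._items.popitem(last=False)  # drop the oldest entry
            self.evicted.append(old_key)
        return value


cache = LRUCache(capacity=3)
for key in ["x", "y", "z", "x", "w", "y"]:
    cache.get(key, load=str.upper)

print(cache.evicted, cache.hits, cache.misses)  # ['y', 'z'] 1 5
```

Re-reading "x" moved it to the back of the queue, so "y" became the oldest entry and was evicted when "w" arrived. That reordering on every hit is the whole difference between LRU and FIFO.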
Cache Eviction Simulator

A cache with 5 slots. Watch items enter, get accessed, and get evicted under different policies. The hit ratio counter shows effectiveness.

Check: Your cache has 4 slots and uses LRU eviction. You access items in this order: A, B, C, D, E, A, B. After all accesses, which items have been evicted, and which are still in the cache?

Chapter 2: HTTP Cache-Control

The browser and CDN need to know: "Can I cache this? For how long?" HTTP provides the Cache-Control header to answer these questions. This is the most important header in web performance.

Cache-Control Directives

| Directive | Meaning | Use case |
|---|---|---|
| public | Any cache (browser, CDN, proxy) may store this | Static assets, public HTML |
| private | Only the browser may cache this (not CDN/proxy) | User-specific data (dashboard, profile) |
| max-age=N | Cache is fresh for N seconds | max-age=3600 = 1 hour |
| s-maxage=N | Overrides max-age for shared caches (CDN/proxy) only | CDN caches for 60s, browser for 3600s |
| no-cache | Must revalidate with origin before using cached copy | HTML pages that might change |
| no-store | Don't cache at all: not in memory, not on disk | Sensitive data (banking, health records) |
| immutable | Will not change while fresh, so don't revalidate even on reload | Versioned assets (main.abc123.js) |
| stale-while-revalidate=N | Serve stale for N sec while fetching fresh copy in background | News feeds, product listings |

Real-World Header Examples

// Static asset with content hash in filename (best practice)
Cache-Control: public, max-age=31536000, immutable
// 1 year. The filename changes when content changes, so no invalidation needed.
// main.abc123.js → main.def456.js

// HTML page (might change any time)
Cache-Control: public, no-cache
// Browser must check with server before using cached copy (via ETag).
// If unchanged, server responds 304 Not Modified (no body = fast).

// User dashboard (private, per-user data)
Cache-Control: private, max-age=60
// Browser caches for 60 sec. CDN/proxy must NOT cache.

// Banking page
Cache-Control: no-store
// NOTHING caches this. Every request goes to origin.

// Product listing (stale acceptable for a few seconds)
Cache-Control: public, max-age=10, stale-while-revalidate=60
// Fresh for 10s. After that, serve stale for up to 60s while background-refreshing.
Common mistake: no-cache does NOT mean "don't cache." It means "cache it, but always ask the server if it's still valid before using it." This is called revalidation, and it uses ETags (covered in the next chapter). The directive that actually prevents caching is no-store. Getting these confused causes either security vulnerabilities (caching private data) or unnecessary cache misses (not caching public data).
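One way to internalize the directives is to write down the decision a cache makes for a stored response. The sketch below is a deliberately simplified model, not a full RFC 9111 implementation: the function names and result labels are invented for illustration, a missing max-age is treated as 0 (real caches may apply heuristic freshness), and directives such as must-revalidate are ignored.

```python
def parse_cache_control(header: str) -> dict:
    """Split a Cache-Control value into {directive: value-or-True}."""
    directives = {}
    for part in header.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name] = int(value) if value.isdigit() else (value or True)
    return directives


def cache_decision(header: str, age: int, shared: bool) -> str:
    """Classify a stored response that is `age` seconds old.

    shared=True models a CDN or proxy; shared=False models a browser.
    """
    d = parse_cache_control(header)
    if "no-store" in d or (shared and "private" in d):
        return "do-not-store"
    if "no-cache" in d:
        return "revalidate"  # stored, but checked with the origin before every use
    ttl = d.get("max-age", 0)
    if shared and "s-maxage" in d:
        ttl = d["s-maxage"]  # shared caches prefer s-maxage over max-age
    if age < ttl:
        return "fresh"
    if age < ttl + d.get("stale-while-revalidate", 0):
        return "serve-stale-and-refresh"
    return "revalidate"


print(cache_decision("public, no-cache", age=0, shared=False))    # revalidate
print(cache_decision("private, max-age=60", age=10, shared=True)) # do-not-store
print(cache_decision("public, max-age=10, stale-while-revalidate=60", age=30, shared=True))
# serve-stale-and-refresh
```

Notice that no-cache and no-store land in different branches: one still stores the response, the other never does.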
Cache-Control Header Builder

Toggle directives to build a Cache-Control header. The resulting behavior is shown in real-time.

Check: You serve a JavaScript bundle at /static/main.abc123.js where abc123 is a content hash. What Cache-Control header gives the best performance?

Chapter 3: ETags & Conditional Requests

Cache-Control tells the browser HOW LONG to cache. But what happens when that time expires? The browser needs to check: "Has this content actually changed, or can I keep using my cached copy?" This is revalidation, and it uses ETags.

An ETag (entity tag) is a fingerprint of the resource. It's a string that changes whenever the content changes. When the server sends a response, it includes an ETag header. When the browser revalidates, it sends that ETag back in an If-None-Match header.

The Conditional Request Flow

1. First Request
Browser: GET /page.html
Server: 200 OK, ETag: "abc123", Cache-Control: no-cache
Browser stores response + ETag
2. Revalidation
Browser: GET /page.html, If-None-Match: "abc123"
"I have version abc123. Is it still current?"
3a. Not Modified
Server: 304 Not Modified (NO BODY — saves bandwidth!)
Browser uses cached copy. ~100 bytes vs. 50KB for full response.
or
3b. Modified
Server: 200 OK, ETag: "def456", [new content]
Browser replaces cached copy with new version.
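The flow above fits in a few lines of server-side logic. This is a minimal sketch, assuming the ETag is a truncated SHA-256 of the body (one common choice among many) and using a hypothetical `handle_get` function rather than any real framework's API. It does exact string matching and does not handle `If-None-Match: *` or weak validators.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Strong ETag derived from the body; the double quotes are part of the value."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_get(body: bytes, if_none_match: Optional[str] = None):
    """Return (status, headers, payload) for a GET on a single resource."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    if if_none_match is not None:
        client_tags = [tag.strip() for tag in if_none_match.split(",")]
        if etag in client_tags:
            return 304, headers, b""  # client copy is current: send no body
    return 200, headers, body


page_v1 = b"<html>version 1</html>"
status, headers, _ = handle_get(page_v1)                           # first request
revalidated, _, payload = handle_get(page_v1, headers["ETag"])     # unchanged content
changed, _, _ = handle_get(b"<html>version 2</html>", headers["ETag"])
print(status, revalidated, len(payload), changed)  # 200 304 0 200
```

The server still has to produce (or look up) the current ETag on every revalidation. What it saves is the response body, not the round trip.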

Strong vs. Weak ETags

// Strong ETag: content is byte-for-byte identical
ETag: "abc123"
// Used for: exact content matching, range requests (partial downloads)

// Weak ETag: content is "semantically equivalent" but may differ in bytes
ETag: W/"abc123"
// Used for: HTML pages where whitespace or ad placements change
// but the meaningful content is the same
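The strong/weak distinction shows up in how two ETags are compared. The sketch below follows the two comparison functions defined by the HTTP specification (RFC 9110): strong comparison requires both tags to be strong, while weak comparison ignores the `W/` prefix. The helper names are illustrative.

```python
def split_etag(etag: str):
    """Return (is_weak, opaque_tag) for a value such as W/"abc123"."""
    if etag.startswith("W/"):
        return True, etag[2:]
    return False, etag


def strong_match(a: str, b: str) -> bool:
    """True only if neither tag is weak and the opaque tags are identical."""
    weak_a, tag_a = split_etag(a)
    weak_b, tag_b = split_etag(b)
    return not weak_a and not weak_b and tag_a == tag_b


def weak_match(a: str, b: str) -> bool:
    """True if the opaque tags are identical, regardless of the W/ prefix."""
    return split_etag(a)[1] == split_etag(b)[1]


print(strong_match('"abc123"', '"abc123"'))    # True
print(strong_match('W/"abc123"', '"abc123"'))  # False
print(weak_match('W/"abc123"', '"abc123"'))    # True
```

If-None-Match revalidation uses the weak comparison, which is why weak ETags still produce 304 responses. Range requests need the strong comparison, because stitching together byte ranges only works when the bytes are identical.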
ETags vs. Last-Modified. HTTP also supports revalidation via Last-Modified + If-Modified-Since headers (date-based). ETags are strictly more powerful — they handle cases where content changes twice in the same second, or where content reverts to an older version. Use ETags. Last-Modified is a fallback for legacy systems.
ETag Conditional Request Flow

Watch the browser send requests with and without ETags. Notice how 304 responses save bandwidth.

Check: A browser sends GET /data with If-None-Match: "v5". The content has not changed (ETag is still "v5"). How much data does the server send back?

Chapter 4: Reverse Proxies

A reverse proxy sits between clients and your origin servers. It intercepts every request, checks its cache, and either serves the cached response immediately or forwards the request to the origin. The client doesn't know the proxy exists.

This is different from a forward proxy (like a corporate web filter), which sits between the client and the internet. A reverse proxy sits between the internet and your servers.

What Reverse Proxies Do

| Function | How it helps | Tools |
|---|---|---|
| Caching | Stores responses in RAM or on disk; serves without hitting origin | Varnish, Nginx, HAProxy |
| Load balancing | Distributes requests across multiple origin servers | Nginx, Envoy, HAProxy |
| SSL termination | Handles TLS encryption/decryption; origin can speak plain HTTP | Most major proxies (Nginx, HAProxy, Envoy) |
| Compression | Gzip/Brotli compresses responses before sending to client | Nginx, Cloudflare |
| Request collapsing | Multiple identical requests during a miss → one origin request | Varnish, Nginx |

Request Collapsing (Thundering Herd Protection)

When a popular cached item expires, hundreds of concurrent requests for it arrive at the same instant. Without protection, all of them miss the cache and hit your origin simultaneously — the thundering herd. Request collapsing prevents this: the first request goes to origin, and all subsequent identical requests wait for the first one to return, then they all get the same response.

// Without request collapsing:
Cache expires at t=0
t=0.001s: Request 1 → origin (miss)
t=0.002s: Request 2 → origin (miss)
t=0.003s: Request 3 → origin (miss)
...
t=0.050s: Request 50 → origin (miss)
// Origin gets 50 identical requests. 49 are redundant.

// With request collapsing:
t=0.001s: Request 1 → origin (miss, becomes the "fill")
t=0.002s: Request 2 → waits for Request 1
t=0.003s: Request 3 → waits for Request 1
...
t=0.200s: Request 1 returns. All waiters get the same response.
// Origin gets 1 request. Cache is repopulated.
Varnish is the speed demon. Varnish keeps its cache in RAM by default and uses a custom configuration language (VCL) for routing logic. It's designed specifically for HTTP caching and can serve 100,000+ requests/sec from a single node. Nginx is more versatile (it also does load balancing, SSL termination, and static file serving), but caching is not its primary focus. In practice, many deployments use Nginx as the load balancer/SSL terminator with Varnish behind it as the cache layer.
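Request collapsing itself is a small piece of concurrency logic. Here is a minimal thread-based sketch of the idea (sometimes called "single flight"). The class and function names are hypothetical, there is no TTL or eviction, and real proxies implement this inside their event loops rather than with threads.

```python
import threading
import time


class CollapsingCache:
    """Cache that lets only one caller per key fetch from the origin."""

    def __init__(self, fetch):
        self._fetch = fetch          # function(key) -> value; this is the origin call
        self._values = {}
        self._in_flight = {}         # key -> Event that is set when the fill finishes
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            if key in self._values:
                return self._values[key]            # cache hit
            event = self._in_flight.get(key)
            leader = event is None
            if leader:                              # first miss becomes the "fill"
                event = self._in_flight[key] = threading.Event()
        if leader:
            try:
                value = self._fetch(key)
                with self._lock:
                    self._values[key] = value
            finally:
                with self._lock:
                    del self._in_flight[key]
                event.set()                         # wake every waiter
            return value
        event.wait()                                # followers wait for the fill
        return self.get(key)                        # now a hit (or a retry if the fill failed)


origin_calls = []


def slow_origin(key):
    origin_calls.append(key)
    time.sleep(0.05)                                # simulate a slow origin
    return f"response for {key}"


cache = CollapsingCache(slow_origin)
results = []
threads = [threading.Thread(target=lambda: results.append(cache.get("/popular")))
           for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(origin_calls), len(results))  # 1 50
```

Fifty callers, one origin request. The trade-off is that every waiter now depends on the leader: if the fill is slow, all of them are slow, which is why production implementations usually add a timeout on the wait.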
Reverse Proxy Caching

Requests arrive at the reverse proxy. Hits are served instantly (green). Misses go to origin (yellow). Watch how the thundering herd is handled with and without request collapsing.

Check: A cached item expires. In the next 100ms, 200 requests arrive for that item. Request collapsing is enabled. How many requests reach the origin server?

Chapter 5: CDN Architecture

A Content Delivery Network is a geographically distributed network of reverse proxy servers. Instead of one reverse proxy in your data center, a CDN has hundreds or thousands of edge servers (also called Points of Presence, or PoPs) in cities around the world. Each PoP caches your content and serves it to nearby users.

How CDN Routing Works

When a user requests your content, the CDN must route them to the nearest (or best) edge server. Two common approaches:

| Method | How it works | Pros/Cons |
|---|---|---|
| DNS-based routing | CDN controls the DNS resolution. User's DNS resolver gets an IP for the nearest PoP. | Simple. But DNS is cached, so changes are slow (TTL-dependent). |
| Anycast routing | Multiple PoPs advertise the same IP via BGP. Internet routing sends packets to the nearest one. | Fast, no DNS dependency. Used by Cloudflare, Google. |
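For DNS-based routing, the CDN's DNS server has to decide which PoP is "nearest" before it answers. A toy sketch of that decision, using great-circle distance as a stand-in for network proximity. The PoP list and coordinates are hypothetical and approximate, and real CDNs typically rely on measured latency, PoP load, and health checks rather than geography alone.

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical PoP list; coordinates are approximate (latitude, longitude).
POPS = {
    "Sydney": (-33.87, 151.21),
    "Singapore": (1.35, 103.82),
    "Frankfurt": (50.11, 8.68),
    "Ashburn": (39.04, -77.49),
    "Los Angeles": (34.05, -118.24),
}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))


def nearest_pop(client, pops=POPS, down=()):
    """Name of the closest PoP that is not marked as down."""
    candidates = {name: loc for name, loc in pops.items() if name not in down}
    return min(candidates, key=lambda name: haversine_km(client, candidates[name]))


melbourne = (-37.81, 144.96)
print(nearest_pop(melbourne))                    # Sydney
print(nearest_pop(melbourne, down=("Sydney",)))  # Singapore
```

With DNS-based routing, the failover in the second call only takes effect once resolvers drop the old answer, which is the TTL-dependence noted in the table. With anycast, BGP withdraws the route instead and no DNS answer has to change.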

CDN Request Flow

1. DNS Resolution
User resolves cdn.example.com → CDN's DNS returns IP of nearest PoP (e.g., Sydney)
2. Edge Check
Request hits Sydney PoP. Edge checks its cache. HIT → return cached response (5ms).
↓ miss
3. Shield/Mid-Tier
Edge asks the "shield" (regional parent cache). HIT → return to edge (20ms). MISS → go to origin.
↓ miss
4. Origin Fetch
Shield fetches from your origin server (300ms). Response flows back through shield → edge → user. Both caches store the response.
The shield layer matters. Without a shield, every edge PoP that misses asks the origin directly. With 200 PoPs and a popular item that expires simultaneously, your origin gets 200 requests. With a shield (one per region), only 5-10 shields ask the origin. The shield is request collapsing at a global scale.
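The fan-in effect of the shield layer can be checked with a tiny model. This sketch assumes every PoP misses the same item once, each PoP is statically mapped to one shield, and each shield collapses concurrent misses into a single origin fetch. The function name and the round-robin mapping are illustrative only.

```python
def origin_fetches(num_pops: int, num_shields: int = 0) -> int:
    """Count origin requests when every edge PoP misses the same item at once."""
    origin_hits = 0
    filled_shields = set()                # shields that already fetched the item
    for pop in range(num_pops):
        if num_shields == 0:
            origin_hits += 1              # no shield: the edge goes straight to origin
            continue
        shield = pop % num_shields        # assumed static PoP -> shield mapping
        if shield not in filled_shields:
            origin_hits += 1              # first miss in the region fills the shield
            filled_shields.add(shield)
    return origin_hits


print(origin_fetches(200))                 # 200
print(origin_fetches(200, num_shields=8))  # 8
```

The origin's worst-case load now scales with the number of shields, not the number of PoPs, so adding edge locations no longer makes expiry storms worse.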

What CDNs Actually Cache

// Default: CDNs cache based on Cache-Control headers
// Static assets (JS, CSS, images, fonts): cached by default on most CDNs
// HTML: depends on headers (usually no-cache or short max-age)
// API responses: depends on Vary header and cacheability

// Cache key = URL + relevant Vary headers
// GET /api/products?page=1 → one cache entry
// GET /api/products?page=2 → different cache entry
// GET /api/products?page=1 with Accept-Encoding: br → usually the SAME entry (many CDNs normalize Accept-Encoding and manage compressed variants themselves)
CDN Global Network

A CDN with edge PoPs around the world and one origin. Click a city to send a request and watch how it routes through the CDN hierarchy.

Check: A CDN has 100 edge PoPs and 5 regional shields. A popular page expires at the same time everywhere. Without shields, how many requests hit origin? With shields?

Chapter 6: Cache Hierarchies

Real systems don't have one cache — they have many, layered from closest-to-user to closest-to-origin. Understanding how requests flow through these layers is essential for debugging performance issues and setting correct cache headers.

The Full Request Path

// User in Sydney requests https://example.com/api/products

Layer 1: Browser cache
  Checks local cache. Has entry with max-age=60, 45 seconds old.
  FRESH → serve from browser. 0ms latency, 0 bytes transferred.

// 20 seconds later, same request. Entry is now 65 seconds old (expired).

Layer 1: Browser cache
  Entry expired. Sends conditional request with ETag.
Layer 2: CDN edge (Sydney PoP)
  Receives request. Checks its cache. Has fresh entry (s-maxage=300).
  HIT → responds 304 Not Modified if the browser's ETag still matches, otherwise 200 with current data. 5ms latency.
  Browser updates its cache. Sets new max-age timer.

// 10 minutes later. CDN entry also expired.

Layer 1: Browser cache
  Expired → conditional request.
Layer 2: CDN edge
  MISS → forwards to shield.
Layer 3: CDN shield (Singapore)
  Has fresh entry. HIT → responds to edge. 20ms.
  Edge caches the response, responds to browser.

// 30 minutes later. Shield entry also expired.

Layer 4: Reverse proxy (Nginx in Virginia)
  MISS → forwards to application.
Layer 5: Application cache (Redis)
  HIT → responds to Nginx. 1ms.
Layer 6: Database
  Not reached. Saved a 10ms database query.
Each layer has different TTLs. A common pattern: the browser caches for 60 seconds (max-age=60), the CDN for 5 minutes (s-maxage=300), and the application cache for 30 minutes. This means the browser makes a network request at most every 60 seconds, the CDN goes to origin at most every 5 minutes, and the origin queries the database at most every 30 minutes. Each layer shields the one behind it.
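The layered-TTL pattern can be modeled as a chain of caches, each falling through to the next on a miss. The sketch below is a simplification with invented class names: it uses a fake clock, collapses the CDN edge, shield, and reverse proxy into one "CDN" layer, and restarts a layer's TTL timer on every refill (real HTTP caches also account for the Age header).

```python
class TTLLayer:
    """One cache layer with a fixed TTL, backed by the next layer down."""

    def __init__(self, name, ttl, backend):
        self.name, self.ttl, self.backend = name, ttl, backend
        self.stored_at = None  # time of the last fill, or None if empty

    def get(self, now):
        """Return the name of the layer that answered at time `now` (in seconds)."""
        if self.stored_at is not None and now - self.stored_at < self.ttl:
            return self.name                   # fresh: serve locally
        served_by = self.backend.get(now)      # empty or expired: ask the next layer
        self.stored_at = now                   # refill and restart the TTL timer
        return served_by


class Database:
    name = "database"

    def get(self, now):
        return self.name


app = TTLLayer("app cache", 1800, Database())
cdn = TTLLayer("CDN", 300, app)
browser = TTLLayer("browser", 60, cdn)

for t in (0, 45, 65, 400, 2000):
    print(f"t={t:>4}s served by {browser.get(t)}")
```

The very first request falls all the way through to the database; after that, each request stops at the first layer that is still fresh, which is the shielding effect described above.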

The Vary Header: Cache Key Control

The Vary header tells caches to store separate entries for requests that differ in specific headers. Without Vary, a cache might serve a gzipped response to a client that can't decompress it.

// Response header:
Vary: Accept-Encoding
// Cache stores separate entries for:
// Accept-Encoding: gzip → gzipped response
// Accept-Encoding: br → brotli response
// (no encoding) → uncompressed response

// Vary: Accept-Language
// Separate entries for en-US, ja, de, etc.

// DANGEROUS: Vary: Cookie
// Every unique cookie value = different cache entry
// This effectively disables caching (every user has different cookies)
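In code, Vary simply widens the cache key. A minimal sketch, assuming a hypothetical `cache_key` helper; real caches also normalize header values (for example, collapsing the many possible Accept-Encoding strings into a few buckets), which this version skips.

```python
def cache_key(method: str, url: str, request_headers: dict, vary: str = "") -> tuple:
    """Build a cache key from the URL plus the request headers named in Vary."""
    headers = {name.lower(): value for name, value in request_headers.items()}
    varied = []
    for name in sorted(h.strip().lower() for h in vary.split(",") if h.strip()):
        varied.append((name, headers.get(name, "")))
    return (method.upper(), url, tuple(varied))


url = "/api/products?page=1"
gzip_key = cache_key("GET", url, {"Accept-Encoding": "gzip"}, vary="Accept-Encoding")
br_key = cache_key("GET", url, {"Accept-Encoding": "br"}, vary="Accept-Encoding")
print(gzip_key == br_key)  # False: separate entries per encoding

# Without Vary, the header is ignored and both requests share one entry.
print(cache_key("GET", url, {"Accept-Encoding": "gzip"}) ==
      cache_key("GET", url, {"Accept-Encoding": "br"}))  # True

# Vary: Cookie gives every session its own entry, so almost nothing is shared.
alice = cache_key("GET", url, {"Cookie": "session=alice"}, vary="Cookie")
bob = cache_key("GET", url, {"Cookie": "session=bob"}, vary="Cookie")
print(alice == bob)  # False
```

Every header you add to Vary multiplies the number of possible entries per URL, so the hit ratio drops accordingly.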
Cache Hierarchy Simulator

Trace a request through all cache layers. Each layer shows its TTL and hit/miss status. Watch how requests cascade through layers on misses.

Check: You set Cache-Control: public, max-age=60, s-maxage=300. A user's browser cache expires after 60 seconds. They make a new request. Where does it go?

Chapter 7: Cache Invalidation

"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton

Cache invalidation is the process of removing or updating stale data from a cache. It's hard because caches are distributed (browser, CDN edge, shield, proxy) and there's no reliable way to instantly reach all of them. Every invalidation strategy is a trade-off between staleness, complexity, and reliability.

Invalidation Strategies

| Strategy | How it works | Staleness window | Complexity |
|---|---|---|---|
| TTL-based | Set max-age; cache expires after N seconds | Up to N seconds | Simplest |
| Purge/Ban | CDN API call to remove specific URLs or patterns | Seconds (API propagation delay) | Medium |
| Cache tags | Tag cached entries; purge all entries with a tag | Seconds | Medium-high |
| Event-driven | Database change → event → purge cache | Milliseconds to seconds | High |
| Versioned URLs | Change URL when content changes (main.abc.js → main.def.js) | Zero (new URL = new entry) | Requires build pipeline |
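Cache tags are the least obvious row in the table, so here is a minimal in-memory sketch of the idea: keep a reverse index from tag to URLs, and purge through it. The class and method names are hypothetical; a real CDN does this across many machines, which is where the "seconds" of propagation delay comes from.

```python
from collections import defaultdict


class TaggedCache:
    """In-memory cache whose entries can be purged by URL or by tag."""

    def __init__(self):
        self._entries = {}                    # url -> response body
        self._urls_by_tag = defaultdict(set)  # tag -> urls stored with that tag

    def store(self, url, body, tags=()):
        self._entries[url] = body
        for tag in tags:
            self._urls_by_tag[tag].add(url)

    def get(self, url):
        return self._entries.get(url)

    def purge_url(self, url):
        self._entries.pop(url, None)

    def purge_tag(self, tag):
        """Remove every entry stored with `tag`; return how many were removed."""
        removed = 0
        for url in self._urls_by_tag.pop(tag, set()):
            if self._entries.pop(url, None) is not None:
                removed += 1
        return removed


cache = TaggedCache()
cache.store("/products/42", "TV", tags=["product-42", "category-electronics"])
cache.store("/products/43", "Radio", tags=["product-43", "category-electronics"])
cache.store("/products/99", "Sofa", tags=["product-99", "category-furniture"])

print(cache.purge_tag("category-electronics"))             # 2
print(cache.get("/products/42"), cache.get("/products/99"))  # None Sofa
```

The application only needs to know which tag changed, not which URLs happen to embed that data. That is what makes tags practical for pages that combine many records.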

Purge vs. Ban

// Purge: remove ONE specific URL from cache
curl -X PURGE https://cdn.example.com/api/products/42
// Useful for: updating a single page or resource

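// Ban: remove ALL URLs matching a pattern
// (Varnish-style: the BAN method and the X-Ban-Pattern header are conventions you define in your VCL, not standard HTTP)
curl -X BAN https://cdn.example.com/ -H "X-Ban-Pattern: /api/products/.*"
// Useful for: invalidating an entire category after bulk update

// Cache tags (Fastly: Surrogate-Key header; Cloudflare: Cache-Tag header): tag responses, then purge by tag
// Response: Surrogate-Key: product-42 category-electronics
// Purge (illustrative endpoint; the real path depends on your CDN's API): POST /purge/tag/category-electronics
// → All responses tagged "category-electronics" are purged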
The browser cache is the hardest to invalidate. You can purge your CDN in seconds. But you can't reach into a user's browser and delete their cached copy. If you set max-age=86400 (1 day), that user won't check for updates for 24 hours. This is why the versioned URL pattern is so powerful: changing the filename forces the browser to fetch the new version because it's a different URL. For HTML (which you can't version), use short max-age or no-cache with ETags.
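The versioned URL pattern is usually implemented in the build step by hashing each file's contents into its name. A minimal sketch, assuming a truncated SHA-256 digest and a hypothetical `versioned_name` helper; bundlers such as webpack or Vite do the equivalent for you.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the extension: main.js -> main.<hash>.js."""
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old = versioned_name("/static/main.js", b"console.log('v1');")
new = versioned_name("/static/main.js", b"console.log('v2');")
print(old)
print(new)
print(old != new)  # True: new content means a new URL
```

Because the name is derived from the content, an unchanged file keeps its URL across deploys and stays cached, while a changed file gets a URL no browser has ever seen. The HTML that references these names is the one thing that must stay revalidatable.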
Cache Invalidation Strategies

A cache with stale data. Try different invalidation strategies and see how quickly each one removes the stale entries.

Check: You deploy a new version of your JavaScript bundle. Old version is cached in users' browsers with max-age=31536000 (1 year). How do you get users to load the new version?

Chapter 8: CDN Routing Simulation

This is the showcase simulation. You're running a CDN with 6 edge PoPs around the world, 2 regional shields, and one origin server. Users from different cities send requests. Watch the DNS resolution, edge cache checks, shield fallbacks, and origin fetches play out in real time. You can flush individual PoPs, create regional outages, and see how the CDN adapts.

Experiment: Send requests from multiple cities. Watch how the first request from each region misses and goes to origin, but subsequent requests hit the edge or shield. Then flush a PoP and see the request cascade. Create an outage to see failover routing.
CDN Request Routing Simulation

6 edge PoPs, 2 shields, 1 origin. Click cities to send requests. Watch the routing cascade. Green = cache hit, yellow = shield hit, red = origin fetch.


Statistics to observe:

| Metric | What to watch |
|---|---|
| Edge hit ratio | Should increase as you send more requests to the same PoP |
| Shield hit ratio | First request from a new PoP in the same region should hit the shield |
| Origin load | Should be minimal after initial warm-up |
| Failover latency | When shield is down, edge goes directly to origin (higher latency) |

Chapter 9: Connections

Caching is one of the most impactful performance optimizations in distributed systems. A well-configured cache hierarchy can reduce origin load by 100x and cut user-perceived latency from seconds to milliseconds. But it comes with the eternal challenge: keeping cached data fresh.

The Cache Strategy Cheat Sheet

| Content type | Cache-Control | Invalidation |
|---|---|---|
| Versioned assets (JS, CSS) | public, max-age=31536000, immutable | New filename = automatic |
| HTML pages | public, no-cache (or max-age=60) | ETag revalidation |
| API responses (public) | public, s-maxage=60, stale-while-revalidate=300 | Purge API on data change |
| API responses (private) | private, max-age=30 | Short TTL only |
| Sensitive data | no-store | N/A (never cached) |

The golden rule: cache aggressively at the edge for static assets (versioned URLs + immutable), cache cautiously for dynamic content (short TTL + stale-while-revalidate), and never cache sensitive data (no-store). When in doubt, start with shorter TTLs and increase them after measuring your hit ratio.
The Caching Landscape

Where different caching strategies sit on the freshness-performance spectrum.

"The fastest I/O is the I/O you don't do." — Anonymous engineer