HTTP caching, reverse proxies, CDN architecture, and the hardest problem in computer science — cache invalidation.
A user in Sydney requests your website. Your origin server is in Virginia. The request travels 16,000 km through fiber: Sydney to Los Angeles (12,000 km, undersea), then across the US to Virginia (4,000 km). Round trip at the speed of light in fiber: 160 milliseconds. But that is only the theoretical floor: real networks add routing, congestion, and protocol overhead. The actual round trip is typically 280-350 milliseconds, for a single HTTP request.
Now multiply. A typical web page makes 50-100 HTTP requests: HTML, CSS, JavaScript, fonts, images, API calls. Even with HTTP/2 multiplexing, the browser needs at least 3-5 round trips to render the page (DNS lookup, TCP handshake, TLS handshake, HTML fetch, then asset fetches). That's over a second of latency, just waiting for packets to cross the ocean.
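The arithmetic above can be reproduced in a few lines. This is a back-of-the-envelope sketch, not a network model: the constant `FIBER_KM_PER_MS` assumes light covers roughly 200 km per millisecond in fiber (about two thirds of its vacuum speed), and the function names are made up for illustration.

```python
# Back-of-the-envelope latency estimate for the Sydney -> Virginia example.
# Assumption: light in optical fiber covers roughly 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def fiber_rtt_ms(one_way_km: float) -> float:
    """Theoretical round-trip time over fiber, ignoring routing and queuing."""
    return 2 * one_way_km / FIBER_KM_PER_MS


def sequential_latency_ms(rtt_ms: float, round_trips: int) -> float:
    """Time spent purely waiting on round trips that cannot overlap."""
    return rtt_ms * round_trips


# Sydney -> Los Angeles (12,000 km) plus Los Angeles -> Virginia (4,000 km).
print(fiber_rtt_ms(12_000 + 4_000))  # 160.0

# Real-world RTT of 280-350 ms, multiplied by 3-5 sequential round trips.
for trips in (3, 5):
    low = sequential_latency_ms(280, trips)
    high = sequential_latency_ms(350, trips)
    print(f"{trips} round trips: {low:.0f}-{high:.0f} ms")
```

Only the most optimistic case (3 round trips at 280 ms) stays under one second.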
Your competitor serves the same content from a server in Sydney. Their page loads in 200 milliseconds. A widely cited industry rule of thumb is that every 100 ms of added latency reduces conversion by roughly 1%. By that rule, even a single extra 280-350 ms round trip costs you 3-4% of your revenue, lost to physics.
A 95% cache hit ratio means your origin server handles 20x less traffic. A 99% ratio means 100x less. The difference between 95% and 99% is the difference between needing 10 origin servers and needing 2.
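The hit-ratio arithmetic is worth checking by hand, because small changes in the ratio produce large changes in origin load. The sketch below uses exact fractions to avoid floating-point rounding. The traffic figures (100,000 requests per second in total, 500 per origin server) are hypothetical numbers chosen so the result matches the 10-versus-2 example.

```python
import math
from fractions import Fraction


def reduction_factor(hit_ratio: Fraction) -> Fraction:
    """How many times less traffic the origin sees compared with no cache."""
    return 1 / (1 - hit_ratio)


def servers_needed(total_rps: int, hit_ratio: Fraction, per_server_rps: int) -> int:
    """Origin servers required to absorb the cache misses."""
    miss_rps = total_rps * (1 - hit_ratio)
    return math.ceil(miss_rps / per_server_rps)


for percent in (95, 99):
    ratio = Fraction(percent, 100)
    print(
        f"{percent}% hits: {reduction_factor(ratio)}x less origin traffic, "
        f"{servers_needed(100_000, ratio, 500)} origin servers"
    )
```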
A cache is a high-speed storage layer that stores a subset of data, typically transient, so that future requests for that data are served faster. Caches exploit two properties of real-world access patterns:
| Property | What it means | Example |
|---|---|---|
| Temporal locality | Data accessed recently is likely to be accessed again soon | A trending news article gets millions of views in one hour |
| Spatial locality | Data near recently accessed data is likely to be accessed soon | After reading page 1, the user reads page 2 |
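In a web application, caches exist at every layer, from closest-to-user (fastest, smallest) to closest-to-origin (slowest, largest): the browser cache, the CDN edge, the regional shield, and the reverse proxy in front of the origin.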
Caches have limited space. When full, they must choose what to evict. The eviction policy determines cache effectiveness.
| Policy | Evicts | Pros | Cons |
|---|---|---|---|
| LRU (Least Recently Used) | Item not accessed for the longest time | Simple, good for temporal locality | A one-off full scan of cold data can evict hot items |
| LFU (Least Frequently Used) | Item with fewest accesses | Keeps popular items | Slow to adapt; old popular items stick |
| FIFO (First In, First Out) | Oldest item regardless of access | Simplest to implement | Ignores access patterns |
| Random | Random item | No metadata overhead | Unpredictable performance |
| W-TinyLFU | Window-based frequency estimate | Among the best hit ratios in practice | Complex (used by the Caffeine library) |
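To make the LRU row concrete, here is a minimal sketch built on Python's `collections.OrderedDict`. The class name `LRUCache` and its hit/miss counters are illustrative choices, not part of any particular library. Production caches add locking, TTLs, and size accounting.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest access first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        return None

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used

    def keys(self):
        return list(self._items)


cache = LRUCache(capacity=3)
for k in "abc":
    cache.put(k, k.upper())
cache.get("a")       # "a" becomes the most recently used entry
cache.put("d", "D")  # cache is full, so "b" (least recently used) is evicted
print(cache.keys())  # ['c', 'a', 'd']
```

Swapping `popitem(last=False)` for a different victim choice is all it takes to experiment with FIFO or random eviction on the same structure.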
The browser and CDN need to know: "Can I cache this? For how long?" HTTP provides the Cache-Control header to answer these questions. It is arguably the most important header in web performance.
| Directive | Meaning | Use case |
|---|---|---|
| public | Any cache (browser, CDN, proxy) may store this | Static assets, public HTML |
| private | Only the browser may cache this (not CDN/proxy) | User-specific data (dashboard, profile) |
| max-age=N | Cache is fresh for N seconds | max-age=3600 = 1 hour |
| s-maxage=N | Max-age for shared caches (CDN/proxy) only | CDN caches for 60s, browser for 3600s |
| no-cache | Must revalidate with origin before using cached copy | HTML pages that might change |
| no-store | Don't cache at all — not in memory, not on disk | Sensitive data (banking, health records) |
| immutable | Never changes — don't even bother revalidating | Versioned assets (main.abc123.js) |
| stale-while-revalidate=N | Serve stale for N sec while fetching fresh copy in background | News feeds, product listings |
Note that no-cache is not the same as no-store: no-cache lets a cache store the response but requires revalidation before every reuse, while no-store forbids storing it at all. Getting these confused causes either security vulnerabilities (caching private data) or unnecessary cache misses (not caching public data).
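The directives in the table can be read mechanically. The sketch below parses a Cache-Control value and reports how long a browser or a shared cache may reuse the response without revalidating. It is a simplification under stated assumptions: the helper names are invented, a response with no explicit lifetime is treated as immediately stale (real caches may apply heuristics), and `immutable` and `stale-while-revalidate` are parsed but not acted on.

```python
def parse_cache_control(header: str) -> dict:
    """Split 'public, max-age=60' into {'public': True, 'max-age': 60}."""
    directives = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if not name:
            continue
        value = value.strip('"')
        directives[name.lower()] = int(value) if value.isdigit() else (value or True)
    return directives


def freshness_seconds(header: str, shared_cache: bool):
    """Seconds the response may be reused without revalidation.

    Returns None when this cache must not store the response at all.
    """
    d = parse_cache_control(header)
    if "no-store" in d:
        return None
    if shared_cache and "private" in d:
        return None
    if "no-cache" in d:
        return 0  # may be stored, but must be revalidated on every use
    if shared_cache and "s-maxage" in d:
        return d["s-maxage"]
    return d.get("max-age", 0)  # assumption: no heuristic freshness


header = "public, s-maxage=60, max-age=3600"
print(freshness_seconds(header, shared_cache=True))   # 60
print(freshness_seconds(header, shared_cache=False))  # 3600
print(freshness_seconds("private, max-age=30", shared_cache=True))  # None
```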
Cache-Control tells the browser HOW LONG to cache. But what happens when that time expires? The browser needs to check: "Has this content actually changed, or can I keep using my cached copy?" This is revalidation, and it uses ETags.
An ETag (entity tag) is a fingerprint of the resource. It's a string that changes whenever the content changes. When the server sends a response, it includes an ETag header. When the browser revalidates, it sends that ETag back in an If-None-Match header.
An older alternative is the Last-Modified + If-Modified-Since header pair (date-based). ETags are more precise: they handle cases where content changes twice in the same second, or where content reverts to an older version. Use ETags; Last-Modified is a fallback for legacy systems. In both schemes, when the validator still matches, the server answers 304 Not Modified with no body, which is where the bandwidth saving comes from.
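The revalidation exchange can be sketched without a real HTTP server. In the example below the ETag is derived from a SHA-256 hash of the body, which is one common choice but not the only one. The function names are hypothetical, and the comparison is simplified to a single strong ETag (a real If-None-Match header may carry a list of values or weak validators).

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Fingerprint the content; any change to the body changes the tag."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET with optional revalidation."""
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""  # unchanged: send no body
    return 200, {"ETag": etag}, body


page_v1 = b"<h1>Hello</h1>"
page_v2 = b"<h1>Hello, world</h1>"

status, headers, payload = respond(page_v1)               # first visit
cached_etag = headers["ETag"]
print(status, len(payload))                                # 200 14

status, _, payload = respond(page_v1, cached_etag)         # revalidation, unchanged
print(status, len(payload))                                # 304 0

status, _, payload = respond(page_v2, cached_etag)         # content changed
print(status, len(payload))                                # 200 21
```

Because the tag depends only on the bytes, content that reverts to an earlier version produces the earlier tag again, so a client still holding that version gets a 304.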
A reverse proxy sits between clients and your origin servers. It intercepts every request, checks its cache, and either serves the cached response immediately or forwards the request to the origin. The client doesn't know the proxy exists.
This is different from a forward proxy (like a corporate web filter), which sits between the client and the internet. A reverse proxy sits between the internet and your servers.
| Function | How it helps | Tools |
|---|---|---|
| Caching | Stores responses in memory or on disk; serves without hitting origin | Varnish, Nginx, HAProxy |
| Load balancing | Distributes requests across multiple origin servers | Nginx, Envoy, HAProxy |
| SSL termination | Handles TLS encryption/decryption; origin speaks plain HTTP | All major proxies |
| Compression | Gzip/Brotli compresses responses before sending to client | Nginx, Cloudflare |
| Request collapsing | Multiple identical requests during a miss → one origin request | Varnish, Nginx |
When a popular cached item expires, hundreds of concurrent requests for it arrive at the same instant. Without protection, all of them miss the cache and hit your origin simultaneously — the thundering herd. Request collapsing prevents this: the first request goes to origin, and all subsequent identical requests wait for the first one to return, then they all get the same response.
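Request collapsing is easiest to see in code. The sketch below is an in-process illustration of the idea using threads, not how Varnish or Nginx implement it. `CollapsingCache` and `slow_origin` are hypothetical names, and there is no expiry or error caching. The first caller for a key becomes the leader and fetches from the origin; everyone else waits on the same future.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class CollapsingCache:
    """Cache that lets only one in-flight origin fetch run per key."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._lock = threading.Lock()
        self._cache = {}
        self._inflight = {}  # key -> Future shared by all waiting callers

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            future = self._inflight.get(key)
            leader = future is None
            if leader:
                future = Future()
                self._inflight[key] = future
        if not leader:
            return future.result()  # wait for the leader's response
        try:
            value = self._fetch(key)
        except Exception as exc:
            with self._lock:
                del self._inflight[key]
            future.set_exception(exc)  # waiting callers see the same failure
            raise
        with self._lock:
            self._cache[key] = value
            del self._inflight[key]
        future.set_result(value)
        return value


origin_calls = 0


def slow_origin(key):
    global origin_calls
    origin_calls += 1  # only a leader thread ever reaches the origin
    time.sleep(0.05)   # simulate a slow origin response
    return f"body of {key}"


cache = CollapsingCache(slow_origin)
with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(cache.get, ["/home"] * 20))

print(origin_calls, len(set(results)))  # 1 1
```

Twenty concurrent requests produce one origin fetch. Without the `_inflight` map, each thread that arrived during the miss would have called the origin itself.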
A Content Delivery Network is a geographically distributed network of reverse proxy servers. Instead of one reverse proxy in your data center, a CDN has hundreds or thousands of edge servers (also called Points of Presence, or PoPs) in cities around the world. Each PoP caches your content and serves it to nearby users.
When a user requests your content, the CDN must route them to the nearest (or best) edge server. Two common approaches:
| Method | How it works | Pros/Cons |
|---|---|---|
| DNS-based routing | CDN controls the DNS resolution. User's DNS resolver gets an IP for the nearest PoP. | Simple. But DNS is cached, so changes are slow (TTL-dependent). |
| Anycast routing | Multiple PoPs advertise the same IP via BGP. Internet routing sends packets to the nearest one. | Fast, no DNS dependency. Used by Cloudflare, Google. |
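As a rough illustration of what "nearest" means in DNS-based routing, the sketch below picks the PoP with the smallest great-circle distance to the user. This is a deliberate simplification: real CDNs weigh measured latency, load, and network topology rather than raw distance, and anycast leaves the decision to BGP entirely. The PoP list and coordinates are approximate, illustrative values.

```python
import math

# Hypothetical PoP list: name -> (latitude, longitude), approximate values.
POPS = {
    "Sydney": (-33.87, 151.21),
    "Singapore": (1.35, 103.82),
    "Frankfurt": (50.11, 8.68),
    "Virginia": (39.04, -77.49),
}


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371 * math.asin(math.sqrt(h))


def nearest_pop(user_location):
    """Name of the PoP geographically closest to the user."""
    return min(POPS, key=lambda name: haversine_km(user_location, POPS[name]))


print(nearest_pop((-37.81, 144.96)))  # Melbourne -> Sydney
print(nearest_pop((51.51, -0.13)))    # London -> Frankfurt
```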
Real systems don't have one cache — they have many, layered from closest-to-user to closest-to-origin. Understanding how requests flow through these layers is essential for debugging performance issues and setting correct cache headers.
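A layered lookup can be modelled as a chain in which each layer asks the next one on a miss and stores the answer on the way back. The sketch below is a toy model under stated assumptions: `CacheLayer` is a hypothetical class, entries never expire, and the second return value simply names whichever layer actually had the data.

```python
from typing import Callable, Dict, Optional, Tuple


class CacheLayer:
    """One cache in a chain; misses fall through to the upstream layer."""

    def __init__(
        self,
        name: str,
        upstream: Optional["CacheLayer"] = None,
        origin: Optional[Callable[[str], str]] = None,
    ):
        self.name = name
        self.upstream = upstream
        self.origin = origin  # only the last layer before the origin sets this
        self.store: Dict[str, str] = {}

    def get(self, key: str) -> Tuple[str, str]:
        """Return (value, served_by)."""
        if key in self.store:
            return self.store[key], self.name
        if self.upstream is not None:
            value, served_by = self.upstream.get(key)
        else:
            value, served_by = self.origin(key), "origin"
        self.store[key] = value  # populate this layer on the way back
        return value, served_by


origin_fetches = []


def origin(key: str) -> str:
    origin_fetches.append(key)
    return f"<html for {key}>"


shield = CacheLayer("shield", origin=origin)
edge_sydney = CacheLayer("edge-sydney", upstream=shield)
edge_melbourne = CacheLayer("edge-melbourne", upstream=shield)

print(edge_sydney.get("/")[1])     # origin (cold everywhere)
print(edge_sydney.get("/")[1])     # edge-sydney
print(edge_melbourne.get("/")[1])  # shield (warmed by the Sydney request)
print(len(origin_fetches))         # 1
```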
The Vary header tells caches to store separate entries for requests that differ in specific headers. Without Vary, a cache might serve a gzipped response to a client that can't decompress it.
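One way to picture Vary is as extra fields in the cache key. In the sketch below, `cache_key` is a hypothetical helper that builds a key from the method, the URL, and the request headers named in the response's Vary value. It compares header values verbatim, whereas real caches often normalize them first to avoid fragmenting the cache.

```python
def cache_key(method: str, url: str, request_headers: dict, vary: str) -> tuple:
    """Build a cache key that separates variants listed in the Vary header."""
    names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    headers = {k.lower(): v for k, v in request_headers.items()}
    variant = tuple((name, headers.get(name, "")) for name in names)
    return (method, url, variant)


vary = "Accept-Encoding"
gzip_client = {"Accept-Encoding": "gzip", "User-Agent": "browser-a"}
plain_client = {"Accept-Encoding": "identity", "User-Agent": "browser-a"}
other_gzip_client = {"Accept-Encoding": "gzip", "User-Agent": "browser-b"}

k1 = cache_key("GET", "/app.js", gzip_client, vary)
k2 = cache_key("GET", "/app.js", plain_client, vary)
k3 = cache_key("GET", "/app.js", other_gzip_client, vary)

print(k1 == k2)  # False: different encodings get separate cache entries
print(k1 == k3)  # True: User-Agent is not listed in Vary, so it is ignored
```

Listing a high-cardinality header such as User-Agent in Vary would give almost every client its own entry, which is why it is usually avoided.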
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton
Cache invalidation is the process of removing or updating stale data from a cache. It's hard because caches are distributed (browser, CDN edge, shield, proxy) and there's no reliable way to instantly reach all of them. Every invalidation strategy is a trade-off between staleness, complexity, and reliability.
| Strategy | How it works | Staleness window | Complexity |
|---|---|---|---|
| TTL-based | Set max-age; cache expires after N seconds | Up to N seconds | Simplest |
| Purge/Ban | CDN API call to remove specific URLs or patterns | Seconds (API propagation delay) | Medium |
| Cache tags | Tag cached entries; purge all entries with a tag | Seconds | Medium-high |
| Event-driven | Database change → event → purge cache | Milliseconds to seconds | High |
| Versioned URLs | Change URL when content changes (main.abc.js → main.def.js) | Zero (new URL = new entry) | Requires build pipeline |
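Two of these strategies, TTL expiry and cache tags, fit in one small sketch. `TaggedCache` below is a hypothetical single-process model, not a CDN API: it stores a tag set with each entry so that one purge call removes every entry built from the same underlying data. The clock is injectable so the example can advance time without sleeping.

```python
import time


class TaggedCache:
    """In-memory cache with per-entry TTL and purge-by-tag."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at, tags)

    def set(self, key, value, ttl, tags=()):
        self._entries[key] = (value, self._clock() + ttl, frozenset(tags))

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at, _ = entry
        if self._clock() >= expires_at:  # TTL-based invalidation
            del self._entries[key]
            return None
        return value

    def purge_tag(self, tag):
        """Remove every entry carrying the tag; return how many were removed."""
        doomed = [k for k, (_, _, tags) in self._entries.items() if tag in tags]
        for key in doomed:
            del self._entries[key]
        return len(doomed)


now = [0.0]  # fake clock so the example is deterministic
cache = TaggedCache(clock=lambda: now[0])
cache.set("/products/1", "page for product 1", ttl=60, tags={"product:1"})
cache.set("/products", "listing", ttl=60, tags={"product:1", "product:2"})
cache.set("/about", "about page", ttl=60)

purged = cache.purge_tag("product:1")  # product 1 changed in the database
print(purged)                          # 2
print(cache.get("/about"))             # about page

now[0] = 61.0                          # one minute later
print(cache.get("/about"))             # None
```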
This is the showcase simulation. You're running a CDN with 6 edge PoPs around the world, 2 regional shields, and one origin server. Users from different cities send requests. Watch the DNS resolution, edge cache checks, shield fallbacks, and origin fetches play out in real time. You can flush individual PoPs, create regional outages, and see how the CDN adapts.
Statistics to observe:
| Metric | What to watch |
|---|---|
| Edge hit ratio | Should increase as you send more requests to the same PoP |
| Shield hit ratio | First request from a new PoP in the same region should hit the shield |
| Origin load | Should be minimal after initial warm-up |
| Failover latency | When shield is down, edge goes directly to origin (higher latency) |
Caching is one of the most impactful performance optimizations in distributed systems. A well-configured cache hierarchy can reduce origin load by 100x and cut user-perceived latency from seconds to milliseconds. But it comes with the eternal challenge: keeping cached data fresh.
| Content type | Cache-Control | Invalidation |
|---|---|---|
| Versioned assets (JS, CSS) | public, max-age=31536000, immutable | New filename = automatic |
| HTML pages | public, no-cache (or max-age=60) | ETag revalidation |
| API responses (public) | public, s-maxage=60, stale-while-revalidate=300 | Purge API on data change |
| API responses (private) | private, max-age=30 | Short TTL only |
| Sensitive data | no-store | N/A (never cached) |
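The table can be turned into a small lookup, together with the content-hash naming that makes the "new filename = automatic" row work. This is a sketch under stated assumptions: the dictionary keys, the `versioned_name` helper, and the 8-character SHA-256 prefix are illustrative choices, and real build tools use their own naming schemes.

```python
import hashlib
from pathlib import PurePosixPath

# Cache-Control values copied from the table above; the keys are made-up labels.
CACHE_CONTROL = {
    "versioned_asset": "public, max-age=31536000, immutable",
    "html": "public, no-cache",
    "public_api": "public, s-maxage=60, stale-while-revalidate=300",
    "private_api": "private, max-age=30",
    "sensitive": "no-store",
}


def versioned_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a content hash before the extension: main.js -> main.<hash>.js."""
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old = versioned_name("static/main.js", b"console.log('v1');")
new = versioned_name("static/main.js", b"console.log('v2');")

print(old != new)                        # True: new content, new URL
print(CACHE_CONTROL["versioned_asset"])  # public, max-age=31536000, immutable
```

Because the URL changes whenever the bytes change, the year-long max-age never serves stale code: old entries are simply never requested again.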
"The fastest I/O is the I/O you don't do." — Anonymous engineer