Caching & CDNs — From Absolute Zero to Mastery

Chapter 0: The Latency Tax

A user in Sydney requests your website. Your origin server is in Virginia. The request travels 16,000 km through undersea fiber: Sydney to Los Angeles (12,000 km), then across the US to Virginia (4,000 km). Round trip at the speed of light in fiber: 160 milliseconds. But light speed is the theoretical minimum — real networks add routing, congestion, and protocol overhead. The actual round trip: 280-350 milliseconds. For a single HTTP request.

Now multiply. A typical web page makes 50-100 HTTP requests: HTML, CSS, JavaScript, fonts, images, API calls. Even with HTTP/2 multiplexing, the browser needs at least 3-5 sequential round trips to render the page (DNS lookup, TCP handshake, TLS handshake, HTML fetch, then asset fetches). At 280-350 ms each, that's roughly 0.8 to 1.75 seconds of latency, just waiting for packets to cross the ocean.
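As a rough sketch, the arithmetic above can be reproduced in a few lines of Python. The 200,000 km/s propagation speed (about two thirds of c) is a common approximation for light in fiber, and the function names are illustrative rather than taken from any library.

```python
# Approximate propagation speed of light in optical fiber (about 2/3 of c).
FIBER_KM_PER_SEC = 200_000


def min_round_trip_ms(one_way_km: float) -> float:
    """Theoretical minimum round-trip time over fiber, in milliseconds."""
    return 2 * one_way_km * 1000 / FIBER_KM_PER_SEC


def page_latency_ms(rtt_ms: float, round_trips: int) -> float:
    """Time spent purely waiting on sequential round trips."""
    return rtt_ms * round_trips


sydney_to_virginia_km = 12_000 + 4_000
print(min_round_trip_ms(sydney_to_virginia_km))            # 160.0
print(page_latency_ms(280, 3), page_latency_ms(350, 5))    # 840 1750
```

The second line of output is the best and worst case for 3-5 round trips at real-world round-trip times.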

Your competitor serves the same content from a server in Sydney. Their page loads in 200 milliseconds. A widely quoted industry rule of thumb is that every 100ms of added latency reduces conversion by about 1%. By that rule, each extra 300-400 ms round trip to Virginia costs you 3-4% of your revenue, and you are paying it to physics.

You can't beat the speed of light. But you can change WHERE the data is served from. Caching stores copies of data closer to the user — in the browser, in a nearby edge server, in a reverse proxy in your data center. The fastest request is the one that never crosses the ocean.

The Cache Hit Ratio: The Only Metric That Matters

// Cache hit ratio = fraction of requests served from cache
Hit ratio = cache_hits / (cache_hits + cache_misses)

// Example: 10,000 requests/sec, 95% hit ratio
Origin handles: 10,000 × 0.05 = 500 requests/sec
Cache handles: 10,000 × 0.95 = 9,500 requests/sec

// Without cache: origin needs to handle 10,000 req/sec (20x more!)
// At $0.001/request, cache saves: 9,500 × $0.001 × 86,400 sec/day = $820,800/day

A 95% cache hit ratio means your origin server handles 20x less traffic. A 99% ratio means 100x less. The difference between 95% and 99% is the difference between needing 10 origin servers and needing 2.
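The same numbers can be checked with a small Python sketch. The helper names are made up for illustration, and the $0.001 per-request cost is the example figure used above, not a real price.

```python
SECONDS_PER_DAY = 86_400


def origin_rps(total_rps: float, hit_ratio: float) -> float:
    """Requests per second that still reach the origin."""
    return total_rps * (1 - hit_ratio)


def offload_factor(hit_ratio: float) -> float:
    """How many times less traffic the origin sees compared to no cache."""
    return 1 / (1 - hit_ratio)


def daily_savings(total_rps: float, hit_ratio: float, cost_per_request: float) -> float:
    """Origin cost avoided per day by serving hits from cache."""
    return total_rps * hit_ratio * cost_per_request * SECONDS_PER_DAY


for ratio in (0.95, 0.99):
    print(f"{ratio:.0%}: origin {origin_rps(10_000, ratio):.0f} req/s, "
          f"{offload_factor(ratio):.0f}x less traffic")
print(f"${daily_savings(10_000, 0.95, 0.001):,.0f}/day")
# 95%: origin 500 req/s, 20x less traffic
# 99%: origin 100 req/s, 100x less traffic
# $820,800/day
```

Going from 95% to 99% cuts origin traffic by another factor of five, which is where the "10 servers versus 2" comparison comes from.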

Latency With and Without Caching

Watch requests travel from Sydney to Virginia (no cache) vs. hitting a local edge server (cached), and compare the latency.

Quick check: Your origin server handles 1000 requests/sec at maximum capacity. You add a cache with a 90% hit ratio. How many requests/sec can your system now handle?

1900 req/sec
10,000 req/sec: the origin handles 10% of 10,000 = 1000 (its capacity), so the cache effectively multiplies throughput by 10x
2000 req/sec

Chapter 1: Cache Fundamentals

A cache is a high-speed storage layer that stores a subset of data, typically transient, so that future requests for that data are served faster. Caches exploit two properties of real-world access patterns:

Property	What it means	Example
Temporal locality	Data accessed recently is likely to be accessed again soon	A trending news article gets millions of views in one hour
Spatial locality	Data near recently accessed data is likely to be accessed soon	After reading page 1, the user reads page 2

The Cache Hierarchy

In a web application, caches exist at every layer, from closest-to-user (fastest, smallest) to closest-to-origin (slowest, largest):

Browser Cache

In the user's browser. Stores static assets (JS, CSS, images). Fastest possible — zero network latency. Controlled by HTTP headers.

↓ miss

CDN Edge

Nearest edge server (e.g., Cloudflare Sydney). 5-50ms latency. Caches HTML, assets, and sometimes API responses.

↓ miss

Reverse Proxy

Nginx/Varnish in your data center. Sits in front of your application servers. Caches computed responses.

↓ miss

Application Cache

Redis/Memcached. Caches database query results, computed data, session state.

↓ miss

Database Cache

Query cache, buffer pool (pages in RAM). Internal to the DB.

↓ miss

Disk

Actual durable storage. The "source of truth." Slowest.

Eviction Policies

Caches have limited space. When full, they must choose what to evict. The eviction policy determines cache effectiveness.

Policy	Evicts	Pros	Cons
LRU (Least Recently Used)	Item not accessed for the longest time	Simple, good for temporal locality	A one-off full scan of cold data can flush hot items
LFU (Least Frequently Used)	Item with fewest accesses	Keeps popular items	Slow to adapt; old popular items stick
FIFO (First In, First Out)	Oldest item regardless of access	Simplest to implement	Ignores access patterns
Random	Random item	No metadata overhead	Unpredictable performance
W-TinyLFU	Window-based frequency estimate	Best hit ratio in practice	Complex (used by Caffeine library)

LRU is king. In practice, LRU (or its approximation) wins for most workloads. Redis uses an approximate LRU that samples 5 random keys and evicts the one with the oldest access time. This is O(1) per eviction and surprisingly close to true LRU in hit ratio. Most CDNs also use LRU or variants.
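A minimal sketch of both ideas in Python: a true LRU cache built on OrderedDict, and a sampled victim picker in the spirit of the approximate approach described above. The class and function names are illustrative, and the sampling helper is only a toy model of the idea, not Redis's actual implementation.

```python
import random
from collections import OrderedDict


class LRUCache:
    """True LRU: the least recently accessed key is evicted first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # oldest access first, newest last
        self.hits = 0
        self.misses = 0
        self.evicted = []            # eviction history, for inspection

    def access(self, key, value=None):
        """Return True on a hit; on a miss, insert the key (evicting if full)."""
        if key in self.items:
            self.items.move_to_end(key)   # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.items) >= self.capacity:
            victim, _ = self.items.popitem(last=False)  # drop the oldest
            self.evicted.append(victim)
        self.items[key] = value
        return False


def sampled_lru_victim(last_access, sample_size=5, rng=random):
    """Approximate LRU: sample a few keys and evict the oldest of the sample."""
    keys = list(last_access)
    sample = rng.sample(keys, min(sample_size, len(keys)))
    return min(sample, key=lambda k: last_access[k])


cache = LRUCache(capacity=3)
for key in ["A", "B", "C", "A", "D", "A", "B"]:
    cache.access(key)
print(list(cache.items))          # ['D', 'A', 'B']
print(cache.evicted)              # ['B', 'C']
print(cache.hits, cache.misses)   # 2 5
```

The sampled version needs no ordered structure at all, only a last-access timestamp per key, which is why it is cheap to run on very large keyspaces.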

Cache Eviction Simulator

A cache with 5 slots. Watch items enter, get accessed, and get evicted under different policies. The hit ratio counter shows effectiveness.

Check: Your cache has 4 slots and uses LRU eviction. You access items in this order: A, B, C, D, E, A, B. After all accesses, which item was evicted last?

A: it was accessed first
D: it was accessed least recently
C: when E was accessed, the cache was [A,B,C,D]; LRU evicts A. Then accessing A again evicts B. Accessing B again evicts C. Final cache: [D,E,A,B]

Chapter 2: HTTP Cache-Control

The browser and CDN need to know: "Can I cache this? For how long?" HTTP provides the Cache-Control header to answer these questions. This is the most important header in web performance.

Cache-Control Directives

Directive	Meaning	Use case
public	Any cache (browser, CDN, proxy) may store this	Static assets, public HTML
private	Only the browser may cache this (not CDN/proxy)	User-specific data (dashboard, profile)
max-age=N	Cache is fresh for N seconds	max-age=3600 = 1 hour
s-maxage=N	Max-age for shared caches (CDN/proxy) only	CDN caches for 60s, browser for 3600s
no-cache	Must revalidate with origin before using cached copy	HTML pages that might change
no-store	Don't cache at all — not in memory, not on disk	Sensitive data (banking, health records)
immutable	Never changes — don't even bother revalidating	Versioned assets (main.abc123.js)
stale-while-revalidate=N	Serve stale for N sec while fetching fresh copy in background	News feeds, product listings

Real-World Header Examples

// Static asset with content hash in filename (best practice)
Cache-Control: public, max-age=31536000, immutable
// 1 year. The filename changes when content changes, so no invalidation needed.
// main.abc123.js → main.def456.js

// HTML page (might change any time)
Cache-Control: public, no-cache
// Browser must check with server before using cached copy (via ETag).
// If unchanged, server responds 304 Not Modified (no body = fast).

// User dashboard (private, per-user data)
Cache-Control: private, max-age=60
// Browser caches for 60 sec. CDN/proxy must NOT cache.

// Banking page
Cache-Control: no-store
// NOTHING caches this. Every request goes to origin.

// Product listing (stale acceptable for a few seconds)
Cache-Control: public, max-age=10, stale-while-revalidate=60
// Fresh for 10s. After that, serve stale for up to 60s while background-refreshing.

Common mistake: no-cache does NOT mean "don't cache." It means "cache it, but always ask the server if it's still valid before using it." This is called revalidation, and it uses ETags (covered in the next chapter). The directive that actually prevents caching is no-store. Confusing the two causes either security vulnerabilities (caching private data) or unnecessary cache misses (not caching public data).
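To make the directive semantics concrete, here is a simplified Python sketch of how a cache might interpret a Cache-Control value. It is an illustration under stated assumptions, not a complete implementation of the HTTP caching rules: the function names are hypothetical, quoted directive arguments are ignored, and a response without max-age is treated as having zero freshness (real caches may apply heuristics).

```python
def parse_cache_control(header):
    """Parse a Cache-Control value into {directive: int or True}."""
    directives = {}
    for part in header.split(","):
        name, _, value = part.strip().lower().partition("=")
        if name:
            directives[name] = int(value) if value.isdigit() else True
    return directives


def may_store(directives, shared_cache):
    """no-store forbids storing anywhere; private forbids shared caches."""
    if "no-store" in directives:
        return False
    if shared_cache and "private" in directives:
        return False
    return True


def freshness_lifetime(directives, shared_cache):
    """Seconds a stored response may be reused without revalidation."""
    if "no-cache" in directives:
        return 0                      # stored, but revalidated on every use
    if shared_cache and "s-maxage" in directives:
        return directives["s-maxage"]
    return directives.get("max-age", 0)


def classify(directives, age, shared_cache):
    """Return 'fresh', 'stale-while-revalidate', or 'stale' for a given age."""
    lifetime = freshness_lifetime(directives, shared_cache)
    if age < lifetime:
        return "fresh"
    if age < lifetime + directives.get("stale-while-revalidate", 0):
        return "stale-while-revalidate"
    return "stale"


listing = parse_cache_control("public, max-age=10, stale-while-revalidate=60")
print(classify(listing, 5, True), classify(listing, 30, True), classify(listing, 90, True))
# fresh stale-while-revalidate stale

dashboard = parse_cache_control("private, max-age=60")
print(may_store(dashboard, shared_cache=True), may_store(dashboard, shared_cache=False))
# False True

html = parse_cache_control("public, no-cache")
print(may_store(html, shared_cache=True), classify(html, 0, True))
# True stale
```

The last line shows the no-cache distinction in code: the response may be stored, but it is never considered fresh, so every use triggers revalidation.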

Cache-Control Header Builder

Toggle directives to build a Cache-Control header. The resulting behavior is shown in real-time.

Check: You serve a JavaScript bundle at /static/main.abc123.js where abc123 is a content hash. What Cache-Control header gives the best performance?

public, max-age=31536000, immutable: the filename changes when content changes, so it's safe to cache forever
public, max-age=3600: in case we need to change it
no-cache: always revalidate to ensure freshness

Chapter 3: ETags & Conditional Requests

Cache-Control tells the browser HOW LONG to cache. But what happens when that time expires? The browser needs to check: "Has this content actually changed, or can I keep using my cached copy?" This is revalidation, and it uses ETags.

An ETag (entity tag) is a fingerprint of the resource. It's a string that changes whenever the content changes. When the server sends a response, it includes an ETag header. When the browser revalidates, it sends that ETag back in an If-None-Match header.

The Conditional Request Flow

1. First Request

Browser: GET /page.html
Server: 200 OK, ETag: "abc123", Cache-Control: no-cache
Browser stores response + ETag

↓

2. Revalidation

Browser: GET /page.html, If-None-Match: "abc123"
"I have version abc123. Is it still current?"

↓

3a. Not Modified

Server: 304 Not Modified (NO BODY — saves bandwidth!)
Browser uses cached copy. ~100 bytes vs. 50KB for full response.

3b. Modified

Server: 200 OK, ETag: "def456", [new content]
Browser replaces cached copy with new version.

Strong vs. Weak ETags

// Strong ETag: content is byte-for-byte identical
ETag: "abc123"
// Used for: exact content matching, range requests (partial downloads)

// Weak ETag: content is "semantically equivalent" but may differ in bytes
ETag: W/"abc123"
// Used for: HTML pages where whitespace or ad placements change
// but the meaningful content is the same

ETags vs. Last-Modified. HTTP also supports revalidation via Last-Modified + If-Modified-Since headers (date-based). ETags are strictly more powerful — they handle cases where content changes twice in the same second, or where content reverts to an older version. Use ETags. Last-Modified is a fallback for legacy systems.
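The conditional request flow can be sketched from the server's side in Python. This is a minimal illustration, assuming the ETag is derived from a SHA-256 hash of the body (one common choice among several); handle_get and make_etag are hypothetical names, not part of any framework.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Strong ETag: a quoted fingerprint that changes whenever the bytes change."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _strip_weak(tag: str) -> str:
    """If-None-Match uses weak comparison, so the W/ prefix is ignored."""
    return tag[2:] if tag.startswith("W/") else tag


def handle_get(body: bytes, if_none_match: Optional[str] = None):
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    if if_none_match is not None:
        candidates = [t.strip() for t in if_none_match.split(",")]
        if "*" in candidates or _strip_weak(etag) in map(_strip_weak, candidates):
            return 304, headers, b""          # no body: client reuses its copy
    return 200, headers, body


# 1. First request: full response plus ETag.
status, headers, body = handle_get(b"<h1>v1</h1>")
print(status, len(body))                      # 200 11

# 2. Revalidation with unchanged content: 304 and an empty body.
status, _, body = handle_get(b"<h1>v1</h1>", if_none_match=headers["ETag"])
print(status, len(body))                      # 304 0

# 3. Revalidation after the content changed: 200 with the new body.
status, new_headers, body = handle_get(b"<h1>v2</h1>", if_none_match=headers["ETag"])
print(status, new_headers["ETag"] != headers["ETag"])   # 200 True
```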

ETag Conditional Request Flow

Watch the browser send requests with and without ETags. Notice how 304 responses save bandwidth.

Check: A browser sends GET /data with If-None-Match: "v5". The content has not changed (ETag is still "v5"). How much data does the server send back?

The full response (same as always)
Just a 304 Not Modified header (~100 bytes), no body. The browser uses its cached copy.
Nothing: the browser doesn't even need to ask

Chapter 4: Reverse Proxies

A reverse proxy sits between clients and your origin servers. It intercepts every request, checks its cache, and either serves the cached response immediately or forwards the request to the origin. The client doesn't know the proxy exists.

This is different from a forward proxy (like a corporate web filter), which sits between the client and the internet. A reverse proxy sits between the internet and your servers.

What Reverse Proxies Do

Function	How it helps	Tools
Caching	Stores responses in RAM; serves without hitting origin	Varnish, Nginx, HAProxy
Load balancing	Distributes requests across multiple origin servers	Nginx, Envoy, HAProxy
SSL termination	Handles TLS encryption/decryption; origin speaks plain HTTP	All major proxies
Compression	Gzip/Brotli compresses responses before sending to client	Nginx, Cloudflare
Request collapsing	Multiple identical requests during a miss → one origin request	Varnish, Nginx

Request Collapsing (Thundering Herd Protection)

When a popular cached item expires, hundreds of concurrent requests for it arrive at the same instant. Without protection, all of them miss the cache and hit your origin simultaneously — the thundering herd. Request collapsing prevents this: the first request goes to origin, and all subsequent identical requests wait for the first one to return, then they all get the same response.

// Without request collapsing:
Cache expires at t=0
t=0.001s: Request 1 → origin (miss)
t=0.002s: Request 2 → origin (miss)
t=0.003s: Request 3 → origin (miss)
...
t=0.050s: Request 50 → origin (miss)
// Origin gets 50 identical requests. 49 are redundant.

// With request collapsing:
t=0.001s: Request 1 → origin (miss, becomes the "fill")
t=0.002s: Request 2 → waits for Request 1
t=0.003s: Request 3 → waits for Request 1
...
t=0.200s: Request 1 returns. All waiters get the same response.
// Origin gets 1 request. Cache is repopulated.
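Request collapsing can be sketched in Python with threads. This is a toy model under assumptions of its own: CollapsingCache and slow_origin are hypothetical names, there is no TTL handling, and real proxies implement this inside their event loops rather than with one thread per request.

```python
import threading
import time


class CollapsingCache:
    """On a miss, only the first caller fetches; concurrent callers wait for it."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._cache = {}
        self._in_flight = {}            # key -> Event set when the fill finishes
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            done = self._in_flight.get(key)
            is_filler = done is None
            if is_filler:
                done = threading.Event()
                self._in_flight[key] = done
        if not is_filler:
            done.wait()                 # another request is already fetching
            return self.get(key)        # now a hit (or retry if the fill failed)
        try:
            value = self._fetch(key)    # the single origin request
            with self._lock:
                self._cache[key] = value
            return value
        finally:
            with self._lock:
                del self._in_flight[key]
            done.set()                  # wake all waiters


origin_calls = []


def slow_origin(key):
    origin_calls.append(key)
    time.sleep(0.2)                     # simulate a 200 ms origin fetch
    return f"response for {key}"


cache = CollapsingCache(slow_origin)
results = []
threads = [threading.Thread(target=lambda: results.append(cache.get("/popular")))
           for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results), len(origin_calls))   # 50 1
```

All 50 requests get a response, but the origin sees only one of them.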

Varnish is the speed demon. Varnish keeps cached objects in RAM by default and uses a custom configuration language (VCL) for routing logic. It's specifically designed for HTTP caching and can serve 100,000+ requests/sec from a single node. Nginx is more versatile (also does load balancing, SSL, static serving) but its caching is slightly less optimized. In practice, many deployments use Nginx as the load balancer/SSL terminator with Varnish behind it as the cache layer.

Reverse Proxy Caching

Requests arrive at the reverse proxy. Hits are served instantly (green). Misses go to origin (yellow). Watch how the thundering herd is handled with and without request collapsing.

Check: A cached item expires. In the next 100ms, 200 requests arrive for that item. Request collapsing is enabled. How many requests reach the origin server?

1: the first request is forwarded; the other 199 wait for its response
200: collapsing only works for simultaneous requests
~20: collapsing reduces load by 10x

Chapter 5: CDN Architecture

A Content Delivery Network is a geographically distributed network of reverse proxy servers. Instead of one reverse proxy in your data center, a CDN has hundreds or thousands of edge servers (also called Points of Presence, or PoPs) in cities around the world. Each PoP caches your content and serves it to nearby users.

How CDN Routing Works

When a user requests your content, the CDN must route them to the nearest (or best) edge server. Two common approaches:

Method	How it works	Pros/Cons
DNS-based routing	CDN controls the DNS resolution. User's DNS resolver gets an IP for the nearest PoP.	Simple. But DNS is cached, so changes are slow (TTL-dependent).
Anycast routing	Multiple PoPs advertise the same IP via BGP. Internet routing sends packets to the nearest one.	Fast, no DNS dependency. Used by Cloudflare, Google.
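Both methods try to land the user on a nearby PoP. As a toy model of that decision, the Python sketch below picks the geographically closest healthy PoP. Geographic distance is an assumption made for simplicity: real CDNs route on measured latency, load, and BGP topology. The PoP list and coordinates are approximate and purely illustrative.

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical PoP locations as (latitude, longitude), approximate.
POPS = {
    "Sydney": (-33.87, 151.21),
    "Singapore": (1.35, 103.82),
    "Frankfurt": (50.11, 8.68),
    "Ashburn": (39.04, -77.49),
    "Los Angeles": (34.05, -118.24),
}


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))


def nearest_pop(user, healthy=None):
    """Name of the closest PoP, optionally restricted to a healthy subset."""
    names = POPS if healthy is None else healthy
    return min(names, key=lambda name: haversine_km(user, POPS[name]))


melbourne = (-37.81, 144.96)
paris = (48.86, 2.35)
print(nearest_pop(melbourne))   # Sydney
print(nearest_pop(paris))       # Frankfurt
# If the Sydney PoP is down, traffic fails over to the next closest one.
print(nearest_pop(melbourne, healthy=["Singapore", "Frankfurt", "Ashburn", "Los Angeles"]))
# Singapore
```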

CDN Request Flow

1. DNS Resolution

User resolves cdn.example.com → CDN's DNS returns IP of nearest PoP (e.g., Sydney)

↓

2. Edge Check

Request hits Sydney PoP. Edge checks its cache. HIT → return cached response (5ms).

↓ miss

3. Shield/Mid-Tier

Edge asks the "shield" (regional parent cache). HIT → return to edge (20ms). MISS → go to origin.

↓ miss

4. Origin Fetch

Shield fetches from your origin server (300ms). Response flows back through shield → edge → user. Both caches store the response.

The shield layer matters. Without a shield, every edge PoP that misses asks the origin directly. With 200 PoPs and a popular item that expires simultaneously, your origin gets 200 requests. With a shield (one per region), only 5-10 shields ask the origin. The shield is request collapsing at a global scale.
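A small Python sketch of the worst case described above, where an item expires on every PoP at once. It assumes each PoP is pinned to one shield and that each tier collapses duplicate requests for the item; the function name and the round-robin PoP-to-shield assignment are illustrative.

```python
def origin_fetches(num_pops: int, num_shields: int = 0) -> int:
    """Worst-case origin requests when one item expires everywhere at once."""
    if num_shields == 0:
        return num_pops                     # every PoP asks the origin itself
    shields_that_fetched = set()
    for pop in range(num_pops):
        shield = pop % num_shields          # each PoP is pinned to one shield
        shields_that_fetched.add(shield)    # a shield fetches once, then serves its PoPs
    return len(shields_that_fetched)


print(origin_fetches(200))                  # 200
print(origin_fetches(200, num_shields=10))  # 10
print(origin_fetches(200, num_shields=5))   # 5
```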

What CDNs Actually Cache

// Default: CDNs cache based on Cache-Control headers
// Static assets (JS, CSS, images, fonts): typically cached by default
// HTML: depends on headers (usually no-cache or short max-age)
// API responses: depends on Cache-Control and Vary headers

// Cache key = URL + request headers named in Vary
// GET /api/products?page=1 → one cache entry
// GET /api/products?page=2 → different cache entry
// GET /api/products?page=1 with Accept-Encoding: br → SAME entry as page=1 (the CDN handles Vary: Accept-Encoding internally)

CDN Global Network

A CDN with edge PoPs around the world and one origin. Click a city to send a request and watch how it routes through the CDN hierarchy.

Check: A CDN has 100 edge PoPs and 5 regional shields. A popular page expires at the same time everywhere. Without shields, how many requests hit origin? With shields?

Without: 100, With: 5
Without: up to 100 (one per PoP), With: up to 5 (one per shield, edges wait for their shield)
Same in both cases: all requests eventually reach origin

Chapter 6: Cache Hierarchies

Real systems don't have one cache — they have many, layered from closest-to-user to closest-to-origin. Understanding how requests flow through these layers is essential for debugging performance issues and setting correct cache headers.

The Full Request Path

// User in Sydney requests https://example.com/api/products

Layer 1: Browser cache
  Checks local cache. Has entry with max-age=60, 45 seconds old.
  FRESH → serve from browser. 0ms latency, 0 bytes transferred.

// 20 seconds later, same request. Entry is now 65 seconds old (expired).

Layer 1: Browser cache
  Entry expired. Sends conditional request with ETag.
Layer 2: CDN edge (Sydney PoP)
  Receives request. Checks its cache. Has fresh entry (s-maxage=300).
  HIT → responds 304 if the browser's ETag still matches, otherwise 200 with current data. 5ms latency.
  Browser updates its cache. Sets new max-age timer.

// 10 minutes later. CDN entry also expired.

Layer 1: Browser cache
  Expired → conditional request.
Layer 2: CDN edge
  MISS → forwards to shield.
Layer 3: CDN shield (Singapore)
  Has fresh entry. HIT → responds to edge. 20ms.
  Edge caches the response, responds to browser.

// 30 minutes later. Shield entry also expired.

Layer 4: Reverse proxy (Nginx in Virginia)
  MISS → forwards to application.
Layer 5: Application cache (Redis)
  HIT → responds to Nginx. 1ms.
Layer 6: Database
  Not reached. Saved a 10ms database query.

Each layer has different TTLs. A common pattern: browser caches for 60 seconds (max-age=60), CDN caches for 5 minutes (s-maxage=300), application cache caches for 30 minutes. For a given item, this means the browser makes a network request at most once every 60 seconds, the CDN hits origin at most once every 5 minutes, and origin hits the database at most once every 30 minutes. Each layer shields the one behind it.
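The shielding effect of layered TTLs can be simulated in a few lines of Python. This sketch assumes a single user requesting the same item once per second for an hour, and it deliberately simplifies freshness: each layer restarts its TTL when it stores a response, ignoring the Age header that real HTTP caches take into account. The Layer class is illustrative.

```python
class Layer:
    """One cache layer with a TTL, backed by a parent layer (None = source of truth)."""

    def __init__(self, name, ttl=None, parent=None):
        self.name = name
        self.ttl = ttl
        self.parent = parent
        self.stored_at = None
        self.requests = 0              # requests that reached this layer

    def get(self, now):
        self.requests += 1
        if self.parent is None:
            return                     # source of truth always answers
        fresh = self.stored_at is not None and now - self.stored_at < self.ttl
        if not fresh:
            self.parent.get(now)       # miss: ask the layer behind us
            self.stored_at = now


database = Layer("database")
app_cache = Layer("app cache", ttl=1800, parent=database)
cdn = Layer("cdn", ttl=300, parent=app_cache)
browser = Layer("browser", ttl=60, parent=cdn)

for second in range(3600):             # one request per second for an hour
    browser.get(second)

print(browser.requests, cdn.requests, app_cache.requests, database.requests)
# 3600 60 12 2
```

Of 3,600 requests, 60 leave the browser, 12 reach the origin's application cache, and only 2 reach the database.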

The Vary Header: Cache Key Control

The Vary header tells caches to store separate entries for requests that differ in specific headers. Without Vary, a cache might serve a gzipped response to a client that can't decompress it.

// Response header:
Vary: Accept-Encoding
// Cache stores separate entries for:
// Accept-Encoding: gzip → gzipped response
// Accept-Encoding: br → brotli response
// (no encoding) → uncompressed response

// Vary: Accept-Language
// Separate entries for en-US, ja, de, etc.

// DANGEROUS: Vary: Cookie
// Every unique cookie value = different cache entry
// This effectively disables caching (every user has different cookies)
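One way to picture Vary is as a recipe for building the cache key. The Python sketch below is a simplified model (cache_key is a hypothetical helper, and real caches also normalize header values): the key is the method and URL plus the values of whichever request headers the response named in Vary.

```python
def cache_key(method, url, request_headers, vary=""):
    """Build a cache key from the URL plus the request headers listed in Vary."""
    headers = {name.lower(): value for name, value in request_headers.items()}
    vary_names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    return (method, url, tuple((n, headers.get(n, "")) for n in vary_names))


url = "/api/products?page=1"
gzip_req = {"Accept-Encoding": "gzip"}
br_req = {"Accept-Encoding": "br"}

# Without Vary, both clients share one entry (and one may get the wrong encoding).
print(cache_key("GET", url, gzip_req) == cache_key("GET", url, br_req))   # True

# With Vary: Accept-Encoding, each encoding gets its own entry.
print(cache_key("GET", url, gzip_req, "Accept-Encoding")
      == cache_key("GET", url, br_req, "Accept-Encoding"))                # False

# With Vary: Cookie, 1000 users with distinct cookies produce 1000 entries.
users = [{"Cookie": f"session={i}"} for i in range(1000)]
print(len({cache_key("GET", url, u, "Cookie") for u in users}))           # 1000
```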

Cache Hierarchy Simulator

Trace a request through all cache layers. Each layer shows its TTL and hit/miss status. Watch how requests cascade through layers on misses.

Check: You set Cache-Control: public, max-age=60, s-maxage=300. A user's browser cache expires after 60 seconds. They make a new request. Where does it go?

The CDN, which still has a fresh copy (s-maxage=300 = 5 minutes). The CDN responds with the cached data.
The origin server: all caches expired together
Nowhere: the browser still has the cached copy

Chapter 7: Cache Invalidation

"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton

Cache invalidation is the process of removing or updating stale data from a cache. It's hard because caches are distributed (browser, CDN edge, shield, proxy) and there's no reliable way to instantly reach all of them. Every invalidation strategy is a trade-off between staleness, complexity, and reliability.

Invalidation Strategies

Strategy	How it works	Staleness window	Complexity
TTL-based	Set max-age; cache expires after N seconds	Up to N seconds	Simplest
Purge/Ban	CDN API call to remove specific URLs or patterns	Seconds (API propagation delay)	Medium
Cache tags	Tag cached entries; purge all entries with a tag	Seconds	Medium-high
Event-driven	Database change → event → purge cache	Milliseconds to seconds	High
Versioned URLs	Change URL when content changes (main.abc.js → main.def.js)	Zero (new URL = new entry)	Requires build pipeline

Purge vs. Ban

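// Purge: remove ONE specific URL from cache
// (PURGE is a non-standard HTTP method; Varnish-style setups enable it in configuration)
curl -X PURGE https://cdn.example.com/api/products/42
// Useful for: updating a single page or resource

// Ban: remove ALL URLs matching a pattern
// (BAN and the X-Ban-Pattern header are conventions defined in your own proxy config, not a standard)
curl -X BAN https://cdn.example.com/ -H "X-Ban-Pattern: /api/products/.*"
// Useful for: invalidating an entire category after bulk update

// Cache tags (Fastly, Cloudflare): tag responses, then purge by tag
// Response: Surrogate-Key: product-42 category-electronics   (Fastly's header; Cloudflare uses Cache-Tag)
// Purge: POST /purge/tag/category-electronics   (illustrative endpoint; real purge APIs differ per CDN)
// → All responses tagged "category-electronics" are purged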
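Tag-based purging boils down to a reverse index from tag to URLs. The Python sketch below is an in-memory illustration of that idea; TaggedCache and its methods are hypothetical and do not correspond to any CDN's API.

```python
from collections import defaultdict


class TaggedCache:
    """Cache entries carry tags; purging a tag removes every entry that has it."""

    def __init__(self):
        self.entries = {}                      # url -> body
        self.urls_by_tag = defaultdict(set)    # tag -> urls carrying that tag

    def store(self, url, body, surrogate_key=""):
        """Store a response; surrogate_key is a space-separated list of tags."""
        self.entries[url] = body
        for tag in surrogate_key.split():
            self.urls_by_tag[tag].add(url)

    def purge_url(self, url):
        """Purge one URL. Returns True if it was cached."""
        return self.entries.pop(url, None) is not None

    def purge_tag(self, tag):
        """Purge every entry carrying the tag. Returns the purged URLs, sorted."""
        purged = []
        for url in self.urls_by_tag.pop(tag, set()):
            if url in self.entries:
                del self.entries[url]
                purged.append(url)
        return sorted(purged)


cache = TaggedCache()
cache.store("/api/products/42", "tv", "product-42 category-electronics")
cache.store("/api/products/43", "radio", "product-43 category-electronics")
cache.store("/api/products/7", "novel", "product-7 category-books")

print(cache.purge_tag("category-electronics"))
# ['/api/products/42', '/api/products/43']
print(sorted(cache.entries))
# ['/api/products/7']
```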

The browser cache is the hardest to invalidate. You can purge your CDN in seconds. But you can't reach into a user's browser and delete their cached copy. If you set max-age=86400 (1 day), that user won't check for updates for 24 hours. This is why the versioned URL pattern is so powerful: changing the filename forces the browser to fetch the new version because it's a different URL. For HTML (which you can't version), use short max-age or no-cache with ETags.
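The versioned URL pattern is usually produced by a build step that embeds a content hash in the filename. A minimal Python sketch of that step follows, assuming SHA-256 and an 8-character digest; both are arbitrary illustrative choices, and versioned_name is a hypothetical helper rather than part of any bundler.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a content hash before the extension: main.js -> main.<hash>.js."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old = versioned_name("/static/main.js", b"console.log('v1');")
new = versioned_name("/static/main.js", b"console.log('v2');")
print(old)          # /static/main.<8 hex chars>.js
print(old != new)   # True: new content means a new URL, so caches miss
print(old == versioned_name("/static/main.js", b"console.log('v1');"))
# True: unchanged content keeps its URL, so caches keep hitting
```

Because the URL is a function of the bytes, a long max-age with immutable is safe: a stale copy can never be served under the new name.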

Cache Invalidation Strategies

A cache with stale data. Try different invalidation strategies and see how quickly each one removes the stale entries.

Check: You deploy a new version of your JavaScript bundle. Old version is cached in users' browsers with max-age=31536000 (1 year). How do you get users to load the new version?

Purge the CDN
Use a versioned URL (main.abc.js → main.def.js). The HTML references the new URL, so the browser fetches it as a new resource.
Wait for max-age to expire

Chapter 8: CDN Routing Simulation

This is the showcase simulation. You're running a CDN with 6 edge PoPs around the world, 2 regional shields, and one origin server. Users from different cities send requests. Watch the DNS resolution, edge cache checks, shield fallbacks, and origin fetches play out in real time. You can flush individual PoPs, create regional outages, and see how the CDN adapts.

Experiment: Send requests from multiple cities. Watch how the first request from each region misses and goes to origin, but subsequent requests hit the edge or shield. Then flush a PoP and see the request cascade. Create an outage to see failover routing.

CDN Request Routing Simulation

6 edge PoPs, 2 shields, 1 origin. Click cities to send requests. Watch the routing cascade. Green = cache hit, yellow = shield hit, red = origin fetch.

Statistics to observe:

Metric	What to watch
Edge hit ratio	Should increase as you send more requests to the same PoP
Shield hit ratio	First request from a new PoP in the same region should hit the shield
Origin load	Should be minimal after initial warm-up
Failover latency	When shield is down, edge goes directly to origin (higher latency)

Chapter 9: Connections

Caching is the most impactful performance optimization in distributed systems. A well-configured cache hierarchy can reduce origin load by 100x and cut user-perceived latency from seconds to milliseconds. But it comes with the eternal challenge: keeping cached data fresh.

The Cache Strategy Cheat Sheet

Content type	Cache-Control	Invalidation
Versioned assets (JS, CSS)	public, max-age=31536000, immutable	New filename = automatic
HTML pages	public, no-cache (or max-age=60)	ETag revalidation
API responses (public)	public, s-maxage=60, stale-while-revalidate=300	Purge API on data change
API responses (private)	private, max-age=30	Short TTL only
Sensitive data	no-store	N/A (never cached)

The golden rule: cache aggressively at the edge for static assets (versioned URLs + immutable), cache cautiously for dynamic content (short TTL + stale-while-revalidate), and never cache sensitive data (no-store). When in doubt, start with shorter TTLs and increase them after measuring your hit ratio.

The Caching Landscape

Where different caching strategies sit on the freshness-performance spectrum.

Related lessons:

Coordination Avoidance — CRDTs for eventually consistent caches
Partitioning & Storage — blob storage and CDN origin architecture
Database Replication — replication as a form of caching (replicas serve reads)

"The fastest I/O is the I/O you don't do." — Anonymous engineer