Data Storage & Caching — From Absolute Zero to Mastery

Chapter 0: The Problem

You have built a web application. Users sign up, create profiles, post content. Everything lives in one PostgreSQL database on one server. Reads take 2 milliseconds. Life is good.

Then you grow. A million users. Ten million posts. Your database disk is 90% full. Read latency creeps from 2ms to 20ms because the working set no longer fits in RAM and the database starts hitting disk. Writes are even worse — every write must be persisted to disk and update indexes. Users start complaining about slow page loads.

You have three fundamental problems, and each requires a different solution:

Problem	Symptom	Solution
Too much data for one machine	Disk full, storage limit	Partitioning — split data across machines
Too many reads for one machine	High CPU, slow queries	Replication — copy data to read replicas
Reads are too slow (even with capacity)	20ms per query, database as bottleneck	Caching — keep hot data in memory

These are not just theoretical concerns. Every system that grows beyond a single machine must solve all three. And they interact: caching reduces load on replicas, replication provides read capacity that makes cache misses tolerable, and partitioning determines where data lives (which affects cache locality).

Single Database Bottleneck

Watch requests hit a single database. As load increases, latency spikes because the working set exceeds RAM. A cache absorbs the hot reads, bringing latency back down.

Click a mode to begin

With caching, 90% of reads never touch the database. They are served from memory in under a millisecond. The database only handles cache misses (10% of reads) and all writes. This is the fundamental insight of caching: most access patterns follow a power law — a small set of "hot" keys accounts for the majority of reads. If you keep those hot keys in memory, you can reduce database load by an order of magnitude.

This lesson covers the storage layer of distributed systems. We will explore how data is replicated for durability, partitioned for scale, categorized across different NoSQL families, and cached for speed. By the end, you will understand the trade-offs behind every major storage decision in a distributed system.

Your database handles 10,000 reads/sec with 5ms latency. You add 3 read replicas (4 total). What problem have you NOT solved?

Read throughput — reads still go to one machine. Write throughput and latency. All writes still go to the leader. Adding read replicas helps reads but does nothing for writes. If your write load exceeds the leader's capacity, you need partitioning to distribute writes across multiple machines. Fault tolerance — if one replica dies, data is lost.

Chapter 1: Replication

Replication means keeping copies of the same data on multiple machines. It serves two purposes: fault tolerance (if one machine dies, others have the data) and read scaling (spread reads across multiple copies). The fundamental challenge is keeping copies in sync when data changes.

Single-Leader Replication

One node is the leader (also called primary or master). All writes go to the leader. The leader streams changes to followers (replicas, secondaries). Reads can go to any replica. This is how PostgreSQL, MySQL, MongoDB (by default), and most traditional databases work.

Client Write

INSERT INTO users VALUES ('alice', ...)

↓

Leader

Writes to disk, appends to WAL (Write-Ahead Log)

↓

Replication Stream

Leader sends WAL entries to all followers

↓

Followers

Apply WAL entries to their local copies

The critical question: does the leader wait for followers to confirm before acknowledging the write to the client? Synchronous replication waits — the client does not get "OK" until at least one follower has confirmed. Durable but slow. Asynchronous replication acknowledges immediately after the leader writes — fast but risky, because if the leader crashes before followers catch up, those writes are lost.

// Synchronous: write confirmed after follower ack
Client → Leader writes → Follower-1 writes → ACK to client
Latency: leader_write + network_RTT + follower_write ≈ 5ms + 1ms + 5ms = 11ms
Durability: guaranteed on 2 machines

// Asynchronous: write confirmed immediately
Client → Leader writes → ACK to client (followers catch up later)
Latency: leader_write ≈ 5ms
Durability: only on 1 machine until replication catches up

// Semi-synchronous (PostgreSQL default): 1 sync follower, rest async
// Best of both: fast for most, durable on 2 machines

Multi-Leader Replication

In multi-leader (also called master-master), multiple nodes accept writes. Each leader replicates to the others. This is useful for geographically distributed systems: a user in Tokyo writes to the Tokyo leader with low latency, and the change replicates to the Virginia leader asynchronously.

The poison: write conflicts. User A changes a record on leader 1 while user B changes the same record on leader 2. Both succeed locally. When the changes replicate, each leader sees a conflicting update. Which one wins? Common resolution strategies: last-write-wins (LWW, by timestamp — lossy), merge the values (application-specific), or flag the conflict for manual resolution.

Leaderless Replication

Leaderless (used by Cassandra and DynamoDB) has no designated leader. Any node can accept writes. The client writes to multiple nodes simultaneously. Reads also query multiple nodes. Consistency is achieved through quorum: if there are N replicas, a write succeeds if W nodes confirm, and a read succeeds if R nodes respond. As long as W + R > N, you are guaranteed to read your own writes.

// Quorum formula
N = 3 replicas
W = 2 (write to at least 2 nodes)
R = 2 (read from at least 2 nodes)
W + R = 4 > N = 3 → overlapping quorum, guaranteed consistency

// Fast reads (eventual consistency)
W = 3, R = 1 → writes are slow but reads are fast

// Fast writes (eventual consistency)
W = 1, R = 3 → writes are fast but reads are slow

Replication Strategies

Watch how writes propagate under each strategy. Single-leader: all writes go to one node. Multi-leader: writes go to any leader. Leaderless: writes go to a quorum.

Pick a strategy

The replication trade-off triangle. You can optimize for any two: (1) low write latency (do not wait for replicas), (2) strong consistency (read your own writes), (3) high availability (survive node failures). Single-leader with sync replication gives you consistency + availability but sacrifices latency. Leaderless with W=1 gives latency + availability but sacrifices consistency. Multi-leader with conflict resolution gives latency + availability with eventual consistency.

You use leaderless replication with N=5, W=3, R=3. A write updates nodes 1, 2, 3. A read queries nodes 3, 4, 5. Does the read see the latest write?

Yes. The write set {1,2,3} and read set {3,4,5} overlap at node 3. Since W + R = 6 > N = 5, there must always be at least one node in common. Node 3 has the latest value and will be included in the read response. No — nodes 4 and 5 have stale data and will outvote node 3. Only if the read happens after replication completes on all nodes.

Chapter 2: Partitioning

Replication copies data for redundancy. Partitioning (also called sharding) splits data for capacity. Each partition holds a subset of the data on a different machine. Together, all partitions hold the full dataset. This is how you go beyond what one machine can store or process.

Range Partitioning

Assign each partition a continuous range of keys. Keys A-M go to partition 1, N-Z go to partition 2. Efficient for range queries ("all users whose name starts with 'S'") because you only query one or two partitions. But vulnerable to hotspots: if keys are timestamps, all recent writes land on one partition.

Hash Partitioning

Hash each key and assign the hash to a partition. partition = hash(key) % N. Distributes data evenly regardless of key patterns — no hotspots. But range queries are impossible: adjacent keys hash to random partitions, so "all users from A to C" requires querying every partition.

// Range partitioning: good for scans, bad for hotspots
key="2024-06-15" → partition 3 (owns June 2024)
key="2024-06-16" → partition 3 (same month, same partition)
Range query "June 2024": only partition 3 (efficient!)
But: ALL writes in June go to partition 3 (hotspot!)

// Hash partitioning: good for distribution, bad for scans
key="2024-06-15" → hash = 0x4A2B → partition 2
key="2024-06-16" → hash = 0xD103 → partition 0
Adjacent dates land on different partitions (no hotspot!)
But: range query "June 2024" must scan ALL partitions (expensive!)

Compound Partitioning

Some systems combine both. Cassandra uses the first part of the key for hash partitioning (to distribute data) and the second part for range ordering within each partition (to support efficient range queries on that second dimension).

// Cassandra compound key: (user_id, timestamp)
// Partition key: hash(user_id) → distributes users across nodes
// Clustering key: timestamp → sorted within each partition

Query: "all posts by user 42 from June" →
1. hash("user42") → partition 7 (single node!)
2. Range scan on timestamp within partition 7 (efficient!)

// Best of both worlds for this access pattern

Range vs. Hash Partitioning

Watch keys distribute across 4 partitions. Range partitioning creates hotspots with sequential keys. Hash partitioning distributes evenly.

Pick a strategy

The partitioning-replication combination. Real systems use both. You partition for capacity (5 partitions, each holding 20% of the data) and replicate each partition for durability (3 copies of each). That is 5 x 3 = 15 node assignments. A node might hold partition 1's primary copy, partition 3's replica, and partition 5's replica. This is how Kafka, Cassandra, and CockroachDB work.

You partition user data by hash(user_id) across 4 nodes. A query asks for "all users who signed up in January." How many nodes must you query?

1 — the partition holding January data. 2 — the hash will narrow it to 2 partitions. All 4. Hash partitioning distributes users by user_id, not by signup date. January signups are scattered across all partitions. This is a scatter-gather query — you query every partition and merge results. Range partitioning on signup date would require only 1 partition, but would create write hotspots.

Chapter 3: NoSQL Taxonomy

Not all data fits neatly into rows and columns. The NoSQL movement (which really means "not only SQL") emerged because relational databases are a poor fit for some workloads. There are four major families, each optimized for different access patterns.

Key-Value Stores

The simplest model: a key maps to a blob of data. The store does not care what the value contains. GET(key) and PUT(key, value) are the only operations. Because the store does not need to parse or index the value, it can be extremely fast — sub-millisecond reads.

Use cases: session storage, user preferences, shopping carts, caching. Examples: Redis, Memcached, DynamoDB (simple mode), etcd.

Trade-off: No queries on values. You cannot say "find all users where age > 30" without reading every value and checking. If you need to query by multiple attributes, a key-value store forces you to maintain secondary indexes manually.

Document Stores

Like key-value, but the store understands the value structure — it is a JSON (or BSON) document. This means you can create indexes on fields within the document and query on any field. Documents are self-contained: a user document includes their name, address, orders, and preferences in one blob.

Use cases: content management, user profiles, product catalogs — anything with variable structure. Examples: MongoDB, CouchDB, Firestore.

Trade-off: Great for reading a full document, terrible for joins across documents. If a user's order references a product document, you need two reads (or denormalize and risk inconsistency).

Wide-Column Stores

Data is organized by row key, column family, and column name. Think of it as a two-dimensional key: (row_key, column_name) → value. Each row can have a different set of columns — there is no fixed schema. Columns within a family are stored together on disk, enabling efficient reads of specific column subsets.

Use cases: time-series data, analytics, event logs — any workload with massive row counts and queries that touch specific column subsets. Examples: Cassandra, HBase, BigTable.

Graph Databases

Store data as nodes (entities) and edges (relationships). Queries traverse edges: "find all friends of friends who work at company X." Relational databases can do this with joins, but each hop requires a join, and joining 6 levels deep is prohibitively slow. Graph databases index edges natively, making traversals constant-time per hop.

Use cases: social networks, fraud detection, recommendation engines, knowledge graphs. Examples: Neo4j, Amazon Neptune, TigerGraph.

NoSQL Families

Click each family to see its data model, query pattern, and strengths.

Family	Data model	Query style	Best for
Key-Value	key → opaque blob	GET/PUT by key	Caching, sessions, config
Document	key → JSON document	Query any field, index fields	CMS, profiles, catalogs
Wide-Column	(row, column) → value	Scan rows, filter columns	Time-series, analytics, logs
Graph	nodes + edges	Traverse relationships	Social, fraud, recommendations

NoSQL does not mean "no schema." It means "schema on read" instead of "schema on write." A relational database enforces structure when you write (INSERT fails if columns do not match). A document store accepts any JSON — but your application still needs to understand the structure when it reads. The schema exists; it is just enforced in code, not in the database. This is flexible but dangerous: bugs that write malformed documents are not caught until read time.

You need to find all users who are friends-of-friends with user X and also purchased product Y. Which database type handles this most efficiently?

Document store — store friendships inside user documents. Wide-column store — efficient column scans for friend lists. Graph database. "Friends-of-friends" is a 2-hop traversal — trivial in a graph DB (constant time per hop), but requires a 2-level self-join in relational (expensive) or full document scans in document stores. The graph DB also naturally intersects the "purchased Y" edge.

Chapter 4: Cache-Aside Pattern

The most common caching pattern in production systems is cache-aside (also called lazy loading or look-aside). The application code is responsible for managing the cache — the cache and database do not talk to each other directly.

The Flow

Read Request

Application receives GET /user/42

↓

Check Cache

cache.get("user:42") — returns value or NULL

↓ cache miss

Query Database

db.query("SELECT * FROM users WHERE id=42")

↓

Populate Cache

cache.set("user:42", result, ttl=300)

↓

Return to Client

Next read of user:42 hits cache (fast path)

// Cache-aside in code
def get_user(user_id):
  # Step 1: check cache
  cached = redis.get(f"user:{user_id}")
  if cached:
    return json.loads(cached) # Cache HIT — fast path

  # Step 2: cache miss — query database
  user = db.query("SELECT * FROM users WHERE id = %s", user_id)

  # Step 3: populate cache for next time
  redis.setex(f"user:{user_id}", 300, json.dumps(user)) # TTL = 5 min

  return user

Why Cache-Aside Works

Only hot data gets cached. The cache fills organically: data that is read often gets cached automatically (because it is read, found missing, fetched, and cached). Data that is never read is never cached. The cache naturally reflects the access pattern. No upfront work to decide what to cache.

Cache failures are survivable. If the cache goes down, the application falls back to the database. Latency increases but the system does not break. This is a critical advantage over patterns where the cache is in the write path.

The Staleness Problem

When data is updated in the database, the cache still holds the old value. There are two strategies:

TTL-based expiration: Set a TTL (Time To Live) on every cached entry. After 5 minutes, the entry expires and the next read fetches from the database. Staleness window = TTL. Simple but imprecise.

Explicit invalidation: When updating data, also delete the cache entry. The next read will miss, fetch the new value, and repopulate. More consistent but requires discipline — every write path must remember to invalidate.

// Write + invalidate
def update_user(user_id, data):
  db.execute("UPDATE users SET ... WHERE id = %s", user_id, data)
  redis.delete(f"user:{user_id}") # Invalidate cache
  # Next read will miss, fetch from DB, and repopulate

// DANGER: if invalidate fails (network blip), cache holds stale data
// until TTL expires. Always set a TTL as a safety net!

Cache-Aside Pattern

Watch reads flow through the cache. First access = miss (goes to DB). Subsequent accesses = hit (served from cache). Click "Update DB" to see invalidation.

Click Read to start

Cache-aside is the default for 90% of production systems. It is simple, resilient, and self-tuning. The main gotchas: (1) the first read for every key is slow (cold cache / "thundering herd" if many requests hit the same uncached key simultaneously), (2) stale data if invalidation is missed, (3) the application code must handle both hit and miss paths.

Your cache-aside system has a 95% hit rate and 5ms cache latency vs. 50ms DB latency. What is the average read latency?

0.95 x 5ms + 0.05 x (5ms + 50ms) = 4.75ms + 2.75ms = 7.5ms. Cache hits are 5ms. Cache misses pay the cache check (5ms) plus the DB query (50ms), so 55ms. The weighted average is 7.5ms — a 6.7x improvement over the raw 50ms DB latency, achieved by serving 95% of reads from memory. 5ms — the cache is always fast. 27.5ms — average of 5ms and 50ms.

Chapter 5: Write-Through, Write-Behind, Read-Through

Cache-aside puts the application in charge. But there are patterns where the cache itself coordinates with the database, simplifying application code at the cost of more complex infrastructure.

Write-Through

Every write goes to the cache first. The cache synchronously writes to the database before returning. The application writes to the cache, and the cache handles persistence. The cache always has the latest data — no staleness.

// Write-through flow
App writes "user:42" → Cache
Cache writes "user:42" → Database (synchronously)
Cache returns success to App

// Advantage: cache is always up-to-date (no stale reads)
// Disadvantage: every write has DB latency (slow writes)
// Disadvantage: writes data that may never be read (wasted cache space)

Write-Behind (Write-Back)

The cache accepts the write and returns immediately. The cache asynchronously flushes to the database later — either after a delay or in batches. This makes writes extremely fast (memory speed) but introduces a durability risk: if the cache crashes before flushing, those writes are lost.

// Write-behind flow
App writes "user:42" → Cache (returns immediately, ~0.5ms)
Cache queues write → flushes to DB every 5 seconds (in batch)

// Advantage: writes are memory-speed fast
// Advantage: batching reduces DB write load
// Disadvantage: data loss if cache crashes before flush
// Disadvantage: complex failure handling

Read-Through

Read-through is like cache-aside but the cache, not the application, fetches from the database on a miss. The application always reads from the cache. On a miss, the cache itself queries the DB, stores the result, and returns it. This simplifies application code — the application does not need miss-handling logic.

Write Pattern Comparison

Watch how writes flow through each pattern. Notice the latency and durability differences.

Pick a pattern

Pattern	Write latency	Read consistency	Durability	Complexity
Cache-Aside	DB speed (writes bypass cache)	Stale until TTL/invalidation	High (writes go to DB first)	In application
Write-Through	DB speed (cache writes through)	Always fresh	High	In cache layer
Write-Behind	Memory speed (async flush)	Always fresh in cache	Risk of loss	In cache layer
Read-Through	N/A (read pattern)	Fresh after first read	N/A	In cache layer

In practice: cache-aside + write-through is the most common combination. Reads use cache-aside (check cache, fall back to DB). Writes use write-through (write to cache and DB). This gives you always-fresh reads with simple application code. Write-behind is used when write throughput is critical and you can tolerate some data loss risk (e.g., view counters, analytics events).

Your system uses write-behind caching with a 10-second flush interval. The cache server crashes. What happens to the writes made in the last 10 seconds?

They are safe in the database because write-behind flushes continuously. They are lost. Write-behind accepts writes into memory and flushes to the database asynchronously. If the cache crashes before the next flush, all unflushed writes are gone. This is the fundamental durability trade-off of write-behind: fast writes at the risk of data loss on failure. They are recovered from the WAL (Write-Ahead Log) on restart.

Chapter 6: Cache Eviction

Cache memory is finite. Your database might hold 1 TB of data, but your Redis instance has 64 GB of RAM. When the cache is full and a new entry needs to be stored, something must be removed. The eviction policy decides what gets kicked out.

LRU (Least Recently Used)

Remove the entry that has not been accessed for the longest time. The intuition: if you have not read a key in a while, you probably will not read it soon. This is the most popular eviction policy because it works well for most access patterns.

Implementation: A doubly-linked list + hash map. On every access, move the entry to the head of the list. When evicting, remove from the tail. Both operations are O(1). Redis uses an approximated LRU that samples a few random keys and evicts the least recently used among the sample — this avoids the overhead of maintaining a true LRU list.

LFU (Least Frequently Used)

Remove the entry that has been accessed the fewest times. The intuition: entries that are rarely accessed are less valuable. LFU is better than LRU when there are "scan" workloads — a one-time bulk read of many keys would flush the entire LRU cache, but LFU keeps the truly popular keys because their frequency count is high.

Trade-off: LFU has a "cold start" problem. A new entry has a low frequency count and gets evicted quickly, even if it is about to become popular. Some implementations decay frequency counts over time to address this.

TTL (Time To Live)

Entries expire after a fixed duration regardless of access pattern. This is not really an eviction policy — it is an expiration policy. But it serves the same purpose: keeping the cache fresh and bounded. TTL is typically used alongside LRU: entries can be evicted by LRU when the cache is full, or they can expire by TTL even if the cache is not full.

LRU vs. LFU Eviction

A cache with 5 slots. Watch entries arrive and get evicted. LRU evicts the oldest-accessed; LFU evicts the least-frequently-accessed.

Pick a policy

// Redis eviction policies
noeviction # Return error when memory full (never lose data)
allkeys-lru # LRU among all keys (most common)
allkeys-lfu # LFU among all keys
volatile-lru # LRU among keys with TTL set
volatile-lfu # LFU among keys with TTL set
allkeys-random # Random eviction (surprisingly effective)
volatile-ttl # Evict keys with shortest remaining TTL

The scan problem with LRU. Imagine your cache holds the 1,000 most popular items. A batch job reads 10,000 items once. Under LRU, those 10,000 one-time reads flush all 1,000 popular items from the cache. After the batch completes, your hit rate drops from 95% to 0% while the cache refills. LFU avoids this because the popular items have high frequency counts that survive a one-time scan. This is why Redis added LFU support in version 4.0.

Your LRU cache has 5 slots: [A(5min ago), B(1min), C(3min), D(10min), E(2min)]. A new key F arrives. Which is evicted?

A — it was added first (FIFO logic). D. LRU evicts the least RECENTLY used. D was last accessed 10 minutes ago — the longest gap. A was 5 min, C was 3 min, E was 2 min, B was 1 min. D is the oldest access, so it goes. E — it has the shortest name (random tiebreak).

Chapter 7: Local vs. External Cache

When we say "cache," we usually mean an external cache like Redis. But there is an older, simpler kind: keeping cached data right in your application's memory. Each approach has sharp trade-offs.

Local (In-Process) Cache

Store cached data in a hash map inside your application process. Reads are a memory lookup — nanoseconds, no network hop. This is as fast as caching gets.

// Python local cache (simple dict)
_cache = {}
def get_user(user_id):
  if user_id in _cache:
    return _cache[user_id] # ~100ns, no network
  user = db.query(user_id) # ~5ms, database round trip
  _cache[user_id] = user
  return user

The problem: if you have 10 application instances (for horizontal scaling), each has its own cache. Write to one instance's database, and the other 9 still have stale data. There is no shared state. Cache invalidation becomes a distributed coordination problem.

Also, local caches consume application heap memory. A large cache means more garbage collection pressure, potentially causing latency spikes. And when the application restarts (deploy, crash), the cache is lost — every key starts cold.

External (Shared) Cache

A separate cache server (Redis, Memcached) shared by all application instances. When instance 1 caches a value, instance 2 can read it. When the value is invalidated, all instances see the invalidation immediately.

// External cache (Redis)
App-1 writes: redis.set("user:42", data) # 1ms network round trip
App-2 reads: redis.get("user:42") # 1ms, gets the same data
App-3 reads: redis.get("user:42") # 1ms, also hits

// Invalidation is global:
App-1 writes: redis.delete("user:42") # All instances see the miss

The trade-off: every cache access requires a network round trip (~1ms), compared to ~100ns for a local lookup. That is 10,000x slower. For extremely hot keys accessed millions of times per second, this overhead matters.

Local vs. External Cache

3 app instances share data. Watch how local caching leads to inconsistency while external caching stays synchronized.

Pick a mode

The Two-Tier Pattern

In practice, many systems use both: a small local cache for the hottest keys (L1) backed by a shared Redis cache (L2), backed by the database (L3). This gives you nanosecond latency for the top 100 keys, millisecond latency for the top 10,000 keys, and database latency for everything else.

// Two-tier caching
def get_user(user_id):
  # L1: local cache (100ns, small, per-instance)
  if user_id in local_cache:
    return local_cache[user_id]

  # L2: Redis (1ms, large, shared)
  cached = redis.get(f"user:{user_id}")
  if cached:
    local_cache[user_id] = cached # Promote to L1
    return cached

  # L3: database (5-50ms)
  user = db.query(user_id)
  redis.setex(f"user:{user_id}", 300, user)
  local_cache[user_id] = user
  return user

Local caches must have short TTLs. Since local caches cannot be invalidated externally, use TTLs of 5-30 seconds. This bounds staleness: after a write, within 30 seconds every instance's local cache expires and fetches from Redis (which was already invalidated). For most use cases, 30 seconds of staleness is acceptable.

You have 8 app instances with local caching. User 42 updates their profile on instance 1. How many instances still serve stale data?

0 — the update propagates to all instances automatically. 1 — only instance 1 has the old cached value. 7. Local caches are independent. Instance 1 can update its own cache, but the other 7 instances still hold the old value until their TTL expires. This is why local caching alone is dangerous for mutable data — it creates 7 windows of inconsistency.

Chapter 8: Cache-Aside Simulator

Time to see caching in action. Below is an interactive cache-aside simulation. You control the cache size, TTL, and access pattern. Watch the hit rate climb as the cache warms up, then inject writes and see how invalidation affects performance.

Your mission: Achieve a 90%+ hit rate. Experiment with cache sizes and TTLs. Watch what happens when you switch from Zipfian (realistic) to uniform (worst-case) access patterns. Kill the cache and watch the database suffer.

Cache-Aside Pattern Simulator

Requests arrive for keys 1-100. The cache holds a limited number of entries. Watch hits (fast) and misses (slow) in real time.

Cache Size 20 slots

TTL (sec) 10s

Press Play to begin

What to Try

Experiment 1: Zipfian with 20 slots. Watch the hit rate climb above 80% quickly. A Zipfian distribution means a few keys are accessed much more often than others — these "hot" keys stay in cache.

Experiment 2: Switch to uniform access. Every key is equally likely. The hit rate drops dramatically because there are no "hot" keys to cache. This is the worst case for caching.

Experiment 3: Lower TTL to 2 seconds. Entries expire quickly, forcing more database reads. The hit rate drops even with Zipfian access. This is the cost of freshness — short TTLs mean more misses.

Experiment 4: Increase cache to 50 slots. With 50 of 100 possible keys cached, even uniform access gets a ~50% hit rate. More cache memory directly translates to fewer database reads.

Chapter 9: Connections

Data storage and caching are foundational layers. Here is how they connect to the broader distributed systems landscape.

What We Covered

Concept	Key takeaway
Replication	Single-leader for simplicity, multi-leader for geo-distribution, leaderless for availability
Partitioning	Range for scans, hash for even distribution, compound for both
NoSQL families	KV for speed, document for flexibility, wide-column for analytics, graph for relationships
Cache-aside	Application manages cache; only hot data cached; resilient to cache failure
Write patterns	Write-through for consistency, write-behind for speed, read-through for simplicity
Eviction	LRU for general use, LFU for scan-resistant workloads, TTL as safety net
Local vs. external	Local for nanosecond reads; external for consistency; two-tier for both

Where to Go Next

Topic	Connection
Load Balancing	Consistent hashing in load balancers enables stateful caching at the LB layer
Service Architecture	Service meshes add caching at the sidecar level; API gateways often include response caching
Messaging	Event-driven cache invalidation: database publishes changes to a message queue, cache subscribers invalidate entries
Consensus	Leader election in single-leader replication; distributed locks for cache stampede prevention

Storage and caching decisions define your system's personality. A system with aggressive caching and eventual consistency feels fast and forgiving. A system with write-through caching and synchronous replication feels slow but trustworthy. There is no universally correct choice — only the right choice for your access patterns, consistency requirements, and failure tolerance.

You want sub-millisecond reads for the top 100 keys, millisecond reads for the top 10,000, and can tolerate 5ms for everything else. What architecture achieves this?

A single large Redis instance with all data loaded. Two-tier caching: L1 local in-process cache (holds top 100 keys, ~100ns reads), L2 shared Redis cache (holds top 10,000 keys, ~1ms reads), L3 database for everything else (~5ms reads). Each tier absorbs traffic that the tier above cannot serve, progressively reducing load on slower storage. A database with all data in RAM (like MemSQL/SingleStore).