Replication strategies, partitioning, NoSQL taxonomy, caching patterns, and eviction policies.
You have built a web application. Users sign up, create profiles, post content. Everything lives in one PostgreSQL database on one server. Reads take 2 milliseconds. Life is good.
Then you grow. A million users. Ten million posts. Your database disk is 90% full. Read latency creeps from 2ms to 20ms because the working set no longer fits in RAM and the database starts hitting disk. Writes are even worse — every write must be persisted to disk and update indexes. Users start complaining about slow page loads.
You have three fundamental problems, and each requires a different solution:
| Problem | Symptom | Solution |
|---|---|---|
| Too much data for one machine | Disk full, storage limit | Partitioning — split data across machines |
| Too many reads for one machine | High CPU, slow queries | Replication — copy data to read replicas |
| Reads are too slow (even with capacity) | 20ms per query, database as bottleneck | Caching — keep hot data in memory |
These are not just theoretical concerns. Every system that grows beyond a single machine must solve all three. And they interact: caching reduces load on replicas, replication provides read capacity that makes cache misses tolerable, and partitioning determines where data lives (which affects cache locality).
Watch requests hit a single database. As load increases, latency spikes because the working set exceeds RAM. A cache absorbs the hot reads, bringing latency back down.
With caching, 90% of reads never touch the database. They are served from memory in under a millisecond. The database only handles cache misses (10% of reads) and all writes. This is the fundamental insight of caching: most access patterns follow a power law — a small set of "hot" keys accounts for the majority of reads. If you keep those hot keys in memory, you can reduce database load by an order of magnitude.
Replication means keeping copies of the same data on multiple machines. It serves two purposes: fault tolerance (if one machine dies, others have the data) and read scaling (spread reads across multiple copies). The fundamental challenge is keeping copies in sync when data changes.
One node is the leader (also called primary or master). All writes go to the leader. The leader streams changes to followers (replicas, secondaries). Reads can go to any replica. This is how PostgreSQL, MySQL, MongoDB (by default), and most traditional databases work.
The critical question: does the leader wait for followers to confirm before acknowledging the write to the client? Synchronous replication waits — the client does not get "OK" until at least one follower has confirmed. Durable but slow. Asynchronous replication acknowledges immediately after the leader writes — fast but risky, because if the leader crashes before followers catch up, those writes are lost.
In multi-leader (also called master-master), multiple nodes accept writes. Each leader replicates to the others. This is useful for geographically distributed systems: a user in Tokyo writes to the Tokyo leader with low latency, and the change replicates to the Virginia leader asynchronously.
The poison: write conflicts. User A changes a record on leader 1 while user B changes the same record on leader 2. Both succeed locally. When the changes replicate, each leader sees a conflicting update. Which one wins? Common resolution strategies: last-write-wins (LWW, by timestamp — lossy), merge the values (application-specific), or flag the conflict for manual resolution.
Leaderless (used by Cassandra and DynamoDB) has no designated leader. Any node can accept writes. The client writes to multiple nodes simultaneously. Reads also query multiple nodes. Consistency is achieved through quorum: if there are N replicas, a write succeeds if W nodes confirm, and a read succeeds if R nodes respond. As long as W + R > N, you are guaranteed to read your own writes.
Watch how writes propagate under each strategy. Single-leader: all writes go to one node. Multi-leader: writes go to any leader. Leaderless: writes go to a quorum.
Replication copies data for redundancy. Partitioning (also called sharding) splits data for capacity. Each partition holds a subset of the data on a different machine. Together, all partitions hold the full dataset. This is how you go beyond what one machine can store or process.
Assign each partition a continuous range of keys. Keys A-M go to partition 1, N-Z go to partition 2. Efficient for range queries ("all users whose name starts with 'S'") because you only query one or two partitions. But vulnerable to hotspots: if keys are timestamps, all recent writes land on one partition.
Hash each key and assign the hash to a partition. partition = hash(key) % N. Distributes data evenly regardless of key patterns — no hotspots. But range queries are impossible: adjacent keys hash to random partitions, so "all users from A to C" requires querying every partition.
Some systems combine both. Cassandra uses the first part of the key for hash partitioning (to distribute data) and the second part for range ordering within each partition (to support efficient range queries on that second dimension).
Watch keys distribute across 4 partitions. Range partitioning creates hotspots with sequential keys. Hash partitioning distributes evenly.
Not all data fits neatly into rows and columns. The NoSQL movement (which really means "not only SQL") emerged because relational databases are a poor fit for some workloads. There are four major families, each optimized for different access patterns.
The simplest model: a key maps to a blob of data. The store does not care what the value contains. GET(key) and PUT(key, value) are the only operations. Because the store does not need to parse or index the value, it can be extremely fast — sub-millisecond reads.
Use cases: session storage, user preferences, shopping carts, caching. Examples: Redis, Memcached, DynamoDB (simple mode), etcd.
Trade-off: No queries on values. You cannot say "find all users where age > 30" without reading every value and checking. If you need to query by multiple attributes, a key-value store forces you to maintain secondary indexes manually.
Like key-value, but the store understands the value structure — it is a JSON (or BSON) document. This means you can create indexes on fields within the document and query on any field. Documents are self-contained: a user document includes their name, address, orders, and preferences in one blob.
Use cases: content management, user profiles, product catalogs — anything with variable structure. Examples: MongoDB, CouchDB, Firestore.
Trade-off: Great for reading a full document, terrible for joins across documents. If a user's order references a product document, you need two reads (or denormalize and risk inconsistency).
Data is organized by row key, column family, and column name. Think of it as a two-dimensional key: (row_key, column_name) → value. Each row can have a different set of columns — there is no fixed schema. Columns within a family are stored together on disk, enabling efficient reads of specific column subsets.
Use cases: time-series data, analytics, event logs — any workload with massive row counts and queries that touch specific column subsets. Examples: Cassandra, HBase, BigTable.
Store data as nodes (entities) and edges (relationships). Queries traverse edges: "find all friends of friends who work at company X." Relational databases can do this with joins, but each hop requires a join, and joining 6 levels deep is prohibitively slow. Graph databases index edges natively, making traversals constant-time per hop.
Use cases: social networks, fraud detection, recommendation engines, knowledge graphs. Examples: Neo4j, Amazon Neptune, TigerGraph.
Click each family to see its data model, query pattern, and strengths.
| Family | Data model | Query style | Best for |
|---|---|---|---|
| Key-Value | key → opaque blob | GET/PUT by key | Caching, sessions, config |
| Document | key → JSON document | Query any field, index fields | CMS, profiles, catalogs |
| Wide-Column | (row, column) → value | Scan rows, filter columns | Time-series, analytics, logs |
| Graph | nodes + edges | Traverse relationships | Social, fraud, recommendations |
The most common caching pattern in production systems is cache-aside (also called lazy loading or look-aside). The application code is responsible for managing the cache — the cache and database do not talk to each other directly.
Only hot data gets cached. The cache fills organically: data that is read often gets cached automatically (because it is read, found missing, fetched, and cached). Data that is never read is never cached. The cache naturally reflects the access pattern. No upfront work to decide what to cache.
Cache failures are survivable. If the cache goes down, the application falls back to the database. Latency increases but the system does not break. This is a critical advantage over patterns where the cache is in the write path.
When data is updated in the database, the cache still holds the old value. There are two strategies:
TTL-based expiration: Set a TTL (Time To Live) on every cached entry. After 5 minutes, the entry expires and the next read fetches from the database. Staleness window = TTL. Simple but imprecise.
Explicit invalidation: When updating data, also delete the cache entry. The next read will miss, fetch the new value, and repopulate. More consistent but requires discipline — every write path must remember to invalidate.
Watch reads flow through the cache. First access = miss (goes to DB). Subsequent accesses = hit (served from cache). Click "Update DB" to see invalidation.
Cache-aside puts the application in charge. But there are patterns where the cache itself coordinates with the database, simplifying application code at the cost of more complex infrastructure.
Every write goes to the cache first. The cache synchronously writes to the database before returning. The application writes to the cache, and the cache handles persistence. The cache always has the latest data — no staleness.
The cache accepts the write and returns immediately. The cache asynchronously flushes to the database later — either after a delay or in batches. This makes writes extremely fast (memory speed) but introduces a durability risk: if the cache crashes before flushing, those writes are lost.
Read-through is like cache-aside but the cache, not the application, fetches from the database on a miss. The application always reads from the cache. On a miss, the cache itself queries the DB, stores the result, and returns it. This simplifies application code — the application does not need miss-handling logic.
Watch how writes flow through each pattern. Notice the latency and durability differences.
| Pattern | Write latency | Read consistency | Durability | Complexity |
|---|---|---|---|---|
| Cache-Aside | DB speed (writes bypass cache) | Stale until TTL/invalidation | High (writes go to DB first) | In application |
| Write-Through | DB speed (cache writes through) | Always fresh | High | In cache layer |
| Write-Behind | Memory speed (async flush) | Always fresh in cache | Risk of loss | In cache layer |
| Read-Through | N/A (read pattern) | Fresh after first read | N/A | In cache layer |
Cache memory is finite. Your database might hold 1 TB of data, but your Redis instance has 64 GB of RAM. When the cache is full and a new entry needs to be stored, something must be removed. The eviction policy decides what gets kicked out.
Remove the entry that has not been accessed for the longest time. The intuition: if you have not read a key in a while, you probably will not read it soon. This is the most popular eviction policy because it works well for most access patterns.
Implementation: A doubly-linked list + hash map. On every access, move the entry to the head of the list. When evicting, remove from the tail. Both operations are O(1). Redis uses an approximated LRU that samples a few random keys and evicts the least recently used among the sample — this avoids the overhead of maintaining a true LRU list.
Remove the entry that has been accessed the fewest times. The intuition: entries that are rarely accessed are less valuable. LFU is better than LRU when there are "scan" workloads — a one-time bulk read of many keys would flush the entire LRU cache, but LFU keeps the truly popular keys because their frequency count is high.
Trade-off: LFU has a "cold start" problem. A new entry has a low frequency count and gets evicted quickly, even if it is about to become popular. Some implementations decay frequency counts over time to address this.
Entries expire after a fixed duration regardless of access pattern. This is not really an eviction policy — it is an expiration policy. But it serves the same purpose: keeping the cache fresh and bounded. TTL is typically used alongside LRU: entries can be evicted by LRU when the cache is full, or they can expire by TTL even if the cache is not full.
A cache with 5 slots. Watch entries arrive and get evicted. LRU evicts the oldest-accessed; LFU evicts the least-frequently-accessed.
When we say "cache," we usually mean an external cache like Redis. But there is an older, simpler kind: keeping cached data right in your application's memory. Each approach has sharp trade-offs.
Store cached data in a hash map inside your application process. Reads are a memory lookup — nanoseconds, no network hop. This is as fast as caching gets.
The problem: if you have 10 application instances (for horizontal scaling), each has its own cache. Write to one instance's database, and the other 9 still have stale data. There is no shared state. Cache invalidation becomes a distributed coordination problem.
Also, local caches consume application heap memory. A large cache means more garbage collection pressure, potentially causing latency spikes. And when the application restarts (deploy, crash), the cache is lost — every key starts cold.
A separate cache server (Redis, Memcached) shared by all application instances. When instance 1 caches a value, instance 2 can read it. When the value is invalidated, all instances see the invalidation immediately.
The trade-off: every cache access requires a network round trip (~1ms), compared to ~100ns for a local lookup. That is 10,000x slower. For extremely hot keys accessed millions of times per second, this overhead matters.
3 app instances share data. Watch how local caching leads to inconsistency while external caching stays synchronized.
In practice, many systems use both: a small local cache for the hottest keys (L1) backed by a shared Redis cache (L2), backed by the database (L3). This gives you nanosecond latency for the top 100 keys, millisecond latency for the top 10,000 keys, and database latency for everything else.
Time to see caching in action. Below is an interactive cache-aside simulation. You control the cache size, TTL, and access pattern. Watch the hit rate climb as the cache warms up, then inject writes and see how invalidation affects performance.
Requests arrive for keys 1-100. The cache holds a limited number of entries. Watch hits (fast) and misses (slow) in real time.
Experiment 1: Zipfian with 20 slots. Watch the hit rate climb above 80% quickly. A Zipfian distribution means a few keys are accessed much more often than others — these "hot" keys stay in cache.
Experiment 2: Switch to uniform access. Every key is equally likely. The hit rate drops dramatically because there are no "hot" keys to cache. This is the worst case for caching.
Experiment 3: Lower TTL to 2 seconds. Entries expire quickly, forcing more database reads. The hit rate drops even with Zipfian access. This is the cost of freshness — short TTLs mean more misses.
Experiment 4: Increase cache to 50 slots. With 50 of 100 possible keys cached, even uniform access gets a ~50% hit rate. More cache memory directly translates to fewer database reads.
Data storage and caching are foundational layers. Here is how they connect to the broader distributed systems landscape.
| Concept | Key takeaway |
|---|---|
| Replication | Single-leader for simplicity, multi-leader for geo-distribution, leaderless for availability |
| Partitioning | Range for scans, hash for even distribution, compound for both |
| NoSQL families | KV for speed, document for flexibility, wide-column for analytics, graph for relationships |
| Cache-aside | Application manages cache; only hot data cached; resilient to cache failure |
| Write patterns | Write-through for consistency, write-behind for speed, read-through for simplicity |
| Eviction | LRU for general use, LFU for scan-resistant workloads, TTL as safety net |
| Local vs. external | Local for nanosecond reads; external for consistency; two-tier for both |
| Topic | Connection |
|---|---|
| Load Balancing | Consistent hashing in load balancers enables stateful caching at the LB layer |
| Service Architecture | Service meshes add caching at the sidecar level; API gateways often include response caching |
| Messaging | Event-driven cache invalidation: database publishes changes to a message queue, cache subscribers invalidate entries |
| Consensus | Leader election in single-leader replication; distributed locks for cache stampede prevention |