Coordination Avoidance — From Absolute Zero to Mastery

Chapter 0: The Cost of Waiting

Three data centers: Virginia, Frankfurt, Tokyo. A user in Tokyo clicks "Add to Cart." The system needs to update the shopping cart, which is replicated across all three regions. The conventional approach: send the write to a leader in Virginia, wait for acknowledgment, then respond to the user.

Round-trip from Tokyo to Virginia: 180 milliseconds. The user stares at a spinner. They click again. Now you have a duplicate write problem on top of the latency problem. Meanwhile, a user in Frankfurt doing the exact same operation gets a response in 40 milliseconds because they're closer to the leader. Same product, wildly different experience.

The fundamental issue is coordination. Every time a node must ask another node "is this okay?" before proceeding, it pays a latency tax proportional to the distance between them. In a single data center, that tax is microseconds. Across continents, it's hundreds of milliseconds. Across a congested or partitioned network, it's seconds or infinity.

What if we could design data structures and algorithms that never need to coordinate? Every replica processes writes locally, immediately, with zero round-trips to any other node. Later, when replicas sync up, they automatically converge to the same state — regardless of the order messages arrived, regardless of duplicates, regardless of which writes each replica has seen so far.

This is coordination avoidance: the art of building distributed systems where replicas can operate independently and still agree in the end. The key tools are CRDTs (Conflict-free Replicated Data Types), causal broadcast protocols, and a theoretical result called the CALM theorem that tells us exactly when coordination is — and isn't — necessary.

Why not just use strong consistency? You can. But you pay for it: every write must contact a majority of replicas before completing. With 3 replicas across 3 continents, that's at least one cross-continental round-trip per write. For a shopping cart, that's absurd. For a bank transfer, it's essential. Coordination avoidance is about recognizing which operations don't need global agreement and designing them to work without it.

The Coordination Tax

Let's quantify the cost. Consider a system with three replicas and two approaches:

Approach	Write latency (Tokyo)	Availability during partition	Conflict handling
Coordinated (Raft/Paxos)	180-360ms (must contact majority)	Unavailable if majority unreachable	No conflicts (linearizable)
Uncoordinated (CRDTs)	1-5ms (local write)	Available (all replicas accept writes)	Auto-resolved by merge function

The difference is dramatic. Coordinated writes are 50-100x slower and become unavailable during network partitions. Uncoordinated writes are always fast, always available, but they require clever data structures that can merge concurrent updates without losing information.

Coordination Cost Visualizer

Watch writes propagate in a coordinated vs. uncoordinated system. Notice how the coordinated system must wait for acknowledgments, while the uncoordinated system responds immediately.

Click "Send Write" to see the coordination tax in action.

Quick check: A coordinated write system with replicas in Virginia, Frankfurt, and Tokyo requires a majority acknowledgment. A user in Tokyo sends a write. What is the minimum number of cross-region round-trips needed before the write is confirmed?

Zero — the local replica can confirm alone One — Tokyo writes locally and needs one ACK from the nearest other replica (Frankfurt, ~120ms) Two — must contact both Virginia and Frankfurt Three — all replicas must acknowledge

Chapter 1: Eventual Consistency

Before we build coordination-free data structures, we need to understand the consistency model they provide. When replicas operate independently, they will temporarily disagree about the current state. A user writing to Tokyo sees their update immediately, but a user reading from Frankfurt sees the old value until the update propagates. This model is called eventual consistency.

The formal definition: if no new updates are made to an object, all replicas will eventually return the last updated value. "Eventually" is the key weasel word — it could be 5 milliseconds or 5 minutes, and the definition says nothing about what you'll read in the meantime.

That sounds weak. But eventual consistency comes in different strengths, and understanding the hierarchy is crucial for designing real systems.

The Consistency Hierarchy

Level	Guarantee	Example
Strong eventual consistency (SEC)	Replicas that have received the same set of updates are in the same state — regardless of order	CRDTs
Causal consistency	If event A caused event B, everyone sees A before B	Vector clock ordering
Eventual consistency	Replicas converge "eventually" after updates stop	DNS propagation
Strong consistency (linearizability)	Every read returns the most recent write	Raft/Paxos

The critical distinction is between plain eventual consistency and strong eventual consistency (SEC). Plain eventual means "they'll converge somehow, eventually." SEC means "any two replicas that have processed the same set of updates are guaranteed to be in the same state, right now, with no additional communication needed." SEC is what CRDTs provide, and it's dramatically more useful.

Convergence is the hard part. Eventual consistency says replicas converge. But HOW? If two replicas both update the same key concurrently, which value wins? With plain eventual consistency, you need a conflict resolution strategy (last-writer-wins, application-level merge, manual resolution). With SEC, the data structure itself defines the merge, and the merge is guaranteed to be deterministic — the same inputs always produce the same output, regardless of order.

What Makes SEC Work: Mathematical Properties

For a merge function to guarantee convergence without coordination, it must satisfy three algebraic properties:

// A merge function m(a, b) must be:

Commutative: m(a, b) = m(b, a)
// Order doesn't matter — merging A then B gives the same result as B then A

Associative: m(m(a, b), c) = m(a, m(b, c))
// Grouping doesn't matter — you can merge pairwise in any order

Idempotent: m(a, a) = a
// Merging something with itself is a no-op — duplicates are harmless

These three properties form a join-semilattice. That's a fancy name for a structure where merging always moves "upward" toward a common state and never moves backward. Think of it like water flowing downhill — no matter what path it takes, it always ends up at the lowest point. A semilattice guarantees that no matter what order updates arrive, the final state is always the same.

Convergence Visualizer

Three replicas receive updates in different orders. Watch them converge to the same state. Toggle between a commutative merge (CRDT) and a non-commutative one (naive overwrite) to see the difference.

Click Step to deliver updates to replicas in shuffled order.

Check: A merge function m is commutative and associative but NOT idempotent. Replica A receives the same update twice due to a network retry. What happens?

Nothing — commutativity handles duplicates The duplicate is applied twice, causing the state to diverge from replicas that only received the update once The system detects the duplicate and discards it

Chapter 2: The G-Counter

Let's build the simplest possible CRDT: a counter that can only go up. This is the G-Counter (Grow-only Counter). Think of it as a distributed "like" button — every replica can increment it locally, and when replicas merge, the total is always correct.

The naive approach fails immediately. If three replicas each store a single number and increment it locally, merging by taking the maximum loses increments. Merging by adding counts doubles them. There's no single-number representation that works.

The insight: give each replica its own slot in a vector. Replica A only increments slot A. Replica B only increments slot B. The total count is the sum of all slots. When merging, take the component-wise maximum of each slot.

The Data Structure

// G-Counter: each replica has a vector of counts
// Replica IDs: A, B, C

State = { A: 0, B: 0, C: 0 }

// Increment on replica A:
increment(replicaId):
  state[replicaId] += 1

// Read the counter value:
value():
  return ∑(state[i]) for all i

// Merge two replicas:
merge(local, remote):
  for each key k:
    result[k] = max(local[k], remote[k])

Why Component-wise Max Works

Each slot is "owned" by exactly one replica. Only replica A ever increments slot A. So slot A is a monotonically increasing counter. Taking the maximum of two values for the same slot always yields the most recent value — because it can only go up.

Let's trace through a concrete example:

// Initial state at all replicas:
A: {A:0, B:0, C:0} total=0
B: {A:0, B:0, C:0} total=0
C: {A:0, B:0, C:0} total=0

// Replica A increments twice, Replica B increments once (concurrently):
A: {A:2, B:0, C:0} total=2
B: {A:0, B:1, C:0} total=1
C: {A:0, B:0, C:0} total=0

// A merges with B:
merge({A:2,B:0,C:0}, {A:0,B:1,C:0})
= {A:max(2,0), B:max(0,1), C:max(0,0)}
= {A:2, B:1, C:0} total=3 // Correct! 2+1=3

// Now B merges with A (same inputs, reversed):
merge({A:0,B:1,C:0}, {A:2,B:0,C:0})
= {A:2, B:1, C:0} total=3 // Same result! (commutative ✓)

// Merging again (idempotent):
merge({A:2,B:1,C:0}, {A:2,B:1,C:0})
= {A:2, B:1, C:0} total=3 // No change (idempotent ✓)

The semilattice property holds. Component-wise max is commutative (max(a,b) = max(b,a)), associative (max(max(a,b),c) = max(a,max(b,c))), and idempotent (max(a,a) = a). Therefore the G-Counter is a CRDT with strong eventual consistency guaranteed.

python
class GCounter:
    def __init__(self, replica_id, peers):
        self.id = replica_id
        self.counts = {p: 0 for p in peers}

    def increment(self):
        self.counts[self.id] += 1

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        for k in self.counts:
            self.counts[k] = max(self.counts[k], other.counts[k])

Interactive G-Counter

Three replicas with independent G-Counters. Increment each locally, then merge pairs to watch convergence. The vector is shown inside each node.

Increment replicas independently, then merge to see convergence.

Check: A G-Counter has 5 replicas. Each replica has incremented its own slot 10 times. No merges have happened yet. What does each replica report as the counter value?

10 — each replica only sees its own increments 50 — the sum of all increments 0 — nothing is confirmed without merging

Chapter 3: The PN-Counter

A G-Counter only goes up. But what about a shopping cart quantity, a vote tally, or a stock level? You need to decrement too. The PN-Counter (Positive-Negative Counter) solves this by maintaining two G-Counters: one for increments (P) and one for decrements (N). The value is P - N.

Think of it as double-entry bookkeeping. Instead of erasing a number and writing a smaller one (which is not monotonic and breaks CRDTs), you record every subtraction as an addition in a separate ledger. The balance is always computable from the two ledgers, and both ledgers only grow.

The Data Structure

// PN-Counter: two G-Counters
P = { A:0, B:0, C:0 } // positive increments
N = { A:0, B:0, C:0 } // negative increments (decrements)

increment(replicaId): P[replicaId] += 1
decrement(replicaId): N[replicaId] += 1

value(): ∑(P[i]) - ∑(N[i])

merge(local, remote):
P_merged[k] = max(local.P[k], remote.P[k]) for all k
N_merged[k] = max(local.N[k], remote.N[k]) for all k

Worked Example

// Replica A increments 3 times, Replica B decrements 1 time (concurrently):
A: P={A:3,B:0} N={A:0,B:0} value = 3-0 = 3
B: P={A:0,B:0} N={A:0,B:1} value = 0-1 = -1

// After merge:
P = {A:max(3,0), B:max(0,0)} = {A:3, B:0}
N = {A:max(0,0), B:max(0,1)} = {A:0, B:1}
value = (3+0) - (0+1) = 2 // Correct: 3 increments - 1 decrement = 2

Can the PN-Counter go negative? Yes. There's no built-in floor. If Replica B decrements 5 times without any increments, the value is -5. If you need a counter that can't go below zero (like inventory), you need a more sophisticated CRDT called a Bounded Counter, which pre-allocates decrement budgets to each replica. The trade-off: bounded counters require occasional coordination to redistribute budgets.

python
class PNCounter:
    def __init__(self, replica_id, peers):
        self.P = GCounter(replica_id, peers)
        self.N = GCounter(replica_id, peers)

    def increment(self):
        self.P.increment()

    def decrement(self):
        self.N.increment()  # Note: incrementing the NEGATIVE counter

    def value(self):
        return self.P.value() - self.N.value()

    def merge(self, other):
        self.P.merge(other.P)
        self.N.merge(other.N)

Interactive PN-Counter

Three replicas with independent PN-Counters. Increment or decrement each, then merge to see both the P and N vectors converge. The bar chart shows each replica's computed value.

Increment and decrement replicas, then merge to converge.

Check: Replica A increments 5 times. Replica B decrements 3 times. Replica C increments 2 times. After a full merge, what is the counter value?

4 (5 + 2 - 3 = 4) 7 (5 + 2) 0 (concurrent writes cancel out)

Chapter 4: LWW-Register & OR-Set

Counters are elegant, but most real data isn't a single number. What about a user's display name? An email address? A set of tags? We need CRDTs for arbitrary values and collections.

LWW-Register: Last Writer Wins

The LWW-Register (Last-Writer-Wins Register) is the simplest CRDT for single values. Each write carries a timestamp. When merging, the value with the higher timestamp wins. That's it.

// LWW-Register state:
{ value: "Alice", timestamp: 1042 }

// Write:
write(v): state = { value: v, timestamp: now() }

// Merge:
merge(local, remote):
  if remote.timestamp > local.timestamp:
    state = remote
  else:
    state = local // keep local (ties broken by replica ID)

LWW is simple but lossy. If two replicas write different values at nearly the same time, one value is silently discarded. User A sets their name to "Alice" at t=1000, User B sets it to "Bob" at t=1001. After merge: "Bob" wins, "Alice" is gone forever. There's no record that "Alice" was ever written. For many use cases this is fine (the user will just re-submit). For others (collaborative editing, shopping carts), it's unacceptable.

OR-Set: Observed-Remove Set

The OR-Set (Observed-Remove Set) is a CRDT for sets that handles concurrent add/remove correctly. The key insight: every add operation generates a unique tag. A remove operation only removes the tags it has observed. If a concurrent add creates a new tag that the remover hasn't seen, that tag survives.

// OR-Set: each element has a set of unique tags
State = { "milk": {tag1, tag2}, "eggs": {tag3} }

// Add: generate a unique tag for this addition
add("milk"): state["milk"].add(unique_tag())

// Remove: delete ALL currently visible tags for this element
remove("milk"): state["milk"] = ∅

// Merge: union of all tags per element
merge(local, remote):
  for each element e:
    result[e] = local[e] ∪ remote[e]
  remove elements with empty tag sets

// An element is "in the set" if it has at least one tag

The Add-Remove Paradox

Why is a set CRDT hard? Consider a plain set with concurrent operations:

// Replica A: add("milk") at t=1
// Replica B: remove("milk") at t=1 (concurrent!)

// After merge: is "milk" in the set or not?
// - If add wins: remove has no effect (surprising)
// - If remove wins: add has no effect (data loss)
// - Neither answer is "obviously correct"

// OR-Set solution: add-wins semantics
// Replica A's add creates tag7 for "milk"
// Replica B's remove deletes all tags it HAS SEEN (none, since tag7 is new)
// After merge: tag7 still exists → "milk" is in the set
// This is "add wins" — concurrent adds survive concurrent removes

Why "add wins" makes sense for shopping carts. If you add milk to your cart from your phone and your partner removes it from the laptop at the same time, the add wins — milk stays in the cart. It's better to have an extra item you can remove later than to silently lose an addition. Amazon's Dynamo shopping cart famously used this approach: deleted items could "resurrect" after merges, but users preferred that to mysteriously disappearing items.

OR-Set Visualizer

Two replicas with an OR-Set. Add and remove items concurrently, then merge to see how tags determine the final set contents.

Add and remove items on each replica, then merge to see OR-Set semantics.

Check: In an OR-Set, Replica A adds "X" (creating tag1), then syncs with Replica B. Replica B now also has "X" with tag1. Now concurrently: A removes "X" (deletes tag1), B adds "X" again (creating tag2). After merge, is "X" in the set?

Yes — B's add created tag2, which A's remove didn't observe, so tag2 survives the merge No — A removed "X" and that takes precedence It depends on which message arrives first

Chapter 5: Broadcast Protocols

CRDTs define what data structures look like and how they merge. But they say nothing about how updates actually get from one replica to another. That's the job of broadcast protocols — the plumbing that delivers updates across the network.

There are three levels of delivery guarantees, each progressively stronger:

1. Best-Effort Broadcast

Send the message to every replica. If you crash, some replicas might not get it. No retries, no guarantees. This is basically UDP multicast. Useless for anything that matters.

2. Reliable Broadcast

If a non-faulty replica receives a message, then every non-faulty replica eventually receives it. Even if the original sender crashes midway through broadcasting, the other replicas re-broadcast the message to ensure everyone gets it.

Implementation: eager reliable broadcast. When a replica receives a message it hasn't seen before, it re-broadcasts it to all other replicas. This guarantees delivery as long as the network is eventually connected, but it's expensive: O(n²) messages for n replicas. A more efficient approach is gossip: each replica periodically picks a random peer and exchanges updates. Gossip achieves reliable delivery with O(n log n) messages — at the cost of higher latency.

3. Causal Broadcast

Reliable broadcast plus ordering: if message A causally precedes message B (meaning B was sent after seeing A), then every replica delivers A before B. Messages that are concurrent (neither caused the other) can be delivered in any order.

Why does order matter? Consider a chat application:

// Alice: "Does anyone want pizza?" (message M1)
// Bob (after seeing M1): "Yes please!" (message M2, caused by M1)

// Without causal ordering, Carol might see:
// Bob: "Yes please!"
// Alice: "Does anyone want pizza?"
// This is confusing — Bob's reply appears before Alice's question.

// With causal broadcast, Carol always sees M1 before M2
// because M2 was CAUSED by M1 (Bob saw M1 before sending M2).

Gossip Protocol: The Epidemic Approach

Gossip protocols (also called epidemic protocols) spread updates the way rumors spread in a crowd. Each node periodically picks a random neighbor and shares its latest updates. This is remarkably effective:

Property	Value
Messages per update	O(n log n) — each node contacts ~log(n) peers
Convergence time	O(log n) rounds to reach all nodes
Fault tolerance	Works even with high packet loss; redundant paths
Drawback	Non-deterministic latency; no ordering guarantee

Gossip Protocol Simulator

Watch an update spread through a network of 12 nodes via gossip. Each round, every informed node contacts one random neighbor. Green = has the update, gray = hasn't received it yet.

Node 0 has a new update. Click "Gossip Round" to spread it.

Check: A gossip protocol runs on a cluster of 1000 nodes. Approximately how many rounds of gossip are needed for an update to reach all nodes (assuming each informed node contacts one random uninformed node per round)?

1000 rounds ~10 rounds (log₂(1000) ≈ 10) ~30 rounds (3 × log₂(1000)) 500 rounds (half the cluster)

Chapter 6: Causal Consistency

CRDTs guarantee convergence. Gossip guarantees delivery. But neither guarantees that updates arrive in an order that makes sense. Causal consistency fills this gap: it ensures that if one event influenced another, every replica sees them in that order.

The tool for tracking causality is the vector clock. Each replica maintains a vector of logical timestamps — one slot per replica, just like the G-Counter. When a replica performs an operation, it increments its own slot. When it receives a message, it merges the incoming vector with its own (component-wise max) and then increments its own slot.

Vector Clock Mechanics

// Three replicas: A, B, C. Vector clock = [A_count, B_count, C_count]

// A sends message m1:
A increments: VC_A = [1, 0, 0]
m1 carries VC = [1, 0, 0]

// B receives m1, then sends m2:
B merges: VC_B = max([0,0,0], [1,0,0]) = [1,0,0]
B increments: VC_B = [1, 1, 0]
m2 carries VC = [1, 1, 0]

// C sends m3 (hasn't seen m1 or m2):
C increments: VC_C = [0, 0, 1]
m3 carries VC = [0, 0, 1]

// Causality detection:
m1 [1,0,0] < m2 [1,1,0] → m1 happened before m2 (m1 CAUSED m2)
m1 [1,0,0] || m3 [0,0,1] → concurrent (neither caused the other)
m2 [1,1,0] || m3 [0,0,1] → concurrent

The Ordering Rules

Given two vector clocks V1 and V2:

V1 < V2 (V1 happened before V2):
  V1[i] ≤ V2[i] for ALL i, AND V1[j] < V2[j] for at LEAST one j

V1 || V2 (concurrent):
  V1[i] > V2[i] for some i, AND V1[j] < V2[j] for some j
  // Neither vector dominates the other

Vector clocks give you exactly what you need. They capture the "happens-before" relationship: if A's operation influenced B's operation (because B saw A's state before acting), the vector clock records this. If two operations happened independently, the vector clocks are incomparable — and you know they're concurrent. This is precisely the information needed to decide when to hold a message (wait for a causal dependency) and when to deliver it immediately.

Causal Delivery Algorithm

A replica holds an incoming message in a buffer until all causally preceding messages have been delivered. The check is simple:

1. Message Arrives

Message m from replica j carries vector clock VC_m

↓

2. Check Dependencies

For all i ≠ j: VC_m[i] ≤ VC_local[i]? (Have we seen everything m depends on?) AND VC_m[j] = VC_local[j] + 1? (Is this the next message from j?)

↓

3. Deliver or Buffer

If yes: deliver m, update VC_local, check buffer for newly deliverable messages. If no: buffer m and wait.

Vector Clock & Causal Delivery

Three replicas send messages. The vector clock is shown at each node. Messages are held in a buffer (yellow) until causal dependencies are met, then delivered (green).

Send messages between replicas and watch causal delivery in action.

Check: Replica B receives message m2 with vector clock [2, 0, 1]. B's current vector clock is [1, 3, 1]. Can B deliver m2 immediately?

Yes — B has seen everything m2 depends on No — m2 depends on [2,_,_] but B has only seen [1,_,_] from A, meaning B is missing a message from A It depends on which replica sent m2

Chapter 7: The CALM Theorem

We've built coordination-free counters, registers, and sets. But how do we know in general which programs can run without coordination? Is there a test? The CALM theorem (Consistency As Logical Monotonicity) provides exactly this answer.

The theorem, proven by Hellerstein and Alvaro in 2010, states:

CALM Theorem: A distributed program has a consistent, coordination-free implementation if and only if it is monotonic. A program is monotonic if adding more input never causes it to retract a previous output.

Let's unpack "monotonic." A function is monotonic if learning more information only adds to its output — it never takes away. Think of it as a one-way ratchet: once you've concluded something, new data can only add new conclusions, never invalidate old ones.

Monotonic vs. Non-Monotonic

Program	Monotonic?	Why
UNION of two tables	Yes	Adding rows to either input only adds rows to the output
SELECT WHERE x > 5	Yes	Adding new rows can only add new matches, never remove existing ones
JOIN of two tables	Yes	New rows in one table can only create new matches
*COUNT()**	No!	The count can change with retractions or corrections
NOT EXISTS	No!	A row that "doesn't exist" could appear later, invalidating the conclusion
MIN/MAX	No!	A new value could become the new min/max
Set difference (A - B)	No!	Adding to B can remove items from the result

Why This Matters for System Design

CALM gives you a compiler-like test for coordination requirements. When designing a distributed system:

1. Write the Logic

Express your distributed computation as a set of rules over data

↓

2. Check Monotonicity

Is every rule monotonic? (Union, join, filter, projection are. Negation, aggregation, threshold are not.)

↓

3. Decide

Monotonic parts run coordination-free. Non-monotonic parts need coordination (consensus, barriers, etc.).

Dynamo-Style Quorum: Coordination You Can Tune

Sometimes you need some coordination but want to minimize it. Quorum systems let you tune the trade-off. In a system with N replicas:

// Quorum parameters:
N = total replicas
W = write quorum (how many replicas must acknowledge a write)
R = read quorum (how many replicas must respond to a read)

// Strong consistency requires:
W + R > N
// This guarantees read and write quorums overlap by at least one replica,
// so every read sees at least one copy of the latest write.

// Examples with N=3:
W=2, R=2: Standard quorum. Overlap=1. Tolerates 1 failure for reads AND writes.
W=3, R=1: Fast reads, slow writes. Every replica has latest data.
W=1, R=3: Fast writes, slow reads. Must read all to find latest.
W=1, R=1: No coordination! But no consistency guarantee (W+R=2 ≤ N=3).

Dynamo's sloppy quorums. Amazon's Dynamo (used for shopping carts) introduced "sloppy quorums": during a network partition, writes can go to ANY W available nodes, even if they're not the designated replicas. This increases availability at the cost of temporary inconsistency. When the partition heals, data is transferred to the correct nodes via "hinted handoff." Sloppy quorums prioritize availability over consistency — the W+R>N guarantee doesn't hold during partitions.

Quorum Math Visualizer

Adjust W and R to see how quorums overlap (or don't). Green = overlap (consistency guaranteed). Red = no overlap (stale reads possible).

W (write quorum) 2

R (read quorum) 2

N (total replicas) 3

W + R > N ensures consistency. Adjust sliders to explore.

Check: You have a system with N=5 replicas. You want fast reads (R=1). What is the minimum W needed to guarantee strong consistency?

W = 5 (must write to ALL replicas so any single read finds the latest value) W = 3 (majority is always enough) W = 4

Chapter 8: CRDT Playground

Time to put everything together. This interactive playground simulates a cluster of three replicas, each running a full set of CRDTs (G-Counter, PN-Counter, LWW-Register, OR-Set). You can perform operations on any replica, create network partitions, and watch how the CRDTs converge when replicas reconnect.

The goal: break the system. Try creating partitions, performing concurrent conflicting operations, and then healing the partition. Watch the CRDTs automatically converge to a consistent state — no coordination needed. Notice how LWW-Register loses data on conflicts while OR-Set preserves both adds.

Full CRDT Playground

Three replicas running CRDTs. Perform operations, create partitions, merge, and watch convergence. The convergence status shows whether all replicas agree.

Perform operations on replicas, create partitions, then merge to see convergence.

Notice what happens when you partition and do conflicting operations:

Scenario	Counter behavior	Set behavior
A increments during partition, B decrements	Both are preserved: net = inc - dec	N/A
A adds "x", B adds "y" during partition	N/A	Both survive after merge: {x, y}
A adds "x", B removes "x" during partition	N/A	"x" survives (add-wins in OR-Set)

Chapter 9: Connections

Coordination avoidance is not a universal solution. It's a powerful tool for specific situations — and understanding when to use it (and when not to) is what separates a good distributed systems engineer from a great one.

When to Use CRDTs

Use Case	CRDT Type	Why It Works
Like/upvote counters	G-Counter	Only increments, high availability needed
Shopping cart quantities	PN-Counter	Add/remove items, availability over accuracy
Collaborative text editing	Sequence CRDT (RGA, LSEQ)	Concurrent inserts/deletes merge automatically
User presence/status	LWW-Register	Only latest status matters
Tag sets, feature flags	OR-Set	Concurrent add/remove with clear semantics

When NOT to Use CRDTs

Use Case	Why CRDTs Fail	Use Instead
Bank transfers	Balance can't go negative — needs coordination	Serializable transactions
Unique username registration	Uniqueness is inherently non-monotonic	Consensus (Raft/Paxos)
Inventory with stock limits	Bounded counters need periodic coordination	Escrow/reservation pattern
Ordered event logs	Total order requires coordination	Replicated log (Kafka, Raft)

The Landscape

Coordination Spectrum

Where different approaches sit on the coordination-consistency spectrum.

The real skill is decomposing a system into parts that need coordination (account balances, unique constraints) and parts that don't (view counters, shopping carts, activity feeds). Minimize the coordinated surface area, and your system stays fast and available even during partitions.

Related lessons:

Distributed Transactions — when you DO need coordination: 2PC, sagas, isolation levels
Database Replication — the replication fundamentals that CRDTs build on
Partitioning & Storage — splitting data across nodes, complementary to replication

"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." — Leslie Lamport