Disjoint Sets (CLRS Ch 21) — Engineermaxxing

Chapter 0: The Problem

You are building a social network. Users join one at a time, and friendships form between pairs of users. At any moment, someone asks: "Are Alice and Bob in the same friend group?" meaning — is there a chain of friendships connecting them, however indirect?

The naive approach: every time someone asks, run a BFS or DFS from Alice. If you reach Bob, they are connected. Each query takes O(V + E) time — scanning through every user and every friendship. When your network has millions of users and billions of friendship edges, this is catastrophic. A single "are they connected?" query could take seconds.

But notice something: we do not need shortest paths. We do not need the actual chain of friends. We just need a yes/no answer: same group or different group? This is a much simpler question. Can we exploit that simplicity?

Yes. The Union-Find data structure (also called a disjoint-set or disjoint-set union, DSU) answers connectivity queries in nearly O(1) time — effectively constant for any practical input size. It supports two operations: Union (merge two groups when a friendship forms) and Find (which group does this person belong to?). Checking if Alice and Bob are in the same group is just comparing Find(Alice) == Find(Bob).

With two clever tricks — union by rank and path compression — each operation takes amortized O(α(n)) time, where α is the inverse Ackermann function. We will devote an entire chapter (Chapter 6) to understanding this function, but the punchline is: for all practical values of n (even n = 10⁸⁰, the number of atoms in the observable universe), α(n) ≤ 4. Four pointer hops. That is effectively free.

The genius of Union-Find is that it achieves this bound without any complex machinery. No balanced trees, no hash tables, no heap operations. Just an array of parent pointers and two simple rules. The entire implementation is about 20 lines of code in any language. It is one of the rare algorithms where simplicity and optimality coincide perfectly.

The core idea. Instead of searching the graph from scratch each time, we maintain a data structure that tracks which connected component each node belongs to. When an edge arrives, we merge the two components (Union). When a query arrives, we check if two nodes have the same representative (Find). The entire chapter is about making these two operations as fast as possible.

The simulation below shows the difference. On the left, edges arrive one at a time, forming connected components. On the right, connectivity queries are answered. Watch how naive BFS/DFS restarts from scratch each query, while Union-Find answers instantly.

Connected Components: Naive vs Union-Find

Click "Add Edge" to connect random nodes. Click "Query" to test if two random nodes are connected. Watch the step counts.

Notice the pattern: as the graph grows, BFS/DFS queries get slower (more edges to traverse). Union-Find stays flat — just a few pointer hops, regardless of graph size. For dynamic connectivity problems where edges arrive over time, Union-Find is the right tool.

Operation	BFS/DFS	Union-Find (optimized)
Add edge	O(1) (just append)	O(α(n)) ≈ O(1)
Query: same component?	O(V + E)	O(α(n)) ≈ O(1)
Total for m operations	O(m · (V+E))	O(m · α(n))

For n = 1,000,000 nodes and m = 10,000,000 operations, BFS/DFS might take 10¹³ steps. Union-Find takes about 4 × 10⁷. That is a six-order-of-magnitude speedup. This is why Union-Find appears in Kruskal's MST algorithm, network connectivity, image segmentation, and dozens of interview problems.

Where Union-Find shines vs. BFS/DFS. BFS/DFS computes connected components from scratch each time — it does not "remember" previous computations. Union-Find is incremental: it updates its internal state as edges arrive, so each query leverages all previous work. Think of BFS/DFS as a tourist who retraces the entire city map for each question, while Union-Find is a local who continuously updates their mental model of the neighborhood.

Here is the roadmap: we will build disjoint sets from scratch. First the abstract interface (Chapter 1), then a linked-list implementation with its painful O(n) unions (Chapter 1-2), then the forest representation that changes everything (Chapter 3), union by rank (Chapter 4), path compression (Chapter 5), the mind-bending inverse Ackermann analysis (Chapter 6), real-world applications including a percolation simulator (Chapter 7), and interview patterns (Chapter 8-9).

The Data Flow

Let us be concrete about what flows through Union-Find. The input is a stream of operations, each one of three types:

MakeSet(x) → creates element x in its own set → returns nothing
Find(x) → input: element x → output: representative element (an integer)
Union(x, y) → input: two elements → output: nothing (side effect: merge sets)
// Storage: parent[] array of n integers + rank[] array of n integers = O(n) space

The key invariant: at any point, two elements x and y are in the same set if and only if Find(x) == Find(y). This is the only thing Union-Find guarantees. It does not tell you the path between x and y, or the size of the component (without augmentation), or which elements are in a given component. It answers one question: same or different?

This simplicity is its power. By narrowing the interface to a single yes/no question, we can answer it in nearly constant time. A data structure that tried to do more (maintain actual component lists, support deletion, answer distance queries) would necessarily be slower.

Augmenting Union-Find

While the basic interface is minimal, you can augment Union-Find to track additional information. Common augmentations:

Augmentation	How	Use case
Component size	Maintain size[] array, update in Union	"How big is this group?"
Component count	Maintain count variable, decrement in Union	"How many groups?"
Min/Max element	Track per-component min/max at root	"Smallest ID in this group?"
Component sum	Sum values at root, update in Union	"Total weight of group?"

All augmentations maintain O(α(n)) per operation because they only add O(1) bookkeeping to Union. The key constraint: the augmented information must be mergeable in O(1). Size: sum two sizes. Count: decrement by 1. Min: take the smaller. These are all O(1) merge operations. If merging the auxiliary data takes O(k), then Union becomes O(k + α(n)).

Concept check: You have a graph where edges arrive one at a time, and between edge arrivals you must answer "are nodes u and v connected?" Which approach is best?

Run BFS from u each time — it always gives the correct answer Precompute all connected components once with DFS, then answer queries in O(1) Use Union-Find: Union when an edge arrives, Find to answer queries — nearly O(1) per operation even as edges keep arriving Use an adjacency matrix and check if matrix[u][v] == 1

Chapter 1: Disjoint-Set Operations

Before we build anything, let us define exactly what we need. A disjoint-set data structure manages a collection S = {S₁, S₂, ..., S_k} of disjoint sets. "Disjoint" means no element belongs to more than one set: S_i ∩ S_j = ∅ for i ≠ j. Each set has a representative — a distinguished member that identifies the set. Which member is the representative does not matter, as long as asking twice for the same (unmodified) set gives the same answer.

Three operations define the interface:

MakeSet(x)

Create a new set containing only x. The representative is x itself. Requires: x is not already in any set.

↓

Find(x)

Return the representative of the set containing x. Two elements are in the same set if and only if Find(x) == Find(y).

↓

Union(x, y)

Merge the two sets containing x and y into one. Destroys the old sets, creates one combined set with a single representative.

That is the entire interface. Three operations. The power comes from implementing them efficiently.

Before we build anything, let us trace through a concrete example to make the interface tangible. Suppose we are tracking friendships among 6 people (0 through 5):

MakeSet(0), MakeSet(1), ..., MakeSet(5)
// State: {0} {1} {2} {3} {4} {5} — six singleton sets

Union(0, 1) // State: {0,1} {2} {3} {4} {5}
Union(2, 3) // State: {0,1} {2,3} {4} {5}
Find(0) == Find(1)? // YES — both in set {0,1}
Find(0) == Find(2)? // NO — different sets
Union(0, 3) // State: {0,1,2,3} {4} {5}
Find(0) == Find(2)? // YES — now in same set (transitively via 0-1 and 2-3 and 0-3)
Find(4) == Find(5)? // NO — 4 and 5 still in separate singleton sets

Notice the transitive property: we never explicitly connected 0 and 2, but after Union(0,1), Union(2,3), and Union(0,3), they end up in the same set. Union-Find handles transitivity automatically — that is the whole point.

Now let us build it. We start with the simplest possible implementation: linked lists.

Linked-List Representation

Each set is a linked list. Every node stores: (1) the element value, (2) a pointer to the next node, and (3) a pointer back to the head of the list (the representative). The list also has a tail pointer for appending.

MakeSet(x): create a single-node list. x points to itself as the head. O(1).

Find(x): follow x's head pointer. Return the head. O(1).

Union(x, y): append one list to the other. Say we append y's list to x's list. We set x's tail to point to y's head, then update every node in y's list to point to x's head as the new representative. This is the problem: updating all those head pointers takes O(|y's list|) time.

python
class LinkedListNode:
    def __init__(self, val):
        self.val = val
        self.next = None
        self.head = self       # representative pointer

class LinkedListDisjointSet:
    def __init__(self):
        self.nodes = {}            # val -> node
        self.tails = {}            # head_val -> tail node
        self.sizes = {}            # head_val -> set size

    def make_set(self, x):
        node = LinkedListNode(x)
        self.nodes[x] = node
        self.tails[x] = node       # tail is itself
        self.sizes[x] = 1

    def find(self, x):
        return self.nodes[x].head.val  # O(1) — follow head pointer

    def union(self, x, y):
        hx, hy = self.find(x), self.find(y)
        if hx == hy: return       # already same set
        # Append y's list to x's list
        self.tails[hx].next = self.nodes[hy]
        # Update ALL head pointers in y's list — THIS IS SLOW
        cur = self.nodes[hy]
        updates = 0
        while cur:
            cur.head = self.nodes[hx]
            cur = cur.next
            updates += 1          # O(|y's list|) pointer updates
        self.tails[hx] = self.tails[hy]
        self.sizes[hx] += self.sizes[hy]

The Worst Case

The worst case is devastating. Suppose we create n singleton sets and then union them sequentially: Union(1,2), Union(2,3), Union(3,4), ..., Union(n-1,n). If we always append the longer list to the shorter one (always updating the single-element set's pointer), each union is O(1). But that is the best case! Consider instead: Union(1,2), Union(1,3), Union(1,4), ..., Union(1,n) where we always append x's list (the big one) to y's list (the small one).

In this adversarial sequence, after Union(1,2) we have list [2,1] (updated 1 pointer). After Union(1,3) we append [2,1] to [3], updating 2 pointers. After Union(1,4) we append [3,2,1] to [4], updating 3 pointers. The i-th union updates i pointers. The total cost is 1 + 2 + 3 + ... + (n-1) = Θ(n²).

Let us compute this concretely. For n = 1000: the naive approach might perform up to 1 + 2 + ... + 999 = 499,500 pointer updates. For n = 1,000,000: up to ~5 × 10¹¹ pointer updates. At 1 nanosecond per update, that is 500 seconds — almost 10 minutes for a million elements. Unacceptable.

The costly union. The linked-list representation gives O(1) Find, but Union can be O(n) because we must walk the entire appended list to update head pointers. For m operations on n elements, the worst case is Θ(n²). Can we do better? The answer is in Chapter 2: always append the shorter list to the longer one.

Counting the Cost

Let us count precisely for the adversarial sequence with n = 8. Create sets {0}..{7}, then union in the worst order:

Union(0,1): append {0} to {1}, update 1 pointer. List: [1,0]
Union(0,2): Find(0)→rep is 1. append {1,0} to {2}, update 2 pointers. List: [2,1,0]
Union(0,3): Find(0)→rep is 2. append {2,1,0} to {3}, update 3 pointers. List: [3,2,1,0]
Union(0,4): 4 pointer updates. List: [4,3,2,1,0]
Union(0,5): 5 pointer updates. List: [5,4,3,2,1,0]
Union(0,6): 6 pointer updates. List: [6,5,4,3,2,1,0]
Union(0,7): 7 pointer updates. List: [7,6,5,4,3,2,1,0]
// Total: 1+2+3+4+5+6+7 = 28 pointer updates for just 8 elements!

For n elements, the worst case total is n(n-1)/2 = Θ(n²). The average cost per union is Θ(n). This is why we need the weighted-union heuristic (Chapter 2) — it ensures we always pay the smaller cost by appending the shorter list to the longer one.

The simulation below shows the linked-list union in action. Watch how appending a long list forces a walk through every node to update representatives.

Linked-List Union Animation

Click "Union" to merge two sets. Watch the head-pointer updates cascade through the appended list.

Click Union to merge sets one by one. Total pointer updates: 0

Concept check: In the linked-list representation, Find(x) is O(1) because every node stores a pointer to the list head. Why is Union(x,y) potentially O(n)?

Because finding the tail of the list takes O(n) traversal Because every node in the appended list must have its head pointer updated to the new representative Because we must sort the list after appending

Chapter 2: Weighted Union (Linked List)

The naive linked-list union is O(n) per operation because we might append a long list to a short one, updating many pointers for no good reason. The fix is simple: always append the shorter list to the longer one. This is called the weighted-union heuristic.

Why does this help? Think about it from a single element's perspective. Each time your head pointer gets updated, it means your list was the shorter one and got appended to a longer list. After the merge, your new list is at least twice as long as your old list (because the other list was at least as long). So your list length at least doubles with every pointer update.

Starting from a list of length 1, how many times can it double before reaching length n? At most log₂ n times. So each element's head pointer is updated at most log₂ n times across all operations. For n elements, the total number of pointer updates across all unions is at most n · log₂ n.

The amortized bound. With the weighted-union heuristic, a sequence of m MakeSet, Union, and Find operations on n elements takes O(m + n log n) total time. Each Find is still O(1). The amortized cost per Union drops from O(n) to O(log n). The proof is elegant: each element's pointer changes at most log n times because each change at least doubles the element's set size, and a set can double at most log n times before hitting size n.

// Proof sketch: pointer update count for element x
After update #1: x's set size ≥ 2¹ = 2
After update #2: x's set size ≥ 2² = 4
After update #k: x's set size ≥ 2^k
But set size ≤ n, so 2^k ≤ n, thus k ≤ log₂ n
// Total updates across all n elements: n · log n

Let us trace through a concrete example. Start with 8 singleton sets {1}, {2}, ..., {8}. Perform these unions:

Union(1,2): append {2} to {1} → {1,2}, 1 update
Union(3,4): append {4} to {3} → {3,4}, 1 update
Union(1,3): sizes equal (2=2), append {3,4} to {1,2} → {1,2,3,4}, 2 updates
Union(5,6): append {6} to {5} → {5,6}, 1 update
Union(7,8): append {8} to {7} → {7,8}, 1 update
Union(5,7): sizes equal (2=2), append {7,8} to {5,6} → {5,6,7,8}, 2 updates
Union(1,5): sizes equal (4=4), append {5,6,7,8} to {1,2,3,4} → all 8, 4 updates
// Total: 1+1+2+1+1+2+4 = 12 updates. Naive worst case would be 1+2+3+4+5+6+7 = 28.

With weighted union: 12 updates. Naive sequential: up to 28. The savings grow dramatically with n. For n = 1,000,000: naive worst case is ~5 × 10¹¹, weighted union is ~2 × 10⁷. That is a 25,000x improvement.

Theorem (CLRS 21.1). Using the weighted-union heuristic, a sequence of m MakeSet, Union, and Find operations, n of which are MakeSet operations, takes O(m + n log n) time.

The proof is a beautiful counting argument. Instead of analyzing individual operations, we ask: how many times total can head pointers be updated? Each time element x's pointer is updated, x's set at least doubles in size (because it was the shorter set being appended to a longer one). Starting from size 1, a set can double at most log₂ n times before reaching size n. So each of the n elements has its pointer updated at most log₂ n times. Total pointer updates ≤ n log₂ n. Adding m constant-time Find operations gives O(m + n log n).

python
def union_weighted(self, x, y):
    hx, hy = self.find(x), self.find(y)
    if hx == hy: return
    # Always append SHORTER to LONGER
    if self.sizes[hx] < self.sizes[hy]:
        hx, hy = hy, hx           # swap so hx is the larger set
    # Append hy's list to hx's list
    self.tails[hx].next = self.nodes[hy]
    cur = self.nodes[hy]
    while cur:
        cur.head = self.nodes[hx]  # update head pointer
        cur = cur.next
    self.tails[hx] = self.tails[hy]
    self.sizes[hx] += self.sizes[hy]

O(m + n log n) is good — dramatically better than the naive Θ(n²). But we can do much better still. The linked-list representation is fundamentally limited by a structural constraint: every node must store a direct pointer to the list head. This makes Find O(1) but forces Union to walk the entire appended list to update those pointers. There is no way around it with this representation.

What if we flipped the trade-off entirely? Instead of every node pointing directly to the representative (making Find trivial but Union expensive), what if nodes pointed to their parent in a tree? Union would just change one parent pointer. Find would walk up the tree to the root. The root is the representative.

This reversal of the cost structure is the key insight. It sounds like we are trading one problem for another — and initially, we are. But the tree representation is amenable to optimizations (union by rank and path compression) that the linked-list representation simply cannot support. These optimizations bring the amortized cost down from O(log n) to O(α(n)). That is the forest representation, coming in Chapter 3.

Weighted vs Naive Union: Pointer Update Count

Watch the cumulative pointer updates as 16 elements are merged. Weighted union (teal) vs naive (orange).

Click to compare pointer updates: naive vs weighted union.

Concept check: With the weighted-union heuristic, what is the maximum number of times a single element's head pointer can be updated across ALL union operations?

n times — once per union log₂ n times — because each update at least doubles the element's set size n/2 times — each element is in the shorter half 1 time — the pointer only changes once

Chapter 3: Forest Representation

Linked lists have an annoying asymmetry: Find is O(1) because every node stores a direct pointer to the head, but Union is expensive because we must update all those pointers. What if we used a different representation that made Union trivial?

Here is the idea: represent each set as a rooted tree. Each node stores a parent pointer (not a head pointer). The root of the tree is the representative — its parent pointer points to itself. A collection of such trees is called a disjoint-set forest.

MakeSet(x): create a tree with a single node x. x is its own parent. O(1).

Find(x): follow parent pointers from x up to the root. Return the root. O(depth of x).

Union(x, y): find the roots of x and y. Make one root the child of the other. O(1) after the two Finds.

python
class NaiveForest:
    def __init__(self, n):
        self.parent = list(range(n))  # parent[i] = i means i is a root

    def make_set(self, x):
        self.parent[x] = x            # x is its own parent (root)

    def find(self, x):
        while self.parent[x] != x:   # walk up to root
            x = self.parent[x]
        return x                      # O(depth)

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[ry] = rx      # make ry child of rx — O(1)!

Union is now O(1) (just change one pointer) — a dramatic improvement over linked lists. But we have shifted the cost: Find is now O(depth) instead of O(1). In the worst case, trees can degenerate into long chains. If we union 1→2, then 2→3, then 3→4, ..., we get a chain of depth n-1. Find on the bottom element walks the entire chain: O(n).

The trade-off shift. Linked lists: O(1) Find, O(n) Union. Naive forest: O(1) Union, O(n) Find. Neither is satisfactory alone. But the forest representation has a crucial advantage: we can add two heuristics — union by rank (Chapter 4) and path compression (Chapter 5) — that bring Find down to effectively O(1) amortized while keeping Union at O(1). This is impossible with linked lists.

Implementation: It Is Just an Array

The most striking thing about the forest representation is how little memory it uses. The entire data structure is a single array:

python
parent = list(range(n))  # parent[i] = i means i is a root
# That's the entire data structure. One array. n integers.
# Find: chase parent pointers. Union: set one root's parent.

No pointers, no nodes, no dynamic memory allocation, no linked-list traversal, no garbage collection. Just an array of integers that fits in L1 cache for reasonable n. This is why Union-Find is so fast in practice — the memory access pattern is cache-friendly. Each Find chases parent pointers, which are all in the same contiguous array.

Compare this to the linked-list implementation: each node is a separate object on the heap, connected by pointers that can scatter across memory. Cache misses dominate the runtime. The forest representation, stored in a flat array, avoids this entirely.

Memory Layout Comparison

// Linked list: scattered heap objects
Node@0x1A00 → {val:0, next:0x7F20, head:0x1A00}
Node@0x7F20 → {val:1, next:0x3B40, head:0x1A00}
Node@0x3B40 → {val:2, next:null, head:0x1A00}
// Three objects at random heap locations. Each pointer chase = potential L1 cache miss (~100 cycles)

// Forest: contiguous array
parent = [0, 0, 1] // 3 integers, 12 bytes, fits in one cache line
// Find(2): parent[2]=1, parent[1]=0, parent[0]=0 → root=0. Two array lookups, same cache line.

On modern CPUs, an L1 cache miss costs ~4 ns while a cache hit costs ~1 ns. For a tree of depth 20, the linked-list version might cause 20 cache misses (80 ns), while the array version likely hits in L1 for the entire traversal (20 ns). This 4x constant-factor improvement compounds over millions of operations.

This is also why Union-Find is a favorite in competitive programming: the entire implementation is 15-20 lines of code, uses two arrays, and beats any alternative for dynamic connectivity problems.

Trace: Forest Operations Step by Step

Let us trace all three operations on a concrete example with n = 6 elements, showing exactly what happens to the parent array:

// Initial: MakeSet for all 6 elements
parent = [0, 1, 2, 3, 4, 5] // every element is its own root

// Union(0, 1): Find(0)=0, Find(1)=1. Make 1 child of 0.
parent = [0, 0, 2, 3, 4, 5] // parent[1] changed from 1 to 0

// Union(2, 3): Find(2)=2, Find(3)=3. Make 3 child of 2.
parent = [0, 0, 2, 2, 4, 5] // parent[3] changed from 3 to 2

// Union(4, 5): Find(4)=4, Find(5)=5. Make 5 child of 4.
parent = [0, 0, 2, 2, 4, 4] // parent[5] changed from 5 to 4

// Union(1, 3): Find(1)→parent[1]=0→root. Find(3)→parent[3]=2→root. Make 2 child of 0.
parent = [0, 0, 0, 2, 4, 4] // parent[2] changed from 2 to 0

// Find(3): parent[3]=2, parent[2]=0, parent[0]=0→root. Answer: 0
// WITH path compression: parent[3] = 0
parent = [0, 0, 0, 0, 4, 4] // parent[3] compressed from 2 to 0

// Union(3, 5): Find(3)=0 (instant after compression). Find(5)→parent[5]=4→root.
parent = [0, 0, 0, 0, 0, 4] // parent[4] changed from 4 to 0. All in one set!

Notice how path compression on Find(3) saved work for the subsequent Union(3,5) — Find(3) returned 0 immediately instead of traversing 3→2→0. This is the self-optimizing property: each Find makes future operations cheaper.

The simulation below lets you build a disjoint-set forest interactively. Create sets, union them, and run Find to see the tree structure. Notice how naive unions can create deep chains.

Interactive Disjoint-Set Forest

Click nodes to select, then "Union" to merge, or "Find" to highlight the path to the root. Watch the tree structure evolve.

Add nodes, then click on canvas to select them for Union/Find.

Play with the simulation. Create 8 nodes, then union them in a chain: union(0,1), union(1,2), union(2,3), etc. — always attaching the new node at the bottom. You will see the tree degenerate into a linked list. Now Find on node 7 walks 7 edges. This motivates union by rank: keep the trees shallow.

Why the Forest Wins

The forest representation looks worse at first glance: Find is O(depth) instead of O(1). But it has two critical advantages that the linked-list representation lacks.

First, Union is trivially O(1) — just change one pointer. No walking through a list. The linked list must walk the entire appended list to update head pointers; the forest just redirects one root.

Second, and more importantly, the forest representation is amenable to optimization. We can add two heuristics — union by rank and path compression — that together bring the amortized cost down to O(α(n)) per operation. This is impossible with linked lists because their structure is too rigid. In a linked list, every node must point to the head; there is no room for cleverness. In a tree, the path from node to root is flexible — we can reshape it.

The entire data structure fits in a single array. For n elements, we need exactly one array of size n: parent[]. Element i's parent is parent[i]. If parent[i] == i, then i is a root. That is it. No pointers, no linked list nodes, no dynamic memory allocation. Just an array of integers. This is why Union-Find is so cache-friendly and fast in practice.

// The entire data structure: one array
parent = [0, 0, 1, 1, 4, 4, 5, 6]
// parent[0]=0 → 0 is a root (representative)
// parent[1]=0 → 1's parent is 0
// parent[2]=1 → 2's parent is 1 (grandparent is 0)
// parent[4]=4 → 4 is a root (different set)
// Find(2): 2→1→0 (root), returns 0
// Find(7): 7→6→5→4 (root), returns 4
// Connected(2,7)? Find(2)=0 ≠ Find(7)=4 → NO

Concept check: In the forest representation, Union is O(1) because we just change one parent pointer. Why can Find be as bad as O(n)?

Because without any balancing, unions can create a tree that degenerates into a chain of depth n-1, and Find must walk from leaf to root Because Find must visit every node in the set Because the tree must be rebalanced after each Find

Chapter 4: Union by Rank

Chapter 3 showed that naive forest unions can create chains of depth n-1. The problem is tree depth: if we carelessly make one root the child of the other, we might attach a tall tree under a short tree, increasing maximum depth unnecessarily. Imagine a tree of height 10 being attached under a tree of height 3 — the result has height 11. But if we had attached the height-3 tree under the height-10 tree, the result would still have height 10. Same merge, but the max depth difference is 1 vs 11.

The fix: always attach the shorter tree under the root of the taller tree. This is called union by rank.

We store a rank for each node — an upper bound on the height of the subtree rooted at that node. Initially, every node has rank 0. When we union two trees:

Case 1: rank[rx] < rank[ry]

Make rx a child of ry. Ranks unchanged — the taller tree just gained a shorter subtree, so its height does not increase.

↓

Case 2: rank[rx] > rank[ry]

Make ry a child of rx. Ranks unchanged — same reasoning.

↓

Case 3: rank[rx] == rank[ry]

Make either the child of the other. Increment the new root's rank by 1 — two trees of equal height joined, so the result is one taller.

python
class UnionByRank:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n          # upper bound on height

    def find(self, x):
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry: return
        if self.rank[rx] < self.rank[ry]:
            self.parent[rx] = ry       # attach shorter under taller
        elif self.rank[rx] > self.rank[ry]:
            self.parent[ry] = rx
        else:
            self.parent[ry] = rx
            self.rank[rx] += 1        # only case where rank increases

The Key Guarantee

With union by rank, a tree with root rank r has at least 2^r nodes. Why? Prove it by induction. Base case: rank 0 tree has 1 = 2⁰ nodes. Inductive step: rank r is only created by merging two rank r-1 trees. Each has at least 2^r-1 nodes. Together: 2^r-1 + 2^r-1 = 2^r nodes.

Since a tree with n nodes has at most 2^r ≤ n, the maximum rank is log₂ n. Therefore Find takes O(log n) — the tree height is at most logarithmic. This alone brings the total time for m operations down to O(m log n).

Let us verify with concrete numbers. For n = 1,000,000: max rank ≤ log₂ 10⁶ ≈ 20. Every Find visits at most 20 nodes. For 10 million operations: at most 200 million node visits. At ~1 ns per node visit: about 0.2 seconds. Compare to naive forest: up to 10¹³ node visits, or about 10,000 seconds. Union by rank alone gives a 50,000x speedup.

But wait — it gets even better. With rank, the trees are not just shallow but also wide. A rank-4 tree with 16 nodes has at most 4 levels, so on average each level has 4 nodes. This means most elements are close to the root already. Path compression (next chapter) will exploit this by moving even the few deep nodes directly to the root.

Rank is not height. After path compression (Chapter 5), the actual height of a node may be less than its rank. Rank is an upper bound that was accurate at the time of the union. We do not bother updating ranks during path compression — the bound is still valid, and updating would add complexity for no asymptotic gain. Think of rank as a "frozen height estimate."

The simulation below shows the dramatic difference between naive union (which can degenerate to a chain) and union by rank (which stays balanced). Both start with 16 nodes and perform the same unions.

Naive Union vs Union by Rank

Left: naive union creates chains. Right: union by rank keeps trees shallow. Watch the max depth.

Naive max depth: 0 | Rank max depth: 0

After all 15 unions on 16 elements, the naive tree can reach depth 15 (a chain). The union-by-rank tree has depth at most 4 (since log₂ 16 = 4). That is the difference between O(n) and O(log n) per Find.

Strategy	Find	Union	m operations total
Naive forest	O(n)	O(1)	O(m · n)
Union by rank	O(log n)	O(log n)*	O(m log n)

*Union calls Find twice, so its cost is dominated by Find.

O(m log n) is a huge improvement over O(m · n). But we are not done. Path compression will bring this down to O(m · α(n)), which is effectively O(m).

Rank Properties (Important for Proofs)

Several properties of ranks are worth memorizing — they appear in the full amortized analysis and in interview discussions:

Property 1: Ranks only increase along paths to the root. If y is an ancestor of x, then rank[y] > rank[x]. This is because a node's rank only increases when it is a root during a Union operation, and once a node becomes a non-root, its rank never changes again.

Property 2: A node with rank r has at least 2^r descendants. We proved this above by induction. Consequence: at most n/2^r nodes can have rank r. For rank r = log n, there is at most one such node.

Property 3: The maximum rank is at most ⌊log₂ n⌋. Since a rank-r tree has at least 2^r nodes and we have only n nodes total, r ≤ log₂ n.

Property 4: Ranks never decrease. Once assigned, a node's rank only increases (and only while it is a root). Path compression changes parent pointers but never touches ranks. This is a deliberate design choice — maintaining accurate heights after compression would cost more than it saves.

Union by Size (Alternative)

Instead of rank, you can track the size (number of elements) of each tree. Attach the smaller tree under the larger tree's root. This gives the same O(log n) height bound. Some implementations prefer size because it is useful for answering "how many elements are in this set?" without extra traversal.

python
class UnionBySize:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n          # number of elements in tree

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry: return False
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx          # rx is now the larger tree
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]
        return True

    def get_size(self, x):
        return self.size[self.find(x)]  # size of x's component

In interviews, union by size is slightly more practical because you get component sizes for free. In CLRS, the formal analysis uses rank. Both achieve O(m · α(n)) when combined with path compression.

Hand Calculation: Building a Rank-4 Tree

Let us trace the construction of a tree with the maximum possible rank for 16 elements. We need to merge equal-rank trees at every step:

// Round 1: merge 16 singletons into 8 pairs (all rank 1)
Union(0,1) → {0,1} rank=1 Union(2,3) → rank=1 ... Union(14,15) → rank=1

// Round 2: merge 8 pairs into 4 quads (all rank 2)
Union(0,2) → {0,1,2,3} rank=2 Union(4,6) → rank=2 ...

// Round 3: merge 4 quads into 2 octets (all rank 3)
Union(0,4) → {0..7} rank=3 Union(8,12) → {8..15} rank=3

// Round 4: merge 2 octets into 1 set of 16 (rank 4)
Union(0,8) → {0..15} rank=4

// Result: rank 4 = log₂(16). Maximum height = 4. Exactly as predicted.

This is the worst case for union by rank — every merge is between equal-rank trees. In practice, many merges are between unequal-rank trees (which do not increase the rank), so actual depths are usually much less than log n.

Concept check: With union by rank, a tree whose root has rank 5 contains at least how many nodes?

5 nodes 10 nodes 32 nodes (2⁵) — because rank r requires merging two rank-(r-1) trees, each with at least 2^r-1 nodes 16 nodes

Chapter 5: Path Compression

We have union by rank keeping trees at O(log n) height. Good, but not great. For n = 1,000,000, log₂ n ≈ 20. Every Find walks up to 20 hops. Over 10 million operations, that is 200 million pointer traversals. Can we do better?

Yes. Path compression makes trees flatter over time. The idea is beautifully simple: during Find(x), after walking up to the root, make every node along the path point directly to the root. Future Finds on any of these nodes will take O(1).

Think of it like asking for directions. You start at node x and ask: "Who is my representative?" You walk through intermediate nodes, each saying "I don't know, ask my parent." You finally reach the root. On your way back, you tell every node you visited: "The root is directly above you. Skip the middlemen." Next time anyone asks, they jump straight to the root.

python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Path compression: make every node point directly to root
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # recursive
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry: return
        if self.rank[rx] < self.rank[ry]:
            self.parent[rx] = ry
        elif self.rank[rx] > self.rank[ry]:
            self.parent[ry] = rx
        else:
            self.parent[ry] = rx
            self.rank[rx] += 1

The recursive Find is elegant: it walks to the root, then on the way back (unwinding recursion), sets every node's parent to the root. Here is the iterative version for those who prefer it (and for avoiding stack overflow on very deep trees):

python
def find_iterative(self, x):
    # Phase 1: find root
    root = x
    while self.parent[root] != root:
        root = self.parent[root]
    # Phase 2: compress — point everyone to root
    while self.parent[x] != root:
        next_x = self.parent[x]
        self.parent[x] = root
        x = next_x
    return root

The combined power. Union by rank alone: O(m log n). Path compression alone: O(m log n) (via a different analysis). But together, they achieve O(m · α(n)), where α is the inverse Ackermann function. For any conceivable input size, α(n) ≤ 4. This is one of the most beautiful results in algorithm analysis: two simple heuristics, each giving O(log n) alone, combine to give effectively O(1).

Tracing Path Compression Step by Step

Let us trace through a detailed example. Start with this tree (built by union by rank):

parent = [0, 0, 1, 2, 3, 4, 5, 6]
// Tree: 0 ← 1 ← 2 ← 3 ← 4 ← 5 ← 6 ← 7 (a chain of depth 7)

Find(7):
Step 1: 7 → parent is 6
Step 2: 6 → parent is 5
Step 3: 5 → parent is 4
Step 4: 4 → parent is 3
Step 5: 3 → parent is 2
Step 6: 2 → parent is 1
Step 7: 1 → parent is 0
Step 8: 0 → parent is 0 (ROOT FOUND!)

// Now compress: set parent of 7,6,5,4,3,2,1 all to 0
parent = [0, 0, 0, 0, 0, 0, 0, 0]
// Tree: 0 ← {1,2,3,4,5,6,7} (star shape, max depth 1)

Find(7) again:
Step 1: 7 → parent is 0 (ROOT!) // Done in 1 step instead of 7!

That first Find(7) cost 7 steps but restructured the tree so that ALL subsequent Finds on ANY of those nodes cost just 1 step. The upfront work pays for itself many times over. This is the essence of amortized analysis: individual operations may be expensive, but the average over many operations is cheap.

The simulation below is the showcase. Build a tree, then run Find on deep nodes. Watch path compression flatten the tree in real time. Every node on the path jumps directly to the root. Subsequent Finds on those nodes are instant.

Path Compression — THE Showcase

Build a deep tree, then click "Find" on a deep node. Watch all nodes on the path jump to point directly at the root. Run Find again — it is instant.

Build a tree first, then run Find to see compression.

Path Compression Variants

Full path compression (every node points to root) is the strongest. There are two weaker variants that are simpler but nearly as effective:

Path splitting: during Find, make every node point to its grandparent. Each node on the path moves up one level. No second pass needed — done in a single upward walk.

Path halving: during Find, make every other node point to its grandparent. Half the nodes skip a level. Even simpler, still achieves O(m · α(n)).

python
def find_path_splitting(self, x):
    while self.parent[x] != x:
        self.parent[x] = self.parent[self.parent[x]]  # point to grandparent
        x = self.parent[x]                              # advance to (old) grandparent
    return x

def find_path_halving(self, x):
    while self.parent[x] != x:
        self.parent[x] = self.parent[self.parent[x]]  # point to grandparent
        x = self.parent[x]                              # skip ahead (halving)
    return x

Wait — path splitting and path halving look identical in code? Yes! Both do "point to grandparent, then advance." The conceptual difference is: path splitting updates every node (making each point to its grandparent), while path halving updates every other node. But since "advance to the new parent" after redirecting to grandparent is the same in both cases, the code is literally the same loop. The formal analysis differs slightly, but both achieve O(m · α(n)).

In competitive programming and interviews, use the recursive version with full path compression. It is the shortest to write and the easiest to get right:

// The one-liner that every competitive programmer memorizes:
def find(self, x):
    if self.parent[x] != x:
        self.parent[x] = self.find(self.parent[x])
    return self.parent[x]
// That's it. Three lines. O(α(n)) amortized.

Why Both Heuristics Together?

Here is the subtle point: union by rank alone gives O(log n) per operation. Path compression alone also gives O(log n) amortized per operation. So why does combining them give O(α(n)) instead of O(log n)?

The intuition: union by rank ensures that trees never get too deep in the first place (O(log n) height). Path compression then flattens these already-shallow trees even further. After a few Find operations, most paths are nearly flat. The two heuristics are synergistic: rank keeps trees manageable so that compression has less work to do, and compression prevents the modest depth from being re-traversed repeatedly.

Tarjan and van Leeuwen proved (1984) that the amortized cost per operation is Θ(α(n)). Not O(α(n)), but Θ(α(n)) — tight both above and below. This is one of the most celebrated results in the analysis of algorithms.

A Visual Summary of the Journey

Let us summarize the entire progression from naive to optimal with a concrete scenario: 1 million elements, 10 million operations.

Strategy	Worst-case total operations	Time at 1ns/op
Naive linked list	~5 × 10¹¹	~500 seconds
Weighted union (linked list)	~2 × 10⁸	~0.2 seconds
Naive forest	~10¹³	~10,000 seconds
Union by rank	~2 × 10⁸	~0.2 seconds
Rank + path compression	~4 × 10⁷	~0.04 seconds

From 10,000 seconds down to 0.04 seconds. A factor of 250,000. That is the power of the right data structure with the right heuristics. And the implementation is 20 lines of Python.

Strategy	Find (amortized)	Total for m ops
Naive forest	O(n)	O(m · n)
Union by rank only	O(log n)	O(m log n)
Path compression only	O(log n) amortized	O(m log n)
Both combined	O(α(n))	O(m · α(n))

Concept check: After calling Find(x) with full path compression on a node x at depth 5, what is x's depth on the next Find(x) call?

1 — path compression made x a direct child of the root, so the next Find is a single hop 5 — Find does not modify the tree 3 — path compression halves the depth 0 — x becomes the root

Chapter 6: The Inverse Ackermann Function

We keep saying O(α(n)) is "effectively O(1)." Let us make this precise. What is α, and why is it so absurdly slow-growing?

Start with the Ackermann function A(i, j), defined recursively:

A(0, j) = j + 1
A(i, 0) = A(i-1, 1) for i ≥ 1
A(i, j) = A(i-1, A(i, j-1)) for i, j ≥ 1

Level 0 is just "add 1." Level 1 turns out to be "add 2" (roughly). Level 2 is "multiply by 2" (roughly). Level 3 is "exponentiate" (roughly). Level 4 is "tower of powers" (tetration). Each level applies the previous level's operation a number of times, creating a hierarchy of increasingly explosive growth.

Let us compute some values by hand:

A(0, j) = j + 1 // level 0: successor
A(1, j) = j + 2 // level 1: adds 2
A(2, j) = 2j + 3 // level 2: roughly doubles
A(3, j) = 2^j+3 - 3 // level 3: exponential
A(4, 0) = A(3, 1) = 2⁴ - 3 = 13
A(4, 1) = A(3, A(4, 0)) = A(3, 13) = 2¹⁶ - 3 = 65533
A(4, 2) = A(3, 65533) = 2⁶⁵⁵³⁶ - 3 // a number with ~19,729 digits
A(4, 3) = 2^{2⁶⁵⁵³⁶} - 3 // ... incomprehensibly large

A(4, 2) already has about 19,729 digits. A(4, 4) is so large that writing it in decimal would require more digits than there are atoms in the observable universe — by an incomprehensible margin. The Ackermann function does not merely grow fast. It defines a boundary beyond which "fast-growing" needs new words.

The inverse. The inverse Ackermann function α(n) is the smallest i such that A(i, i) ≥ n. Since A(4, 4) is unimaginably larger than 10⁸⁰ (the number of atoms in the universe), we have α(n) ≤ 4 for any n we could ever encounter. That means O(α(n)) = O(1) for all practical purposes. The Union-Find structure with both heuristics takes O(m · α(n)) total time — which is, for all intents and purposes, O(m).

Let us be very precise about the formal definition. In CLRS, the inverse Ackermann function is defined as:

α(n) = min { k ≥ 1 : A_k(1) ≥ log₂ n }

where A_k is the k-th row of the Ackermann-like function defined with specific initial conditions. Different papers use slightly different definitions (some use A(k,k) instead of A_k(1)), but they all agree on the essential property: α grows so slowly that it is effectively constant.

Here is α(n) for various values of n:

n	α(n)	Context
1	0	A single element
2-3	1	A pair or triple
4-7	2	A handful
8-2047	3	Up to 2K elements
2048 to A(4,4)	4	Everything you will ever see
> A(4,4)	5	Will never happen in practice

The gap between α(n) = 4 and α(n) = 5 is absurd. α(n) = 4 covers every number up to A(4,4), which is a tower of 2s that is 65,536 levels tall. To give a sense of scale: a tower of 2s just 5 levels tall is 2^{2^{2^2²}} = 2⁶⁵⁵³⁶, a number with about 19,729 digits. A tower of 2s 65,536 levels tall is... there are no words. The number of digits in the number of digits in the number would itself have more digits than atoms in the observable universe.

Your computer, your data center, every computer that will ever exist — they all live in the α(n) = 4 regime. The difference between O(m · α(n)) and O(m) is, practically speaking, nothing. This is why in most contexts (textbooks, interviews, practical discussions) we say Union-Find is "O(1) amortized" and leave the α(n) as a footnote for the theoretically curious.

The simulation below visualizes the growth of the Ackermann function. Watch how quickly it escapes any reasonable bound.

Ackermann Function Growth

Each level grows incomprehensibly faster than the last. Heights capped for visualization — actual values overflow the universe.

Level i 2

Understanding the Hierarchy

Let us build intuition for why Ackermann grows so fast by thinking about levels of recursion:

Level 0: Addition

A(0, j) = j + 1. One step up from j. Linear growth.

↓ iterate level 0

Level 1: Repeated addition = multiplication-like

A(1, j) ≈ j + 2. Still linear (but applying level 0 repeatedly).

↓ iterate level 1

Level 2: Repeated "level 1" = multiplication

A(2, j) = 2j + 3. Roughly doubling. Polynomial growth.

↓ iterate level 2

Level 3: Repeated doubling = exponentiation

A(3, j) = 2^j+3 - 3. Exponential growth.

↓ iterate level 3

Level 4: Repeated exponentiation = tetration

A(4, j) = 2 ↑↑ (j+3) - 3. Tower of 2s, (j+3) levels high.

Each level applies the previous level's operation a number of times. Level 0 is addition. Iterating addition gives multiplication (level 1). Iterating multiplication gives exponentiation (level 2). Iterating exponentiation gives tetration (level 3). Iterating tetration gives... something so fast that it has no standard name outside of Knuth's up-arrow notation.

The key realization: the Ackermann function exhausts the entire hierarchy of standard operations. No polynomial, exponential, or tower function can keep up with it. And yet its inverse grows so slowly that it barely moves — α(n) = 4 covers more values than you will ever encounter.

This makes the Union-Find bound remarkable: the data structure is nearly optimal (proven by Tarjan's lower bound), and the gap between "optimal" and "O(1)" is so small that it disappears for all practical purposes.

Why Tight?

This bound is not just an upper bound — it is tight. Tarjan proved in 1975 that any sequence of m Union-Find operations on n elements requires Ω(m · α(n)) time in the worst case using pointer-based data structures. The proof constructs an adversarial sequence of operations that forces any pointer-based scheme to do α(n) work per operation on average. You cannot do better within this model. The O(α(n)) per operation is optimal.

For comparison, here is the full hierarchy of how "close to O(1)" various Union-Find strategies get:

Strategy	Amortized per-op	m ops on n elements
Naive linked list	O(n)	O(m · n)
Weighted union (linked list)	O(log n)	O(m + n log n)
Naive forest	O(n)	O(m · n)
Union by rank only	O(log n)	O(m log n)
Path compression only	O(log n) amortized	O(m log n)
Both combined	O(α(n))	O(m · α(n))
Lower bound (Tarjan)	Ω(α(n))	Ω(m · α(n))

Concept check: For n = 10⁸⁰ (atoms in the observable universe), what is α(n)?

4 — because A(4,4) dwarfs 10⁸⁰, so the inverse at 10⁸⁰ is still at most 4 80 — roughly the log 5 — the next level up It depends on the number of operations

Chapter 7: Applications

Union-Find appears everywhere in computer science. Let us survey the major applications, then build the showcase: a percolation simulator.

Kruskal's MST Algorithm

Kruskal's algorithm finds the Minimum Spanning Tree of a weighted graph. Sort edges by weight, then process each edge: if it connects two different components, add it to the MST. If it connects nodes already in the same component, skip it (it would form a cycle). The "same component" check is exactly Union-Find.

python
def kruskal(n, edges):
    edges.sort(key=lambda e: e[2])  # sort by weight
    uf = UnionFind(n)
    mst = []
    for u, v, w in edges:
        if uf.find(u) != uf.find(v):    # different components?
            uf.union(u, v)               # merge them
            mst.append((u, v, w))        # add edge to MST
    return mst
# Time: O(E log E) for sorting + O(E · α(V)) for union-find
# Total: O(E log E) since sorting dominates

Without Union-Find, the cycle check would require a DFS per edge: O(V + E) per edge, O(E(V+E)) total. With Union-Find: O(E · α(V)) for all cycle checks combined — essentially free compared to the sort.

Let us trace Kruskal's on a small example with 5 nodes and 7 edges:

Edges sorted by weight: (0,1,1), (2,3,2), (1,3,3), (0,2,4), (1,2,5), (3,4,6), (2,4,7)

Process (0,1,1): Find(0)=0, Find(1)=1, different → Union(0,1), ADD to MST
Process (2,3,2): Find(2)=2, Find(3)=3, different → Union(2,3), ADD to MST
Process (1,3,3): Find(1)=0, Find(3)=2, different → Union(0,2), ADD to MST
Process (0,2,4): Find(0)=0, Find(2)=0, SAME → SKIP (would create cycle)
Process (1,2,5): Find(1)=0, Find(2)=0, SAME → SKIP
Process (3,4,6): Find(3)=0, Find(4)=4, different → Union(0,4), ADD to MST
// MST complete: 4 edges for 5 nodes. Total weight: 1+2+3+6 = 12

Each cycle check (the "SAME" cases) was detected instantly by Union-Find. Without it, we would need to run BFS/DFS to check if adding edge (0,2) creates a cycle — traversing through 0→1→3→2 to discover they are already connected.

Kruskal's MST with Union-Find

Edges sorted by weight. Green = MST edge. Red dashed = rejected (cycle). Yellow dashed = next candidate.

Click to process edges in weight order.

Dynamic Connectivity

Any problem where elements are grouped over time and you need to query group membership: social networks (friend groups), computer networks (connected machines), image processing (connected pixel regions). Union-Find handles the incremental version perfectly: edges arrive but never leave.

Concrete example: you are monitoring a network of 10,000 servers. Connections come online over time. At any moment, an operator asks: "Can server A reach server B?" With Union-Find, the answer takes O(α(10000)) ≤ 4 pointer hops. Without it, you would need to run BFS/DFS through potentially thousands of network links.

For the fully dynamic version (edges can be added AND deleted), Union-Find does not suffice — you need link-cut trees (Sleator-Tarjan, 1983) for O(log n) per operation, or offline algorithms that process deletions in reverse as insertions.

Image Segmentation

Given an image of W × H pixels, group adjacent pixels with similar colors into regions. The algorithm:

python
def segment_image(pixels, W, H, threshold):
    uf = UnionFind(W * H)
    for y in range(H):
        for x in range(W):
            for dx, dy in [(1,0),(0,1)]:
                nx, ny = x + dx, y + dy
                if nx < W and ny < H:
                    diff = color_distance(pixels[y][x], pixels[ny][nx])
                    if diff < threshold:
                        uf.union(y*W+x, ny*W+nx)
    return uf  # each component = one segment

For a 1920×1080 image, this processes ~4 million pixel pairs. With Union-Find: ~4M × O(α(2M)) operations — effectively 16 million simple pointer operations, done in milliseconds. Without it (using BFS per pair): potentially hours. This is a simplified version of Felzenszwalb's algorithm (2004), which additionally considers the internal variation within segments to set adaptive thresholds.

Equivalence Classes

Given pairs of equivalent items (e.g., "accounts with the same email belong to the same person"), Union-Find groups all transitively equivalent items. If Alice has emails {a, b} and Bob has emails {b, c}, they share email b, so they are the same person — and all three emails belong to one account. This is the "Accounts Merge" problem on LeetCode (LC 721) — one of the most common interview applications of Union-Find.

The pattern: create a Union-Find over the items (emails, usernames, etc.). For each equivalence pair, call Union. Then group by Find to collect equivalence classes. The transitive closure is handled automatically — if a~b and b~c, then Find(a) == Find(b) == Find(c) after the unions.

Least Common Ancestor (Offline)

Least Common Ancestor (Offline). Tarjan's offline LCA algorithm uses Union-Find to answer all LCA queries in O(n + q · α(n)) time, where q is the number of queries. During a DFS traversal, when you finish processing a subtree rooted at v, you union v with its parent. To answer "LCA(u, v)?", wait until both u and v have been visited. At that point, Find(u) gives the LCA. This is one of the most elegant applications of Union-Find — using it during a tree traversal to track ancestry relationships.

Minimum Spanning Forest. Kruskal's algorithm generalizes naturally to disconnected graphs. Process all edges sorted by weight, and use Union-Find for cycle detection. The result is a minimum spanning forest — a minimum spanning tree for each connected component. The algorithm does not need to know how many components there are; Union-Find handles it automatically.

Maze Generation. Start with a grid where every cell is walled off from its neighbors. Randomly knock down walls between adjacent cells. Before knocking down a wall, check if the two cells are already connected (using Union-Find). If they are, skip this wall (knocking it down would create a cycle). If they are not, knock it down and union the cells. The result is a random maze with exactly one path between any two cells — a spanning tree of the grid graph. The Union-Find version is equivalent to randomized Kruskal's.

Percolation

The showcase application. Imagine a grid of cells, initially all blocked. You randomly open cells one at a time. Question: does an open path exist from the top row to the bottom row? This is percolation — a fundamental problem in physics (fluid flow through porous material), material science, and epidemiology (disease spread).

Model it with Union-Find: each open cell is an element. When a cell opens, union it with any open neighbors. Add two virtual nodes: one connected to all open cells in the top row, one connected to all open cells in the bottom row. The system percolates when the two virtual nodes are in the same set.

Percolation threshold. On an n × n grid, there is a sharp phase transition at approximately p* ≈ 0.593 — the fraction of cells that must be open for percolation to occur with high probability. Below p*, the system almost never percolates. Above p*, it almost always does. The transition is abrupt, not gradual. Union-Find lets us simulate this efficiently.

The simulation below lets you watch percolation happen. Cells open randomly. Blue = open, dark = blocked. When a path forms from top to bottom, the system percolates and the connected path lights up green.

Percolation Simulator

Open cells randomly until the grid percolates (top-to-bottom path). The threshold is around 59.3%.

Grid size 20

Grid: 20x20 | Open: 0/400 (0.0%) | Not percolating

Try running "Run Until Percolation" several times with a 20x20 grid. You will notice it consistently percolates when about 55-65% of cells are open — clustering tightly around the theoretical threshold of 59.3%. This is a beautiful example of a phase transition: a tiny change in the fraction of open cells flips the system from "almost certainly blocked" to "almost certainly percolating."

The Virtual Node Trick

The percolation simulation uses a clever optimization: virtual top and bottom nodes. Instead of checking whether any top-row cell is connected to any bottom-row cell (which would require O(n) Find calls), we add two virtual nodes:

virtual_top = n*n // connected to all open cells in row 0
virtual_bottom = n*n+1 // connected to all open cells in row n-1
// Percolates ⟺ Find(virtual_top) == Find(virtual_bottom)
// One Find call instead of n²

When a cell in the top row opens, union it with virtual_top. When a cell in the bottom row opens, union it with virtual_bottom. To check percolation: just check if virtual_top and virtual_bottom are connected. One single Find pair instead of scanning all top-row and bottom-row cells. This is a common Union-Find pattern: add sentinel nodes to simplify boundary conditions.

Why Union-Find Matters for Percolation

The percolation problem is central to computational physics. Scientists need to simulate millions of random configurations to estimate the percolation threshold. Each configuration opens ~60% of an n × n grid and checks connectivity. For n = 1000, that is 600,000 cell openings per trial, each requiring neighbor unions. Without Union-Find, each connectivity check would require BFS over potentially hundreds of thousands of cells. With Union-Find, the entire trial completes in milliseconds. This is how Monte Carlo simulations of phase transitions become computationally feasible.

python
import random

def estimate_threshold(n, trials=1000):
    """Monte Carlo estimation of percolation threshold on n×n grid."""
    total_frac = 0.0
    for _ in range(trials):
        uf = UnionFind(n * n + 2)     # +2 for virtual top/bottom
        top, bot = n * n, n * n + 1
        grid = [False] * (n * n)
        order = list(range(n * n))
        random.shuffle(order)
        for count, cell in enumerate(order):
            grid[cell] = True
            r, c = divmod(cell, n)
            for dr, dc in [(-1,0),(1,0),(0,-1),(0,1)]:
                nr, nc = r + dr, c + dc
                if 0 <= nr < n and 0 <= nc < n:
                    if grid[nr * n + nc]:
                        uf.union(cell, nr * n + nc)
            if r == 0: uf.union(cell, top)
            if r == n - 1: uf.union(cell, bot)
            if uf.connected(top, bot):
                total_frac += (count + 1) / (n * n)
                break
    return total_frac / trials
# estimate_threshold(100, 1000) ≈ 0.593

Running 1000 trials on a 100×100 grid takes about 2 seconds with Union-Find. Without it (using BFS per step), the same simulation would take about 20 minutes. That is the difference between a quick experiment and an overnight job.

Concept check: In Kruskal's MST algorithm, why does Union-Find help?

It sorts the edges by weight It efficiently answers "are u and v in the same component?" to detect cycles — without Union-Find, each such check would require a graph traversal It finds shortest paths between nodes

Chapter 8: Interview Arsenal

Union-Find is one of the most interview-friendly data structures because: (1) it is simple to implement from scratch, (2) recognizing when to use it is a skill that impresses interviewers, and (3) many seemingly complex graph problems reduce to union-find in disguise.

The Template

This is the production-grade UnionFind class you should have memorized. It handles all edge cases and is ready for any interview or contest.

python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.count = n              # number of disjoint sets

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry: return False  # already connected
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        self.count -= 1
        return True               # successfully merged

    def connected(self, x, y):
        return self.find(x) == self.find(y)

Let us walk through this template line by line, because getting the details wrong in an interview is easy:

Line 1: __init__. parent[i] = i makes every element its own root. rank[i] = 0 because a single node has height 0. count = n because we start with n singleton sets.

Lines 5-7: find with path compression. The recursion is the critical trick. The call self.find(self.parent[x]) recursively finds the root, and then self.parent[x] = ... assigns that root as x's direct parent. Every node on the path gets compressed. In Python, the recursion depth limit is 1000 by default — for very large inputs, use the iterative version or increase sys.setrecursionlimit.

Lines 10-14: union by rank. We swap so rx always has the higher rank. Then ry becomes rx's child. Rank only increases when equal — this is the case where the tree actually gets taller. The return value (True/False) is essential for cycle detection problems.

Key implementation details to remember:

Interview checklist.
• union returns bool — True if a merge happened, False if already connected. This is crucial for cycle detection (Kruskal's, redundant connection).
• count tracks components — decremented on each successful merge. Avoids a separate traversal to count components.
• find uses path compression — the recursive one-liner. Interviewers know and expect this.
• union by rank, not size — both work, but rank is the CLRS-standard approach.

Five Interview Dimensions

1. Number of Connected Components. Given n nodes and a list of edges, how many components? Answer: make n sets, union along each edge, return uf.count.

python
def count_components(n, edges):
    uf = UnionFind(n)
    for u, v in edges:
        uf.union(u, v)
    return uf.count

2. Number of Islands (2D grid). Given a grid of '1' (land) and '0' (water), count islands. Classic BFS/DFS problem, but Union-Find works too: treat each land cell as a node, union with adjacent land cells.

python
def num_islands(grid):
    if not grid: return 0
    rows, cols = len(grid), len(grid[0])
    uf = UnionFind(rows * cols)
    water = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == '0':
                water += 1
                continue
            for dr, dc in [(1,0),(0,1)]:  # right and down only
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols and grid[nr][nc] == '1':
                    uf.union(r * cols + c, nr * cols + nc)
    return uf.count - water

The key insight: we index the 2D grid as a 1D array. Cell (r, c) maps to index r * cols + c. We only union rightward and downward to avoid double-processing. We subtract water cells from the total count. This runs in O(rows × cols × α(rows × cols)) — effectively linear in the number of cells.

3. Redundant Connection (LC 684). Given a tree with one extra edge (creating exactly one cycle), find the extra edge. Process edges in order; the first edge that connects two already-connected nodes is the answer.

python
def find_redundant(edges):
    n = len(edges)
    uf = UnionFind(n + 1)            # 1-indexed nodes
    for u, v in edges:
        if not uf.union(u, v):       # already connected = cycle!
            return [u, v]

This is a beautiful example of cycle detection: a tree with n nodes has exactly n-1 edges. If you are given n edges (one too many), one edge must create a cycle. Union-Find detects it on the spot. The key insight is that union returns False precisely when the edge is redundant — both endpoints already belong to the same connected component, meaning a path between them already exists.

Variant: Redundant Connection II (LC 685) is the directed version — much harder because you must handle two cases (node with two parents, or a cycle). The solution uses Union-Find with careful case analysis. This is a Google favorite interview question.

4. Accounts Merge (LC 721). Given accounts where each account is [name, email1, email2, ...], merge accounts that share any email. The trick: create a Union-Find over emails (not accounts). For each account, union all its emails together. Then group emails by their root representative.

python
from collections import defaultdict

def accounts_merge(accounts):
    # Map each email to an integer ID
    email_to_id = {}
    email_to_name = {}
    idx = 0
    for name, *emails in accounts:
        for email in emails:
            if email not in email_to_id:
                email_to_id[email] = idx
                idx += 1
            email_to_name[email] = name

    uf = UnionFind(idx)
    for name, *emails in accounts:
        for i in range(1, len(emails)):
            uf.union(email_to_id[emails[0]], email_to_id[emails[i]])

    # Group by root
    groups = defaultdict(list)
    for email, eid in email_to_id.items():
        groups[uf.find(eid)].append(email)

    return [[email_to_name[g[0]]] + sorted(g) for g in groups.values()]

5. Earliest Time Everyone Knows Each Other (LC 1101). Given friendship logs [(timestamp, u, v), ...], find the earliest time when all n people are in one connected component. Process friendships in chronological order, union on each. When uf.count == 1, return the timestamp.

python
def earliest_all_connected(logs, n):
    logs.sort()                       # sort by timestamp
    uf = UnionFind(n)
    for t, u, v in logs:
        uf.union(u, v)
        if uf.count == 1:
            return t              # everyone connected!
    return -1                        # never fully connected

Time and Space Complexity Summary

Operation	Time (amortized)	Notes
MakeSet	O(1)	Initialize parent[x]=x, rank[x]=0
Find	O(α(n))	With path compression
Union	O(α(n))	Two Finds + O(1) pointer update
Connected	O(α(n))	Two Finds + comparison
Space	O(n)	Two arrays: parent[] and rank[]

For m operations on n elements: O(n + m · α(n)) total. Since α(n) ≤ 4 for all practical n, this is O(n + 4m) = O(n + m). Linear in the number of operations. You cannot ask for more.

Common Mistakes in Interviews

Gotchas to avoid.
• Forgetting to call Find before comparing. Do NOT compare parent[x] == parent[y]. Parent pointers may not point to roots. Always compare Find(x) == Find(y).
• Unioning values instead of roots. Union must operate on roots: rx=Find(x), ry=Find(y), then parent[ry]=rx. Setting parent[y]=x is wrong — y might not be a root.
• Off-by-one with 1-indexed nodes. Many LeetCode problems use 1-indexed nodes. Initialize UnionFind(n+1) and use indices 1..n.
• Not returning bool from Union. For cycle detection, you need to know whether the union actually merged two different sets. Return False if already connected.

C++ Template (for competitive programming)

cpp
struct DSU {
    vector<int> parent, rank_;
    int components;

    DSU(int n) : parent(n), rank_(n, 0), components(n) {
        iota(parent.begin(), parent.end(), 0);  // parent[i] = i
    }

    int find(int x) {
        if (parent[x] != x)
            parent[x] = find(parent[x]);  // path compression
        return parent[x];
    }

    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;
        if (rank_[a] < rank_[b]) swap(a, b);
        parent[b] = a;
        if (rank_[a] == rank_[b]) rank_[a]++;
        components--;
        return true;
    }
};

In C++, use iota to initialize parent[i] = i. Name the method unite not union since union is a reserved keyword in C/C++. The rest is identical to the Python version.

Java Template

java
class UnionFind {
    int[] parent, rank;
    int count;

    UnionFind(int n) {
        parent = new int[n];
        rank = new int[n];
        count = n;
        for (int i = 0; i < n; i++) parent[i] = i;
    }

    int find(int x) {
        if (parent[x] != x)
            parent[x] = find(parent[x]);  // path compression
        return parent[x];
    }

    boolean union(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;
        if (rank[a] < rank[b]) { int t = a; a = b; b = t; }
        parent[b] = a;
        if (rank[a] == rank[b]) rank[a]++;
        count--;
        return true;
    }
}

The Java version is essentially the same as C++ — allocate arrays in the constructor, path compression in find, union by rank in union. Java's default stack depth handles recursion up to ~5000 frames, which is sufficient for most inputs (since rank never exceeds ~20 for n ≤ 10⁶).

Complexity Cheat Sheet for Interviews

Memorize this.
• Space: O(n) — two arrays of size n
• MakeSet: O(1)
• Find: O(α(n)) amortized ≈ O(1)
• Union: O(α(n)) amortized ≈ O(1)
• Connected: O(α(n)) amortized ≈ O(1)
• m operations on n elements: O(n + m · α(n)) ≈ O(n + m)
• α(n) ≤ 4 for n ≤ 10⁸⁰
• Lower bound: Ω(m · α(n)) — optimal for pointer-based structures

When to Recognize Union-Find

Pattern recognition. Use Union-Find when you see:
• "Connected components" or "groups" that merge over time
• "Same group / different group" queries
• Edges arriving incrementally (but never deleted)
• Cycle detection in an undirected graph
• Equivalence classes (transitive closure of pairwise relations)
• Any problem where BFS/DFS is called repeatedly and you need it faster

Interactive Union-Find Drill

Build and query a union-find structure. Enter node IDs (0-15) and perform operations.

Components: 16 | Enter two node IDs and click an operation.

Concept check: You are processing edges of an undirected graph. Edge (u, v) arrives, and uf.find(u) == uf.find(v). What does this tell you?

Adding edge (u, v) would create a cycle — u and v are already connected through previously added edges u and v are the same node The graph is fully connected

Chapter 9: Connections

Union-Find does not exist in a vacuum. It is a building block for graph algorithms, a tool for dynamic connectivity, and a gateway to deeper ideas in amortized analysis.

What We Covered

Concept	Key Idea	Complexity
Linked-list representation	Head pointers for O(1) Find	O(n) Union
Weighted union heuristic	Append short to long	O(log n) amortized
Forest representation	Parent pointers, tree structure	O(n) Find worst case
Union by rank	Attach short tree under tall	O(log n) Find
Path compression	Flatten on every Find	O(α(n)) combined
Inverse Ackermann	α(n) ≤ 4 for all practical n	Optimal (Tarjan '75)

Where Union-Find Appears

Algorithm/Problem	Role of Union-Find
Kruskal's MST (CLRS Ch 23)	Cycle detection during edge processing
Connected components (CLRS Ch 22)	Alternative to BFS/DFS for incremental connectivity
LCA (offline Tarjan's algorithm)	Merging subtrees during DFS
Percolation	Detecting top-to-bottom connectivity
Image segmentation	Merging similar adjacent pixels
Network redundancy	Finding redundant connections (cycles)

Limitations

Union-Find handles incremental connectivity: you can add edges (union) but not remove them. If you need to delete edges and maintain connectivity, you need more advanced structures: link-cut trees (Sleator and Tarjan, 1983) support fully dynamic connectivity in O(log n) per operation. For offline problems where all queries are known in advance, offline deletion via time-reversed union-find can help.

Union-Find also does not support finding the actual path between two nodes — only whether a path exists. For path queries, use BFS/DFS or a shortest-path algorithm.

When NOT to Use Union-Find

Union-Find is not always the right tool. Here are common mistakes:

Wrong tool for the job:
• Shortest paths: Union-Find tells you if nodes are connected, not HOW. Use Dijkstra/BFS.
• Directed graphs: Union-Find treats edges as undirected. For directed connectivity (reachability), use DFS/BFS or strongly connected components (Tarjan's SCC, ironically by the same Tarjan).
• Weighted components: "What is the minimum weight edge between two nodes?" requires augmented structures, not basic Union-Find.
• Static graphs: If the graph does not change, precompute connected components with a single DFS. Union-Find shines when edges arrive dynamically.
• Edge deletions: Standard Union-Find cannot handle deletions. Do not try to "un-union" — it does not work with path compression.

A good rule of thumb: if the problem says "edges are added over time" or "groups merge over time" or "find if two elements are equivalent," think Union-Find. If the problem says "find the shortest path" or "edges can be removed" or "directed graph," think other tools.

Decision Flowchart: Should You Use Union-Find?

Is the graph directed?

YES → Use SCC (Tarjan/Kosaraju) for reachability. NO → continue.

↓

Do edges get deleted?

YES → Use link-cut trees or offline tricks. NO → continue.

↓

Do you need paths or distances?

YES → Use BFS/Dijkstra. NO → continue.

↓

Is the graph static (no changes)?

YES → Use DFS once, then answer queries O(1). NO → continue.

↓

Dynamic undirected connectivity?

YES → USE UNION-FIND. O(α(n)) per operation.

Related Lessons

Continue learning.
• CLRS Ch 23 — MST: Kruskal's algorithm uses Union-Find for cycle detection. The MST chapter is the natural next step.
• CLRS Ch 22 — Graph Algorithms: BFS/DFS for connected components — the "slow" alternative that Union-Find replaces for dynamic problems.
• Amortized Analysis (CLRS Ch 17): The formal framework behind the O(α(n)) bound. Understanding potential functions and aggregate analysis deepens your appreciation of why union by rank + path compression work so well together.

The Implementation Spectrum

In practice, there are several ways to implement Union-Find, depending on your needs:

Variant	When to use	Extra features
Rank + path compression	CLRS-standard, proofs	Optimal bound proven
Size + path compression	Interviews, contests	Component sizes for free
Rank + path halving	Iterative (no stack overflow)	Safe for n > 10⁶
Weighted quick-union	Sedgewick/Princeton style	Same as size variant
Rollback-capable	Offline/divide-and-conquer	Undo unions (no compression)

The rollback variant is worth noting: if you need to undo unions (for divide-and-conquer on queries), you cannot use path compression (it is destructive). Instead, use union by rank without compression, and maintain a stack of changes. Each union pushes the old parent/rank values; rollback pops and restores them. The cost is O(log n) per operation instead of O(α(n)), but the undo capability is essential for offline dynamic connectivity.

python
class RollbackUF:
    """Union-Find with undo capability. No path compression."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.history = []           # stack of (node, old_parent, old_rank)

    def find(self, x):
        while self.parent[x] != x:  # NO path compression!
            x = self.parent[x]
        return x

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a == b: return
        if self.rank[a] < self.rank[b]:
            a, b = b, a
        self.history.append((b, self.parent[b], self.rank[a]))
        self.parent[b] = a
        if self.rank[a] == self.rank[b]:
            self.rank[a] += 1

    def rollback(self):
        if not self.history: return
        b, old_parent, old_rank_a = self.history.pop()
        a = self.parent[b]
        self.rank[a] = old_rank_a
        self.parent[b] = old_parent

Union-Find is a masterclass in algorithm design. Two observations — "keep trees short" and "flatten paths as you go" — combine to produce a data structure that is asymptotically optimal and trivial to implement. Tarjan's 1975 analysis showed the bound is tight: no pointer-based structure can do better. And yet the implementation fits in 20 lines of Python.

Historical Context

The Union-Find data structure has a rich history in algorithm design:

Year	Contribution	Complexity
1964	Galler and Fischer: first forest-based union-find	Introduced the idea
1973	Hopcroft and Ullman: union by rank analysis	O(m log* n)
1975	Tarjan: tight bound using inverse Ackermann	O(m · α(n))
1984	Tarjan and van Leeuwen: simplified proof	Θ(m · α(n))
1989	Fredman and Saks: lower bound for cell-probe model	Ω(m · α(n))

The 1975 paper by Robert Tarjan is particularly notable. It introduced the use of the Ackermann function in algorithm analysis — a connection that seemed absurd at the time. A function that grows faster than any computable function in the "fast-growing hierarchy" appeared in the analysis of a 10-line algorithm. This connection between combinatorics and logic (the Ackermann function originated in mathematical logic as an example of a total computable function that is not primitive recursive) remains one of the most surprising results in theoretical computer science.

Disjoint Sets