Kruskal, Prim, Union-Find — connecting everything at minimum cost.
You are an engineer for a telecommunications company. Five towns need to be connected with fiber-optic cable. You have surveyed every possible route between every pair of towns and know the exact cost of laying cable along each route. Your budget is tight. You need every town to be reachable from every other town, but you want to spend as little money as possible on cable.
There are potentially ten routes between five towns (every pair). You do not need all ten. In fact, you know from graph theory that connecting five towns requires exactly four cables — one fewer than the number of towns. Any fewer and some town is disconnected. Any more and you are paying for a redundant connection.
But which four? There are many possible sets of four edges that keep all towns connected. Some cost much more than others. You want the cheapest set. This is the minimum spanning tree problem.
A spanning tree of a connected graph G = (V, E) is a subgraph that includes every vertex and is a tree — meaning it is connected and has no cycles. A spanning tree always has exactly |V| - 1 edges. Think of it as the skeleton of the graph: the minimum wiring to keep everything connected.
A minimum spanning tree (MST) is a spanning tree whose total edge weight is as small as possible. If all edge weights are distinct, the MST is unique. If some weights tie, there may be multiple MSTs, but they all have the same total weight.
The canvas below shows five cities with all possible connections. Each edge has a cost. Click "Find MST" to watch the minimum spanning tree emerge. Notice: we use only 4 of the 10 possible edges, and their total cost is the smallest possible.
Click "Find MST" to highlight the cheapest set of edges connecting all cities. Click "Reset" to try again.
A tempting first idea: sort all edges by weight and pick the four cheapest. But this can fail. The four cheapest edges might all connect the same three towns, leaving two towns stranded. Or the four cheapest might form a cycle, wasting a connection.
Let us see this concretely. Suppose five cities have these edge costs: A-B=1, A-C=1, B-C=2, C-D=5, D-E=6, A-D=7, B-E=8, A-E=9, C-E=10. The four cheapest edges are A-B(1), A-C(1), B-C(2), C-D(5). But A-B, A-C, and B-C form a triangle — a cycle. Adding all three wastes one edge on a redundant connection. D and E both end up disconnected or poorly served.
We need a strategy that is greedy — always picking cheap edges — but also careful — never creating a cycle, never leaving a town disconnected. This is exactly what Kruskal's and Prim's algorithms do, and the theoretical foundation for why they work is the cut property, which we will prove in Chapter 1.
Cayley's formula tells us a complete graph on n vertices has nn-2 spanning trees. For 5 vertices: 53 = 125 spanning trees. For 10 vertices: 108 = 100 million. For 20 vertices: 2018 ≈ 2.6 × 1023. Brute force is impossible even for small graphs. We need an algorithm that finds the minimum without examining all candidates. Greedy does this in O(E log E) time.
A tree on n vertices is a connected acyclic graph. The following statements are ALL equivalent (any one implies all the others):
| # | Property |
|---|---|
| 1 | T is connected and has no cycles |
| 2 | T is connected and has exactly n - 1 edges |
| 3 | T has no cycles and has exactly n - 1 edges |
| 4 | There is exactly one path between every pair of vertices |
| 5 | T is connected, but removing any edge disconnects it |
| 6 | T has no cycles, but adding any edge creates exactly one cycle |
Property 6 is crucial for MST algorithms. When Kruskal considers adding an edge and both endpoints are in the same component, that edge would create a cycle (property 6). When we add an edge to an MST for the exchange argument, it creates exactly one cycle (property 6 again). These properties are the hidden scaffolding behind every MST proof.
An MST only makes sense for weighted graphs — graphs where each edge has a numerical weight (cost, distance, bandwidth, etc.). For unweighted graphs, all spanning trees have the same "cost" (V - 1 edges), so any spanning tree is an MST. The interesting case is when weights differ.
Weights can be negative. The MST still makes sense — we minimize total weight, so negative edges are "free money" (they reduce cost). All MST algorithms work correctly with negative weights. This is different from shortest paths, where negative weights cause problems for Dijkstra (but not Bellman-Ford).
This is the single most common misconception, so let us nail it down with a concrete example before we go any further.
Consider three cities: A, B, C. Direct routes: A-B costs 1, B-C costs 1, A-C costs 3.
| Question | Answer | Edges Used | Total Cost |
|---|---|---|---|
| MST (cheapest to connect all) | Use A-B and B-C | A-B(1), B-C(1) | 2 |
| Shortest path A to C | Go A-B-C via MST | Path weight: 1+1 = 2 | 2 |
| Shortest-path tree from A | A-B(1), A-C(3) gives shortest individual paths | A-B(1), A-C(3) | 4 |
The MST total cost is 2. The shortest-path tree from A total cost is 4 — worse! But the shortest-path tree guarantees each individual path (A-to-B, A-to-C) is optimal. The MST does not make this guarantee. In this example, the MST path from A to C (A→B→C = 2) happens to be shorter than the direct route (A-C = 3). But that is a coincidence. With different weights, the MST path can be much longer than the shortest path.
Change the weights: A-B = 10, B-C = 10, A-C = 3. Now the MST is A-C(3) + B-C(10) = 13. The MST path from A to B is A→C→B = 13. But the shortest path from A to B is the direct edge A-B = 10. The MST path is 30% longer. MST minimizes total infrastructure cost. Shortest-path trees minimize individual route lengths. Different optimization objectives, different answers.
Before we learn any MST algorithm, we need to understand WHY greedy works here. Not all optimization problems yield to greedy strategies — knapsack, for instance, does not. The reason MST does is a beautiful theorem called the cut property.
A cut of a graph G = (V, E) is a partition of the vertices into two non-empty sets S and V \ S. Think of drawing a line across the graph, putting some vertices on one side and the rest on the other. Every edge with one endpoint in S and the other in V \ S is a crossing edge.
A cut respects a set of edges A if no edge in A crosses the cut. A crossing edge (u, v) is a light edge if its weight is the minimum among all crossing edges.
Let A be a subset of edges contained in some MST. Let (S, V \ S) be any cut that respects A. Let (u, v) be a light edge crossing this cut. Then (u, v) is safe for A — meaning A ∪ {(u, v)} is still contained in some MST.
In plain English: if you draw a line across the graph and no edge you have already chosen crosses that line, then the cheapest edge crossing that line is always a safe choice for the MST.
CLRS presents a beautiful abstraction: the generic MST algorithm. Both Kruskal and Prim are special cases of this single template.
The theorem guarantees that at every step, there EXISTS such a cut and such a light edge. The specific algorithms differ only in HOW they choose the cut:
| Algorithm | How It Chooses the Cut |
|---|---|
| Kruskal | S = the component containing one endpoint of the lightest remaining edge |
| Prim | S = the vertices already in the growing tree |
| Boruvka | S = each component independently (all cuts simultaneously) |
This unification is elegant because it separates the correctness proof (the cut property) from the efficiency analysis (which depends on data structures). Any algorithm that follows the generic template and uses the cut property correctly will produce an MST. The only question is how fast it can find the right cut and the light edge.
The proof is elegant and uses a technique you will see again and again in algorithm design: the exchange argument. Here is how it works.
Assume A ⊆ T for some MST T. Let (u, v) be a light edge crossing cut (S, V \ S) that respects A. We want to show A ∪ {(u, v)} is contained in some MST.
Case 1: (u, v) ∈ T. Then A ∪ {(u, v)} ⊆ T, and we are done.
Case 2: (u, v) ∉ T. Adding (u, v) to T creates exactly one cycle (because T is a tree, so adding any edge creates exactly one cycle). This cycle must contain another edge (x, y) that crosses the same cut (S, V \ S), because the cycle enters S through one crossing edge and must leave through another.
Since (u, v) is a light edge, w(u, v) ≤ w(x, y). Remove (x, y) from T and add (u, v). Call the result T'. Then:
Since T was an MST, w(T') ≥ w(T). Combined with w(T') ≤ w(T), we get w(T') = w(T). So T' is also an MST, and A ∪ {(u, v)} ⊆ T'. Done.
If the light edge crossing a cut is unique (no ties), then it is in EVERY MST. The proof is the same exchange argument: if some MST T does not contain it, we swap it in and get a strictly lighter tree (w(u,v) < w(x,y) since there is no tie), contradicting T being an MST. This is why distinct edge weights guarantee a unique MST.
There is a dual result: the cycle property. For any cycle C in the graph, the heaviest edge in C is NOT in any MST (assuming distinct weights). Proof: suppose the heaviest edge e = (u,v) in cycle C is in some MST T. Remove e from T, splitting it into two components S and V\S. The cycle C must contain another edge e' crossing this same cut (the cycle enters and leaves S). Since e is the heaviest edge in C, w(e') < w(e). But then T - {e} + {e'} is a lighter spanning tree, contradicting T being an MST.
Consider a graph with vertices {A, B, C, D} and edges: A-B(2), A-C(3), B-C(1), B-D(4), C-D(5).
Start with A = {} (no MST edges yet). Consider the cut ({A}, {B, C, D}). It respects A (trivially, since A is empty). Crossing edges: A-B(2), A-C(3). The light edge is A-B(2). By the cut property, A-B is safe. Add it: A = {A-B}.
Now consider the cut ({A, B}, {C, D}). Does it respect A? The only edge in A is A-B, which has both endpoints in {A, B}, so yes. Crossing edges: A-C(3), B-C(1), B-D(4), C-D(5). Wait — C-D has both endpoints in {C, D}, so it does not cross. Crossing edges: A-C(3), B-C(1), B-D(4). Light edge: B-C(1). Add it: A = {A-B, B-C}.
Now consider the cut ({A, B, C}, {D}). Respects A? A-B has both ends in the left set. B-C has both ends in the left set. Yes. Crossing edges: B-D(4), C-D(5). Light edge: B-D(4). Add it: A = {A-B, B-C, B-D}.
We have 3 = V-1 edges. MST complete. Total weight = 2+1+4 = 7. Each step was justified by the cut property applied to a different cut.
The canvas below shows a graph with a cut (orange dashed line). Vertices on each side are colored differently. All crossing edges are highlighted. The lightest crossing edge is marked. Try different cuts with the buttons to see how the light edge changes.
Click "Next Cut" to cycle through different cuts. The lightest crossing edge (safe to add) is shown in green.
Cut 1 of 4Kruskal's algorithm is beautifully simple. Sort all edges by weight. Go through them from lightest to heaviest. For each edge, if it connects two different components (does not create a cycle), add it to the MST. Otherwise, skip it. Stop when you have V - 1 edges.
When Kruskal considers edge (u, v), suppose u and v are in different components Cu and Cv. Define the cut as (Cu, V \ Cu). This cut respects all previously chosen MST edges (because no chosen edge crosses component boundaries — each component is a tree of chosen edges). Edge (u, v) crosses this cut. Because edges are sorted by weight, (u, v) is the lightest edge we have seen that crosses this cut. By the cut property, it is safe to add.
The bottleneck is sorting. Let us count operations precisely:
| Operation | Count | Cost Each | Total |
|---|---|---|---|
| Sort edges | 1 | O(E log E) | O(E log E) |
| MakeSet | V | O(1) | O(V) |
| Find (cycle check) | 2E (two per edge) | O(α(V)) | O(E α(V)) |
| Union | V - 1 | O(α(V)) | O(V α(V)) |
| Total | O(E log E) | ||
Since E ≤ V2 for simple graphs, log E ≤ 2 log V, so O(E log E) = O(E log V). Since E ≥ V - 1 for connected graphs, the sorting dominates all other costs.
Sometimes the edges come pre-sorted (from a Delaunay triangulation, for instance, or from reading a sorted file). In that case, Kruskal's sorting step is free, and the total runtime is O(E α(V)) — essentially linear. This makes Kruskal the fastest known algorithm for pre-sorted edge lists.
Claim: every edge Kruskal adds is in some MST. Every edge Kruskal skips is in no MST.
Edge added: When Kruskal adds edge (u, v), vertices u and v are in different components Cu and Cv. The cut (Cu, V \ Cu) respects all previously chosen edges (all chosen edges have both endpoints within the same component). Edge (u, v) is the lightest crossing edge (if there were a lighter crossing edge, Kruskal would have processed it first, since edges are sorted). By the cut property, (u, v) is safe.
Edge skipped: When Kruskal skips edge (u, v), both u and v are in the same component. The previously added edges form a tree within this component. Adding (u, v) would create a cycle. The cycle contains (u, v) and some subset of already-added edges. Since all previously added edges have weight ≤ w(u, v) (Kruskal processes in sorted order), (u, v) is the heaviest edge in the cycle. By the cycle property, (u, v) is in no MST.
This is a complete correctness proof: every edge is either added (correctly) or skipped (correctly). No edge is mislabeled.
Consider this graph with 6 vertices (A-F) and 9 edges:
| Edge | Weight | Action | Components After |
|---|---|---|---|
| A-B | 1 | Add (A≠B) | {A,B} {C} {D} {E} {F} |
| C-E | 2 | Add (C≠E) | {A,B} {C,E} {D} {F} |
| D-F | 2 | Add (D≠F) | {A,B} {C,E} {D,F} |
| A-C | 3 | Add (A≠C) | {A,B,C,E} {D,F} |
| B-C | 4 | Skip (B=C, same component) | {A,B,C,E} {D,F} |
| E-F | 4 | Add (E≠F) | {A,B,C,E,D,F} |
| Stop — 5 edges added (V-1 = 5). MST weight = 1+2+2+3+4 = 12. | |||
Click "Step" to add the next edge (if valid) or skip it (if it creates a cycle). Watch the Union-Find state below the graph.
Each step considers the next lightest edge. Green = added, red = skipped (would create cycle).
Ready. 0 / 5 edges in MST.python class UnionFind: def __init__(self, n): self.parent = list(range(n)) self.rank = [0] * n def find(self, x): if self.parent[x] != x: self.parent[x] = self.find(self.parent[x]) # path compression return self.parent[x] def union(self, x, y): rx, ry = self.find(x), self.find(y) if rx == ry: return False if self.rank[rx] < self.rank[ry]: rx, ry = ry, rx self.parent[ry] = rx # union by rank if self.rank[rx] == self.rank[ry]: self.rank[rx] += 1 return True def kruskal(n, edges): """n = number of vertices, edges = [(w, u, v), ...]""" edges.sort() # sort by weight uf = UnionFind(n) mst = [] for w, u, v in edges: if uf.union(u, v): mst.append((w, u, v)) if len(mst) == n - 1: break return mst
If the graph is not connected, no spanning tree exists (you cannot span all vertices with a tree if some are unreachable). Kruskal will add fewer than V-1 edges and stop. You can detect this: if len(mst) < n - 1 after processing all edges, the graph is disconnected. The edges Kruskal DID add form a minimum spanning forest — a collection of MSTs, one per connected component.
When multiple edges share the same weight, the sort order among them is arbitrary. Different orderings can produce different MSTs, but ALL will have the same total weight. This is because the multiset of edge weights in ANY MST of a graph is unique (even though the specific edges may differ). You can prove this using the matroid intersection theorem, but the intuition is: if you could get a lighter multiset, you would have a lighter MST.
While Kruskal thinks about edges ("which edge is cheapest globally?"), Prim thinks about vertices ("which vertex is cheapest to connect next?"). Prim grows a single tree from a starting vertex, always adding the cheapest edge that connects a tree-vertex to a non-tree-vertex.
At each step, the vertices already extracted form a set S. The vertices still in the queue form V \ S. This is a cut. The edges from S to V \ S have their cheapest tracked in the priority queue via key[v]. When we extract the minimum, we are choosing the light edge crossing the cut (S, V \ S). The cut respects the MST edges chosen so far (all chosen edges connect vertices within S). By the cut property, this light edge is safe.
Let us count operations for Prim with a binary heap:
| Operation | Count | Binary Heap Cost | Fibonacci Heap Cost |
|---|---|---|---|
| Insert | V | O(log V) | O(1) |
| Extract-Min | V | O(log V) | O(log V) amortized |
| Decrease-Key | E | O(log V) | O(1) amortized |
| Total | O((V+E) log V) | O(E + V log V) | |
For dense graphs where E = Θ(V2), Prim with a Fibonacci heap gives O(V2 + V log V) = O(V2), which beats Kruskal's O(V2 log V). For sparse graphs where E = O(V), Kruskal's O(V log V) matches Prim's O(V log V). In practice, binary heap Prim and Kruskal perform similarly because Fibonacci heaps have high constant factors.
Same graph as Kruskal (6 vertices A-F). Start from vertex A:
| Step | Extract | Key | Edge Added | Queue Updates |
|---|---|---|---|---|
| 1 | A (key=0) | 0 | — (start) | B:1, C:3 updated |
| 2 | B (key=1) | 1 | A-B (w=1) | C: min(3,4)=3, D:6 updated |
| 3 | C (key=3) | 3 | A-C (w=3) | E:2 updated, D: min(6,5)=5 |
| 4 | E (key=2) | 2 | C-E (w=2) | F:4 updated |
| 5 | F (key=4) | 4 | E-F (w=4) | D: min(5,2)=2 |
| 6 | D (key=2) | 2 | F-D (w=2) | Queue empty |
MST edges: A-B(1), A-C(3), C-E(2), E-F(4), F-D(2). Total = 12. Same total weight as Kruskal, but possibly different edges if weights tied.
This is the conceptual core of Prim. The key[v] array stores the cheapest single edge connecting v to any vertex already in the tree. It does NOT store the total path cost from the start vertex to v. This is the fundamental difference from Dijkstra.
When vertex u is extracted and we examine its neighbor v, we ask: "Is the direct edge (u,v) cheaper than v's current best connection to the tree?" If yes, update key[v] = w(u,v) and remember parent[v] = u. This is a comparison of single edge weights, not accumulated distances.
This means key[v] can decrease when we find a better single-edge connection, but it has nothing to do with how far v is from the starting vertex. A vertex at the "opposite end" of the graph might have key = 1 if there is a cheap edge connecting it to the tree.
Click "Step" to extract the next vertex and grow the tree. The frontier edges (candidates for next addition) are shown as dashed lines. The cheapest frontier edge is highlighted.
The tree grows from vertex A. Frontier edges shown dashed. Cheapest frontier edge in green.
Ready. Tree: {A}. 0 edges.python import heapq def prim(n, adj): """n = vertices, adj = {u: [(w, v), ...]} adjacency list""" in_mst = [False] * n key = [float('inf')] * n parent = [-1] * n key[0] = 0 pq = [(0, 0)] # (key, vertex) mst_edges = [] while pq: k, u = heapq.heappop(pq) if in_mst[u]: continue # already extracted in_mst[u] = True if parent[u] != -1: mst_edges.append((k, parent[u], u)) for w, v in adj[u]: if not in_mst[v] and w < key[v]: key[v] = w parent[v] = u heapq.heappush(pq, (w, v)) # lazy decrease-key return mst_edges
if in_mst[u]: continue line). This is the "lazy" approach. It uses O(E) space in the heap but keeps the code simple. The time complexity remains O(E log E) = O(E log V). In production C++ code, you would use an indexed priority queue for true O(E log V).python def prim_dense(n, weight): """weight[u][v] = edge weight (inf if no edge). O(V^2).""" in_mst = [False] * n key = [float('inf')] * n parent = [-1] * n key[0] = 0 for _ in range(n): # Find min-key vertex not yet in MST (linear scan) u = -1 for v in range(n): if not in_mst[v] and (u == -1 or key[v] < key[u]): u = v in_mst[u] = True # Relax all neighbors for v in range(n): if not in_mst[v] and weight[u][v] < key[v]: key[v] = weight[u][v] parent[v] = u # Reconstruct MST edges mst = [(key[v], parent[v], v) for v in range(1, n)] return mst
If you have already studied Dijkstra's shortest-path algorithm, Prim will look eerily familiar. Both algorithms:
The difference is in step 3. Prim asks: "Is the edge weight w(u,v) less than key[v]?" Dijkstra asks: "Is the path distance d[u] + w(u,v) less than d[v]?" Prim cares about the cheapest single edge. Dijkstra cares about the cheapest total path. One builds an MST. The other builds a shortest-path tree. Same skeleton, different bone density.
Prim starts from an arbitrary vertex. Different starting vertices may produce different MSTs (when weights tie), but they always produce the same total weight. If all edge weights are distinct, the MST is unique regardless of starting vertex. This is because the MST is a property of the graph, not the algorithm.
At each step, Prim maintains a tree T rooted at the start vertex. The cut (V(T), V \ V(T)) respects T (all edges of T have both endpoints in V(T)). The vertex extracted from the priority queue is connected to V(T) by the lightest crossing edge (that is what key[v] stores). By the cut property, this edge is safe.
The loop invariant: at the start of each iteration, the set of edges connecting extracted vertices forms a tree that is a subset of some MST. Since we add V-1 edges total and each is safe, the final result is an MST. This is the same invariant as the generic MST algorithm, just instantiated with the "growing tree" cut.
| Algorithm | Graph Storage | Extra Space | Total |
|---|---|---|---|
| Kruskal | O(E) edge list | O(V) Union-Find | O(V + E) |
| Prim (heap) | O(V + E) adj list | O(V) key/parent + O(E) lazy heap | O(V + E) |
| Prim (dense) | O(V2) adj matrix | O(V) key/parent | O(V2) |
Kruskal requires the edges as a list (for sorting). Prim requires an adjacency list or matrix (for neighbor iteration). If you are given one format and need the other, the conversion is O(V + E). In interviews, mention which format your algorithm expects and whether the input needs conversion.
Both algorithms produce optimal MSTs. They differ in perspective, data structures, and performance characteristics. Knowing when to use which is an interview favorite.
| Property | Kruskal | Prim |
|---|---|---|
| Perspective | Edge-centric ("cheapest edge globally") | Vertex-centric ("cheapest to connect next") |
| Growth pattern | Forest of trees that merge | Single tree that grows |
| Data structure | Union-Find | Min-priority queue |
| Time (binary heap) | O(E log E) | O((V+E) log V) |
| Time (Fibonacci heap) | O(E log E) | O(E + V log V) |
| Best for sparse | Yes — sort dominates, E is small | Equal |
| Best for dense | Slower — E ≈ V2, sorting V2 edges | Yes — with Fib heap, O(V2) |
| Needs edge list | Yes (natural input format) | No (adjacency list) |
| Parallelizable | Somewhat — sort in parallel | Harder — sequential extract-min |
| Implementation | Simpler (Union-Find is short) | Slightly more complex |
The canvas below runs both algorithms on the same graph simultaneously. Watch how Kruskal processes edges globally while Prim grows a tree locally. Same result, different journey.
Click "Step Both" to advance both algorithms. Left = Kruskal, Right = Prim.
Ready.If all edge weights are distinct, both algorithms produce the same unique MST. When weights tie, they may produce different MSTs (same total weight, different edge sets). Kruskal's tie-breaking depends on sort stability and input order. Prim's depends on starting vertex and tie-breaking in the priority queue.
To see the fundamental difference, let us trace both algorithms on the same graph with 4 vertices {A,B,C,D} and edges AB(1), AC(4), AD(3), BC(2), BD(5), CD(6).
Kruskal's order (by edge weight): AB(1)→BC(2)→AD(3)→AC(4)→BD(5)→CD(6)
| Edge | Weight | Find(u), Find(v) | Action | Components |
|---|---|---|---|---|
| A-B | 1 | A, B | Add | {A,B} {C} {D} |
| B-C | 2 | A, C (A is root of B) | Add | {A,B,C} {D} |
| A-D | 3 | A, D | Add | {A,B,C,D} |
| Done. MST = {AB(1), BC(2), AD(3)}. Weight = 6. | ||||
Prim's order (starting from A, by key value):
| Extract | Key | Edge Added | Queue After Updates |
|---|---|---|---|
| A | 0 | — | B:1, C:4, D:3 |
| B | 1 | A-B(1) | C: min(4,2)=2, D: min(3,5)=3 |
| C | 2 | B-C(2) | D: min(3,6)=3 |
| D | 3 | A-D(3) | (empty) |
Same MST! Both found {AB(1), BC(2), AD(3)} with weight 6. But look at the processing order: Kruskal processed edges globally sorted. Prim grew from A outward, processing vertices in order of their connection cost.
Let us compare the two algorithms on concrete graph sizes to build intuition for when each is faster:
| Graph Type | V | E | Kruskal O(E log E) | Prim-heap O((V+E)log V) | Prim-dense O(V2) | Winner |
|---|---|---|---|---|---|---|
| Sparse tree-like | 1000 | 2000 | ~22K | ~30K | 1M | Kruskal |
| Medium density | 1000 | 50K | ~780K | ~510K | 1M | Prim-heap |
| Dense (complete) | 1000 | 500K | ~9.5M | ~5M | 1M | Prim-dense |
| Huge sparse | 1M | 3M | ~65M | ~60M | 1012 | Kruskal |
The pattern: Kruskal wins for sparse graphs. Prim-heap wins for medium density. Prim-dense (no heap) wins for complete or near-complete graphs. Knowing all three gives you the right tool for every graph density.
Kruskal needs to answer one question fast: "Are u and v in the same component?" If they are, the edge creates a cycle and we skip it. If not, we add the edge and merge the components. The data structure that does this nearly instantaneously is Union-Find (also called the disjoint-set data structure).
MakeSet(x): Create a new set containing just x. Each element starts as its own singleton. Think: every town is its own island.
Find(x): Return the representative (root) of the set containing x. Two elements are in the same set if and only if they have the same representative. Think: "Who is the mayor of x's island?"
Union(x, y): Merge the sets containing x and y into one. Think: build a bridge between two islands, making them one.
The simplest version: each element has a parent pointer. The root of each tree is the representative (its own parent). Find follows parent pointers to the root. Union makes one root point to the other.
Problem: without care, the trees become tall chains. If you Union 1→2→3→4→...→n, then Find(1) takes O(n) time. After m operations, total cost is O(mn). Terrible.
Keep a rank for each root (roughly, the height of the tree). When unioning, make the shorter tree point to the taller tree. This guarantees tree height is at most O(log n).
During Find(x), after we walk up to the root, make every node on the path point directly to the root. Next time we call Find on any of those nodes, it returns in O(1).
With both union by rank and path compression, any sequence of m operations on n elements takes O(m · α(n)) time, where α is the inverse Ackermann function. This function grows so slowly that α(n) ≤ 4 for any n up to 265536. For all practical purposes, each operation is O(1).
Union by rank alone gives O(log n) per operation. Path compression alone (without union by rank) gives O(log n) amortized per operation. But together they achieve O(α(n)) amortized — better than either alone. The reason is subtle: path compression flattens the tree, making future finds faster, while union by rank prevents the tree from getting deep in the first place. Rank limits the "damage" and compression repairs it.
| Optimizations | Find Time | m operations on n elements |
|---|---|---|
| Neither | O(n) worst case | O(mn) |
| Union by rank only | O(log n) | O(m log n) |
| Path compression only | O(log n) amortized | O(m log n) |
| Both | O(α(n)) amortized | O(m α(n)) |
An alternative to union by rank is union by size: always attach the smaller tree under the larger tree. Both give O(log n) height guarantees, and both combine with path compression to give O(α(n)). Union by size is sometimes preferred because the size is always accurate (ranks become stale after path compression, though this does not affect correctness). In interviews, either is fine.
python # Union by size variant class UnionFindSize: def __init__(self, n): self.parent = list(range(n)) self.size = [1] * n # size instead of rank def find(self, x): if self.parent[x] != x: self.parent[x] = self.find(self.parent[x]) return self.parent[x] def union(self, x, y): rx, ry = self.find(x), self.find(y) if rx == ry: return False if self.size[rx] < self.size[ry]: rx, ry = ry, rx self.parent[ry] = rx self.size[rx] += self.size[ry] # size is always accurate return True
Click two nodes to Union them. Watch the trees merge. Notice how path compression flattens the tree after each Find.
Click a node to select it (yellow). Click another to Union them. Press "Find" to run Find on the selected node and see path compression.
Click a node to select it.python class UnionFind: def __init__(self, n): self.parent = list(range(n)) # each element is its own root self.rank = [0] * n # rank starts at 0 self.count = n # number of disjoint sets def find(self, x): # Path compression: make every node point to root if self.parent[x] != x: self.parent[x] = self.find(self.parent[x]) return self.parent[x] def union(self, x, y): rx, ry = self.find(x), self.find(y) if rx == ry: return False # already in same set # Union by rank: attach shorter tree under taller if self.rank[rx] < self.rank[ry]: rx, ry = ry, rx self.parent[ry] = rx if self.rank[rx] == self.rank[ry]: self.rank[rx] += 1 self.count -= 1 return True def connected(self, x, y): return self.find(x) == self.find(y)
Sometimes you need not just "are x and y connected?" but "what is the weight/distance/relation between x and y within the component?" This is the weighted Union-Find, where each node stores a weight relative to its parent. Find returns both the root and the accumulated weight from x to root. Union adjusts weights so the invariant is maintained.
python class WeightedUnionFind: def __init__(self, n): self.parent = list(range(n)) self.rank = [0] * n self.weight = [0] * n # weight[x] = weight from x to parent[x] def find(self, x): if self.parent[x] == x: return x, 0 root, w = self.find(self.parent[x]) self.parent[x] = root # path compression self.weight[x] += w # accumulate weight to root return root, self.weight[x] def union(self, x, y, w): """Assert weight(y) - weight(x) = w""" rx, wx = self.find(x) ry, wy = self.find(y) if rx == ry: return wy - wx == w # consistency check # weight[ry] = wx + w - wy self.weight[ry] = wx + w - wy self.parent[ry] = rx return True
This appears in problems like "evaluate division" (LeetCode 399), where edges have multiplicative weights and you need to compute the product along paths.
The recursive Find can overflow the stack for very deep trees (before compression flattens them). Here is the iterative version:
python def find_iterative(self, x): # Phase 1: walk to root root = x while self.parent[root] != root: root = self.parent[root] # Phase 2: compress — point everything to root while self.parent[x] != root: nxt = self.parent[x] self.parent[x] = root x = nxt return root
Same asymptotic complexity. Two passes instead of one. No recursion. Use this in production code or when n can exceed the recursion limit (Python defaults to 1000).
Instead of full path compression (all nodes point to root), path splitting makes each node point to its grandparent during Find. Simpler, no second pass, and still achieves O(α(n)) amortized:
python def find_splitting(self, x): while self.parent[x] != x: self.parent[x] = self.parent[self.parent[x]] # skip to grandparent x = self.parent[x] return x
This is the version used in many competitive programming implementations because it is one line of compression with no need for a separate root-finding pass.
What if you need to answer queries about component structure at different points in time? For example: "After adding the first k edges, how many components are there?" You can use offline processing: sort queries by k, process edges in order, answer queries as you go. For fully persistent Union-Find (query any past state), use a persistent array with O(log n) overhead per operation.
Why does union by rank guarantee O(log n) height? Each time a node's depth increases by 1, its subtree size at least doubles (because it was unioned with a subtree of equal or larger size). A subtree of n elements can double at most log2(n) times before it contains all elements. Therefore, maximum height = log2(n).
The proof that path compression + union by rank gives O(α(n)) amortized is one of the most beautiful results in algorithm analysis. It uses a potential function that assigns a "bank account" to each node based on its rank relative to its parent's rank. Each Find pays for the compression work by spending saved potential. The inverse Ackermann function α emerges from counting how many "levels" of rank compression are needed. The full proof is in CLRS Chapter 21 and is worth reading for anyone interested in amortized analysis.
This is the payoff. Below is a fully interactive graph editor. Place vertices, draw weighted edges, then run Kruskal or Prim with step-by-step animation. Modify the graph and see how the MST changes.
Build: Click empty space to place a vertex. Click vertex then vertex to add an edge (enter weight in prompt). Run: Choose algorithm and click "Run" to animate MST construction. Delete: Right-click a vertex or edge to remove it.
Adding vertices: Click on empty space. Each vertex gets a sequential number (0, 1, 2, ...). You can place up to 15 vertices.
Adding edges: Click on a vertex to select it (turns yellow). Click on another vertex. A prompt appears asking for the edge weight. Enter a number between 1 and 99. The edge appears between the two vertices with the weight label.
Deleting: Right-click on a vertex to remove it and all its edges. Right-click near an edge to remove just that edge. This lets you experiment with how the MST changes when the graph structure changes.
Running algorithms: Select Kruskal or Prim from the dropdown. Click "Run" for automatic animation or "Step" to advance one step at a time. Watch how the algorithms process edges differently — Kruskal considers edges globally by weight, while Prim grows a tree from vertex 0.
"Random Graph" generates 6-8 vertices with a connected spanning tree plus extra random edges. The edge weights are proportional to Euclidean distance. This gives you a realistic graph to test on without manual construction.
Try building a graph where Kruskal and Prim process edges in very different orders but end up with the same MST. Then try building one where they produce different MSTs (same total weight). The key: tie-breaking. With distinct weights, both always produce the identical MST.
Build a "star" graph (one central vertex connected to all others) and observe that the MST is the star itself — every edge is a bridge (removing it disconnects the graph), so every edge must be in any spanning tree. Build a "ring" graph (cycle) and observe that the MST removes the single heaviest edge — the cycle property in action.
Build a complete graph on 4 vertices (K4) with weights 1, 2, 3, 4, 5, 6 and predict which 3 edges the MST will use before running the algorithm. The answer: the three lightest edges (1, 2, 3) IF they do not form a cycle. K4 has vertices {0,1,2,3} and 6 edges. Assign weight 1 to edge 0-1, weight 2 to edge 0-2, weight 3 to edge 1-2, weight 4 to edge 0-3, weight 5 to edge 1-3, weight 6 to edge 2-3. Kruskal adds 0-1(1), 0-2(2), then skips 1-2(3) because it forms a cycle, then adds 0-3(4). MST weight = 1+2+4 = 7.
| Test Case | Expected Behavior | What It Tests |
|---|---|---|
| Single vertex, no edges | Empty MST, weight 0 | Base case handling |
| Two vertices, one edge | That edge is the MST | Minimal case |
| Disconnected graph | MST impossible (or forest) | Connectivity check |
| All equal weights | Any spanning tree is MST | Tie-breaking |
| Negative weights | Negative edges always in MST | Negative weight handling |
| Self-loops | Always skipped (cannot be in tree) | Input validation |
| Parallel edges (multigraph) | Keep only lightest per pair | Duplicate handling |
MSTs are not just textbook exercises. They appear in network design, clustering, approximation algorithms, and even image segmentation. Here are the most important applications.
The original motivation: connecting cities with minimum cable. This generalizes to any infrastructure network — roads, power lines, water pipes, computer networks. The vertices are locations, edges are possible connections, weights are construction costs. The MST gives the cheapest network that connects everything.
In practice, real networks add redundancy (you do not want a single cable failure to partition your network). But the MST is the starting point: it tells you the minimum cost of connectivity, and any additional edges you add for redundancy cost extra.
Here is a beautiful idea: to partition n points into k clusters, compute the MST and delete the k-1 heaviest edges. The remaining k connected components are your clusters.
Why does this work? The MST connects nearby points with cheap edges and distant points with expensive edges. Deleting the most expensive edges breaks the graph at its natural "seams" — the largest gaps between clusters.
python def mst_cluster(points, k): """Cluster n points into k groups using MST.""" n = len(points) # Build complete graph edges edges = [] for i in range(n): for j in range(i + 1, n): dx = points[i][0] - points[j][0] dy = points[i][1] - points[j][1] edges.append(((dx*dx + dy*dy) ** 0.5, i, j)) # Find MST via Kruskal mst_edges = kruskal(n, edges) # Sort MST edges by weight (descending) mst_edges.sort(reverse=True) # Remove k-1 heaviest edges uf = UnionFind(n) for w, u, v in mst_edges[k - 1:]: uf.union(u, v) # Return cluster labels labels = [uf.find(i) for i in range(n)] return labels
The spacing of a k-clustering is the minimum distance between any two points in different clusters. The MST-based algorithm maximizes spacing: the k-1 edges we remove are the heaviest in the MST, so the minimum inter-cluster distance (= the (k-1)-th heaviest MST edge) is as large as possible. CLRS proves this optimality in Exercise 23.2-8.
We want to show: the MST-based k-clustering maximizes spacing over all possible k-clusterings.
Let C* be the MST-based clustering (remove k-1 heaviest MST edges). Let d* be its spacing (the weight of the (k-1)-th heaviest MST edge). Let C be any other k-clustering. We will show spacing(C) ≤ d*.
Since C* and C both have k clusters but the underlying vertex set is the same, there must be two points p, q that are in the same cluster in C* but in different clusters in C. Consider the MST path from p to q. All edges on this path have weight ≤ d* (because the path is within a single component of C*, and we only removed edges with weight > d*).
Since p and q are in different clusters in C, the MST path from p to q must cross a cluster boundary in C at some edge (u, v). This edge has weight ≤ d*. Therefore, points u and v are in different clusters in C with distance ≤ d*. So spacing(C) ≤ d*. QED.
The canvas shows scattered points. The MST connects them all. Use the slider to remove the heaviest edges and watch clusters emerge.
Slide to remove heavy edges. Colors show clusters.
The Travelling Salesman Problem (TSP) is NP-hard: find the shortest tour visiting every city exactly once. MSTs give a 2-approximation: compute the MST, do a DFS preorder walk, and the resulting tour is at most twice the optimal. This works because the MST weight is a lower bound on the optimal tour (any tour minus one edge is a spanning tree).
Felzenszwalb's algorithm (2004) uses a variant of MST for image segmentation. Treat each pixel as a vertex and neighboring pixels as edges weighted by color difference. Run a modified Kruskal: merge two components if the cheapest edge between them is "internal" (its weight is less than the minimum internal difference of either component). The result is a set of segments that correspond to visually coherent regions.
There is a third MST algorithm we have not covered: Boruvka's algorithm (1926, predates both Kruskal and Prim). Each component simultaneously picks its cheapest outgoing edge. All these edges are added in one round. Repeat until one component remains. Runtime: O(E log V). Boruvka is highly parallelizable — each round can be done in parallel — and is the basis for modern parallel MST algorithms.
A popular way to generate random mazes: create a grid graph where each cell is a vertex and walls are edges. Assign random weights to all walls. Compute the MST. The MST edges become passages (walls removed). The result is a perfect maze — exactly one path between any two cells, no loops, no unreachable areas. Different random weights produce different mazes. This works because the MST is a spanning tree: connected (you can reach any cell) and acyclic (no loops, so every path is unique).
The 2-approximation via MST can be improved. Christofides' algorithm (1976) gives a 1.5-approximation for metric TSP:
This remained the best polynomial-time approximation for metric TSP for 45 years until Karlin, Klein, and Gharan (2020) improved it by an epsilon. MST is the foundation of both algorithms.
Consider 4 cities in a square: A(0,0), B(1,0), C(1,1), D(0,1). All six edge weights are the Euclidean distances: AB=1, BC=1, CD=1, DA=1, AC=√2≈1.41, BD=√2≈1.41.
MST: pick edges AB(1), BC(1), CD(1). Total = 3. This is the cheapest way to connect all cities.
Optimal TSP tour: A→B→C→D→A. Total = 1+1+1+1 = 4.
The MST-based 2-approximation: double the MST edges (A-B, B-C, C-D, C-B, B-A, A-B... wait, that creates a multigraph). Walk the doubled MST in DFS preorder: A→B→C→D. Shortcut back: D→A. Tour = 1+1+1+1 = 4. In this case, the approximation gives the optimal tour.
The guarantee: MST weight (3) ≤ OPT (4) ≤ 2 × MST weight (6). The actual tour is 4, well within the bound. For some pathological cases the tour can approach 2 × MST, but never exceed it.
MST problems are interview favorites because they test graph fundamentals, greedy reasoning, and Union-Find implementation all at once. Here is your cheat sheet and drill set.
| Algorithm | Time | Space | Data Structure | Best For |
|---|---|---|---|---|
| Kruskal | O(E log E) | O(V + E) | Union-Find | Sparse graphs, edge lists |
| Prim (binary) | O((V+E) log V) | O(V + E) | Binary heap | Dense with adj list |
| Prim (Fibonacci) | O(E + V log V) | O(V + E) | Fibonacci heap | Very dense graphs |
| Boruvka | O(E log V) | O(V + E) | Union-Find | Parallel MST |
LeetCode 1584. Given n points in 2D, connect all points with minimum total Manhattan distance. Every pair is a potential edge.
python def minCostConnectPoints(points): n = len(points) # Build edge list: O(n^2) edges edges = [] for i in range(n): for j in range(i + 1, n): d = abs(points[i][0] - points[j][0]) + abs(points[i][1] - points[j][1]) edges.append((d, i, j)) edges.sort() # Kruskal uf = UnionFind(n) total = 0 count = 0 for w, u, v in edges: if uf.union(u, v): total += w count += 1 if count == n - 1: break return total
LeetCode 1489. An edge is critical if removing it increases MST weight. An edge is pseudo-critical if it appears in some (but not all) MSTs.
python def findCriticalAndPseudoCritical(n, edges): # Add original index to each edge edges_idx = [(w, u, v, i) for i, (u, v, w) in enumerate(edges)] edges_idx.sort() base_weight = kruskal_weight(n, edges_idx) critical, pseudo = [], [] for i in range(len(edges_idx)): # Test critical: remove edge i, check if MST weight increases w_without = kruskal_weight(n, edges_idx[:i] + edges_idx[i+1:]) if w_without > base_weight: critical.append(edges_idx[i][3]) else: # Test pseudo-critical: force-include edge i, check if MST weight equals base w_with = kruskal_weight_force(n, edges_idx, i) if w_with == base_weight: pseudo.append(edges_idx[i][3]) return [critical, pseudo]
Common variant: "Find MST but edge (a, b) must be included." Force-include the edge first (Union a and b), then run Kruskal on the rest. If the result is disconnected, it is impossible.
python def mst_with_forced_edge(n, edges, forced_u, forced_v, forced_w): """MST that must include edge (forced_u, forced_v).""" uf = UnionFind(n) uf.union(forced_u, forced_v) # force-include this edge mst = [(forced_w, forced_u, forced_v)] edges.sort() for w, u, v in edges: if (u == forced_u and v == forced_v) or \ (u == forced_v and v == forced_u): continue # skip the forced edge (already added) if uf.union(u, v): mst.append((w, u, v)) if len(mst) == n - 1: break return mst
LeetCode 684. Given a tree with one extra edge (n vertices, n edges), find the edge that, when removed, leaves a tree. This is the edge that creates the cycle.
python def findRedundantConnection(edges): """edges = [[u, v], ...], 1-indexed""" n = len(edges) uf = UnionFind(n + 1) # 1-indexed for u, v in edges: if not uf.union(u, v): return [u, v] # this edge closes a cycle
LeetCode 1319. Given n computers and a list of connections, find the minimum number of operations to connect all computers. An operation moves one cable from an existing connection to a new one.
python def makeConnected(n, connections): if len(connections) < n - 1: return -1 # not enough cables uf = UnionFind(n) for u, v in connections: uf.union(u, v) return uf.count - 1 # need (components - 1) operations
LeetCode 721. Each account is a name + list of emails. Two accounts belong to the same person if they share any email. Merge all accounts by person. This is a Union-Find problem in disguise: emails are vertices, accounts define edges (all emails in one account are connected).
python from collections import defaultdict def accountsMerge(accounts): uf = UnionFind(len(accounts) * 10) # generous upper bound email_to_id = {} email_to_name = {} uid = 0 for account in accounts: name = account[0] for email in account[1:]: if email not in email_to_id: email_to_id[email] = uid uid += 1 email_to_name[email] = name uf.union(email_to_id[account[1]], email_to_id[email]) # Group emails by root groups = defaultdict(list) for email, eid in email_to_id.items(): groups[uf.find(eid)].append(email) return [[email_to_name[emails[0]]] + sorted(emails) for emails in groups.values()]
For completeness, here is Boruvka implemented. It is rarely asked in interviews but understanding it deepens your MST knowledge and it has the best parallel complexity.
python def boruvka(n, edges): """n = vertices, edges = [(w, u, v), ...]""" uf = UnionFind(n) mst = [] num_components = n while num_components > 1: # For each component, find cheapest outgoing edge cheapest = [None] * n # cheapest[root] = (w, u, v) for w, u, v in edges: ru, rv = uf.find(u), uf.find(v) if ru == rv: continue # same component if cheapest[ru] is None or w < cheapest[ru][0]: cheapest[ru] = (w, u, v) if cheapest[rv] is None or w < cheapest[rv][0]: cheapest[rv] = (w, u, v) # Add all cheapest edges (avoid duplicates) for i in range(n): if cheapest[i] is not None: w, u, v = cheapest[i] if uf.union(u, v): mst.append((w, u, v)) num_components -= 1 return mst
| Mistake | Why It's Wrong | How to Avoid |
|---|---|---|
| Confusing MST with shortest-path tree | MST minimizes total weight, not individual paths | Trace both on a small example; show they differ |
| Forgetting path compression in Union-Find | Correct but slow — O(n log n) instead of O(n α(n)) | Always write the two-line path compression in Find |
| Using DFS for cycle detection instead of Union-Find | Works but O(V+E) per edge check = O(VE) total | Union-Find gives O(α(n)) per check = O(E α(n)) total |
| Assuming MST is unique | Only unique when all edge weights are distinct | Note: same total weight always, but edges may differ |
| Running Prim on disconnected graph | Prim finds MST of one component only | Check if all vertices were extracted; if not, run Prim again from an unvisited vertex |
| Off-by-one: V edges instead of V-1 | Trees have V-1 edges, not V | Stop Kruskal at V-1 edges; check count before returning |
Beyond the standard MST, interviewers love these twists:
Degree-constrained MST: No vertex can have more than d edges. This is NP-hard for d ≥ 2. In interviews, mention it as a limitation of standard MST.
Euclidean MST: Points in 2D/3D, edges are Euclidean distances. The complete graph has O(n2) edges, but the Euclidean MST can be found in O(n log n) using the Delaunay triangulation (MST is a subgraph of the Delaunay triangulation, which has O(n) edges). This is the optimal approach for LeetCode 1584 when n is large.
Online MST: Edges arrive one at a time. Maintain MST dynamically. When a new edge arrives, add it; if it creates a cycle, remove the heaviest edge in the cycle. This maintains the MST invariant. With a link-cut tree, each update is O(log n).
These are properties you should be able to state and justify in an interview:
| Property | Statement | Proof Idea |
|---|---|---|
| Uniqueness | Distinct weights ⇒ unique MST | Cut property: light edge is unique, so every safe edge is forced |
| Weight multiset | All MSTs have the same sorted weight sequence | Matroid theory / exchange argument |
| Cut property | Light edge crossing a respecting cut is safe | Exchange: swap it in, swap out a heavier crossing edge |
| Cycle property | Heaviest edge in a cycle is not in MST (distinct weights) | Exchange: remove it, add a lighter cycle edge |
| Bottleneck | MST minimizes max-weight edge on paths | Any non-MST edge is heaviest in its fundamental cycle |
| V-1 edges | Every MST has exactly V-1 edges | It is a tree (connected + acyclic) |
| Adding non-MST edge | Creates exactly one cycle | Tree + one edge = exactly one cycle (tree property) |
| Max-weight edge | The max-weight MST edge = (k-1)th heaviest for k=2 clusters | Clustering interpretation: MST gives optimal 2-clustering |
When your MST code gives the wrong answer, check these in order:
if in_mst[u]: continue causes double-counting.| Signal in Problem | Algorithm / Approach |
|---|---|
| "Connect all nodes with minimum cost" | Direct MST (Kruskal or Prim) |
| "Minimum cost to make graph connected" | MST on components |
| "Redundant connection" / "find the extra edge" | Union-Find: edge that closes a cycle |
| "Number of connected components" | Union-Find (or BFS/DFS) |
| "Group into k clusters minimizing max inter-cluster distance" | MST + remove k-1 heaviest edges |
| "Can we connect all with cost ≤ budget?" | MST weight ≤ budget |
| "Which edges are critical / must-use?" | Try removing each edge, check MST weight |
When should you use which algorithm? Here is a decision tree for interviews:
This canvas shows a fixed graph. Drag the slider to change one edge's weight and watch the MST change in real time. See how a single weight change can restructure the entire MST.
Slide to change the weight of edge A-D. Watch which edges are in the MST.
| Dimension | What They Ask | Staff-Level Answer |
|---|---|---|
| Concept | "Prove Kruskal is correct" | Cut property + exchange argument. Show which cut Kruskal uses at each step. |
| Design | "Design a network optimizer for 1M nodes" | Kruskal with external sort + batched Union-Find. Discuss I/O complexity. |
| Code | "Implement Kruskal from scratch" | Union-Find with both optimizations, clean edge list, handle disconnected graphs. |
| Debug | "MST returns wrong weight" | Check: graph connected? Weights correct? Union-Find find() has path compression? Off-by-one in edge count? |
| Frontier | "How would you MST on a billion-node graph?" | Boruvka for parallelism. External-memory MST. Approximate MST via random sampling (Karger-Klein-Tarjan). |
MSTs sit at a crossroads of graph theory, greedy algorithms, and data structures. Here is how this chapter connects to the broader algorithmic landscape.
| Topic | Relationship to MST | Link |
|---|---|---|
| Graph Algorithms (Ch 22) | MST requires graph representation (adjacency lists) and graph traversal (BFS/DFS for connectivity). MST IS a graph algorithm. | Ch 22 lesson |
| Greedy (Ch 16) | MST algorithms are the canonical examples of greedy correctness. The cut property is the greedy-choice property for MST. Activity selection is to scheduling as Kruskal is to MST. | Ch 16 lesson |
| Shortest Paths (Ch 24) | Prim looks like Dijkstra with one key difference: Prim's key[v] = min edge weight to tree, Dijkstra's d[v] = total distance from source. Same algorithm structure, different relaxation. MST ≠ shortest-path tree. | Ch 24 lesson |
| Heaps (Ch 6) | Prim's efficiency depends on the priority queue. Binary heap gives O((V+E) log V). Fibonacci heap gives O(E + V log V). Understanding heaps is prerequisite to understanding Prim's complexity. | Ch 6 lesson |
This is the number one source of confusion and the most common interview follow-up. Consider a triangle with vertices A, B, C and edges A-B(1), B-C(1), A-C(3). The MST uses A-B(1) and B-C(1), total weight 2. The shortest path from A to C in the original graph is A-C(3), weight 3. But in the MST, the path from A to C goes A→B→C with weight 2. Sometimes the MST path is shorter, sometimes longer. The MST minimizes total wire, not individual paths.
Here is an important property that IS true: the MST path between any two vertices minimizes the bottleneck — the maximum edge weight along the path. The MST is a minimum bottleneck spanning tree. So while the MST path from u to v might be longer than the shortest path in total weight, the heaviest single edge on the MST path is as light as possible. This is why MSTs are useful in communication networks where the bottleneck link determines throughput.
if w(u,v) < key[v] (compare edge weight). Dijkstra: if d[u] + w(u,v) < d[v] (compare path length). This single difference — edge weight vs cumulative distance — produces fundamentally different trees.Matroid theory: MST can be generalized using matroids — a mathematical structure that captures exactly when greedy works. A matroid is a pair (S, I) where S is a finite set and I is a collection of "independent" subsets satisfying: (1) the empty set is independent, (2) subsets of independent sets are independent, (3) if A and B are independent and |A| < |B|, then some element of B can be added to A to keep it independent. The graphic matroid (where S = edges and independent sets = forests) explains MST. The greedy algorithm for matroids is exactly Kruskal's algorithm generalized.
Steiner trees: MST connects ALL vertices. What if you only need to connect a subset? This is the Steiner tree problem, and it is NP-hard. MST gives a 2-approximation for Steiner trees in general graphs. The problem arises in VLSI chip design (connect specific pins), network design (connect specific offices), and phylogenetics (connect specific species).
Dynamic MST: What if edges are added or deleted over time? Maintaining the MST dynamically is an active research area. The best known algorithms achieve O(log4 n) per update (Holm et al., 2001). For the insertion-only case (edges only added, never deleted), you can maintain the MST in O(α(n)) amortized per edge using the cycle property: when adding edge e, if it connects two components, add it. If it creates a cycle, remove the heaviest edge in the cycle (which might be e itself).
Randomized MST in O(V + E): Karger, Klein, and Tarjan (1995) showed that MST can be computed in expected O(V + E) time using random sampling. The algorithm: (1) sample edges with probability 1/2, (2) recursively find MST of the sample, (3) use the sample MST to identify "heavy" edges in the full graph that cannot be in the MST (an edge is heavy if it is the heaviest edge in its fundamental cycle with respect to the sample MST), (4) remove heavy edges and recurse. This is theoretically optimal but rarely used in practice due to high constant factors.
Minimum bottleneck spanning tree: Minimize the maximum-weight edge (rather than total weight). It turns out every MST is a minimum bottleneck spanning tree (but not vice versa). So computing the MST solves the bottleneck problem for free. This arises in network design where you want to minimize the worst link, not the total cost.
Second-best MST: Find the spanning tree with the smallest total weight that is NOT the MST. The algorithm: for each non-MST edge (u,v), find the heaviest edge on the MST path from u to v (using LCA queries in O(log V) after O(V) preprocessing). The second-best MST is obtained by swapping in the non-MST edge that gives the smallest increase: min over all non-MST edges of (w(u,v) - max-weight-on-MST-path(u,v)). Total time: O(V2) or O(E log V) with sparse table LCA.
| Domain | Vertices | Edges | Weight | Scale |
|---|---|---|---|---|
| Internet backbone | Routers | Physical links | Latency or cost | ~100K vertices |
| Power grid | Substations | Possible transmission lines | Construction cost | ~50K vertices |
| VLSI routing | Pins/pads | All pairs | Wire length | ~10M vertices |
| Phylogenetics | Species/genes | All pairs | Genetic distance | ~100K vertices |
| Social network clustering | Users | Connections | 1 / similarity | ~1B vertices |
At billion-vertex scale, no single-machine algorithm suffices. Distributed MST uses Boruvka rounds across machines (each machine handles a partition of edges, communicates cheapest crossing edges). MapReduce MST and GraphX (Spark) implement this pattern.
| Year | Who | What | Complexity |
|---|---|---|---|
| 1926 | Boruvka | First MST algorithm (for electrical network design) | O(E log V) |
| 1930 | Jarník | Prim's algorithm (independently discovered) | O(V2) |
| 1956 | Kruskal | Sort-and-merge approach | O(E log E) |
| 1957 | Prim | Rediscovery of Jarník's algorithm | O(V2) |
| 1975 | Tarjan | Union-Find with path compression + union by rank | O(m α(n)) |
| 1987 | Fredman & Tarjan | Fibonacci heaps → improved Prim | O(E + V log V) |
| 1995 | Karger, Klein, Tarjan | Randomized linear-time MST | O(V + E) expected |
| 2000 | Chazelle | Deterministic near-linear MST | O(E α(V)) |
The MST problem has driven fundamental advances in algorithm design for nearly a century. Each decade brought new data structures (Union-Find, Fibonacci heaps, soft heaps) motivated by the quest for faster MST algorithms.
Is there a deterministic O(V + E) algorithm for MST? Karger-Klein-Tarjan achieves this in expected time (randomized). Chazelle's deterministic algorithm achieves O(E α(V)), which is NEARLY linear but not quite. Closing this gap — or proving a lower bound showing O(V + E) is impossible deterministically — remains open. It is one of the most elegant unsolved problems in algorithms.
| Concept | One-Sentence Summary |
|---|---|
| Spanning tree | Connected, acyclic subgraph hitting all vertices — exactly V-1 edges. |
| Cut property | The lightest edge crossing a cut respecting A is safe to add to the MST. |
| Cycle property | The heaviest edge in any cycle is NOT in the MST (distinct weights). |
| Kruskal | Sort edges, add if no cycle (Union-Find). O(E log E). Edge-centric. |
| Prim | Grow tree, add cheapest frontier edge (priority queue). O((V+E) log V). Vertex-centric. |
| Boruvka | All components simultaneously add cheapest outgoing edge. O(E log V). Parallelizable. |
| Union-Find | Track components with path compression + union by rank. O(α(n)) per op. |