Minimum spanning trees, shortest paths, and network flow. The three pillars of weighted graph optimization, each solved by elegant greedy or dynamic strategies.
You need to connect five cities with fiber-optic cable. Each possible link has a different installation cost. You want every city connected while spending as little as possible. Which links do you build?
Or consider GPS navigation. Every road segment has a travel time. You want the fastest route from your house to the airport. Which roads do you take?
These are weighted graph problems. The edges carry numbers -- costs, distances, capacities, times -- and we want to optimize some function of those numbers. Without weights, BFS finds shortest paths. With weights, we need entirely different algorithms.
Recall that the traveling salesman problem -- find the cheapest tour visiting every vertex -- is also a weighted graph problem. But no efficient algorithm exists for TSP. The fact that MST, shortest path, and network flow can be solved efficiently is worthy of deep respect.
A spanning tree of a graph G = (V, E) is a subset of edges forming a tree that connects all vertices. Every connected graph has at least one spanning tree. For edge-weighted graphs, we want the minimum spanning tree (MST) -- the spanning tree whose total edge weight is smallest.
MSTs are the answer whenever we need to connect a set of points by the smallest total amount of roadway, wire, or pipe. Any tree is the smallest connected subgraph in terms of edge count (exactly n - 1 edges), while the MST is the smallest in terms of total weight.
Key facts about MSTs:
| Property | Detail |
|---|---|
| Edges | Exactly n - 1 for n vertices |
| Uniqueness | Unique if all edge weights are distinct |
| No cycles | Adding any non-tree edge creates exactly one cycle |
| Cut property | The lightest edge crossing any cut must be in the MST |
Two classic greedy algorithms find MSTs: Prim's (grow one tree from a starting vertex) and Kruskal's (merge small trees by picking cheapest edges globally). Both are correct because greedy works here -- the lightest safe edge is always the right choice.
Prim's algorithm grows an MST one edge at a time from a starting vertex, always adding the cheapest edge that connects a tree vertex to a non-tree vertex. It is a greedy algorithm: at every step, make the locally best choice.
Why is it correct? Suppose Prim's picks edge (x, y) and this is wrong -- there's some MST that doesn't include (x, y). That MST must have a path from x to y, and some edge (v1, v2) on that path crosses the tree boundary. But Prim's chose (x, y) over (v1, v2), so w(x, y) ≤ w(v1, v2). Swapping (v1, v2) for (x, y) gives a spanning tree no heavier than the MST. Contradiction.
python def prim(graph, n): in_tree = [False] * n dist = [float('inf')] * n parent = [-1] * n dist[0] = 0 for _ in range(n): # Pick closest non-tree vertex v = min((d, i) for i, d in enumerate(dist) if not in_tree[i])[1] in_tree[v] = True for (w, weight) in graph[v]: if not in_tree[w] and weight < dist[w]: dist[w] = weight parent[w] = v return parent
Kruskal's algorithm takes a different greedy approach. Instead of growing one tree, it builds up connected components by repeatedly picking the globally cheapest edge that doesn't create a cycle.
python def kruskal(edges, n): edges.sort(key=lambda e: e[2]) # sort by weight uf = UnionFind(n) mst = [] for (u, v, w) in edges: if uf.find(u) != uf.find(v): uf.union(u, v) mst.append((u, v, w)) if len(mst) == n - 1: break return mst
The key question for Kruskal's efficiency is: how quickly can we test whether two vertices are in the same component? A naive BFS/DFS check takes O(n) per edge, giving O(mn) total. The union-find data structure does it in nearly O(1) amortized.
Kruskal's algorithm needs to answer two questions repeatedly: "Are vertices u and v in the same component?" and "Merge the components containing u and v." The union-find data structure handles both efficiently.
Each element points to a parent. The root of each tree identifies the component. To find which component an element belongs to, follow parent pointers to the root. To union two components, make one root point to the other.
python class UnionFind: def __init__(self, n): self.parent = list(range(n)) self.size = [1] * n def find(self, x): while self.parent[x] != x: self.parent[x] = self.parent[self.parent[x]] # path compression x = self.parent[x] return x def union(self, a, b): a, b = self.find(a), self.find(b) if a == b: return if self.size[a] < self.size[b]: a, b = b, a # always attach smaller to larger self.parent[b] = a self.size[a] += self.size[b]
Two optimizations make union-find blazingly fast:
| Optimization | Idea | Effect |
|---|---|---|
| Union by size/rank | Attach the smaller tree under the larger root | Keeps tree height O(log n) |
| Path compression | On find, point every node directly at the root | Amortized nearly O(1) per operation |
MSTs connect everything cheaply. Now we want the shortest path from a source vertex s to all other vertices in a weighted graph. Dijkstra's algorithm is the method of choice.
The key insight: if the shortest path from s to t passes through vertex x, then it must use the shortest path from s to x as a prefix. Otherwise, we could shorten the s-to-t path by using a better s-to-x prefix. This optimal substructure means we can build shortest paths incrementally.
python def dijkstra(graph, start, n): dist = [float('inf')] * n parent = [-1] * n in_tree = [False] * n dist[start] = 0 for _ in range(n): # Pick closest unfinished vertex v = min((d, i) for i, d in enumerate(dist) if not in_tree[i])[1] in_tree[v] = True for (w, weight) in graph[v]: if dist[w] > dist[v] + weight: # RELAX dist[w] = dist[v] + weight parent[w] = v return dist, parent
Limitation: Dijkstra's only works with non-negative edge weights. A negative edge could invalidate a vertex we already marked as "finished." For negative weights (without negative cycles), use Bellman-Ford (O(nm)) instead.
Click "Step" to advance Dijkstra's algorithm from vertex 0. Numbers show current shortest distances. Warm = finalized, teal = being relaxed.
Complexity analysis: With the simple array implementation shown above, Dijkstra's runs in O(n2): we do n iterations, each scanning all n vertices to find the minimum. With a binary heap, we can extract the minimum in O(log n) and update distances in O(log n), giving O((n + m) log n). With a Fibonacci heap, O(m + n log n). For dense graphs (m ≈ n2), the simple array version is actually best.
MST variations: The MST framework adapts to several related problems. The maximum spanning tree is found by negating all edge weights and running Prim's. The minimum bottleneck spanning tree (minimizing the heaviest edge) is always a standard MST. The minimum product spanning tree is found by replacing weights with their logarithms, since log(ab) = log(a) + log(b).
Sometimes we need the shortest path between every pair of vertices -- not just from one source. Applications include finding the "center" of a graph (the vertex minimizing average distance to all others) or computing the graph's diameter (the longest shortest path).
We could run Dijkstra's from every vertex: O(n · n2) = O(n3). But Floyd-Warshall is an elegant dynamic programming alternative that also runs in O(n3) but with beautifully tight loops.
The idea: number the vertices 1 to n. Define W[i, j, k] as the shortest path from i to j using only vertices {1, ..., k} as intermediaries. When k = 0, only direct edges are allowed. Each iteration allows one more possible intermediate vertex:
python def floyd_warshall(W, n): # W[i][j] = edge weight, or inf if no edge for k in range(n): for i in range(n): for j in range(n): through_k = W[i][k] + W[k][j] if through_k < W[i][j]: W[i][j] = through_k return W
| Algorithm | Problem | Time | Best For |
|---|---|---|---|
| Dijkstra | Single-source | O(n2) or O(m log n) | One source, non-negative weights |
| Bellman-Ford | Single-source | O(nm) | Negative weights (no neg. cycles) |
| Floyd-Warshall | All-pairs | O(n3) | Dense graphs, all-pairs queries |
| n × Dijkstra | All-pairs | O(n3) or O(nm log n) | Sparse graphs, all-pairs queries |
Think of a weighted graph as a network of pipes. Each edge's weight is the pipe's capacity -- how much fluid it can carry. Given a source s and a sink t, what is the maximum amount of flow we can push from s to t?
The max-flow min-cut theorem says the maximum flow equals the minimum cut -- the smallest total capacity of edges whose removal disconnects s from t. This is a deep and beautiful result.
The key data structure is the residual graph R(G, f). For each edge (i, j) with capacity c and current flow f:
| Residual Edge | Condition | Meaning |
|---|---|---|
| (i, j) with weight c - f | c - f > 0 | Can push more flow forward |
| (j, i) with weight f | f > 0 | Can "undo" flow by pushing backward |
Bipartite matching is the most important application. Given a bipartite graph (jobs on one side, workers on the other), find the maximum matching -- the most job-worker pairs with no conflicts. Reduce it to max-flow: add a source connected to all jobs, a sink connected to all workers, all edges capacity 1. Maximum flow = maximum matching.
Watch Prim's and Kruskal's algorithms build minimum spanning trees on the same graph. Notice how they make different decisions but arrive at the same result.
Click "Step" to advance both algorithms. Warm = Prim's frontier, teal = Kruskal's next edge. Numbers on edges are weights.
Weighted graph algorithms are the workhorses of practical algorithm design. Here is how they connect to the rest of the book:
| Concept | Where It Leads |
|---|---|
| MST (Kruskal's) | Uses union-find, which appears in many other algorithms |
| Dijkstra's | Chapter 7: Heuristic search uses shortest-path ideas (A*) |
| Floyd-Warshall | Chapter 8: A beautiful example of dynamic programming on graphs |
| Network Flow | Chapter 9: Many NP-hard problems can be modeled as flow if structure allows |
| Graph Modeling | The lesson "design graphs, not algorithms" applies everywhere |