Skiena, Chapter 6

Weighted Graph Algorithms

Minimum spanning trees, shortest paths, and network flow. The three pillars of weighted graph optimization, each solved by elegant greedy or dynamic strategies.

Prerequisites: Chapter 5 (BFS, DFS, graph representations). That's it.
10
Chapters
5+
Simulations
10
Quizzes

Chapter 0: Why Weights Matter

You need to connect five cities with fiber-optic cable. Each possible link has a different installation cost. You want every city connected while spending as little as possible. Which links do you build?

Or consider GPS navigation. Every road segment has a travel time. You want the fastest route from your house to the airport. Which roads do you take?

These are weighted graph problems. The edges carry numbers -- costs, distances, capacities, times -- and we want to optimize some function of those numbers. Without weights, BFS finds shortest paths. With weights, we need entirely different algorithms.

The core trio: This chapter covers three fundamental optimization problems on weighted graphs: minimum spanning trees (connect everything cheaply), shortest paths (find the cheapest route), and network flow (push as much as possible through pipes). Remarkably, all three have efficient polynomial-time solutions.

Recall that the traveling salesman problem -- find the cheapest tour visiting every vertex -- is also a weighted graph problem. But no efficient algorithm exists for TSP. The fact that MST, shortest path, and network flow can be solved efficiently is worthy of deep respect.

Why can't we just use BFS for shortest paths in weighted graphs?

Chapter 1: Minimum Spanning Trees

A spanning tree of a graph G = (V, E) is a subset of edges forming a tree that connects all vertices. Every connected graph has at least one spanning tree. For edge-weighted graphs, we want the minimum spanning tree (MST) -- the spanning tree whose total edge weight is smallest.

MSTs are the answer whenever we need to connect a set of points by the smallest total amount of roadway, wire, or pipe. Any tree is the smallest connected subgraph in terms of edge count (exactly n - 1 edges), while the MST is the smallest in terms of total weight.

Why trees? A tree on n vertices has exactly n - 1 edges -- the minimum needed to keep everything connected. Adding any edge creates a cycle (redundant). Removing any edge disconnects the graph (not enough). MSTs give us connectivity at minimum cost with zero redundancy.

Key facts about MSTs:

PropertyDetail
EdgesExactly n - 1 for n vertices
UniquenessUnique if all edge weights are distinct
No cyclesAdding any non-tree edge creates exactly one cycle
Cut propertyThe lightest edge crossing any cut must be in the MST

Two classic greedy algorithms find MSTs: Prim's (grow one tree from a starting vertex) and Kruskal's (merge small trees by picking cheapest edges globally). Both are correct because greedy works here -- the lightest safe edge is always the right choice.

A graph has n vertices. How many edges does any spanning tree of this graph have?

Chapter 2: Prim's Algorithm

Prim's algorithm grows an MST one edge at a time from a starting vertex, always adding the cheapest edge that connects a tree vertex to a non-tree vertex. It is a greedy algorithm: at every step, make the locally best choice.

Start
Pick any vertex. It's now in the tree by itself.
Grow
Find the cheapest edge from a tree vertex to a non-tree vertex. Add it.
Repeat
Keep growing until all vertices are in the tree.

Why is it correct? Suppose Prim's picks edge (x, y) and this is wrong -- there's some MST that doesn't include (x, y). That MST must have a path from x to y, and some edge (v1, v2) on that path crosses the tree boundary. But Prim's chose (x, y) over (v1, v2), so w(x, y) ≤ w(v1, v2). Swapping (v1, v2) for (x, y) gives a spanning tree no heavier than the MST. Contradiction.

python
def prim(graph, n):
    in_tree = [False] * n
    dist = [float('inf')] * n
    parent = [-1] * n
    dist[0] = 0
    for _ in range(n):
        # Pick closest non-tree vertex
        v = min((d, i) for i, d in enumerate(dist) if not in_tree[i])[1]
        in_tree[v] = True
        for (w, weight) in graph[v]:
            if not in_tree[w] and weight < dist[w]:
                dist[w] = weight
                parent[w] = v
    return parent
Complexity: As implemented above with a simple array scan, Prim's runs in O(n2). Using a priority queue (binary heap), it improves to O((n + m) log n). With a Fibonacci heap, O(m + n log n). For dense graphs (m ≈ n2), the simple O(n2) version is fine.
In Prim's algorithm, what determines which edge to add next?

Chapter 3: Kruskal's Algorithm

Kruskal's algorithm takes a different greedy approach. Instead of growing one tree, it builds up connected components by repeatedly picking the globally cheapest edge that doesn't create a cycle.

Sort
Sort all edges by weight, lightest first.
Scan
Consider edges in sorted order. If an edge connects two different components, add it. If both endpoints are in the same component, skip it (would create a cycle).
Stop
After adding n - 1 edges, you have the MST.
python
def kruskal(edges, n):
    edges.sort(key=lambda e: e[2])  # sort by weight
    uf = UnionFind(n)
    mst = []
    for (u, v, w) in edges:
        if uf.find(u) != uf.find(v):
            uf.union(u, v)
            mst.append((u, v, w))
            if len(mst) == n - 1:
                break
    return mst
Prim vs Kruskal: Prim's is O(n2) with arrays or O(m log n) with heaps -- better for dense graphs. Kruskal's is O(m log m) dominated by the sort -- better for sparse graphs. Both produce the same MST when edge weights are distinct. Use whichever fits your graph better.

The key question for Kruskal's efficiency is: how quickly can we test whether two vertices are in the same component? A naive BFS/DFS check takes O(n) per edge, giving O(mn) total. The union-find data structure does it in nearly O(1) amortized.

Kruskal's algorithm adds the globally cheapest edge that doesn't create a cycle. Why does this greedy strategy produce an optimal MST?

Chapter 4: Union-Find

Kruskal's algorithm needs to answer two questions repeatedly: "Are vertices u and v in the same component?" and "Merge the components containing u and v." The union-find data structure handles both efficiently.

Each element points to a parent. The root of each tree identifies the component. To find which component an element belongs to, follow parent pointers to the root. To union two components, make one root point to the other.

python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path compression
            x = self.parent[x]
        return x

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a == b: return
        if self.size[a] < self.size[b]:
            a, b = b, a  # always attach smaller to larger
        self.parent[b] = a
        self.size[a] += self.size[b]

Two optimizations make union-find blazingly fast:

OptimizationIdeaEffect
Union by size/rankAttach the smaller tree under the larger rootKeeps tree height O(log n)
Path compressionOn find, point every node directly at the rootAmortized nearly O(1) per operation
How tall can the tree get? With union by size, the only way to increase height is to merge two equal-height trees. A height-1 tree has 1 node. A height-2 tree needs at least 2 nodes. Height-3 needs at least 4 nodes. You must double nodes to gain one level of height, so height is at most log2 n. Both find and union run in O(log n). With path compression added, the amortized cost drops to O(α(n)), where α is the inverse Ackermann function -- effectively constant.
Why does union-by-size keep the tree height at O(log n)?

Chapter 5: Dijkstra's Algorithm

MSTs connect everything cheaply. Now we want the shortest path from a source vertex s to all other vertices in a weighted graph. Dijkstra's algorithm is the method of choice.

The key insight: if the shortest path from s to t passes through vertex x, then it must use the shortest path from s to x as a prefix. Otherwise, we could shorten the s-to-t path by using a better s-to-x prefix. This optimal substructure means we can build shortest paths incrementally.

Initialize
Set dist[s] = 0, all others = ∞. Mark all vertices as unfinished.
Select
Pick the unfinished vertex v with smallest dist[v]. Mark it finished.
Relax
For each neighbor w of v: if dist[v] + w(v,w) < dist[w], update dist[w].
Repeat
Continue until all vertices are finished (or target is found).
python
def dijkstra(graph, start, n):
    dist = [float('inf')] * n
    parent = [-1] * n
    in_tree = [False] * n
    dist[start] = 0
    for _ in range(n):
        # Pick closest unfinished vertex
        v = min((d, i) for i, d in enumerate(dist) if not in_tree[i])[1]
        in_tree[v] = True
        for (w, weight) in graph[v]:
            if dist[w] > dist[v] + weight:  # RELAX
                dist[w] = dist[v] + weight
                parent[w] = v
    return dist, parent
Dijkstra vs Prim -- nearly identical! Compare the code above to Prim's. The only difference is the relaxation condition. Prim's asks: "Is this edge cheaper than any edge we've seen to w?" (considers edge weight alone). Dijkstra's asks: "Is this path through v cheaper than any path we've found to w?" (considers total path cost). Three lines of code difference, completely different problems solved.

Limitation: Dijkstra's only works with non-negative edge weights. A negative edge could invalidate a vertex we already marked as "finished." For negative weights (without negative cycles), use Bellman-Ford (O(nm)) instead.

Dijkstra's Shortest Path

Click "Step" to advance Dijkstra's algorithm from vertex 0. Numbers show current shortest distances. Warm = finalized, teal = being relaxed.

Click Step to begin Dijkstra from vertex 0

Complexity analysis: With the simple array implementation shown above, Dijkstra's runs in O(n2): we do n iterations, each scanning all n vertices to find the minimum. With a binary heap, we can extract the minimum in O(log n) and update distances in O(log n), giving O((n + m) log n). With a Fibonacci heap, O(m + n log n). For dense graphs (m ≈ n2), the simple array version is actually best.

MST variations: The MST framework adapts to several related problems. The maximum spanning tree is found by negating all edge weights and running Prim's. The minimum bottleneck spanning tree (minimizing the heaviest edge) is always a standard MST. The minimum product spanning tree is found by replacing weights with their logarithms, since log(ab) = log(a) + log(b).

What is the key difference between Prim's and Dijkstra's algorithms?

Chapter 6: All-Pairs Shortest Paths

Sometimes we need the shortest path between every pair of vertices -- not just from one source. Applications include finding the "center" of a graph (the vertex minimizing average distance to all others) or computing the graph's diameter (the longest shortest path).

We could run Dijkstra's from every vertex: O(n · n2) = O(n3). But Floyd-Warshall is an elegant dynamic programming alternative that also runs in O(n3) but with beautifully tight loops.

The idea: number the vertices 1 to n. Define W[i, j, k] as the shortest path from i to j using only vertices {1, ..., k} as intermediaries. When k = 0, only direct edges are allowed. Each iteration allows one more possible intermediate vertex:

W[i, j]k = min(W[i, j]k-1, W[i, k]k-1 + W[k, j]k-1)
python
def floyd_warshall(W, n):
    # W[i][j] = edge weight, or inf if no edge
    for k in range(n):
        for i in range(n):
            for j in range(n):
                through_k = W[i][k] + W[k][j]
                if through_k < W[i][j]:
                    W[i][j] = through_k
    return W
Transitive closure: Floyd-Warshall also computes transitive closure -- which vertices can reach which. If W[i][j] is still ∞ after the algorithm, there's no path from i to j. This is useful for reachability analysis in directed graphs.
AlgorithmProblemTimeBest For
DijkstraSingle-sourceO(n2) or O(m log n)One source, non-negative weights
Bellman-FordSingle-sourceO(nm)Negative weights (no neg. cycles)
Floyd-WarshallAll-pairsO(n3)Dense graphs, all-pairs queries
n × DijkstraAll-pairsO(n3) or O(nm log n)Sparse graphs, all-pairs queries
Floyd-Warshall is essentially the same time complexity as running Dijkstra's from every vertex. Why would you prefer it?

Chapter 7: Network Flow

Think of a weighted graph as a network of pipes. Each edge's weight is the pipe's capacity -- how much fluid it can carry. Given a source s and a sink t, what is the maximum amount of flow we can push from s to t?

The max-flow min-cut theorem says the maximum flow equals the minimum cut -- the smallest total capacity of edges whose removal disconnects s from t. This is a deep and beautiful result.

Augmenting paths: The Ford-Fulkerson method repeatedly finds a path from s to t with remaining capacity (an augmenting path) and pushes flow along it. The flow is optimal when no augmenting path exists. Using BFS to find augmenting paths (Edmonds-Karp) guarantees O(nm2) total time, or O(n3) augmentations.

The key data structure is the residual graph R(G, f). For each edge (i, j) with capacity c and current flow f:

Residual EdgeConditionMeaning
(i, j) with weight c - fc - f > 0Can push more flow forward
(j, i) with weight ff > 0Can "undo" flow by pushing backward

Bipartite matching is the most important application. Given a bipartite graph (jobs on one side, workers on the other), find the maximum matching -- the most job-worker pairs with no conflicts. Reduce it to max-flow: add a source connected to all jobs, a sink connected to all workers, all edges capacity 1. Maximum flow = maximum matching.

Design graphs, not algorithms: Skiena's key lesson. Many problems can be solved by reducing them to known graph problems. Maximum spanning tree? Negate weights, run Prim's. Bipartite matching? Build a flow network. The secret is learning to model your problem as a graph, then use a standard algorithm.
The max-flow min-cut theorem states that:

Chapter 8: The MST Builder

Watch Prim's and Kruskal's algorithms build minimum spanning trees on the same graph. Notice how they make different decisions but arrive at the same result.

Prim's vs Kruskal's MST

Click "Step" to advance both algorithms. Warm = Prim's frontier, teal = Kruskal's next edge. Numbers on edges are weights.

Click Step to begin
What to notice: Prim's grows a single connected tree outward from vertex 0, always choosing the cheapest edge to a new vertex. Kruskal's considers edges globally by weight, adding them if they connect different components. Both build the same MST when edge weights are distinct -- the greedy choice is always safe for spanning trees.
Both Prim's and Kruskal's are greedy. When does the greedy approach fail for graph optimization?

Chapter 9: Connections

Weighted graph algorithms are the workhorses of practical algorithm design. Here is how they connect to the rest of the book:

ConceptWhere It Leads
MST (Kruskal's)Uses union-find, which appears in many other algorithms
Dijkstra'sChapter 7: Heuristic search uses shortest-path ideas (A*)
Floyd-WarshallChapter 8: A beautiful example of dynamic programming on graphs
Network FlowChapter 9: Many NP-hard problems can be modeled as flow if structure allows
Graph ModelingThe lesson "design graphs, not algorithms" applies everywhere
Skiena's take-home lessons from Chapter 6:
• Most graph applications reduce to standard problems: MST, shortest paths, flow, matching. Learn to recognize them.
• Prim's and Dijkstra's are nearly identical algorithms solving different problems. The difference is one line of code: edge weight vs. total path distance.
• Union-find is a surprisingly powerful data structure that makes Kruskal's algorithm efficient.
• The max-flow min-cut theorem connects seemingly different problems (flow and connectivity).
• Design graphs, not algorithms. Model your real-world problem as a graph, then use a standard algorithm.
You need to connect 100 cities with fiber optic cable at minimum cost. Which algorithm do you use?