Floyd-Warshall, Johnson's — finding shortest paths between every pair of vertices.
You run a logistics company with 6 warehouses. Every morning, you need to know: what is the cheapest shipping cost between every pair of warehouses? Not just from one hub — between ALL of them simultaneously. That is the all-pairs shortest paths problem.
One approach is obvious: run a single-source algorithm (like Dijkstra) from every vertex. With V vertices and E edges, running Dijkstra V times with a binary heap costs O(V(V + E) log V). For dense graphs where E = Θ(V²), that is O(V³ log V). Can we shave off that log V factor?
Yes. Floyd-Warshall solves all-pairs shortest paths in exactly Θ(V³) time — no log factor — using an elegant dynamic programming idea. For sparse graphs, Johnson's algorithm does even better: O(V² log V + VE), which beats Floyd-Warshall when E is much smaller than V².
The input is a weighted directed graph G = (V, E) with weight function w. We represent it as an adjacency matrix W where:
The output is a distance matrix D where D[i][j] = δ(i, j), the shortest-path weight from vertex i to vertex j. We may also produce a predecessor matrix Π where Π[i][j] is the predecessor of j on a shortest path from i — this lets us reconstruct the actual paths.
The simulation below lets you see this transformation. On the left is a weighted graph. On the right is the distance matrix — initially just the adjacency matrix W. Your job: figure out the true shortest distances.
The graph on the left has 5 vertices. The matrix on the right shows current best-known distances (starting with direct edge weights). Hover over a cell to highlight the corresponding path on the graph. Click "Solve" to fill in all shortest distances.
Hover over matrix cells to see paths.You absolutely can. And for sparse graphs with non-negative weights, that is often the best approach. But consider the trade-offs:
| Approach | Time Complexity | Dense (E = V²) | Sparse (E = V) | Neg. Weights? |
|---|---|---|---|---|
| Dijkstra × V (binary heap) | O(V(V+E) log V) | O(V³ log V) | O(V² log V) | No |
| Bellman-Ford × V | O(V²E) | O(V4) | O(V³) | Yes |
| Matrix multiplication | O(V³ log V) | O(V³ log V) | O(V³ log V) | Yes |
| Floyd-Warshall | Θ(V³) | Θ(V³) | Θ(V³) | Yes |
| Johnson's | O(V² log V + VE) | O(V³ log V) | O(V² log V) | Yes |
For dense graphs, Floyd-Warshall's Θ(V³) beats Dijkstra × V by a log V factor. For sparse graphs with negative edges, Johnson's is the clear winner. Knowing which to pick is an essential interview skill.
All three approaches produce the same two outputs:
Think of D as the "price list" and Π as the "route map." Together, they answer both "how much?" and "which way?" for every pair of vertices.
The distance matrix D has V² entries. For a graph with 1000 vertices, that is a million numbers. Computing each one requires examining potentially all V other vertices as intermediates. The fundamental question of this chapter: can we fill in all V² entries faster than running V separate single-source algorithms?
Let us build intuition with a small example before diving into algorithms. Suppose you have 5 warehouses (vertices 1-5) with the following shipping costs:
Even with just 5 vertices, finding all shortest paths by hand is tedious. The path from 1 to 3 is not the direct route (cost 8) — it is 1 → 2 → 4 → 3 (cost 3 + 1 + (-5) = -1). The negative edge through warehouse 4 creates a shortcut. This is exactly the kind of non-obvious result that all-pairs algorithms reveal.
Let us try to count the paths manually. From vertex 1 to vertex 3:
The winner is 1 → 2 → 4 → 3 with cost -1. Finding this required checking multiple multi-hop paths and recognizing that the -5 edge makes a longer path (3 edges) cheaper than the direct edge. For V = 100, the number of possible paths explodes combinatorially. We need algorithms.
All-pairs algorithms typically work with adjacency matrices (2D arrays) rather than adjacency lists. This makes sense: the output is a V × V matrix, so the algorithms already need O(V²) space. The input adjacency matrix uses O(V²) space too, which is fine for dense graphs.
For sparse graphs with E ≪ V², adjacency lists use only O(V + E) space. Johnson's algorithm takes advantage of this: it stores the graph as adjacency lists and only builds the V × V distance matrix at the end.
| Representation | Space | Lookup edge (u,v)? | Iterate neighbors of u? | Best for |
|---|---|---|---|---|
| Adjacency matrix | O(V²) | O(1) | O(V) | Dense, Floyd-Warshall |
| Adjacency list | O(V + E) | O(degree(u)) | O(degree(u)) | Sparse, Johnson's/Dijkstra |
Before we get to Floyd-Warshall, there is a beautiful connection between shortest paths and matrix multiplication. It is not the fastest algorithm, but it reveals deep structure that shows up in competitive programming and theory.
In regular matrix multiplication, we compute C = A × B where C[i][j] = ∑k A[i][k] · B[k][j]. Now imagine replacing the sum (∑) with min, and the product (·) with addition (+). This gives us a new operation:
This says: the best way to get from i to j is to go from i to some intermediate vertex k (costing L[i][k]), then take the direct edge from k to j (costing W[k][j]). Minimize over all choices of k.
If L(m)[i][j] is the weight of the shortest path from i to j using at most m edges, then:
where ⊗ is our "min-plus" matrix multiplication. Computing L(1), L(2), ..., L(n-1) one at a time takes n-1 matrix multiplications, each costing O(V³). Total: O(V4).
But here is the trick: we do not need every intermediate matrix. Since shortest path weights satisfy L(2m) = L(m) ⊗ L(m) (just like regular squaring), we can compute L(1), L(2), L(4), L(8), ..., L(2⌈log(n-1)⌉). That is only ⌈log(n-1)⌉ multiplications instead of n-1. Total: O(V³ log V).
Consider a 4-vertex graph with adjacency matrix W:
Let us compute L(2) = W ⊗ W. For cell L(2)[1][3]:
Computing the full L(2) matrix:
Already in L(2), we see paths of up to 2 edges. The entry L(2)[4][2] = -1 corresponds to the path 4 → 3 → 2 (cost -5 + 4 = -1). One more squaring gives L(4), which equals the final distance matrix since no shortest path needs more than 3 edges in a 4-vertex graph.
Watch the matrix L evolve from L(1) = W toward the final distance matrix. Each step computes L(2m) = L(m) ⊗ L(m) (repeated squaring). Click "Step" to advance one squaring.
L(1) = W. Click Step for L(2).python import math def min_plus_multiply(A, B, n): """Min-plus matrix multiplication: C[i][j] = min_k(A[i][k] + B[k][j])""" C = [[math.inf] * n for _ in range(n)] for i in range(n): for j in range(n): for k in range(n): C[i][j] = min(C[i][j], A[i][k] + B[k][j]) return C def all_pairs_repeated_squaring(W, n): """Compute all-pairs shortest paths via repeated squaring.""" L = [row[:] for row in W] # L^(1) = W m = 1 while m < n - 1: L = min_plus_multiply(L, L, n) # L^(2m) = L^(m) x L^(m) m *= 2 return L # = D, the distance matrix
Why do we call this "matrix multiplication"? Compare the two operations side by side:
| Standard | Min-Plus |
|---|---|
| C[i][j] = ∑k A[i][k] · B[k][j] | C[i][j] = mink{ A[i][k] + B[k][j] } |
| Combine with addition (∑) | Combine with min |
| Pair with multiplication (·) | Pair with addition (+) |
| Identity: I (1 on diagonal, 0 off) | Identity: D0 (0 on diagonal, ∞ off) |
| Zero: 0 absorbs under · | "Zero": ∞ absorbs under + |
The structural parallel is exact. Both are "multiply row by column and combine." The only difference is which operations play the role of (+, ×). This is why the theory of semirings unifies both: regular matrix multiplication uses the (∑, ×) semiring, while shortest paths use the (min, +) semiring.
Why repeated squaring works: Just as A8 = ((A2)2)2 in regular arithmetic, L(8) = ((L(2))⊗2)⊗2 in min-plus. Associativity of the operation guarantees this, and min-plus multiplication is associative (you can verify this from the definition).
The naive approach computes L(1), L(2), ..., L(n-1) sequentially. Each "multiplication" L(m) = L(m-1) ⊗ W costs O(V³), and we do n-1 of them. Total: O(V4).
Repeated squaring computes L(1), L(2), L(4), ..., stopping when the exponent exceeds n-1. We need only ⌈log2(n-1)⌉ multiplications. Each is still O(V³), so total: O(V³ log V).
For V = 1000: naive does ~999 multiplications (999 billion operations). Repeated squaring does ~10 multiplications (10 billion operations). A 100x speedup, yet still worse than Floyd-Warshall's 1 billion operations.
If we could compute min-plus matrix multiplication in O(Vω) time where ω < 3, then all-pairs shortest paths would be O(Vω log V). For regular matrix multiplication, ω < 2.372 (the current best). But min-plus does not have the same algebraic structure (no additive inverses), and nobody has achieved sub-cubic min-plus multiplication. Whether ωmin-plus = 3 or could be smaller is one of the biggest open questions in theoretical computer science.
Floyd-Warshall is one of the most elegant algorithms in all of computer science. Three nested loops, five lines of code, and it solves all-pairs shortest paths in Θ(V³) time. The key idea is deceptively simple: consider intermediate vertices one at a time.
Define d(k)[i][j] as the weight of the shortest path from i to j where all intermediate vertices are drawn from the set {1, 2, ..., k}. An intermediate vertex is any vertex on the path other than the endpoints i and j.
Base case: d(0)[i][j] = W[i][j]. With zero intermediate vertices allowed, we can only use direct edges.
Recursive case: when we introduce vertex k as a potential intermediate vertex, either:
Take the minimum:
The final answer is d(n)[i][j] = δ(i, j), since all n vertices are allowed as intermediates.
The recurrence references d(k-1), suggesting we need two matrices. But notice: when computing d(k)[i][j], we read d(k-1)[i][k] and d(k-1)[k][j]. The value d[i][k] does not change when we process row k (because d(k)[i][k] = min(d(k-1)[i][k], d(k-1)[i][k] + d(k-1)[k][k]) = d(k-1)[i][k] since d[k][k] = 0). Similarly, d[k][j] does not change. So we can safely update in place.
Graph edges: 1→2 (3), 1→3 (8), 2→3 (2), 2→4 (1), 3→2 (4), 4→1 (2), 4→3 (-5), 1→5 (7), 5→4 (6).
Initial matrix D(0) = W:
Let us trace k = 1. For every pair (i, j), check: is d[i][1] + d[1][j] < d[i][j]? Vertex 1 has outgoing edges to 2, 3, and 5. The only vertices that can reach 1 are vertex 4 (d[4][1] = 2). So we check pairs where i = 4:
Let us appreciate the beauty of this DP formulation. Most DP problems have states that grow exponentially. For shortest paths, you might expect the DP state to include the full set of vertices visited (as in the Travelling Salesman Problem). That would give O(2V · V) states — exponential.
Floyd-Warshall's genius is using {1, ..., k} as the state. We do not track which specific vertices are intermediates — just how many are "unlocked." Because we add vertices in order (1, then 2, then 3...), we only need V states, not 2V. This works because shortest paths have optimal substructure: if the shortest path uses intermediate vertex k, then the sub-paths from i to k and from k to j are themselves shortest paths using only intermediates from {1, ..., k-1}.
This is fundamentally different from the Travelling Salesman Problem, where the "no-repeat-visit" constraint forces you to track the exact set of visited vertices. TSP is NP-hard; all-pairs shortest paths is polynomial. The difference is exactly this: optimal substructure with ordered intermediates vs. optimal substructure with subset-based states.
Both Floyd-Warshall and Bellman-Ford are dynamic programming algorithms for shortest paths. But they use different DP states:
| Algorithm | DP State | Transition | Dimensions |
|---|---|---|---|
| Bellman-Ford | d[v] after m edge relaxations | Relax every edge, V-1 times | One source, all destinations |
| Floyd-Warshall | d[i][j] with intermediates {1, ..., k} | Try vertex k as intermediate | All sources, all destinations |
Bellman-Ford asks: "Can we reach v in at most m edges?" Floyd-Warshall asks: "Can we reach j from i using only the first k vertices as waypoints?" Both converge to the true shortest path, but via completely different relaxation strategies.
This also explains why Floyd-Warshall naturally handles all pairs while Bellman-Ford handles only single-source: Floyd-Warshall's DP state already includes both i and j, so it computes all pairs "for free" as part of its DP table.
Let us trace every pass. This is the kind of hand-calculation that shows up on exams and cements understanding.
k = 2 (allow vertex 2 as intermediate): Which pairs benefit from routing through vertex 2? Any pair (i, j) where d[i][2] + d[2][j] < d[i][j]. Vertex 2 connects to 3 (cost 2) and 4 (cost 1).
k = 3 (allow vertex 3 as intermediate): Vertex 3 connects to vertex 2 (cost 4). Does any pair improve by routing through 3?
k = 4 (allow vertex 4 as intermediate): Vertex 4 connects to 1 (cost 2) and 3 (cost -5). This is where the negative edge creates havoc.
k = 5 (allow vertex 5 as intermediate): Vertex 5 has limited connectivity, so few updates occur.
Final distance matrix D(5):
Notice d[1][3] = -1. The cheapest path from warehouse 1 to warehouse 3 actually makes money (negative cost). The path is 1 → 2 → 4 → 3, exploiting the -5 subsidy from warehouse 4.
Watch the distance matrix update as k increases from 1 to 5. Orange cells have just been updated. The graph on the left highlights the current intermediate vertex k in warm orange.
k = 0 (initial). Click Step for k = 1.python def floyd_warshall(W, n): """All-pairs shortest paths in Theta(n^3). W: n x n adjacency matrix (0 on diagonal, inf for no edge). Returns distance matrix D.""" d = [row[:] for row in W] # copy W for k in range(n): # try each intermediate vertex for i in range(n): # for each source for j in range(n): # for each destination if d[i][k] + d[k][j] < d[i][j]: d[i][j] = d[i][k] + d[k][j] return d
That is the entire algorithm. Three nested loops, one comparison, one assignment. The outer loop (k) is the DP dimension — which vertices are allowed as intermediates. The inner loops (i, j) check every pair. Total: exactly V³ iterations.
Let us verify this code against our worked example. The input is the 5 × 5 adjacency matrix W from above. After the three nested loops complete, d should match our hand-computed D(5):
python import math INF = math.inf W = [ [0, 3, 8, INF, 7], # from vertex 0 (= "1") [INF, 0, 2, 1, INF], # from vertex 1 (= "2") [INF, 4, 0, INF, INF], # from vertex 2 (= "3") [2, INF, -5, 0, INF], # from vertex 3 (= "4") [INF, INF, INF, 6, 0], # from vertex 4 (= "5") ] D = floyd_warshall(W, 5) # D[0][2] = -1 (path 1→2→4→3, cost 3+1-5 = -1) # D[1][2] = -4 (path 2→4→3, cost 1-5 = -4) # D[3][1] = -1 (path 4→3→2, cost -5+4 = -1) print(D)
Note the 0-indexing in Python vs 1-indexing in the math. Vertex "1" in our example is index 0 in the array. The algorithm does not care about indexing — it just needs consistent indices.
This is such a common bug that it is worth demonstrating. Consider putting i outermost:
python # WRONG! This is NOT Floyd-Warshall for i in range(n): # i outermost for j in range(n): # j middle for k in range(n): # k innermost — WRONG! if d[i][k] + d[k][j] < d[i][j]: d[i][j] = d[i][k] + d[k][j]
This code asks: "For each pair (i, j), try all possible intermediate vertices k." That sounds right but is subtly wrong. When we check d[i][k], the value of d[i][k] may not yet be optimal because we haven't finished processing all intermediate vertices for that sub-path. Floyd-Warshall's correctness depends on processing intermediate vertices in a specific order (1, 2, 3, ..., n), which requires k to be the outer loop.
The wrong loop order sometimes gives correct answers by accident (especially on small graphs), making this bug incredibly hard to debug. In interviews, always triple-check: k outermost, i middle, j innermost.
A helpful mnemonic: "K Includes J" — k (outermost), i (middle), j (innermost). Or think of it as "intermediates first, then pairs." The k loop sets up the context (which intermediates are allowed), and the i, j loops fill in the answers given that context.
Some implementations use two separate matrices (d_old and d_new) to make the DP transition cleaner:
python def floyd_warshall_two_matrix(W, n): """Floyd-Warshall with explicit old/new matrices. Clearer but uses 2x memory.""" d = [row[:] for row in W] for k in range(n): d_new = [row[:] for row in d] # copy for i in range(n): for j in range(n): d_new[i][j] = min(d[i][j], d[i][k] + d[k][j]) d = d_new return d
This version is equivalent to the in-place version (as proven above) but makes the DP structure explicit: d_new[i][j] only reads from d (the previous iteration's values). The in-place version works because d[i][k] and d[k][j] happen to be invariant when we update d[i][j] — but if you are not confident in that proof, the two-matrix version is safer.
In practice, always use the in-place version. It halves memory usage and is just as fast (no extra allocation per iteration). But the two-matrix version is great for understanding and debugging.
Time: Three nested loops, each running n times. The body is O(1). Total: Θ(n³). No hidden constants, no log factors, no priority queues. Just a triple loop. The constant is about 1 comparison + 1 addition + 1 conditional assignment per iteration.
Space: Θ(n²) for the distance matrix. We update in place, so no extra matrix needed. With the predecessor matrix, it is still Θ(n²) — just two n×n arrays.
Comparison: Floyd-Warshall is simpler than running Dijkstra V times and handles negative edges. For dense graphs (E = Θ(V²)), it matches or beats all alternatives. For sparse graphs, Johnson's algorithm (Chapter 5) is better.
Exact operation count: For n vertices, Floyd-Warshall performs exactly n³ iterations of the inner loop. Each iteration does 1 addition and 1 comparison (and occasionally 1 assignment). So the total is about 2n³ arithmetic operations. For n = 100: 2,000,000 operations. For n = 1000: 2,000,000,000 operations (~2 seconds on a modern CPU at 109 ops/sec).
Floyd-Warshall gives us the shortest distances. But often we need the actual paths. Just like single-source algorithms maintain a predecessor array π, Floyd-Warshall maintains a predecessor matrix Π.
Π[i][j] stores the predecessor of j on a shortest path from i. Initialize it as:
During Floyd-Warshall, when we find a shorter path through k:
Why Π[k][j] and not just k? Because the shortest path from i to j through k might not go directly from k to j. It might go k → ... → j through several vertices. The predecessor of j on that sub-path is Π[k][j].
To recover the shortest path from i to j, follow the predecessor chain backwards:
python def reconstruct_path(pi, i, j): """Reconstruct shortest path from i to j using predecessor matrix.""" if pi[i][j] is None: return [] # no path exists path = [j] while path[-1] != i: path.append(pi[i][path[-1]]) path.reverse() return path
python def floyd_warshall_paths(W, n): """Floyd-Warshall with predecessor matrix for path reconstruction.""" d = [row[:] for row in W] pi = [[None] * n for _ in range(n)] # Initialize predecessors for i in range(n): for j in range(n): if i != j and W[i][j] < float('inf'): pi[i][j] = i # Main loop for k in range(n): for i in range(n): for j in range(n): if d[i][k] + d[k][j] < d[i][j]: d[i][j] = d[i][k] + d[k][j] pi[i][j] = pi[k][j] # NOT just k! return d, pi
Let us trace the predecessor matrix alongside the distance matrix for our 5-vertex graph. Initial Π(0):
When Floyd-Warshall updates d[1][4] = 4 (via 1 → 2 → 4), it sets Π[1][4] = Π[2][4] = 2. The predecessor of 4 on the shortest path from 1 is vertex 2 (the vertex just before 4).
When it later updates d[1][3] = -1 (via 1 → 2 → 4 → 3), it sets Π[1][3] = Π[4][3] = 4. The predecessor of 3 on the shortest path from 1 is vertex 4, NOT vertex 2 or vertex 1. To get the full path, follow the chain: Π[1][3] = 4, Π[1][4] = 2, Π[1][2] = 1. Path: 1 → 2 → 4 → 3.
Reconstructing a single path takes O(V) time in the worst case (the path might visit all V vertices). Reconstructing ALL V² paths takes O(V³) time total. This is dominated by the O(V³) Floyd-Warshall computation itself, so path reconstruction adds no asymptotic overhead.
Some implementations store the next vertex on the path instead of the predecessor. If nxt[i][j] = the vertex immediately after i on the shortest path from i to j, then path reconstruction becomes a forward trace:
python def reconstruct_forward(nxt, i, j): """Follow 'next' pointers from i to j.""" if nxt[i][j] is None: return [] # no path path = [i] while path[-1] != j: path.append(nxt[path[-1]][j]) return path
The update rule changes slightly: when d[i][k] + d[k][j] < d[i][j], set nxt[i][j] = nxt[i][k] (go toward k first). Both approaches are correct; the "next" version is often more intuitive for printing paths.
Using our 5-vertex graph from Chapter 2, after Floyd-Warshall completes, suppose D[1][4] = 4. How do we find the actual path?
Click any cell in the distance matrix to see the full shortest path from that row's vertex to that column's vertex. The path is highlighted on the graph and the predecessor chain is shown below.
Click a cell in the matrix to see its path.Path reconstruction produces undefined results when negative cycles exist. If d[i][j] = -∞ (the path from i to j passes through a negative cycle), the predecessor chain might loop forever. Always check for negative cycles before attempting path reconstruction.
python def safe_reconstruct(d, pi, i, j): """Reconstruct path, handling negative cycles and unreachable pairs.""" # Check for negative cycle for k in range(len(d)): if d[k][k] < 0: # negative cycle exists if d[i][k] < float('inf') and d[k][j] < float('inf'): return None # path goes through neg. cycle # Check unreachable if d[i][j] >= float('inf'): return [] # no path exists # Safe to reconstruct return reconstruct_path(pi, i, j)
Sometimes you do not care about shortest distances — you just want to know: is there any path from i to j? This is the transitive closure problem. It answers reachability for all pairs of vertices.
The transitive closure of a directed graph G is a boolean matrix T where T[i][j] = 1 if and only if there exists a path (of any length) from vertex i to vertex j.
We could run Floyd-Warshall and check which entries are not ∞. But there is a cleaner approach: replace the arithmetic operations. Instead of (min, +) we use (OR, AND):
In words: there is a path from i to j using intermediates {1, ..., k} if either (a) there was already a path using intermediates {1, ..., k-1}, or (b) there is a path from i to k AND a path from k to j, both using intermediates {1, ..., k-1}.
python def transitive_closure(adj, n): """Compute transitive closure using Warshall's algorithm. adj: n x n boolean adjacency matrix (True if edge exists). Returns n x n boolean reachability matrix.""" t = [[adj[i][j] or (i == j) for j in range(n)] for i in range(n)] for k in range(n): for i in range(n): for j in range(n): t[i][j] = t[i][j] or (t[i][k] and t[k][j]) return t
Time: Θ(V³). Space: Θ(V²). Same as Floyd-Warshall, but with boolean operations which are faster in practice (bit manipulation, no floating point).
Since the matrix entries are booleans, we can pack an entire row into a bitset (a single integer). Then t[i][j] = t[i][j] OR (t[i][k] AND t[k][j]) becomes a single bitwise OR of entire rows:
This processes an entire row in one operation, giving a practical speedup of 64x on 64-bit machines (processing 64 booleans per word).
Consider 5 vertices {A, B, C, D, E} with edges: A→B, B→C, B→D, D→A, D→C, E→D.
Initial reachability T(0) (TRUE on diagonal + direct edges):
k = A: Check if routing through A helps. For each pair (i, j): if i can reach A AND A can reach j, then i can reach j. A reaches B. D reaches A. E does not reach A. So D can now reach B (via D→A→B).
k = B: A reaches B. B reaches C and D. So A can now reach C (A→B→C) and A can reach D (A→B→D). Also D now reaches B (from previous step), so D can reach C (D→...→B→C) and D can reach D (already true).
k = D: This is where it gets interesting. A reaches D (from k=B). D reaches A, C, and now B. So A can reach A (A→...→D→A — a cycle!), A can reach C (already true), and E can reach A (E→D→A), E can reach B (E→D→...→B), E can reach C (E→D→C).
After all 5 passes, the final matrix shows: A, B, D form a strongly connected component (all can reach all). E can reach everything. C is a sink (can reach only itself).
Watch the reachability matrix fill in as k increases. Green cells are TRUE (path exists). Gray cells are FALSE. The current intermediate vertex k is highlighted on the graph.
k = 0 (initial). Click Step for k = 1.| Application | Vertices | Edges | Question |
|---|---|---|---|
| Database query optimization | Relations | Foreign keys | Can table A join to table B? |
| Dependency analysis | Modules/packages | Import statements | Does module A (transitively) depend on B? |
| Social networks | Users | Follow relationships | Can user A see user B's content (through follows)? |
| Compiler optimization | Variables | Assignments/aliases | Can variable x affect variable y? |
For large graphs, the bitwise optimization makes a significant practical difference:
python def transitive_closure_bitwise(n, edges): """Transitive closure using Python integers as bitsets. Each row is a single integer — O(n^3/w) where w = word size.""" # Initialize: bit j of row[i] = 1 iff edge i→j or i==j row = [0] * n for i in range(n): row[i] |= (1 << i) # self-reachability for u, v in edges: row[u] |= (1 << v) # Warshall's algorithm with bitwise OR for k in range(n): for i in range(n): if row[i] & (1 << k): # does i reach k? row[i] |= row[k] # merge k's reachability into i # Convert back to matrix return [[(row[i] >> j) & 1 == 1 for j in range(n)] for i in range(n)]
In Python, integers have arbitrary precision, so each row is stored as a single integer regardless of n. The |= operation processes all n bits at once (well, in chunks of 64 bits on a 64-bit machine). For n = 1000, this is ~15x faster than the boolean version.
Floyd-Warshall is Θ(V³) regardless of how many edges exist. For sparse graphs (E much less than V²), that is wasteful. Johnson's algorithm combines Bellman-Ford and Dijkstra to achieve O(V² log V + VE) — a huge win when the graph is sparse.
We want to run Dijkstra from every vertex (that is the fast part), but Dijkstra requires non-negative edges. Some edges might be negative. Can we make all edges non-negative without changing shortest paths?
Naive idea: add a large constant C to every edge weight. This does NOT work! It changes which paths are shortest. A path with 5 edges gains 5C, while a path with 2 edges gains only 2C. The path with fewer edges becomes relatively cheaper.
We need h(v) for each vertex such that w'(u, v) = w(u, v) + h(u) - h(v) ≥ 0 for all edges (u, v). Where do we get h?
The trick: add a new vertex s with zero-weight edges to every other vertex. Run Bellman-Ford from s. Set h(v) = δ(s, v) — the shortest-path weight from s to v.
Why does this guarantee non-negative reweighted edges? By the triangle inequality of shortest paths: δ(s, v) ≤ δ(s, u) + w(u, v). Rearranging: w(u, v) + δ(s, u) - δ(s, v) ≥ 0. That is exactly w'(u, v) ≥ 0.
Consider any path p from u to v: u = v0 → v1 → ... → vk = v. The reweighted cost is:
Every path from u to v has its weight shifted by the same amount h(u) - h(v). So the path that minimizes w(p) also minimizes w'(p). Shortest paths are preserved.
python import heapq, math def johnsons(graph, n): """Johnson's algorithm for all-pairs shortest paths. graph: dict of {u: [(v, w), ...]} adjacency list. Returns n x n distance matrix or None if negative cycle.""" # Step 1: Add auxiliary vertex s = n s = n aug = {v: lst[:] for v, lst in graph.items()} aug[s] = [(v, 0) for v in range(n)] # Step 2: Bellman-Ford from s h = [math.inf] * (n + 1) h[s] = 0 for _ in range(n): # n iterations (n+1 vertices) for u in aug: for v, w in aug[u]: if h[u] + w < h[v]: h[v] = h[u] + w # Check for negative cycle for u in aug: for v, w in aug[u]: if h[u] + w < h[v]: return None # negative cycle! # Step 3: Reweight edges rg = {v: [] for v in range(n)} for u in range(n): for v, w in graph[u]: rg[u].append((v, w + h[u] - h[v])) # Step 4: Dijkstra from every vertex D = [[math.inf] * n for _ in range(n)] for src in range(n): dist = [math.inf] * n dist[src] = 0 pq = [(0, src)] while pq: d_u, u = heapq.heappop(pq) if d_u > dist[u]: continue for v, w_prime in rg[u]: if dist[u] + w_prime < dist[v]: dist[v] = dist[u] + w_prime heapq.heappush(pq, (dist[v], v)) # Step 5: Undo reweighting for v in range(n): D[src][v] = dist[v] - h[src] + h[v] return D
Let us trace Johnson's on the same graph from Chapter 2: edges 1→2 (3), 1→3 (8), 2→3 (2), 2→4 (1), 3→2 (4), 4→1 (2), 4→3 (-5), 1→5 (7), 5→4 (6).
Step 1: Add vertex s with zero-weight edges to all vertices. s→1 (0), s→2 (0), s→3 (0), s→4 (0), s→5 (0).
Step 2: Bellman-Ford from s. Since all edges from s have weight 0, and the original graph has a -5 edge from 4→3, the shortest paths from s are:
Wait — let us be more careful. Bellman-Ford runs V iterations. After iteration 1, we have h(3) = min(0, h(4) + (-5)) = min(0, 0 + (-5)) = -5. After further iterations, no changes. The h values stabilize.
Step 3: Reweight. w'(u, v) = w(u, v) + h(u) - h(v):
Hmm, let me recalculate h more carefully. Running Bellman-Ford from s on the augmented graph requires multiple iterations to converge through all paths. After convergence: h = [0, 0, -5, 0, 0] is not correct. Let me trace again.
Actually h(3) = -5 only if we can reach 3 from s with cost -5. Path s→4→3 = 0 + (-5) = -5. Path s→3 = 0. So h(3) = -5. But then w'(3, 2) = 4 + (-5) - 0 = -1 < 0. The triangle inequality would not be satisfied if the shortest path was computed incorrectly.
The issue: s→2→4→3 = 0 + 1 + (-5) = -4. s→1→2→4→3 = 0 + 3 + 1 + (-5) = -1. The shortest is s→4→3 or more precisely checking all paths, h(3) = min(0, -5, -4, -1, ...) = -5 (via s→4→3). But then w'(3, 2) = 4 + h(3) - h(2) = 4 + (-5) - 0 = -1.
We need to check: does h satisfy δ(s, 3) ≤ δ(s, 2) + w(3, 2)? That is -5 ≤ 0 + 4 = 4. Not helpful. The triangle inequality says δ(s, v) ≤ δ(s, u) + w(u, v), which gives w'(u, v) = w(u, v) + h(u) - h(v) ≥ 0. For edge (3, 2): w'(3, 2) = 4 + (-5) - 0 = -1. But h(2) should actually be -1 because δ(s, 2) = min(0, -5+4) = -1 via s→4→3→2. Let us re-run Bellman-Ford properly:
NOW let us reweight: w'(u, v) = w(u, v) + h(u) - h(v):
All reweighted edges are ≥ 0. The negative edge disappeared! Now Dijkstra works on this reweighted graph. After running Dijkstra from each vertex and undoing the reweighting (d[u][v] = d'[u][v] - h(u) + h(v)), we get the same distance matrix as Floyd-Warshall.
Let us verify that reweighting preserves shortest paths. The shortest path from 1 to 3 in the original graph is 1 → 2 → 4 → 3 with cost 3 + 1 + (-5) = -1. In the reweighted graph, this path costs 4 + 0 + 0 = 4. The telescope formula says: w'(path) = w(path) + h(1) - h(3) = -1 + 0 - (-5) = 4. Correct!
After Dijkstra computes d'[1][3] = 4 in the reweighted graph, we recover the true distance: d[1][3] = d'[1][3] - h(1) + h(3) = 4 - 0 + (-5) = -1. This matches our earlier Floyd-Warshall result.
Watch the three phases: (1) Bellman-Ford computes h values, (2) edges are reweighted (all become non-negative), (3) Dijkstra runs from each vertex. Use the buttons to step through each phase.
Phase 1: Bellman-Ford. Click Step.Bellman-Ford: O(VE) for the augmented graph. This is the "one-time cost" of computing the potential function.
Reweighting: O(E). A single pass over all edges.
Dijkstra × V: O(V(V + E) log V) with a binary heap, or O(V² log V + VE) with a Fibonacci heap.
Total: O(V² log V + VE). For sparse graphs (E = O(V)), this is O(V² log V), compared to Floyd-Warshall's Θ(V³).
The crossover point depends on the edge density. Let us compare:
| Edge Density | E | Floyd-Warshall | Johnson's | Winner |
|---|---|---|---|---|
| Very sparse | O(V) | V³ | V² log V | Johnson's by V/log V |
| Moderate | O(V log V) | V³ | V² log V | Johnson's |
| Medium | O(V¹·&sup5;) | V³ | V²·&sup5; | Johnson's |
| Dense | O(V²) | V³ | V³ log V | Floyd-Warshall |
| Complete | V(V-1) | V³ | V³ log V | Floyd-Warshall |
Johnson's dominates unless E = Ω(V²/log V). In practice, many real-world graphs are sparse (road networks, social graphs, web graphs), making Johnson's the practical choice for large-scale all-pairs computations.
Why does Johnson's produce correct results?
1. Bellman-Ford correctness: Bellman-Ford correctly computes shortest paths from s in the augmented graph (no negative cycles by assumption). So h(v) = δ(s, v) satisfies the triangle inequality: h(v) ≤ h(u) + w(u, v).
2. Non-negative reweighting: w'(u, v) = w(u, v) + h(u) - h(v) ≥ 0 follows directly from the triangle inequality.
3. Path weight preservation: For any path p from u to v, w'(p) = w(p) + h(u) - h(v). All paths between the same endpoints are shifted by the same constant, so the shortest path under w is also shortest under w'.
4. Dijkstra correctness: Since all reweighted edges are non-negative, Dijkstra correctly finds shortest paths in the reweighted graph.
5. Undo reweighting: d(u, v) = d'(u, v) - h(u) + h(v) recovers the true shortest-path weight.
Each step's correctness follows from the previous step. The chain is: Bellman-Ford → valid potential → non-negative reweighting → Dijkstra works → undo gives correct answer.
For implementation clarity, here is exactly what data flows through each phase:
A natural question: why not just pick an existing vertex as the source for Bellman-Ford? The problem is that an existing vertex might not be able to reach all other vertices. The auxiliary vertex s with zero-weight edges to every vertex guarantees that every vertex is reachable from s with cost 0 (before any negative edges are considered). This ensures h(v) is well-defined for all v.
Could we use h(v) = 0 for all v (skip Bellman-Ford entirely)? Only if all edges are already non-negative. If any edge (u, v) has negative weight, then w'(u, v) = w(u, v) + 0 - 0 = w(u, v) < 0, and Dijkstra would fail.
Could we use h(v) = min incoming edge weight? No — this local approach does not guarantee w'(u, v) ≥ 0 for all edges. The potential function must be globally consistent, which is exactly what Bellman-Ford provides.
Johnson's is more space-efficient than Floyd-Warshall for sparse graphs:
| Component | Space |
|---|---|
| h array (potential function) | O(V) |
| Reweighted adjacency list | O(V + E) |
| Dijkstra priority queue | O(V) per run |
| Output distance matrix | O(V²) |
| Total | O(V² + E) |
Floyd-Warshall also uses O(V²) for the distance matrix. The main space advantage of Johnson's is that the intermediate data structures scale with E, not V². For sparse graphs (E ≪ V²), this can matter.
This is the payoff. Build a graph, pick an algorithm, and watch it compute ALL shortest paths in real time. Click any pair in the distance matrix to see the shortest path highlighted on the graph.
Click on the canvas to place vertices (up to 7). Then click "Add Edge" mode and click two vertices to add a directed edge (enter weight in the prompt). Choose Floyd-Warshall or Johnson's, hit "Solve", and watch the distance matrix fill in. Click any cell in the matrix to see the path.
When you run the simulation, pay attention to these patterns:
Symmetry in undirected-like graphs. If your graph has edges in both directions with similar weights, the distance matrix will be approximately symmetric: d[i][j] ≈ d[j][i]. But directed graphs can be wildly asymmetric — it might cost 2 to go from A to B but 100 to go back.
The infinity entries. An ∞ in the distance matrix means "no path exists." This tells you about the connectivity structure. If many cells are ∞, the graph is poorly connected. A strongly connected graph has no ∞ entries (every vertex can reach every other).
Negative entries. If you see negative distances, it means the path exploits one or more negative-weight edges. Check the path by clicking the cell — you will see it routes through negative edges.
The diagonal. Should always be 0 (unless there is a negative cycle, in which case the diagonal entry goes negative — see Chapter 7).
The distance matrix has rich structure worth studying:
The diagonal is always zero (assuming no negative cycles). d[i][i] = 0 because the shortest "path" from a vertex to itself is the empty path with zero cost.
Asymmetry. In directed graphs, d[i][j] ≠ d[j][i] in general. The cost from A to B might be completely different from B to A. This is unlike undirected graphs where d[i][j] = d[j][i].
Triangle inequality. For all i, j, k: d[i][j] ≤ d[i][k] + d[k][j]. If there were a violation, then routing through k would give a shorter path from i to j, contradicting the optimality of d[i][j]. This is a fundamental property of shortest-path distances and makes them a metric (on strongly connected graphs with non-negative weights).
Infinity entries reveal structure. If d[i][j] = ∞ for many pairs, the graph is poorly connected. The pattern of ∞ entries reveals the graph's connectivity structure: which vertices can reach which. This is exactly the information captured by the transitive closure.
To build a custom graph: (1) Set mode to "Place Vertex" and click to place vertices. (2) Switch to "Add Edge" mode. (3) Click a source vertex, then a destination vertex. (4) Enter the edge weight when prompted (negative values allowed). (5) Repeat for all edges. (6) Choose an algorithm and click Solve.
Try these challenges:
| Challenge | Goal |
|---|---|
| The triangle | Build 3 vertices with edges forming a triangle. Can you predict all 6 off-diagonal distances? |
| The shortcut | Build 4 vertices in a line (1→2→3→4) with weight 1 each, then add a direct edge from 1 to 4 with weight 10. Solve — does the direct edge ever get used? |
| The negative trick | Build a graph where the shortest path between two vertices is negative. What edge weights did you need? |
| Disconnected components | Build two separate triangles (no edges between them). What does the distance matrix look like? |
All-pairs shortest paths are undefined when a negative-weight cycle is reachable. A negative cycle is a cycle whose total edge weight is negative — you can loop around it infinitely, driving the path cost to negative infinity. Both Floyd-Warshall and Johnson's can detect them.
After Floyd-Warshall completes, check the diagonal of the distance matrix. If d[i][i] < 0 for any vertex i, then vertex i lies on a negative cycle.
Why does this work? d[i][i] is the shortest "path" from i back to itself. Normally this is 0 (the empty path). But if there is a negative cycle reachable from i and that also reaches i, then we can go around the cycle and return to i with negative cost.
Johnson's detects negative cycles during the Bellman-Ford phase (Step 2). If after V-1 relaxation passes any edge can still be relaxed, a negative cycle exists. This is exactly the standard Bellman-Ford negative cycle detection.
When a negative cycle exists, some entries in the distance matrix are -∞. Specifically, d[i][j] = -∞ if and only if there exists a path from i to j that passes through a vertex on a negative cycle. Such a path can be made arbitrarily cheap by looping around the negative cycle.
This graph has a negative cycle: 2 → 3 → 4 → 2 with total weight -2. Run Floyd-Warshall and watch d[2][2], d[3][3], d[4][4] go negative (shown in red on the diagonal). The negative cycle vertices pulse in red on the graph.
Click Run to detect the negative cycle.When a negative cycle exists, it does not ruin the entire distance matrix — only specific entries. An entry d[i][j] is -∞ if and only if there is a path from i to some vertex on a negative cycle, and a path from that cycle to j. Vertices that cannot reach or be reached from the cycle still have well-defined shortest-path distances.
Formally, after Floyd-Warshall:
In the simulation, the negative cycle is 2 → 3 → 4 → 2 with total weight 1 + (-3) + (-1) = -3. Vertices 2, 3, and 4 are all on this cycle, so d[2][2], d[3][3], and d[4][4] will all be negative. Vertex 1 can reach the cycle (1→2), so d[1][j] for j ∈ {2, 3, 4} will also be affected. Vertex 5 can reach the cycle (5→4), so d[5][j] will be affected too.
In real applications, negative cycles usually represent bugs or arbitrage opportunities:
| Domain | Negative Cycle Meaning | Response |
|---|---|---|
| Currency exchange | Arbitrage opportunity (free money!) | Exploit it (or report it) |
| Network routing | Routing loop with decreasing costs | Bug — fix the routing config |
| Game AI | Infinite reward loop | Cap the loop or redesign |
| Project scheduling | Contradictory constraints | Infeasible schedule — relax constraints |
This is a famous interview question. Suppose you have exchange rates between currencies:
Starting with $1 USD: convert to EUR (0.85), then GBP (0.85 × 0.88 = 0.748), then back to USD (0.748 × 1.40 = 1.047). You end up with $1.047 — a 4.7% profit per cycle!
To detect this with Floyd-Warshall, take -log of each exchange rate as the edge weight. Then finding a negative cycle in the -log graph corresponds to finding a product of rates > 1 (arbitrage). Specifically: -log(0.85) + -log(0.88) + -log(1.40) = 0.163 + 0.128 + (-0.336) = -0.046 < 0. Negative cycle detected!
python import math def detect_arbitrage(currencies, rates): """Detect currency arbitrage using Floyd-Warshall. currencies: list of currency names rates: dict of {(from, to): exchange_rate} Returns: list of currencies in the arbitrage cycle, or None.""" n = len(currencies) idx = {c: i for i, c in enumerate(currencies)} INF = float('inf') # Build -log weight matrix d = [[INF] * n for _ in range(n)] for i in range(n): d[i][i] = 0 for (f, t), rate in rates.items(): d[idx[f]][idx[t]] = -math.log(rate) # Floyd-Warshall for k in range(n): for i in range(n): for j in range(n): if d[i][k] + d[k][j] < d[i][j]: d[i][j] = d[i][k] + d[k][j] # Check diagonal for negative cycles for i in range(n): if d[i][i] < -1e-10: # small epsilon for float errors profit = math.exp(-d[i][i]) - 1 return currencies[i], profit return None # Example usage: result = detect_arbitrage( ['USD', 'EUR', 'GBP'], {('USD','EUR'): 0.85, ('EUR','GBP'): 0.88, ('GBP','USD'): 1.40, ('EUR','USD'): 1.17, ('GBP','EUR'): 1.13, ('USD','GBP'): 0.71} ) # Returns: ('USD', 0.047) — 4.7% profit per cycle
All-pairs shortest paths is a staple of system design and algorithm interviews. Here is your cheat sheet and drill set.
| Algorithm | Time | Space | Neg. Edges? | Neg. Cycle Detect? | Best For |
|---|---|---|---|---|---|
| Floyd-Warshall | Θ(V³) | Θ(V²) | Yes | Yes (diagonal) | Dense, simplicity |
| Johnson's | O(V²logV+VE) | O(V+E) | Yes | Yes (BF phase) | Sparse + neg. edges |
| Dijkstra × V | O(V²logV+VE) | O(V+E) | No | No | Sparse, non-neg. |
| BF × V | O(V²E) | O(V+E) | Yes | Yes | Almost never |
| Matrix mult. | O(V³logV) | O(V²) | Yes | No | Theory only |
The number one coding question for this topic: implement Floyd-Warshall. Here is the minimal template:
python def floyd_warshall(n, edges): """edges: list of (u, v, w) tuples. 0-indexed vertices.""" INF = float('inf') d = [[INF] * n for _ in range(n)] for i in range(n): d[i][i] = 0 for u, v, w in edges: d[u][v] = w for k in range(n): for i in range(n): for j in range(n): if d[i][k] + d[k][j] < d[i][j]: d[i][j] = d[i][k] + d[k][j] # Negative cycle check for i in range(n): if d[i][i] < 0: return None # negative cycle return d
All-pairs shortest paths questions test five distinct skills:
python def transitive_closure(n, edges): """Return boolean matrix: can vertex i reach vertex j?""" t = [[i == j for j in range(n)] for i in range(n)] for u, v in edges: t[u][v] = True for k in range(n): for i in range(n): for j in range(n): t[i][j] = t[i][j] or (t[i][k] and t[k][j]) return t
python def graph_diameter(n, edges): """The longest shortest path (graph diameter).""" d = floyd_warshall(n, edges) if d is None: return None # negative cycle INF = float('inf') diameter = 0 for i in range(n): for j in range(n): if d[i][j] < INF: diameter = max(diameter, d[i][j]) return diameter
Often in interviews, you need to not just find the distance but reconstruct the path. Here is a complete solution:
python def all_pairs_with_paths(n, edges): """Floyd-Warshall with path reconstruction. Returns (dist_matrix, function to get path).""" INF = float('inf') d = [[INF] * n for _ in range(n)] nxt = [[None] * n for _ in range(n)] # NEXT vertex (not predecessor) for i in range(n): d[i][i] = 0 for u, v, w in edges: d[u][v] = w nxt[u][v] = v # from u, go to v next for k in range(n): for i in range(n): for j in range(n): if d[i][k] + d[k][j] < d[i][j]: d[i][j] = d[i][k] + d[k][j] nxt[i][j] = nxt[i][k] # go toward k first def get_path(u, v): if nxt[u][v] is None: return [] path = [u] while path[-1] != v: path.append(nxt[path[-1]][v]) return path return d, get_path
| Edge Case | What Happens | How to Handle |
|---|---|---|
| Self-loops with negative weight | d[i][i] becomes negative after first pass | Detected by diagonal check |
| Disconnected graph | d[i][j] = ∞ for unreachable pairs | Check for ∞ before using distances |
| Parallel edges | Only keep the minimum-weight edge | Preprocess when building W |
| Very large V (> 5000) | V³ = 125 billion operations | Use Johnson's if sparse, or approximate |
Compare Floyd-Warshall (V³, orange) vs Johnson's on a sparse graph (V² log V + V·2V, teal) for V from 4 to 20. The gap widens as V grows.
Asymptotic complexity does not tell the whole story. In practice:
Floyd-Warshall has excellent cache behavior. The inner two loops access the matrix in row-major order, which is cache-friendly on modern CPUs. The simple structure also allows SIMD vectorization. In practice, Floyd-Warshall is often faster than Johnson's for V up to ~1000 even on moderately sparse graphs.
Johnson's has overhead. Bellman-Ford + V Dijkstra runs involve heap operations, adjacency list traversals, and more complex data structures. The constant factors are higher than Floyd-Warshall's simple array access.
Parallelism. Floyd-Warshall is hard to parallelize because each layer d(k) depends on d(k-1). The matrix multiplication approach is more parallelizable (each matrix multiplication can be done in parallel). Johnson's has natural parallelism: the V Dijkstra runs are independent and can run on separate threads.
| Property | Floyd-Warshall | Johnson's |
|---|---|---|
| Lines of code | ~10 | ~40 |
| Data structure | 2D array only | Adjacency list + heap |
| Cache behavior | Excellent | Poor (pointer chasing) |
| Parallelizable | Hard | Easy (V independent runs) |
| Works with neg. edges | Yes | Yes |
| Practical crossover | Best for V ≤ 1000 | Best for V > 1000, sparse |
All-pairs shortest paths sits at the intersection of several fundamental algorithmic ideas. Let us trace the connections.
| Connection | Relationship | Key Insight |
|---|---|---|
| Ch 24: Single-source shortest paths | Foundation | Johnson's algorithm literally uses Bellman-Ford + Dijkstra as subroutines. All-pairs = "run single-source V times, but cleverly." |
| Ch 15: Dynamic Programming | Core technique | Floyd-Warshall IS dynamic programming. The intermediate vertex set {1, ..., k} is the DP state. The recurrence d(k)[i][j] = min(d(k-1)[i][j], d(k-1)[i][k] + d(k-1)[k][j]) is textbook DP. |
| Ch 26: Maximum flow | Application | Maximum flow algorithms often find shortest (or augmenting) paths as subroutines. The Edmonds-Karp algorithm uses BFS to find shortest augmenting paths. |
| Matrix multiplication | Algebraic connection | Shortest paths = matrix multiplication on the (min, +) semiring. Any improvement to min-plus matrix multiplication directly improves all-pairs shortest paths. |
| Transitive closure | Boolean special case | Replace (min, +) with (OR, AND) and you get reachability instead of distances. Same algorithm, different algebra. |
| Limitation | Alternative |
|---|---|
| O(V³) even for sparse graphs | Johnson's: O(V² log V + VE) |
| Cannot handle negative cycles (undefined output) | Detect and report; no algorithm can "solve" negative cycles |
| Not parallelizable (DP dependencies between layers) | Matrix multiplication approach has better parallelism (Δ-stepping) |
| V > ~5000 becomes impractical | Approximate algorithms, or only compute pairs you need |
← Chapter 24: Single-Source Shortest Paths — Dijkstra, Bellman-Ford, DAG shortest paths. The building blocks that Johnson's algorithm combines.
Chapter 15: Dynamic Programming — Floyd-Warshall is one of the most elegant DP algorithms. If you understand the DP formulation (subsets of intermediate vertices), you understand Floyd-Warshall at a deeper level than just "three nested loops."
Chapter 26: Maximum Flow — Flow problems use shortest path subroutines. The min-cost max-flow problem is essentially a shortest path problem on a residual graph.
One of the deepest connections in this chapter is the semiring abstraction. A semiring is a set with two operations (call them ⊕ and ⊗) that satisfy certain algebraic laws. Three important semirings for graph algorithms:
| Semiring | ⊕ | ⊗ | Problem Solved |
|---|---|---|---|
| Shortest path | min | + | All-pairs shortest paths |
| Boolean | OR | AND | Transitive closure (reachability) |
| Bottleneck | max | min | Widest path (max bandwidth between pairs) |
| Counting | + | × | Number of paths between pairs |
All four problems are solved by the exact same three-loop algorithm — just plug in different ⊕ and ⊗. Floyd-Warshall is not just a shortest path algorithm; it is a template for solving problems on closed semirings. This is one of the most beautiful generalizations in algorithm design.
Floyd-Warshall has a fascinating history. The algorithm was independently discovered by at least three people:
| Who | When | Context |
|---|---|---|
| Stephen Warshall | 1962 | Published the transitive closure algorithm (boolean version) |
| Robert Floyd | 1962 | Generalized to shortest paths (real-valued version) |
| Bernard Roy | 1959 | Published an equivalent algorithm in French, 3 years earlier |
It should perhaps be called the "Roy-Floyd-Warshall" algorithm, but history is not always fair. Johnson published his algorithm in 1977, fifteen years later, showing that the combination of Bellman-Ford and Dijkstra yields better asymptotics for sparse graphs.
All-pairs shortest paths has a tantalizing open problem: can we solve it in truly sub-cubic time? That is, O(V3-ε) for some constant ε > 0? The matrix multiplication approach gives O(V³ log V) via repeated squaring, and sub-cubic min-plus matrix multiplication would give sub-cubic all-pairs shortest paths. But nobody has achieved sub-cubic min-plus multiplication for general matrices. The Williamssian barriers framework suggests this might be fundamentally hard.
For unweighted graphs (all weights are 0 or 1), Seidel's algorithm achieves O(Vω log V) where ω < 2.372 is the matrix multiplication exponent. But for weighted graphs, Θ(V³) remains the best known.
Research since CLRS has produced several important extensions:
Distance oracles (Thorup & Zwick, 2005): preprocess the graph in O(V2+1/k) time and O(V1+1/k) space to answer approximate distance queries in O(k) time. The approximation factor is 2k - 1. For k = 2, you get 3-approximate distances in O(1) time with O(V3/2) space — much better than storing the full V² matrix.
Dynamic all-pairs shortest paths: when the graph changes (edges added or removed), can we update the distance matrix faster than recomputing from scratch? Demetrescu and Italiano (2004) gave an algorithm that handles each edge update in O(V²) amortized time, compared to O(V³) for full recomputation.
Parallel all-pairs: on a machine with p processors, Floyd-Warshall can be parallelized to O(V³/p) time with O(V²/p) memory per processor, using a blocked partitioning of the distance matrix. This is one of the classic algorithms for distributed computing courses.
Here is a decision tree for real-world use, incorporating both theory and practice:
In competitive programming, Floyd-Warshall is almost always the right choice because V is typically small (≤ 400) and the code is trivially short. In production systems, the graph is often too large for any cubic algorithm, and you need specialized solutions (distance oracles, hub labeling, contraction hierarchies) that exploit the structure of real-world graphs.
The algorithms in this chapter are foundational — they define the problem space and provide the baseline that all practical solutions are measured against. Understanding Floyd-Warshall deeply (the DP formulation, the in-place trick, the negative cycle detection) gives you the conceptual vocabulary to reason about any shortest-path problem, no matter how complex.
If you remember nothing else from this chapter, remember this mental model:
| Approach | Analogy | DP State | Strength |
|---|---|---|---|
| Matrix mult. | Extend paths by one hop at a time | Number of edges used | Theoretical elegance |
| Floyd-Warshall | Add waypoints one at a time | Set of allowed intermediates | Simplicity (5 lines) |
| Johnson's | Remove negatives, then use fast tool | Hybrid: BF + Dijkstra | Speed on sparse graphs |
Matrix multiplication asks: "What is the best path using at most m edges?" Floyd-Warshall asks: "What is the best path using intermediates from {1, ..., k}?" Johnson's says: "Transform the problem so we can use Dijkstra." Three completely different perspectives on the same problem, each revealing different structure.
This chapter completes the shortest-path trilogy in CLRS: single-source (Ch. 24), all-pairs (Ch. 25), and their applications throughout the rest of the book. The next chapter, Maximum Flow (Ch. 26), uses shortest paths internally (the Edmonds-Karp algorithm finds shortest augmenting paths). The min-cost max-flow problem directly combines flow and shortest paths.
Beyond CLRS, all-pairs shortest paths connects to:
| Topic | Connection |
|---|---|
| Algebraic graph theory | The (min, +) semiring generalizes to other algebras on graphs |
| Metric spaces | The distance matrix D defines a metric on the graph vertices |
| Network design | Knowing all-pairs distances helps place facilities or design overlay networks |
| Approximation algorithms | Distance oracles trade space for query time: O(1) per query with O(V1+1/k) space |
| Parallel algorithms | Matrix multiplication approach is more parallelizable than Floyd-Warshall |
All-pairs shortest paths is one of those rare algorithmic problems where the solution is more beautiful than the problem. Floyd-Warshall's five lines of code hide a deep DP structure. Johnson's algorithm shows the power of problem transformation. And the matrix multiplication connection reveals that shortest paths belong to a vast algebraic landscape extending far beyond graphs.
Master these three algorithms — Floyd-Warshall for dense graphs, Johnson's for sparse, matrix multiplication for theory — and you have a complete toolkit for any shortest-path problem an interview or system design session can throw at you.
"The purpose of computation is insight, not numbers." — Richard Hamming