All-Pairs Shortest Paths (CLRS Ch. 25) — From Absolute Zero to Mastery

Chapter 0: The Problem

You run a logistics company with 6 warehouses. Every morning, you need to know: what is the cheapest shipping cost between every pair of warehouses? Not just from one hub — between ALL of them simultaneously. That is the all-pairs shortest paths problem.

One approach is obvious: run a single-source algorithm (like Dijkstra) from every vertex. With V vertices and E edges, running Dijkstra V times with a binary heap costs O(V(V + E) log V). For dense graphs where E = Θ(V²), that is O(V³ log V). Can we shave off that log V factor?

Yes. Floyd-Warshall solves all-pairs shortest paths in exactly Θ(V³) time — no log factor — using an elegant dynamic programming idea. For sparse graphs, Johnson's algorithm does even better: O(V² log V + VE), which beats Floyd-Warshall when E is much smaller than V².

The landscape. Three approaches to all-pairs shortest paths, each dominating in different regimes: (1) Matrix multiplication — O(V³ log V), elegant but not the fastest. (2) Floyd-Warshall — O(V³), the classic for dense graphs or when simplicity matters. (3) Johnson's — O(V² log V + VE), the winner for sparse graphs. This chapter covers all three.

The Input and Output

The input is a weighted directed graph G = (V, E) with weight function w. We represent it as an adjacency matrix W where:

W[i][j] = 0               if i = j
W[i][j] = w(i, j)        if (i, j) ∈ E
W[i][j] = ∞             if (i, j) ∉ E

The output is a distance matrix D where D[i][j] = δ(i, j), the shortest-path weight from vertex i to vertex j. We may also produce a predecessor matrix Π where Π[i][j] is the predecessor of j on a shortest path from i — this lets us reconstruct the actual paths.

The simulation below lets you see this transformation. On the left is a weighted graph. On the right is the distance matrix — initially just the adjacency matrix W. Your job: figure out the true shortest distances.

All-Pairs: From Adjacency Matrix to Distance Matrix

The graph on the left has 5 vertices. The matrix on the right shows current best-known distances (starting with direct edge weights). Hover over a cell to highlight the corresponding path on the graph. Click "Solve" to fill in all shortest distances.

Hover over matrix cells to see paths.

Why Not Just Run Dijkstra V Times?

You absolutely can. And for sparse graphs with non-negative weights, that is often the best approach. But consider the trade-offs:

Approach	Time Complexity	Dense (E = V²)	Sparse (E = V)	Neg. Weights?
Dijkstra × V (binary heap)	O(V(V+E) log V)	O(V³ log V)	O(V² log V)	No
Bellman-Ford × V	O(V²E)	O(V⁴)	O(V³)	Yes
Matrix multiplication	O(V³ log V)	O(V³ log V)	O(V³ log V)	Yes
Floyd-Warshall	Θ(V³)	Θ(V³)	Θ(V³)	Yes
Johnson's	O(V² log V + VE)	O(V³ log V)	O(V² log V)	Yes

For dense graphs, Floyd-Warshall's Θ(V³) beats Dijkstra × V by a log V factor. For sparse graphs with negative edges, Johnson's is the clear winner. Knowing which to pick is an essential interview skill.

The key question for interviews: "Given a graph with V vertices and E edges, which all-pairs algorithm would you use?" The answer depends on two things: (1) Is the graph dense or sparse? (2) Are there negative edge weights? Dense + any weights → Floyd-Warshall. Sparse + negative weights → Johnson's. Sparse + non-negative → Dijkstra × V.

The Structure of the Solution

All three approaches produce the same two outputs:

Distance Matrix D

D[i][j] = δ(i, j), the shortest-path weight from i to j. Size: V × V. Entries range from -∞ (negative cycle) to +∞ (unreachable). The diagonal is all zeros (if no negative cycles).

↓

Predecessor Matrix Π

Π[i][j] = the vertex just before j on a shortest path from i to j. Lets us reconstruct actual paths by following predecessors backwards. NIL if no path exists.

Think of D as the "price list" and Π as the "route map." Together, they answer both "how much?" and "which way?" for every pair of vertices.

The distance matrix D has V² entries. For a graph with 1000 vertices, that is a million numbers. Computing each one requires examining potentially all V other vertices as intermediates. The fundamental question of this chapter: can we fill in all V² entries faster than running V separate single-source algorithms?

Size matters. For V = 100, Floyd-Warshall does 10⁶ operations — instant. For V = 1,000, it does 10⁹ — a few seconds. For V = 10,000, it does 10¹² — impractical. This is why choosing the right algorithm based on graph density matters enormously. A factor of V/log V can be the difference between minutes and hours.

A Concrete Example: Five Warehouses

Let us build intuition with a small example before diving into algorithms. Suppose you have 5 warehouses (vertices 1-5) with the following shipping costs:

1 → 2: cost 3         // cheap direct route
1 → 3: cost 8         // expensive direct route
2 → 3: cost 2         // but going through 2 is cheaper: 3+2=5
2 → 4: cost 1
3 → 2: cost 4         // going back is more expensive
4 → 1: cost 2         // creates a cycle: 1→2→4→1 costs 3+1+2=6
4 → 3: cost -5        // negative edge! warehouse 4 subsidizes shipping to 3
1 → 5: cost 7
5 → 4: cost 6

Even with just 5 vertices, finding all shortest paths by hand is tedious. The path from 1 to 3 is not the direct route (cost 8) — it is 1 → 2 → 4 → 3 (cost 3 + 1 + (-5) = -1). The negative edge through warehouse 4 creates a shortcut. This is exactly the kind of non-obvious result that all-pairs algorithms reveal.

Let us try to count the paths manually. From vertex 1 to vertex 3:

1 → 3:                  cost 8     (direct)
1 → 2 → 3:             cost 3+2 = 5 (via 2)
1 → 2 → 4 → 3:        cost 3+1+(-5) = -1 (via 2 and 4, using neg. edge)
1 → 5 → 4 → 3:        cost 7+6+(-5) = 8 (via 5 and 4)
1 → 2 → 4 → 1 → 3: cost 3+1+2+8 = 14 (with cycle, even longer)

The winner is 1 → 2 → 4 → 3 with cost -1. Finding this required checking multiple multi-hop paths and recognizing that the -5 edge makes a longer path (3 edges) cheaper than the direct edge. For V = 100, the number of possible paths explodes combinatorially. We need algorithms.

Adjacency Matrix vs Adjacency List

All-pairs algorithms typically work with adjacency matrices (2D arrays) rather than adjacency lists. This makes sense: the output is a V × V matrix, so the algorithms already need O(V²) space. The input adjacency matrix uses O(V²) space too, which is fine for dense graphs.

For sparse graphs with E ≪ V², adjacency lists use only O(V + E) space. Johnson's algorithm takes advantage of this: it stores the graph as adjacency lists and only builds the V × V distance matrix at the end.

Representation	Space	Lookup edge (u,v)?	Iterate neighbors of u?	Best for
Adjacency matrix	O(V²)	O(1)	O(V)	Dense, Floyd-Warshall
Adjacency list	O(V + E)	O(degree(u))	O(degree(u))	Sparse, Johnson's/Dijkstra

Quick check: You have a dense graph with V = 1000 vertices, E = 400,000 edges, and some edges have negative weights (no negative cycles). Which all-pairs shortest paths approach is best?

Run Dijkstra from every vertex — O(V(V+E) log V) Run Bellman-Ford from every vertex — O(V²E) Floyd-Warshall — Θ(V³). Dense graph means E ≈ V², so Floyd-Warshall's cubic time beats alternatives. Dijkstra fails because of negative weights.

Chapter 1: Matrix Multiplication Approach

Before we get to Floyd-Warshall, there is a beautiful connection between shortest paths and matrix multiplication. It is not the fastest algorithm, but it reveals deep structure that shows up in competitive programming and theory.

Shortest Paths as "Multiplication"

In regular matrix multiplication, we compute C = A × B where C[i][j] = ∑_k A[i][k] · B[k][j]. Now imagine replacing the sum (∑) with min, and the product (·) with addition (+). This gives us a new operation:

L'[i][j] = min_k { L[i][k] + W[k][j] }

This says: the best way to get from i to j is to go from i to some intermediate vertex k (costing L[i][k]), then take the direct edge from k to j (costing W[k][j]). Minimize over all choices of k.

If L^(m)[i][j] is the weight of the shortest path from i to j using at most m edges, then:

L⁽¹⁾ = W                     // direct edges only
L^(m) = L^(m-1) ⊗ W         // extend by one more edge
L^(n-1) = D                   // final answer (no simple path has more than n-1 edges)

where ⊗ is our "min-plus" matrix multiplication. Computing L⁽¹⁾, L⁽²⁾, ..., L^(n-1) one at a time takes n-1 matrix multiplications, each costing O(V³). Total: O(V⁴).

Repeated Squaring: O(V³ log V)

But here is the trick: we do not need every intermediate matrix. Since shortest path weights satisfy L^(2m) = L^(m) ⊗ L^(m) (just like regular squaring), we can compute L⁽¹⁾, L⁽²⁾, L⁽⁴⁾, L⁽⁸⁾, ..., L^{(2^{⌈log(n-1)⌉})}. That is only ⌈log(n-1)⌉ multiplications instead of n-1. Total: O(V³ log V).

Why this works. Once m ≥ n - 1, L^(m) = L^(n-1) = D. No shortest path (without negative cycles) has more than n-1 edges. So "overshooting" with a power of 2 is fine — the matrix stops changing once it reaches the true answer.

Worked Example

Consider a 4-vertex graph with adjacency matrix W:

    1   2   3   4
1 [ 0   3   8   ∞ ]
2 [ ∞  0   2   1 ]
3 [ ∞  4   0   ∞ ]
4 [ 2   ∞  -5  0 ]

Let us compute L⁽²⁾ = W ⊗ W. For cell L⁽²⁾[1][3]:

L⁽²⁾[1][3] = min(
  W[1][1] + W[1][3],   // 0 + 8 = 8 (via vertex 1)
  W[1][2] + W[2][3],   // 3 + 2 = 5 (via vertex 2)
  W[1][3] + W[3][3],   // 8 + 0 = 8 (via vertex 3)
  W[1][4] + W[4][3]   // ∞ + (-5) = ∞ (via vertex 4)
) = 5   // shortest 2-edge path from 1 to 3 goes through vertex 2

Computing the full L⁽²⁾ matrix:

L⁽²⁾[1][1] = min(0+0, 3+∞, 8+∞, ∞+2) = 0
L⁽²⁾[1][2] = min(0+3, 3+0, 8+4, ∞+∞) = 3
L⁽²⁾[1][3] = min(0+8, 3+2, 8+0, ∞+(-5)) = 5 ← via vertex 2
L⁽²⁾[1][4] = min(0+∞, 3+1, 8+∞, ∞+0) = 4 ← via vertex 2
L⁽²⁾[4][1] = min(2+0, ∞+∞, -5+∞, 0+2) = 2
L⁽²⁾[4][2] = min(2+3, ∞+0, -5+4, 0+∞) = -1 ← via vertex 3
L⁽²⁾[4][3] = min(2+8, ∞+2, -5+0, 0+(-5)) = -5
L⁽²⁾[4][4] = min(2+∞, ∞+1, -5+∞, 0+0) = 0

Already in L⁽²⁾, we see paths of up to 2 edges. The entry L⁽²⁾[4][2] = -1 corresponds to the path 4 → 3 → 2 (cost -5 + 4 = -1). One more squaring gives L⁽⁴⁾, which equals the final distance matrix since no shortest path needs more than 3 edges in a 4-vertex graph.

Animated Min-Plus Matrix Multiplication

Watch the matrix L evolve from L⁽¹⁾ = W toward the final distance matrix. Each step computes L^(2m) = L^(m) ⊗ L^(m) (repeated squaring). Click "Step" to advance one squaring.

L⁽¹⁾ = W. Click Step for L⁽²⁾.

Implementation

python
import math

def min_plus_multiply(A, B, n):
    """Min-plus matrix multiplication: C[i][j] = min_k(A[i][k] + B[k][j])"""
    C = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] = min(C[i][j], A[i][k] + B[k][j])
    return C

def all_pairs_repeated_squaring(W, n):
    """Compute all-pairs shortest paths via repeated squaring."""
    L = [row[:] for row in W]  # L^(1) = W
    m = 1
    while m < n - 1:
        L = min_plus_multiply(L, L, n)  # L^(2m) = L^(m) x L^(m)
        m *= 2
    return L  # = D, the distance matrix

Why not use this in practice? O(V³ log V) is worse than Floyd-Warshall's O(V³). But the matrix multiplication viewpoint has theoretical value: (1) it connects to algebraic path problems on semirings, (2) any faster min-plus matrix multiplication immediately improves all-pairs shortest paths, and (3) the repeated squaring trick appears in many other DP problems.

The Connection to Regular Matrix Multiplication

Why do we call this "matrix multiplication"? Compare the two operations side by side:

Standard	Min-Plus
C[i][j] = ∑_k A[i][k] · B[k][j]	C[i][j] = min_k{ A[i][k] + B[k][j] }
Combine with addition (∑)	Combine with min
Pair with multiplication (·)	Pair with addition (+)
Identity: I (1 on diagonal, 0 off)	Identity: D₀ (0 on diagonal, ∞ off)
Zero: 0 absorbs under ·	"Zero": ∞ absorbs under +

The structural parallel is exact. Both are "multiply row by column and combine." The only difference is which operations play the role of (+, ×). This is why the theory of semirings unifies both: regular matrix multiplication uses the (∑, ×) semiring, while shortest paths use the (min, +) semiring.

Why repeated squaring works: Just as A⁸ = ((A²)²)² in regular arithmetic, L⁽⁸⁾ = ((L⁽²⁾)⊗²)⊗² in min-plus. Associativity of the operation guarantees this, and min-plus multiplication is associative (you can verify this from the definition).

From O(V⁴) to O(V³ log V)

The naive approach computes L⁽¹⁾, L⁽²⁾, ..., L^(n-1) sequentially. Each "multiplication" L^(m) = L^(m-1) ⊗ W costs O(V³), and we do n-1 of them. Total: O(V⁴).

Repeated squaring computes L⁽¹⁾, L⁽²⁾, L⁽⁴⁾, ..., stopping when the exponent exceeds n-1. We need only ⌈log₂(n-1)⌉ multiplications. Each is still O(V³), so total: O(V³ log V).

For V = 1000: naive does ~999 multiplications (999 billion operations). Repeated squaring does ~10 multiplications (10 billion operations). A 100x speedup, yet still worse than Floyd-Warshall's 1 billion operations.

Can We Do Better? The Sub-Cubic Dream

If we could compute min-plus matrix multiplication in O(V^ω) time where ω < 3, then all-pairs shortest paths would be O(V^ω log V). For regular matrix multiplication, ω < 2.372 (the current best). But min-plus does not have the same algebraic structure (no additive inverses), and nobody has achieved sub-cubic min-plus multiplication. Whether ω_min-plus = 3 or could be smaller is one of the biggest open questions in theoretical computer science.

Quick check: In min-plus matrix multiplication, what replaces the standard (+, ×) operations?

(max, +) — we maximize sums (min, +) — we minimize sums. The "sum" over k becomes a min, and the "product" of entries becomes addition of edge weights. (min, ×) — we minimize products

Chapter 2: Floyd-Warshall

Floyd-Warshall is one of the most elegant algorithms in all of computer science. Three nested loops, five lines of code, and it solves all-pairs shortest paths in Θ(V³) time. The key idea is deceptively simple: consider intermediate vertices one at a time.

The DP Formulation

Define d^(k)[i][j] as the weight of the shortest path from i to j where all intermediate vertices are drawn from the set {1, 2, ..., k}. An intermediate vertex is any vertex on the path other than the endpoints i and j.

Base case: d⁽⁰⁾[i][j] = W[i][j]. With zero intermediate vertices allowed, we can only use direct edges.

Recursive case: when we introduce vertex k as a potential intermediate vertex, either:

Option A: Skip vertex k

The shortest path from i to j using intermediates {1, ..., k} does NOT go through k. Then d^(k)[i][j] = d^(k-1)[i][j].

Option B: Go through vertex k

The shortest path DOES go through k. Split it: i ↝ k ↝ j. Cost: d^(k-1)[i][k] + d^(k-1)[k][j]. Both sub-paths use intermediates from {1, ..., k-1} only (k is an endpoint, not intermediate, in each half).

Take the minimum:

d^(k)[i][j] = min( d^(k-1)[i][j], d^(k-1)[i][k] + d^(k-1)[k][j] )

The final answer is d⁽ⁿ⁾[i][j] = δ(i, j), since all n vertices are allowed as intermediates.

The eureka moment. Floyd-Warshall is not asking "which path is shortest?" directly. It is asking: "For each vertex k, does routing through k improve the path from i to j?" Try every k, and you have tried every possible intermediate vertex. That is why three nested loops suffice. The outer loop iterates over k (which vertex to try as intermediate). The inner two loops iterate over all pairs (i, j) to check if k helps.

Why Can We Use a Single Matrix? (Space Optimization)

The recurrence references d^(k-1), suggesting we need two matrices. But notice: when computing d^(k)[i][j], we read d^(k-1)[i][k] and d^(k-1)[k][j]. The value d[i][k] does not change when we process row k (because d^(k)[i][k] = min(d^(k-1)[i][k], d^(k-1)[i][k] + d^(k-1)[k][k]) = d^(k-1)[i][k] since d[k][k] = 0). Similarly, d[k][j] does not change. So we can safely update in place.

Worked Example: 5 Vertices

Graph edges: 1→2 (3), 1→3 (8), 2→3 (2), 2→4 (1), 3→2 (4), 4→1 (2), 4→3 (-5), 1→5 (7), 5→4 (6).

Initial matrix D⁽⁰⁾ = W:

   1   2   3   4   5
1[ 0   3   8  ∞  7 ]
2[ ∞  0   2   1  ∞ ]
3[ ∞  4   0  ∞  ∞ ]
4[ 2  ∞  -5  0  ∞ ]
5[ ∞  ∞  ∞  6  0 ]

Let us trace k = 1. For every pair (i, j), check: is d[i][1] + d[1][j] < d[i][j]? Vertex 1 has outgoing edges to 2, 3, and 5. The only vertices that can reach 1 are vertex 4 (d[4][1] = 2). So we check pairs where i = 4:

// k = 1: check if routing through vertex 1 helps
d[4][2]: min(∞, d[4][1]+d[1][2]) = min(∞, 2+3) = 5 ← improved!
d[4][3]: min(-5, d[4][1]+d[1][3]) = min(-5, 2+8) = -5 (no change)
d[4][5]: min(∞, d[4][1]+d[1][5]) = min(∞, 2+7) = 9 ← improved!

The Elegant DP Structure

Let us appreciate the beauty of this DP formulation. Most DP problems have states that grow exponentially. For shortest paths, you might expect the DP state to include the full set of vertices visited (as in the Travelling Salesman Problem). That would give O(2^V · V) states — exponential.

Floyd-Warshall's genius is using {1, ..., k} as the state. We do not track which specific vertices are intermediates — just how many are "unlocked." Because we add vertices in order (1, then 2, then 3...), we only need V states, not 2^V. This works because shortest paths have optimal substructure: if the shortest path uses intermediate vertex k, then the sub-paths from i to k and from k to j are themselves shortest paths using only intermediates from {1, ..., k-1}.

This is fundamentally different from the Travelling Salesman Problem, where the "no-repeat-visit" constraint forces you to track the exact set of visited vertices. TSP is NP-hard; all-pairs shortest paths is polynomial. The difference is exactly this: optimal substructure with ordered intermediates vs. optimal substructure with subset-based states.

Floyd-Warshall vs. Bellman-Ford: The Deep Connection

Both Floyd-Warshall and Bellman-Ford are dynamic programming algorithms for shortest paths. But they use different DP states:

Algorithm	DP State	Transition	Dimensions
Bellman-Ford	d[v] after m edge relaxations	Relax every edge, V-1 times	One source, all destinations
Floyd-Warshall	d[i][j] with intermediates {1, ..., k}	Try vertex k as intermediate	All sources, all destinations

Bellman-Ford asks: "Can we reach v in at most m edges?" Floyd-Warshall asks: "Can we reach j from i using only the first k vertices as waypoints?" Both converge to the true shortest path, but via completely different relaxation strategies.

This also explains why Floyd-Warshall naturally handles all pairs while Bellman-Ford handles only single-source: Floyd-Warshall's DP state already includes both i and j, so it computes all pairs "for free" as part of its DP table.

Complete Hand-Trace: All 5 Passes

Let us trace every pass. This is the kind of hand-calculation that shows up on exams and cements understanding.

k = 2 (allow vertex 2 as intermediate): Which pairs benefit from routing through vertex 2? Any pair (i, j) where d[i][2] + d[2][j] < d[i][j]. Vertex 2 connects to 3 (cost 2) and 4 (cost 1).

d[1][3]: min(8, d[1][2]+d[2][3]) = min(8, 3+2) = 5   ← 1→2→3 beats 1→3!
d[1][4]: min(∞, d[1][2]+d[2][4]) = min(∞, 3+1) = 4   ← new path found
d[3][4]: min(∞, d[3][2]+d[2][4]) = min(∞, 4+1) = 5   ← 3→2→4

k = 3 (allow vertex 3 as intermediate): Vertex 3 connects to vertex 2 (cost 4). Does any pair improve by routing through 3?

d[4][2]: min(5, d[4][3]+d[3][2]) = min(5, -5+4) = -1 ← 4→3→2 with the negative edge!
d[1][2]: min(3, d[1][3]+d[3][2]) = min(3, 5+4) = 3 (no change)

k = 4 (allow vertex 4 as intermediate): Vertex 4 connects to 1 (cost 2) and 3 (cost -5). This is where the negative edge creates havoc.

d[1][3]: min(5, d[1][4]+d[4][3]) = min(5, 4+(-5)) = -1   ← 1→2→4→3, using neg. edge!
d[1][1]: min(0, d[1][4]+d[4][1]) = min(0, 4+2) = 0    (no change, diagonal stays 0)
d[2][1]: min(∞, d[2][4]+d[4][1]) = min(∞, 1+2) = 3
d[2][3]: min(2, d[2][4]+d[4][3]) = min(2, 1+(-5)) = -4 ← 2→4→3
d[3][1]: min(∞, d[3][4]+d[4][1]) = min(∞, 5+2) = 7
d[3][3]: min(0, d[3][4]+d[4][3]) = min(0, 5+(-5)) = 0   (no change)
d[5][1]: min(∞, d[5][4]+d[4][1]) = min(∞, 6+2) = 8
d[5][3]: min(∞, d[5][4]+d[4][3]) = min(∞, 6+(-5)) = 1

k = 5 (allow vertex 5 as intermediate): Vertex 5 has limited connectivity, so few updates occur.

d[1][4]: min(4, d[1][5]+d[5][4]) = min(4, 7+6) = 4 (no change)

Final distance matrix D⁽⁵⁾:

   1   2   3   4   5
1[ 0   3  -1   4   7 ]
2[ 3   0  -4   1  ∞ ]
3[ 7   4   0   5  ∞ ]
4[ 2  -1  -5   0  ∞ ]
5[ 8   5   1   6  0 ]

Notice d[1][3] = -1. The cheapest path from warehouse 1 to warehouse 3 actually makes money (negative cost). The path is 1 → 2 → 4 → 3, exploiting the -5 subsidy from warehouse 4.

Floyd-Warshall Step-Through

Watch the distance matrix update as k increases from 1 to 5. Orange cells have just been updated. The graph on the left highlights the current intermediate vertex k in warm orange.

k = 0 (initial). Click Step for k = 1.

The Code: Five Beautiful Lines

python
def floyd_warshall(W, n):
    """All-pairs shortest paths in Theta(n^3).
    W: n x n adjacency matrix (0 on diagonal, inf for no edge).
    Returns distance matrix D."""
    d = [row[:] for row in W]  # copy W
    for k in range(n):             # try each intermediate vertex
        for i in range(n):         # for each source
            for j in range(n):     # for each destination
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

That is the entire algorithm. Three nested loops, one comparison, one assignment. The outer loop (k) is the DP dimension — which vertices are allowed as intermediates. The inner loops (i, j) check every pair. Total: exactly V³ iterations.

Let us verify this code against our worked example. The input is the 5 × 5 adjacency matrix W from above. After the three nested loops complete, d should match our hand-computed D⁽⁵⁾:

python
import math
INF = math.inf
W = [
    [0, 3,  8,  INF, 7],   # from vertex 0 (= "1")
    [INF, 0, 2,  1,   INF], # from vertex 1 (= "2")
    [INF, 4, 0,  INF, INF], # from vertex 2 (= "3")
    [2, INF, -5, 0,   INF], # from vertex 3 (= "4")
    [INF, INF, INF, 6, 0],  # from vertex 4 (= "5")
]

D = floyd_warshall(W, 5)
# D[0][2] = -1   (path 1→2→4→3, cost 3+1-5 = -1)
# D[1][2] = -4   (path 2→4→3, cost 1-5 = -4)
# D[3][1] = -1   (path 4→3→2, cost -5+4 = -1)
print(D)

Note the 0-indexing in Python vs 1-indexing in the math. Vertex "1" in our example is index 0 in the array. The algorithm does not care about indexing — it just needs consistent indices.

Loop order matters! The k loop MUST be outermost. A common mistake is putting k as the innermost loop — this computes something, but NOT shortest paths. Think about why: we need d^(k-1) fully computed before we start d^(k). The outer k loop ensures we process intermediate vertices in order.

What Happens with Wrong Loop Order?

This is such a common bug that it is worth demonstrating. Consider putting i outermost:

python
# WRONG! This is NOT Floyd-Warshall
for i in range(n):         # i outermost
    for j in range(n):     # j middle
        for k in range(n): # k innermost — WRONG!
            if d[i][k] + d[k][j] < d[i][j]:
                d[i][j] = d[i][k] + d[k][j]

This code asks: "For each pair (i, j), try all possible intermediate vertices k." That sounds right but is subtly wrong. When we check d[i][k], the value of d[i][k] may not yet be optimal because we haven't finished processing all intermediate vertices for that sub-path. Floyd-Warshall's correctness depends on processing intermediate vertices in a specific order (1, 2, 3, ..., n), which requires k to be the outer loop.

The wrong loop order sometimes gives correct answers by accident (especially on small graphs), making this bug incredibly hard to debug. In interviews, always triple-check: k outermost, i middle, j innermost.

A helpful mnemonic: "K Includes J" — k (outermost), i (middle), j (innermost). Or think of it as "intermediates first, then pairs." The k loop sets up the context (which intermediates are allowed), and the i, j loops fill in the answers given that context.

Memory-Efficient Variant

Some implementations use two separate matrices (d_old and d_new) to make the DP transition cleaner:

python
def floyd_warshall_two_matrix(W, n):
    """Floyd-Warshall with explicit old/new matrices.
    Clearer but uses 2x memory."""
    d = [row[:] for row in W]
    for k in range(n):
        d_new = [row[:] for row in d]  # copy
        for i in range(n):
            for j in range(n):
                d_new[i][j] = min(d[i][j], d[i][k] + d[k][j])
        d = d_new
    return d

This version is equivalent to the in-place version (as proven above) but makes the DP structure explicit: d_new[i][j] only reads from d (the previous iteration's values). The in-place version works because d[i][k] and d[k][j] happen to be invariant when we update d[i][j] — but if you are not confident in that proof, the two-matrix version is safer.

In practice, always use the in-place version. It halves memory usage and is just as fast (no extra allocation per iteration). But the two-matrix version is great for understanding and debugging.

Complexity Analysis

Time: Three nested loops, each running n times. The body is O(1). Total: Θ(n³). No hidden constants, no log factors, no priority queues. Just a triple loop. The constant is about 1 comparison + 1 addition + 1 conditional assignment per iteration.

Space: Θ(n²) for the distance matrix. We update in place, so no extra matrix needed. With the predecessor matrix, it is still Θ(n²) — just two n×n arrays.

Comparison: Floyd-Warshall is simpler than running Dijkstra V times and handles negative edges. For dense graphs (E = Θ(V²)), it matches or beats all alternatives. For sparse graphs, Johnson's algorithm (Chapter 5) is better.

Exact operation count: For n vertices, Floyd-Warshall performs exactly n³ iterations of the inner loop. Each iteration does 1 addition and 1 comparison (and occasionally 1 assignment). So the total is about 2n³ arithmetic operations. For n = 100: 2,000,000 operations. For n = 1000: 2,000,000,000 operations (~2 seconds on a modern CPU at 10⁹ ops/sec).

Quick check: In Floyd-Warshall, when processing intermediate vertex k = 3, we consider whether the path i → ... → 3 → ... → j is shorter than the current best path from i to j. Which vertices can appear as intermediates in the sub-paths i ↝ 3 and 3 ↝ j?

Any vertex in the graph Only vertices {1, 2} — the sub-paths use intermediates from {1, ..., k-1}, which is {1, 2} when k = 3. Vertex 3 itself is an endpoint in each sub-path, not an intermediate. Only vertices {1, 2, 3}

Chapter 3: Path Reconstruction

Floyd-Warshall gives us the shortest distances. But often we need the actual paths. Just like single-source algorithms maintain a predecessor array π, Floyd-Warshall maintains a predecessor matrix Π.

The Predecessor Matrix

Π[i][j] stores the predecessor of j on a shortest path from i. Initialize it as:

Π⁽⁰⁾[i][j] = NIL if i = j or W[i][j] = ∞
Π⁽⁰⁾[i][j] = i if i ≠ j and W[i][j] < ∞ (direct edge exists)

During Floyd-Warshall, when we find a shorter path through k:

if d[i][k] + d[k][j] < d[i][j]:
d[i][j] = d[i][k] + d[k][j]
Π[i][j] = Π[k][j] // predecessor of j on the i→k→j path

Why Π[k][j] and not just k? Because the shortest path from i to j through k might not go directly from k to j. It might go k → ... → j through several vertices. The predecessor of j on that sub-path is Π[k][j].

Reconstructing a Path

To recover the shortest path from i to j, follow the predecessor chain backwards:

python
def reconstruct_path(pi, i, j):
    """Reconstruct shortest path from i to j using predecessor matrix."""
    if pi[i][j] is None:
        return []  # no path exists
    path = [j]
    while path[-1] != i:
        path.append(pi[i][path[-1]])
    path.reverse()
    return path

Full Floyd-Warshall with Path Recovery

python
def floyd_warshall_paths(W, n):
    """Floyd-Warshall with predecessor matrix for path reconstruction."""
    d = [row[:] for row in W]
    pi = [[None] * n for _ in range(n)]

    # Initialize predecessors
    for i in range(n):
        for j in range(n):
            if i != j and W[i][j] < float('inf'):
                pi[i][j] = i

    # Main loop
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
                    pi[i][j] = pi[k][j]  # NOT just k!

    return d, pi

Common mistake: setting Π[i][j] = k instead of Π[k][j]. If the shortest path from k to j goes through multiple vertices (say k → 5 → 2 → j), the predecessor of j is 2, not k. We need Π[k][j] to get the correct predecessor. Setting it to k would only work if there is a direct edge from k to j.

How the Predecessor Matrix Evolves

Let us trace the predecessor matrix alongside the distance matrix for our 5-vertex graph. Initial Π⁽⁰⁾:

    1    2    3    4    5
1[ NIL  1    1   NIL  1 ]
2[ NIL  NIL  2    2   NIL ]
3[ NIL  3   NIL  NIL  NIL ]
4[ 4   NIL  4   NIL  NIL ]
5[ NIL  NIL  NIL  5   NIL ]

When Floyd-Warshall updates d[1][4] = 4 (via 1 → 2 → 4), it sets Π[1][4] = Π[2][4] = 2. The predecessor of 4 on the shortest path from 1 is vertex 2 (the vertex just before 4).

When it later updates d[1][3] = -1 (via 1 → 2 → 4 → 3), it sets Π[1][3] = Π[4][3] = 4. The predecessor of 3 on the shortest path from 1 is vertex 4, NOT vertex 2 or vertex 1. To get the full path, follow the chain: Π[1][3] = 4, Π[1][4] = 2, Π[1][2] = 1. Path: 1 → 2 → 4 → 3.

Time Complexity of Path Reconstruction

Reconstructing a single path takes O(V) time in the worst case (the path might visit all V vertices). Reconstructing ALL V² paths takes O(V³) time total. This is dominated by the O(V³) Floyd-Warshall computation itself, so path reconstruction adds no asymptotic overhead.

Alternative: "Next" Matrix Instead of Predecessor

Some implementations store the next vertex on the path instead of the predecessor. If nxt[i][j] = the vertex immediately after i on the shortest path from i to j, then path reconstruction becomes a forward trace:

python
def reconstruct_forward(nxt, i, j):
    """Follow 'next' pointers from i to j."""
    if nxt[i][j] is None:
        return []  # no path
    path = [i]
    while path[-1] != j:
        path.append(nxt[path[-1]][j])
    return path

The update rule changes slightly: when d[i][k] + d[k][j] < d[i][j], set nxt[i][j] = nxt[i][k] (go toward k first). Both approaches are correct; the "next" version is often more intuitive for printing paths.

Worked Example

Using our 5-vertex graph from Chapter 2, after Floyd-Warshall completes, suppose D[1][4] = 4. How do we find the actual path?

// Follow predecessors from j = 4 back to i = 1
Π[1][4] = 2 // predecessor of 4 on path from 1 is vertex 2
Π[1][2] = 1 // predecessor of 2 on path from 1 is vertex 1 (direct edge)
// Path: 1 → 2 → 4, cost = 3 + 1 = 4 ✓

Path Reconstruction Explorer

Click any cell in the distance matrix to see the full shortest path from that row's vertex to that column's vertex. The path is highlighted on the graph and the predecessor chain is shown below.

Click a cell in the matrix to see its path.

When Does Path Reconstruction Fail?

Path reconstruction produces undefined results when negative cycles exist. If d[i][j] = -∞ (the path from i to j passes through a negative cycle), the predecessor chain might loop forever. Always check for negative cycles before attempting path reconstruction.

python
def safe_reconstruct(d, pi, i, j):
    """Reconstruct path, handling negative cycles and unreachable pairs."""
    # Check for negative cycle
    for k in range(len(d)):
        if d[k][k] < 0:  # negative cycle exists
            if d[i][k] < float('inf') and d[k][j] < float('inf'):
                return None  # path goes through neg. cycle
    # Check unreachable
    if d[i][j] >= float('inf'):
        return []  # no path exists
    # Safe to reconstruct
    return reconstruct_path(pi, i, j)

Quick check: During Floyd-Warshall, when we discover that d[i][k] + d[k][j] < d[i][j], we update Π[i][j]. What do we set it to?

k — the intermediate vertex we just went through i — the source vertex Π[k][j] — the predecessor of j on the shortest path from k to j. This correctly handles multi-hop sub-paths from k to j.

Chapter 4: Transitive Closure

Sometimes you do not care about shortest distances — you just want to know: is there any path from i to j? This is the transitive closure problem. It answers reachability for all pairs of vertices.

From Floyd-Warshall to Warshall

The transitive closure of a directed graph G is a boolean matrix T where T[i][j] = 1 if and only if there exists a path (of any length) from vertex i to vertex j.

We could run Floyd-Warshall and check which entries are not ∞. But there is a cleaner approach: replace the arithmetic operations. Instead of (min, +) we use (OR, AND):

t^(k)[i][j] = t^(k-1)[i][j] OR ( t^(k-1)[i][k] AND t^(k-1)[k][j] )

In words: there is a path from i to j using intermediates {1, ..., k} if either (a) there was already a path using intermediates {1, ..., k-1}, or (b) there is a path from i to k AND a path from k to j, both using intermediates {1, ..., k-1}.

Same structure, different algebra. Floyd-Warshall uses (min, +) on real numbers. Transitive closure uses (OR, AND) on booleans. Both have the same three-loop structure. This pattern generalizes to any closed semiring — a powerful abstraction that unifies shortest paths, reachability, regular expressions on graphs, and more.

Implementation

python
def transitive_closure(adj, n):
    """Compute transitive closure using Warshall's algorithm.
    adj: n x n boolean adjacency matrix (True if edge exists).
    Returns n x n boolean reachability matrix."""
    t = [[adj[i][j] or (i == j) for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                t[i][j] = t[i][j] or (t[i][k] and t[k][j])
    return t

Time: Θ(V³). Space: Θ(V²). Same as Floyd-Warshall, but with boolean operations which are faster in practice (bit manipulation, no floating point).

Optimization: Bitwise Operations

Since the matrix entries are booleans, we can pack an entire row into a bitset (a single integer). Then t[i][j] = t[i][j] OR (t[i][k] AND t[k][j]) becomes a single bitwise OR of entire rows:

if t[i][k]: // does i reach k?
t[i] |= t[k] // merge k's reachability into i's row (bitwise OR)

This processes an entire row in one operation, giving a practical speedup of 64x on 64-bit machines (processing 64 booleans per word).

Worked Example: 5-Vertex Transitive Closure

Consider 5 vertices {A, B, C, D, E} with edges: A→B, B→C, B→D, D→A, D→C, E→D.

Initial reachability T⁽⁰⁾ (TRUE on diagonal + direct edges):

   A  B  C  D  E
A[ T  T  F  F  F ]
B[ F  T  T  T  F ]
C[ F  F  T  F  F ]
D[ T  F  T  T  F ]
E[ F  F  F  T  T ]

k = A: Check if routing through A helps. For each pair (i, j): if i can reach A AND A can reach j, then i can reach j. A reaches B. D reaches A. E does not reach A. So D can now reach B (via D→A→B).

k = B: A reaches B. B reaches C and D. So A can now reach C (A→B→C) and A can reach D (A→B→D). Also D now reaches B (from previous step), so D can reach C (D→...→B→C) and D can reach D (already true).

k = D: This is where it gets interesting. A reaches D (from k=B). D reaches A, C, and now B. So A can reach A (A→...→D→A — a cycle!), A can reach C (already true), and E can reach A (E→D→A), E can reach B (E→D→...→B), E can reach C (E→D→C).

After all 5 passes, the final matrix shows: A, B, D form a strongly connected component (all can reach all). E can reach everything. C is a sink (can reach only itself).

Transitive Closure Animation

Watch the reachability matrix fill in as k increases. Green cells are TRUE (path exists). Gray cells are FALSE. The current intermediate vertex k is highlighted on the graph.

k = 0 (initial). Click Step for k = 1.

Applications of Transitive Closure

Application	Vertices	Edges	Question
Database query optimization	Relations	Foreign keys	Can table A join to table B?
Dependency analysis	Modules/packages	Import statements	Does module A (transitively) depend on B?
Social networks	Users	Follow relationships	Can user A see user B's content (through follows)?
Compiler optimization	Variables	Assignments/aliases	Can variable x affect variable y?

Transitive Closure in Python: Bitwise Optimization

For large graphs, the bitwise optimization makes a significant practical difference:

python
def transitive_closure_bitwise(n, edges):
    """Transitive closure using Python integers as bitsets.
    Each row is a single integer — O(n^3/w) where w = word size."""
    # Initialize: bit j of row[i] = 1 iff edge i→j or i==j
    row = [0] * n
    for i in range(n):
        row[i] |= (1 << i)  # self-reachability
    for u, v in edges:
        row[u] |= (1 << v)

    # Warshall's algorithm with bitwise OR
    for k in range(n):
        for i in range(n):
            if row[i] & (1 << k):  # does i reach k?
                row[i] |= row[k]    # merge k's reachability into i

    # Convert back to matrix
    return [[(row[i] >> j) & 1 == 1 for j in range(n)] for i in range(n)]

In Python, integers have arbitrary precision, so each row is stored as a single integer regardless of n. The |= operation processes all n bits at once (well, in chunks of 64 bits on a 64-bit machine). For n = 1000, this is ~15x faster than the boolean version.

Applications in graph theory. Transitive closure answers questions like: "Which vertices are in the same strongly connected component?" If T[i][j] AND T[j][i] are both true, then i and j are in the same SCC. This gives an O(V³) algorithm for finding SCCs — slower than Tarjan's O(V+E) but conceptually simpler.

Quick check: Transitive closure uses (OR, AND) instead of (min, +). What is the analogous "base case" — what does t⁽⁰⁾[i][j] equal?

TRUE if i = j or there is a direct edge from i to j. FALSE otherwise. We always start by knowing vertices can reach themselves and their direct neighbors. 0 on the diagonal, ∞ elsewhere TRUE everywhere — assume all pairs are reachable, then prune

Chapter 5: Johnson's Algorithm

Floyd-Warshall is Θ(V³) regardless of how many edges exist. For sparse graphs (E much less than V²), that is wasteful. Johnson's algorithm combines Bellman-Ford and Dijkstra to achieve O(V² log V + VE) — a huge win when the graph is sparse.

The Problem with Negative Edges

We want to run Dijkstra from every vertex (that is the fast part), but Dijkstra requires non-negative edges. Some edges might be negative. Can we make all edges non-negative without changing shortest paths?

Naive idea: add a large constant C to every edge weight. This does NOT work! It changes which paths are shortest. A path with 5 edges gains 5C, while a path with 2 edges gains only 2C. The path with fewer edges becomes relatively cheaper.

The reweighting trick is the soul of Johnson's algorithm. Instead of adding the same constant to every edge, we add a vertex-dependent offset: w'(u, v) = w(u, v) + h(u) - h(v). This is called a potential function. The key property: for ANY path from s to t, the total weight changes by exactly h(s) - h(t), regardless of the path taken. So all paths between the same endpoints shift by the same amount, preserving which path is shortest.

The Potential Function h

We need h(v) for each vertex such that w'(u, v) = w(u, v) + h(u) - h(v) ≥ 0 for all edges (u, v). Where do we get h?

The trick: add a new vertex s with zero-weight edges to every other vertex. Run Bellman-Ford from s. Set h(v) = δ(s, v) — the shortest-path weight from s to v.

Why does this guarantee non-negative reweighted edges? By the triangle inequality of shortest paths: δ(s, v) ≤ δ(s, u) + w(u, v). Rearranging: w(u, v) + δ(s, u) - δ(s, v) ≥ 0. That is exactly w'(u, v) ≥ 0.

Step 1: Add auxiliary vertex s

Create vertex s with edges (s, v) of weight 0 for every vertex v. This does not affect shortest paths between original vertices.

↓

Step 2: Bellman-Ford from s

Compute h(v) = δ(s, v) for all v. If Bellman-Ford detects a negative cycle, STOP — no solution exists. Time: O(VE).

↓

Step 3: Reweight all edges

Set w'(u, v) = w(u, v) + h(u) - h(v). All reweighted edges are ≥ 0. Time: O(E).

↓

Step 4: Run Dijkstra from every vertex

Compute d'[u][v] for all pairs using reweighted edges. Time: O(V(V + E) log V) with binary heap.

↓

Step 5: Undo reweighting

d[u][v] = d'[u][v] - h(u) + h(v). Recover true shortest-path distances. Time: O(V²).

Why the Reweighting Preserves Shortest Paths

Consider any path p from u to v: u = v₀ → v₁ → ... → v_k = v. The reweighted cost is:

w'(p) = ∑_i w'(v_i-1, v_i)
     = ∑_i [ w(v_i-1, v_i) + h(v_i-1) - h(v_i) ]
     = w(p) + h(v₀) - h(v_k)   // telescoping sum!
     = w(p) + h(u) - h(v)

Every path from u to v has its weight shifted by the same amount h(u) - h(v). So the path that minimizes w(p) also minimizes w'(p). Shortest paths are preserved.

Implementation

python
import heapq, math

def johnsons(graph, n):
    """Johnson's algorithm for all-pairs shortest paths.
    graph: dict of {u: [(v, w), ...]} adjacency list.
    Returns n x n distance matrix or None if negative cycle."""

    # Step 1: Add auxiliary vertex s = n
    s = n
    aug = {v: lst[:] for v, lst in graph.items()}
    aug[s] = [(v, 0) for v in range(n)]

    # Step 2: Bellman-Ford from s
    h = [math.inf] * (n + 1)
    h[s] = 0
    for _ in range(n):  # n iterations (n+1 vertices)
        for u in aug:
            for v, w in aug[u]:
                if h[u] + w < h[v]:
                    h[v] = h[u] + w

    # Check for negative cycle
    for u in aug:
        for v, w in aug[u]:
            if h[u] + w < h[v]:
                return None  # negative cycle!

    # Step 3: Reweight edges
    rg = {v: [] for v in range(n)}
    for u in range(n):
        for v, w in graph[u]:
            rg[u].append((v, w + h[u] - h[v]))

    # Step 4: Dijkstra from every vertex
    D = [[math.inf] * n for _ in range(n)]
    for src in range(n):
        dist = [math.inf] * n
        dist[src] = 0
        pq = [(0, src)]
        while pq:
            d_u, u = heapq.heappop(pq)
            if d_u > dist[u]: continue
            for v, w_prime in rg[u]:
                if dist[u] + w_prime < dist[v]:
                    dist[v] = dist[u] + w_prime
                    heapq.heappush(pq, (dist[v], v))
        # Step 5: Undo reweighting
        for v in range(n):
            D[src][v] = dist[v] - h[src] + h[v]

    return D

Worked Example: Johnson's on Our 5-Vertex Graph

Let us trace Johnson's on the same graph from Chapter 2: edges 1→2 (3), 1→3 (8), 2→3 (2), 2→4 (1), 3→2 (4), 4→1 (2), 4→3 (-5), 1→5 (7), 5→4 (6).

Step 1: Add vertex s with zero-weight edges to all vertices. s→1 (0), s→2 (0), s→3 (0), s→4 (0), s→5 (0).

Step 2: Bellman-Ford from s. Since all edges from s have weight 0, and the original graph has a -5 edge from 4→3, the shortest paths from s are:

h(1) = δ(s, 1) = 0
h(2) = δ(s, 2) = 0
h(3) = δ(s, 3) = -5 // s→4→3: 0 + (-5) = -5
h(4) = δ(s, 4) = 0
h(5) = δ(s, 5) = 0

Wait — let us be more careful. Bellman-Ford runs V iterations. After iteration 1, we have h(3) = min(0, h(4) + (-5)) = min(0, 0 + (-5)) = -5. After further iterations, no changes. The h values stabilize.

Step 3: Reweight. w'(u, v) = w(u, v) + h(u) - h(v):

w'(1, 2) = 3 + 0 - 0 = 3
w'(1, 3) = 8 + 0 - (-5) = 13 // increased because h(3) is very negative
w'(2, 3) = 2 + 0 - (-5) = 7
w'(2, 4) = 1 + 0 - 0 = 1
w'(3, 2) = 4 + (-5) - 0 = -1... wait, that is negative!

Hmm, let me recalculate h more carefully. Running Bellman-Ford from s on the augmented graph requires multiple iterations to converge through all paths. After convergence: h = [0, 0, -5, 0, 0] is not correct. Let me trace again.

Actually h(3) = -5 only if we can reach 3 from s with cost -5. Path s→4→3 = 0 + (-5) = -5. Path s→3 = 0. So h(3) = -5. But then w'(3, 2) = 4 + (-5) - 0 = -1 < 0. The triangle inequality would not be satisfied if the shortest path was computed incorrectly.

The issue: s→2→4→3 = 0 + 1 + (-5) = -4. s→1→2→4→3 = 0 + 3 + 1 + (-5) = -1. The shortest is s→4→3 or more precisely checking all paths, h(3) = min(0, -5, -4, -1, ...) = -5 (via s→4→3). But then w'(3, 2) = 4 + h(3) - h(2) = 4 + (-5) - 0 = -1.

We need to check: does h satisfy δ(s, 3) ≤ δ(s, 2) + w(3, 2)? That is -5 ≤ 0 + 4 = 4. Not helpful. The triangle inequality says δ(s, v) ≤ δ(s, u) + w(u, v), which gives w'(u, v) = w(u, v) + h(u) - h(v) ≥ 0. For edge (3, 2): w'(3, 2) = 4 + (-5) - 0 = -1. But h(2) should actually be -1 because δ(s, 2) = min(0, -5+4) = -1 via s→4→3→2. Let us re-run Bellman-Ford properly:

// Bellman-Ford from s (V = 5 iterations)
Init: h = [0, 0, 0, 0, 0]
Iter 1: s→all (h unchanged). Edge 4→3: h[3] = min(0, 0-5) = -5
Iter 2: Edge 3→2: h[2] = min(0, -5+4) = -1
Iter 3: Edge 2→4: h[4] = min(0, -1+1) = 0 (no change)
Iter 4: No changes. Converged.
Final: h = [0, -1, -5, 0, 0]

NOW let us reweight: w'(u, v) = w(u, v) + h(u) - h(v):

w'(1, 2) = 3 + 0 - (-1) = 4
w'(1, 3) = 8 + 0 - (-5) = 13
w'(2, 3) = 2 + (-1) - (-5) = 6
w'(2, 4) = 1 + (-1) - 0 = 0
w'(3, 2) = 4 + (-5) - (-1) = 0
w'(4, 1) = 2 + 0 - 0 = 2
w'(4, 3) = -5 + 0 - (-5) = 0 ← the -5 edge becomes 0!
w'(1, 5) = 7 + 0 - 0 = 7
w'(5, 4) = 6 + 0 - 0 = 6

All reweighted edges are ≥ 0. The negative edge disappeared! Now Dijkstra works on this reweighted graph. After running Dijkstra from each vertex and undoing the reweighting (d[u][v] = d'[u][v] - h(u) + h(v)), we get the same distance matrix as Floyd-Warshall.

The beauty of the potential function. Notice how the -5 edge became 0 after reweighting. That is because vertex 3's very negative h-value (-5) means "vertex 3 is cheap to reach." When we compute w'(4, 3) = -5 + h(4) - h(3) = -5 + 0 - (-5) = 0, the negative weight is exactly canceled by the potential difference. The potential function redistributes the edge weights so that no edge is negative, while preserving the relative ordering of path costs.

Let us verify that reweighting preserves shortest paths. The shortest path from 1 to 3 in the original graph is 1 → 2 → 4 → 3 with cost 3 + 1 + (-5) = -1. In the reweighted graph, this path costs 4 + 0 + 0 = 4. The telescope formula says: w'(path) = w(path) + h(1) - h(3) = -1 + 0 - (-5) = 4. Correct!

After Dijkstra computes d'[1][3] = 4 in the reweighted graph, we recover the true distance: d[1][3] = d'[1][3] - h(1) + h(3) = 4 - 0 + (-5) = -1. This matches our earlier Floyd-Warshall result.

Johnson's Algorithm Visualized

Watch the three phases: (1) Bellman-Ford computes h values, (2) edges are reweighted (all become non-negative), (3) Dijkstra runs from each vertex. Use the buttons to step through each phase.

Phase 1: Bellman-Ford. Click Step.

Complexity

Bellman-Ford: O(VE) for the augmented graph. This is the "one-time cost" of computing the potential function.

Reweighting: O(E). A single pass over all edges.

Dijkstra × V: O(V(V + E) log V) with a binary heap, or O(V² log V + VE) with a Fibonacci heap.

Total: O(V² log V + VE). For sparse graphs (E = O(V)), this is O(V² log V), compared to Floyd-Warshall's Θ(V³).

When Johnson's Beats Floyd-Warshall

The crossover point depends on the edge density. Let us compare:

Edge Density	E	Floyd-Warshall	Johnson's	Winner
Very sparse	O(V)	V³	V² log V	Johnson's by V/log V
Moderate	O(V log V)	V³	V² log V	Johnson's
Medium	O(V¹·⁵)	V³	V²·⁵	Johnson's
Dense	O(V²)	V³	V³ log V	Floyd-Warshall
Complete	V(V-1)	V³	V³ log V	Floyd-Warshall

Johnson's dominates unless E = Ω(V²/log V). In practice, many real-world graphs are sparse (road networks, social graphs, web graphs), making Johnson's the practical choice for large-scale all-pairs computations.

Correctness Proof Sketch

Why does Johnson's produce correct results?

1. Bellman-Ford correctness: Bellman-Ford correctly computes shortest paths from s in the augmented graph (no negative cycles by assumption). So h(v) = δ(s, v) satisfies the triangle inequality: h(v) ≤ h(u) + w(u, v).

2. Non-negative reweighting: w'(u, v) = w(u, v) + h(u) - h(v) ≥ 0 follows directly from the triangle inequality.

3. Path weight preservation: For any path p from u to v, w'(p) = w(p) + h(u) - h(v). All paths between the same endpoints are shifted by the same constant, so the shortest path under w is also shortest under w'.

4. Dijkstra correctness: Since all reweighted edges are non-negative, Dijkstra correctly finds shortest paths in the reweighted graph.

5. Undo reweighting: d(u, v) = d'(u, v) - h(u) + h(v) recovers the true shortest-path weight.

Each step's correctness follows from the previous step. The chain is: Bellman-Ford → valid potential → non-negative reweighting → Dijkstra works → undo gives correct answer.

Complete Data Flow Through Johnson's

For implementation clarity, here is exactly what data flows through each phase:

Input

Adjacency list: {0: [(1, 3), (2, 8), ...], ...}. V vertices, E edges. Some edges may be negative.

↓

Bellman-Ford Output

Array h of V floats. h[v] = shortest distance from auxiliary vertex s to v. Time: O(VE).

↓

Reweighted Graph

Same structure, but w'(u,v) = w(u,v) + h[u] - h[v]. All weights ≥ 0. Size: O(V + E).

↓

V Dijkstra Outputs

V arrays of V floats each. Runtime: O(V(V+E) log V) total.

↓

Final Distance Matrix

V × V matrix: d[u][v] = d'[u][v] - h[u] + h[v]. True shortest-path distances.

Why Add an Auxiliary Vertex?

A natural question: why not just pick an existing vertex as the source for Bellman-Ford? The problem is that an existing vertex might not be able to reach all other vertices. The auxiliary vertex s with zero-weight edges to every vertex guarantees that every vertex is reachable from s with cost 0 (before any negative edges are considered). This ensures h(v) is well-defined for all v.

Could we use h(v) = 0 for all v (skip Bellman-Ford entirely)? Only if all edges are already non-negative. If any edge (u, v) has negative weight, then w'(u, v) = w(u, v) + 0 - 0 = w(u, v) < 0, and Dijkstra would fail.

Could we use h(v) = min incoming edge weight? No — this local approach does not guarantee w'(u, v) ≥ 0 for all edges. The potential function must be globally consistent, which is exactly what Bellman-Ford provides.

Johnson's in one sentence. Use Bellman-Ford to find a potential function that makes all edges non-negative, then exploit that non-negativity to run the faster Dijkstra algorithm V times.

Space Complexity

Johnson's is more space-efficient than Floyd-Warshall for sparse graphs:

Component	Space
h array (potential function)	O(V)
Reweighted adjacency list	O(V + E)
Dijkstra priority queue	O(V) per run
Output distance matrix	O(V²)
Total	O(V² + E)

Floyd-Warshall also uses O(V²) for the distance matrix. The main space advantage of Johnson's is that the intermediate data structures scale with E, not V². For sparse graphs (E ≪ V²), this can matter.

Quick check: Johnson's algorithm reweights edge (u, v) as w'(u, v) = w(u, v) + h(u) - h(v). Why does this telescoping trick preserve shortest paths?

Every path from u to v has its total weight shifted by exactly h(u) - h(v), regardless of the number of edges. Since all paths between the same pair shift equally, the ordering (which is shortest) is preserved. The potential function h is always positive, so adding it cannot create negative edges Dijkstra's algorithm automatically compensates for the reweighting

Chapter 6: All-Pairs Showcase

This is the payoff. Build a graph, pick an algorithm, and watch it compute ALL shortest paths in real time. Click any pair in the distance matrix to see the shortest path highlighted on the graph.

All-Pairs Shortest Paths: Build, Solve, Explore

Click on the canvas to place vertices (up to 7). Then click "Add Edge" mode and click two vertices to add a directed edge (enter weight in the prompt). Choose Floyd-Warshall or Johnson's, hit "Solve", and watch the distance matrix fill in. Click any cell in the matrix to see the path.

Click canvas to place vertices.

Experiment ideas. (1) Load the "Dense" preset and run Floyd-Warshall. Notice how every cell fills in. (2) Load "Sparse" and compare Floyd-Warshall vs Johnson's — Johnson's is faster for sparse graphs. (3) Load "Neg. Weights" — Dijkstra alone would fail here, but Johnson's reweights to make it work. (4) Build your own graph and try to predict the distance matrix before solving.

What to Look For

When you run the simulation, pay attention to these patterns:

Symmetry in undirected-like graphs. If your graph has edges in both directions with similar weights, the distance matrix will be approximately symmetric: d[i][j] ≈ d[j][i]. But directed graphs can be wildly asymmetric — it might cost 2 to go from A to B but 100 to go back.

The infinity entries. An ∞ in the distance matrix means "no path exists." This tells you about the connectivity structure. If many cells are ∞, the graph is poorly connected. A strongly connected graph has no ∞ entries (every vertex can reach every other).

Negative entries. If you see negative distances, it means the path exploits one or more negative-weight edges. Check the path by clicking the cell — you will see it routes through negative edges.

The diagonal. Should always be 0 (unless there is a negative cycle, in which case the diagonal entry goes negative — see Chapter 7).

Understanding the Distance Matrix

The distance matrix has rich structure worth studying:

The diagonal is always zero (assuming no negative cycles). d[i][i] = 0 because the shortest "path" from a vertex to itself is the empty path with zero cost.

Asymmetry. In directed graphs, d[i][j] ≠ d[j][i] in general. The cost from A to B might be completely different from B to A. This is unlike undirected graphs where d[i][j] = d[j][i].

Triangle inequality. For all i, j, k: d[i][j] ≤ d[i][k] + d[k][j]. If there were a violation, then routing through k would give a shorter path from i to j, contradicting the optimality of d[i][j]. This is a fundamental property of shortest-path distances and makes them a metric (on strongly connected graphs with non-negative weights).

Infinity entries reveal structure. If d[i][j] = ∞ for many pairs, the graph is poorly connected. The pattern of ∞ entries reveals the graph's connectivity structure: which vertices can reach which. This is exactly the information captured by the transitive closure.

Building Your Own Graph

To build a custom graph: (1) Set mode to "Place Vertex" and click to place vertices. (2) Switch to "Add Edge" mode. (3) Click a source vertex, then a destination vertex. (4) Enter the edge weight when prompted (negative values allowed). (5) Repeat for all edges. (6) Choose an algorithm and click Solve.

Try these challenges:

Challenge	Goal
The triangle	Build 3 vertices with edges forming a triangle. Can you predict all 6 off-diagonal distances?
The shortcut	Build 4 vertices in a line (1→2→3→4) with weight 1 each, then add a direct edge from 1 to 4 with weight 10. Solve — does the direct edge ever get used?
The negative trick	Build a graph where the shortest path between two vertices is negative. What edge weights did you need?
Disconnected components	Build two separate triangles (no edges between them). What does the distance matrix look like?

Chapter 7: Negative Cycles

All-pairs shortest paths are undefined when a negative-weight cycle is reachable. A negative cycle is a cycle whose total edge weight is negative — you can loop around it infinitely, driving the path cost to negative infinity. Both Floyd-Warshall and Johnson's can detect them.

Detection in Floyd-Warshall

After Floyd-Warshall completes, check the diagonal of the distance matrix. If d[i][i] < 0 for any vertex i, then vertex i lies on a negative cycle.

// After Floyd-Warshall
for i in range(n):
if d[i][i] < 0:
print(f"Vertex {i} is on a negative cycle")

Why does this work? d[i][i] is the shortest "path" from i back to itself. Normally this is 0 (the empty path). But if there is a negative cycle reachable from i and that also reaches i, then we can go around the cycle and return to i with negative cost.

Detection in Johnson's

Johnson's detects negative cycles during the Bellman-Ford phase (Step 2). If after V-1 relaxation passes any edge can still be relaxed, a negative cycle exists. This is exactly the standard Bellman-Ford negative cycle detection.

What Happens to the Distance Matrix?

When a negative cycle exists, some entries in the distance matrix are -∞. Specifically, d[i][j] = -∞ if and only if there exists a path from i to j that passes through a vertex on a negative cycle. Such a path can be made arbitrarily cheap by looping around the negative cycle.

Interview alert: "Can Floyd-Warshall detect negative cycles?" YES — check d[i][i] < 0 on the diagonal. "Can Dijkstra detect negative cycles?" NO — Dijkstra does not even work correctly with negative edges, let alone detect negative cycles. This distinction comes up frequently.

Negative Cycle Detection

This graph has a negative cycle: 2 → 3 → 4 → 2 with total weight -2. Run Floyd-Warshall and watch d[2][2], d[3][3], d[4][4] go negative (shown in red on the diagonal). The negative cycle vertices pulse in red on the graph.

Click Run to detect the negative cycle.

Which Entries Are Affected?

When a negative cycle exists, it does not ruin the entire distance matrix — only specific entries. An entry d[i][j] is -∞ if and only if there is a path from i to some vertex on a negative cycle, and a path from that cycle to j. Vertices that cannot reach or be reached from the cycle still have well-defined shortest-path distances.

Formally, after Floyd-Warshall:

d[i][j] = -∞ if ∃ vertex k such that d[i][k] < ∞ AND d[k][k] < 0 AND d[k][j] < ∞
d[i][j] = finite otherwise (assuming a path exists)

In the simulation, the negative cycle is 2 → 3 → 4 → 2 with total weight 1 + (-3) + (-1) = -3. Vertices 2, 3, and 4 are all on this cycle, so d[2][2], d[3][3], and d[4][4] will all be negative. Vertex 1 can reach the cycle (1→2), so d[1][j] for j ∈ {2, 3, 4} will also be affected. Vertex 5 can reach the cycle (5→4), so d[5][j] will be affected too.

Handling Negative Cycles in Practice

In real applications, negative cycles usually represent bugs or arbitrage opportunities:

Domain	Negative Cycle Meaning	Response
Currency exchange	Arbitrage opportunity (free money!)	Exploit it (or report it)
Network routing	Routing loop with decreasing costs	Bug — fix the routing config
Game AI	Infinite reward loop	Cap the loop or redesign
Project scheduling	Contradictory constraints	Infeasible schedule — relax constraints

The Currency Arbitrage Application

This is a famous interview question. Suppose you have exchange rates between currencies:

USD → EUR: 0.85
EUR → GBP: 0.88
GBP → USD: 1.40

Starting with $1 USD: convert to EUR (0.85), then GBP (0.85 × 0.88 = 0.748), then back to USD (0.748 × 1.40 = 1.047). You end up with $1.047 — a 4.7% profit per cycle!

To detect this with Floyd-Warshall, take -log of each exchange rate as the edge weight. Then finding a negative cycle in the -log graph corresponds to finding a product of rates > 1 (arbitrage). Specifically: -log(0.85) + -log(0.88) + -log(1.40) = 0.163 + 0.128 + (-0.336) = -0.046 < 0. Negative cycle detected!

python
import math

def detect_arbitrage(currencies, rates):
    """Detect currency arbitrage using Floyd-Warshall.
    currencies: list of currency names
    rates: dict of {(from, to): exchange_rate}
    Returns: list of currencies in the arbitrage cycle, or None."""
    n = len(currencies)
    idx = {c: i for i, c in enumerate(currencies)}
    INF = float('inf')

    # Build -log weight matrix
    d = [[INF] * n for _ in range(n)]
    for i in range(n): d[i][i] = 0
    for (f, t), rate in rates.items():
        d[idx[f]][idx[t]] = -math.log(rate)

    # Floyd-Warshall
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]

    # Check diagonal for negative cycles
    for i in range(n):
        if d[i][i] < -1e-10:  # small epsilon for float errors
            profit = math.exp(-d[i][i]) - 1
            return currencies[i], profit
    return None

# Example usage:
result = detect_arbitrage(
    ['USD', 'EUR', 'GBP'],
    {('USD','EUR'): 0.85, ('EUR','GBP'): 0.88, ('GBP','USD'): 1.40,
     ('EUR','USD'): 1.17, ('GBP','EUR'): 1.13, ('USD','GBP'): 0.71}
)
# Returns: ('USD', 0.047) — 4.7% profit per cycle

Quick check: After running Floyd-Warshall on a graph, you find d[3][3] = -4 and d[5][5] = 0. What can you conclude?

Vertex 3 lies on a negative cycle (d[3][3] < 0 means a negative-cost path from 3 back to itself). Vertex 5 is not on any negative cycle (d[5][5] = 0 means no cheaper-than-free round trip). Both vertices are on negative cycles The algorithm made an error since d[i][i] should always be 0

Chapter 8: Interview Arsenal

All-pairs shortest paths is a staple of system design and algorithm interviews. Here is your cheat sheet and drill set.

Decision Flowchart: Which Algorithm?

Are all edge weights ≥ 0?

Yes → Run Dijkstra V times. O(V² log V + VE). Simplest correct approach.

↓ No (negative edges)

Is the graph dense (E ≈ V²)?

Yes → Floyd-Warshall. Θ(V³). Simpler to code than Johnson's, and on dense graphs it matches Johnson's asymptotically.

↓ No (sparse, E ≪ V²)

Johnson's Algorithm

O(V² log V + VE). Best for sparse graphs with negative edges. More complex to implement but worth it for large sparse graphs.

Quick Reference Table

Algorithm	Time	Space	Neg. Edges?	Neg. Cycle Detect?	Best For
Floyd-Warshall	Θ(V³)	Θ(V²)	Yes	Yes (diagonal)	Dense, simplicity
Johnson's	O(V²logV+VE)	O(V+E)	Yes	Yes (BF phase)	Sparse + neg. edges
Dijkstra × V	O(V²logV+VE)	O(V+E)	No	No	Sparse, non-neg.
BF × V	O(V²E)	O(V+E)	Yes	Yes	Almost never
Matrix mult.	O(V³logV)	O(V²)	Yes	No	Theory only

Coding Drill: Floyd-Warshall from Scratch

The number one coding question for this topic: implement Floyd-Warshall. Here is the minimal template:

python
def floyd_warshall(n, edges):
    """edges: list of (u, v, w) tuples. 0-indexed vertices."""
    INF = float('inf')
    d = [[INF] * n for _ in range(n)]
    for i in range(n): d[i][i] = 0
    for u, v, w in edges: d[u][v] = w

    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]

    # Negative cycle check
    for i in range(n):
        if d[i][i] < 0:
            return None  # negative cycle
    return d

Common Interview Questions

Q: "Find the shortest path between every pair of cities." Classic Floyd-Warshall. If V ≤ 400, the Θ(V³) is fine. If V is in the thousands with few roads, consider Johnson's.

Q: "Detect if a currency exchange has an arbitrage opportunity." Model currencies as vertices, exchange rates as edges with weight -log(rate). A negative cycle in this graph means the product of exchange rates around a loop exceeds 1.0 — arbitrage! Use Floyd-Warshall or Bellman-Ford.

Q: "Given a graph, can every vertex reach every other vertex?" This is transitive closure. Run Warshall's algorithm (boolean version of Floyd-Warshall). Or: run DFS/BFS from every vertex, but that is O(V(V+E)) which is worse for dense graphs.

Q: "What is the diameter of a graph?" The diameter is max_i,j d[i][j] — the longest shortest path. Run all-pairs shortest paths, then scan the distance matrix for the maximum finite entry.

Q: "Find the minimum number of flights between any two airports." Unweighted version: BFS from every vertex, O(V(V+E)). Or use Floyd-Warshall with all weights = 1. For small V (≤ 500), Floyd-Warshall is simpler. For large sparse graphs, BFS × V is faster.

Q: "In a social network, find the person who minimizes the maximum distance to any other person." Compute all-pairs shortest paths. For each person i, find max_j d[i][j]. Return the person with the smallest such maximum. This is the eccentricity problem — the answer is the center of the graph.

Five Dimensions of Interview Questions

All-pairs shortest paths questions test five distinct skills:

1. CONCEPT

"Explain Floyd-Warshall's DP formulation." Test: can you state the recurrence and explain what d^(k)[i][j] means?

↓

2. DESIGN

"Dense graph with negative weights, V = 500. Which algorithm?" Test: can you pick the right tool (Floyd-Warshall)?

↓

3. CODE

"Implement Floyd-Warshall with path reconstruction." Test: can you write 15 lines of correct code?

↓

4. DEBUG

"My Floyd-Warshall gives wrong answers." Common bugs: k-loop not outermost, Π[i][j]=k instead of Π[k][j], not initializing diagonal to 0.

↓

5. FRONTIER

"Can we do better than V³?" Answer: for dense graphs, no known better algorithm. For sparse, Johnson's. For unweighted, Seidel's O(V^ω log V).

Coding Drill: Transitive Closure

python
def transitive_closure(n, edges):
    """Return boolean matrix: can vertex i reach vertex j?"""
    t = [[i == j for j in range(n)] for i in range(n)]
    for u, v in edges:
        t[u][v] = True
    for k in range(n):
        for i in range(n):
            for j in range(n):
                t[i][j] = t[i][j] or (t[i][k] and t[k][j])
    return t

Coding Drill: Graph Diameter

python
def graph_diameter(n, edges):
    """The longest shortest path (graph diameter)."""
    d = floyd_warshall(n, edges)
    if d is None:
        return None  # negative cycle
    INF = float('inf')
    diameter = 0
    for i in range(n):
        for j in range(n):
            if d[i][j] < INF:
                diameter = max(diameter, d[i][j])
    return diameter

Coding Drill: Shortest Path Between Specific Pair with Floyd-Warshall

Often in interviews, you need to not just find the distance but reconstruct the path. Here is a complete solution:

python
def all_pairs_with_paths(n, edges):
    """Floyd-Warshall with path reconstruction.
    Returns (dist_matrix, function to get path)."""
    INF = float('inf')
    d = [[INF] * n for _ in range(n)]
    nxt = [[None] * n for _ in range(n)]  # NEXT vertex (not predecessor)
    for i in range(n): d[i][i] = 0
    for u, v, w in edges:
        d[u][v] = w
        nxt[u][v] = v  # from u, go to v next

    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
                    nxt[i][j] = nxt[i][k]  # go toward k first

    def get_path(u, v):
        if nxt[u][v] is None: return []
        path = [u]
        while path[-1] != v:
            path.append(nxt[path[-1]][v])
        return path

    return d, get_path

Predecessor vs. Next. There are two ways to reconstruct paths: (1) predecessor matrix Π[i][j] = the vertex just BEFORE j (trace backwards), or (2) next matrix nxt[i][j] = the vertex just AFTER i (trace forwards). Both work. The "next" approach is often cleaner for printing paths, while the "predecessor" approach matches CLRS notation.

Edge Cases to Watch

Edge Case	What Happens	How to Handle
Self-loops with negative weight	d[i][i] becomes negative after first pass	Detected by diagonal check
Disconnected graph	d[i][j] = ∞ for unreachable pairs	Check for ∞ before using distances
Parallel edges	Only keep the minimum-weight edge	Preprocess when building W
Very large V (> 5000)	V³ = 125 billion operations	Use Johnson's if sparse, or approximate

Algorithm Comparison: Operations vs Graph Size

Compare Floyd-Warshall (V³, orange) vs Johnson's on a sparse graph (V² log V + V·2V, teal) for V from 4 to 20. The gap widens as V grows.

Practical Performance Considerations

Asymptotic complexity does not tell the whole story. In practice:

Floyd-Warshall has excellent cache behavior. The inner two loops access the matrix in row-major order, which is cache-friendly on modern CPUs. The simple structure also allows SIMD vectorization. In practice, Floyd-Warshall is often faster than Johnson's for V up to ~1000 even on moderately sparse graphs.

Johnson's has overhead. Bellman-Ford + V Dijkstra runs involve heap operations, adjacency list traversals, and more complex data structures. The constant factors are higher than Floyd-Warshall's simple array access.

Parallelism. Floyd-Warshall is hard to parallelize because each layer d^(k) depends on d^(k-1). The matrix multiplication approach is more parallelizable (each matrix multiplication can be done in parallel). Johnson's has natural parallelism: the V Dijkstra runs are independent and can run on separate threads.

Property	Floyd-Warshall	Johnson's
Lines of code	~10	~40
Data structure	2D array only	Adjacency list + heap
Cache behavior	Excellent	Poor (pointer chasing)
Parallelizable	Hard	Easy (V independent runs)
Works with neg. edges	Yes	Yes
Practical crossover	Best for V ≤ 1000	Best for V > 1000, sparse

Quick check: You have a sparse graph with V = 10,000 vertices, E = 50,000 edges, and some negative edge weights (no negative cycles). Approximately how many operations does each algorithm need?

Floyd-Warshall: 10³ = 10¹². Too slow. Use Bellman-Ford × V instead. Both are O(V³), so it does not matter which one you use. Floyd-Warshall: V³ = 10¹² ops. Johnson's: V² log V + VE ≈ 10⁹ + 5 × 10⁸ ≈ 1.5 × 10⁹ ops. Johnson's is ~700x faster for this sparse graph. Use Johnson's.

Chapter 9: Connections

All-pairs shortest paths sits at the intersection of several fundamental algorithmic ideas. Let us trace the connections.

Where This Fits

Connection	Relationship	Key Insight
Ch 24: Single-source shortest paths	Foundation	Johnson's algorithm literally uses Bellman-Ford + Dijkstra as subroutines. All-pairs = "run single-source V times, but cleverly."
Ch 15: Dynamic Programming	Core technique	Floyd-Warshall IS dynamic programming. The intermediate vertex set {1, ..., k} is the DP state. The recurrence d^(k)[i][j] = min(d^(k-1)[i][j], d^(k-1)[i][k] + d^(k-1)[k][j]) is textbook DP.
Ch 26: Maximum flow	Application	Maximum flow algorithms often find shortest (or augmenting) paths as subroutines. The Edmonds-Karp algorithm uses BFS to find shortest augmenting paths.
Matrix multiplication	Algebraic connection	Shortest paths = matrix multiplication on the (min, +) semiring. Any improvement to min-plus matrix multiplication directly improves all-pairs shortest paths.
Transitive closure	Boolean special case	Replace (min, +) with (OR, AND) and you get reachability instead of distances. Same algorithm, different algebra.

Limitations of Floyd-Warshall

Limitation	Alternative
O(V³) even for sparse graphs	Johnson's: O(V² log V + VE)
Cannot handle negative cycles (undefined output)	Detect and report; no algorithm can "solve" negative cycles
Not parallelizable (DP dependencies between layers)	Matrix multiplication approach has better parallelism (Δ-stepping)
V > ~5000 becomes impractical	Approximate algorithms, or only compute pairs you need

Related Lessons

← Chapter 24: Single-Source Shortest Paths — Dijkstra, Bellman-Ford, DAG shortest paths. The building blocks that Johnson's algorithm combines.

Chapter 15: Dynamic Programming — Floyd-Warshall is one of the most elegant DP algorithms. If you understand the DP formulation (subsets of intermediate vertices), you understand Floyd-Warshall at a deeper level than just "three nested loops."

Chapter 26: Maximum Flow — Flow problems use shortest path subroutines. The min-cost max-flow problem is essentially a shortest path problem on a residual graph.

The Semiring Perspective

One of the deepest connections in this chapter is the semiring abstraction. A semiring is a set with two operations (call them ⊕ and ⊗) that satisfy certain algebraic laws. Three important semirings for graph algorithms:

Semiring	⊕	⊗	Problem Solved
Shortest path	min	+	All-pairs shortest paths
Boolean	OR	AND	Transitive closure (reachability)
Bottleneck	max	min	Widest path (max bandwidth between pairs)
Counting	+	×	Number of paths between pairs

All four problems are solved by the exact same three-loop algorithm — just plug in different ⊕ and ⊗. Floyd-Warshall is not just a shortest path algorithm; it is a template for solving problems on closed semirings. This is one of the most beautiful generalizations in algorithm design.

Historical Notes

Floyd-Warshall has a fascinating history. The algorithm was independently discovered by at least three people:

Who	When	Context
Stephen Warshall	1962	Published the transitive closure algorithm (boolean version)
Robert Floyd	1962	Generalized to shortest paths (real-valued version)
Bernard Roy	1959	Published an equivalent algorithm in French, 3 years earlier

It should perhaps be called the "Roy-Floyd-Warshall" algorithm, but history is not always fair. Johnson published his algorithm in 1977, fifteen years later, showing that the combination of Bellman-Ford and Dijkstra yields better asymptotics for sparse graphs.

Open Problems

All-pairs shortest paths has a tantalizing open problem: can we solve it in truly sub-cubic time? That is, O(V^3-ε) for some constant ε > 0? The matrix multiplication approach gives O(V³ log V) via repeated squaring, and sub-cubic min-plus matrix multiplication would give sub-cubic all-pairs shortest paths. But nobody has achieved sub-cubic min-plus multiplication for general matrices. The Williamssian barriers framework suggests this might be fundamentally hard.

For unweighted graphs (all weights are 0 or 1), Seidel's algorithm achieves O(V^ω log V) where ω < 2.372 is the matrix multiplication exponent. But for weighted graphs, Θ(V³) remains the best known.

Beyond CLRS: Modern Extensions

Research since CLRS has produced several important extensions:

Distance oracles (Thorup & Zwick, 2005): preprocess the graph in O(V^2+1/k) time and O(V^1+1/k) space to answer approximate distance queries in O(k) time. The approximation factor is 2k - 1. For k = 2, you get 3-approximate distances in O(1) time with O(V^3/2) space — much better than storing the full V² matrix.

Dynamic all-pairs shortest paths: when the graph changes (edges added or removed), can we update the distance matrix faster than recomputing from scratch? Demetrescu and Italiano (2004) gave an algorithm that handles each edge update in O(V²) amortized time, compared to O(V³) for full recomputation.

Parallel all-pairs: on a machine with p processors, Floyd-Warshall can be parallelized to O(V³/p) time with O(V²/p) memory per processor, using a blocked partitioning of the distance matrix. This is one of the classic algorithms for distributed computing courses.

Practical Algorithm Selection Guide

Here is a decision tree for real-world use, incorporating both theory and practice:

// Practical decision tree for all-pairs shortest paths
if V ≤ 500:
    Use Floyd-Warshall # Simple, fast, good cache behavior
elif all edges ≥ 0 and E < V²/10:
    Use Dijkstra × V # No need for Bellman-Ford overhead
elif some edges < 0 and E < V²/10:
    Use Johnson's # Handles negatives, exploits sparsity
elif V ≤ 5000:
    Use Floyd-Warshall # Dense enough that FW's cache wins
else:
    Consider approximate methods # V > 5000, V³ is too slow

In competitive programming, Floyd-Warshall is almost always the right choice because V is typically small (≤ 400) and the code is trivially short. In production systems, the graph is often too large for any cubic algorithm, and you need specialized solutions (distance oracles, hub labeling, contraction hierarchies) that exploit the structure of real-world graphs.

The algorithms in this chapter are foundational — they define the problem space and provide the baseline that all practical solutions are measured against. Understanding Floyd-Warshall deeply (the DP formulation, the in-place trick, the negative cycle detection) gives you the conceptual vocabulary to reason about any shortest-path problem, no matter how complex.

The big picture. Floyd-Warshall teaches you that dynamic programming can operate on graphs — the "state" is which vertices are allowed as intermediates. Johnson's teaches you that cleverly combining two algorithms (Bellman-Ford + Dijkstra) can beat either one alone. And the matrix multiplication view teaches you that shortest paths belong to a rich algebraic family of problems on semirings. All three perspectives will serve you well beyond this single problem.

A Mental Model for the Three Approaches

If you remember nothing else from this chapter, remember this mental model:

Approach	Analogy	DP State	Strength
Matrix mult.	Extend paths by one hop at a time	Number of edges used	Theoretical elegance
Floyd-Warshall	Add waypoints one at a time	Set of allowed intermediates	Simplicity (5 lines)
Johnson's	Remove negatives, then use fast tool	Hybrid: BF + Dijkstra	Speed on sparse graphs

Matrix multiplication asks: "What is the best path using at most m edges?" Floyd-Warshall asks: "What is the best path using intermediates from {1, ..., k}?" Johnson's says: "Transform the problem so we can use Dijkstra." Three completely different perspectives on the same problem, each revealing different structure.

Summary of Key Formulas

// Floyd-Warshall recurrence
d^(k)[i][j] = min( d^(k-1)[i][j], d^(k-1)[i][k] + d^(k-1)[k][j] )

// Johnson's reweighting
w'(u, v) = w(u, v) + h(u) - h(v) where h(v) = δ(s, v)

// Undo reweighting
δ(u, v) = δ'(u, v) - h(u) + h(v)

// Negative cycle detection
∃ negative cycle ⇔ ∃ i : d[i][i] < 0

// Transitive closure
t^(k)[i][j] = t^(k-1)[i][j] OR (t^(k-1)[i][k] AND t^(k-1)[k][j])

What Comes Next

This chapter completes the shortest-path trilogy in CLRS: single-source (Ch. 24), all-pairs (Ch. 25), and their applications throughout the rest of the book. The next chapter, Maximum Flow (Ch. 26), uses shortest paths internally (the Edmonds-Karp algorithm finds shortest augmenting paths). The min-cost max-flow problem directly combines flow and shortest paths.

Beyond CLRS, all-pairs shortest paths connects to:

Topic	Connection
Algebraic graph theory	The (min, +) semiring generalizes to other algebras on graphs
Metric spaces	The distance matrix D defines a metric on the graph vertices
Network design	Knowing all-pairs distances helps place facilities or design overlay networks
Approximation algorithms	Distance oracles trade space for query time: O(1) per query with O(V^1+1/k) space
Parallel algorithms	Matrix multiplication approach is more parallelizable than Floyd-Warshall

Final Thought

All-pairs shortest paths is one of those rare algorithmic problems where the solution is more beautiful than the problem. Floyd-Warshall's five lines of code hide a deep DP structure. Johnson's algorithm shows the power of problem transformation. And the matrix multiplication connection reveals that shortest paths belong to a vast algebraic landscape extending far beyond graphs.

Master these three algorithms — Floyd-Warshall for dense graphs, Johnson's for sparse, matrix multiplication for theory — and you have a complete toolkit for any shortest-path problem an interview or system design session can throw at you.

"The purpose of computation is insight, not numbers." — Richard Hamming

All-Pairs Shortest Paths

Chapter 0: The Problem

The Input and Output

Why Not Just Run Dijkstra V Times?

The Structure of the Solution

A Concrete Example: Five Warehouses

Adjacency Matrix vs Adjacency List

Chapter 1: Matrix Multiplication Approach

Shortest Paths as "Multiplication"

Repeated Squaring: O(V³ log V)

Worked Example

Implementation

The Connection to Regular Matrix Multiplication

From O(V4) to O(V³ log V)

Can We Do Better? The Sub-Cubic Dream

Chapter 2: Floyd-Warshall

The DP Formulation

Why Can We Use a Single Matrix? (Space Optimization)

Worked Example: 5 Vertices

The Elegant DP Structure

Floyd-Warshall vs. Bellman-Ford: The Deep Connection

Complete Hand-Trace: All 5 Passes

The Code: Five Beautiful Lines

What Happens with Wrong Loop Order?

Memory-Efficient Variant

Complexity Analysis

Chapter 3: Path Reconstruction

The Predecessor Matrix

Reconstructing a Path

Full Floyd-Warshall with Path Recovery

How the Predecessor Matrix Evolves

Time Complexity of Path Reconstruction

Alternative: "Next" Matrix Instead of Predecessor

Worked Example

When Does Path Reconstruction Fail?

Chapter 4: Transitive Closure

From Floyd-Warshall to Warshall

Implementation

Optimization: Bitwise Operations

Worked Example: 5-Vertex Transitive Closure

Applications of Transitive Closure

Transitive Closure in Python: Bitwise Optimization

Chapter 5: Johnson's Algorithm

The Problem with Negative Edges

The Potential Function h

Why the Reweighting Preserves Shortest Paths

Implementation

Worked Example: Johnson's on Our 5-Vertex Graph

Complexity

When Johnson's Beats Floyd-Warshall

Correctness Proof Sketch

Complete Data Flow Through Johnson's

Why Add an Auxiliary Vertex?

Space Complexity

Chapter 6: All-Pairs Showcase

What to Look For

Understanding the Distance Matrix

Building Your Own Graph

Chapter 7: Negative Cycles

Detection in Floyd-Warshall

Detection in Johnson's

What Happens to the Distance Matrix?

Which Entries Are Affected?

Handling Negative Cycles in Practice

The Currency Arbitrage Application

Chapter 8: Interview Arsenal

Decision Flowchart: Which Algorithm?

Quick Reference Table

Coding Drill: Floyd-Warshall from Scratch

Common Interview Questions

Five Dimensions of Interview Questions

Coding Drill: Transitive Closure

Coding Drill: Graph Diameter

Coding Drill: Shortest Path Between Specific Pair with Floyd-Warshall

Edge Cases to Watch

Practical Performance Considerations

Chapter 9: Connections

Where This Fits

Limitations of Floyd-Warshall

Related Lessons

From O(V⁴) to O(V³ log V)