Single-Source Shortest Paths (CLRS Ch. 24) — From Absolute Zero to Mastery

Chapter 0: The Problem

You open Google Maps, type "coffee shop near me," and it returns a route in 200 milliseconds. Behind that response is one of the most important algorithms in computer science: a shortest path algorithm running on a graph with millions of intersections and road segments, each weighted by travel time.

Let us build the intuition from scratch. A weighted graph is a set of vertices (cities, intersections, routers) connected by edges, where each edge has a numerical weight (distance, time, cost). The single-source shortest paths problem asks: given a starting vertex s, find the shortest-weight path from s to every other vertex in the graph.

This sounds simple. Just pick the shortest edge at every step, right? That greedy strategy fails spectacularly. The simulation below shows exactly why.

Greedy Fails: Why Locally Shortest Edges Lead Astray

The greedy path (always picks the lightest outgoing edge) is shown in orange. The true shortest path is shown in teal. Click "Run" to animate both strategies.

In the demo above, the greedy algorithm takes the edge with weight 1 from A, then gets stuck on a long path costing 10 total. The optimal path takes a heavier first edge (weight 3) but reaches the goal with total cost 5. Greedy fails because locally optimal choices do not guarantee a globally optimal path.

Think of it like driving. The freeway entrance is 3 miles away, but once you are on the freeway, you travel at 60 mph. The side street is right here (0 miles), but it is clogged with traffic and takes 30 minutes. Greedy picks the side street because the first step is shorter. A smarter algorithm considers the total journey.

Why This Problem Matters

Shortest paths show up everywhere in computer science and beyond:

Domain	Vertices	Edges	Weights
GPS navigation	Intersections	Road segments	Travel time or distance
Network routing	Routers	Links	Latency or bandwidth cost
Social networks	People	Friendships	Degrees of separation
Game AI	Grid cells or waypoints	Movement	Terrain cost
Currency arbitrage	Currencies	Exchange pairs	-log(exchange rate)
Project scheduling	Tasks	Dependencies	Task duration

Over the next chapters, we will build three different algorithms, each suited to a different type of graph:

Bellman-Ford

Works on any graph, even with negative edge weights. Detects negative-weight cycles. O(VE) time. The most general, but slowest.

↓

DAG Shortest Paths

Exploits the absence of cycles in a DAG. One pass in topological order. O(V+E) time. The fastest, but only works on DAGs.

↓

Dijkstra's Algorithm

Greedy done right, using a priority queue. Requires non-negative weights. O((V+E) log V) with a binary heap. The sweet spot for most real-world graphs.

Why three algorithms? Because no single algorithm is best for all graphs. Bellman-Ford handles any edge weights but is slow. DAG shortest paths is the fastest but only works on acyclic graphs. Dijkstra is the practical workhorse for most real-world graphs (non-negative weights) but breaks on negative edges. Knowing WHICH algorithm to use for a given problem is half the interview battle.

Optimal Substructure: Why Shortest Paths Are Solvable

The key theoretical property that makes shortest paths tractable is optimal substructure: sub-paths of shortest paths are themselves shortest paths.

Formally: if p = v₀ → v₁ → ... → v_k is a shortest path from v₀ to v_k, then for any 0 ≤ i ≤ j ≤ k, the sub-path v_i → ... → v_j is a shortest path from v_i to v_j.

Proof by contradiction: If there were a shorter path from v_i to v_j, we could splice it into p, replacing the sub-path v_i → ... → v_j. This would give a path from v₀ to v_k with smaller total weight than p, contradicting the assumption that p is a shortest path.

This property is crucial because it means we can solve shortest paths incrementally: knowing the shortest path to u, we can extend it by one edge to get a candidate shortest path to v. This is exactly what relaxation does.

Contrast with longest paths. Longest simple paths do NOT have optimal substructure. Consider a graph: A→B (5), B→C (3), A→C (10), C→A (2). The longest simple path from A to C is A→B→C (cost 8). But the sub-path from A to B in this path is A→B (cost 5), while the longest simple path from A to B is also A→B (cost 5). Seems fine so far, but the "simple" constraint (no repeated vertices) couples the sub-problems in ways that destroy the ability to combine optimal sub-solutions. This is why longest simple paths are NP-hard.

Formal Definitions

Let us nail down the notation before diving in. Given a weighted, directed graph G = (V, E) with weight function w : E → R:

// Weight of a path p = v₀ → v₁ → ... → v_k
w(p) = ∑_i=1^k w(v_i-1, v_i)

// Shortest-path weight from s to v
δ(s, v) = min { w(p) : p is a path from s to v }   if a path exists
δ(s, v) = ∞   if no path from s to v exists
δ(s, v) = -∞   if a negative-weight cycle is reachable on a path from s to v

That last case is critical: if you can reach a negative-weight cycle on the way from s to v, you can loop around that cycle forever, driving the total weight to negative infinity. There is no finite shortest path.

Shortest Paths vs Other Path Problems

Before moving on, let us distinguish shortest paths from related problems that students often confuse:

Problem	Question	Algorithm	Time
Reachability	Can we get from s to v?	BFS or DFS	O(V+E)
Shortest path (unweighted)	Fewest edges from s to v?	BFS	O(V+E)
Shortest path (weighted)	Minimum total weight from s to v?	Dijkstra / BF / DAG	Varies
Longest path (general)	Maximum total weight from s to v?	NP-hard in general!	No efficient algo
Longest path (DAG)	Maximum weight in a DAG?	Negate + DAG shortest	O(V+E)
All-pairs shortest	Shortest between every pair?	Floyd-Warshall / Johnson	O(V³) / O(VE + V(V+E)logV)

Longest path is NP-hard! On a general graph, finding the longest simple path is NP-hard — no known polynomial-time algorithm exists. This is fundamentally different from shortest paths, which we can solve efficiently. The reason: shortest paths have optimal substructure (sub-paths of shortest paths are themselves shortest paths), but longest paths do not (sub-paths of a longest simple path may not be longest simple paths, because the "simple" constraint — no repeated vertices — couples the sub-problems). DAGs are the exception because the acyclicity makes the "simple" constraint automatic.

Before we dive into any specific algorithm, there is one fundamental operation that all three share: relaxation. That is Chapter 1.

Quick check: A greedy shortest-path strategy always picks the lightest outgoing edge. On a graph with edges A→B (weight 1), A→C (weight 4), B→D (weight 8), C→D (weight 1), what path does greedy find from A to D, and what is the actual shortest path?

Greedy picks A→C→D (cost 5), which is optimal Greedy picks A→B→D (cost 9), but the shortest is A→B (cost 1) since D is unreachable otherwise Greedy picks A→B→D (cost 9) because it greedily takes the lightest first edge (1 < 4). The actual shortest path is A→C→D (cost 5)

Chapter 1: Relaxation

Every shortest path algorithm in this chapter — Bellman-Ford, DAG shortest paths, Dijkstra — is built from a single primitive operation called relaxation. Master this one operation and the algorithms become trivial variations of "which order do we relax edges in?"

The Setup: Initialize-Single-Source

Before any algorithm runs, we initialize two arrays for every vertex v in the graph:

// INITIALIZE-SINGLE-SOURCE(G, s)
for each vertex v ∈ G.V:
    d[v] = ∞       // best-known distance from s to v (upper bound)
    π[v] = NIL     // predecessor of v on the current best path
d[s] = 0           // distance from s to itself is 0

Think of d[v] as a sticky note on each vertex saying "the best we have found so far." Initially, we know nothing (d = ∞) except that the source is distance 0 from itself. As the algorithm runs, these sticky notes get updated to smaller and smaller values until they reach the true shortest-path weight δ(s, v).

The RELAX Operation

Relaxing an edge (u, v) with weight w(u, v) means: "Can we improve the shortest-path estimate to v by going through u?" If the current best path to u, plus the edge from u to v, is shorter than the current best path to v directly, then we have found a better path. Update d[v] and record u as v's predecessor.

// RELAX(u, v, w)
if d[v] > d[u] + w(u, v):   // is the path through u shorter?
    d[v] = d[u] + w(u, v)    // yes — update the estimate
    π[v] = u                // record that we came from u

The name comes from physics. Think of d[v] as a spring stretched to some length. Relaxation tries to let the spring relax to a shorter, less tense state. The value d[v] can only decrease over time — it is an upper bound that gradually tightens toward the true shortest-path weight δ(s, v).

The master insight. Bellman-Ford relaxes every edge, V-1 times. DAG shortest paths relaxes edges in topological order, one pass. Dijkstra relaxes edges out of the vertex with the smallest d[v], using a priority queue. They are all just different schedules for applying the same RELAX operation. The schedule determines the time complexity and what types of graphs the algorithm can handle.

Why "Relaxation"? The Physical Analogy

Imagine stretching a rubber band from s to v. The length of the rubber band is d[v] — our current estimate. The relaxation operation checks: "If I route the rubber band through u instead, can I let it relax to a shorter length?" If yes, the band snaps to the shorter configuration. It only gets shorter, never longer, and eventually reaches its minimum length δ(s, v) — the natural rest state. This is why the operation is called relaxation.

The analogy is precise. In physics, a system relaxes toward its minimum energy state. In shortest paths, the estimate d[v] relaxes toward the minimum-weight path δ(s, v). Both processes are monotonically decreasing and convergent.

Three Key Properties of Relaxation

Relaxation has beautiful mathematical properties that guarantee correctness. Let δ(s, v) denote the true shortest-path weight from s to v.

1. Upper-bound property. We always have d[v] ≥ δ(s, v) after initialization, and relaxation can only decrease d[v]. It never goes below the true shortest-path weight. Once d[v] = δ(s, v), it stays there forever.

2. No-path property. If there is no path from s to v, then δ(s, v) = ∞ and d[v] = ∞ for all time. No sequence of relaxations can create a path that does not exist in the graph.

3. Convergence property. If s ↝ u → v is a shortest path (meaning the shortest path to v ends with edge (u, v)), and if d[u] = δ(s, u) at some point before we relax edge (u, v), then after that relaxation, d[v] = δ(s, v). In other words: once the predecessor on the shortest path has the correct distance, one relaxation of the last edge locks in the correct answer for v.

4. Path-relaxation property. If p = v₀ → v₁ → ... → v_k is a shortest path from s = v₀ to v_k, and the edges (v₀, v₁), (v₁, v₂), ..., (v_k-1, v_k) are relaxed in that order (not necessarily consecutively), then d[v_k] = δ(s, v_k). This is why Bellman-Ford works: V-1 passes guarantee that the edges of any shortest path are relaxed in the correct order.

Worked Example by Hand

Let us trace relaxation on a concrete graph. Vertices: s, A, B, C. Edges: s→A (weight 5), s→B (weight 10), A→B (weight 3), A→C (weight 2), B→C (weight 1).

After initialization: d[s]=0, d[A]=∞, d[B]=∞, d[C]=∞.

Step	Relax	Test	Result	d[A]	d[B]	d[C]
1	(s, A, 5)	∞ > 0+5?	Yes, d[A]=5	5	∞	∞
2	(s, B, 10)	∞ > 0+10?	Yes, d[B]=10	5	10	∞
3	(A, B, 3)	10 > 5+3?	Yes, d[B]=8	5	8	∞
4	(A, C, 2)	∞ > 5+2?	Yes, d[C]=7	5	8	7
5	(B, C, 1)	7 > 8+1?	No change	5	8	7

Notice step 3: the direct path s→B has weight 10, but going s→A→B has weight 5+3=8. Relaxation discovered the shorter route through A. And step 5 did nothing, because the path through B (cost 8+1=9) is worse than the path through A (cost 5+2=7).

Relaxation Step-by-Step

Click "Step" to relax edges one at a time. Watch d[v] values decrease. The currently relaxed edge highlights in orange.

Click Step to begin.

Implementation

python
def initialize_single_source(graph, s):
    """Set d[v] = inf, pi[v] = None for all v, then d[s] = 0."""
    d = {v: float('inf') for v in graph.vertices}
    pi = {v: None for v in graph.vertices}
    d[s] = 0
    return d, pi

def relax(u, v, w, d, pi):
    """Try to improve d[v] by going through u.
    Returns True if the estimate was improved."""
    if d[v] > d[u] + w:
        d[v] = d[u] + w  # shorter path found!
        pi[v] = u        # update predecessor
        return True     # relaxation changed something
    return False        # no improvement

Predecessor pointers reconstruct the path. After all relaxations are done, follow π[v] backwards from any vertex v to s to reconstruct the actual shortest path, not just its weight. This is how GPS apps show you the turn-by-turn route, not just the ETA. The predecessor pointers form a shortest-path tree rooted at s.

The Shortest-Path Tree

The predecessor pointers π[v] define a tree rooted at s. This is the shortest-path tree: for each vertex v reachable from s, the unique path from s to v in this tree is a shortest path in the original graph.

Properties of the shortest-path tree:

Property	What It Means
Rooted at s	π[s] = NIL; all paths start at s
V-1 edges (if all reachable)	A tree on V vertices has exactly V-1 edges
Sub-path of tree path = shortest	Any prefix of a tree path from s is also a shortest path
Not unique	Multiple shortest-path trees may exist (different tie-breaking)
Subgraph of G	Every edge in the tree is an edge in the original graph

Visualize the shortest-path tree: if you draw only the edges (u, v) where π[v] = u, you get a tree that fans out from s. In a road network, this tree looks like a river system — a trunk road from s branching into smaller roads to reach every neighborhood.

Path reconstruction is O(V) in the worst case (following predecessors from v back to s). To reconstruct the path to a specific vertex:

python
def reconstruct(pi, v):
    """Follow predecessors from v to source."""
    path = []
    while v is not None:
        path.append(v)
        v = pi[v]
    return path[::-1]  # reverse: source → v

Relaxation Order Matters

The same set of relaxations can produce different intermediate results depending on the order. Consider two edges: s→A (5), s→B (3), B→A (1). If we relax s→A first, d[A] = 5. Then s→B: d[B] = 3. Then B→A: d[A] = min(5, 3+1) = 4. But if we relax in the order s→B, B→A, s→A: after s→B d[B]=3, after B→A d[A]=4, after s→A d[A]=min(4,5)=4. Same final answer, but different intermediate states. The final answer is always the same — this is guaranteed by the path-relaxation property — but the journey differs.

Relaxation check: We have d[u] = 3 and d[v] = 10. Edge (u, v) has weight 4. After relaxing (u, v), what is d[v]?

10 (no change) 7, because d[u] + w(u,v) = 3 + 4 = 7 < 10 = d[v], so we update d[v] to 7 4, because we take the edge weight directly

Chapter 2: Bellman-Ford

Now that we have the RELAX primitive, let us build our first complete shortest-path algorithm. The Bellman-Ford algorithm is brutally simple: relax every edge, and do it V-1 times. That is the entire algorithm.

Why V-1 Passes?

A shortest path in a graph with V vertices has at most V-1 edges. Why? Because a shortest path never revisits a vertex (if it did, it would contain a cycle, and removing that cycle would give a shorter or equal-weight path — unless the cycle has negative weight). So V vertices means at most V-1 edges in any shortest path.

After pass k of Bellman-Ford, every vertex reachable from s via a shortest path of at most k edges has its correct d[v] value. After V-1 passes, all shortest paths (which have at most V-1 edges) are correct. This follows directly from the path-relaxation property we proved in Chapter 1.

// BELLMAN-FORD(G, w, s)
INITIALIZE-SINGLE-SOURCE(G, s)

// Phase 1: V-1 relaxation passes
for i = 1 to |V| - 1:
    for each edge (u, v) ∈ E:
        RELAX(u, v, w)

// Phase 2: Negative-cycle detection
for each edge (u, v) ∈ E:
    if d[v] > d[u] + w(u, v):
        return FALSE   // negative cycle reachable from s
return TRUE           // all shortest paths are correct

Negative-Cycle Detection

The genius of Bellman-Ford is the second phase. After V-1 passes, all shortest-path distances should be finalized. If we scan all edges one more time and can still relax some edge (u, v) — meaning d[v] > d[u] + w(u, v) — it means there is a negative-weight cycle reachable from s.

Why? If all shortest paths had finite weight, V-1 passes would have found them all. The only reason relaxation is still possible is that going around a cycle keeps reducing the total weight. You can loop forever, so no finite shortest path exists to any vertex reachable from that cycle.

When to use Bellman-Ford. Use it when your graph may have negative edge weights. Real applications include: (1) currency arbitrage detection — a negative cycle in a graph of -log(exchange rates) means guaranteed profit; (2) network routing — BGP uses a distributed variant; (3) difference constraints — a system of inequalities x_j - x_i ≤ w_ij maps directly to shortest paths with potentially negative weights.

Walkthrough on a Graph with Negative Edges

Consider a graph with vertices {s, A, B, C, D} and edges including some negative weights: s→A (6), s→B (7), A→B (8), A→C (5), A→D (-4), B→C (-3), B→D (9), C→B (-2), D→C (7).

The negative edges A→D (-4) and B→C (-3) and C→B (-2) make Dijkstra fail, but Bellman-Ford handles them perfectly. Let us trace it.

Bellman-Ford Step-Through

Watch all edges get relaxed in each pass. Distance estimates update (shown in boxes). After V-1 passes, the algorithm checks for negative cycles. Click "Pass" to execute one full pass over all edges.

Pass 0 of 4. Click Pass to begin.

In the simulation, watch how edge B→C with weight -3 allows Bellman-Ford to find a shorter path to C through B. And C→B with weight -2 creates a feedback loop between B and C that settles over multiple passes. This cascading effect of negative edges is exactly what Dijkstra cannot handle — but Bellman-Ford patiently resolves by repeated passes.

Hand Trace of All Passes

Let us trace d[v] after each complete pass, relaxing edges in the order they appear above:

After Pass	d[A]	d[B]	d[C]	d[D]
Init	∞	∞	∞	∞
1	6	7	4	2
2	6	2	4	2
3	6	2	-1	2
4	6	-3	-1	2

After pass 4 (V-1 = 5-1 = 4), we run the negative-cycle check: scan all edges and see if any can still be relaxed. In this graph, none can, so there is no negative cycle. The final distances are correct.

Visualizing All Passes

Bellman-Ford: Edge-by-Edge Detail

This canvas shows a smaller graph where you can trace individual edge relaxations within a pass. Click "Relax Next" to process one edge at a time. The currently tested edge turns orange; green flash = update happened.

Click Relax Next to begin.

Time Complexity

Component	Cost	Why
Initialization	O(V)	Set d[v] = ∞ for each vertex
Main loop	O(VE)	V-1 passes, each relaxing all E edges
Neg-cycle check	O(E)	One scan of all edges
Total	O(VE)

For dense graphs where E = O(V²), this is O(V³). That is slow. But it is the price for handling negative edges and detecting negative cycles.

Space Complexity

Bellman-Ford needs only O(V) extra space: the d[] and π[] arrays. The graph itself takes O(V + E) for an adjacency list or O(E) for an edge list. Bellman-Ford works naturally with an edge list (it iterates over all edges per pass), so you do not even need an adjacency list. This makes it slightly simpler to implement than Dijkstra.

Correctness Proof Sketch

After pass i, every vertex reachable from s via a shortest path of at most i edges has its correct d[v]. This follows from the path-relaxation property: if the shortest path to v is s = v₀ → v₁ → ... → v_k with k ≤ i, then during pass 1 we relax (v₀, v₁), making d[v₁] correct. During pass 2 we relax (v₁, v₂), making d[v₂] correct (since d[v₁] is already correct from pass 1). And so on. After pass k ≤ V-1, d[v_k] = δ(s, v_k).

The negative-cycle detection works because: if no negative cycle exists, then all shortest paths have at most V-1 edges, so V-1 passes suffice. If a negative cycle exists, then some edges can be relaxed forever (going around the cycle always reduces the total). The V-th pass catches this.

Implementation

python
def bellman_ford(vertices, edges, source):
    """
    vertices: list of vertex labels
    edges: list of (u, v, weight) tuples
    source: starting vertex
    Returns: (dist, pred, has_neg_cycle)
    """
    # Initialize
    dist = {v: float('inf') for v in vertices}
    pred = {v: None for v in vertices}
    dist[source] = 0

    # Phase 1: V-1 relaxation passes
    for i in range(len(vertices) - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                pred[v] = u
                changed = True
        if not changed:
            break  # early termination: no changes = done

    # Phase 2: Negative-cycle detection
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            return dist, pred, True   # negative cycle!

    return dist, pred, False

# Example
verts = ['s', 'A', 'B', 'C', 'D']
edges = [
    ('s', 'A', 6), ('s', 'B', 7),
    ('A', 'B', 8), ('A', 'C', 5),
    ('A', 'D', -4), ('B', 'C', -3),
    ('B', 'D', 9), ('C', 'B', -2),
    ('D', 'C', 7),
]
d, pi, neg = bellman_ford(verts, edges, 's')
print(d)    # shortest distances from s
print(neg)  # False (no negative cycle)

Early termination optimization. If an entire pass over all edges produces zero relaxations (no d[v] changed), then we are done early — the shortest-path tree is finalized. This can save several passes in practice, cutting average-case time significantly, though the worst case remains V-1 passes. The code above includes this optimization with the changed flag.

Why V-1 Passes and Not V-2 or V?

This is a common interview question. The answer comes from a simple counting argument. A shortest path in a graph with V vertices visits each vertex at most once (assuming no negative cycles). A path that visits k distinct vertices has k-1 edges. Since k ≤ V, the maximum number of edges in any shortest path is V-1.

After pass i of Bellman-Ford, we have correct distances for all vertices reachable via shortest paths with at most i edges. After pass V-1, we have correct distances for all shortest paths with at most V-1 edges — which is all of them. So V-1 passes suffice.

Could V-2 passes suffice? No. Consider a chain: s → v₁ → v₂ → ... → v_V-1. The shortest path to v_V-1 has exactly V-1 edges. After V-2 passes, d[v_V-1] might not be correct yet (it depends on the edge relaxation order within each pass). So V-1 is tight.

What about the V-th pass? The V-th pass is ONLY used for negative-cycle detection. If it produces any relaxation, a negative cycle exists. If it does not, V-1 passes were already sufficient.

Negative Cycle Detection: How and Why

Let us make negative cycles concrete. Consider three currencies: USD, EUR, GBP. Exchange rates: 1 USD = 0.9 EUR, 1 EUR = 0.8 GBP, 1 GBP = 1.5 USD. Starting with 1 USD, you can convert:

1 USD → 0.9 EUR → 0.72 GBP → 1.08 USD

// You gained 0.08 USD! Repeat: 1.08 → 1.1664 → 1.2597 → ...
// This is arbitrage — a negative cycle in -log(rate) space.

In graph terms: edge weights are -log(rate). The cycle USD → EUR → GBP → USD has total weight -log(0.9) + (-log(0.8)) + (-log(1.5)) = 0.105 + 0.223 + (-0.405) = -0.077. The sum is negative, so Bellman-Ford's V-th pass detects it as a relaxable edge.

What About SPFA?

The Shortest Path Faster Algorithm (SPFA) is a queue-based optimization of Bellman-Ford. Instead of relaxing all edges in every pass, SPFA maintains a queue of "active" vertices — those whose d[v] changed recently. Only edges from active vertices are relaxed.

python
from collections import deque

def spfa(adj, source, n):
    """SPFA: Queue-based Bellman-Ford optimization.
    Average case O(E), worst case still O(VE)."""
    dist = [float('inf')] * n
    dist[source] = 0
    in_queue = [False] * n
    count = [0] * n  # how many times each vertex entered queue

    q = deque([source])
    in_queue[source] = True

    while q:
        u = q.popleft()
        in_queue[u] = False

        for v, w in adj[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                if not in_queue[v]:
                    q.append(v)
                    in_queue[v] = True
                    count[v] += 1
                    if count[v] >= n:
                        return None  # negative cycle!

    return dist

SPFA is popular in competitive programming because its average case is O(E) — much faster than Bellman-Ford's O(VE). But its worst case is the same, and it can be slower in practice due to cache misses from random queue access patterns. Use it when you need speed and are willing to accept worst-case O(VE).

Bellman-Ford vs Dijkstra: A Side-by-Side Comparison

Let us compare the two most general algorithms head-to-head:

Aspect	Bellman-Ford	Dijkstra (binary heap)
Time	O(VE)	O((V+E) log V)
Space	O(V) + edge list	O(V+E) + heap
Negative edges	Yes	No (incorrect results)
Negative cycle detection	Yes	No
Graph representation	Edge list (simplest)	Adjacency list (for fast neighbor access)
Distributed?	Naturally (each vertex = independent agent)	Hard (needs global priority queue)
Early termination	If no changes in a pass	When target extracted (point-to-point)
Implementation complexity	~15 lines	~25 lines
When to use	Negative edges, small graphs, distributed settings	Non-negative edges, any size graph

The speed difference is substantial. On a graph with V=10,000 and E=50,000:

// Bellman-Ford: V × E = 10,000 × 50,000 = 500,000,000 relaxations
// Dijkstra: (V+E) × log V = 60,000 × 14 ≈ 840,000 operations
// Dijkstra is ~600x faster!

// For dense graphs (V=1000, E=500,000):
// Bellman-Ford: 500,000,000
// Dijkstra: 501,000 × 10 ≈ 5,010,000
// Dijkstra is ~100x faster.

// The gap GROWS with graph size. This is why Dijkstra dominates in practice.

Bellman-Ford check: A graph has 5 vertices and contains a negative-weight cycle reachable from the source. What happens during Bellman-Ford?

The algorithm runs forever in an infinite loop After 4 passes (V-1=4), the extra check scans all edges and finds one that can still be relaxed. This proves a negative-weight cycle exists. The algorithm returns FALSE in O(VE) time — it does not loop forever The algorithm outputs incorrect distances but does not detect the cycle

Chapter 3: DAG Shortest Paths

If your graph has no cycles — a directed acyclic graph (DAG) — you can solve shortest paths in just O(V + E) time. That is linear in the size of the graph. You cannot do better, because you have to look at every vertex and every edge at least once.

The Key Insight

In a DAG, vertices can be arranged in a topological order: a linear ordering where every edge goes from left to right. The magic is that when you process a vertex u in topological order, every vertex that could be on a shortest path to u has already been processed. So d[u] is already correct when you relax u's outgoing edges.

This means one pass through the vertices (in topological order), relaxing each vertex's outgoing edges, is sufficient. Compare this to Bellman-Ford, which needs V-1 passes because it does not know the right order.

// DAG-SHORTEST-PATHS(G, w, s)
topological_order = TOPOLOGICAL-SORT(G) // O(V+E) via DFS
INITIALIZE-SINGLE-SOURCE(G, s)

for each vertex u in topological_order:
for each edge (u, v) ∈ Adj[u]:
RELAX(u, v, w)

That is the entire algorithm. No priority queue, no repeated passes, no negative-cycle check (there are no cycles in a DAG by definition). Just one scan in topological order. Elegant, fast, and correct.

Let us verify the time complexity. Topological sort (via DFS or Kahn's algorithm) takes O(V+E). The main loop processes each vertex once and each edge once (each edge (u,v) is relaxed exactly once, when u is processed). Total: O(V+E) + O(V+E) = O(V+E).

Space: O(V) for the distance and predecessor arrays, plus O(V+E) for the adjacency list and topological order. Total O(V+E).

Negative edges are fine in a DAG. Unlike Dijkstra, DAG shortest paths works perfectly with negative edge weights. Why? Because a DAG has no cycles at all, so negative edges cannot form negative-weight cycles. The topological-order processing handles them naturally.

Why Topological Order Works: The Convergence Cascade

Consider a shortest path s = v₀ → v₁ → ... → v_k. In topological order, v₀ appears before v₁, which appears before v₂, and so on. When we process v₀, we relax (v₀, v₁), setting d[v₁] = δ(s, v₁). When we later process v₁, we relax (v₁, v₂), and since d[v₁] is already correct, d[v₂] gets its correct value. This cascade continues along the entire path.

The path-relaxation property guarantees correctness: since the edges of any shortest path are relaxed in order, all d[v] values are correct after one pass.

Hand Trace: DAG Shortest Paths

Let us trace the algorithm on our example DAG. Vertices: 0, 1, 2, 3, 4, 5. Source: 0. Topological order: 0, 1, 2, 3, 4, 5. Edges include some negative weights.

Process Vertex	Relax Edges	d[1]	d[2]	d[3]	d[4]	d[5]
(init)	-	∞	∞	∞	∞	∞
0	0→1(5), 0→2(3)	5	3	∞	∞	∞
1	1→3(6), 1→2(2)	5	3	11	∞	∞
2	2→4(4), 2→5(2), 2→3(7)	5	3	10	7	5
3	3→4(-1), 3→5(1)	5	3	10	7	5
4	4→5(-2)	5	3	10	7	5
5	(none)	5	3	10	7	5

Wait — something seems off. Let us recheck. When we process vertex 2, we relax 2→3 with weight 7: d[3] = min(11, 3+7) = min(11, 10) = 10. So d[3] updates from 11 to 10. When we process vertex 3, we relax 3→4 with weight -1: d[4] = min(7, 10+(-1)) = min(7, 9) = 7. No change. And 3→5 with weight 1: d[5] = min(5, 10+1) = min(5, 11) = 5. No change.

When we process vertex 4, we relax 4→5 with weight -2: d[5] = min(5, 7+(-2)) = min(5, 5) = 5. No change (ties do not count as updates). Final distances: {0:0, 1:5, 2:3, 3:10, 4:7, 5:5}.

The shortest path from 0 to 5 has weight 5, achieved via 0→2→5 (3+2=5). The alternative 0→2→4→5 also gives 3+4+(-2)=5 — same weight, different path. The algorithm finds one of them (whichever edge was relaxed first sets the predecessor).

Interactive DAG Walkthrough

DAG Shortest Paths: Topological Order + Relaxation

Vertices are laid out in topological order (left to right). Green = processed. Orange = currently being processed. Click "Step" to process the next vertex and relax its outgoing edges. Some edges have negative weights (shown in red).

Ready. Source is vertex 0.

Critical Path Method: The Killer Application

Here is a beautiful real-world application. In project scheduling, tasks are vertices and dependencies are directed edges. The edge weight is the task duration. A project is modeled as a DAG (tasks have dependencies but no circular ones).

The longest path from the project start to the project end is the critical path — the minimum time to complete the entire project. Any delay on a critical-path task delays the entire project. Non-critical tasks have slack: they can be delayed without affecting the project completion time.

How to find the longest path in a DAG? Negate all edge weights and run DAG shortest paths. The shortest path in the negated graph is the longest path in the original graph.

// Longest path in a DAG (Critical Path Method):
1. Negate all edge weights: w'(u,v) = -w(u,v)
2. Run DAG-SHORTEST-PATHS with the negated weights
3. The longest-path distance to v: longest[v] = -d[v]

// This works because:
// min(-w(p)) over all paths p = -(max(w(p)) over all paths p)

Why negation is safe here. In a general graph, negating weights could create negative-weight cycles, making shortest paths undefined. But a DAG has NO cycles, so negating weights cannot create negative cycles. This is why the critical path method works.

Worked Example: Longest Path via Negation

Project tasks: Start → Design (3 days) → Code (5 days) → Test (2 days). Also Start → Write Docs (4 days) → Test. We want the critical path (longest).

Edge	Original w	Negated w'
Start → Design	3	-3
Start → Write Docs	4	-4
Design → Code	5	-5
Write Docs → Test	0	0
Code → Test	2	-2

DAG shortest paths on negated weights gives d[Test] = -10 (path: Start → Design → Code → Test). The longest path = -(-10) = 10 days. The docs path takes only 4 days, so it has 6 days of slack.

When You See a DAG: Always Think Topological Sort

The DAG shortest paths algorithm is a special case of a much broader pattern: dynamic programming on DAGs. Any DP problem where the dependency structure is acyclic can be solved by topological sort + single pass. This includes:

Problem	DAG Structure	Relaxation
Shortest paths	Graph edges	d[v] = min(d[u] + w(u,v))
Longest paths (critical path)	Graph edges (negated)	d[v] = max(d[u] + w(u,v))
Number of paths	Graph edges	count[v] += count[u]
Longest increasing subseq.	a[i] < a[j] for i < j	LIS[j] = max(LIS[i] + 1)

Implementation

python
from collections import defaultdict, deque

def topological_sort(adj, vertices):
    """Kahn's algorithm: BFS-based topological sort."""
    in_deg = {v: 0 for v in vertices}
    for u in adj:
        for v, w in adj[u]:
            in_deg[v] += 1
    q = deque([v for v in vertices if in_deg[v] == 0])
    order = []
    while q:
        u = q.popleft()
        order.append(u)
        for v, w in adj.get(u, []):
            in_deg[v] -= 1
            if in_deg[v] == 0:
                q.append(v)
    return order

def dag_shortest_paths(adj, vertices, source):
    """O(V+E) shortest paths on a DAG."""
    order = topological_sort(adj, vertices)
    dist = {v: float('inf') for v in vertices}
    pred = {v: None for v in vertices}
    dist[source] = 0

    for u in order:
        if dist[u] == float('inf'):
            continue  # unreachable from source, skip
        for v, w in adj.get(u, []):
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                pred[v] = u

    return dist, pred

# Example DAG with negative edges
adj = defaultdict(list)
adj[0] = [(1, 5), (2, 3)]
adj[1] = [(3, 6), (2, 2)]
adj[2] = [(4, 4), (5, 2), (3, 7)]
adj[3] = [(4, -1), (5, 1)]
adj[4] = [(5, -2)]
d, pi = dag_shortest_paths(adj, range(6), 0)
print(d)  # {0:0, 1:5, 2:3, 3:7, 4:6, 5:4}

Complexity Comparison

Algorithm	Time	Negative Edges?	Negative Cycles?	Graph Type
BFS	O(V+E)	No	No	Unweighted
DAG Shortest	O(V+E)	Yes	N/A (no cycles)	DAGs only
Dijkstra	O((V+E) log V)	No	No	General
Bellman-Ford	O(VE)	Yes	Detects	General

Real-World DAG Examples

DAGs appear everywhere in software systems:

Domain	Vertices	Edges	Application
Build systems (Make, Bazel)	Build targets	Dependencies	Determine build order, find critical build path
Package managers (pip, npm)	Packages	Dependencies	Resolve installation order
Spreadsheets	Cells	Formula references	Recalculation order
Neural networks	Layers/ops	Data flow	Forward pass execution order
Git commit history	Commits	Parent pointers	Find merge base, rebase order
Course prerequisites	Courses	Prerequisites	Valid enrollment order

Whenever you see a dependency structure with no circular dependencies, think "DAG" and "topological sort." The O(V+E) shortest path algorithm is one of many algorithms that become possible once you have a topological order.

Counting Shortest Paths in a DAG

A common extension: count the NUMBER of shortest paths from s to each vertex. This is easily done alongside the shortest-path computation:

python
def dag_shortest_with_count(adj, vertices, source):
    """Returns (dist, count) where count[v] = number of shortest paths to v."""
    order = topological_sort(adj, vertices)
    dist = {v: float('inf') for v in vertices}
    count = {v: 0 for v in vertices}
    dist[source] = 0
    count[source] = 1  # one path from source to itself: the empty path

    for u in order:
        if dist[u] == float('inf'):
            continue
        for v, w in adj.get(u, []):
            new_d = dist[u] + w
            if new_d < dist[v]:
                dist[v] = new_d
                count[v] = count[u]  # only paths through u
            elif new_d == dist[v]:
                count[v] += count[u]  # add paths through u

    return dist, count

This works because in a DAG, when we process u in topological order, all paths to u have already been counted. So count[u] is finalized before we use it to update count[v]. The same technique works with Dijkstra (since vertices are finalized in increasing distance order, count[u] is correct when u is extracted).

DAG shortest paths check: Why can we not use DAG shortest paths on a general graph (one that may contain cycles)?

It would be too slow — O(V+E) is not fast enough for cyclic graphs A cyclic graph has no valid topological ordering. The algorithm requires processing vertices in an order where all predecessors are finalized before processing a vertex. Cycles make this impossible — vertex A depends on B which depends on A Cyclic graphs always have negative cycles, which DAG shortest paths cannot handle

Chapter 4: Dijkstra's Algorithm

Dijkstra's algorithm is the workhorse of shortest paths. It handles general graphs (not just DAGs), runs in O((V+E) log V) with a binary heap, and is the basis for almost every practical routing algorithm. The catch: all edge weights must be non-negative.

Historical Context

Edsger Dijkstra invented this algorithm in 1956 while sitting at a cafe in Amsterdam with his fiancee. He published it in 1959 in a remarkably concise three-page paper titled "A note on two problems in connexion with graphs." The algorithm was originally designed for weighted undirected graphs, but it works identically for directed graphs with non-negative weights.

The algorithm's importance was recognized immediately. Today it is one of the most widely implemented algorithms in the world, running billions of times daily in routing engines, game engines, and network protocols. Dijkstra later said: "The algorithm was a twenty-minute invention. I was just thinking about whether I could do this, and I then designed the algorithm for the shortest path."

The key innovation was the priority queue. Before Dijkstra, people either used Bellman-Ford (O(VE)) or tried ad-hoc greedy strategies that did not guarantee optimality. Dijkstra showed that a greedy strategy DOES work, IF you greedily process vertices in order of their estimated distance, AND all edge weights are non-negative.

Interestingly, Dijkstra's original paper did not use a heap — it used a linear scan to find the minimum, giving O(V²) time. The binary heap improvement came later, and the Fibonacci heap (giving the theoretically optimal O(V log V + E)) was introduced by Fredman and Tarjan in 1984, a quarter century after Dijkstra's paper.

The algorithm's elegance lies in its simplicity: just three lines of pseudocode in the main loop (extract-min, add to S, relax neighbors). The correctness proof is short but deep, and the non-negative weight requirement is both necessary and sufficient for the greedy choice to work.

The Core Idea

Maintain a set S of vertices whose shortest-path distances are finalized (proven correct). Initially, S is empty. At each step:

1. Extract-Min

From the priority queue Q, extract the vertex u with the smallest d[u] among all non-finalized vertices. This is the greedy choice.

↓

2. Finalize

Add u to the finalized set S. Its d[u] value is now proven to be δ(s, u). It will never change again.

↓

3. Relax

For every outgoing edge (u, v), relax it. If d[v] improves, decrease v's key in the priority queue.

↻ repeat until Q is empty

// DIJKSTRA(G, w, s)
INITIALIZE-SINGLE-SOURCE(G, s)
S = ∅                       // finalized set
Q = min-priority-queue(V)     // keyed by d[v]

while Q ≠ ∅:
    u = EXTRACT-MIN(Q)       // greedy: smallest d[v]
    S = S ∪ {u}              // finalize u
    for each (u, v) ∈ Adj[u]:
        RELAX(u, v, w)        // may trigger DECREASE-KEY in Q

Think of Dijkstra as an expanding wavefront. The source s is a pebble dropped in a pond. The ripples expand outward, reaching nearby vertices first. The vertex with the smallest d[v] is the next ripple to arrive — we finalize it because no future ripple can arrive earlier (since all edge weights are non-negative, paths through un-reached vertices can only be longer).

Dijkstra vs BFS: What Changes with Weights?

If you already understand BFS (breadth-first search), Dijkstra is a natural generalization. BFS finds shortest paths in unweighted graphs by exploring vertices level by level. In an unweighted graph, all edges have weight 1, so the FIFO queue naturally processes vertices in order of increasing distance.

With weighted edges, a FIFO queue no longer works — a vertex discovered later might have a shorter total distance than one discovered earlier. The priority queue replaces the FIFO queue, sorting by d[v] instead of discovery order.

Aspect	BFS	Dijkstra
Data structure	FIFO queue	Min-priority queue (heap)
Edge weights	All equal (or all 1)	Non-negative, arbitrary
Processing order	By BFS level (# edges)	By shortest distance (d[v])
Time complexity	O(V+E)	O((V+E) log V)
Relaxation	Implicit (first visit = shortest)	Explicit RELAX with DECREASE-KEY

The log V factor. The difference between O(V+E) for BFS and O((V+E) log V) for Dijkstra is entirely due to the priority queue. A FIFO queue has O(1) enqueue/dequeue; a binary heap has O(log V). If you could guarantee weights are small integers (say 0 to C), you could use a bucket queue with O(C) buckets, reducing Dijkstra to O(V + E + C) — essentially BFS with C levels instead of 1. This is Dial's algorithm, discussed in Chapter 8.

The Dijkstra Invariant

At every step of Dijkstra, the following invariant holds:

// For every finalized vertex u in S:
d[u] = δ(s, u)         // distance is correct and final

// For every non-finalized vertex v in Q:
d[v] ≥ δ(s, v)         // distance is an upper bound
d[v] = min over u ∈ S of { d[u] + w(u,v) | (u,v) ∈ E }
                      // best distance using only finalized predecessors

The invariant is maintained because: (1) when we extract u (the vertex with smallest d[v] not in S), the proof above shows d[u] = δ(s, u), and (2) we immediately relax all edges out of u, updating d[v] for v's neighbors. The priority queue ensures we always extract the correct next vertex to finalize.

Priority Queue Implementations: The Engineering Choice

The priority queue is the heart of Dijkstra. It supports two operations: EXTRACT-MIN (remove and return the vertex with smallest d[v]) and DECREASE-KEY (reduce a vertex's key when relaxation finds a shorter path). The choice of data structure determines the time complexity:

Data Structure	EXTRACT-MIN	DECREASE-KEY	Total Dijkstra	Best For
Unsorted array	O(V)	O(1)	O(V² + E)	Dense graphs (E ≈ V²)
Binary min-heap	O(log V)	O(log V)	O((V+E) log V)	Sparse graphs (E ≈ V)
Fibonacci heap	O(log V) amortized	O(1) amortized	O(V log V + E)	Theoretical optimality

For sparse graphs (E = O(V)), the binary heap gives O(V log V) — excellent. For dense graphs (E = O(V²)), the simple array at O(V²) actually beats the binary heap's O(V² log V). The Fibonacci heap achieves the best of both worlds theoretically, but its constant factors are so large that binary heaps dominate in practice up to very large graph sizes.

The practical choice. In 99% of real implementations — Google Maps, game pathfinding, network routing — people use a binary min-heap. Python's heapq module provides one. The Fibonacci heap is a theoretical talking point, not a production choice. If an interviewer asks: "Fibonacci heap gives O(V log V + E) but the constant factors and implementation complexity make binary heaps faster in practice for graphs under ~10M vertices."

Interactive Dijkstra Step-Through

Dijkstra's Algorithm: Extract-Min and Relax

Green = finalized (in S). Yellow = in frontier (in priority queue with finite d[v]). Gray = undiscovered (d = ∞). Teal edges = shortest-path tree. Priority queue contents shown below the graph. Click "Step" to extract-min and relax.

Source: A. Click Step.

Implementation with heapq (Lazy Deletion)

Python's heapq does not support DECREASE-KEY. The standard workaround is lazy deletion: push duplicate entries into the heap and skip stale ones when popping.

python
import heapq

def dijkstra(adj, source):
    """
    adj: dict of {u: [(v, weight), ...]}
    Returns: (dist, pred) dicts

    Uses lazy deletion: push (distance, vertex) pairs.
    When we pop, skip if vertex already finalized.
    """
    dist = {}       # finalized distances
    pred = {}       # predecessor map
    pq = [(0, source)]  # min-heap: (distance, vertex)
    best = {source: 0}  # best known distance (pre-finalization)

    while pq:
        d_u, u = heapq.heappop(pq)
        if u in dist:
            continue  # stale entry — u already finalized
        dist[u] = d_u  # finalize u

        for v, w in adj.get(u, []):
            if v not in dist:  # only relax non-finalized
                new_d = d_u + w
                if new_d < best.get(v, float('inf')):
                    best[v] = new_d
                    pred[v] = u
                    heapq.heappush(pq, (new_d, v))

    return dist, pred

def reconstruct_path(pred, target):
    """Follow predecessors to build the path."""
    path = []
    v = target
    while v is not None:
        path.append(v)
        v = pred.get(v)
    return path[::-1]

# Example
adj = {
    'A': [('B', 4), ('C', 2)],
    'B': [('D', 3), ('C', 5)],
    'C': [('B', 1), ('D', 8), ('E', 10)],
    'D': [('E', 2)],
    'E': [],
}
d, pi = dijkstra(adj, 'A')
print(d)  # {'A': 0, 'C': 2, 'B': 3, 'D': 6, 'E': 8}
print(reconstruct_path(pi, 'E'))  # ['A', 'C', 'B', 'D', 'E']

Why lazy deletion works. When we push a new (shorter) distance for vertex v, the old entry stays in the heap. When the old entry is eventually popped, we check if u in dist and skip it because v was already finalized with the correct (shorter) distance. The heap may contain O(E) entries in the worst case, so each operation costs O(log E) = O(log V) (since E ≤ V², log E ≤ 2 log V).

Data Flow: What Goes In, What Comes Out

Let us be precise about the inputs and outputs of Dijkstra, since this is what you need to implement it from scratch:

// INPUT
adj: Dict[vertex, List[(neighbor, weight)]] // adjacency list
source: vertex                              // starting vertex

// INTERNAL STATE
pq: MinHeap of (distance, vertex)         // priority queue
dist: Dict[vertex, float]                // finalized distances
best: Dict[vertex, float]                // best known (pre-finalization)
pred: Dict[vertex, vertex]               // predecessor map

// OUTPUT
dist: Dict[vertex, float]                // shortest distance to each vertex
pred: Dict[vertex, vertex]               // predecessor for path reconstruction

Common Implementation Bugs

Bug	Symptom	Fix
Missing `if u in dist: continue`	Correct results but TLE (processes vertices multiple times)	Add the skip guard at the top of the while loop
Initializing d[source] = 0 but forgetting to push to heap	Returns only d[source] = 0, everything else is ∞	Push (0, source) into heap at the start
Using `<=` instead of `<` in relaxation	Pushes unnecessary duplicates, causes TLE	Use strict `<` — equal distance means no improvement
Storing vertices as strings with `heapq`	Crash when two distances are equal (Python compares the second element)	Use (dist, int_id) or (dist, tie_breaker, vertex)
Not handling disconnected vertices	Loops forever or crashes on empty heap	Disconnected vertices stay at d = ∞, check after Dijkstra returns

Dijkstra complexity check: You have a road network with 1 million intersections and 3 million road segments, all non-negative weights. Which priority queue gives the best practical performance?

Binary heap: O((V+E) log V) ≈ O(4M × 20) ≈ 80M operations. Fast constants, cache-friendly. This is what production routing engines use Fibonacci heap: O(V log V + E) ≈ 23M operations, theoretically fewer but with large constant overhead Simple array: O(V²) = O(10¹²), far too slow for sparse graphs

Chapter 5: Dijkstra Deep Dive

Now that you know the mechanics, let us understand why Dijkstra works and exactly when it breaks. The correctness proof is one of the most elegant in all of algorithms.

Correctness Proof: The Greedy Choice Property

The core claim: when we extract vertex u from the priority queue, d[u] is already the true shortest-path distance δ(s, u). This is the most important theorem in this entire lesson. If you understand this proof, you understand why Dijkstra works and exactly when it fails.

Proof (by contradiction). Suppose u is the first vertex extracted from Q for which d[u] ≠ δ(s, u). Since d[u] ≥ δ(s, u) by the upper-bound property, we must have d[u] > δ(s, u). There must exist a shortest path p from s to u (otherwise δ(s, u) = ∞ = d[u]).

Since s ∈ S (finalized) and u ∉ S at the time of extraction, path p must cross from S to V\S at some point. Let y be the first vertex along p that is not in S, and let x be its predecessor on p (so x ∈ S).

// The path looks like: s ~~> x → y ~~> u
// where x ∈ S and y ∉ S

// Since x ∈ S and u was the FIRST violation:
d[x] = δ(s, x) // x was finalized correctly

// When x was extracted, edge (x, y) was relaxed, so:
d[y] ≤ d[x] + w(x, y) = δ(s, x) + w(x, y) = δ(s, y)

// By the upper-bound property:
d[y] ≥ δ(s, y)

// Combining: d[y] = δ(s, y)

// Since y is on a shortest path from s to u, and ALL SUBSEQUENT
// edges from y to u have NON-NEGATIVE weight:
δ(s, y) ≤ δ(s, u) // THIS IS WHERE NON-NEGATIVE WEIGHTS MATTER

// Therefore:
d[y] = δ(s, y) ≤ δ(s, u) < d[u]

// But we extracted u from Q before y!
// That means d[u] ≤ d[y] (u had the minimum key in Q)
// This contradicts d[y] < d[u].
// Therefore d[u] = δ(s, u). QED. ■

Why Negative Edges Break Dijkstra

The proof relies on one critical step: δ(s, y) ≤ δ(s, u). This holds because y is on the path from s to u and all subsequent edges have non-negative weight, so the path can only get longer. With negative edges, a later vertex on the path could have a smaller shortest-path weight than an earlier vertex. The greedy extraction order breaks.

Here is a minimal counterexample that every student should memorize. Graph: s→A (weight 1), s→B (weight 5), B→A (weight -10). Only 3 vertices and 3 edges.

Step	Action	d[A]	d[B]	Problem?
Init	-	∞	∞
1	Extract s, relax	1	5
2	Extract A (d=1, min)	1 (FINAL)	5	A finalized at d=1
3	Extract B (d=5), relax B→A	1	5	5+(-10)=-5 < 1, but A is already finalized!

The true shortest path to A is s→B→A with weight 5+(-10) = -5. But Dijkstra finalized A at d=1 in step 2 and never corrects it. The negative edge from B to A provides a shortcut that Dijkstra discovered too late.

Negative Edge Breaking Dijkstra

Watch Dijkstra finalize vertex A with d=1, then later discover (too late) that the path through B gives d=-5. Click "Step" to advance through the algorithm.

Click Step to begin.

The non-negative assumption is load-bearing. It is not a minor technicality. It is the entire reason the greedy strategy works. Without it, Dijkstra's correctness proof collapses, and the algorithm produces wrong answers silently — no error, no crash, just incorrect distances. This is one of the most common bugs in interview code: using Dijkstra on a graph with negative edges.

Worked Example: Complete Dijkstra Trace

Let us build complete muscle memory by tracing Dijkstra on a 6-node graph. Edges: A→B (2), A→C (4), B→C (1), B→D (7), C→E (3), D→F (1), E→D (2), E→F (5). Source: A.

Step	Extract	Relax	d[B]	d[C]	d[D]	d[E]	d[F]
Init	-	-	∞	∞	∞	∞	∞
1	A (0)	A→B, A→C	2	4	∞	∞	∞
2	B (2)	B→C, B→D	2	3	9	∞	∞
3	C (3)	C→E	2	3	9	6	∞
4	E (6)	E→D, E→F	2	3	8	6	11
5	D (8)	D→F	2	3	8	6	9
6	F (9)	(none)	2	3	8	6	9

Key observations from this trace:

(1) DECREASE-KEY in action. Step 2: B→C improves d[C] from 4 to 3. The direct path A→C (4) was found first, but A→B→C (2+1=3) is shorter. This is a DECREASE-KEY operation in the priority queue.

(2) Cascade of improvements. Step 4: E→D improves d[D] from 9 (via B→D) to 8 (via C→E→D). Step 5: D→F further improves d[F] from 11 (via E→F) to 9 (via D→F). Each improvement enables subsequent improvements — this cascading is what makes shortest paths interesting.

(3) The shortest-path tree. The final predecessor chain reveals the tree: A→B (pred[B]=A), A→B→C (pred[C]=B), A→B→C→E (pred[E]=C), A→B→C→E→D (pred[D]=E), A→B→C→E→D→F (pred[F]=D). The shortest path to F is A→B→C→E→D→F with total weight 2+1+3+2+1 = 9.

(4) Not every edge is on a shortest path. The edges A→C (4), B→C (5), C→D (8), and E→F (5) are NOT used in any shortest path from A. They were explored but superseded by better alternatives.

Johnson's Algorithm: Making Negative Edges Non-Negative

What if your graph has negative edges but you want Dijkstra's speed? Johnson's algorithm provides a clever solution: re-weight all edges to be non-negative while preserving shortest paths, then run Dijkstra.

The trick is a potential function h(v) computed by running Bellman-Ford once from a dummy source connected to all vertices with zero-weight edges:

// Johnson's Re-weighting
1. Add vertex s' with zero-weight edges to all vertices
2. Run Bellman-Ford from s': h(v) = δ(s', v)
3. Re-weight: w'(u,v) = w(u,v) + h(u) - h(v)

// Key properties:
// (a) w'(u,v) ≥ 0 for all edges (by triangle inequality)
// (b) Shortest paths are preserved: if p is shortest under w,
// then p is shortest under w'. The h values telescope:
// w'(p) = w(p) + h(source) - h(target)

After re-weighting, run Dijkstra from every source vertex. Recover original distances by subtracting the potential difference: δ(u, v) = δ'(u, v) - h(u) + h(v).

Total time for all-pairs: O(VE) for Bellman-Ford + O(V(V+E) log V) for V runs of Dijkstra. For sparse graphs, this beats Floyd-Warshall's O(V³).

Dijkstra vs Bellman-Ford: Operation Count Comparison

Operation Count: Dijkstra vs Bellman-Ford

Adjust the graph size to see how the number of relaxation operations compares. Dijkstra (green) grows as (V+E) log V. Bellman-Ford (orange) grows as V×E.

Vertices (V) 100

Edge density 3x (sparse)

Proof check: Dijkstra's correctness proof fails for negative edges at which specific logical step?

The initialization — d[s] should not be 0 with negative edges The extract-min operation — we should extract the maximum instead The step δ(s,y) ≤ δ(s,u). With negative edges on the subpath from y to u, a vertex later on the path might have a smaller true distance than a vertex earlier on the path. So extracting the vertex with smallest d[v] does not guarantee it has the smallest true distance

Chapter 6: Dijkstra Showcase

This is the big interactive simulation. Build a graph, pick a source, and watch Dijkstra construct the shortest-path tree in real time. You can also inject a negative edge to see the algorithm break.

Interactive Dijkstra Simulator

Build: Click empty space to place vertices (up to 12). Click one vertex then another to add a directed edge (you will be prompted for weight).
Run: Select a source vertex (click it, it turns orange), then click "Run Dijkstra" for animation or "Step" for one-step-at-a-time.
Watch: Green = finalized. Yellow = frontier. Teal edges = shortest-path tree building up.

Animation Speed 500ms

Click canvas to place vertices. Click "Load Preset" for a pre-built graph.

Experiments to try. (1) Load the preset graph and run Dijkstra. Watch the shortest-path tree form. Notice how the tree grows outward from the source, always extending to the nearest unfinalized vertex. (2) Click "Add Negative Edge" and re-run. See how a finalized vertex gets an incorrect distance — Dijkstra locked it in before the negative-edge shortcut was discovered. (3) Clear and build your own graph. Try a long chain of light edges vs a single heavy direct edge to recreate the greedy-fails scenario from Chapter 0.

What the Shortest-Path Tree Looks Like

After Dijkstra finishes, the predecessor pointers π[v] form a tree rooted at the source. Every path in this tree from the root to any vertex is the shortest path in the original graph. This tree has V-1 edges (assuming all vertices are reachable) and contains exactly one path from s to each vertex.

In the visualization, the teal edges form this tree. Notice that it is a subgraph of the original graph — it only uses edges that are part of some shortest path. Multiple shortest paths with the same weight may exist, but Dijkstra commits to one (the first discovered).

Algorithm Comparison: Same Graph, Three Algorithms

To cement the difference between our three algorithms, here is how they would process the same 5-node graph with edges including a negative weight:

Property	Bellman-Ford	DAG Shortest (if DAG)	Dijkstra
Handles this graph?	Yes (any graph)	Only if no cycles	Only if no neg edges
Number of relaxations	Up to (V-1) × E = 36	Exactly E = 9	At most E = 9
Data structure	Edge list	Adjacency list + topo order	Min-heap
Processes vertices in	Arbitrary order (all edges)	Topological order	Increasing d[v] order
Early termination	If no changes in a pass	Never needed (1 pass)	When Q is empty

The showcase take-away. The best way to understand these algorithms is to run them yourself. Build graphs, inject negative edges, watch Dijkstra break, compare node-exploration counts. Interactive experimentation builds the intuition that textbook reading alone cannot provide.

Priority Queue Operation Counts

With the preset graph (8 nodes, 11 edges), Dijkstra performs:

// EXTRACT-MIN operations: V = 8 (one per vertex)
// INSERT / DECREASE-KEY operations: at most E = 11
// Total heap operations: V + E = 19
// Each costs O(log V) = O(log 8) = O(3)
// Total: 19 × 3 ≈ 57 operations

// Compare to Bellman-Ford: (V-1) × E = 7 × 11 = 77 relaxations
// Dijkstra wins by ~26% on this small graph.
// On large sparse graphs, the difference is enormous.

Chapter 7: A* Search

Dijkstra explores outward in all directions equally, like ripples expanding from a pebble in a pond. If you know where the goal is, that is wasteful — you are searching behind you as much as in front. A* search fixes this by steering the search toward the goal using a heuristic function.

The Idea

Instead of prioritizing by d[v] alone (the cost to reach v from the source), A* prioritizes by:

f(v) = g(v) + h(v)

where g(v) = d[v] is the known cost from source to v, and h(v) is a heuristic estimate of the cost from v to the goal. The function f(v) estimates the total cost of the best path through v.

Vertices that are closer to the goal (according to h) get explored first. The search is biased in the right direction.

// A*(G, w, s, goal, h)
// Identical to Dijkstra, but priority = f(v) = g(v) + h(v)

INITIALIZE-SINGLE-SOURCE(G, s)
Q = min-priority-queue(V) // keyed on f(v) = d[v] + h(v)

while Q ≠ ∅:
    u = EXTRACT-MIN(Q)   // vertex with smallest f(v)
    if u == goal:
        return d[u]      // found optimal path!
    for each (u, v) ∈ Adj[u]:
        RELAX(u, v, w)   // updates d[v], recalculates f(v)

Admissible Heuristics: The Optimality Guarantee

A heuristic h is admissible if it never overestimates the true distance to the goal: h(v) ≤ δ(v, goal) for all v. If h is admissible, A* is guaranteed to find the optimal shortest path. Why? Because an admissible heuristic never makes the priority of a vertex artificially high, so the true shortest path is never deprioritized below a suboptimal path.

Common admissible heuristics:

Domain	Heuristic	Formula	Why Admissible
Grid (4-connected)	Manhattan distance	\|x₁-x₂\| + \|y₁-y₂\|	Shortest grid path without obstacles
Grid (8-connected)	Chebyshev distance	max(\|Δx\|, \|Δy\|)	Shortest with diagonal moves
Euclidean plane	Straight line	√(Δx² + Δy²)	Straight line ≤ any path
Road network	Haversine / max speed	Great-circle dist / speed limit	Fastest possible travel time

The Spectrum from Dijkstra to Greedy

A* is a spectrum between two extremes:

h(v) = 0 for all v: A* degenerates to Dijkstra. No heuristic guidance. Explores uniformly in all directions. Guaranteed optimal but potentially slow.

h(v) = δ(v, goal) exactly: A* goes straight to the goal with zero wasted exploration. Every vertex extracted is on the shortest path. But you would need to already know the answer to compute this h.

h(v) > δ(v, goal) for some v: The heuristic is inadmissible. A* may find paths faster but can return suboptimal solutions. Useful when approximate answers are acceptable (weighted A*, where h is inflated by a factor ε).

Consistency (monotonicity). A stronger condition: h(u) ≤ w(u,v) + h(v) for all edges (u,v). This is the triangle inequality applied to h. Consistency implies admissibility, and guarantees that A* never needs to re-open a closed node. All the heuristics in the table above are consistent. If your heuristic is admissible but not consistent (rare in practice), A* still finds the optimal path but may be slower because it re-expands nodes.

Why A* Is Optimal Among Informed Algorithms

A* with a consistent heuristic expands the minimum number of nodes among all algorithms that use the same heuristic information. No other algorithm using the same h(v) can find the optimal path while exploring fewer nodes. This is because A* never expands a node with f(v) > f*(goal) = optimal cost, and every node with f(v) < f*(goal) must be expanded by any correct algorithm (it could potentially be on a shorter path).

This optimality is relative to the heuristic. A better heuristic (closer to the true distance) leads to fewer expanded nodes. The perfect heuristic h(v) = δ(v, goal) leads to expanding only the nodes on the shortest path itself.

IDA* — A* with Bounded Memory

Standard A* stores all generated nodes in memory (the open and closed lists). For very large search spaces (e.g., 15-puzzle, Rubik's cube), this can exhaust memory. IDA* (Iterative Deepening A*) solves this by using iterative deepening with an f-value cutoff:

// IDA* pseudocode
threshold = h(start)
loop:
    result = DFS from start, pruning nodes with f(v) > threshold
    if result == FOUND: return solution
    threshold = min f(v) among pruned nodes
    // Gradually increase threshold until solution found

IDA* uses O(depth) memory (just the recursion stack) while retaining A*'s optimality guarantee. The trade-off: it may re-expand nodes across iterations, making it slower in practice for graphs with many distinct f-values. But for domains like sliding puzzles, it is the standard approach.

SMA* — Simplified Memory-Bounded A*

SMA* is another memory-efficient variant. It keeps expanding the best leaf node (like A*) but when memory runs out, it drops the worst leaf and stores its f-value in the parent. If the dropped node is later needed, it is regenerated. SMA* uses all available memory, unlike IDA* which uses almost none.

The memory hierarchy of A* variants:

Algorithm	Memory	Optimality	Best For
A*	O(V) — all generated nodes	Yes (with admissible h)	Small-medium graphs
IDA*	O(depth) — recursion stack	Yes (with admissible h)	Puzzles, combinatorial search
SMA*	User-specified bound	Yes (if enough memory)	When memory is limited but not tiny
Weighted A*	O(V)	ε-optimal	Real-time AI, game engines

Dijkstra vs A* on a Grid

Dijkstra vs A*: Head-to-Head on a Grid

Green = start (S), Red = goal (G). Blue shading = explored by Dijkstra. Purple shading = explored by A*. Teal/orange = final paths (same optimal path, but A* finds it with far fewer explored cells). Click cells to toggle walls before running.

Click cells to toggle walls, then click "Run Both."

The difference is dramatic. On an open grid, Dijkstra explores roughly a circle around the start before reaching the goal. A* explores a narrow corridor aimed at the goal. Both find the same optimal path, but A* visits far fewer cells. On real road networks, A* typically explores 10-100x fewer nodes than Dijkstra for point-to-point queries.

A* in Action: Worked Example

Consider a 4-node graph with coordinates: A(0,0), B(3,0), C(1,2), D(3,2). Edges: A→B(5), A→C(3), B→D(2), C→D(4). Goal: shortest path from A to D. Heuristic: Euclidean distance to D.

Vertex	h(v) = Euclidean to D(3,2)
A(0,0)	√(9+4) = 3.61
B(3,0)	√(0+4) = 2.00
C(1,2)	√(4+0) = 2.00
D(3,2)	0.00

Step	Extract	f = g + h	Relax	Queue
Init	-	-	-	{A: 0+3.61=3.61}
1	A (f=3.61)	g=0	A→B: g=5, A→C: g=3	{C: 3+2=5, B: 5+2=7}
2	C (f=5)	g=3	C→D: g=7	{B: 7, D: 7+0=7}
3	B (f=7)	g=5	B→D: g=7	{D: 7} (no improvement)
4	D (f=7)	GOAL!	-	Done. Cost = 7

A* found the optimal path A→B→D (cost 7) = A→C→D (also cost 7). It explored all 4 nodes. Dijkstra would also explore all 4 nodes on this small graph. The benefit of A* becomes dramatic on larger graphs where the heuristic prunes large swaths of the search space.

Quantifying the Speedup

On an N × N grid with no obstacles, Dijkstra explores approximately πD²/4 cells (a quarter-circle with radius D = distance to goal), while A* with Manhattan heuristic explores approximately 2D cells (a narrow diamond). The ratio:

// Dijkstra explored: πD² / 4 ≈ 0.785 D²
// A* explored: ≈ 2D (a thin corridor)
// Speedup ratio: D² / (2D) = D/2

// For a 1000x1000 grid with goal at distance 500:
// Dijkstra: ~196,000 cells
// A*: ~1,000 cells
// Speedup: ~196x

With obstacles, the speedup depends on the obstacle density. Dense mazes reduce A*'s advantage because the heuristic becomes less informative — the straight-line distance is far from the actual maze distance. But even in mazes, A* rarely does worse than Dijkstra.

Weighted A* (Bounded Suboptimality)

Sometimes you want speed more than optimality. Weighted A* inflates the heuristic by a factor ε > 1:

f(v) = g(v) + ε · h(v)

This makes A* more aggressively goal-directed, exploring fewer nodes but potentially finding suboptimal paths. The guarantee: the returned path has cost at most ε times the optimal cost. With ε = 1.5, you get paths within 50% of optimal but explore far fewer nodes. This trade-off is used in real-time game AI where speed matters more than perfect optimality.

Implementation

python
import heapq, math

def a_star(adj, start, goal, pos):
    """
    adj: {u: [(v, weight), ...]}
    pos: {v: (x, y)} for heuristic computation
    Returns: (path, cost) or (None, inf)
    """
    def h(v):
        # Euclidean distance (admissible)
        x1, y1 = pos[v]
        x2, y2 = pos[goal]
        return math.sqrt((x1-x2)**2 + (y1-y2)**2)

    g = {start: 0}  # best known cost from start
    pred = {start: None}
    pq = [(h(start), 0, start)]  # (f, g, vertex)
    closed = set()

    while pq:
        f_u, g_u, u = heapq.heappop(pq)
        if u == goal:
            # Reconstruct path
            path = []
            while u is not None:
                path.append(u)
                u = pred[u]
            return path[::-1], g[goal]
        if u in closed:
            continue
        closed.add(u)

        for v, w in adj.get(u, []):
            new_g = g[u] + w
            if v not in g or new_g < g[v]:
                g[v] = new_g
                pred[v] = u
                heapq.heappush(pq, (new_g + h(v), new_g, v))

    return None, float('inf')  # goal unreachable

When NOT to Use A*

A* is not always the best choice, even for point-to-point queries:

Scenario	Why A* Struggles	Better Alternative
No good heuristic available	h=0 makes it Dijkstra anyway	Plain Dijkstra (simpler code)
Need shortest paths to ALL vertices	A* targets one goal; no benefit	Dijkstra from source
Graph is a maze/labyrinth	Euclidean h is very inaccurate	Bidirectional Dijkstra or IDA*
Edge weights are 0/1	Overhead of f-value management	0-1 BFS (deque, no heap)
Static graph, many queries	No preprocessing benefit	Contraction hierarchies
Negative edges	A* inherits Dijkstra's limitation	Bellman-Ford

A* check: A heuristic h(v) = 100 for all vertices. Is this admissible? What behavior does it cause?

Yes, it is admissible and A* works perfectly No. For vertices very close to the goal (true distance < 100), h overestimates. This violates admissibility, so A* may return a suboptimal path. The only constant heuristic that is always admissible is h(v) = 0, which reduces A* to Dijkstra A constant heuristic has no effect since it cancels in all priority comparisons

Chapter 8: Shortest Paths in Practice

The algorithms we have learned are not academic exercises. They are running right now in your phone, your internet router, and the servers that planned your last delivery.

GPS and Map Routing

Google Maps, Apple Maps, and Waze all use Dijkstra-based algorithms. But the US road network has ~24 million intersections and ~58 million road segments. Running raw Dijkstra for every query would take seconds. Real mapping services use contraction hierarchies (CH) to answer queries in microseconds.

The idea: preprocess the graph by repeatedly "contracting" the least-important vertex — removing it and adding shortcut edges to preserve shortest-path distances between its neighbors. This creates a hierarchy where highways are at the top and local streets at the bottom.

1. Preprocessing (minutes)

Contract vertices by importance order. Unimportant vertices (dead-end streets) are contracted first, important ones (highway interchanges) last. Add shortcut edges to preserve all pairwise shortest-path distances. Result: a hierarchy with ~2x the original edges.

↓

2. Query (microseconds)

Run bidirectional Dijkstra: one search forward from source (only going "upward" in hierarchy), one backward from target (also upward). They meet at a common ancestor vertex. Explores ~500 vertices instead of millions.

↓

3. Path Unpacking

The path may use shortcut edges. Recursively unpack each shortcut into its original sub-path to get the actual road-level route for turn-by-turn directions.

Network Routing: BGP and Bellman-Ford

The internet is a graph of autonomous systems (AS) — large networks like Comcast, Google, AT&T. The Border Gateway Protocol (BGP) uses a distributed version of Bellman-Ford to find routes between them.

Each router maintains a distance vector: its best-known distance (or cost) to every destination. Periodically, it sends this vector to its neighbors. Each neighbor updates its own distances using relaxation. This is Bellman-Ford running in parallel across thousands of routers.

Why Bellman-Ford and not Dijkstra? Because no single router has the complete graph topology. Each router only knows its own distances and what its neighbors tell it. Bellman-Ford is inherently distributed — relaxation requires only local information. Dijkstra requires global knowledge (the entire priority queue).

The count-to-infinity problem. Distributed Bellman-Ford has a nasty failure mode. When a link goes down, routers keep incrementing their distance estimates in a feedback loop, slowly "counting to infinity." In the worst case, it takes O(V) message rounds to converge. BGP mitigates this with: (1) split horizon — do not advertise a route back to the neighbor you learned it from, (2) route poisoning — advertise failed routes with distance ∞, (3) hold-down timers — wait before accepting new routes to a failed destination.

Project Scheduling: Critical Path Method

We saw in Chapter 3 that the longest path in a DAG gives the critical path. This is the foundation of CPM (Critical Path Method), used in construction, manufacturing, and software project planning.

Critical Path Finder

A project DAG with task durations. The critical path (longest path from Start to End) is highlighted in orange. Tasks not on the critical path have slack — they can be delayed without affecting the project deadline.

Critical path total: computed on load.

Algorithm Decision Matrix

This is the table you should have memorized for interviews. Given a graph problem, how do you choose an algorithm?

Scenario	Algorithm	Time	Why This One
Non-negative, sparse	Dijkstra (binary heap)	O((V+E) log V)	Fastest practical option
Non-negative, dense	Dijkstra (array)	O(V²+E)	Array beats heap when E = O(V²)
Negative edges possible	Bellman-Ford	O(VE)	Only correct option for negative edges
Detect negative cycles	Bellman-Ford	O(VE)	Extra pass after V-1 passes
DAG	DAG shortest paths	O(V+E)	Fastest possible, handles neg edges
Point-to-point with coords	A*	Depends on h	Heuristic focuses search
Massive road network	Contraction hierarchies	μs per query	Minutes preprocess, instant queries
Distributed (no global view)	Bellman-Ford (dist.)	O(VE) messages	Only needs local info
Unweighted graph	BFS	O(V+E)	All edges weight 1
Edge weights 0 or 1	0-1 BFS (deque)	O(V+E)	Push 0-edges front, 1-edges back

Comparison: How Fast Are These In Practice?

Real-world benchmarks on the US road network (24M vertices, 58M edges):

Algorithm	Query Time	Preprocessing	Extra Space
Dijkstra (binary heap)	~3 seconds	None	O(V)
A* (Euclidean)	~500 ms	None	O(V)
Bidirectional Dijkstra	~1.5 seconds	None	O(V)
Contraction Hierarchies	~0.3 ms	~10 minutes	~2x edges
Hub Labeling	~0.001 ms	~30 minutes	~100 bytes/vertex

The difference is staggering. Raw Dijkstra takes 3 seconds per query — unacceptable for a map app. Contraction hierarchies answer in 0.3 milliseconds — fast enough for real-time turn-by-turn routing with live traffic updates.

Debug Scenarios

Scenario 1: Dijkstra returns wrong distances. Systematic checklist: (a) Does the graph have negative edges? If yes, switch to Bellman-Ford. (b) Are you accidentally allowing re-finalization? The if u in dist: continue guard must be present. (c) Is your adjacency list correct? Print it and verify. (d) Are vertex indices 0-based or 1-based? Off-by-one errors are the #1 Dijkstra bug.

Scenario 2: Bellman-Ford reports a negative cycle unexpectedly. Check: (a) Sign errors in weight parsing. (b) Unsigned integers that underflow to large positive values. (c) Floating-point rounding creating artificial small negatives — use epsilon tolerance: if dist[u] + w < dist[v] - 1e-9. (d) Graph is directed but you treated it as undirected (added both directions of each edge, creating 2-cycles).

Scenario 3: A* returns a suboptimal path. Your heuristic is inadmissible. Debug: print h(v) and the true shortest distance from v to goal for vertices near the goal. If h(v) > true distance for any v, the heuristic overestimates. Common cause: using Euclidean distance heuristic on a grid that only allows 4-directional movement (Manhattan distance is the correct heuristic).

Scenario 4: Dijkstra is too slow. Profile first. (a) If the heap is the bottleneck, consider 0-1 BFS if weights are 0/1, or bucket queues if weights are small integers. (b) If graph construction is slow, stream edges lazily. (c) For point-to-point, switch to A*. (d) For repeated queries, preprocess with contraction hierarchies. (e) For very dense graphs (E ≈ V²), switch to array-based Dijkstra O(V²).

Scenario 5: Memory limit exceeded. Dijkstra with lazy deletion stores up to O(E) entries in the heap. For very large graphs: (a) use indexed priority queue (maintains at most V entries), (b) store the graph as edge list + adjacency offsets (CSR format) instead of adjacency list, (c) process the graph in chunks if it does not fit in memory.

Difference Constraints

A system of difference constraints is a set of inequalities of the form x_j - x_i ≤ b_ij, where x_i are unknown variables and b_ij are constants. This arises in scheduling (task j cannot start more than b_ij time units after task i), temporal reasoning, and layout problems.

The connection to shortest paths is direct: model each variable x_i as a vertex, each constraint x_j - x_i ≤ b_ij as an edge (i, j) with weight b_ij. Add a source vertex s with zero-weight edges to all vertices. Then:

// Shortest-path distances solve the system:
x_i = δ(s, i)   // assign each variable its shortest-path distance

// This satisfies all constraints because:
δ(s, j) ≤ δ(s, i) + b_ij   // triangle inequality!
x_j - x_i = δ(s, j) - δ(s, i) ≤ b_ij   // exactly the constraint

// If Bellman-Ford detects a negative cycle:
// the system has NO feasible solution

This is one of the most elegant applications of shortest paths: a seemingly unrelated optimization problem (solving a system of inequalities) reduces to a graph algorithm.

Contraction Hierarchies: How They Actually Work

Since contraction hierarchies are the most practically important shortest-path speedup, let us understand the mechanics in detail.

Step 1: Vertex Ordering. Assign an "importance" to each vertex. Simple heuristic: a vertex is important if removing it creates many shortcut edges (it is a "highway node"). A vertex is unimportant if removing it creates few shortcuts (it is a "dead end" or "local street"). Sort vertices by importance: least important first.

Step 2: Contraction. Process vertices in importance order. To contract vertex v:

For each pair (u, w) of neighbors where u → v → w is a shortest path

Check: is u → v → w a shortest path from u to w? If yes, and if v is the ONLY shortest path (witness search), add shortcut edge u → w with weight w(u,v) + w(v,w).

↓

Remove v from the graph

The shortcuts preserve all pairwise shortest-path distances among the remaining vertices. The contracted graph has the same shortest paths as the original.

Step 3: Query. Given source s and target t, run two Dijkstra searches simultaneously:

// Forward search from s: only relax edges to MORE important vertices
// Backward search from t: only relax edges to MORE important vertices
// Both searches go "upward" in the hierarchy

// When they meet at some vertex m, the shortest path is:
// d(s, m) + d(m, t) = d_forward(m) + d_backward(m)
// Take the minimum over all meeting points

// Typical search space: ~500 vertices (out of millions!)

The path found may use shortcut edges. To get the actual road-level path, recursively "unpack" each shortcut into its original sub-path. This is stored during preprocessing.

0-1 BFS: The Forgotten Gem

When edge weights are restricted to 0 and 1, you do not need Dijkstra's heap at all. A deque (double-ended queue) suffices, giving O(V+E) time — the same as BFS.

The idea: when you relax an edge with weight 0, the new distance equals the current distance. Push the neighbor to the front of the deque (it should be processed next, like the current level in BFS). When you relax an edge with weight 1, push to the back (it is one level deeper). This maintains the deque in sorted order by distance.

// 0-1 BFS: O(V+E) for 0/1 weighted graphs
// Deque replaces priority queue

while deque is not empty:
    u = deque.pop_front()
    for each (u, v, w) where w ∈ {0, 1}:
        if d[u] + w < d[v]:
            d[v] = d[u] + w
            if w == 0: deque.push_front(v)
            else: deque.push_back(v)

When do 0-1 weights appear? More often than you think:

Problem	Weight 0 Means	Weight 1 Means
Grid with free/paid cells	Move to free cell	Move to paid cell
Graph where some edges can be reversed	Traverse in original direction	Reverse an edge
Min operations to make string palindrome	Character already matches	Need to change a character
Min road modifications for one-way streets	Road goes in correct direction	Road must be reversed

Dial's Algorithm: Bucket Queues for Small Integer Weights

If edge weights are non-negative integers in the range [0, C], Dial's algorithm replaces the binary heap with a circular array of C+1 buckets. Bucket i holds all vertices with d[v] = current_min + i. This gives O(V + E + C) time — no log factor!

For road networks where weights are travel times in seconds (C ~ 10,000), Dial's algorithm is competitive with binary heap Dijkstra and simpler to implement. It is essentially a radix-sort-inspired priority queue.

Game AI: Navigation Meshes and Pathfinding

In video games, characters navigate 3D environments. The world is decomposed into a navigation mesh (navmesh) — a set of convex polygons covering walkable surfaces. The pathfinding graph has polygons as vertices and shared edges as graph edges. The weight is typically the Euclidean distance between polygon centers.

Game studios use A* on the navmesh graph, often with a hierarchical approach: coarse path on a high-level graph (regions), then refined path within each region. This gives near-instant pathfinding even in large open worlds. Detour (from Recast Navigation) is the industry-standard library, used in games from Skyrim to Fortnite.

Robotics: Motion Planning as Shortest Paths

In robotics, shortest paths appear in motion planning. A robot's configuration space (C-space) is a graph where each vertex is a valid configuration (position, joint angles) and each edge is a feasible motion. The weight is the energy cost, time, or distance of the motion.

For robot arms with many joints, the C-space is high-dimensional (6-12 dimensions for a typical industrial robot). Dijkstra and A* are impractical in high dimensions (too many vertices). Instead, roboticists use sampling-based planners like RRT* (Rapidly-exploring Random Trees) that build a sparse graph by random sampling, then run shortest paths on the sparse graph.

Self-driving cars use a variant: the C-space includes position (x, y), heading (θ), and velocity (v). The graph has edges that respect the car's dynamics (turning radius, acceleration limits). A* with a dubins/reeds-shepp heuristic finds smooth, drivable paths.

Social Networks: Six Degrees of Separation

In social network analysis, BFS (unweighted shortest paths) finds the degree of separation between two people. The famous "six degrees" conjecture states that any two people on Earth are connected by at most six intermediaries. Facebook research confirmed that the average separation on their network is 3.57 (in 2016, with 1.59 billion users).

For weighted social networks (where edges have strengths like interaction frequency), Dijkstra finds the "strongest connection path" — maximizing the minimum edge weight along the path. This is a bottleneck shortest path variant, solvable by modifying Dijkstra's relaxation to: d[v] = max(d[v], min(d[u], w(u,v))).

Supply Chain and Logistics

In supply chain optimization, shortest paths model the flow of goods through a network of warehouses, distribution centers, and delivery points. Edge weights represent shipping costs, transit times, or carbon footprint. Companies like Amazon, FedEx, and UPS solve millions of shortest-path queries daily to optimize delivery routes and warehouse selection.

The vehicle routing problem (VRP) — "visit all customers at minimum cost" — is NP-hard in general, but shortest-path subroutines (finding cheapest routes between stops) are solved with Dijkstra or contraction hierarchies as building blocks inside heuristic VRP solvers.

Bioinformatics: Sequence Alignment as Shortest Paths

DNA sequence alignment can be modeled as a shortest path in an edit distance graph. Two sequences of length m and n define an (m+1) x (n+1) grid graph. Moving right = insert, moving down = delete, moving diagonally = match/mismatch. The shortest path from (0,0) to (m,n) gives the optimal alignment.

This is solved by dynamic programming (the Needleman-Wunsch algorithm), which is essentially DAG shortest paths on the grid — the grid is a DAG because all edges go right, down, or diagonally right-down. The O(mn) time complexity is a direct consequence of the grid having mn vertices and 3mn edges.

Machine Learning: Shortest Paths in Computation Graphs

Modern ML frameworks (PyTorch, JAX) represent neural network computations as directed acyclic graphs. The forward pass processes operations in topological order. The backward pass (backpropagation) processes operations in reverse topological order to compute gradients.

When optimizing inference, the critical path through the computation graph determines the minimum latency. Operations not on the critical path can be parallelized or delayed. This is exactly the DAG longest-path problem we solved in Chapter 3 — applied to computation instead of construction projects.

Graph compilers like XLA and TensorRT analyze the computation DAG to find opportunities for operator fusion, memory optimization, and scheduling. The shortest/longest path algorithms are foundational tools in these compilers.

Practice check: You need to detect arbitrage opportunities in a currency exchange market. Each exchange rate is an edge weight. What algorithm and graph representation do you use?

Bellman-Ford on a graph where edge weight = -log(exchange_rate). A negative-weight cycle means the product of exchange rates around the cycle > 1, which equals guaranteed profit. We use -log because: log(r₁ × r₂ × ... × r_k) = ∑log(r_i), so product > 1 iff ∑log(r_i) > 0 iff ∑(-log(r_i)) < 0, which is a negative-weight cycle Dijkstra on a graph where edges are exchange rates directly DAG shortest paths, since currency exchanges form a DAG

Chapter 9: Interview Arsenal

Master Cheat Sheet

Algorithm	Neg Edges?	Neg Cycle Detect?	Time	Space	Key Idea
Bellman-Ford	Yes	Yes	O(VE)	O(V)	Relax all edges V-1 times
DAG Shortest	Yes (DAGs)	N/A	O(V+E)	O(V)	Topo sort + one relaxation pass
Dijkstra (heap)	No	No	O((V+E) log V)	O(V+E)	Greedy extract-min + relax
Dijkstra (array)	No	No	O(V²+E)	O(V)	Array scan for min, O(1) decrease-key
Dijkstra (Fib)	No	No	O(V log V + E)	O(V)	O(1) amortized decrease-key
A*	No	No	h-dependent	O(V)	f(v) = g(v) + h(v) priority
0-1 BFS	No	No	O(V+E)	O(V)	Deque: 0-weight front, 1-weight back

The Five Interview Dimensions

Strong shortest-path interview performance requires competence across five orthogonal dimensions. The cheat sheet above handles Dimension 1 (Concept). The coding drills handle Dimension 3 (Code). But interviewers probe all five:

Dimension	What They Test	Example Question
1. Concept	Do you understand the theory?	"Why does Dijkstra fail with negative edges?"
2. Design	Can you architect a system?	"Design a route planner for 1M intersections"
3. Code	Can you implement it correctly?	"Write Dijkstra in 15 minutes"
4. Debug	Can you fix broken code?	"This Dijkstra returns wrong answers. Find the bug."
5. Frontier	Do you know recent advances?	"What is the state of the art for road routing?"

Dimension 1: Concept Questions

Q: Explain relaxation in one sentence.

A: Relaxation checks whether routing to v through u yields a shorter path than our current best estimate, and if so, updates v's distance and predecessor.

Q: Why does Dijkstra require non-negative weights?

A: Because it greedily finalizes the vertex with smallest d[v], assuming no future path through un-finalized vertices can improve it. A negative edge from an un-finalized vertex could decrease an already-finalized distance, violating the greedy choice property that makes the proof work.

Q: Can BFS solve shortest paths?

A: BFS solves unweighted shortest paths (all edges weight 1). It is equivalent to Dijkstra with uniform weights — the FIFO queue is a valid priority queue when all priorities increase by the same amount.

Q: What is the relationship between shortest paths and dynamic programming?

A: Bellman-Ford IS dynamic programming. The subproblem: d_k[v] = shortest path from s to v using at most k edges. The recurrence: d_k[v] = min(d_k-1[v], min over all (u,v) of d_k-1[u] + w(u,v)). The V-1 passes compute d₁, d₂, ..., d_V-1 in order.

Q: What is the difference between Dijkstra and Prim's MST algorithm?

A: Both use a priority queue and greedily extract the minimum. The difference is the key. Dijkstra's key for vertex v is d[v] = total distance from source to v. Prim's key is the weight of the lightest edge connecting v to the current tree. Dijkstra builds shortest paths from a source; Prim builds a minimum spanning tree. They look similar in code but solve fundamentally different problems.

Q: Can you run Dijkstra backward from the target to find the shortest path?

A: Yes, on undirected graphs (where every edge can be traversed in either direction). On directed graphs, you need to reverse all edge directions first, then run Dijkstra from the target on the reversed graph. This finds shortest paths TO the target from all vertices — equivalent to running Dijkstra forward from the target in the reversed graph. Bidirectional Dijkstra exploits this: run forward from source AND backward from target, meet in the middle.

Dimension 2: Design Questions

Q: Design a route planner for a city with 1M intersections.

A: Preprocess with contraction hierarchies (5-10 minutes, store in ~500 MB). At query time, bidirectional Dijkstra on the hierarchy. For real-time traffic: maintain a small overlay graph updated every 30 seconds from probe data. For ETA: Dijkstra on the overlay + CH for the static portion. For alternative routes: penalty method — run Dijkstra, penalize edges on the first route, run again. Three architecturally distinct routes that share at most 50% of edges.

Q: Design a system to detect currency arbitrage in real-time.

A: Model currencies as vertices, exchange rates as edges with weight = -log(rate). Run Bellman-Ford from each currency. Negative cycle = arbitrage. For real-time: maintain edge weights incrementally. When a rate changes, re-run BF only on affected subgraphs (edges within 2 hops of the changed edge). Alert latency target: < 100 ms. Add a minimum profit threshold to filter out micro-arbitrage that disappears after transaction fees.

Dimension 3: Coding Drills

python
# LeetCode 743: Network Delay Time
# Given n nodes, times[i] = (u, v, w), signal from source k.
# Return min time for ALL nodes to receive signal, or -1.
# Key insight: this is max(shortest distances from k), since
# the last node to be reached determines the total delay.

import heapq
from collections import defaultdict

def networkDelayTime(times, n, k):
    adj = defaultdict(list)
    for u, v, w in times:
        adj[u].append((v, w))

    dist = {}
    pq = [(0, k)]

    while pq:
        d, u = heapq.heappop(pq)
        if u in dist:
            continue  # already finalized
        dist[u] = d
        for v, w in adj[u]:
            if v not in dist:
                heapq.heappush(pq, (d + w, v))

    if len(dist) < n:
        return -1  # some nodes unreachable
    return max(dist.values())

python
# LeetCode 787: Cheapest Flights Within K Stops
# Bellman-Ford variant: at most k+1 relaxation passes.
# Key insight: pass i finds shortest paths using at most i edges.
# k stops = k+1 edges, so we run k+1 passes.
# CRITICAL: snapshot dist before each pass to avoid using
# current round's updates (which would allow more edges).

def findCheapestPrice(n, flights, src, dst, k):
    dist = [float('inf')] * n
    dist[src] = 0

    for _ in range(k + 1):
        prev = dist[:]  # snapshot! prevents chain relaxation
        for u, v, w in flights:
            if prev[u] + w < dist[v]:
                dist[v] = prev[u] + w

    return dist[dst] if dist[dst] != float('inf') else -1

python
# 0-1 BFS: Shortest path when edge weights are 0 or 1
# Use a deque: push 0-weight edges to front, 1-weight to back.
# This maintains the deque in sorted order = valid priority queue.
# Time: O(V+E) — no log factor!

from collections import deque

def zero_one_bfs(adj, source, n):
    """adj[u] = [(v, w)] where w is 0 or 1."""
    dist = [float('inf')] * n
    dist[source] = 0
    dq = deque([source])

    while dq:
        u = dq.popleft()
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                if w == 0:
                    dq.appendleft(v)  # front: same distance
                else:
                    dq.append(v)      # back: +1 distance

    return dist

python
# Detect and extract a negative cycle
# Run BF for V passes (not V-1). If pass V changes a vertex,
# that vertex is on (or reachable from) a negative cycle.
# Trace predecessors V times to ensure we are IN the cycle,
# then walk until we return to the same vertex.

def find_negative_cycle(n, edges):
    """Returns the cycle as a list of vertex indices, or None."""
    dist = [0] * n   # start all at 0 to find ANY neg cycle
    pred = [-1] * n
    changed = -1

    for i in range(n):
        changed = -1
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                pred[v] = u
                changed = v

    if changed == -1:
        return None  # no negative cycle

    # Walk back n times to land on the cycle
    v = changed
    for _ in range(n):
        v = pred[v]

    # Now v is on the cycle. Walk until we loop.
    cycle = []
    u = v
    while True:
        cycle.append(u)
        u = pred[u]
        if u == v:
            break
    cycle.append(v)
    return cycle[::-1]

Dimension 4: Debug Questions

Q: Your Dijkstra returns ∞ for a vertex you know is reachable.

A: Systematic checklist: (1) Is the graph directed? Maybe edges go the wrong way. For undirected graphs, did you add both directions? (2) Is the adjacency list correctly built? Common bug: overwriting instead of appending. (3) Is the source vertex correct? (4) Print the adjacency list for the source and verify edges exist. (5) Check for off-by-one if vertices are 0-indexed vs 1-indexed.

Q: Bellman-Ford is timing out on a graph with 100K vertices and 500K edges.

A: (1) Add early termination: if no d[v] changes in a pass, stop. (2) Use SPFA (Shortest Path Faster Algorithm): maintain a queue of "active" vertices whose d[v] changed; only relax edges from those vertices. Average case O(E), worst case still O(VE). (3) If the graph is a DAG, switch to topological-sort-based algorithm for O(V+E). (4) If all weights are non-negative, switch to Dijkstra for O((V+E) log V).

Q: Two different paths have the same total weight. Which one does Dijkstra return?

A: Whichever path's final edge was relaxed first. This depends on the edge ordering in the adjacency list and tie-breaking in the priority queue. To get the lexicographically smallest path, break ties in the priority queue by vertex label. To get all shortest paths, track multiple predecessors per vertex.

python
# Multi-state Dijkstra: Shortest Path with Alternating Colors
# LeetCode 1129. Each edge is red or blue. Find shortest path
# from 0 to each vertex using alternating colors.
# Key insight: state = (vertex, last_color_used).
# Run Dijkstra on this expanded state space.

def shortestAlternatingPaths(n, redEdges, blueEdges):
    adj = defaultdict(list)
    for u, v in redEdges:
        adj[(u, 'r')].append((v, 'b'))  # red edge, next must be blue
    for u, v in blueEdges:
        adj[(u, 'b')].append((v, 'r'))  # blue edge, next must be red

    dist = {}
    pq = [(0, 0, 'r'), (0, 0, 'b')]  # start with either color

    while pq:
        d, u, color = heapq.heappop(pq)
        if (u, color) in dist:
            continue
        dist[(u, color)] = d
        for v, next_color in adj.get((u, color), []):
            if (v, next_color) not in dist:
                heapq.heappush(pq, (d + 1, v, next_color))

    result = []
    for i in range(n):
        best = min(dist.get((i, 'r'), float('inf')),
                   dist.get((i, 'b'), float('inf')))
        result.append(best if best != float('inf') else -1)
    return result

The multi-state pattern. Whenever a shortest-path problem has an extra constraint (alternating colors, limited fuel, must-visit waypoints), expand the state space. Instead of (vertex), use (vertex, constraint_state). Then run Dijkstra on the expanded graph. The number of states is V × |constraint_states|, so the complexity grows proportionally. This pattern solves a huge family of "Dijkstra with a twist" interview problems.

python
# Template: Johnson's Algorithm for All-Pairs Shortest Paths
# with negative edges but no negative cycles.

def johnson(n, edges):
    """
    n: number of vertices (0-indexed)
    edges: list of (u, v, w) tuples
    Returns: n x n matrix of shortest distances, or None if neg cycle.
    """
    # Step 1: Add dummy source n with 0-weight edges to all vertices
    aug_edges = edges + [(n, v, 0) for v in range(n)]

    # Step 2: Bellman-Ford from dummy source to get potentials h[]
    h = [float('inf')] * (n + 1)
    h[n] = 0
    for _ in range(n):
        for u, v, w in aug_edges:
            if h[u] + w < h[v]:
                h[v] = h[u] + w

    # Check for negative cycle
    for u, v, w in aug_edges:
        if h[u] + w < h[v]:
            return None  # negative cycle exists

    # Step 3: Re-weight edges
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w + h[u] - h[v]))  # w' ≥ 0

    # Step 4: Dijkstra from each vertex
    result = [[float('inf')] * n for _ in range(n)]
    for src in range(n):
        d, _ = dijkstra(adj, src)
        for v in range(n):
            if v in d:
                result[src][v] = d[v] - h[src] + h[v]  # un-reweight

    return result

python
# Template: Bidirectional Dijkstra
# Run Dijkstra forward from source and backward from target.
# Stop when both searches "meet." Often 2-3x faster than
# unidirectional Dijkstra for point-to-point queries.

def bidirectional_dijkstra(adj_fwd, adj_bwd, source, target):
    """
    adj_fwd: forward adjacency list
    adj_bwd: backward adjacency list (reverse edges)
    """
    dist_f = {source: 0}
    dist_b = {target: 0}
    pq_f = [(0, source)]
    pq_b = [(0, target)]
    best = float('inf')

    while pq_f or pq_b:
        # Termination: both frontiers exhausted or can't improve
        min_f = pq_f[0][0] if pq_f else float('inf')
        min_b = pq_b[0][0] if pq_b else float('inf')
        if min_f + min_b >= best:
            break

        # Expand forward
        if min_f <= min_b and pq_f:
            d, u = heapq.heappop(pq_f)
            if d > dist_f.get(u, float('inf')):
                continue
            for v, w in adj_fwd.get(u, []):
                nd = d + w
                if nd < dist_f.get(v, float('inf')):
                    dist_f[v] = nd
                    heapq.heappush(pq_f, (nd, v))
                    if v in dist_b:
                        best = min(best, nd + dist_b[v])
        else:
            # Expand backward (symmetric)
            if pq_b:
                d, u = heapq.heappop(pq_b)
                if d > dist_b.get(u, float('inf')):
                    continue
                for v, w in adj_bwd.get(u, []):
                    nd = d + w
                    if nd < dist_b.get(v, float('inf')):
                        dist_b[v] = nd
                        heapq.heappush(pq_b, (nd, v))
                        if v in dist_f:
                            best = min(best, nd + dist_f[v])

    return best if best < float('inf') else -1

Dimension 4: Debug Questions (continued)

Q: Your shortest-path algorithm works on small test cases but fails on the full dataset. The distances are correct for some vertices but wrong for others.

A: Classic symptoms of integer overflow. If edge weights can be up to 10⁹ and paths can have up to 10⁵ edges, the total weight can reach 10¹⁴ — exceeding 32-bit integer range. Switch to 64-bit integers or use float('inf') initialization carefully (avoid adding to infinity).

Q: You run Dijkstra on an undirected graph and get TLE. The graph has V=10⁵ vertices and E=10⁵ edges. Your code should be O((V+E) log V) which is about 3M operations. Why is it slow?

A: Undirected graph means each edge appears twice in the adjacency list (both directions), so you have 2E = 2 × 10⁵ effective edges. But the real problem is likely the heap implementation. If you are not using lazy deletion properly, you may be pushing duplicate entries without the skip guard, causing the heap to grow to O(E²) entries. Or your vertex comparison in the heap is expensive (comparing strings instead of integers).

Dimension 5: Frontier Research

Contraction Hierarchies (Geisberger et al., 2008) — The gold standard for road network routing. Preprocessing: minutes. Query: microseconds. Used by OSRM, GraphHopper, and Google Maps' static routing layer. The key idea: repeatedly remove the "least important" vertex, adding shortcut edges to preserve distances. At query time, bidirectional Dijkstra only moves "upward" in the importance hierarchy, exploring ~500 nodes instead of millions.

Hub Labeling (Abraham et al., 2011) — Precompute a "label" for each vertex: a set of hub vertices with precomputed distances to each hub. Query = intersect the labels of source and target, find the hub that minimizes d(s, hub) + d(hub, t). Sub-microsecond queries but requires gigabytes of memory for large graphs. The theoretical foundation is the highway dimension of the graph.

Transit Node Routing (Bast et al., 2007) — For road networks, identify a small set (~10,000) of "transit nodes" (major highway intersections). Precompute all-pairs shortest paths between transit nodes. For any long-distance query, find the nearest transit nodes to source and target, then look up the precomputed table. Microsecond queries. Does not work well for local queries (within the same city).

Delta-Stepping (Meyer & Sanders, 1998) — A parallel shortest-path algorithm. Partition edges into "light" (weight ≤ Δ) and "heavy" (weight > Δ). Process light edges eagerly (they may cause chain relaxations within the same "bucket"), heavy edges lazily. Achieves near-linear speedup on multicore CPUs and GPUs. Used in graph analytics frameworks like Gunrock and GraphBLAS.

Customizable Route Planning (Delling et al., 2011) — Separate the graph structure from the edge weights. Precompute a weight-independent structure (like a contraction order), then quickly adapt to different weight functions (fastest, shortest, avoid tolls, prefer scenic roads). This enables real-time personalized routing without full re-preprocessing.

Learned Heuristics for A* (2020s) — Train graph neural networks to predict h(v) for specific graph families. The GNN learns structural features that correlate with shortest-path distances. Outperforms hand-crafted heuristics on structured domains like game maps, protein folding graphs, and VLSI circuit layouts. Still limited to domains with enough training data and consistent graph structure.

Thorup's Algorithm (1999) — For undirected graphs with integer weights, achieves O(V + E) time — linear! Uses a component hierarchy and clever RAM operations. Theoretically optimal but very complex to implement. Demonstrates that the O(log V) factor in Dijkstra is not fundamental but an artifact of comparison-based priority queues.

Quantum Shortest Paths — Grover's algorithm can speed up unstructured graph search from O(V) to O(√V). For APSP (all-pairs shortest paths), quantum algorithms achieve O(V^2.5) vs classical O(V³) for Floyd-Warshall. Practical quantum advantage for shortest paths remains a long way off, but it is an active area of research.

LeetCode Pattern Map

Pattern	Problems	Algorithm	Key Trick
Standard Dijkstra	743, 1514, 1631, 2290	Dijkstra + heap	Lazy deletion
K-limited Bellman-Ford	787	BF with pass limit	Snapshot dist before each pass
0-1 BFS	1368, 2290	Deque	0-weight front, 1-weight back
Multi-state Dijkstra	1129, 1786, 882	Dijkstra	State = (vertex, extra_info)
Negative cycles	Arbitrage, 1334	Bellman-Ford	V-th pass still relaxes
DAG shortest	Topo sort problems	Topo + relax	One pass suffices
Grid shortest path	1091, 1293, 490	BFS or Dijkstra	Unweighted = BFS, weighted = Dijkstra

The interview meta-strategy. When you see a shortest-path problem, use this decision tree: (1) Negative edges? → Bellman-Ford. (2) DAG? → Topological sort + relax. (3) Unweighted? → BFS. (4) Weights 0/1? → 0-1 BFS. (5) Limited stops/edges? → BF with pass limit. (6) Non-negative, need point-to-point? → A*. (7) Non-negative, general? → Dijkstra with heap. Master these seven patterns and you can solve any shortest-path problem in an interview.

How to Recognize a Shortest-Path Problem in Disguise

Many interview problems do not say "find the shortest path." They are disguised. Here are the telltale signs:

Problem Statement Contains...	Graph Model	Algorithm
"minimum cost to reach..."	States = vertices, transitions = edges	Dijkstra / BFS
"minimum number of operations to transform..."	States = configurations, ops = edges (weight 1)	BFS
"cheapest way with at most K stops..."	Direct graph with K-limited relaxation	BF with K+1 passes
"can you detect a profitable cycle..."	-log(rates) on complete graph	Bellman-Ford neg cycle
"minimum time to complete all tasks..."	Task DAG	Critical path (longest in DAG)
"minimum cost to connect grid cells..."	Grid graph, cell costs as weights	Dijkstra on grid
"shortest path with constraint X..."	Expanded state: (vertex, X_state)	Multi-state Dijkstra

Time Complexity Summary Table

This comprehensive table covers every algorithm variant discussed in this lesson. Reference it when comparing approaches.

Algorithm	Time	Space	Neg Edges	Neg Cycles	Notes
BFS	O(V+E)	O(V)	No	No	Unweighted only
0-1 BFS	O(V+E)	O(V)	No	No	Weights 0 or 1 only
Dial's	O(V+E+C)	O(V+C)	No	No	Integer weights [0,C]
DAG Shortest	O(V+E)	O(V+E)	Yes	N/A	DAGs only
Dijkstra (array)	O(V²+E)	O(V)	No	No	Best for dense graphs
Dijkstra (heap)	O((V+E) log V)	O(V+E)	No	No	Default choice
Dijkstra (Fib)	O(V log V + E)	O(V)	No	No	Theoretical only
Bellman-Ford	O(VE)	O(V)	Yes	Detects	Most general
SPFA	O(E) avg, O(VE) worst	O(V)	Yes	Detects	Queue-based BF
A*	h-dependent	O(V)	No	No	Point-to-point
Bidirectional Dijkstra	O((V+E) log V)	O(V)	No	No	~2x speedup P2P
Contraction Hierarchies	~μs per query	O(V+E)	No	No	Minutes preprocess

Complete Solution Templates

Here are copy-paste templates for the most common interview patterns. Each is tested and correct.

python
# Template 1: Dijkstra on a grid
# Find minimum cost to go from (0,0) to (n-1,m-1)
# where grid[r][c] is the cost to enter cell (r,c).

def min_cost_grid(grid):
    n, m = len(grid), len(grid[0])
    dist = [[float('inf')] * m for _ in range(n)]
    dist[0][0] = grid[0][0]
    pq = [(grid[0][0], 0, 0)]

    while pq:
        d, r, c = heapq.heappop(pq)
        if d > dist[r][c]:
            continue  # stale
        if r == n-1 and c == m-1:
            return d
        for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]:
            nr, nc = r+dr, c+dc
            if 0 <= nr < n and 0 <= nc < m:
                nd = d + grid[nr][nc]
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(pq, (nd, nr, nc))
    return dist[n-1][m-1]

python
# Template 1b: Dijkstra with path reconstruction
# Returns both the distance and the actual path.

def dijkstra_with_path(adj, source, target):
    """Returns (distance, path) where path is a list of vertices."""
    dist = {}
    pred = {}
    pq = [(0, source)]
    best = {source: 0}

    while pq:
        d, u = heapq.heappop(pq)
        if u in dist:
            continue
        dist[u] = d
        if u == target:
            # Reconstruct path
            path = []
            v = target
            while v is not None:
                path.append(v)
                v = pred.get(v)
            return d, path[::-1]

        for v, w in adj.get(u, []):
            if v not in dist:
                nd = d + w
                if nd < best.get(v, float('inf')):
                    best[v] = nd
                    pred[v] = u
                    heapq.heappush(pq, (nd, v))

    return float('inf'), []

python
# Template 2: BFS shortest path (unweighted)
# Minimum number of operations to transform start to goal.

from collections import deque

def min_operations(start, goal, get_neighbors):
    """get_neighbors(state) returns list of next states."""
    if start == goal:
        return 0
    visited = {start}
    q = deque([(start, 0)])
    while q:
        state, d = q.popleft()
        for nxt in get_neighbors(state):
            if nxt == goal:
                return d + 1
            if nxt not in visited:
                visited.add(nxt)
                q.append((nxt, d + 1))
    return -1  # unreachable

python
# Template 3: Dijkstra with state expansion
# Shortest path with at most K obstacles removed.
# State = (row, col, obstacles_removed)

def shortest_path_k_obstacles(grid, k):
    n, m = len(grid), len(grid[0])
    dist = {}
    pq = [(0, 0, 0, 0)]  # (dist, r, c, removed)

    while pq:
        d, r, c, rem = heapq.heappop(pq)
        if (r, c, rem) in dist:
            continue
        dist[(r, c, rem)] = d
        if r == n-1 and c == m-1:
            return d
        for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]:
            nr, nc = r+dr, c+dc
            if 0 <= nr < n and 0 <= nc < m:
                new_rem = rem + grid[nr][nc]
                if new_rem <= k and (nr,nc,new_rem) not in dist:
                    heapq.heappush(pq, (d+1, nr, nc, new_rem))
    return -1

Common Interview Mistakes and How to Avoid Them

Mistake	Frequency	How to Avoid
Using Dijkstra on negative edges	Very common	Always ask: "Are there negative edges?" before choosing algorithm
Forgetting to handle unreachable vertices	Common	Check dist == inf before using distances
Off-by-one in k-stop Bellman-Ford	Common	k stops = k+1 edges = k+1 passes
Not snapshotting dist in k-limited BF	Very common	Always use prev = dist[:] to prevent chain relaxation
Forgetting to build reverse graph for backward search	Moderate	Draw the graph on paper first
Inadmissible A* heuristic	Moderate	Verify h(v) ≤ true distance for test cases
String vertex labels crashing heapq	Moderate	Use (dist, int_id) as heap tuples
Not handling self-loops or parallel edges	Rare but tricky	Test on edge cases: loops, multi-edges, disconnected graphs

Interview Warm-Up: Quick Decision Problems

For each scenario below, immediately identify the correct algorithm. No waffling — in an interview you need instant pattern recognition.

#	Problem	Answer
1	Find shortest path in a maze (unweighted grid)	BFS
2	Find minimum fuel cost between cities (positive weights)	Dijkstra
3	Find minimum time with at most 2 transfers	BF with 3 passes
4	Find build order and critical path for a project	DAG shortest (negated)
5	Detect if currency exchange cycle is profitable	BF negative cycle detection
6	Shortest path from A to B on a map with coordinates	A* with Euclidean heuristic
7	Shortest path where some edges are free (weight 0 or 1)	0-1 BFS
8	Shortest path avoiding certain nodes (blocked cells)	Dijkstra (exclude blocked from adj)
9	Shortest path where you can remove up to K walls	Multi-state Dijkstra: (r,c,walls_removed)
10	All-pairs shortest paths with some negative edges	Johnson's (BF re-weight + V Dijkstras)

Closing Thoughts

Edsger Dijkstra published his algorithm in 1959 — in a three-page paper. Richard Bellman and Lester Ford developed theirs independently in the late 1950s. These algorithms are over 65 years old and remain central to computer science. Every time you open a map app, send a packet across the internet, or play a video game, you are using their ideas.

The algorithms themselves are simple. The art lies in knowing which one to apply, how to model the problem as a graph, and how to handle the edge cases (negative weights, disconnected components, limited resources). Master the decision tree above, practice the coding templates, and you will handle any shortest-path problem thrown at you in an interview.

Complete Algorithm Decision Flowchart

Is the graph unweighted?

Yes → BFS (O(V+E)). No → continue.

↓ No

Are all edge weights 0 or 1?

Yes → 0-1 BFS with deque (O(V+E)). No → continue.

↓ No

Is the graph a DAG?

Yes → Topological sort + single relaxation pass (O(V+E)). Works with negative edges. No → continue.

↓ No

Are there negative edges?

Yes → Bellman-Ford (O(VE)). Also detects negative cycles. No → continue.

↓ No negative edges

Do you need single-source or point-to-point?

Single-source → Dijkstra with binary heap (O((V+E) log V)).
Point-to-point with coordinates → A* with admissible heuristic.

↓ Need repeated queries

Static graph with many queries?

Yes → Preprocess with contraction hierarchies. Microsecond queries.

Dijkstra's own words. When asked about the design of his algorithm, Dijkstra said: "What is the shortest way to travel from Rotterdam to Groningen? It is the algorithm for the shortest path, which I designed in about twenty minutes. One morning I was shopping in Amsterdam with my young fiancee, and tired, we sat down on the cafe terrace to drink a cup of coffee. I was just thinking about whether I could do this, and I then designed the algorithm for the shortest path. It was a twenty-minute invention."

Final check: A graph has negative edge weights but NO negative-weight cycles. You need shortest paths from a single source to all vertices as fast as possible. What is the best approach?

Dijkstra, since there are no negative cycles Bellman-Ford in O(VE) — correct but not the fastest possible Johnson's re-weighting: run Bellman-Ford ONCE to compute a potential function h(v), then re-weight every edge as w'(u,v) = w(u,v) + h(u) - h(v). All re-weighted edges are non-negative (proven by the triangle inequality). Then run Dijkstra on the re-weighted graph. For single-source: O(VE + (V+E) log V). The Bellman-Ford cost dominates, but for repeated queries (all-pairs), this gives O(VE + V(V+E) log V) vs BF's O(V²E)

Key Relationships Between Chapters

This lesson's chapters form a dependency DAG of their own (fitting, given the topic):

If you are short on time: read Chapters 0, 1, 4 (Dijkstra), and 9 (Arsenal). This covers the most interview-relevant material. Come back for Bellman-Ford (Ch 2) and DAG shortest paths (Ch 3) when you encounter problems that need negative-edge handling or DAG structure. The A* chapter (Ch 7) is valuable for game AI and robotics interviews.

References

For the deepest understanding, read CLRS Chapter 24 alongside this lesson. The textbook provides formal proofs with full mathematical rigor; this lesson provides the intuition, interactive simulations, and practical implementation details that the textbook intentionally omits.

Source	Coverage
CLRS Ch. 24 (Cormen et al.)	Bellman-Ford, DAG shortest paths, Dijkstra, correctness proofs
Dijkstra, "A note on two problems in connexion with graphs" (1959)	Original Dijkstra paper
Hart, Nilsson, Raphael, "A Formal Basis for the Heuristic Determination of Minimum Cost Paths" (1968)	A* search algorithm
Geisberger et al., "Contraction Hierarchies" (2008)	Modern road routing
Fredman & Tarjan, "Fibonacci heaps" (1987)	O(V log V + E) Dijkstra
Meyer & Sanders, "Delta-Stepping" (1998)	Parallel shortest paths
Abraham et al., "Hub Labeling" (2011)	Sub-microsecond queries
Bast et al., "Route Planning in Transportation Networks" (2016)	Survey of practical algorithms
Bellman, "On a routing problem" (1958)	Bellman-Ford algorithm
Ford, "Network Flow Theory" (1956)	Ford's contribution to Bellman-Ford
Korf, "Depth-first iterative-deepening: An optimal admissible tree search" (1985)	IDA* search
Johnson, "Efficient algorithms for shortest paths in sparse networks" (1977)	Johnson's re-weighting technique
Thorup, "Undirected single-source shortest paths with positive integer weights in linear time" (1999)	O(V+E) algorithm
Delling et al., "Customizable Route Planning in Road Networks" (2017)	Separating structure from weights