Dijkstra, Bellman-Ford, DAG shortest paths — finding optimal routes through weighted graphs.
You open Google Maps, type "coffee shop near me," and it returns a route in 200 milliseconds. Behind that response is one of the most important algorithms in computer science: a shortest path algorithm running on a graph with millions of intersections and road segments, each weighted by travel time.
Let us build the intuition from scratch. A weighted graph is a set of vertices (cities, intersections, routers) connected by edges, where each edge has a numerical weight (distance, time, cost). The single-source shortest paths problem asks: given a starting vertex s, find the shortest-weight path from s to every other vertex in the graph.
This sounds simple. Just pick the shortest edge at every step, right? That greedy strategy fails spectacularly. The simulation below shows exactly why.
The greedy path (always picks the lightest outgoing edge) is shown in orange. The true shortest path is shown in teal. Click "Run" to animate both strategies.
In the demo above, the greedy algorithm takes the edge with weight 1 from A, then gets stuck on a long path costing 10 total. The optimal path takes a heavier first edge (weight 3) but reaches the goal with total cost 5. Greedy fails because locally optimal choices do not guarantee a globally optimal path.
Think of it like driving. The freeway entrance is 3 miles away, but once you are on the freeway, you travel at 60 mph. The side street is right here (0 miles), but it is clogged with traffic and takes 30 minutes. Greedy picks the side street because the first step is shorter. A smarter algorithm considers the total journey.
Shortest paths show up everywhere in computer science and beyond:
| Domain | Vertices | Edges | Weights |
|---|---|---|---|
| GPS navigation | Intersections | Road segments | Travel time or distance |
| Network routing | Routers | Links | Latency or bandwidth cost |
| Social networks | People | Friendships | Degrees of separation |
| Game AI | Grid cells or waypoints | Movement | Terrain cost |
| Currency arbitrage | Currencies | Exchange pairs | -log(exchange rate) |
| Project scheduling | Tasks | Dependencies | Task duration |
Over the next chapters, we will build three different algorithms, each suited to a different type of graph:
The key theoretical property that makes shortest paths tractable is optimal substructure: sub-paths of shortest paths are themselves shortest paths.
Formally: if p = v0 → v1 → ... → vk is a shortest path from v0 to vk, then for any 0 ≤ i ≤ j ≤ k, the sub-path vi → ... → vj is a shortest path from vi to vj.
Proof by contradiction: If there were a shorter path from vi to vj, we could splice it into p, replacing the sub-path vi → ... → vj. This would give a path from v0 to vk with smaller total weight than p, contradicting the assumption that p is a shortest path.
This property is crucial because it means we can solve shortest paths incrementally: knowing the shortest path to u, we can extend it by one edge to get a candidate shortest path to v. This is exactly what relaxation does.
Let us nail down the notation before diving in. Given a weighted, directed graph G = (V, E) with weight function w : E → R:
That last case is critical: if you can reach a negative-weight cycle on the way from s to v, you can loop around that cycle forever, driving the total weight to negative infinity. There is no finite shortest path.
Before moving on, let us distinguish shortest paths from related problems that students often confuse:
| Problem | Question | Algorithm | Time |
|---|---|---|---|
| Reachability | Can we get from s to v? | BFS or DFS | O(V+E) |
| Shortest path (unweighted) | Fewest edges from s to v? | BFS | O(V+E) |
| Shortest path (weighted) | Minimum total weight from s to v? | Dijkstra / BF / DAG | Varies |
| Longest path (general) | Maximum total weight from s to v? | NP-hard in general! | No efficient algo |
| Longest path (DAG) | Maximum weight in a DAG? | Negate + DAG shortest | O(V+E) |
| All-pairs shortest | Shortest between every pair? | Floyd-Warshall / Johnson | O(V³) / O(VE + V(V+E)logV) |
Before we dive into any specific algorithm, there is one fundamental operation that all three share: relaxation. That is Chapter 1.
Every shortest path algorithm in this chapter — Bellman-Ford, DAG shortest paths, Dijkstra — is built from a single primitive operation called relaxation. Master this one operation and the algorithms become trivial variations of "which order do we relax edges in?"
Before any algorithm runs, we initialize two arrays for every vertex v in the graph:
Think of d[v] as a sticky note on each vertex saying "the best we have found so far." Initially, we know nothing (d = ∞) except that the source is distance 0 from itself. As the algorithm runs, these sticky notes get updated to smaller and smaller values until they reach the true shortest-path weight δ(s, v).
Relaxing an edge (u, v) with weight w(u, v) means: "Can we improve the shortest-path estimate to v by going through u?" If the current best path to u, plus the edge from u to v, is shorter than the current best path to v directly, then we have found a better path. Update d[v] and record u as v's predecessor.
The name comes from physics. Think of d[v] as a spring stretched to some length. Relaxation tries to let the spring relax to a shorter, less tense state. The value d[v] can only decrease over time — it is an upper bound that gradually tightens toward the true shortest-path weight δ(s, v).
Imagine stretching a rubber band from s to v. The length of the rubber band is d[v] — our current estimate. The relaxation operation checks: "If I route the rubber band through u instead, can I let it relax to a shorter length?" If yes, the band snaps to the shorter configuration. It only gets shorter, never longer, and eventually reaches its minimum length δ(s, v) — the natural rest state. This is why the operation is called relaxation.
The analogy is precise. In physics, a system relaxes toward its minimum energy state. In shortest paths, the estimate d[v] relaxes toward the minimum-weight path δ(s, v). Both processes are monotonically decreasing and convergent.
Relaxation has beautiful mathematical properties that guarantee correctness. Let δ(s, v) denote the true shortest-path weight from s to v.
1. Upper-bound property. We always have d[v] ≥ δ(s, v) after initialization, and relaxation can only decrease d[v]. It never goes below the true shortest-path weight. Once d[v] = δ(s, v), it stays there forever.
2. No-path property. If there is no path from s to v, then δ(s, v) = ∞ and d[v] = ∞ for all time. No sequence of relaxations can create a path that does not exist in the graph.
3. Convergence property. If s ↝ u → v is a shortest path (meaning the shortest path to v ends with edge (u, v)), and if d[u] = δ(s, u) at some point before we relax edge (u, v), then after that relaxation, d[v] = δ(s, v). In other words: once the predecessor on the shortest path has the correct distance, one relaxation of the last edge locks in the correct answer for v.
4. Path-relaxation property. If p = v0 → v1 → ... → vk is a shortest path from s = v0 to vk, and the edges (v0, v1), (v1, v2), ..., (vk-1, vk) are relaxed in that order (not necessarily consecutively), then d[vk] = δ(s, vk). This is why Bellman-Ford works: V-1 passes guarantee that the edges of any shortest path are relaxed in the correct order.
Let us trace relaxation on a concrete graph. Vertices: s, A, B, C. Edges: s→A (weight 5), s→B (weight 10), A→B (weight 3), A→C (weight 2), B→C (weight 1).
After initialization: d[s]=0, d[A]=∞, d[B]=∞, d[C]=∞.
| Step | Relax | Test | Result | d[s] | d[A] | d[B] | d[C] |
|---|---|---|---|---|---|---|---|
| 1 | (s, A, 5) | ∞ > 0+5? | Yes, d[A]=5 | 0 | 5 | ∞ | ∞ |
| 2 | (s, B, 10) | ∞ > 0+10? | Yes, d[B]=10 | 0 | 5 | 10 | ∞ |
| 3 | (A, B, 3) | 10 > 5+3? | Yes, d[B]=8 | 0 | 5 | 8 | ∞ |
| 4 | (A, C, 2) | ∞ > 5+2? | Yes, d[C]=7 | 0 | 5 | 8 | 7 |
| 5 | (B, C, 1) | 7 > 8+1? | No change | 0 | 5 | 8 | 7 |
Notice step 3: the direct path s→B has weight 10, but going s→A→B has weight 5+3=8. Relaxation discovered the shorter route through A. And step 5 did nothing, because the path through B (cost 8+1=9) is worse than the path through A (cost 5+2=7).
Click "Step" to relax edges one at a time. Watch d[v] values decrease. The currently relaxed edge highlights in orange.
Click Step to begin.python def initialize_single_source(graph, s): """Set d[v] = inf, pi[v] = None for all v, then d[s] = 0.""" d = {v: float('inf') for v in graph.vertices} pi = {v: None for v in graph.vertices} d[s] = 0 return d, pi def relax(u, v, w, d, pi): """Try to improve d[v] by going through u. Returns True if the estimate was improved.""" if d[v] > d[u] + w: d[v] = d[u] + w # shorter path found! pi[v] = u # update predecessor return True # relaxation changed something return False # no improvement
The predecessor pointers π[v] define a tree rooted at s. This is the shortest-path tree: for each vertex v reachable from s, the unique path from s to v in this tree is a shortest path in the original graph.
Properties of the shortest-path tree:
| Property | What It Means |
|---|---|
| Rooted at s | π[s] = NIL; all paths start at s |
| V-1 edges (if all reachable) | A tree on V vertices has exactly V-1 edges |
| Sub-path of tree path = shortest | Any prefix of a tree path from s is also a shortest path |
| Not unique | Multiple shortest-path trees may exist (different tie-breaking) |
| Subgraph of G | Every edge in the tree is an edge in the original graph |
Visualize the shortest-path tree: if you draw only the edges (u, v) where π[v] = u, you get a tree that fans out from s. In a road network, this tree looks like a river system — a trunk road from s branching into smaller roads to reach every neighborhood.
Path reconstruction is O(V) in the worst case (following predecessors from v back to s). To reconstruct the path to a specific vertex:
python def reconstruct(pi, v): """Follow predecessors from v to source.""" path = [] while v is not None: path.append(v) v = pi[v] return path[::-1] # reverse: source → v
The same set of relaxations can produce different intermediate results depending on the order. Consider two edges: s→A (5), s→B (3), B→A (1). If we relax s→A first, d[A] = 5. Then s→B: d[B] = 3. Then B→A: d[A] = min(5, 3+1) = 4. But if we relax in the order s→B, B→A, s→A: after s→B d[B]=3, after B→A d[A]=4, after s→A d[A]=min(4,5)=4. Same final answer, but different intermediate states. The final answer is always the same — this is guaranteed by the path-relaxation property — but the journey differs.
Now that we have the RELAX primitive, let us build our first complete shortest-path algorithm. The Bellman-Ford algorithm is brutally simple: relax every edge, and do it V-1 times. That is the entire algorithm.
A shortest path in a graph with V vertices has at most V-1 edges. Why? Because a shortest path never revisits a vertex (if it did, it would contain a cycle, and removing that cycle would give a shorter or equal-weight path — unless the cycle has negative weight). So V vertices means at most V-1 edges in any shortest path.
After pass k of Bellman-Ford, every vertex reachable from s via a shortest path of at most k edges has its correct d[v] value. After V-1 passes, all shortest paths (which have at most V-1 edges) are correct. This follows directly from the path-relaxation property we proved in Chapter 1.
The genius of Bellman-Ford is the second phase. After V-1 passes, all shortest-path distances should be finalized. If we scan all edges one more time and can still relax some edge (u, v) — meaning d[v] > d[u] + w(u, v) — it means there is a negative-weight cycle reachable from s.
Why? If all shortest paths had finite weight, V-1 passes would have found them all. The only reason relaxation is still possible is that going around a cycle keeps reducing the total weight. You can loop forever, so no finite shortest path exists to any vertex reachable from that cycle.
Consider a graph with vertices {s, A, B, C, D} and edges including some negative weights: s→A (6), s→B (7), A→B (8), A→C (5), A→D (-4), B→C (-3), B→D (9), C→B (-2), D→C (7).
The negative edges A→D (-4) and B→C (-3) and C→B (-2) make Dijkstra fail, but Bellman-Ford handles them perfectly. Let us trace it.
Watch all edges get relaxed in each pass. Distance estimates update (shown in boxes). After V-1 passes, the algorithm checks for negative cycles. Click "Pass" to execute one full pass over all edges.
Pass 0 of 4. Click Pass to begin.In the simulation, watch how edge B→C with weight -3 allows Bellman-Ford to find a shorter path to C through B. And C→B with weight -2 creates a feedback loop between B and C that settles over multiple passes. This cascading effect of negative edges is exactly what Dijkstra cannot handle — but Bellman-Ford patiently resolves by repeated passes.
Let us trace d[v] after each complete pass, relaxing edges in the order they appear above:
| After Pass | d[s] | d[A] | d[B] | d[C] | d[D] |
|---|---|---|---|---|---|
| Init | 0 | ∞ | ∞ | ∞ | ∞ |
| 1 | 0 | 6 | 7 | 4 | 2 |
| 2 | 0 | 6 | 2 | 4 | 2 |
| 3 | 0 | 6 | 2 | -1 | 2 |
| 4 | 0 | 6 | -3 | -1 | 2 |
After pass 4 (V-1 = 5-1 = 4), we run the negative-cycle check: scan all edges and see if any can still be relaxed. In this graph, none can, so there is no negative cycle. The final distances are correct.
This canvas shows a smaller graph where you can trace individual edge relaxations within a pass. Click "Relax Next" to process one edge at a time. The currently tested edge turns orange; green flash = update happened.
Click Relax Next to begin.| Component | Cost | Why |
|---|---|---|
| Initialization | O(V) | Set d[v] = ∞ for each vertex |
| Main loop | O(VE) | V-1 passes, each relaxing all E edges |
| Neg-cycle check | O(E) | One scan of all edges |
| Total | O(VE) |
For dense graphs where E = O(V²), this is O(V³). That is slow. But it is the price for handling negative edges and detecting negative cycles.
Bellman-Ford needs only O(V) extra space: the d[] and π[] arrays. The graph itself takes O(V + E) for an adjacency list or O(E) for an edge list. Bellman-Ford works naturally with an edge list (it iterates over all edges per pass), so you do not even need an adjacency list. This makes it slightly simpler to implement than Dijkstra.
After pass i, every vertex reachable from s via a shortest path of at most i edges has its correct d[v]. This follows from the path-relaxation property: if the shortest path to v is s = v0 → v1 → ... → vk with k ≤ i, then during pass 1 we relax (v0, v1), making d[v1] correct. During pass 2 we relax (v1, v2), making d[v2] correct (since d[v1] is already correct from pass 1). And so on. After pass k ≤ V-1, d[vk] = δ(s, vk).
The negative-cycle detection works because: if no negative cycle exists, then all shortest paths have at most V-1 edges, so V-1 passes suffice. If a negative cycle exists, then some edges can be relaxed forever (going around the cycle always reduces the total). The V-th pass catches this.
python def bellman_ford(vertices, edges, source): """ vertices: list of vertex labels edges: list of (u, v, weight) tuples source: starting vertex Returns: (dist, pred, has_neg_cycle) """ # Initialize dist = {v: float('inf') for v in vertices} pred = {v: None for v in vertices} dist[source] = 0 # Phase 1: V-1 relaxation passes for i in range(len(vertices) - 1): changed = False for u, v, w in edges: if dist[u] + w < dist[v]: dist[v] = dist[u] + w pred[v] = u changed = True if not changed: break # early termination: no changes = done # Phase 2: Negative-cycle detection for u, v, w in edges: if dist[u] + w < dist[v]: return dist, pred, True # negative cycle! return dist, pred, False # Example verts = ['s', 'A', 'B', 'C', 'D'] edges = [ ('s', 'A', 6), ('s', 'B', 7), ('A', 'B', 8), ('A', 'C', 5), ('A', 'D', -4), ('B', 'C', -3), ('B', 'D', 9), ('C', 'B', -2), ('D', 'C', 7), ] d, pi, neg = bellman_ford(verts, edges, 's') print(d) # shortest distances from s print(neg) # False (no negative cycle)
changed flag.This is a common interview question. The answer comes from a simple counting argument. A shortest path in a graph with V vertices visits each vertex at most once (assuming no negative cycles). A path that visits k distinct vertices has k-1 edges. Since k ≤ V, the maximum number of edges in any shortest path is V-1.
After pass i of Bellman-Ford, we have correct distances for all vertices reachable via shortest paths with at most i edges. After pass V-1, we have correct distances for all shortest paths with at most V-1 edges — which is all of them. So V-1 passes suffice.
Could V-2 passes suffice? No. Consider a chain: s → v1 → v2 → ... → vV-1. The shortest path to vV-1 has exactly V-1 edges. After V-2 passes, d[vV-1] might not be correct yet (it depends on the edge relaxation order within each pass). So V-1 is tight.
What about the V-th pass? The V-th pass is ONLY used for negative-cycle detection. If it produces any relaxation, a negative cycle exists. If it does not, V-1 passes were already sufficient.
Let us make negative cycles concrete. Consider three currencies: USD, EUR, GBP. Exchange rates: 1 USD = 0.9 EUR, 1 EUR = 0.8 GBP, 1 GBP = 1.5 USD. Starting with 1 USD, you can convert:
In graph terms: edge weights are -log(rate). The cycle USD → EUR → GBP → USD has total weight -log(0.9) + (-log(0.8)) + (-log(1.5)) = 0.105 + 0.223 + (-0.405) = -0.077. The sum is negative, so Bellman-Ford's V-th pass detects it as a relaxable edge.
The Shortest Path Faster Algorithm (SPFA) is a queue-based optimization of Bellman-Ford. Instead of relaxing all edges in every pass, SPFA maintains a queue of "active" vertices — those whose d[v] changed recently. Only edges from active vertices are relaxed.
python from collections import deque def spfa(adj, source, n): """SPFA: Queue-based Bellman-Ford optimization. Average case O(E), worst case still O(VE).""" dist = [float('inf')] * n dist[source] = 0 in_queue = [False] * n count = [0] * n # how many times each vertex entered queue q = deque([source]) in_queue[source] = True while q: u = q.popleft() in_queue[u] = False for v, w in adj[u]: if dist[u] + w < dist[v]: dist[v] = dist[u] + w if not in_queue[v]: q.append(v) in_queue[v] = True count[v] += 1 if count[v] >= n: return None # negative cycle! return dist
SPFA is popular in competitive programming because its average case is O(E) — much faster than Bellman-Ford's O(VE). But its worst case is the same, and it can be slower in practice due to cache misses from random queue access patterns. Use it when you need speed and are willing to accept worst-case O(VE).
Let us compare the two most general algorithms head-to-head:
| Aspect | Bellman-Ford | Dijkstra (binary heap) |
|---|---|---|
| Time | O(VE) | O((V+E) log V) |
| Space | O(V) + edge list | O(V+E) + heap |
| Negative edges | Yes | No (incorrect results) |
| Negative cycle detection | Yes | No |
| Graph representation | Edge list (simplest) | Adjacency list (for fast neighbor access) |
| Distributed? | Naturally (each vertex = independent agent) | Hard (needs global priority queue) |
| Early termination | If no changes in a pass | When target extracted (point-to-point) |
| Implementation complexity | ~15 lines | ~25 lines |
| When to use | Negative edges, small graphs, distributed settings | Non-negative edges, any size graph |
The speed difference is substantial. On a graph with V=10,000 and E=50,000:
If your graph has no cycles — a directed acyclic graph (DAG) — you can solve shortest paths in just O(V + E) time. That is linear in the size of the graph. You cannot do better, because you have to look at every vertex and every edge at least once.
In a DAG, vertices can be arranged in a topological order: a linear ordering where every edge goes from left to right. The magic is that when you process a vertex u in topological order, every vertex that could be on a shortest path to u has already been processed. So d[u] is already correct when you relax u's outgoing edges.
This means one pass through the vertices (in topological order), relaxing each vertex's outgoing edges, is sufficient. Compare this to Bellman-Ford, which needs V-1 passes because it does not know the right order.
That is the entire algorithm. No priority queue, no repeated passes, no negative-cycle check (there are no cycles in a DAG by definition). Just one scan in topological order. Elegant, fast, and correct.
Let us verify the time complexity. Topological sort (via DFS or Kahn's algorithm) takes O(V+E). The main loop processes each vertex once and each edge once (each edge (u,v) is relaxed exactly once, when u is processed). Total: O(V+E) + O(V+E) = O(V+E).
Space: O(V) for the distance and predecessor arrays, plus O(V+E) for the adjacency list and topological order. Total O(V+E).
Consider a shortest path s = v0 → v1 → ... → vk. In topological order, v0 appears before v1, which appears before v2, and so on. When we process v0, we relax (v0, v1), setting d[v1] = δ(s, v1). When we later process v1, we relax (v1, v2), and since d[v1] is already correct, d[v2] gets its correct value. This cascade continues along the entire path.
The path-relaxation property guarantees correctness: since the edges of any shortest path are relaxed in order, all d[v] values are correct after one pass.
Let us trace the algorithm on our example DAG. Vertices: 0, 1, 2, 3, 4, 5. Source: 0. Topological order: 0, 1, 2, 3, 4, 5. Edges include some negative weights.
| Process Vertex | Relax Edges | d[0] | d[1] | d[2] | d[3] | d[4] | d[5] |
|---|---|---|---|---|---|---|---|
| (init) | - | 0 | ∞ | ∞ | ∞ | ∞ | ∞ |
| 0 | 0→1(5), 0→2(3) | 0 | 5 | 3 | ∞ | ∞ | ∞ |
| 1 | 1→3(6), 1→2(2) | 0 | 5 | 3 | 11 | ∞ | ∞ |
| 2 | 2→4(4), 2→5(2), 2→3(7) | 0 | 5 | 3 | 10 | 7 | 5 |
| 3 | 3→4(-1), 3→5(1) | 0 | 5 | 3 | 10 | 7 | 5 |
| 4 | 4→5(-2) | 0 | 5 | 3 | 10 | 7 | 5 |
| 5 | (none) | 0 | 5 | 3 | 10 | 7 | 5 |
Wait — something seems off. Let us recheck. When we process vertex 2, we relax 2→3 with weight 7: d[3] = min(11, 3+7) = min(11, 10) = 10. So d[3] updates from 11 to 10. When we process vertex 3, we relax 3→4 with weight -1: d[4] = min(7, 10+(-1)) = min(7, 9) = 7. No change. And 3→5 with weight 1: d[5] = min(5, 10+1) = min(5, 11) = 5. No change.
When we process vertex 4, we relax 4→5 with weight -2: d[5] = min(5, 7+(-2)) = min(5, 5) = 5. No change (ties do not count as updates). Final distances: {0:0, 1:5, 2:3, 3:10, 4:7, 5:5}.
The shortest path from 0 to 5 has weight 5, achieved via 0→2→5 (3+2=5). The alternative 0→2→4→5 also gives 3+4+(-2)=5 — same weight, different path. The algorithm finds one of them (whichever edge was relaxed first sets the predecessor).
Vertices are laid out in topological order (left to right). Green = processed. Orange = currently being processed. Click "Step" to process the next vertex and relax its outgoing edges. Some edges have negative weights (shown in red).
Ready. Source is vertex 0.Here is a beautiful real-world application. In project scheduling, tasks are vertices and dependencies are directed edges. The edge weight is the task duration. A project is modeled as a DAG (tasks have dependencies but no circular ones).
The longest path from the project start to the project end is the critical path — the minimum time to complete the entire project. Any delay on a critical-path task delays the entire project. Non-critical tasks have slack: they can be delayed without affecting the project completion time.
How to find the longest path in a DAG? Negate all edge weights and run DAG shortest paths. The shortest path in the negated graph is the longest path in the original graph.
Project tasks: Start → Design (3 days) → Code (5 days) → Test (2 days). Also Start → Write Docs (4 days) → Test. We want the critical path (longest).
| Edge | Original w | Negated w' |
|---|---|---|
| Start → Design | 3 | -3 |
| Start → Write Docs | 4 | -4 |
| Design → Code | 5 | -5 |
| Write Docs → Test | 0 | 0 |
| Code → Test | 2 | -2 |
DAG shortest paths on negated weights gives d[Test] = -10 (path: Start → Design → Code → Test). The longest path = -(-10) = 10 days. The docs path takes only 4 days, so it has 6 days of slack.
The DAG shortest paths algorithm is a special case of a much broader pattern: dynamic programming on DAGs. Any DP problem where the dependency structure is acyclic can be solved by topological sort + single pass. This includes:
| Problem | DAG Structure | Relaxation |
|---|---|---|
| Shortest paths | Graph edges | d[v] = min(d[u] + w(u,v)) |
| Longest paths (critical path) | Graph edges (negated) | d[v] = max(d[u] + w(u,v)) |
| Number of paths | Graph edges | count[v] += count[u] |
| Longest increasing subseq. | a[i] < a[j] for i < j | LIS[j] = max(LIS[i] + 1) |
python from collections import defaultdict, deque def topological_sort(adj, vertices): """Kahn's algorithm: BFS-based topological sort.""" in_deg = {v: 0 for v in vertices} for u in adj: for v, w in adj[u]: in_deg[v] += 1 q = deque([v for v in vertices if in_deg[v] == 0]) order = [] while q: u = q.popleft() order.append(u) for v, w in adj.get(u, []): in_deg[v] -= 1 if in_deg[v] == 0: q.append(v) return order def dag_shortest_paths(adj, vertices, source): """O(V+E) shortest paths on a DAG.""" order = topological_sort(adj, vertices) dist = {v: float('inf') for v in vertices} pred = {v: None for v in vertices} dist[source] = 0 for u in order: if dist[u] == float('inf'): continue # unreachable from source, skip for v, w in adj.get(u, []): if dist[u] + w < dist[v]: dist[v] = dist[u] + w pred[v] = u return dist, pred # Example DAG with negative edges adj = defaultdict(list) adj[0] = [(1, 5), (2, 3)] adj[1] = [(3, 6), (2, 2)] adj[2] = [(4, 4), (5, 2), (3, 7)] adj[3] = [(4, -1), (5, 1)] adj[4] = [(5, -2)] d, pi = dag_shortest_paths(adj, range(6), 0) print(d) # {0:0, 1:5, 2:3, 3:7, 4:6, 5:4}
| Algorithm | Time | Negative Edges? | Negative Cycles? | Graph Type |
|---|---|---|---|---|
| BFS | O(V+E) | No | No | Unweighted |
| DAG Shortest | O(V+E) | Yes | N/A (no cycles) | DAGs only |
| Dijkstra | O((V+E) log V) | No | No | General |
| Bellman-Ford | O(VE) | Yes | Detects | General |
DAGs appear everywhere in software systems:
| Domain | Vertices | Edges | Application |
|---|---|---|---|
| Build systems (Make, Bazel) | Build targets | Dependencies | Determine build order, find critical build path |
| Package managers (pip, npm) | Packages | Dependencies | Resolve installation order |
| Spreadsheets | Cells | Formula references | Recalculation order |
| Neural networks | Layers/ops | Data flow | Forward pass execution order |
| Git commit history | Commits | Parent pointers | Find merge base, rebase order |
| Course prerequisites | Courses | Prerequisites | Valid enrollment order |
Whenever you see a dependency structure with no circular dependencies, think "DAG" and "topological sort." The O(V+E) shortest path algorithm is one of many algorithms that become possible once you have a topological order.
A common extension: count the NUMBER of shortest paths from s to each vertex. This is easily done alongside the shortest-path computation:
python def dag_shortest_with_count(adj, vertices, source): """Returns (dist, count) where count[v] = number of shortest paths to v.""" order = topological_sort(adj, vertices) dist = {v: float('inf') for v in vertices} count = {v: 0 for v in vertices} dist[source] = 0 count[source] = 1 # one path from source to itself: the empty path for u in order: if dist[u] == float('inf'): continue for v, w in adj.get(u, []): new_d = dist[u] + w if new_d < dist[v]: dist[v] = new_d count[v] = count[u] # only paths through u elif new_d == dist[v]: count[v] += count[u] # add paths through u return dist, count
This works because in a DAG, when we process u in topological order, all paths to u have already been counted. So count[u] is finalized before we use it to update count[v]. The same technique works with Dijkstra (since vertices are finalized in increasing distance order, count[u] is correct when u is extracted).
Dijkstra's algorithm is the workhorse of shortest paths. It handles general graphs (not just DAGs), runs in O((V+E) log V) with a binary heap, and is the basis for almost every practical routing algorithm. The catch: all edge weights must be non-negative.
Edsger Dijkstra invented this algorithm in 1956 while sitting at a cafe in Amsterdam with his fiancee. He published it in 1959 in a remarkably concise three-page paper titled "A note on two problems in connexion with graphs." The algorithm was originally designed for weighted undirected graphs, but it works identically for directed graphs with non-negative weights.
The algorithm's importance was recognized immediately. Today it is one of the most widely implemented algorithms in the world, running billions of times daily in routing engines, game engines, and network protocols. Dijkstra later said: "The algorithm was a twenty-minute invention. I was just thinking about whether I could do this, and I then designed the algorithm for the shortest path."
The key innovation was the priority queue. Before Dijkstra, people either used Bellman-Ford (O(VE)) or tried ad-hoc greedy strategies that did not guarantee optimality. Dijkstra showed that a greedy strategy DOES work, IF you greedily process vertices in order of their estimated distance, AND all edge weights are non-negative.
Interestingly, Dijkstra's original paper did not use a heap — it used a linear scan to find the minimum, giving O(V²) time. The binary heap improvement came later, and the Fibonacci heap (giving the theoretically optimal O(V log V + E)) was introduced by Fredman and Tarjan in 1984, a quarter century after Dijkstra's paper.
The algorithm's elegance lies in its simplicity: just three lines of pseudocode in the main loop (extract-min, add to S, relax neighbors). The correctness proof is short but deep, and the non-negative weight requirement is both necessary and sufficient for the greedy choice to work.
Maintain a set S of vertices whose shortest-path distances are finalized (proven correct). Initially, S is empty. At each step:
Think of Dijkstra as an expanding wavefront. The source s is a pebble dropped in a pond. The ripples expand outward, reaching nearby vertices first. The vertex with the smallest d[v] is the next ripple to arrive — we finalize it because no future ripple can arrive earlier (since all edge weights are non-negative, paths through un-reached vertices can only be longer).
If you already understand BFS (breadth-first search), Dijkstra is a natural generalization. BFS finds shortest paths in unweighted graphs by exploring vertices level by level. In an unweighted graph, all edges have weight 1, so the FIFO queue naturally processes vertices in order of increasing distance.
With weighted edges, a FIFO queue no longer works — a vertex discovered later might have a shorter total distance than one discovered earlier. The priority queue replaces the FIFO queue, sorting by d[v] instead of discovery order.
| Aspect | BFS | Dijkstra |
|---|---|---|
| Data structure | FIFO queue | Min-priority queue (heap) |
| Edge weights | All equal (or all 1) | Non-negative, arbitrary |
| Processing order | By BFS level (# edges) | By shortest distance (d[v]) |
| Time complexity | O(V+E) | O((V+E) log V) |
| Relaxation | Implicit (first visit = shortest) | Explicit RELAX with DECREASE-KEY |
At every step of Dijkstra, the following invariant holds:
The invariant is maintained because: (1) when we extract u (the vertex with smallest d[v] not in S), the proof above shows d[u] = δ(s, u), and (2) we immediately relax all edges out of u, updating d[v] for v's neighbors. The priority queue ensures we always extract the correct next vertex to finalize.
The priority queue is the heart of Dijkstra. It supports two operations: EXTRACT-MIN (remove and return the vertex with smallest d[v]) and DECREASE-KEY (reduce a vertex's key when relaxation finds a shorter path). The choice of data structure determines the time complexity:
| Data Structure | EXTRACT-MIN | DECREASE-KEY | Total Dijkstra | Best For |
|---|---|---|---|---|
| Unsorted array | O(V) | O(1) | O(V² + E) | Dense graphs (E ≈ V²) |
| Binary min-heap | O(log V) | O(log V) | O((V+E) log V) | Sparse graphs (E ≈ V) |
| Fibonacci heap | O(log V) amortized | O(1) amortized | O(V log V + E) | Theoretical optimality |
For sparse graphs (E = O(V)), the binary heap gives O(V log V) — excellent. For dense graphs (E = O(V²)), the simple array at O(V²) actually beats the binary heap's O(V² log V). The Fibonacci heap achieves the best of both worlds theoretically, but its constant factors are so large that binary heaps dominate in practice up to very large graph sizes.
heapq module provides one. The Fibonacci heap is a theoretical talking point, not a production choice. If an interviewer asks: "Fibonacci heap gives O(V log V + E) but the constant factors and implementation complexity make binary heaps faster in practice for graphs under ~10M vertices."Green = finalized (in S). Yellow = in frontier (in priority queue with finite d[v]). Gray = undiscovered (d = ∞). Teal edges = shortest-path tree. Priority queue contents shown below the graph. Click "Step" to extract-min and relax.
Source: A. Click Step.Python's heapq does not support DECREASE-KEY. The standard workaround is lazy deletion: push duplicate entries into the heap and skip stale ones when popping.
python import heapq def dijkstra(adj, source): """ adj: dict of {u: [(v, weight), ...]} Returns: (dist, pred) dicts Uses lazy deletion: push (distance, vertex) pairs. When we pop, skip if vertex already finalized. """ dist = {} # finalized distances pred = {} # predecessor map pq = [(0, source)] # min-heap: (distance, vertex) best = {source: 0} # best known distance (pre-finalization) while pq: d_u, u = heapq.heappop(pq) if u in dist: continue # stale entry — u already finalized dist[u] = d_u # finalize u for v, w in adj.get(u, []): if v not in dist: # only relax non-finalized new_d = d_u + w if new_d < best.get(v, float('inf')): best[v] = new_d pred[v] = u heapq.heappush(pq, (new_d, v)) return dist, pred def reconstruct_path(pred, target): """Follow predecessors to build the path.""" path = [] v = target while v is not None: path.append(v) v = pred.get(v) return path[::-1] # Example adj = { 'A': [('B', 4), ('C', 2)], 'B': [('D', 3), ('C', 5)], 'C': [('B', 1), ('D', 8), ('E', 10)], 'D': [('E', 2)], 'E': [], } d, pi = dijkstra(adj, 'A') print(d) # {'A': 0, 'C': 2, 'B': 3, 'D': 6, 'E': 8} print(reconstruct_path(pi, 'E')) # ['A', 'C', 'B', 'D', 'E']
if u in dist and skip it because v was already finalized with the correct (shorter) distance. The heap may contain O(E) entries in the worst case, so each operation costs O(log E) = O(log V) (since E ≤ V², log E ≤ 2 log V).Let us be precise about the inputs and outputs of Dijkstra, since this is what you need to implement it from scratch:
| Bug | Symptom | Fix |
|---|---|---|
Missing if u in dist: continue | Correct results but TLE (processes vertices multiple times) | Add the skip guard at the top of the while loop |
| Initializing d[source] = 0 but forgetting to push to heap | Returns only d[source] = 0, everything else is ∞ | Push (0, source) into heap at the start |
Using <= instead of < in relaxation | Pushes unnecessary duplicates, causes TLE | Use strict < — equal distance means no improvement |
Storing vertices as strings with heapq | Crash when two distances are equal (Python compares the second element) | Use (dist, int_id) or (dist, tie_breaker, vertex) |
| Not handling disconnected vertices | Loops forever or crashes on empty heap | Disconnected vertices stay at d = ∞, check after Dijkstra returns |
Now that you know the mechanics, let us understand why Dijkstra works and exactly when it breaks. The correctness proof is one of the most elegant in all of algorithms.
The core claim: when we extract vertex u from the priority queue, d[u] is already the true shortest-path distance δ(s, u). This is the most important theorem in this entire lesson. If you understand this proof, you understand why Dijkstra works and exactly when it fails.
Proof (by contradiction). Suppose u is the first vertex extracted from Q for which d[u] ≠ δ(s, u). Since d[u] ≥ δ(s, u) by the upper-bound property, we must have d[u] > δ(s, u). There must exist a shortest path p from s to u (otherwise δ(s, u) = ∞ = d[u]).
Since s ∈ S (finalized) and u ∉ S at the time of extraction, path p must cross from S to V\S at some point. Let y be the first vertex along p that is not in S, and let x be its predecessor on p (so x ∈ S).
The proof relies on one critical step: δ(s, y) ≤ δ(s, u). This holds because y is on the path from s to u and all subsequent edges have non-negative weight, so the path can only get longer. With negative edges, a later vertex on the path could have a smaller shortest-path weight than an earlier vertex. The greedy extraction order breaks.
Here is a minimal counterexample that every student should memorize. Graph: s→A (weight 1), s→B (weight 5), B→A (weight -10). Only 3 vertices and 3 edges.
| Step | Action | d[s] | d[A] | d[B] | Problem? |
|---|---|---|---|---|---|
| Init | - | 0 | ∞ | ∞ | |
| 1 | Extract s, relax | 0 | 1 | 5 | |
| 2 | Extract A (d=1, min) | 0 | 1 (FINAL) | 5 | A finalized at d=1 |
| 3 | Extract B (d=5), relax B→A | 0 | 1 | 5 | 5+(-10)=-5 < 1, but A is already finalized! |
The true shortest path to A is s→B→A with weight 5+(-10) = -5. But Dijkstra finalized A at d=1 in step 2 and never corrects it. The negative edge from B to A provides a shortcut that Dijkstra discovered too late.
Watch Dijkstra finalize vertex A with d=1, then later discover (too late) that the path through B gives d=-5. Click "Step" to advance through the algorithm.
Click Step to begin.Let us build complete muscle memory by tracing Dijkstra on a 6-node graph. Edges: A→B (2), A→C (4), B→C (1), B→D (7), C→E (3), D→F (1), E→D (2), E→F (5). Source: A.
| Step | Extract | Relax | d[A] | d[B] | d[C] | d[D] | d[E] | d[F] |
|---|---|---|---|---|---|---|---|---|
| Init | - | - | 0 | ∞ | ∞ | ∞ | ∞ | ∞ |
| 1 | A (0) | A→B, A→C | 0 | 2 | 4 | ∞ | ∞ | ∞ |
| 2 | B (2) | B→C, B→D | 0 | 2 | 3 | 9 | ∞ | ∞ |
| 3 | C (3) | C→E | 0 | 2 | 3 | 9 | 6 | ∞ |
| 4 | E (6) | E→D, E→F | 0 | 2 | 3 | 8 | 6 | 11 |
| 5 | D (8) | D→F | 0 | 2 | 3 | 8 | 6 | 9 |
| 6 | F (9) | (none) | 0 | 2 | 3 | 8 | 6 | 9 |
Key observations from this trace:
(1) DECREASE-KEY in action. Step 2: B→C improves d[C] from 4 to 3. The direct path A→C (4) was found first, but A→B→C (2+1=3) is shorter. This is a DECREASE-KEY operation in the priority queue.
(2) Cascade of improvements. Step 4: E→D improves d[D] from 9 (via B→D) to 8 (via C→E→D). Step 5: D→F further improves d[F] from 11 (via E→F) to 9 (via D→F). Each improvement enables subsequent improvements — this cascading is what makes shortest paths interesting.
(3) The shortest-path tree. The final predecessor chain reveals the tree: A→B (pred[B]=A), A→B→C (pred[C]=B), A→B→C→E (pred[E]=C), A→B→C→E→D (pred[D]=E), A→B→C→E→D→F (pred[F]=D). The shortest path to F is A→B→C→E→D→F with total weight 2+1+3+2+1 = 9.
(4) Not every edge is on a shortest path. The edges A→C (4), B→C (5), C→D (8), and E→F (5) are NOT used in any shortest path from A. They were explored but superseded by better alternatives.
What if your graph has negative edges but you want Dijkstra's speed? Johnson's algorithm provides a clever solution: re-weight all edges to be non-negative while preserving shortest paths, then run Dijkstra.
The trick is a potential function h(v) computed by running Bellman-Ford once from a dummy source connected to all vertices with zero-weight edges:
After re-weighting, run Dijkstra from every source vertex. Recover original distances by subtracting the potential difference: δ(u, v) = δ'(u, v) - h(u) + h(v).
Total time for all-pairs: O(VE) for Bellman-Ford + O(V(V+E) log V) for V runs of Dijkstra. For sparse graphs, this beats Floyd-Warshall's O(V³).
Adjust the graph size to see how the number of relaxation operations compares. Dijkstra (green) grows as (V+E) log V. Bellman-Ford (orange) grows as V×E.
This is the big interactive simulation. Build a graph, pick a source, and watch Dijkstra construct the shortest-path tree in real time. You can also inject a negative edge to see the algorithm break.
Build: Click empty space to place vertices (up to 12). Click one vertex then another to add a directed edge (you will be prompted for weight).
Run: Select a source vertex (click it, it turns orange), then click "Run Dijkstra" for animation or "Step" for one-step-at-a-time.
Watch: Green = finalized. Yellow = frontier. Teal edges = shortest-path tree building up.
After Dijkstra finishes, the predecessor pointers π[v] form a tree rooted at the source. Every path in this tree from the root to any vertex is the shortest path in the original graph. This tree has V-1 edges (assuming all vertices are reachable) and contains exactly one path from s to each vertex.
In the visualization, the teal edges form this tree. Notice that it is a subgraph of the original graph — it only uses edges that are part of some shortest path. Multiple shortest paths with the same weight may exist, but Dijkstra commits to one (the first discovered).
To cement the difference between our three algorithms, here is how they would process the same 5-node graph with edges including a negative weight:
| Property | Bellman-Ford | DAG Shortest (if DAG) | Dijkstra |
|---|---|---|---|
| Handles this graph? | Yes (any graph) | Only if no cycles | Only if no neg edges |
| Number of relaxations | Up to (V-1) × E = 36 | Exactly E = 9 | At most E = 9 |
| Data structure | Edge list | Adjacency list + topo order | Min-heap |
| Processes vertices in | Arbitrary order (all edges) | Topological order | Increasing d[v] order |
| Early termination | If no changes in a pass | Never needed (1 pass) | When Q is empty |
With the preset graph (8 nodes, 11 edges), Dijkstra performs:
Dijkstra explores outward in all directions equally, like ripples expanding from a pebble in a pond. If you know where the goal is, that is wasteful — you are searching behind you as much as in front. A* search fixes this by steering the search toward the goal using a heuristic function.
Instead of prioritizing by d[v] alone (the cost to reach v from the source), A* prioritizes by:
where g(v) = d[v] is the known cost from source to v, and h(v) is a heuristic estimate of the cost from v to the goal. The function f(v) estimates the total cost of the best path through v.
Vertices that are closer to the goal (according to h) get explored first. The search is biased in the right direction.
A heuristic h is admissible if it never overestimates the true distance to the goal: h(v) ≤ δ(v, goal) for all v. If h is admissible, A* is guaranteed to find the optimal shortest path. Why? Because an admissible heuristic never makes the priority of a vertex artificially high, so the true shortest path is never deprioritized below a suboptimal path.
Common admissible heuristics:
| Domain | Heuristic | Formula | Why Admissible |
|---|---|---|---|
| Grid (4-connected) | Manhattan distance | |x1-x2| + |y1-y2| | Shortest grid path without obstacles |
| Grid (8-connected) | Chebyshev distance | max(|Δx|, |Δy|) | Shortest with diagonal moves |
| Euclidean plane | Straight line | √(Δx² + Δy²) | Straight line ≤ any path |
| Road network | Haversine / max speed | Great-circle dist / speed limit | Fastest possible travel time |
A* is a spectrum between two extremes:
h(v) = 0 for all v: A* degenerates to Dijkstra. No heuristic guidance. Explores uniformly in all directions. Guaranteed optimal but potentially slow.
h(v) = δ(v, goal) exactly: A* goes straight to the goal with zero wasted exploration. Every vertex extracted is on the shortest path. But you would need to already know the answer to compute this h.
h(v) > δ(v, goal) for some v: The heuristic is inadmissible. A* may find paths faster but can return suboptimal solutions. Useful when approximate answers are acceptable (weighted A*, where h is inflated by a factor ε).
A* with a consistent heuristic expands the minimum number of nodes among all algorithms that use the same heuristic information. No other algorithm using the same h(v) can find the optimal path while exploring fewer nodes. This is because A* never expands a node with f(v) > f*(goal) = optimal cost, and every node with f(v) < f*(goal) must be expanded by any correct algorithm (it could potentially be on a shorter path).
This optimality is relative to the heuristic. A better heuristic (closer to the true distance) leads to fewer expanded nodes. The perfect heuristic h(v) = δ(v, goal) leads to expanding only the nodes on the shortest path itself.
Standard A* stores all generated nodes in memory (the open and closed lists). For very large search spaces (e.g., 15-puzzle, Rubik's cube), this can exhaust memory. IDA* (Iterative Deepening A*) solves this by using iterative deepening with an f-value cutoff:
IDA* uses O(depth) memory (just the recursion stack) while retaining A*'s optimality guarantee. The trade-off: it may re-expand nodes across iterations, making it slower in practice for graphs with many distinct f-values. But for domains like sliding puzzles, it is the standard approach.
SMA* is another memory-efficient variant. It keeps expanding the best leaf node (like A*) but when memory runs out, it drops the worst leaf and stores its f-value in the parent. If the dropped node is later needed, it is regenerated. SMA* uses all available memory, unlike IDA* which uses almost none.
The memory hierarchy of A* variants:
| Algorithm | Memory | Optimality | Best For |
|---|---|---|---|
| A* | O(V) — all generated nodes | Yes (with admissible h) | Small-medium graphs |
| IDA* | O(depth) — recursion stack | Yes (with admissible h) | Puzzles, combinatorial search |
| SMA* | User-specified bound | Yes (if enough memory) | When memory is limited but not tiny |
| Weighted A* | O(V) | ε-optimal | Real-time AI, game engines |
Green = start (S), Red = goal (G). Blue shading = explored by Dijkstra. Purple shading = explored by A*. Teal/orange = final paths (same optimal path, but A* finds it with far fewer explored cells). Click cells to toggle walls before running.
The difference is dramatic. On an open grid, Dijkstra explores roughly a circle around the start before reaching the goal. A* explores a narrow corridor aimed at the goal. Both find the same optimal path, but A* visits far fewer cells. On real road networks, A* typically explores 10-100x fewer nodes than Dijkstra for point-to-point queries.
Consider a 4-node graph with coordinates: A(0,0), B(3,0), C(1,2), D(3,2). Edges: A→B(5), A→C(3), B→D(2), C→D(4). Goal: shortest path from A to D. Heuristic: Euclidean distance to D.
| Vertex | h(v) = Euclidean to D(3,2) |
|---|---|
| A(0,0) | √(9+4) = 3.61 |
| B(3,0) | √(0+4) = 2.00 |
| C(1,2) | √(4+0) = 2.00 |
| D(3,2) | 0.00 |
| Step | Extract | f = g + h | Relax | Queue |
|---|---|---|---|---|
| Init | - | - | - | {A: 0+3.61=3.61} |
| 1 | A (f=3.61) | g=0 | A→B: g=5, A→C: g=3 | {C: 3+2=5, B: 5+2=7} |
| 2 | C (f=5) | g=3 | C→D: g=7 | {B: 7, D: 7+0=7} |
| 3 | B (f=7) | g=5 | B→D: g=7 | {D: 7} (no improvement) |
| 4 | D (f=7) | GOAL! | - | Done. Cost = 7 |
A* found the optimal path A→B→D (cost 7) = A→C→D (also cost 7). It explored all 4 nodes. Dijkstra would also explore all 4 nodes on this small graph. The benefit of A* becomes dramatic on larger graphs where the heuristic prunes large swaths of the search space.
On an N × N grid with no obstacles, Dijkstra explores approximately πD²/4 cells (a quarter-circle with radius D = distance to goal), while A* with Manhattan heuristic explores approximately 2D cells (a narrow diamond). The ratio:
With obstacles, the speedup depends on the obstacle density. Dense mazes reduce A*'s advantage because the heuristic becomes less informative — the straight-line distance is far from the actual maze distance. But even in mazes, A* rarely does worse than Dijkstra.
Sometimes you want speed more than optimality. Weighted A* inflates the heuristic by a factor ε > 1:
This makes A* more aggressively goal-directed, exploring fewer nodes but potentially finding suboptimal paths. The guarantee: the returned path has cost at most ε times the optimal cost. With ε = 1.5, you get paths within 50% of optimal but explore far fewer nodes. This trade-off is used in real-time game AI where speed matters more than perfect optimality.
python import heapq, math def a_star(adj, start, goal, pos): """ adj: {u: [(v, weight), ...]} pos: {v: (x, y)} for heuristic computation Returns: (path, cost) or (None, inf) """ def h(v): # Euclidean distance (admissible) x1, y1 = pos[v] x2, y2 = pos[goal] return math.sqrt((x1-x2)**2 + (y1-y2)**2) g = {start: 0} # best known cost from start pred = {start: None} pq = [(h(start), 0, start)] # (f, g, vertex) closed = set() while pq: f_u, g_u, u = heapq.heappop(pq) if u == goal: # Reconstruct path path = [] while u is not None: path.append(u) u = pred[u] return path[::-1], g[goal] if u in closed: continue closed.add(u) for v, w in adj.get(u, []): new_g = g[u] + w if v not in g or new_g < g[v]: g[v] = new_g pred[v] = u heapq.heappush(pq, (new_g + h(v), new_g, v)) return None, float('inf') # goal unreachable
A* is not always the best choice, even for point-to-point queries:
| Scenario | Why A* Struggles | Better Alternative |
|---|---|---|
| No good heuristic available | h=0 makes it Dijkstra anyway | Plain Dijkstra (simpler code) |
| Need shortest paths to ALL vertices | A* targets one goal; no benefit | Dijkstra from source |
| Graph is a maze/labyrinth | Euclidean h is very inaccurate | Bidirectional Dijkstra or IDA* |
| Edge weights are 0/1 | Overhead of f-value management | 0-1 BFS (deque, no heap) |
| Static graph, many queries | No preprocessing benefit | Contraction hierarchies |
| Negative edges | A* inherits Dijkstra's limitation | Bellman-Ford |
The algorithms we have learned are not academic exercises. They are running right now in your phone, your internet router, and the servers that planned your last delivery.
Google Maps, Apple Maps, and Waze all use Dijkstra-based algorithms. But the US road network has ~24 million intersections and ~58 million road segments. Running raw Dijkstra for every query would take seconds. Real mapping services use contraction hierarchies (CH) to answer queries in microseconds.
The idea: preprocess the graph by repeatedly "contracting" the least-important vertex — removing it and adding shortcut edges to preserve shortest-path distances between its neighbors. This creates a hierarchy where highways are at the top and local streets at the bottom.
The internet is a graph of autonomous systems (AS) — large networks like Comcast, Google, AT&T. The Border Gateway Protocol (BGP) uses a distributed version of Bellman-Ford to find routes between them.
Each router maintains a distance vector: its best-known distance (or cost) to every destination. Periodically, it sends this vector to its neighbors. Each neighbor updates its own distances using relaxation. This is Bellman-Ford running in parallel across thousands of routers.
Why Bellman-Ford and not Dijkstra? Because no single router has the complete graph topology. Each router only knows its own distances and what its neighbors tell it. Bellman-Ford is inherently distributed — relaxation requires only local information. Dijkstra requires global knowledge (the entire priority queue).
We saw in Chapter 3 that the longest path in a DAG gives the critical path. This is the foundation of CPM (Critical Path Method), used in construction, manufacturing, and software project planning.
A project DAG with task durations. The critical path (longest path from Start to End) is highlighted in orange. Tasks not on the critical path have slack — they can be delayed without affecting the project deadline.
Critical path total: computed on load.This is the table you should have memorized for interviews. Given a graph problem, how do you choose an algorithm?
| Scenario | Algorithm | Time | Why This One |
|---|---|---|---|
| Non-negative, sparse | Dijkstra (binary heap) | O((V+E) log V) | Fastest practical option |
| Non-negative, dense | Dijkstra (array) | O(V²+E) | Array beats heap when E = O(V²) |
| Negative edges possible | Bellman-Ford | O(VE) | Only correct option for negative edges |
| Detect negative cycles | Bellman-Ford | O(VE) | Extra pass after V-1 passes |
| DAG | DAG shortest paths | O(V+E) | Fastest possible, handles neg edges |
| Point-to-point with coords | A* | Depends on h | Heuristic focuses search |
| Massive road network | Contraction hierarchies | μs per query | Minutes preprocess, instant queries |
| Distributed (no global view) | Bellman-Ford (dist.) | O(VE) messages | Only needs local info |
| Unweighted graph | BFS | O(V+E) | All edges weight 1 |
| Edge weights 0 or 1 | 0-1 BFS (deque) | O(V+E) | Push 0-edges front, 1-edges back |
Real-world benchmarks on the US road network (24M vertices, 58M edges):
| Algorithm | Query Time | Preprocessing | Extra Space |
|---|---|---|---|
| Dijkstra (binary heap) | ~3 seconds | None | O(V) |
| A* (Euclidean) | ~500 ms | None | O(V) |
| Bidirectional Dijkstra | ~1.5 seconds | None | O(V) |
| Contraction Hierarchies | ~0.3 ms | ~10 minutes | ~2x edges |
| Hub Labeling | ~0.001 ms | ~30 minutes | ~100 bytes/vertex |
The difference is staggering. Raw Dijkstra takes 3 seconds per query — unacceptable for a map app. Contraction hierarchies answer in 0.3 milliseconds — fast enough for real-time turn-by-turn routing with live traffic updates.
Scenario 1: Dijkstra returns wrong distances. Systematic checklist: (a) Does the graph have negative edges? If yes, switch to Bellman-Ford. (b) Are you accidentally allowing re-finalization? The if u in dist: continue guard must be present. (c) Is your adjacency list correct? Print it and verify. (d) Are vertex indices 0-based or 1-based? Off-by-one errors are the #1 Dijkstra bug.
Scenario 2: Bellman-Ford reports a negative cycle unexpectedly. Check: (a) Sign errors in weight parsing. (b) Unsigned integers that underflow to large positive values. (c) Floating-point rounding creating artificial small negatives — use epsilon tolerance: if dist[u] + w < dist[v] - 1e-9. (d) Graph is directed but you treated it as undirected (added both directions of each edge, creating 2-cycles).
Scenario 3: A* returns a suboptimal path. Your heuristic is inadmissible. Debug: print h(v) and the true shortest distance from v to goal for vertices near the goal. If h(v) > true distance for any v, the heuristic overestimates. Common cause: using Euclidean distance heuristic on a grid that only allows 4-directional movement (Manhattan distance is the correct heuristic).
Scenario 4: Dijkstra is too slow. Profile first. (a) If the heap is the bottleneck, consider 0-1 BFS if weights are 0/1, or bucket queues if weights are small integers. (b) If graph construction is slow, stream edges lazily. (c) For point-to-point, switch to A*. (d) For repeated queries, preprocess with contraction hierarchies. (e) For very dense graphs (E ≈ V²), switch to array-based Dijkstra O(V²).
Scenario 5: Memory limit exceeded. Dijkstra with lazy deletion stores up to O(E) entries in the heap. For very large graphs: (a) use indexed priority queue (maintains at most V entries), (b) store the graph as edge list + adjacency offsets (CSR format) instead of adjacency list, (c) process the graph in chunks if it does not fit in memory.
A system of difference constraints is a set of inequalities of the form xj - xi ≤ bij, where xi are unknown variables and bij are constants. This arises in scheduling (task j cannot start more than bij time units after task i), temporal reasoning, and layout problems.
The connection to shortest paths is direct: model each variable xi as a vertex, each constraint xj - xi ≤ bij as an edge (i, j) with weight bij. Add a source vertex s with zero-weight edges to all vertices. Then:
This is one of the most elegant applications of shortest paths: a seemingly unrelated optimization problem (solving a system of inequalities) reduces to a graph algorithm.
Since contraction hierarchies are the most practically important shortest-path speedup, let us understand the mechanics in detail.
Step 1: Vertex Ordering. Assign an "importance" to each vertex. Simple heuristic: a vertex is important if removing it creates many shortcut edges (it is a "highway node"). A vertex is unimportant if removing it creates few shortcuts (it is a "dead end" or "local street"). Sort vertices by importance: least important first.
Step 2: Contraction. Process vertices in importance order. To contract vertex v:
Step 3: Query. Given source s and target t, run two Dijkstra searches simultaneously:
The path found may use shortcut edges. To get the actual road-level path, recursively "unpack" each shortcut into its original sub-path. This is stored during preprocessing.
When edge weights are restricted to 0 and 1, you do not need Dijkstra's heap at all. A deque (double-ended queue) suffices, giving O(V+E) time — the same as BFS.
The idea: when you relax an edge with weight 0, the new distance equals the current distance. Push the neighbor to the front of the deque (it should be processed next, like the current level in BFS). When you relax an edge with weight 1, push to the back (it is one level deeper). This maintains the deque in sorted order by distance.
When do 0-1 weights appear? More often than you think:
| Problem | Weight 0 Means | Weight 1 Means |
|---|---|---|
| Grid with free/paid cells | Move to free cell | Move to paid cell |
| Graph where some edges can be reversed | Traverse in original direction | Reverse an edge |
| Min operations to make string palindrome | Character already matches | Need to change a character |
| Min road modifications for one-way streets | Road goes in correct direction | Road must be reversed |
If edge weights are non-negative integers in the range [0, C], Dial's algorithm replaces the binary heap with a circular array of C+1 buckets. Bucket i holds all vertices with d[v] = current_min + i. This gives O(V + E + C) time — no log factor!
For road networks where weights are travel times in seconds (C ~ 10,000), Dial's algorithm is competitive with binary heap Dijkstra and simpler to implement. It is essentially a radix-sort-inspired priority queue.
In video games, characters navigate 3D environments. The world is decomposed into a navigation mesh (navmesh) — a set of convex polygons covering walkable surfaces. The pathfinding graph has polygons as vertices and shared edges as graph edges. The weight is typically the Euclidean distance between polygon centers.
Game studios use A* on the navmesh graph, often with a hierarchical approach: coarse path on a high-level graph (regions), then refined path within each region. This gives near-instant pathfinding even in large open worlds. Detour (from Recast Navigation) is the industry-standard library, used in games from Skyrim to Fortnite.
In robotics, shortest paths appear in motion planning. A robot's configuration space (C-space) is a graph where each vertex is a valid configuration (position, joint angles) and each edge is a feasible motion. The weight is the energy cost, time, or distance of the motion.
For robot arms with many joints, the C-space is high-dimensional (6-12 dimensions for a typical industrial robot). Dijkstra and A* are impractical in high dimensions (too many vertices). Instead, roboticists use sampling-based planners like RRT* (Rapidly-exploring Random Trees) that build a sparse graph by random sampling, then run shortest paths on the sparse graph.
Self-driving cars use a variant: the C-space includes position (x, y), heading (θ), and velocity (v). The graph has edges that respect the car's dynamics (turning radius, acceleration limits). A* with a dubins/reeds-shepp heuristic finds smooth, drivable paths.
In social network analysis, BFS (unweighted shortest paths) finds the degree of separation between two people. The famous "six degrees" conjecture states that any two people on Earth are connected by at most six intermediaries. Facebook research confirmed that the average separation on their network is 3.57 (in 2016, with 1.59 billion users).
For weighted social networks (where edges have strengths like interaction frequency), Dijkstra finds the "strongest connection path" — maximizing the minimum edge weight along the path. This is a bottleneck shortest path variant, solvable by modifying Dijkstra's relaxation to: d[v] = max(d[v], min(d[u], w(u,v))).
In supply chain optimization, shortest paths model the flow of goods through a network of warehouses, distribution centers, and delivery points. Edge weights represent shipping costs, transit times, or carbon footprint. Companies like Amazon, FedEx, and UPS solve millions of shortest-path queries daily to optimize delivery routes and warehouse selection.
The vehicle routing problem (VRP) — "visit all customers at minimum cost" — is NP-hard in general, but shortest-path subroutines (finding cheapest routes between stops) are solved with Dijkstra or contraction hierarchies as building blocks inside heuristic VRP solvers.
DNA sequence alignment can be modeled as a shortest path in an edit distance graph. Two sequences of length m and n define an (m+1) x (n+1) grid graph. Moving right = insert, moving down = delete, moving diagonally = match/mismatch. The shortest path from (0,0) to (m,n) gives the optimal alignment.
This is solved by dynamic programming (the Needleman-Wunsch algorithm), which is essentially DAG shortest paths on the grid — the grid is a DAG because all edges go right, down, or diagonally right-down. The O(mn) time complexity is a direct consequence of the grid having mn vertices and 3mn edges.
Modern ML frameworks (PyTorch, JAX) represent neural network computations as directed acyclic graphs. The forward pass processes operations in topological order. The backward pass (backpropagation) processes operations in reverse topological order to compute gradients.
When optimizing inference, the critical path through the computation graph determines the minimum latency. Operations not on the critical path can be parallelized or delayed. This is exactly the DAG longest-path problem we solved in Chapter 3 — applied to computation instead of construction projects.
Graph compilers like XLA and TensorRT analyze the computation DAG to find opportunities for operator fusion, memory optimization, and scheduling. The shortest/longest path algorithms are foundational tools in these compilers.
| Algorithm | Neg Edges? | Neg Cycle Detect? | Time | Space | Key Idea |
|---|---|---|---|---|---|
| Bellman-Ford | Yes | Yes | O(VE) | O(V) | Relax all edges V-1 times |
| DAG Shortest | Yes (DAGs) | N/A | O(V+E) | O(V) | Topo sort + one relaxation pass |
| Dijkstra (heap) | No | No | O((V+E) log V) | O(V+E) | Greedy extract-min + relax |
| Dijkstra (array) | No | No | O(V²+E) | O(V) | Array scan for min, O(1) decrease-key |
| Dijkstra (Fib) | No | No | O(V log V + E) | O(V) | O(1) amortized decrease-key |
| A* | No | No | h-dependent | O(V) | f(v) = g(v) + h(v) priority |
| 0-1 BFS | No | No | O(V+E) | O(V) | Deque: 0-weight front, 1-weight back |
Strong shortest-path interview performance requires competence across five orthogonal dimensions. The cheat sheet above handles Dimension 1 (Concept). The coding drills handle Dimension 3 (Code). But interviewers probe all five:
| Dimension | What They Test | Example Question |
|---|---|---|
| 1. Concept | Do you understand the theory? | "Why does Dijkstra fail with negative edges?" |
| 2. Design | Can you architect a system? | "Design a route planner for 1M intersections" |
| 3. Code | Can you implement it correctly? | "Write Dijkstra in 15 minutes" |
| 4. Debug | Can you fix broken code? | "This Dijkstra returns wrong answers. Find the bug." |
| 5. Frontier | Do you know recent advances? | "What is the state of the art for road routing?" |
Q: Explain relaxation in one sentence.
A: Relaxation checks whether routing to v through u yields a shorter path than our current best estimate, and if so, updates v's distance and predecessor.
Q: Why does Dijkstra require non-negative weights?
A: Because it greedily finalizes the vertex with smallest d[v], assuming no future path through un-finalized vertices can improve it. A negative edge from an un-finalized vertex could decrease an already-finalized distance, violating the greedy choice property that makes the proof work.
Q: Can BFS solve shortest paths?
A: BFS solves unweighted shortest paths (all edges weight 1). It is equivalent to Dijkstra with uniform weights — the FIFO queue is a valid priority queue when all priorities increase by the same amount.
Q: What is the relationship between shortest paths and dynamic programming?
A: Bellman-Ford IS dynamic programming. The subproblem: dk[v] = shortest path from s to v using at most k edges. The recurrence: dk[v] = min(dk-1[v], min over all (u,v) of dk-1[u] + w(u,v)). The V-1 passes compute d1, d2, ..., dV-1 in order.
Q: What is the difference between Dijkstra and Prim's MST algorithm?
A: Both use a priority queue and greedily extract the minimum. The difference is the key. Dijkstra's key for vertex v is d[v] = total distance from source to v. Prim's key is the weight of the lightest edge connecting v to the current tree. Dijkstra builds shortest paths from a source; Prim builds a minimum spanning tree. They look similar in code but solve fundamentally different problems.
Q: Can you run Dijkstra backward from the target to find the shortest path?
A: Yes, on undirected graphs (where every edge can be traversed in either direction). On directed graphs, you need to reverse all edge directions first, then run Dijkstra from the target on the reversed graph. This finds shortest paths TO the target from all vertices — equivalent to running Dijkstra forward from the target in the reversed graph. Bidirectional Dijkstra exploits this: run forward from source AND backward from target, meet in the middle.
Q: Design a route planner for a city with 1M intersections.
A: Preprocess with contraction hierarchies (5-10 minutes, store in ~500 MB). At query time, bidirectional Dijkstra on the hierarchy. For real-time traffic: maintain a small overlay graph updated every 30 seconds from probe data. For ETA: Dijkstra on the overlay + CH for the static portion. For alternative routes: penalty method — run Dijkstra, penalize edges on the first route, run again. Three architecturally distinct routes that share at most 50% of edges.
Q: Design a system to detect currency arbitrage in real-time.
A: Model currencies as vertices, exchange rates as edges with weight = -log(rate). Run Bellman-Ford from each currency. Negative cycle = arbitrage. For real-time: maintain edge weights incrementally. When a rate changes, re-run BF only on affected subgraphs (edges within 2 hops of the changed edge). Alert latency target: < 100 ms. Add a minimum profit threshold to filter out micro-arbitrage that disappears after transaction fees.
python # LeetCode 743: Network Delay Time # Given n nodes, times[i] = (u, v, w), signal from source k. # Return min time for ALL nodes to receive signal, or -1. # Key insight: this is max(shortest distances from k), since # the last node to be reached determines the total delay. import heapq from collections import defaultdict def networkDelayTime(times, n, k): adj = defaultdict(list) for u, v, w in times: adj[u].append((v, w)) dist = {} pq = [(0, k)] while pq: d, u = heapq.heappop(pq) if u in dist: continue # already finalized dist[u] = d for v, w in adj[u]: if v not in dist: heapq.heappush(pq, (d + w, v)) if len(dist) < n: return -1 # some nodes unreachable return max(dist.values())
python # LeetCode 787: Cheapest Flights Within K Stops # Bellman-Ford variant: at most k+1 relaxation passes. # Key insight: pass i finds shortest paths using at most i edges. # k stops = k+1 edges, so we run k+1 passes. # CRITICAL: snapshot dist before each pass to avoid using # current round's updates (which would allow more edges). def findCheapestPrice(n, flights, src, dst, k): dist = [float('inf')] * n dist[src] = 0 for _ in range(k + 1): prev = dist[:] # snapshot! prevents chain relaxation for u, v, w in flights: if prev[u] + w < dist[v]: dist[v] = prev[u] + w return dist[dst] if dist[dst] != float('inf') else -1
python # 0-1 BFS: Shortest path when edge weights are 0 or 1 # Use a deque: push 0-weight edges to front, 1-weight to back. # This maintains the deque in sorted order = valid priority queue. # Time: O(V+E) — no log factor! from collections import deque def zero_one_bfs(adj, source, n): """adj[u] = [(v, w)] where w is 0 or 1.""" dist = [float('inf')] * n dist[source] = 0 dq = deque([source]) while dq: u = dq.popleft() for v, w in adj[u]: if dist[u] + w < dist[v]: dist[v] = dist[u] + w if w == 0: dq.appendleft(v) # front: same distance else: dq.append(v) # back: +1 distance return dist
python # Detect and extract a negative cycle # Run BF for V passes (not V-1). If pass V changes a vertex, # that vertex is on (or reachable from) a negative cycle. # Trace predecessors V times to ensure we are IN the cycle, # then walk until we return to the same vertex. def find_negative_cycle(n, edges): """Returns the cycle as a list of vertex indices, or None.""" dist = [0] * n # start all at 0 to find ANY neg cycle pred = [-1] * n changed = -1 for i in range(n): changed = -1 for u, v, w in edges: if dist[u] + w < dist[v]: dist[v] = dist[u] + w pred[v] = u changed = v if changed == -1: return None # no negative cycle # Walk back n times to land on the cycle v = changed for _ in range(n): v = pred[v] # Now v is on the cycle. Walk until we loop. cycle = [] u = v while True: cycle.append(u) u = pred[u] if u == v: break cycle.append(v) return cycle[::-1]
Q: Your Dijkstra returns ∞ for a vertex you know is reachable.
A: Systematic checklist: (1) Is the graph directed? Maybe edges go the wrong way. For undirected graphs, did you add both directions? (2) Is the adjacency list correctly built? Common bug: overwriting instead of appending. (3) Is the source vertex correct? (4) Print the adjacency list for the source and verify edges exist. (5) Check for off-by-one if vertices are 0-indexed vs 1-indexed.
Q: Bellman-Ford is timing out on a graph with 100K vertices and 500K edges.
A: (1) Add early termination: if no d[v] changes in a pass, stop. (2) Use SPFA (Shortest Path Faster Algorithm): maintain a queue of "active" vertices whose d[v] changed; only relax edges from those vertices. Average case O(E), worst case still O(VE). (3) If the graph is a DAG, switch to topological-sort-based algorithm for O(V+E). (4) If all weights are non-negative, switch to Dijkstra for O((V+E) log V).
Q: Two different paths have the same total weight. Which one does Dijkstra return?
A: Whichever path's final edge was relaxed first. This depends on the edge ordering in the adjacency list and tie-breaking in the priority queue. To get the lexicographically smallest path, break ties in the priority queue by vertex label. To get all shortest paths, track multiple predecessors per vertex.
python # Multi-state Dijkstra: Shortest Path with Alternating Colors # LeetCode 1129. Each edge is red or blue. Find shortest path # from 0 to each vertex using alternating colors. # Key insight: state = (vertex, last_color_used). # Run Dijkstra on this expanded state space. def shortestAlternatingPaths(n, redEdges, blueEdges): adj = defaultdict(list) for u, v in redEdges: adj[(u, 'r')].append((v, 'b')) # red edge, next must be blue for u, v in blueEdges: adj[(u, 'b')].append((v, 'r')) # blue edge, next must be red dist = {} pq = [(0, 0, 'r'), (0, 0, 'b')] # start with either color while pq: d, u, color = heapq.heappop(pq) if (u, color) in dist: continue dist[(u, color)] = d for v, next_color in adj.get((u, color), []): if (v, next_color) not in dist: heapq.heappush(pq, (d + 1, v, next_color)) result = [] for i in range(n): best = min(dist.get((i, 'r'), float('inf')), dist.get((i, 'b'), float('inf'))) result.append(best if best != float('inf') else -1) return result
python # Template: Johnson's Algorithm for All-Pairs Shortest Paths # with negative edges but no negative cycles. def johnson(n, edges): """ n: number of vertices (0-indexed) edges: list of (u, v, w) tuples Returns: n x n matrix of shortest distances, or None if neg cycle. """ # Step 1: Add dummy source n with 0-weight edges to all vertices aug_edges = edges + [(n, v, 0) for v in range(n)] # Step 2: Bellman-Ford from dummy source to get potentials h[] h = [float('inf')] * (n + 1) h[n] = 0 for _ in range(n): for u, v, w in aug_edges: if h[u] + w < h[v]: h[v] = h[u] + w # Check for negative cycle for u, v, w in aug_edges: if h[u] + w < h[v]: return None # negative cycle exists # Step 3: Re-weight edges adj = defaultdict(list) for u, v, w in edges: adj[u].append((v, w + h[u] - h[v])) # w' ≥ 0 # Step 4: Dijkstra from each vertex result = [[float('inf')] * n for _ in range(n)] for src in range(n): d, _ = dijkstra(adj, src) for v in range(n): if v in d: result[src][v] = d[v] - h[src] + h[v] # un-reweight return result
python # Template: Bidirectional Dijkstra # Run Dijkstra forward from source and backward from target. # Stop when both searches "meet." Often 2-3x faster than # unidirectional Dijkstra for point-to-point queries. def bidirectional_dijkstra(adj_fwd, adj_bwd, source, target): """ adj_fwd: forward adjacency list adj_bwd: backward adjacency list (reverse edges) """ dist_f = {source: 0} dist_b = {target: 0} pq_f = [(0, source)] pq_b = [(0, target)] best = float('inf') while pq_f or pq_b: # Termination: both frontiers exhausted or can't improve min_f = pq_f[0][0] if pq_f else float('inf') min_b = pq_b[0][0] if pq_b else float('inf') if min_f + min_b >= best: break # Expand forward if min_f <= min_b and pq_f: d, u = heapq.heappop(pq_f) if d > dist_f.get(u, float('inf')): continue for v, w in adj_fwd.get(u, []): nd = d + w if nd < dist_f.get(v, float('inf')): dist_f[v] = nd heapq.heappush(pq_f, (nd, v)) if v in dist_b: best = min(best, nd + dist_b[v]) else: # Expand backward (symmetric) if pq_b: d, u = heapq.heappop(pq_b) if d > dist_b.get(u, float('inf')): continue for v, w in adj_bwd.get(u, []): nd = d + w if nd < dist_b.get(v, float('inf')): dist_b[v] = nd heapq.heappush(pq_b, (nd, v)) if v in dist_f: best = min(best, nd + dist_f[v]) return best if best < float('inf') else -1
Q: Your shortest-path algorithm works on small test cases but fails on the full dataset. The distances are correct for some vertices but wrong for others.
A: Classic symptoms of integer overflow. If edge weights can be up to 109 and paths can have up to 105 edges, the total weight can reach 1014 — exceeding 32-bit integer range. Switch to 64-bit integers or use float('inf') initialization carefully (avoid adding to infinity).
Q: You run Dijkstra on an undirected graph and get TLE. The graph has V=105 vertices and E=105 edges. Your code should be O((V+E) log V) which is about 3M operations. Why is it slow?
A: Undirected graph means each edge appears twice in the adjacency list (both directions), so you have 2E = 2 × 105 effective edges. But the real problem is likely the heap implementation. If you are not using lazy deletion properly, you may be pushing duplicate entries without the skip guard, causing the heap to grow to O(E²) entries. Or your vertex comparison in the heap is expensive (comparing strings instead of integers).
Contraction Hierarchies (Geisberger et al., 2008) — The gold standard for road network routing. Preprocessing: minutes. Query: microseconds. Used by OSRM, GraphHopper, and Google Maps' static routing layer. The key idea: repeatedly remove the "least important" vertex, adding shortcut edges to preserve distances. At query time, bidirectional Dijkstra only moves "upward" in the importance hierarchy, exploring ~500 nodes instead of millions.
Hub Labeling (Abraham et al., 2011) — Precompute a "label" for each vertex: a set of hub vertices with precomputed distances to each hub. Query = intersect the labels of source and target, find the hub that minimizes d(s, hub) + d(hub, t). Sub-microsecond queries but requires gigabytes of memory for large graphs. The theoretical foundation is the highway dimension of the graph.
Transit Node Routing (Bast et al., 2007) — For road networks, identify a small set (~10,000) of "transit nodes" (major highway intersections). Precompute all-pairs shortest paths between transit nodes. For any long-distance query, find the nearest transit nodes to source and target, then look up the precomputed table. Microsecond queries. Does not work well for local queries (within the same city).
Delta-Stepping (Meyer & Sanders, 1998) — A parallel shortest-path algorithm. Partition edges into "light" (weight ≤ Δ) and "heavy" (weight > Δ). Process light edges eagerly (they may cause chain relaxations within the same "bucket"), heavy edges lazily. Achieves near-linear speedup on multicore CPUs and GPUs. Used in graph analytics frameworks like Gunrock and GraphBLAS.
Customizable Route Planning (Delling et al., 2011) — Separate the graph structure from the edge weights. Precompute a weight-independent structure (like a contraction order), then quickly adapt to different weight functions (fastest, shortest, avoid tolls, prefer scenic roads). This enables real-time personalized routing without full re-preprocessing.
Learned Heuristics for A* (2020s) — Train graph neural networks to predict h(v) for specific graph families. The GNN learns structural features that correlate with shortest-path distances. Outperforms hand-crafted heuristics on structured domains like game maps, protein folding graphs, and VLSI circuit layouts. Still limited to domains with enough training data and consistent graph structure.
Thorup's Algorithm (1999) — For undirected graphs with integer weights, achieves O(V + E) time — linear! Uses a component hierarchy and clever RAM operations. Theoretically optimal but very complex to implement. Demonstrates that the O(log V) factor in Dijkstra is not fundamental but an artifact of comparison-based priority queues.
Quantum Shortest Paths — Grover's algorithm can speed up unstructured graph search from O(V) to O(√V). For APSP (all-pairs shortest paths), quantum algorithms achieve O(V2.5) vs classical O(V3) for Floyd-Warshall. Practical quantum advantage for shortest paths remains a long way off, but it is an active area of research.
| Pattern | Problems | Algorithm | Key Trick |
|---|---|---|---|
| Standard Dijkstra | 743, 1514, 1631, 2290 | Dijkstra + heap | Lazy deletion |
| K-limited Bellman-Ford | 787 | BF with pass limit | Snapshot dist before each pass |
| 0-1 BFS | 1368, 2290 | Deque | 0-weight front, 1-weight back |
| Multi-state Dijkstra | 1129, 1786, 882 | Dijkstra | State = (vertex, extra_info) |
| Negative cycles | Arbitrage, 1334 | Bellman-Ford | V-th pass still relaxes |
| DAG shortest | Topo sort problems | Topo + relax | One pass suffices |
| Grid shortest path | 1091, 1293, 490 | BFS or Dijkstra | Unweighted = BFS, weighted = Dijkstra |
Many interview problems do not say "find the shortest path." They are disguised. Here are the telltale signs:
| Problem Statement Contains... | Graph Model | Algorithm |
|---|---|---|
| "minimum cost to reach..." | States = vertices, transitions = edges | Dijkstra / BFS |
| "minimum number of operations to transform..." | States = configurations, ops = edges (weight 1) | BFS |
| "cheapest way with at most K stops..." | Direct graph with K-limited relaxation | BF with K+1 passes |
| "can you detect a profitable cycle..." | -log(rates) on complete graph | Bellman-Ford neg cycle |
| "minimum time to complete all tasks..." | Task DAG | Critical path (longest in DAG) |
| "minimum cost to connect grid cells..." | Grid graph, cell costs as weights | Dijkstra on grid |
| "shortest path with constraint X..." | Expanded state: (vertex, X_state) | Multi-state Dijkstra |
This comprehensive table covers every algorithm variant discussed in this lesson. Reference it when comparing approaches.
| Algorithm | Time | Space | Neg Edges | Neg Cycles | Notes |
|---|---|---|---|---|---|
| BFS | O(V+E) | O(V) | No | No | Unweighted only |
| 0-1 BFS | O(V+E) | O(V) | No | No | Weights 0 or 1 only |
| Dial's | O(V+E+C) | O(V+C) | No | No | Integer weights [0,C] |
| DAG Shortest | O(V+E) | O(V+E) | Yes | N/A | DAGs only |
| Dijkstra (array) | O(V²+E) | O(V) | No | No | Best for dense graphs |
| Dijkstra (heap) | O((V+E) log V) | O(V+E) | No | No | Default choice |
| Dijkstra (Fib) | O(V log V + E) | O(V) | No | No | Theoretical only |
| Bellman-Ford | O(VE) | O(V) | Yes | Detects | Most general |
| SPFA | O(E) avg, O(VE) worst | O(V) | Yes | Detects | Queue-based BF |
| A* | h-dependent | O(V) | No | No | Point-to-point |
| Bidirectional Dijkstra | O((V+E) log V) | O(V) | No | No | ~2x speedup P2P |
| Contraction Hierarchies | ~μs per query | O(V+E) | No | No | Minutes preprocess |
Here are copy-paste templates for the most common interview patterns. Each is tested and correct.
python # Template 1: Dijkstra on a grid # Find minimum cost to go from (0,0) to (n-1,m-1) # where grid[r][c] is the cost to enter cell (r,c). def min_cost_grid(grid): n, m = len(grid), len(grid[0]) dist = [[float('inf')] * m for _ in range(n)] dist[0][0] = grid[0][0] pq = [(grid[0][0], 0, 0)] while pq: d, r, c = heapq.heappop(pq) if d > dist[r][c]: continue # stale if r == n-1 and c == m-1: return d for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]: nr, nc = r+dr, c+dc if 0 <= nr < n and 0 <= nc < m: nd = d + grid[nr][nc] if nd < dist[nr][nc]: dist[nr][nc] = nd heapq.heappush(pq, (nd, nr, nc)) return dist[n-1][m-1]
python # Template 1b: Dijkstra with path reconstruction # Returns both the distance and the actual path. def dijkstra_with_path(adj, source, target): """Returns (distance, path) where path is a list of vertices.""" dist = {} pred = {} pq = [(0, source)] best = {source: 0} while pq: d, u = heapq.heappop(pq) if u in dist: continue dist[u] = d if u == target: # Reconstruct path path = [] v = target while v is not None: path.append(v) v = pred.get(v) return d, path[::-1] for v, w in adj.get(u, []): if v not in dist: nd = d + w if nd < best.get(v, float('inf')): best[v] = nd pred[v] = u heapq.heappush(pq, (nd, v)) return float('inf'), []
python # Template 2: BFS shortest path (unweighted) # Minimum number of operations to transform start to goal. from collections import deque def min_operations(start, goal, get_neighbors): """get_neighbors(state) returns list of next states.""" if start == goal: return 0 visited = {start} q = deque([(start, 0)]) while q: state, d = q.popleft() for nxt in get_neighbors(state): if nxt == goal: return d + 1 if nxt not in visited: visited.add(nxt) q.append((nxt, d + 1)) return -1 # unreachable
python # Template 3: Dijkstra with state expansion # Shortest path with at most K obstacles removed. # State = (row, col, obstacles_removed) def shortest_path_k_obstacles(grid, k): n, m = len(grid), len(grid[0]) dist = {} pq = [(0, 0, 0, 0)] # (dist, r, c, removed) while pq: d, r, c, rem = heapq.heappop(pq) if (r, c, rem) in dist: continue dist[(r, c, rem)] = d if r == n-1 and c == m-1: return d for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]: nr, nc = r+dr, c+dc if 0 <= nr < n and 0 <= nc < m: new_rem = rem + grid[nr][nc] if new_rem <= k and (nr,nc,new_rem) not in dist: heapq.heappush(pq, (d+1, nr, nc, new_rem)) return -1
| Mistake | Frequency | How to Avoid |
|---|---|---|
| Using Dijkstra on negative edges | Very common | Always ask: "Are there negative edges?" before choosing algorithm |
| Forgetting to handle unreachable vertices | Common | Check dist == inf before using distances |
| Off-by-one in k-stop Bellman-Ford | Common | k stops = k+1 edges = k+1 passes |
| Not snapshotting dist in k-limited BF | Very common | Always use prev = dist[:] to prevent chain relaxation |
| Forgetting to build reverse graph for backward search | Moderate | Draw the graph on paper first |
| Inadmissible A* heuristic | Moderate | Verify h(v) ≤ true distance for test cases |
| String vertex labels crashing heapq | Moderate | Use (dist, int_id) as heap tuples |
| Not handling self-loops or parallel edges | Rare but tricky | Test on edge cases: loops, multi-edges, disconnected graphs |
For each scenario below, immediately identify the correct algorithm. No waffling — in an interview you need instant pattern recognition.
| # | Problem | Answer |
|---|---|---|
| 1 | Find shortest path in a maze (unweighted grid) | BFS |
| 2 | Find minimum fuel cost between cities (positive weights) | Dijkstra |
| 3 | Find minimum time with at most 2 transfers | BF with 3 passes |
| 4 | Find build order and critical path for a project | DAG shortest (negated) |
| 5 | Detect if currency exchange cycle is profitable | BF negative cycle detection |
| 6 | Shortest path from A to B on a map with coordinates | A* with Euclidean heuristic |
| 7 | Shortest path where some edges are free (weight 0 or 1) | 0-1 BFS |
| 8 | Shortest path avoiding certain nodes (blocked cells) | Dijkstra (exclude blocked from adj) |
| 9 | Shortest path where you can remove up to K walls | Multi-state Dijkstra: (r,c,walls_removed) |
| 10 | All-pairs shortest paths with some negative edges | Johnson's (BF re-weight + V Dijkstras) |
Edsger Dijkstra published his algorithm in 1959 — in a three-page paper. Richard Bellman and Lester Ford developed theirs independently in the late 1950s. These algorithms are over 65 years old and remain central to computer science. Every time you open a map app, send a packet across the internet, or play a video game, you are using their ideas.
The algorithms themselves are simple. The art lies in knowing which one to apply, how to model the problem as a graph, and how to handle the edge cases (negative weights, disconnected components, limited resources). Master the decision tree above, practice the coding templates, and you will handle any shortest-path problem thrown at you in an interview.
This lesson's chapters form a dependency DAG of their own (fitting, given the topic):
If you are short on time: read Chapters 0, 1, 4 (Dijkstra), and 9 (Arsenal). This covers the most interview-relevant material. Come back for Bellman-Ford (Ch 2) and DAG shortest paths (Ch 3) when you encounter problems that need negative-edge handling or DAG structure. The A* chapter (Ch 7) is valuable for game AI and robotics interviews.
The material in this lesson draws from:
| Source | Coverage |
|---|---|
| CLRS Ch. 24 (Cormen et al.) | Bellman-Ford, DAG shortest paths, Dijkstra, correctness proofs |
| Dijkstra, "A note on two problems in connexion with graphs" (1959) | Original Dijkstra paper |
| Hart, Nilsson, Raphael, "A Formal Basis for the Heuristic Determination of Minimum Cost Paths" (1968) | A* search algorithm |
| Geisberger et al., "Contraction Hierarchies" (2008) | Modern road routing |
| Fredman & Tarjan, "Fibonacci heaps" (1987) | O(V log V + E) Dijkstra |
| Meyer & Sanders, "Delta-Stepping" (1998) | Parallel shortest paths |
| Abraham et al., "Hub Labeling" (2011) | Sub-microsecond queries |
| Bast et al., "Route Planning in Transportation Networks" (2016) | Survey of practical algorithms |
| Bellman, "On a routing problem" (1958) | Bellman-Ford algorithm |
| Ford, "Network Flow Theory" (1956) | Ford's contribution to Bellman-Ford |
| Korf, "Depth-first iterative-deepening: An optimal admissible tree search" (1985) | IDA* search |
| Johnson, "Efficient algorithms for shortest paths in sparse networks" (1977) | Johnson's re-weighting technique |
| Thorup, "Undirected single-source shortest paths with positive integer weights in linear time" (1999) | O(V+E) algorithm |
| Delling et al., "Customizable Route Planning in Road Networks" (2017) | Separating structure from weights |
For the deepest understanding, read CLRS Chapter 24 alongside this lesson. The textbook provides formal proofs with full mathematical rigor; this lesson provides the intuition, interactive simulations, and practical implementation details that the textbook intentionally omits.