State spaces, graph search, value iteration, Dijkstra, A* — the algorithmic foundations everything else builds on.
A warehouse robot wakes up at the loading dock — call it cell (0,0). Its job: deliver a pallet to shelf #247, sitting at cell (8,8) on a 9×9 grid. Between them: rows of shelves, some aisles blocked by pallets stacked on the floor. The robot can move up, down, left, or right — one tile per step. It cannot pass through blocked cells.
The question sounds simple: how does the robot find a path? But three deeper questions hide inside it, and Chapter 2 answers all three. First: how do we represent this problem mathematically — precisely enough for a computer to reason about it? Second: how do we search for a path without looping forever or wasting exponential work? Third: when some routes are cheaper than others — slippery floors, detour penalties — how do we find the optimal path?
The canvas below is the warehouse. Click any gray cell to toggle a wall (blocked aisle). The green cell is the start; the red cell is the goal. Try building mazes. Notice which configurations trap the robot completely — no path exists at all. Detecting "no path" is just as important as finding a path: a robot that searches forever for an impossible route is useless in production.
Click any gray cell to toggle a wall. Green = start (0,0). Red = goal (8,8). Every cell is a state. Every valid move is an action. This IS the state transition graph.
The naive approach — enumerate all possible paths, compute their costs, return the minimum — is computationally catastrophic. In a 9×9 grid, the number of self-avoiding paths from corner to corner can be astronomical (tens of millions). We need smarter algorithms that avoid ever explicitly enumerating paths.
The key insight: instead of thinking about paths (exponentially many), think about states (only 81). There are only 81 possible positions the robot can be in. If we track which states we've visited and build up costs incrementally, the computation is polynomial in |X| — not exponential in path length. This is the fundamental reason graph search algorithms work.
One more thing worth noticing: the number of states explodes in harder problems. Our 9×9 grid has 81 states — trivial for any computer. A Rubik's cube has 4.3 × 1019 states. A robot arm with 6 joints, each discretized to 360 angles, has 3606 ≈ 2.2 × 1015 states. The algorithms we build need to work on spaces too large to enumerate explicitly — which is why the theory matters.
The warehouse scenario neatly illustrates the three sub-problems Chapter 2 addresses. Notice how each one builds on the previous:
| Problem | Question | Answer algorithm |
|---|---|---|
| Feasibility | Does any path exist from start to goal? | BFS or DFS |
| Minimum steps | Which path uses the fewest moves? | BFS (wavefront guarantees it) |
| Minimum cost | Which path has lowest total cost (muddy/paved mix)? | Dijkstra or A* |
These three problems share the same mathematical framework — Formulation 2.1 — but differ in what they optimize. A key lesson from LaValle: choosing the right algorithm starts with identifying which problem you actually have. Running Dijkstra when you want minimum steps is correct but wasteful. Running BFS when you want minimum cost gives wrong answers.
Chapter 2 also draws a boundary LaValle is careful to respect: these algorithms assume the world is deterministic and fully observable. The robot knows exactly where it is (full observability), and actions always have their intended effect (determinism). Relax either assumption — the robot's sensors are noisy, or the floor is slippery — and you're in Chapters 10–12 (probabilistic planning). The discrete planning of Chapter 2 is the foundation: master it, and the probabilistic extensions become natural generalizations.
Before writing a single line of search code, LaValle insists on nailing the problem down precisely. Formulation 2.1 gives every discrete planning problem exactly five ingredients. Get these five right and the algorithm almost writes itself.
| Symbol | Name | Meaning |
|---|---|---|
| X | State space | All possible situations the system can be in |
| U(x) | Action space | Actions available at state x (varies by state) |
| f(x, u) | Transition function | Next state when taking action u from state x |
| xI | Initial state | Where we start |
| XG | Goal set | The set of states we want to reach |
1. The state space X. For our 9×9 warehouse grid: X = {(i, j) : i, j ∈ {0, …, 8}, (i, j) not a wall}. With no walls, |X| = 81. But the concept scales brutally: a Rubik's cube has |X| = 43,252,003,274,489,856,000 states. A 19×19 Go board has more states than atoms in the observable universe. The state must capture all and only the information needed to determine what happens next.
2. The action space U(x). At most interior cells: U(x) = {up, down, left, right} = {(−1,0), (1,0), (0,−1), (0,1)}. But at a boundary or next to a wall, some moves leave X, so they're removed from U(x). This dependence on x is critical — it's why we write U(x) and not just U. At different states, different actions are legal.
3. The transition function f(x, u) = x + u. For our grid this is pure vector addition. Worked example: x = (3, 4), u = (0, 1) → f((3,4), (0,1)) = (3, 5). The robot moved one tile up. For more complex systems — robot arms, game boards, autonomous vehicles — f is much more involved, but the concept is identical: it maps (state, action) to next state.
4. The initial state xI. The loading dock, xI = (0, 0). This is where planning starts. In some problems, xI is also a set (multiple possible starts), but Formulation 2.1 takes it as a single state.
5. The goal set XG. For one target shelf: XG = {(8, 8)}. But goals can be sets: any charging dock, any empty parking space, any configuration with the cargo delivered. Planning succeeds as soon as the robot reaches any x ∈ XG, regardless of which goal state.
Once you have X, U(x), and f, you implicitly define a directed graph. Vertices are states in X. There is a directed edge from x to x' if and only if there exists some u ∈ U(x) such that f(x, u) = x'. This is the state transition graph, and it is the central object of Chapter 2.
Planning = find a directed path in G from xI to any vertex in XG. That's the whole game. BFS, DFS, Dijkstra, A*, and value iteration are all just different strategies for searching this graph.
For a 9×9 grid, you could in principle build the full transition graph in memory: 81 vertices, ~288 edges, an adjacency list of ~1KB. But for the Rubik's cube with 4.3 × 1019 states, building the graph explicitly would require more RAM than exists on Earth. Instead, the graph is kept implicit — defined by the functions X, U(x), and f(x,u) rather than a stored data structure.
This is why Formulation 2.1 lists X, U(x), f rather than "an adjacency list." The functions are a compact description of a potentially enormous graph. The search algorithms — BFS, Dijkstra, A* — only generate the nodes they need. A* on the Rubik's cube might explore a few million nodes out of 4.3 × 1019 before finding the solution. The rest of the graph is never instantiated.
A 5×5 grid with the transition graph overlaid. Arrows show valid moves (directed edges). Click a cell to place a wall — watch its edges disappear and edges pointing to it vanish too.
For a 3×3 grid with no walls, let's explicitly list the state transition graph. This helps build intuition before we move to larger spaces:
python # Build the explicit state transition graph for a 3×3 grid prob = GridPlanningProblem(rows=3, cols=3) edges = [] for x in prob.states(): for u in prob.actions(x): xp = prob.f(x, u) edges.append((x, xp)) print(f"|V| = {len(prob.states())}") # 9 states print(f"|E| = {len(edges)}") # 24 directed edges # Show adjacency for (1,1) — the center cell center = (1, 1) neighbors = [prob.f(center, u) for u in prob.actions(center)] print(f"Neighbors of {center}: {neighbors}") # → [(0,1), (2,1), (1,0), (1,2)] — all 4 directions, center has them all corner = (0, 0) neighbors_corner = [prob.f(corner, u) for u in prob.actions(corner)] print(f"Neighbors of {corner}: {neighbors_corner}") # → [(1,0), (0,1)] — only 2 directions; corner has fewer options
The center cell (1,1) has 4 edges. Corner cells have 2 edges. Edge cells have 3 edges. For N×N: total edges ≈ 2×N×(N−1)×2 = 4N(N−1). For N=9: 4×9×8 = 288 edges — matching our earlier estimate.
An important subtlety: f is only applied when u ∈ U(x). Before calling f, the algorithm checks "is this action legal?" If (5,3) is adjacent to a wall on the left, then (−1,0) ∉ U((5,3)) and f is never called. The state space definition handles this implicitly by only including valid actions in U(x).
python class GridPlanningProblem: """Formulation 2.1 for a rectangular 2D grid.""" def __init__(self, rows, cols, walls=None, x_I=(0,0), X_G=None): self.rows = rows self.cols = cols self.walls = walls or set() # set of (i,j) tuples self.x_I = x_I self.X_G = X_G or {(rows-1, cols-1)} # State space X — all non-wall cells (implicit) def in_bounds(self, x): i, j = x return 0 <= i < self.rows and 0 <= j < self.cols # Action space U(x) — only moves that stay in X MOVES = [(-1,0), (1,0), (0,-1), (0,1)] # up, down, left, right def actions(self, x): i, j = x return [u for u in self.MOVES if self.in_bounds((i+u[0], j+u[1])) and (i+u[0], j+u[1]) not in self.walls] # Transition function f(x, u) = x + u (vector addition) def f(self, x, u): return (x[0] + u[0], x[1] + u[1]) # Edge cost l(x, u) — uniform for now, overridden later def cost(self, x, u): return 1 # Example usage — Formulation 2.1 for a 9×9 warehouse grid prob = GridPlanningProblem( rows=9, cols=9, walls={(3,3),(3,4),(4,3),(5,5),(6,4)}, x_I=(0,0), X_G={(8,8)} ) # Check: what actions are available at (3,2)? print("U((3,2)):", prob.actions((3,2))) # → [(-1,0), (1,0), (0,-1)] — (0,1) blocked by wall at (3,3) # Apply transition: f((3,2), (-1,0)) print("f((3,2),(-1,0)):", prob.f((3,2), (-1,0))) # → (2, 2) — moved up
The Markov property says: everything you need to know about the future is encoded in the current state x. History doesn't matter — only where you are right now. For our grid robot, position (i, j) satisfies this: knowing you're at (3, 4) tells you exactly what moves are available and where they lead, regardless of how you got there.
But Markov breaks down in surprising ways. Consider a robot that needs to carry a box from shelf A to dock B. Just knowing the robot's position isn't enough — the state must also include whether it's currently holding the box. Fail to include "holding" in x, and f(x, u) is undefined (does "move right" take the box along or leave it behind?). The fix: x = (robot_position, is_holding). State space doubles, but now f is well-defined.
Formulation 2.1 can represent a surprising variety of problems beyond navigation. The key is choosing X, U(x), and f correctly:
| Problem | State x | Action u | f(x, u) |
|---|---|---|---|
| Grid navigation | (row, col) | (Δrow, Δcol) | x + u |
| Puzzle (8-tile) | Tile configuration | Slide a tile | New configuration |
| Rubik's cube | Face permutation | Rotate a face | New permutation |
| Pick-and-place robot | (position, holding?) | Move/grab/release | Updated (pos, holding) |
| Turn-based game | Board position + turn | Legal move | New board + other turn |
Once you've cast a problem into Formulation 2.1, the same search algorithms — BFS, Dijkstra, A* — apply without modification. This universality is the deepest insight of Chapter 2.
The state space size |X| is the single most important factor in planning difficulty. Here's how different problems compare:
| Problem | |X| (states) | Tractable? |
|---|---|---|
| 9×9 warehouse grid | 81 | Instant |
| 100×100 city grid | 10,000 | Milliseconds |
| 10×10×10 3D maze | 1,000 | Instant |
| Rubik's cube | 4.3 × 1019 | Only with heuristics |
| 15-tile sliding puzzle | 1013 | Only with A* + pattern DBs |
| Chess (mid-game positions) | ~1047 | Requires game-specific methods |
Discrete planning is only feasible when |X| is manageable — or when heuristics focus the search on a small fraction of X. The choice of state representation is therefore crucial: a bad representation inflates |X| unnecessarily, while a clever one keeps it tight.
We have a state space, a transition function, and a goal. Now: how do we find the path? LaValle's Figure 2.4 presents a single forward search algorithm that subsumes BFS, DFS, Dijkstra, and A* as special cases. The only difference between them is how one data structure — the priority queue Q — is sorted. Learn this one algorithm, understand all four.
During the search, every state in X belongs to exactly one of three categories. Understanding these categories is key to understanding why the algorithm is correct.
| Category | Meaning | Where it lives |
|---|---|---|
| Unvisited | Never encountered. The algorithm doesn't know it exists yet. | Not in Q, not in visited set |
| Alive | Discovered — added to Q — but not yet fully processed. May have unvisited neighbors. | In Q |
| Dead | Fully processed. Every neighbor has been discovered. Will never be revisited. | Removed from Q, in visited set |
The algorithm starts with only xI alive. Every other state is unvisited. The algorithm ends when either (a) a goal state is removed from Q — SUCCESS — or (b) Q is empty and no goal was found — FAILURE.
LaValle uses the word systematic precisely. On a finite graph: systematic means eventually visiting every state reachable from xI. On an infinite graph: systematic means finding a solution in finite time, if one of finite length exists. The visited-state check at step 5 is exactly what makes the algorithm systematic — without it, cycles would cause infinite loops.
Being systematic is a guarantee, not a hope. A systematic algorithm can tell you definitively "no path exists" — because it will have exhausted every reachable state. A non-systematic algorithm that loops on cycles can never safely report failure. This is why the visited-state check is non-negotiable in production planners.
Everything so far searches forward from xI toward XG. LaValle also discusses backward search: start from XG and search backward toward xI. At each step, instead of asking "which states can I reach from x?", ask "which states can reach x?"
Backward search uses a backward action space U−1(x): the set of actions that could lead into state x. For our grid, backward actions are the same as forward actions (the grid is undirected), but in general problems (one-way doors, asymmetric transitions), they differ. Chapter 8 of this lesson covers backward search in detail.
The payoff: bidirectional search runs both simultaneously, meeting in the middle. Theoretical complexity drops from O(bd) to O(bd/2) — an exponential improvement. On large graphs, bidirectional BFS can be 1000× faster than forward BFS alone.
Watch BFS (FIFO queue) explore a 7×7 grid. Orange = alive (in Q). Teal = dead (processed). Gray = unvisited. Path shown in yellow when goal is reached.
Let's trace forward search (BFS) on a tiny 3×3 grid, no walls. Start xI = (0,0), goal XG = {(2,2)}. Neighbors explored in order: right, down (so (i,j+1) before (i+1,j)).
| Iter | x extracted | Q before | States discovered | Q after |
|---|---|---|---|---|
| Init | — | — | xI=(0,0) | [(0,0)] |
| 1 | (0,0) | [(0,0)] | (0,1),(1,0) | [(0,1),(1,0)] |
| 2 | (0,1) | [(0,1),(1,0)] | (0,2),(1,1) | [(1,0),(0,2),(1,1)] |
| 3 | (1,0) | [(1,0),(0,2),(1,1)] | (2,0),(1,1)→skip | [(0,2),(1,1),(2,0)] |
| 4 | (0,2) | [(0,2),(1,1),(2,0)] | (1,2) | [(1,1),(2,0),(1,2)] |
| 5 | (1,1) | [(1,1),(2,0),(1,2)] | (2,1),(1,2)→skip | [(2,0),(1,2),(2,1)] |
| 6 | (2,0) | [(2,0),(1,2),(2,1)] | (2,1)→skip | [(1,2),(2,1)] |
| 7 | (1,2) | [(1,2),(2,1)] | (2,2) | [(2,1),(2,2)] |
| 8 | (2,1) | [(2,1),(2,2)] | — | [(2,2)] |
| 9 | (2,2) | [(2,2)] | GOAL! Return | — |
Parent pointers: parent[(0,1)]=(0,0), parent[(1,0)]=(0,0), parent[(0,2)]=(0,1), parent[(1,1)]=(0,1), parent[(2,0)]=(1,0), parent[(1,2)]=(0,2), parent[(2,1)]=(1,1), parent[(2,2)]=(1,2).
Trace back: (2,2) → parent=(1,2) → parent=(0,2) → parent=(0,1) → parent=(0,0). Reversed: (0,0)→(0,1)→(0,2)→(1,2)→(2,2). A 4-step path — the optimal for a 3×3 grid with no walls.
The algorithm tells you the path exists; parent pointers tell you what it is. The trick: every time you discover a new state x' from x, record parent[x'] = x. At the end, you have a chain from goal back to start. Trace it:
Reverse this list and you have the forward path. The action at each step is uk = xk+1 − xk (for grid problems). The plan length K equals the path length minus 1.
A subtle point: parent pointers implicitly define a shortest-path tree rooted at xI. Every state in the tree has a unique parent — the state from which it was first discovered. This tree covers all reachable states, not just the goal. If you need paths to multiple goals, BFS builds this tree for free.
python from collections import deque def forward_search(prob, mode='bfs'): """ Generic forward search (LaValle Figure 2.4). mode='bfs' → FIFO queue (breadth-first, shortest path) mode='dfs' → LIFO stack (depth-first, any path) """ x_I = prob.x_I X_G = prob.X_G visited = {x_I} parent = {x_I: None} if mode == 'bfs': Q = deque([x_I]) def get_first(): return Q.popleft() # FIFO def insert(x): Q.append(x) else: Q = [x_I] def get_first(): return Q.pop() # LIFO def insert(x): Q.append(x) while Q: # step 2: is Q empty? x = get_first() # step 3: GetFirst if x in X_G: # step 4: goal check path = [] cur = x while cur is not None: path.append(cur) cur = parent[cur] return list(reversed(path)) # path from x_I to goal for u in prob.actions(x): # step 5: expand xp = prob.f(x, u) if xp not in visited: visited.add(xp) parent[xp] = x insert(xp) return None # FAILURE — no path exists
python # Complete end-to-end planning on the 9×9 warehouse grid prob = GridPlanningProblem( rows=9, cols=9, walls={(3,3),(3,4),(3,5),(4,3),(5,3),(5,4),(5,5)}, # U-shaped obstacle x_I=(0,0), X_G={(8,8)} ) # Run BFS (forward search with FIFO) path = forward_search(prob, mode='bfs') if path: print(f"Path found! Length = {len(path)-1} steps") print(f"Path: {' → '.join(str(s) for s in path)}") # Reconstruct actions from path actions = [(path[k+1][0]-path[k][0], path[k+1][1]-path[k][1]) for k in range(len(path)-1)] direction = {(-1,0):'↑',(1,0):'↓',(0,-1):'←',(0,1):'→'} print(f"Actions: {''.join(direction[a] for a in actions)}") else: print("No path exists — goal is unreachable")
Here is the key insight that makes the master algorithm so powerful: BFS and DFS are the same algorithm. Change one data structure — the order of Q — and the entire character of the search transforms. Understanding why this is true, and what the consequences are, is the subject of this chapter.
States discovered earliest are processed first. The effect: the search expands in concentric wavefronts. All states reachable in 1 step from xI are processed before any state reachable in 2 steps. All 2-step states before 3-step states. And so on.
Most recently discovered state is processed first. The effect: the search plunges deep — it follows one direction as far as possible before backtracking to try another. Like exploring a maze by always turning left.
Despite its drawbacks, DFS is preferred in several scenarios:
It's worth understanding why BFS gives the shortest path, not just that it does. The argument is short: BFS processes all states at distance d before any state at distance d+1. So when BFS first reaches the goal, it has explored all paths of length ≤ d — and found none that work (else it would have stopped). Therefore the d-step path it found must be optimal.
DFS doesn't have this property because it might find a long path — say d = 20 — even though a 3-step path exists. DFS just happens not to have explored that 3-step path yet when it found the 20-step one.
Start (0,0), Goal (4,4), wall at (2,2). Neighbors explored in priority order: down, right, up, left.
| Step | BFS — Queue [FIFO] | DFS — Stack [LIFO] |
|---|---|---|
| Init | Q = [(0,0)] | S = [(0,0)] |
| 1 | Pop (0,0); push (1,0),(0,1). Q=[(1,0),(0,1)] | Pop (0,0); push (0,1),(1,0). S=[(0,1),(1,0)] |
| 2 | Pop (1,0); push (2,0),(1,1). Q=[(0,1),(2,0),(1,1)] | Pop (1,0); push (1,1),(2,0). S=[(0,1),(1,1),(2,0)] |
| 3 | Pop (0,1); push (0,2). Q=[(2,0),(1,1),(0,2)] | Pop (2,0); push (3,0). S=[(0,1),(1,1),(3,0)] |
| 4 | Pop (2,0); push (3,0). Q=[(1,1),(0,2),(3,0)] | Pop (3,0); push (4,0). S=[(0,1),(1,1),(4,0)] |
| … | Wavefront expands uniformly outward | Dives: (4,0)→(4,1)→(4,2)→(4,3)→(4,4) GOAL |
| Final | 8-step optimal path | 10-step non-optimal path |
BFS finds the 8-step shortest path. DFS finds a valid but longer 10-step path — it dived south then east along the bottom edge rather than taking the diagonal shortcut. Both are correct plans; only BFS is guaranteed to be the shortest.
Same 7×7 grid, same start and goal. Left: BFS expands in uniform waves. Right: DFS dives aggressively. Orange = alive. Teal = dead. Counters show nodes expanded so far.
A natural question: can we get BFS's optimality guarantee with DFS's memory efficiency? Yes — Iterative Deepening Depth-First Search (IDDFS) does exactly this. It runs DFS with a depth limit d = 1, 2, 3, … until the goal is found. At each iteration, any path longer than the current limit is cut off.
IDDFS is optimal (finds minimum-step path), complete, and uses only O(bd) memory. The cost: it re-expands states on each iteration. But the work is dominated by the final iteration, so the total extra work is only a constant factor (about b/(b−1)). For b=4 (grid), IDDFS does 33% more work than BFS while using a fraction of the memory.
| Algorithm | Optimal? | Complete? | Time | Space |
|---|---|---|---|---|
| BFS | Yes (unit cost) | Yes (finite+infinite) | O(bd) | O(bd) |
| DFS | No | Yes (finite only) | O(bm) m=max depth | O(bm) |
| IDDFS | Yes (unit cost) | Yes (finite+infinite) | O(bd) | O(bd) |
| Dijkstra | Yes (weighted) | Yes (non-neg costs) | O((V+E)log V) | O(V) |
| A* | Yes (admissible h) | Yes (non-neg costs) | O(bd) worst | O(bd) |
python def iddfs(prob): """Iterative deepening DFS — optimal, complete, O(bd) memory.""" def dls(x, goal, depth, path, visited): """Depth-limited search.""" if x in goal: return path[:] # found! if depth == 0: return None # depth limit hit for u in prob.actions(x): xp = prob.f(x, u) if xp not in visited: visited.add(xp) path.append(xp) result = dls(xp, goal, depth-1, path, visited) if result is not None: return result path.pop() visited.discard(xp) return None for d in range(1, 10000): # depth limit grows: 1, 2, 3, ... result = dls( prob.x_I, prob.X_G, d, path=[prob.x_I], visited={prob.x_I} ) if result is not None: return result return None
BFS finds the path with fewest steps. But what if steps have different costs? Crossing a muddy aisle costs 3 time units; crossing a paved aisle costs 1. Two 8-step paths might cost 12 and 32 respectively. BFS cannot distinguish them. Dijkstra's algorithm sorts Q by cost instead of order, finding the minimum-cost path.
Dijkstra = forward search where Q is a min-heap sorted by cost-to-come C(x). The cost-to-come C(x) is the total cost of the cheapest path found so far from xI to x. When x is extracted from Q (the global minimum), its cost becomes proven optimal — written C*(x).
Here l(x, u) is the edge cost — the cost of taking action u at state x. When x' is first discovered, C(x') = C*(x) + l(x, u). If x' is later discovered again via a different, cheaper path, C(x') is updated downward (decrease-key in the heap).
The proof is elegant and worth knowing. Claim: when a state x is extracted from Q (minimum C element), C(x) = C*(x). Proof by induction:
Dijkstra is the backbone of GPS routing. A city road network has intersections as states and roads as edges, with travel time as edge cost. Hundreds of millions of intersections — far too large to enumerate explicitly. In practice, Google Maps and similar systems use Dijkstra with two key optimizations: (1) bidirectional Dijkstra runs two simultaneous searches (forward from source, backward from destination) that meet in the middle, reducing search to roughly √|V| nodes each; (2) highway hierarchies precompute long-distance routes so queries stay local. Both are Formulation 2.2 under the hood.
The Open Source Routing Machine (OSRM), used by many navigation apps, uses Contraction Hierarchies: a preprocessing step that ranks nodes by importance and adds "shortcut" edges (if the fastest A→B→C path is known, add direct A→C shortcut). After preprocessing, Dijkstra on the contracted graph answers continent-scale queries in under a millisecond. The core algorithm is still Dijkstra; the speedup comes from a smarter graph representation.
Dijkstra reduces to BFS when all edge costs are equal (l(x,u) = constant). In that case, cost-to-come is just the number of steps, so the min-heap ordering is identical to FIFO ordering. Running Dijkstra on unit-cost graphs is correct but slower than BFS (heap operations are O(log n) vs O(1) queue operations). Choose BFS when costs are uniform; choose Dijkstra when they're not.
Horizontal edges cost 1. Vertical edges cost 2. Start (0,0), Goal (3,3). We process states in cost order:
| Extract | C*(x) | Updates to neighbors |
|---|---|---|
| (0,0) | 0 | (0,1)→C=1, (1,0)→C=2 |
| (0,1) | 1 | (0,2)→C=2, (1,1)→C=3 |
| (1,0) | 2 | (1,1)→min(3,4)=3 no update, (2,0)→C=4 |
| (0,2) | 2 | (0,3)→C=3, (1,2)→C=4 |
| (1,1) | 3 | (1,2)→min(4,5)=4, (2,1)→C=5 |
| (0,3) | 3 | (1,3)→C=5 |
| … | … | Eventually (3,3) extracted with C*=6 |
The minimum-cost path goes right three times then down three times: (0,0)→(0,1)→(0,2)→(0,3)→(1,3)→(2,3)→(3,3). Cost = 3 horizontal moves × 1 + 3 vertical moves × 2 = 3 + 6 = 9. Any other path to (3,3) would mix horizontal/vertical differently — but since horizontal is cheaper, the optimal strategy is to exhaust horizontal movement first. Dijkstra discovers this automatically.
Contrast with BFS: it would find a path with 6 steps (the minimum step count), but that path might mix horizontal and vertical moves — say 3+3 — costing the same 9. Or it might find a different 6-step path that cost 11. BFS cannot distinguish; Dijkstra can.
Each cell shows its current C(x). Orange = alive (in heap). Teal = dead (settled). The min-heap extracts the cheapest alive state each step. Path shown in yellow at the end.
python import heapq from math import inf def dijkstra(prob): """ Dijkstra's algorithm. Returns (path, cost) or (None, inf). prob.cost(x, u) must return non-negative edge cost l(x, u). """ x_I = prob.x_I X_G = prob.X_G C = {x_I: 0} # cost-to-come estimates parent = {x_I: None} # for path reconstruction heap = [(0, x_I)] # min-heap: (cost, state) while heap: c, x = heapq.heappop(heap) # Skip stale entries (x was already settled with lower cost) if c > C.get(x, inf): continue # Goal check — C*(x) is now proven optimal if x in X_G: path = [] cur = x while cur is not None: path.append(cur) cur = parent[cur] return list(reversed(path)), c # Expand x for u in prob.actions(x): xp = prob.f(x, u) new_c = c + prob.cost(x, u) # C*(x) + l(x,u) if new_c < C.get(xp, inf): C[xp] = new_c parent[xp] = x heapq.heappush(heap, (new_c, xp)) return None, inf # FAILURE — no path exists
With a binary heap: O((|V| + |E|) log |V|). With a Fibonacci heap (more complex implementation): O(|V| log |V| + |E|) — faster for dense graphs. For our 9×9 grid: |V| = 81, |E| ≈ 288, so Dijkstra finishes in microseconds. For a city map with 106 intersections and 2.5×106 roads, it takes a few hundred milliseconds — still practical.
| Data structure for Q | Complexity | Best for |
|---|---|---|
| Binary heap (Python heapq) | O((|V| + |E|) log |V|) | Sparse graphs (most planning) |
| Fibonacci heap | O(|V| log |V| + |E|) | Dense graphs, large |E|/|V| |
| Array (unsorted) | O(|V|²) | Very dense graphs, small |V| |
Dijkstra is optimal but blind. It expands states in all directions equally — even away from the goal. If you're navigating from New York to Los Angeles, Dijkstra spends time exploring Boston. A* search adds a compass: a heuristic estimate of the remaining cost, so the algorithm spends time expanding states that look promising.
A* sorts Q by:
where C*(x) is the proven cost-to-come (same as Dijkstra) and Ĝ*(x) is a heuristic estimate of cost-to-go — how much it costs to reach the goal from x. We can't compute the true G*(x) exactly (that's the whole problem!), so we estimate it with a function we can evaluate cheaply.
A heuristic is admissible if it never overestimates the true cost-to-go:
If Ĝ* is admissible, A* is guaranteed to find the optimal path. The proof uses the same inductive argument as Dijkstra, extended: when a state x is extracted with minimum f = C*(x) + Ĝ*(x), any cheaper path would require a node y on the frontier with f(y) ≤ f(x) — but x was the minimum, so no such y exists.
For our warehouse grid with unit costs and goal g = (gi, gj):
This is admissible because the robot must take at least this many steps — it can't teleport, and grid movement is along axes only. Manhattan distance is exact when there are no walls (every step makes optimal progress). With walls, the true cost is larger, so the estimate remains a lower bound — admissible by construction.
For Rubik's cube and sliding puzzles, Manhattan distance is too weak — it ignores interactions between tiles. The state-of-the-art uses pattern databases: precompute the exact optimal cost to solve a sub-problem (e.g., just the corner cubie configuration) for every possible configuration of that sub-problem. Store this in a table. At runtime, look up the table value as Ĝ*. This is admissible (solving just corners is easier than the full cube) and extremely tight — usually within 2× of the true G*.
For our warehouse grid, we don't need this — Manhattan distance is tight and free to compute. But the pattern database idea illustrates a general principle: any precomputed lower bound on G*(x) is an admissible heuristic.
A useful recipe for building admissible heuristics: solve a relaxed version of the problem. Remove constraints from the original problem, solve the easier version, and use that optimal cost as Ĝ*. Since the relaxed problem is easier, its optimal cost ≤ the original's — admissibility by construction.
| Original problem | Relaxation | Resulting heuristic |
|---|---|---|
| Grid with walls | Remove all walls | Manhattan distance |
| Grid with walls | Allow diagonal moves | Chebyshev distance = max(|Δr|, |Δc|) |
| Grid with walls | Allow teleportation | h = 0 (admissible but useless) |
| Rubik's cube | Solve each face independently | Pattern database heuristic |
Tighter relaxations give better heuristics. Manhattan distance (remove walls, keep grid) is tighter than Euclidean distance (allow straight-line movement through walls). Both admissible; Manhattan gives fewer expanded nodes because it's closer to the true G*.
| Sort key for Q | Algorithm | Optimal? | Behavior |
|---|---|---|---|
| insertion order | BFS (unit costs) | Yes (fewest steps) | Uniform wavefront |
| C*(x) | Dijkstra | Yes (minimum cost) | Uniform cost wavefront |
| C*(x) + Ĝ*(x), admissible | A* | Yes (minimum cost) | Directed toward goal |
| Ĝ*(x) only | Greedy best-first | No | Fastest but can get stuck |
| C*(x) + Ĝ*(x), inadmissible | Weighted A* | No (bounded suboptimal) | Faster but potentially suboptimal |
If Ĝ*(x) = 0 for all x → A* degenerates to Dijkstra (the heuristic provides no guidance). If Ĝ*(x) = G*(x) exactly → A* expands only nodes on the optimal path (perfect, oracle heuristic — impossible in general). Admissibility is the sweet spot: any admissible Ĝ* gives you optimality. Better admissible heuristics (closer to G*) give faster search. The challenge is designing heuristics that are tight but still admissible.
A heuristic is consistent (also called monotone) if for every state x and action u leading to x':
This is the triangle inequality for heuristics: the estimated cost from x cannot be more than one step's cost plus the estimated cost from x'. Consistency implies admissibility (but not vice versa). The practical benefit: with a consistent heuristic, A* never needs to re-open already-closed states. Each state is settled exactly once — like Dijkstra — making the implementation simpler and avoiding a correctness edge case.
Manhattan distance on a grid with unit costs is consistent: |xi − gi| + |xj − gj| ≤ 1 + |x'i − gi| + |x'j − gj| holds because one step can decrease Manhattan distance by at most 1. Any admissible heuristic derived from a problem relaxation is typically consistent too.
Same maze. Left: Dijkstra expands uniformly. Right: A* (Manhattan heuristic) expands toward the goal. Both find the same optimal path. Counters show nodes expanded — watch A* do more with less.
python import heapq from math import inf def manhattan(x, goal): return abs(x[0] - goal[0]) + abs(x[1] - goal[1]) def astar(prob, heuristic=None): """ A* search with a pluggable heuristic. heuristic(x) → estimated cost-to-go from x. Must be admissible for optimal guarantee. """ x_I = prob.x_I goal = list(prob.X_G)[0] # single goal for simplicity h = heuristic or (lambda x: manhattan(x, goal)) C = {x_I: 0} parent = {x_I: None} heap = [(h(x_I), 0, x_I)] # (f = C + h, C, state) while heap: f, c, x = heapq.heappop(heap) if c > C.get(x, inf): # stale entry continue if x == goal: path = [] cur = x while cur is not None: path.append(cur) cur = parent[cur] return list(reversed(path)), c for u in prob.actions(x): xp = prob.f(x, u) new_c = c + prob.cost(x, u) if new_c < C.get(xp, inf): C[xp] = new_c parent[xp] = x # Sort by f = cost-to-come + heuristic heapq.heappush(heap, (new_c + h(xp), new_c, xp)) return None, inf # Inadmissible heuristic — faster but may be suboptimal def weighted_astar(prob, w=2.0): goal = list(prob.X_G)[0] h = lambda x: w * manhattan(x, goal) # overestimates by w× return astar(prob, heuristic=h) # still complete, not optimal
A* truly shines when the state space has a natural metric and the goal is in a "known direction." Grid planning is the canonical case: Manhattan distance tells the algorithm "the goal is down and to the right, so expand downward and rightward states first." Dijkstra doesn't know this — it expands a full disc of equal-cost states in every direction.
A* performs poorly when: (a) the heuristic is weak (Ĝ*(x) ≈ 0 everywhere → degenerates to Dijkstra), or (b) the optimal path requires going "backwards" relative to the goal — e.g., a maze that forces a long detour away from the goal before arriving. In case (b), Ĝ* misleads the search initially, and A* may expand more nodes than Dijkstra before discovering the true path requires backtracking.
5×5 grid, unit costs, wall at (1,2), (2,2), (3,2). Start (0,0), Goal (4,4). f(x) = C*(x) + Manhattan(x, (4,4)).
| Step | Dijkstra extracts (C*) | A* extracts (f = C* + h) |
|---|---|---|
| 1 | (0,0) C=0 | (0,0) f=0+8=8 |
| 2 | (1,0) C=1 or (0,1) C=1 | (1,0) f=1+7=8 or (0,1) f=1+7=8 |
| 3 | (2,0),(1,1),(0,2) C=2 (all tied) | (1,1) f=2+6=8 — same or (2,0) f=2+6=8 |
| … | Expands all of left side of grid | Focuses on lower-right quadrant |
| Final | ~20 nodes expanded | ~12 nodes expanded |
A* discovers the wall at column 2 and routes around it to the right — fewer wasted expansions because it knew to go right and down from the start. Dijkstra expends equal effort exploring the left side of the grid even though it leads away from the goal.
So far we've asked: does a path exist? Now we ask: among all paths, which is best? This is Formulation 2.2 — optimization layered on top of Formulation 2.1. It introduces a cost functional that evaluates the quality of any complete plan.
A plan of length K is a sequence of K actions (u1, u2, …, uK) applied starting from xI. Applying them produces a state sequence: x1 = xI, and xk+1 = f(xk, uk) for k = 1, …, K. The final state xF = xK+1.
The cost functional L assigns a number to any plan:
Two components:
LaValle defines the terminal cost with a beautiful trick:
The power of Formulation 2.2 is that one algorithm (Dijkstra / A*) solves all these problems by swapping in a different l:
| l(x, u) | L minimizes | Algorithm used |
|---|---|---|
| l(x, u) ≡ 0 | Nothing — just reach the goal | BFS (any path) |
| l(x, u) ≡ 1 | Number of steps in the plan | BFS (fewest steps) |
| l(x, u) = Euclidean distance | Total path length in meters | Dijkstra / A* |
| l(x, u) = energy(u) | Total energy consumed | Dijkstra / A* |
| l(x, u) = time(x, u) | Total travel time | Dijkstra / A* |
| l(x, u) = danger(x) | Total danger exposure | Dijkstra / A* |
The cost functional L has a recursive structure that is the foundation of dynamic programming and value iteration. Notice: if you split the plan πK into a first step (x1, u1) and the remaining plan πK-1 from x2:
This is Bellman's principle of optimality: the tail of an optimal plan is itself optimal. If π* is optimal from xI to XG, then the sub-plan from x2 onward must be optimal from x2 to XG. Otherwise, we could replace the tail with a cheaper one, contradicting optimality of π*.
One underappreciated feature of Formulation 2.2 is that it provides a language for expressing what you want — separate from how you compute it. You write down l(x, u) and lF(x) to describe your objective. Then you hand it to Dijkstra (or A* or value iteration) and get the answer. The algorithm doesn't care what l means physically — fuel, time, money, danger — it just minimizes it.
This separation of concerns — "what to optimize" vs "how to optimize it" — is the hallmark of good mathematical formalism. It's why Formulation 2.2 is worth memorizing: once you've cast your problem into it, the solution method follows automatically.
What if you want to minimize both total time AND total energy? These are two different cost functionals Ltime and Lenergy. Optimizing both simultaneously is a multi-objective planning problem — beyond Formulation 2.2, which handles only one scalar objective. The standard approach: form a weighted combination L = w1Ltime + w2Lenergy and solve with Dijkstra. Different weights give different Pareto-optimal plans. LaValle covers multi-objective extensions in later chapters.
In Formulation 2.2, we optimize over plans of a fixed length K. For each fixed K, this is a finite optimization — tractable by dynamic programming. But we don't know K in advance. LaValle handles this in two ways: (a) take the minimum over all K, which leads to the backward induction / value iteration framework (Chapter 7 of this lesson), or (b) use Dijkstra/A* which implicitly searches over all lengths simultaneously via the cost-to-come.
Click cells to cycle their move cost: 1 → 2 → 3 → wall → 1. Numbers shown in cells. The optimal path (yellow) updates via Dijkstra as you change costs. L(π*) displayed below.
python from math import inf import heapq class WeightedGridProblem(GridPlanningProblem): """Formulation 2.2: adds edge costs l(x,u) and terminal cost l_F.""" def __init__(self, rows, cols, walls=None, cell_costs=None, x_I=(0,0), X_G=None): super().__init__(rows, cols, walls, x_I, X_G) # cell_costs: dict (i,j) → cost to ENTER that cell self.cell_costs = cell_costs or {} # Running cost l(x, u): cost to enter the state we move to def cost(self, x, u): xp = self.f(x, u) return self.cell_costs.get(xp, 1) # default cost 1 # Terminal cost l_F(x): 0 if goal, ∞ otherwise def l_F(self, x): return 0 if x in self.X_G else inf # Cost functional L(π_K) def L(self, plan): """Evaluate cost functional on a plan (sequence of states).""" if len(plan) < 2: return self.l_F(plan[0]) total = 0 for k in range(len(plan) - 1): u = (plan[k+1][0] - plan[k][0], plan[k+1][1] - plan[k][1]) total += self.cost(plan[k], u) # running cost total += self.l_F(plan[-1]) # terminal cost return total # Example: robot must avoid expensive muddy zones prob = WeightedGridProblem( rows=7, cols=7, cell_costs={(2,3):5, (3,3):5, (4,3):5}, # muddy corridor x_I=(0,0), X_G={(6,6)} ) path, cost = dijkstra(prob) print(f"Optimal cost: L(π*) = {cost}") print(f"L evaluated directly: {prob.L(path)}") # same number
Dijkstra computes exactly L(π*) — the minimum value of L over all plans reaching XG. When Dijkstra extracts a goal state g from Q with cost C*(g), that IS the minimum of L. The lF = ∞ for non-goal states means any plan ending outside XG has L = ∞ — Dijkstra would never output such a plan as optimal. Formulation 2.2 and Dijkstra are two sides of the same coin.
Choosing l(x, u) well is as much art as science. A few pitfalls to avoid:
Forward search (Dijkstra, A*) starts from xI and races toward XG. There is a completely different strategy: start from XG and compute, for every state simultaneously, the optimal cost to reach the goal. This is backward value iteration — LaValle's central algorithm for optimal planning under a fixed horizon.
The result is a function G*: X → ℝ ∪ {∞} called the optimal cost-to-go. Once you have G*, extracting the optimal plan from any starting state is trivial — one greedy step at a time. Computing G* is the hard part; using it is free.
Fix a horizon K (plan length). The optimal cost-to-go from stage k is:
Read this as: starting at state xk at time k, what is the minimum cost achievable over the remaining K−k+1 steps? The boundary condition anchors the recursion:
Starting from the definition of G*k(xk), we derive the recursion that makes computation possible. The derivation has four steps — follow each carefully.
States {a, b, c, d, e}. Horizon K = 4. Start xI = a. Goal XG = {d}. We compute G*5 through G*1 using the Bellman equation. The complete table (every cell computed from the recurrence):
| State | G*5 | G*4 | G*3 | G*2 | G*1 |
|---|---|---|---|---|---|
| a | ∞ | ∞ | 6 | 4 | 6 |
| b | ∞ | 4 | 2 | 6 | 4 |
| c | ∞ | 1 | ∞ | 3 | 5 |
| d | 0 | ∞ | 2 | ∞ | 4 |
| e | ∞ | ∞ | ∞ | ∞ | ∞ |
Walk through key cells:
python from math import inf def backward_value_iteration(prob, K): """ Backward value iteration (LaValle eq. 2.11). Returns G[k][x] = G*_k(x) for k = 1..K+1. prob.X_G, prob.states(), prob.actions(x), prob.f(x,u), prob.cost(x,u) """ states = list(prob.states()) # G[K+1] = boundary condition: l_F(x) G = [{x: (0 if x in prob.X_G else inf) for x in states}] # Backward induction: compute G_K, G_{K-1}, ..., G_1 for k in range(K, 0, -1): G_next = G[-1] # G*_{k+1} G_k = {} for x in states: best = inf for u in prob.actions(x): xp = prob.f(x, u) val = prob.cost(x, u) + G_next.get(xp, inf) if val < best: best = val G_k[x] = best G.append(G_k) G.reverse() return G # Extract optimal plan greedily from G* def extract_plan(prob, G, x_I, K): path, x = [x_I], x_I for k in range(K): best_u, best_val = None, inf for u in prob.actions(x): xp = prob.f(x, u) val = prob.cost(x, u) + G[k+1].get(xp, inf) if val < best_val: best_u, best_val = u, val if best_u is None: break x = prob.f(x, best_u) path.append(x) if x in prob.X_G: break return path
Each cell shows its current G* value. Click Step to run one backward iteration — watch the wavefront of finite values expand outward from the goal (red). Cells that can never reach the goal stay dark. After all steps, click Show Plan to highlight the optimal path from the green start.
Chapter 7's backward value iteration required knowing K in advance. But the warehouse robot doesn't know how many steps it will need — it just needs to reach the goal. Formulation 2.3 fixes this by adding a termination action uT: once applied, the robot stays put and accumulates no further cost.
This elegant addition converts the fixed-horizon problem into a variable-length one. Plans (u1, u2) and (u1, u2, uT, uT, …) become equivalent — the termination action pads any plan to any length without changing its cost. Now we can optimize over all plan lengths simultaneously.
With uT in the action space, run backward value iterations without stopping at K. Instead, run until the values stop changing:
When this holds, the iterations have converged. The common value is simply called G*(x) — the stationary optimal cost-to-go. It satisfies:
Notice: no subscript k. This is the fixed-point equation — G* is its own "next iteration."
| State | iter 0 | iter 1 | iter 2 | iter 3 | iter 4 (stable) |
|---|---|---|---|---|---|
| a | ∞ | ∞ | 6 | 4 | 4 |
| b | ∞ | 4 | 2 | 2 | 2 |
| c | ∞ | 1 | 1 | 1 | 1 |
| d | 0 | 0 | 0 | 0 | 0 |
| e | ∞ | ∞ | ∞ | ∞ | ∞ |
Between iterations 3 and 4, values stop changing — convergence reached. G*(a) = 4: three steps, not four. The extra flexibility of variable-length plans found a cheaper route than the fixed-horizon K=4 case.
Once G* is known, the optimal action at any state x is the greedy minimizer:
Trace from xI = a:
python from math import inf def stationary_value_iteration(prob, tol=1e-9, max_iter=10000): """Variable-length backward value iteration (Formulation 2.3). Runs until convergence. Returns G_star: dict {state: G*(state)}.""" states = list(prob.states()) G = {x: (0 if x in prob.X_G else inf) for x in states} for _ in range(max_iter): G_new, delta = {}, 0 for x in states: best = 0 if x in prob.X_G else inf for u in prob.actions(x): xp = prob.f(x, u) val = prob.cost(x, u) + G.get(xp, inf) if val < best: best = val G_new[x] = best if G[x] < inf and G_new[x] < inf: delta = max(delta, abs(G_new[x] - G[x])) G = G_new if delta <= tol: break return G def optimal_policy(prob, G_star): policy = {} for x in prob.states(): if x in prob.X_G or G_star[x] == inf: continue best_u, best_val = None, inf for u in prob.actions(x): xp = prob.f(x, u) val = prob.cost(x, u) + G_star.get(xp, inf) if val < best_val: best_u, best_val = u, val policy[x] = best_u return policy
Values propagate outward from the goal (red) each iteration. The bar chart on the right tracks max delta per iteration — convergence happens when it hits zero. Click Step to advance one iteration and watch cells stop changing.
Everything we've built — BFS, Dijkstra, A* — runs side by side on the same maze. All three start simultaneously. Watch the wavefronts diverge. Count the expanded nodes. The differences that were abstract in earlier chapters become viscerally clear when you see them in motion.
| Algorithm | Q sorted by | Optimal? | Requires | Complexity |
|---|---|---|---|---|
| BFS | FIFO (insertion order) | Yes (unit cost) | — | O(V+E) |
| Dijkstra | min C*(x) | Yes | Non-neg costs | O(V log V + E) |
| A* | min C*(x) + Ĝ*(x) | Yes | Admissible h | O(V log V + E) |
| Greedy best-first | min Ĝ*(x) only | No | — | O(V log V + E) |
| DFS | LIFO | No | — | O(V+E) |
There is a deep connection between backward value iteration (Chapter 7) and Dijkstra. Both compute optimal costs — but from opposite directions and with different data structures.
Dijkstra is essentially forward value iteration with a priority queue that ensures states are processed in order of increasing C*(x). When x is popped from the heap, C*(x) is final — exactly like a state whose G* has stabilized in backward VI. The three categories (unvisited / alive / dead) in forward search map directly to (∞ / tentative / converged) in value iteration.
BFS (left) · Dijkstra (center) · A* with Manhattan (right). All three search the same maze simultaneously. Orange = alive. Teal = dead. Yellow = final path. A* typically finishes with fewest expansions. Click New Maze to generate a different layout.
Discrete planning algorithms are polynomially efficient in the size of the state space |X|. The problem is that |X| grows exponentially in the number of variables describing the state. A robot with n joints, each discretized to D angles, has |X| = Dn states. This is the curse of dimensionality — the central challenge of motion planning.
| Algorithm | Time | Space | Notes |
|---|---|---|---|
| BFS | O(|V|+|E|) | O(|V|) | All reachable states in queue |
| DFS | O(|V|+|E|) | O(b·d) | d = depth, b = branching factor |
| Dijkstra | O((|V|+|E|) log |V|) | O(|V|) | log factor from heap ops |
| A* | O(bd) worst | O(bd) | O(d) best with perfect heuristic |
| Backward VI (fixed K) | O(K·|X|·|U|) | O(|X|) | Two arrays: current and next G* |
| Stationary VI | O(D·|X|·|U|) | O(|X|) | D = graph diameter |
| Problem | |X| | At 109 states/sec |
|---|---|---|
| 9×9 grid | 81 | < 1 µs |
| 6-DOF arm, 360 angles each | 3606 ≈ 2×1015 | ~23 days |
| Rubik's cube | 4.3×1019 | ~1.4 billion years |
| Chess (mid-game) | ~1047 | Heat death of universe |
The algorithms of Chapter 2 are theoretically clean but practically limited to small state spaces. Everything in Chapters 4–6 of LaValle (configuration spaces, sampling-based planning) is a response to this explosion — planning in continuous spaces without ever discretizing them fully.
| Concept | Formula |
|---|---|
| Transition function | xk+1 = f(xk, uk) |
| Cost functional | L = ∑ l(xk, uk) + lF(xF) |
| Terminal cost trick | lF(x) = 0 if x ∈ XG, else ∞ |
| Bellman equation | G*k(x) = minu{l(x,u) + G*k+1(f(x,u))} |
| Stationary fixed point | G*(x) = minu{l(x,u) + G*(f(x,u))} |
| Optimal policy | u*(x) = argminu{l(x,u) + G*(f(x,u))} |
| Admissibility | Ĝ*(x) ≤ G*(x) for all x |
| A* sort key | f(x) = C*(x) + Ĝ*(x) |
| Dijkstra update | C(x') ← min(C(x'), C*(x) + l(x,u)) |
Chapter 2 is the bedrock. Every algorithm in the rest of LaValle either directly extends these ideas or needs them as prerequisite. This chapter maps the road ahead and closes the loop on the full arc of discrete planning.
| Chapter | Extension of Ch. 2 | Key new idea |
|---|---|---|
| Ch. 4: Configuration Space | X becomes continuous C-space | Robot geometry → obstacle regions |
| Ch. 5: Sampling-Based | Graph search on random roadmaps | RRT, PRM — random exploration |
| Ch. 10: MDPs | f(x,u) becomes probabilistic | Stochastic transitions, expected cost |
| Ch. 11: POMDPs | State not fully observable | Belief space planning |
| Ch. 13: Differential Constraints | U(x) constrained by dynamics | Kinodynamic planning |
The Bellman equation G*(x) = minu{l(x,u) + G*(f(x,u))} is deterministic. Replace f(x,u) with a transition probability P(x'|x,u) and you get the MDP Bellman equation. Value iteration works almost identically — replace min with expected value. Reinforcement learning is value iteration when the transition function and cost are unknown: estimated from experience rather than given analytically.
| Lesson | Connection to Ch. 2 |
|---|---|
| MDP | Extends Formulation 2.1 with stochastic transitions and expected costs |
| RL Algorithms | Q-learning = value iteration with unknown model, estimated online |
| Bayes Filter | State estimation in Ch. 10 (POMDPs) requires belief-space tracking |