Ch 2: Discrete Planning — LaValle Planning Algorithms

Chapter 0: The Problem

A warehouse robot wakes up at the loading dock — call it cell (0,0). Its job: deliver a pallet to shelf #247, sitting at cell (8,8) on a 9×9 grid. Between them: rows of shelves, some aisles blocked by pallets stacked on the floor. The robot can move up, down, left, or right — one tile per step. It cannot pass through blocked cells.

The question sounds simple: how does the robot find a path? But three deeper questions hide inside it, and Chapter 2 answers all three. First: how do we represent this problem mathematically — precisely enough for a computer to reason about it? Second: how do we search for a path without looping forever or wasting exponential work? Third: when some routes are cheaper than others — slippery floors, detour penalties — how do we find the optimal path?

The canvas below is the warehouse. Click any gray cell to toggle a wall (blocked aisle). The green cell is the start; the red cell is the goal. Try building mazes. Notice which configurations trap the robot completely — no path exists at all. Detecting "no path" is just as important as finding a path: a robot that searches forever for an impossible route is useless in production.

Warehouse Grid — Place Obstacles

Click any gray cell to toggle a wall. Green = start (0,0). Red = goal (8,8). Every cell is a state. Every valid move is an action. This IS the state transition graph.

9×9 grid · 81 states · up to 4 actions per state

The grid IS a graph. Every cell is a node. Every valid move between adjacent, unblocked cells is a directed edge. Finding a path from start to goal = finding a path in this graph. All of Chapter 2's machinery is just making this intuition precise enough to implement correctly and to prove things about.

Why discrete planning is hard (even in theory)

The naive approach — enumerate all possible paths, compute their costs, return the minimum — is computationally catastrophic. In a 9×9 grid, the number of self-avoiding paths from corner to corner can be astronomical (tens of millions). We need smarter algorithms that avoid ever explicitly enumerating paths.

The key insight: instead of thinking about paths (exponentially many), think about states (only 81). There are only 81 possible positions the robot can be in. If we track which states we've visited and build up costs incrementally, the computation is polynomial in |X| — not exponential in path length. This is the fundamental reason graph search algorithms work.

Thinking in states, not paths. A maze with 81 cells might have 10⁸ distinct self-avoiding paths from start to goal. Enumerating all of them is hopeless. But there are only 81 states. If we visit each state at most once, the algorithm terminates in at most 81 steps. Tracking states (not paths) converts an exponential problem into a polynomial one.

One more thing worth noticing: the number of states explodes in harder problems. Our 9×9 grid has 81 states — trivial for any computer. A Rubik's cube has 4.3 × 10¹⁹ states. A robot arm with 6 joints, each discretized to 360 angles, has 360⁶ ≈ 2.2 × 10¹⁵ states. The algorithms we build need to work on spaces too large to enumerate explicitly — which is why the theory matters.

Three problems disguised as one

The warehouse scenario neatly illustrates the three sub-problems Chapter 2 addresses. Notice how each one builds on the previous:

Problem	Question	Answer algorithm
Feasibility	Does any path exist from start to goal?	BFS or DFS
Minimum steps	Which path uses the fewest moves?	BFS (wavefront guarantees it)
Minimum cost	Which path has lowest total cost (muddy/paved mix)?	Dijkstra or A*

These three problems share the same mathematical framework — Formulation 2.1 — but differ in what they optimize. A key lesson from LaValle: choosing the right algorithm starts with identifying which problem you actually have. Running Dijkstra when you want minimum steps is correct but wasteful. Running BFS when you want minimum cost gives wrong answers.

Chapter 2 also draws a boundary LaValle is careful to respect: these algorithms assume the world is deterministic and fully observable. The robot knows exactly where it is (full observability), and actions always have their intended effect (determinism). Relax either assumption — the robot's sensors are noisy, or the floor is slippery — and you're in Chapters 10–12 (probabilistic planning). The discrete planning of Chapter 2 is the foundation: master it, and the probabilistic extensions become natural generalizations.

If a 9×9 grid has no obstacles, approximately how many edges does its state transition graph have? (Interior cells have 4 neighbors; edge cells have 2–3.)

About 81 — one per cell About 288 — roughly 4 × 81 counting valid adjacencies (≈ 4N² for large N) About 6,561 — every pair of cells connected

Chapter 1: Formulation 2.1 — The Five Components

Before writing a single line of search code, LaValle insists on nailing the problem down precisely. Formulation 2.1 gives every discrete planning problem exactly five ingredients. Get these five right and the algorithm almost writes itself.

Symbol	Name	Meaning
X	State space	All possible situations the system can be in
U(x)	Action space	Actions available at state x (varies by state)
f(x, u)	Transition function	Next state when taking action u from state x
x_I	Initial state	Where we start
X_G	Goal set	The set of states we want to reach

Each component — with physical meaning

1. The state space X. For our 9×9 warehouse grid: X = {(i, j) : i, j ∈ {0, …, 8}, (i, j) not a wall}. With no walls, |X| = 81. But the concept scales brutally: a Rubik's cube has |X| = 43,252,003,274,489,856,000 states. A 19×19 Go board has more states than atoms in the observable universe. The state must capture all and only the information needed to determine what happens next.

2. The action space U(x). At most interior cells: U(x) = {up, down, left, right} = {(−1,0), (1,0), (0,−1), (0,1)}. But at a boundary or next to a wall, some moves leave X, so they're removed from U(x). This dependence on x is critical — it's why we write U(x) and not just U. At different states, different actions are legal.

3. The transition function f(x, u) = x + u. For our grid this is pure vector addition. Worked example: x = (3, 4), u = (0, 1) → f((3,4), (0,1)) = (3, 5). The robot moved one tile up. For more complex systems — robot arms, game boards, autonomous vehicles — f is much more involved, but the concept is identical: it maps (state, action) to next state.

4. The initial state x_I. The loading dock, x_I = (0, 0). This is where planning starts. In some problems, x_I is also a set (multiple possible starts), but Formulation 2.1 takes it as a single state.

5. The goal set X_G. For one target shelf: X_G = {(8, 8)}. But goals can be sets: any charging dock, any empty parking space, any configuration with the cargo delivered. Planning succeeds as soon as the robot reaches any x ∈ X_G, regardless of which goal state.

The state must include ALL relevant information — and NOTHING else. A robot planning in a Paris warehouse shouldn't encode whether the lights are on in Beijing. Irrelevant state bloat converts tractable problems into impossible ones. LaValle calls this the Markov property: f(x, u) must depend only on the current state x and action u — not on history. If your state doesn't satisfy this, you have the wrong state space.

The state transition graph

Once you have X, U(x), and f, you implicitly define a directed graph. Vertices are states in X. There is a directed edge from x to x' if and only if there exists some u ∈ U(x) such that f(x, u) = x'. This is the state transition graph, and it is the central object of Chapter 2.

G = (X, E) where (x, x') ∈ E ⇔ ∃ u ∈ U(x) : f(x, u) = x'

Planning = find a directed path in G from x_I to any vertex in X_G. That's the whole game. BFS, DFS, Dijkstra, A*, and value iteration are all just different strategies for searching this graph.

The transition graph is usually implicit

For a 9×9 grid, you could in principle build the full transition graph in memory: 81 vertices, ~288 edges, an adjacency list of ~1KB. But for the Rubik's cube with 4.3 × 10¹⁹ states, building the graph explicitly would require more RAM than exists on Earth. Instead, the graph is kept implicit — defined by the functions X, U(x), and f(x,u) rather than a stored data structure.

This is why Formulation 2.1 lists X, U(x), f rather than "an adjacency list." The functions are a compact description of a potentially enormous graph. The search algorithms — BFS, Dijkstra, A* — only generate the nodes they need. A* on the Rubik's cube might explore a few million nodes out of 4.3 × 10¹⁹ before finding the solution. The rest of the graph is never instantiated.

Implicit vs explicit graphs. An explicit graph (adjacency list/matrix) stores all nodes and edges up front. An implicit graph is described by (X, U(x), f) and nodes are generated on demand. All real planning problems use implicit graphs — the state space is too large for explicit storage. This is why Formulation 2.1 is stated the way it is.

State Transition Graph — 5×5 Grid

A 5×5 grid with the transition graph overlaid. Arrows show valid moves (directed edges). Click a cell to place a wall — watch its edges disappear and edges pointing to it vanish too.

Click a cell to place/remove a wall

Enumerating all edges of the transition graph

For a 3×3 grid with no walls, let's explicitly list the state transition graph. This helps build intuition before we move to larger spaces:

python
# Build the explicit state transition graph for a 3×3 grid
prob = GridPlanningProblem(rows=3, cols=3)

edges = []
for x in prob.states():
    for u in prob.actions(x):
        xp = prob.f(x, u)
        edges.append((x, xp))

print(f"|V| = {len(prob.states())}")   # 9 states
print(f"|E| = {len(edges)}")           # 24 directed edges

# Show adjacency for (1,1) — the center cell
center = (1, 1)
neighbors = [prob.f(center, u) for u in prob.actions(center)]
print(f"Neighbors of {center}: {neighbors}")
# → [(0,1), (2,1), (1,0), (1,2)] — all 4 directions, center has them all

corner = (0, 0)
neighbors_corner = [prob.f(corner, u) for u in prob.actions(corner)]
print(f"Neighbors of {corner}: {neighbors_corner}")
# → [(1,0), (0,1)] — only 2 directions; corner has fewer options

The center cell (1,1) has 4 edges. Corner cells have 2 edges. Edge cells have 3 edges. For N×N: total edges ≈ 2×N×(N−1)×2 = 4N(N−1). For N=9: 4×9×8 = 288 edges — matching our earlier estimate.

Worked example: f in action

f((5, 3), (−1, 0)) = ? Add component-wise: (5 + (−1), 3 + 0) = (4, 3). The robot moved left one tile, from column 5 to column 4, same row 3. This is the transition function doing its one job.

An important subtlety: f is only applied when u ∈ U(x). Before calling f, the algorithm checks "is this action legal?" If (5,3) is adjacent to a wall on the left, then (−1,0) ∉ U((5,3)) and f is never called. The state space definition handles this implicitly by only including valid actions in U(x).

python
class GridPlanningProblem:
    """Formulation 2.1 for a rectangular 2D grid."""

    def __init__(self, rows, cols, walls=None, x_I=(0,0), X_G=None):
        self.rows  = rows
        self.cols  = cols
        self.walls = walls or set()   # set of (i,j) tuples
        self.x_I   = x_I
        self.X_G   = X_G or {(rows-1, cols-1)}

    # State space X — all non-wall cells (implicit)
    def in_bounds(self, x):
        i, j = x
        return 0 <= i < self.rows and 0 <= j < self.cols

    # Action space U(x) — only moves that stay in X
    MOVES = [(-1,0), (1,0), (0,-1), (0,1)]  # up, down, left, right

    def actions(self, x):
        i, j = x
        return [u for u in self.MOVES
                if self.in_bounds((i+u[0], j+u[1]))
                and (i+u[0], j+u[1]) not in self.walls]

    # Transition function f(x, u) = x + u (vector addition)
    def f(self, x, u):
        return (x[0] + u[0], x[1] + u[1])

    # Edge cost l(x, u) — uniform for now, overridden later
    def cost(self, x, u):
        return 1

# Example usage — Formulation 2.1 for a 9×9 warehouse grid
prob = GridPlanningProblem(
    rows=9, cols=9,
    walls={(3,3),(3,4),(4,3),(5,5),(6,4)},
    x_I=(0,0),
    X_G={(8,8)}
)

# Check: what actions are available at (3,2)?
print("U((3,2)):", prob.actions((3,2)))
# → [(-1,0), (1,0), (0,-1)]  — (0,1) blocked by wall at (3,3)

# Apply transition: f((3,2), (-1,0))
print("f((3,2),(-1,0)):", prob.f((3,2), (-1,0)))
# → (2, 2)  — moved up

The Markov property — why state design is hard

The Markov property says: everything you need to know about the future is encoded in the current state x. History doesn't matter — only where you are right now. For our grid robot, position (i, j) satisfies this: knowing you're at (3, 4) tells you exactly what moves are available and where they lead, regardless of how you got there.

But Markov breaks down in surprising ways. Consider a robot that needs to carry a box from shelf A to dock B. Just knowing the robot's position isn't enough — the state must also include whether it's currently holding the box. Fail to include "holding" in x, and f(x, u) is undefined (does "move right" take the box along or leave it behind?). The fix: x = (robot_position, is_holding). State space doubles, but now f is well-defined.

The Markov property is a design constraint on your state space. If two scenarios look identical in your state representation but lead to different futures, your state is underdefined. Keep adding information to x until f is a deterministic function of (x, u) alone — then you have a valid Formulation 2.1.

Graphs derived from Formulation 2.1

Formulation 2.1 can represent a surprising variety of problems beyond navigation. The key is choosing X, U(x), and f correctly:

Problem	State x	Action u	f(x, u)
Grid navigation	(row, col)	(Δrow, Δcol)	x + u
Puzzle (8-tile)	Tile configuration	Slide a tile	New configuration
Rubik's cube	Face permutation	Rotate a face	New permutation
Pick-and-place robot	(position, holding?)	Move/grab/release	Updated (pos, holding)
Turn-based game	Board position + turn	Legal move	New board + other turn

Once you've cast a problem into Formulation 2.1, the same search algorithms — BFS, Dijkstra, A* — apply without modification. This universality is the deepest insight of Chapter 2.

State space size — a reality check

The state space size |X| is the single most important factor in planning difficulty. Here's how different problems compare:

Problem	\|X\| (states)	Tractable?
9×9 warehouse grid	81	Instant
100×100 city grid	10,000	Milliseconds
10×10×10 3D maze	1,000	Instant
Rubik's cube	4.3 × 10¹⁹	Only with heuristics
15-tile sliding puzzle	10¹³	Only with A* + pattern DBs
Chess (mid-game positions)	~10⁴⁷	Requires game-specific methods

Discrete planning is only feasible when |X| is manageable — or when heuristics focus the search on a small fraction of X. The choice of state representation is therefore crucial: a bad representation inflates |X| unnecessarily, while a clever one keeps it tight.

In Formulation 2.1, f(x, u) = x + u (vector addition). What is f((5, 3), (−1, 0))?

(5, 3) — no change; u = (−1, 0) cancels out (4, 3) — moved left by one row (row index decreases) (6, 3) — moved right by one row (row index increases)

Chapter 2: Forward Search — The Master Algorithm

We have a state space, a transition function, and a goal. Now: how do we find the path? LaValle's Figure 2.4 presents a single forward search algorithm that subsumes BFS, DFS, Dijkstra, and A* as special cases. The only difference between them is how one data structure — the priority queue Q — is sorted. Learn this one algorithm, understand all four.

Three categories for every state

During the search, every state in X belongs to exactly one of three categories. Understanding these categories is key to understanding why the algorithm is correct.

Category	Meaning	Where it lives
Unvisited	Never encountered. The algorithm doesn't know it exists yet.	Not in Q, not in visited set
Alive	Discovered — added to Q — but not yet fully processed. May have unvisited neighbors.	In Q
Dead	Fully processed. Every neighbor has been discovered. Will never be revisited.	Removed from Q, in visited set

The algorithm starts with only x_I alive. Every other state is unvisited. The algorithm ends when either (a) a goal state is removed from Q — SUCCESS — or (b) Q is empty and no goal was found — FAILURE.

The algorithm (LaValle Figure 2.4)

1. Initialize

Mark x_I as visited. Insert x_I into Q. Set parent[x_I] = null.

↓

2. Is Q empty?

If yes → FAILURE (no path from x_I to X_G exists). Return null.

↓

3. x ← GetFirst(Q)

Remove the "best" element from Q. The definition of "best" is what distinguishes BFS / DFS / Dijkstra / A*.

↓

4. Is x ∈ X_G?

If yes → SUCCESS. Trace parent pointers from x back to x_I for the full path.

↓

5. Expand x

For each u ∈ U(x): compute x' = f(x, u). If x' is unvisited: mark visited, set parent[x'] = x, insert x' into Q.

↻ back to step 2

Parent pointers are how you recover the actual plan. The loop above determines whether a path exists. To find the path itself, record parent[x'] = x each time you discover x' from x. At SUCCESS: trace backwards — goal → parent[goal] → parent[parent[goal]] → … → x_I. Reverse that list and you have the action sequence.

What "systematic" means — and why it matters

LaValle uses the word systematic precisely. On a finite graph: systematic means eventually visiting every state reachable from x_I. On an infinite graph: systematic means finding a solution in finite time, if one of finite length exists. The visited-state check at step 5 is exactly what makes the algorithm systematic — without it, cycles would cause infinite loops.

Being systematic is a guarantee, not a hope. A systematic algorithm can tell you definitively "no path exists" — because it will have exhausted every reachable state. A non-systematic algorithm that loops on cycles can never safely report failure. This is why the visited-state check is non-negotiable in production planners.

Completeness guarantee: The forward search algorithm with the visited-state check is complete on finite graphs — it finds a solution if one exists, and reports failure if none exists. This is the first correctness property we demand of any planning algorithm.

Forward search vs backward search — a preview

Everything so far searches forward from x_I toward X_G. LaValle also discusses backward search: start from X_G and search backward toward x_I. At each step, instead of asking "which states can I reach from x?", ask "which states can reach x?"

Backward search uses a backward action space U⁻¹(x): the set of actions that could lead into state x. For our grid, backward actions are the same as forward actions (the grid is undirected), but in general problems (one-way doors, asymmetric transitions), they differ. Chapter 8 of this lesson covers backward search in detail.

The payoff: bidirectional search runs both simultaneously, meeting in the middle. Theoretical complexity drops from O(b^d) to O(b^d/2) — an exponential improvement. On large graphs, bidirectional BFS can be 1000× faster than forward BFS alone.

Forward Search Step-Through

Watch BFS (FIFO queue) explore a 7×7 grid. Orange = alive (in Q). Teal = dead (processed). Gray = unvisited. Path shown in yellow when goal is reached.

Ready — click Step or Run

Concrete trace on the warehouse grid

Let's trace forward search (BFS) on a tiny 3×3 grid, no walls. Start x_I = (0,0), goal X_G = {(2,2)}. Neighbors explored in order: right, down (so (i,j+1) before (i+1,j)).

Iter	x extracted	Q before	States discovered	Q after
Init	—	—	x_I=(0,0)	[(0,0)]
1	(0,0)	[(0,0)]	(0,1),(1,0)	[(0,1),(1,0)]
2	(0,1)	[(0,1),(1,0)]	(0,2),(1,1)	[(1,0),(0,2),(1,1)]
3	(1,0)	[(1,0),(0,2),(1,1)]	(2,0),(1,1)→skip	[(0,2),(1,1),(2,0)]
4	(0,2)	[(0,2),(1,1),(2,0)]	(1,2)	[(1,1),(2,0),(1,2)]
5	(1,1)	[(1,1),(2,0),(1,2)]	(2,1),(1,2)→skip	[(2,0),(1,2),(2,1)]
6	(2,0)	[(2,0),(1,2),(2,1)]	(2,1)→skip	[(1,2),(2,1)]
7	(1,2)	[(1,2),(2,1)]	(2,2)	[(2,1),(2,2)]
8	(2,1)	[(2,1),(2,2)]	—	[(2,2)]
9	(2,2)	[(2,2)]	GOAL! Return	—

Parent pointers: parent[(0,1)]=(0,0), parent[(1,0)]=(0,0), parent[(0,2)]=(0,1), parent[(1,1)]=(0,1), parent[(2,0)]=(1,0), parent[(1,2)]=(0,2), parent[(2,1)]=(1,1), parent[(2,2)]=(1,2).

Trace back: (2,2) → parent=(1,2) → parent=(0,2) → parent=(0,1) → parent=(0,0). Reversed: (0,0)→(0,1)→(0,2)→(1,2)→(2,2). A 4-step path — the optimal for a 3×3 grid with no walls.

Recovering the path — parent pointer trace

The algorithm tells you the path exists; parent pointers tell you what it is. The trick: every time you discover a new state x' from x, record parent[x'] = x. At the end, you have a chain from goal back to start. Trace it:

goal → parent[goal] → parent[parent[goal]] → … → x_I

Reverse this list and you have the forward path. The action at each step is u_k = x_k+1 − x_k (for grid problems). The plan length K equals the path length minus 1.

A subtle point: parent pointers implicitly define a shortest-path tree rooted at x_I. Every state in the tree has a unique parent — the state from which it was first discovered. This tree covers all reachable states, not just the goal. If you need paths to multiple goals, BFS builds this tree for free.

python
from collections import deque

def forward_search(prob, mode='bfs'):
    """
    Generic forward search (LaValle Figure 2.4).
    mode='bfs'  → FIFO queue (breadth-first, shortest path)
    mode='dfs'  → LIFO stack (depth-first, any path)
    """
    x_I = prob.x_I
    X_G = prob.X_G

    visited = {x_I}
    parent  = {x_I: None}

    if mode == 'bfs':
        Q = deque([x_I])
        def get_first(): return Q.popleft()  # FIFO
        def insert(x):  Q.append(x)
    else:
        Q = [x_I]
        def get_first(): return Q.pop()    # LIFO
        def insert(x):  Q.append(x)

    while Q:                           # step 2: is Q empty?
        x = get_first()              # step 3: GetFirst

        if x in X_G:                  # step 4: goal check
            path = []
            cur  = x
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return list(reversed(path))  # path from x_I to goal

        for u in prob.actions(x):    # step 5: expand
            xp = prob.f(x, u)
            if xp not in visited:
                visited.add(xp)
                parent[xp] = x
                insert(xp)

    return None  # FAILURE — no path exists

Putting it together — full demo on the warehouse grid

python
# Complete end-to-end planning on the 9×9 warehouse grid
prob = GridPlanningProblem(
    rows=9, cols=9,
    walls={(3,3),(3,4),(3,5),(4,3),(5,3),(5,4),(5,5)},  # U-shaped obstacle
    x_I=(0,0),
    X_G={(8,8)}
)

# Run BFS (forward search with FIFO)
path = forward_search(prob, mode='bfs')

if path:
    print(f"Path found! Length = {len(path)-1} steps")
    print(f"Path: {' → '.join(str(s) for s in path)}")
    # Reconstruct actions from path
    actions = [(path[k+1][0]-path[k][0],
                path[k+1][1]-path[k][1])
               for k in range(len(path)-1)]
    direction = {(-1,0):'↑',(1,0):'↓',(0,-1):'←',(0,1):'→'}
    print(f"Actions: {''.join(direction[a] for a in actions)}")
else:
    print("No path exists — goal is unreachable")

In the forward search algorithm, what happens if you remove the "if x' not visited" check on a graph with a cycle A → B → A?

The algorithm loops forever: A is inserted, then B, then A again, then B again, … The algorithm finds a shorter path by exploring cycles The algorithm terminates immediately with FAILURE

Chapter 3: BFS vs DFS — Same Algorithm, Different Queue

Here is the key insight that makes the master algorithm so powerful: BFS and DFS are the same algorithm. Change one data structure — the order of Q — and the entire character of the search transforms. Understanding why this is true, and what the consequences are, is the subject of this chapter.

BFS — FIFO Queue

States discovered earliest are processed first. The effect: the search expands in concentric wavefronts. All states reachable in 1 step from x_I are processed before any state reachable in 2 steps. All 2-step states before 3-step states. And so on.

Guarantees shortest path (fewest steps)
Systematic on finite AND infinite graphs
Time: O(|V| + |E|)
Memory: O(|V|) — stores entire active wavefront

DFS — LIFO Stack

Most recently discovered state is processed first. The effect: the search plunges deep — it follows one direction as far as possible before backtracking to try another. Like exploring a maze by always turning left.

Does NOT guarantee shortest path
Systematic only on finite graphs
Time: O(|V| + |E|)
Memory: O(b·d) — one path + backtrack info

Why DFS fails on infinite graphs (LaValle Figure 2.3a). Imagine an infinite grid. DFS might choose to dive right: (0,0) → (1,0) → (2,0) → (3,0) → … and never return. If the goal is one step up at (0,1), DFS will never find it — it explores an infinite branch first. BFS is immune: it visits ALL neighbors at distance k before moving to distance k+1. Since the goal is at distance 1, BFS finds it on the very first expansion.

When DFS is preferred in practice

Despite its drawbacks, DFS is preferred in several scenarios:

Memory-constrained environments. Embedded systems with limited RAM can't store the BFS frontier. DFS uses O(bd) memory — a single path. For a depth-100 tree with branching factor 4, DFS uses ~400 pointers while BFS uses 4¹⁰⁰.
Finding ANY solution fast. In some problems, any valid plan is acceptable — optimality is not required. DFS often finds a solution quickly by plunging deep, while BFS systematically explores the entire frontier before reaching goal depth.
Cycle detection in general graphs. DFS's recursive structure naturally supports detecting back edges (cycles) — useful for topological sort, strongly connected components, and dependency resolution. These algorithms don't have BFS variants.
Satisficing planning. In robotics, a suboptimal path that is found quickly and executed immediately may be better than waiting for the optimal path. DFS (or greedy best-first) often provides rapid responses for reactive planning.

LaValle's perspective: DFS is not used in the primary planning algorithms of Chapter 2 — BFS, Dijkstra, and A* are preferred for their optimality and completeness guarantees. DFS appears mainly as a theoretical foil: understanding why DFS fails on infinite graphs clarifies exactly what BFS provides. The failure case is instructive.

Why BFS guarantees shortest path (the proof)

It's worth understanding why BFS gives the shortest path, not just that it does. The argument is short: BFS processes all states at distance d before any state at distance d+1. So when BFS first reaches the goal, it has explored all paths of length ≤ d — and found none that work (else it would have stopped). Therefore the d-step path it found must be optimal.

DFS doesn't have this property because it might find a long path — say d = 20 — even though a 3-step path exists. DFS just happens not to have explored that 3-step path yet when it found the 20-step one.

BFS optimality theorem: If all edge costs are equal (unit cost), BFS finds the path with the minimum number of edges. The proof is by induction on the wavefront depth: BFS visits all depth-k states before depth-(k+1) states, so the first time it reaches the goal, no shorter path could exist.

Concrete trace: 5×5 grid with one obstacle

Start (0,0), Goal (4,4), wall at (2,2). Neighbors explored in priority order: down, right, up, left.

Step	BFS — Queue [FIFO]	DFS — Stack [LIFO]
Init	Q = [(0,0)]	S = [(0,0)]
1	Pop (0,0); push (1,0),(0,1). Q=[(1,0),(0,1)]	Pop (0,0); push (0,1),(1,0). S=[(0,1),(1,0)]
2	Pop (1,0); push (2,0),(1,1). Q=[(0,1),(2,0),(1,1)]	Pop (1,0); push (1,1),(2,0). S=[(0,1),(1,1),(2,0)]
3	Pop (0,1); push (0,2). Q=[(2,0),(1,1),(0,2)]	Pop (2,0); push (3,0). S=[(0,1),(1,1),(3,0)]
4	Pop (2,0); push (3,0). Q=[(1,1),(0,2),(3,0)]	Pop (3,0); push (4,0). S=[(0,1),(1,1),(4,0)]
…	Wavefront expands uniformly outward	Dives: (4,0)→(4,1)→(4,2)→(4,3)→(4,4) GOAL
Final	8-step optimal path	10-step non-optimal path

BFS finds the 8-step shortest path. DFS finds a valid but longer 10-step path — it dived south then east along the bottom edge rather than taking the diagonal shortcut. Both are correct plans; only BFS is guaranteed to be the shortest.

Memory tradeoff matters in practice. BFS uses O(b^d) memory where b = branching factor and d = depth to goal. DFS uses O(b·d). For a 1000×1000 grid where goal depth is 500 steps: BFS might store ~4×500 = 2,000 alive states in its frontier. DFS stores ~500 states (one path). For very deep searches in large spaces, DFS's memory efficiency is decisive.

BFS vs DFS — Side by Side

Same 7×7 grid, same start and goal. Left: BFS expands in uniform waves. Right: DFS dives aggressively. Orange = alive. Teal = dead. Counters show nodes expanded so far.

Ready — click Step Both or Run

Iterative Deepening DFS (IDDFS) — the best of both worlds

A natural question: can we get BFS's optimality guarantee with DFS's memory efficiency? Yes — Iterative Deepening Depth-First Search (IDDFS) does exactly this. It runs DFS with a depth limit d = 1, 2, 3, … until the goal is found. At each iteration, any path longer than the current limit is cut off.

IDDFS is optimal (finds minimum-step path), complete, and uses only O(bd) memory. The cost: it re-expands states on each iteration. But the work is dominated by the final iteration, so the total extra work is only a constant factor (about b/(b−1)). For b=4 (grid), IDDFS does 33% more work than BFS while using a fraction of the memory.

Algorithm	Optimal?	Complete?	Time	Space
BFS	Yes (unit cost)	Yes (finite+infinite)	O(b^d)	O(b^d)
DFS	No	Yes (finite only)	O(b^m) m=max depth	O(bm)
IDDFS	Yes (unit cost)	Yes (finite+infinite)	O(b^d)	O(bd)
Dijkstra	Yes (weighted)	Yes (non-neg costs)	O((V+E)log V)	O(V)
A*	Yes (admissible h)	Yes (non-neg costs)	O(b^d) worst	O(b^d)

python
def iddfs(prob):
    """Iterative deepening DFS — optimal, complete, O(bd) memory."""
    def dls(x, goal, depth, path, visited):
        """Depth-limited search."""
        if x in goal: return path[:]  # found!
        if depth == 0: return None      # depth limit hit
        for u in prob.actions(x):
            xp = prob.f(x, u)
            if xp not in visited:
                visited.add(xp)
                path.append(xp)
                result = dls(xp, goal, depth-1, path, visited)
                if result is not None: return result
                path.pop()
                visited.discard(xp)
        return None

    for d in range(1, 10000):       # depth limit grows: 1, 2, 3, ...
        result = dls(
            prob.x_I, prob.X_G, d,
            path=[prob.x_I],
            visited={prob.x_I}
        )
        if result is not None: return result
    return None

A robot searches a 100×100 grid. Both BFS and DFS find the goal. Which algorithm's resulting plan has fewer steps?

BFS — it always finds the minimum-step plan because of wavefront expansion DFS — it dives directly toward the goal so its path is shorter They always produce the same plan length

Chapter 4: Dijkstra's Algorithm — Optimal Search

BFS finds the path with fewest steps. But what if steps have different costs? Crossing a muddy aisle costs 3 time units; crossing a paved aisle costs 1. Two 8-step paths might cost 12 and 32 respectively. BFS cannot distinguish them. Dijkstra's algorithm sorts Q by cost instead of order, finding the minimum-cost path.

Dijkstra = forward search where Q is a min-heap sorted by cost-to-come C(x). The cost-to-come C(x) is the total cost of the cheapest path found so far from x_I to x. When x is extracted from Q (the global minimum), its cost becomes proven optimal — written C*(x).

The update rule

C*(x_I) = 0

C(x') ← min( C(x'), C*(x) + l(x, u) ) when discovering x' via x

Here l(x, u) is the edge cost — the cost of taking action u at state x. When x' is first discovered, C(x') = C*(x) + l(x, u). If x' is later discovered again via a different, cheaper path, C(x') is updated downward (decrease-key in the heap).

Why it's optimal — the inductive argument

The proof is elegant and worth knowing. Claim: when a state x is extracted from Q (minimum C element), C(x) = C*(x). Proof by induction:

Base case: x_I is extracted first with C*(x_I) = 0. Any path to x_I has non-negative cost, so 0 is optimal. ✓
Inductive step: Assume all previously extracted states are optimal. State x is extracted with cost C(x). Any cheaper path to x would have to pass through some alive state y first. But C(y) ≥ C(x) (since x was extracted as the minimum). With non-negative edge costs, extending the path from y further can only increase its cost. So no cheaper path to x can exist. ✓

Dijkstra requires all edge costs to be non-negative. If l(x, u) can be negative, the inductive proof breaks. A state might be extracted "optimally" with cost 5, but a later path through a cost-7 alive state plus a −3 edge gives cost 4 — better. Dijkstra would have already committed to 5. For negative edges, use Bellman-Ford (O(|V|·|E|), much slower) or restructure the problem to eliminate negatives.

Real application: road network routing

Dijkstra is the backbone of GPS routing. A city road network has intersections as states and roads as edges, with travel time as edge cost. Hundreds of millions of intersections — far too large to enumerate explicitly. In practice, Google Maps and similar systems use Dijkstra with two key optimizations: (1) bidirectional Dijkstra runs two simultaneous searches (forward from source, backward from destination) that meet in the middle, reducing search to roughly √|V| nodes each; (2) highway hierarchies precompute long-distance routes so queries stay local. Both are Formulation 2.2 under the hood.

The Open Source Routing Machine (OSRM), used by many navigation apps, uses Contraction Hierarchies: a preprocessing step that ranks nodes by importance and adds "shortcut" edges (if the fastest A→B→C path is known, add direct A→C shortcut). After preprocessing, Dijkstra on the contracted graph answers continent-scale queries in under a millisecond. The core algorithm is still Dijkstra; the speedup comes from a smarter graph representation.

Dijkstra vs BFS — when does it matter?

Dijkstra reduces to BFS when all edge costs are equal (l(x,u) = constant). In that case, cost-to-come is just the number of steps, so the min-heap ordering is identical to FIFO ordering. Running Dijkstra on unit-cost graphs is correct but slower than BFS (heap operations are O(log n) vs O(1) queue operations). Choose BFS when costs are uniform; choose Dijkstra when they're not.

Worked trace: 4×4 weighted grid

Horizontal edges cost 1. Vertical edges cost 2. Start (0,0), Goal (3,3). We process states in cost order:

Extract	C*(x)	Updates to neighbors
(0,0)	0	(0,1)→C=1, (1,0)→C=2
(0,1)	1	(0,2)→C=2, (1,1)→C=3
(1,0)	2	(1,1)→min(3,4)=3 no update, (2,0)→C=4
(0,2)	2	(0,3)→C=3, (1,2)→C=4
(1,1)	3	(1,2)→min(4,5)=4, (2,1)→C=5
(0,3)	3	(1,3)→C=5
…	…	Eventually (3,3) extracted with C*=6

The minimum-cost path goes right three times then down three times: (0,0)→(0,1)→(0,2)→(0,3)→(1,3)→(2,3)→(3,3). Cost = 3 horizontal moves × 1 + 3 vertical moves × 2 = 3 + 6 = 9. Any other path to (3,3) would mix horizontal/vertical differently — but since horizontal is cheaper, the optimal strategy is to exhaust horizontal movement first. Dijkstra discovers this automatically.

Contrast with BFS: it would find a path with 6 steps (the minimum step count), but that path might mix horizontal and vertical moves — say 3+3 — costing the same 9. Or it might find a different 6-step path that cost 11. BFS cannot distinguish; Dijkstra can.

Dijkstra Step-Through — Weighted Grid

Each cell shows its current C(x). Orange = alive (in heap). Teal = dead (settled). The min-heap extracts the cheapest alive state each step. Path shown in yellow at the end.

Ready. Darker cells = higher move cost (1, 2, or 3).

python
import heapq
from math import inf

def dijkstra(prob):
    """
    Dijkstra's algorithm. Returns (path, cost) or (None, inf).
    prob.cost(x, u) must return non-negative edge cost l(x, u).
    """
    x_I = prob.x_I
    X_G = prob.X_G

    C      = {x_I: 0}       # cost-to-come estimates
    parent = {x_I: None}   # for path reconstruction
    heap   = [(0, x_I)]      # min-heap: (cost, state)

    while heap:
        c, x = heapq.heappop(heap)

        # Skip stale entries (x was already settled with lower cost)
        if c > C.get(x, inf):
            continue

        # Goal check — C*(x) is now proven optimal
        if x in X_G:
            path = []
            cur  = x
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return list(reversed(path)), c

        # Expand x
        for u in prob.actions(x):
            xp = prob.f(x, u)
            new_c = c + prob.cost(x, u)   # C*(x) + l(x,u)
            if new_c < C.get(xp, inf):
                C[xp]      = new_c
                parent[xp] = x
                heapq.heappush(heap, (new_c, xp))

    return None, inf  # FAILURE — no path exists

Dijkstra's complexity

With a binary heap: O((|V| + |E|) log |V|). With a Fibonacci heap (more complex implementation): O(|V| log |V| + |E|) — faster for dense graphs. For our 9×9 grid: |V| = 81, |E| ≈ 288, so Dijkstra finishes in microseconds. For a city map with 10⁶ intersections and 2.5×10⁶ roads, it takes a few hundred milliseconds — still practical.

Data structure for Q	Complexity	Best for
Binary heap (Python heapq)	O((\|V\| + \|E\|) log \|V\|)	Sparse graphs (most planning)
Fibonacci heap	O(\|V\| log \|V\| + \|E\|)	Dense graphs, large \|E\|/\|V\|
Array (unsorted)	O(\|V\|²)	Very dense graphs, small \|V\|

Dijkstra runs on a graph where one edge has cost −1; all others cost +1. Can the algorithm still guarantee finding the optimal path?

Yes — one negative edge is fine; the others compensate No — negative edge costs break the inductive optimality proof Only if the negative edge doesn't lie on the optimal path

Chapter 5: A* Search — Heuristic Guidance

Dijkstra is optimal but blind. It expands states in all directions equally — even away from the goal. If you're navigating from New York to Los Angeles, Dijkstra spends time exploring Boston. A* search adds a compass: a heuristic estimate of the remaining cost, so the algorithm spends time expanding states that look promising.

A* sorts Q by:

f(x) = C*(x) + &Ĝ;*(x)

where C*(x) is the proven cost-to-come (same as Dijkstra) and Ĝ*(x) is a heuristic estimate of cost-to-go — how much it costs to reach the goal from x. We can't compute the true G*(x) exactly (that's the whole problem!), so we estimate it with a function we can evaluate cheaply.

Admissibility — the critical requirement

A heuristic is admissible if it never overestimates the true cost-to-go:

&Ĝ;*(x) ≤ G*(x) for all x ∈ X

If Ĝ* is admissible, A* is guaranteed to find the optimal path. The proof uses the same inductive argument as Dijkstra, extended: when a state x is extracted with minimum f = C*(x) + Ĝ*(x), any cheaper path would require a node y on the frontier with f(y) ≤ f(x) — but x was the minimum, so no such y exists.

Manhattan distance — the canonical grid heuristic

For our warehouse grid with unit costs and goal g = (g_i, g_j):

&Ĝ;*(x) = |x_i − g_i| + |x_j − g_j|

This is admissible because the robot must take at least this many steps — it can't teleport, and grid movement is along axes only. Manhattan distance is exact when there are no walls (every step makes optimal progress). With walls, the true cost is larger, so the estimate remains a lower bound — admissible by construction.

Precomputed heuristics — pattern databases

For Rubik's cube and sliding puzzles, Manhattan distance is too weak — it ignores interactions between tiles. The state-of-the-art uses pattern databases: precompute the exact optimal cost to solve a sub-problem (e.g., just the corner cubie configuration) for every possible configuration of that sub-problem. Store this in a table. At runtime, look up the table value as Ĝ*. This is admissible (solving just corners is easier than the full cube) and extremely tight — usually within 2× of the true G*.

For our warehouse grid, we don't need this — Manhattan distance is tight and free to compute. But the pattern database idea illustrates a general principle: any precomputed lower bound on G*(x) is an admissible heuristic.

Designing admissible heuristics

A useful recipe for building admissible heuristics: solve a relaxed version of the problem. Remove constraints from the original problem, solve the easier version, and use that optimal cost as Ĝ*. Since the relaxed problem is easier, its optimal cost ≤ the original's — admissibility by construction.

Original problem	Relaxation	Resulting heuristic
Grid with walls	Remove all walls	Manhattan distance
Grid with walls	Allow diagonal moves	Chebyshev distance = max(\|Δr\|, \|Δc\|)
Grid with walls	Allow teleportation	h = 0 (admissible but useless)
Rubik's cube	Solve each face independently	Pattern database heuristic

Tighter relaxations give better heuristics. Manhattan distance (remove walls, keep grid) is tighter than Euclidean distance (allow straight-line movement through walls). Both admissible; Manhattan gives fewer expanded nodes because it's closer to the true G*.

A spectrum of algorithms

Sort key for Q	Algorithm	Optimal?	Behavior
insertion order	BFS (unit costs)	Yes (fewest steps)	Uniform wavefront
C*(x)	Dijkstra	Yes (minimum cost)	Uniform cost wavefront
C(x) + Ĝ(x), admissible	A*	Yes (minimum cost)	Directed toward goal
Ĝ*(x) only	Greedy best-first	No	Fastest but can get stuck
C(x) + Ĝ(x), inadmissible	Weighted A*	No (bounded suboptimal)	Faster but potentially suboptimal

The efficiency win is dramatic. On a large grid labyrinth, Dijkstra might expand 5,000 cells before finding the goal. A* with Manhattan distance might expand only 200 — a 25× speedup — finding the same optimal path. The heuristic doesn't change correctness; it redirects effort toward cells that are actually on or near the optimal path.

What admissibility buys you — exactly

If Ĝ*(x) = 0 for all x → A* degenerates to Dijkstra (the heuristic provides no guidance). If Ĝ*(x) = G*(x) exactly → A* expands only nodes on the optimal path (perfect, oracle heuristic — impossible in general). Admissibility is the sweet spot: any admissible Ĝ* gives you optimality. Better admissible heuristics (closer to G*) give faster search. The challenge is designing heuristics that are tight but still admissible.

Consistency — a stronger condition

A heuristic is consistent (also called monotone) if for every state x and action u leading to x':

&Ĝ;*(x) ≤ l(x, u) + &Ĝ;*(x')

This is the triangle inequality for heuristics: the estimated cost from x cannot be more than one step's cost plus the estimated cost from x'. Consistency implies admissibility (but not vice versa). The practical benefit: with a consistent heuristic, A* never needs to re-open already-closed states. Each state is settled exactly once — like Dijkstra — making the implementation simpler and avoiding a correctness edge case.

Consistency = admissibility + monotonicity. In practice, almost all admissible heuristics people use are also consistent. The distinction matters theoretically — inconsistent admissible heuristics require more careful implementation — but in grid planning, you rarely encounter the distinction.

Inadmissible heuristics: a calculated risk. If you use Ĝ*(x) = 2 × Manhattan_distance — overestimates by 2× — A* is no longer optimal. It might find a path that costs up to 2× the true optimum (weighted A* with weight w=2 has this guarantee). In practice, inadmissible heuristics are used when speed matters more than optimality. Real-time systems often accept 5% suboptimality for 10× speedup.

Dijkstra vs A* — Expansion Comparison

Same maze. Left: Dijkstra expands uniformly. Right: A* (Manhattan heuristic) expands toward the goal. Both find the same optimal path. Counters show nodes expanded — watch A* do more with less.

Ready — Dijkstra (left) vs A* with Manhattan (right)

python
import heapq
from math import inf

def manhattan(x, goal):
    return abs(x[0] - goal[0]) + abs(x[1] - goal[1])

def astar(prob, heuristic=None):
    """
    A* search with a pluggable heuristic.
    heuristic(x) → estimated cost-to-go from x.
    Must be admissible for optimal guarantee.
    """
    x_I  = prob.x_I
    goal = list(prob.X_G)[0]   # single goal for simplicity
    h    = heuristic or (lambda x: manhattan(x, goal))

    C      = {x_I: 0}
    parent = {x_I: None}
    heap   = [(h(x_I), 0, x_I)]  # (f = C + h, C, state)

    while heap:
        f, c, x = heapq.heappop(heap)

        if c > C.get(x, inf):    # stale entry
            continue

        if x == goal:
            path = []
            cur  = x
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return list(reversed(path)), c

        for u in prob.actions(x):
            xp    = prob.f(x, u)
            new_c = c + prob.cost(x, u)
            if new_c < C.get(xp, inf):
                C[xp]      = new_c
                parent[xp] = x
                # Sort by f = cost-to-come + heuristic
                heapq.heappush(heap, (new_c + h(xp), new_c, xp))

    return None, inf

# Inadmissible heuristic — faster but may be suboptimal
def weighted_astar(prob, w=2.0):
    goal = list(prob.X_G)[0]
    h = lambda x: w * manhattan(x, goal)  # overestimates by w×
    return astar(prob, heuristic=h)        # still complete, not optimal

A* in practice — when is it dramatically faster?

A* truly shines when the state space has a natural metric and the goal is in a "known direction." Grid planning is the canonical case: Manhattan distance tells the algorithm "the goal is down and to the right, so expand downward and rightward states first." Dijkstra doesn't know this — it expands a full disc of equal-cost states in every direction.

A* performs poorly when: (a) the heuristic is weak (Ĝ*(x) ≈ 0 everywhere → degenerates to Dijkstra), or (b) the optimal path requires going "backwards" relative to the goal — e.g., a maze that forces a long detour away from the goal before arriving. In case (b), Ĝ* misleads the search initially, and A* may expand more nodes than Dijkstra before discovering the true path requires backtracking.

Worked numerical example: A* vs Dijkstra

5×5 grid, unit costs, wall at (1,2), (2,2), (3,2). Start (0,0), Goal (4,4). f(x) = C*(x) + Manhattan(x, (4,4)).

Step	Dijkstra extracts (C*)	A* extracts (f = C* + h)
1	(0,0) C=0	(0,0) f=0+8=8
2	(1,0) C=1 or (0,1) C=1	(1,0) f=1+7=8 or (0,1) f=1+7=8
3	(2,0),(1,1),(0,2) C=2 (all tied)	(1,1) f=2+6=8 — same or (2,0) f=2+6=8
…	Expands all of left side of grid	Focuses on lower-right quadrant
Final	~20 nodes expanded	~12 nodes expanded

A* discovers the wall at column 2 and routes around it to the right — fewer wasted expansions because it knew to go right and down from the start. Dijkstra expends equal effort exploring the left side of the grid even though it leads away from the goal.

A* memory usage warning. A* stores all generated states in memory — worst case O(b^d). For deep searches (Rubik's cube at depth 20, say), this can exhaust memory before finding the goal. Variants like IDA* (iterative deepening A*) and SMA* (simplified memory-bounded A*) address this at the cost of some speed. In robotics and game planning, IDA* is common precisely because it uses O(bd) memory.

Your heuristic is h(x) = 2 × Manhattan_distance(x, goal). All edge costs are 1. Is this admissible, and what does A* do with it?

Admissible — 2× Manhattan is still a lower bound on cost Not admissible — it overestimates, so A* may return a suboptimal path Not admissible, but A* still guarantees optimality on grids specifically

Chapter 6: Optimal Planning — The Cost Functional

So far we've asked: does a path exist? Now we ask: among all paths, which is best? This is Formulation 2.2 — optimization layered on top of Formulation 2.1. It introduces a cost functional that evaluates the quality of any complete plan.

What is a plan?

A plan of length K is a sequence of K actions (u₁, u₂, …, u_K) applied starting from x_I. Applying them produces a state sequence: x₁ = x_I, and x_k+1 = f(x_k, u_k) for k = 1, …, K. The final state x_F = x_K+1.

The cost functional L assigns a number to any plan:

L(π_K) = ∑_k=1^K l(x_k, u_k) + l_F(x_F)

Two components:

Running cost ∑ l(x_k, u_k): the cost accumulated step by step. This is what Dijkstra's C*(x) tracks.
Terminal cost l_F(x_F): paid once at the end. LaValle uses this to encode feasibility.

The terminal cost trick — encoding feasibility as optimization

LaValle defines the terminal cost with a beautiful trick:

l_F(x) = 0 if x ∈ X_G, l_F(x) = ∞ otherwise

This converts "plan must end in X_G" into unconstrained optimization. Instead of minimizing L subject to the constraint x_F ∈ X_G, we minimize L freely. Any plan that misses the goal gets L = ∞ and automatically loses. Any plan that reaches the goal gets L = (running cost). We never need to explicitly enforce the goal constraint — the ∞ penalty does it for us.

Special cases — same algorithm, different objectives

The power of Formulation 2.2 is that one algorithm (Dijkstra / A*) solves all these problems by swapping in a different l:

l(x, u)	L minimizes	Algorithm used
l(x, u) ≡ 0	Nothing — just reach the goal	BFS (any path)
l(x, u) ≡ 1	Number of steps in the plan	BFS (fewest steps)
l(x, u) = Euclidean distance	Total path length in meters	Dijkstra / A*
l(x, u) = energy(u)	Total energy consumed	Dijkstra / A*
l(x, u) = time(x, u)	Total travel time	Dijkstra / A*
l(x, u) = danger(x)	Total danger exposure	Dijkstra / A*

Dynamic programming principle behind L

The cost functional L has a recursive structure that is the foundation of dynamic programming and value iteration. Notice: if you split the plan π_K into a first step (x₁, u₁) and the remaining plan π_K-1 from x₂:

L(π_K) = l(x₁, u₁) + L(π_K-1 starting from x₂)

This is Bellman's principle of optimality: the tail of an optimal plan is itself optimal. If π* is optimal from x_I to X_G, then the sub-plan from x₂ onward must be optimal from x₂ to X_G. Otherwise, we could replace the tail with a cheaper one, contradicting optimality of π*.

Bellman's principle unlocks backward computation. Instead of searching forward from x_I, we can compute the optimal cost-to-go G*(x) for every state simultaneously by working backward from X_G. This is value iteration — the subject of Chapter 7 of this lesson. Dijkstra and value iteration are two different computational strategies for the same mathematical object.

The cost functional as a language

One underappreciated feature of Formulation 2.2 is that it provides a language for expressing what you want — separate from how you compute it. You write down l(x, u) and l_F(x) to describe your objective. Then you hand it to Dijkstra (or A* or value iteration) and get the answer. The algorithm doesn't care what l means physically — fuel, time, money, danger — it just minimizes it.

This separation of concerns — "what to optimize" vs "how to optimize it" — is the hallmark of good mathematical formalism. It's why Formulation 2.2 is worth memorizing: once you've cast your problem into it, the solution method follows automatically.

Multi-objective planning — a preview

What if you want to minimize both total time AND total energy? These are two different cost functionals L_time and L_energy. Optimizing both simultaneously is a multi-objective planning problem — beyond Formulation 2.2, which handles only one scalar objective. The standard approach: form a weighted combination L = w₁L_time + w₂L_energy and solve with Dijkstra. Different weights give different Pareto-optimal plans. LaValle covers multi-objective extensions in later chapters.

Why "plans" not "paths" — variable-length plans

In Formulation 2.2, we optimize over plans of a fixed length K. For each fixed K, this is a finite optimization — tractable by dynamic programming. But we don't know K in advance. LaValle handles this in two ways: (a) take the minimum over all K, which leads to the backward induction / value iteration framework (Chapter 7 of this lesson), or (b) use Dijkstra/A* which implicitly searches over all lengths simultaneously via the cost-to-come.

Comparing plans of different lengths is non-trivial. A 3-step plan costing 6 vs a 7-step plan costing 4: the shorter plan costs more per step but less overall? No — the 7-step plan costs less in total (L=4 vs L=6). We minimize L, not L/K. Length itself doesn't matter; total cost does. (Unless you add a per-step budget constraint — that's a different formulation.)

Cost-Aware Path Planning

Click cells to cycle their move cost: 1 → 2 → 3 → wall → 1. Numbers shown in cells. The optimal path (yellow) updates via Dijkstra as you change costs. L(π*) displayed below.

Click cells to toggle cost 1→2→3→wall→1 · L(π*) = —

Formulation 2.2 — the full code version

python
from math import inf
import heapq

class WeightedGridProblem(GridPlanningProblem):
    """Formulation 2.2: adds edge costs l(x,u) and terminal cost l_F."""

    def __init__(self, rows, cols, walls=None,
                 cell_costs=None, x_I=(0,0), X_G=None):
        super().__init__(rows, cols, walls, x_I, X_G)
        # cell_costs: dict (i,j) → cost to ENTER that cell
        self.cell_costs = cell_costs or {}

    # Running cost l(x, u): cost to enter the state we move to
    def cost(self, x, u):
        xp = self.f(x, u)
        return self.cell_costs.get(xp, 1)  # default cost 1

    # Terminal cost l_F(x): 0 if goal, ∞ otherwise
    def l_F(self, x):
        return 0 if x in self.X_G else inf

    # Cost functional L(π_K)
    def L(self, plan):
        """Evaluate cost functional on a plan (sequence of states)."""
        if len(plan) < 2: return self.l_F(plan[0])
        total = 0
        for k in range(len(plan) - 1):
            u = (plan[k+1][0] - plan[k][0],
                 plan[k+1][1] - plan[k][1])
            total += self.cost(plan[k], u)     # running cost
        total += self.l_F(plan[-1])            # terminal cost
        return total

# Example: robot must avoid expensive muddy zones
prob = WeightedGridProblem(
    rows=7, cols=7,
    cell_costs={(2,3):5, (3,3):5, (4,3):5},  # muddy corridor
    x_I=(0,0), X_G={(6,6)}
)
path, cost = dijkstra(prob)
print(f"Optimal cost: L(π*) = {cost}")
print(f"L evaluated directly: {prob.L(path)}")  # same number

Connection to Dijkstra

Dijkstra computes exactly L(π*) — the minimum value of L over all plans reaching X_G. When Dijkstra extracts a goal state g from Q with cost C*(g), that IS the minimum of L. The l_F = ∞ for non-goal states means any plan ending outside X_G has L = ∞ — Dijkstra would never output such a plan as optimal. Formulation 2.2 and Dijkstra are two sides of the same coin.

What makes a good cost function — practical considerations

Choosing l(x, u) well is as much art as science. A few pitfalls to avoid:

Misspecified objectives lead to unexpected plans. If you minimize total distance but forget to penalize dangerous zones, the robot will happily cross a busy intersection at peak hour. The cost function is a contract: the algorithm delivers exactly what you specify.
Negative costs create paradoxes. If l(x, u) < 0 for some actions, the robot might benefit from wandering — taking costly detours to collect negative-cost rewards. Dijkstra breaks; you need Bellman-Ford or modified formulations.
Infinite costs are useful flags. Setting l(x, u) = ∞ for "forbidden" transitions (restricted zones, load-bearing walls) blocks them cleanly. The optimizer treats them identically to non-existent edges — they simply never appear in any optimal plan.
Scale matters. If l_time is in seconds and l_energy is in joules, combining them as w₁·time + w₂·energy requires careful unit analysis. A 1-second detour vs 1-joule saving: are they comparable? The weights encode your trade-off preferences.

Summary of Chapter 6. Formulation 2.2 adds a cost functional L to Formulation 2.1. L = Σ l(x,u) + l_F(x_F). The terminal cost trick (l_F = ∞ outside X_G) converts constrained feasibility into unconstrained minimization. Dijkstra computes L(π*) exactly. Bellman's optimality principle gives L its recursive structure, previewing value iteration in the next chapter.

Plan π reaches x_G with total running cost Σl = 15. Plan π' does NOT reach x_G and has total running cost Σl = 3. Using the Formulation 2.2 cost functional, which plan wins?

π wins: L(π) = 15 + 0 = 15, L(π') = 3 + ∞ = ∞ π' wins: running cost 3 < 15 They tie: we only compare running costs

Chapter 7: Backward Value Iteration — The Key Algorithm

Forward search (Dijkstra, A*) starts from x_I and races toward X_G. There is a completely different strategy: start from X_G and compute, for every state simultaneously, the optimal cost to reach the goal. This is backward value iteration — LaValle's central algorithm for optimal planning under a fixed horizon.

The result is a function G*: X → ℝ ∪ {∞} called the optimal cost-to-go. Once you have G*, extracting the optimal plan from any starting state is trivial — one greedy step at a time. Computing G* is the hard part; using it is free.

Defining G* precisely

Fix a horizon K (plan length). The optimal cost-to-go from stage k is:

G*_k(x_k) = min_{u_k,…,u_K} &Bigl[ ∑_i=k^K l(x_i, u_i) + l_F(x_F) &Bigr;

Read this as: starting at state x_k at time k, what is the minimum cost achievable over the remaining K−k+1 steps? The boundary condition anchors the recursion:

G*_K+1(x) = l_F(x) = 0 if x ∈ X_G, ∞ otherwise

Deriving the Bellman equation — every step shown

Starting from the definition of G*_k(x_k), we derive the recursion that makes computation possible. The derivation has four steps — follow each carefully.

Step 1. Write out the minimization in full:

G*_k(x_k) = min_{u_k,…,u_K} &Bigl[ l(x_k,u_k) + l(x_k+1,u_k+1) + … + l(x_K,u_K) + l_F(x_F) &Bigr]

Step 2. Pull the first term outside the min — it only depends on u_k, not on the future choices u_k+1,…,u_K:

= min_{u_k} &Bigl[ l(x_k,u_k) + min_{u_k+1,…,u_K} &Bigl[ l(x_k+1,u_k+1) + … + l_F(x_F) &Bigr] &Bigr]

Step 3. Recognize the inner minimization. It is exactly G*_k+1(x_k+1) — the optimal cost-to-go from the next state at the next time step. And x_k+1 = f(x_k, u_k):

= min_{u_k} &Bigl[ l(x_k,u_k) + G*_k+1(f(x_k, u_k)) &Bigr]

Step 4 — The Bellman Equation (LaValle eq. 2.11). This IS the algorithm:

G*_k(x) = min_{u ∈ U(x)} &Bigl[ l(x, u) + G*_k+1(f(x, u)) &Bigr]

Compute G*_K+1 first (boundary), then G*_K, then G*_K−1, all the way back to G*₁. Each step touches every state once. Each state does |U(x)| comparisons. Total: O(K · |X| · |U|).

Worked example — 5 states, 4 steps (LaValle Example 2.3)

States {a, b, c, d, e}. Horizon K = 4. Start x_I = a. Goal X_G = {d}. We compute G*₅ through G*₁ using the Bellman equation. The complete table (every cell computed from the recurrence):

State	G*₅	G*₄	G*₃	G*₂	G*₁
a	∞	∞	6	4	6
b	∞	4	2	6	4
c	∞	1	∞	3	5
d	0	∞	2	∞	4
e	∞	∞	∞	∞	∞

Walk through key cells:

G*₅: Boundary condition. Only d ∈ X_G, so G*₅(d) = l_F(d) = 0. Every other state gets l_F = ∞.
G*₄(c): What actions lead from c? c can go to d with cost 1. So G*₄(c) = min{l(c,→d) + G*₅(d)} = 1 + 0 = 1. No other action from c leads to a finite G*₅.
G*₄(b): From the graph structure, b can reach d in one step with cost 4. G*₄(b) = 4 + G*₅(d) = 4 + 0 = 4. Other actions from b lead to states with G*₅ = ∞.
G*₃(b): b can reach c (cost 1): 1 + G*₄(c) = 1 + 1 = 2. Or reach d (cost 4): 4 + G*₄(d) = 4 + ∞ = ∞. Min = 2.
G*₁(a): The answer to the original problem. G*₁(a) = 6 — that is the minimum cost to go from a to d in exactly 4 steps.
e: e has no outgoing edges that reach the goal in any number of steps. Every cell stays ∞ — e is a dead end.

Complexity: O(K · |X| · |U|). Each of the K backward iterations visits every state (|X| = 5 here) and tries every action (|U| ≤ 4 for a grid). Total work scales linearly in K. Compare to brute-force enumeration of all paths, which is exponential in K. Value iteration is the efficient alternative.

The algorithm in code

python
from math import inf

def backward_value_iteration(prob, K):
    """
    Backward value iteration (LaValle eq. 2.11).
    Returns G[k][x] = G*_k(x) for k = 1..K+1.
    prob.X_G, prob.states(), prob.actions(x), prob.f(x,u), prob.cost(x,u)
    """
    states = list(prob.states())

    # G[K+1] = boundary condition: l_F(x)
    G = [{x: (0 if x in prob.X_G else inf) for x in states}]

    # Backward induction: compute G_K, G_{K-1}, ..., G_1
    for k in range(K, 0, -1):
        G_next = G[-1]            # G*_{k+1}
        G_k = {}
        for x in states:
            best = inf
            for u in prob.actions(x):
                xp = prob.f(x, u)
                val = prob.cost(x, u) + G_next.get(xp, inf)
                if val < best:
                    best = val
            G_k[x] = best
        G.append(G_k)

    G.reverse()
    return G

# Extract optimal plan greedily from G*
def extract_plan(prob, G, x_I, K):
    path, x = [x_I], x_I
    for k in range(K):
        best_u, best_val = None, inf
        for u in prob.actions(x):
            xp  = prob.f(x, u)
            val = prob.cost(x, u) + G[k+1].get(xp, inf)
            if val < best_val:
                best_u, best_val = u, val
        if best_u is None: break
        x = prob.f(x, best_u)
        path.append(x)
        if x in prob.X_G: break
    return path

Value Iteration — Wavefront from Goal

Each cell shows its current G* value. Click Step to run one backward iteration — watch the wavefront of finite values expand outward from the goal (red). Cells that can never reach the goal stay dark. After all steps, click Show Plan to highlight the optimal path from the green start.

Ready — G* initialized: goal=0, all others=∞

In backward value iteration, why do we compute G*_K+1 FIRST (the boundary) and then G*_K, G*_K−1, … G*₁ — rather than going forward from G*₁?

Tradition — both directions work equally well The Bellman equation G*_k(x) depends on G*_k+1 — you need later stages computed before earlier ones Going backward is faster in big-O complexity

Chapter 8: Variable-Length Plans — Convergence to Stationarity

Chapter 7's backward value iteration required knowing K in advance. But the warehouse robot doesn't know how many steps it will need — it just needs to reach the goal. Formulation 2.3 fixes this by adding a termination action u_T: once applied, the robot stays put and accumulates no further cost.

This elegant addition converts the fixed-horizon problem into a variable-length one. Plans (u₁, u₂) and (u₁, u₂, u_T, u_T, …) become equivalent — the termination action pads any plan to any length without changing its cost. Now we can optimize over all plan lengths simultaneously.

The termination action — making it precise

Termination action u_T: At any state x, applying u_T sends f(x, u_T) = x (self-loop) and l(x, u_T) = 0 (zero cost). Once you reach x_G ∈ X_G, you apply u_T forever. The robot is "done." The terminal cost l_F is collected once, at the first application of u_T at the goal.

The stationary algorithm

With u_T in the action space, run backward value iterations without stopping at K. Instead, run until the values stop changing:

G*_k−1(x) = G*_k(x) for all x ∈ X

When this holds, the iterations have converged. The common value is simply called G*(x) — the stationary optimal cost-to-go. It satisfies:

G*(x) = min_{u ∈ U(x)} &Bigl[ l(x,u) + G*(f(x,u)) &Bigr]

Notice: no subscript k. This is the fixed-point equation — G* is its own "next iteration."

Traced example — 5-state graph to stationarity

State	iter 0	iter 1	iter 2	iter 3	iter 4 (stable)
a	∞	∞	6	4	4
b	∞	4	2	2	2
c	∞	1	1	1	1
d	0	0	0	0	0
e	∞	∞	∞	∞	∞

Between iterations 3 and 4, values stop changing — convergence reached. G*(a) = 4: three steps, not four. The extra flexibility of variable-length plans found a cheaper route than the fixed-horizon K=4 case.

Recovering the optimal plan from G* (eq. 2.19)

Once G* is known, the optimal action at any state x is the greedy minimizer:

u*(x) = argmin_{u ∈ U(x)} &Bigl[ l(x, u) + G*(f(x, u)) &Bigr]

Trace from x_I = a:

From a: To b: 2 + G*(b) = 2+2 = 4. To a: 2+G*(a) = 2+4 = 6. Choose b.
From b: To c: 1 + G*(c) = 1+1 = 2. To a: 1+4 = 5. Choose c.
From c: To d: 1 + G*(d) = 1+0 = 1. Choose d. Goal reached.
Plan: a → b → c → d. Total cost = 2+1+1 = 4. Optimal.

The policy is a map, not a sequence. u*(x) is defined for every reachable state x — not just states on the path from x_I. If the robot is knocked off course and ends up at b directly, u*(b) = go to c is still optimal. A policy is robust to deviations; a fixed action sequence is not. This robustness is the key advantage of backward value iteration over forward path-finding.

python
from math import inf

def stationary_value_iteration(prob, tol=1e-9, max_iter=10000):
    """Variable-length backward value iteration (Formulation 2.3).
    Runs until convergence. Returns G_star: dict {state: G*(state)}."""
    states = list(prob.states())
    G = {x: (0 if x in prob.X_G else inf) for x in states}

    for _ in range(max_iter):
        G_new, delta = {}, 0
        for x in states:
            best = 0 if x in prob.X_G else inf
            for u in prob.actions(x):
                xp  = prob.f(x, u)
                val = prob.cost(x, u) + G.get(xp, inf)
                if val < best: best = val
            G_new[x] = best
            if G[x] < inf and G_new[x] < inf:
                delta = max(delta, abs(G_new[x] - G[x]))
        G = G_new
        if delta <= tol: break
    return G

def optimal_policy(prob, G_star):
    policy = {}
    for x in prob.states():
        if x in prob.X_G or G_star[x] == inf: continue
        best_u, best_val = None, inf
        for u in prob.actions(x):
            xp  = prob.f(x, u)
            val = prob.cost(x, u) + G_star.get(xp, inf)
            if val < best_val: best_u, best_val = u, val
        policy[x] = best_u
    return policy

Convergence to Stationarity

Values propagate outward from the goal (red) each iteration. The bar chart on the right tracks max delta per iteration — convergence happens when it hits zero. Click Step to advance one iteration and watch cells stop changing.

Iteration 0 — max Δ = ∞

Stationary value iteration converges when G*_k−1(x) = G*_k(x) for all x. What does this mean geometrically in the grid?

All cells have reached cost 0 (every state is a goal) The algorithm has run for exactly K steps equal to the grid diameter The wavefront from the goal has reached every reachable state — adding more iterations changes nothing

Chapter 9: Showcase — BFS · Dijkstra · A* Face-Off

Everything we've built — BFS, Dijkstra, A* — runs side by side on the same maze. All three start simultaneously. Watch the wavefronts diverge. Count the expanded nodes. The differences that were abstract in earlier chapters become viscerally clear when you see them in motion.

The core insight in one picture: BFS expands a uniform disc. Dijkstra expands a cost-weighted disc. A* expands an ellipse pointed at the goal. Same path found; dramatically different work done.

Algorithm comparison table

Algorithm	Q sorted by	Optimal?	Requires	Complexity
BFS	FIFO (insertion order)	Yes (unit cost)	—	O(V+E)
Dijkstra	min C*(x)	Yes	Non-neg costs	O(V log V + E)
A*	min C(x) + Ĝ(x)	Yes	Admissible h	O(V log V + E)
Greedy best-first	min Ĝ*(x) only	No	—	O(V log V + E)
DFS	LIFO	No	—	O(V+E)

Dijkstra as smart forward value iteration

There is a deep connection between backward value iteration (Chapter 7) and Dijkstra. Both compute optimal costs — but from opposite directions and with different data structures.

Dijkstra is essentially forward value iteration with a priority queue that ensures states are processed in order of increasing C*(x). When x is popped from the heap, C*(x) is final — exactly like a state whose G* has stabilized in backward VI. The three categories (unvisited / alive / dead) in forward search map directly to (∞ / tentative / converged) in value iteration.

Why Dijkstra requires non-negative costs — revisited. In value iteration terms: when a state becomes "dead," we need to guarantee no future path can improve its cost. Non-negative edges ensure any extension of the current minimum-cost path only increases cost. Negative edges could create a longer path that is cheaper — the "dead" state would need reopening, breaking Dijkstra's one-pass structure.

Showcase — Three Algorithms, One Maze

BFS (left) · Dijkstra (center) · A* with Manhattan (right). All three search the same maze simultaneously. Orange = alive. Teal = dead. Yellow = final path. A* typically finishes with fewest expansions. Click New Maze to generate a different layout.

Ready — click Step All or Run

      BFS: 0 expanded
      Dijkstra: 0 expanded
      A*: 0 expanded
    

What the showcase teaches

Open spaces: A* dominates. It beelines toward the goal. BFS and Dijkstra expand uniformly in all directions.
Dense mazes: All three converge. When the only path winds through the whole grid, every algorithm must follow it — heuristic guidance helps less.
Goal near start: BFS is fastest — it finds the goal in the first wavefront. A*'s heuristic computation overhead isn't worth it for short searches.
Unit costs: BFS and Dijkstra find the same path. With non-uniform costs, Dijkstra outperforms BFS (BFS ignores costs; Dijkstra respects them).

The takeaway: A* is not always better than Dijkstra. It wins when the heuristic is informative and the goal is far. When the heuristic is weak or the problem has unusual structure (winding paths that force detours away from the goal), Dijkstra may be preferable. Always profile before committing to A*.

On a grid where ALL edge costs are 1, which algorithm finds the path with the fewest edges fastest (with the lowest overhead)?

BFS — it uses a simple FIFO queue (O(1) per operation), no heap overhead Dijkstra — it handles unit costs as a special case with equal efficiency A* — the Manhattan heuristic guides it to expand fewer nodes

Chapter 10: Complexity & the State Space Explosion

Discrete planning algorithms are polynomially efficient in the size of the state space |X|. The problem is that |X| grows exponentially in the number of variables describing the state. A robot with n joints, each discretized to D angles, has |X| = Dⁿ states. This is the curse of dimensionality — the central challenge of motion planning.

Complexity of the core algorithms

Algorithm	Time	Space	Notes
BFS	O(\|V\|+\|E\|)	O(\|V\|)	All reachable states in queue
DFS	O(\|V\|+\|E\|)	O(b·d)	d = depth, b = branching factor
Dijkstra	O((\|V\|+\|E\|) log \|V\|)	O(\|V\|)	log factor from heap ops
A*	O(b^d) worst	O(b^d)	O(d) best with perfect heuristic
Backward VI (fixed K)	O(K·\|X\|·\|U\|)	O(\|X\|)	Two arrays: current and next G*
Stationary VI	O(D·\|X\|·\|U\|)	O(\|X\|)	D = graph diameter

State space explosion — the real enemy

Problem	\|X\|	At 10⁹ states/sec
9×9 grid	81	< 1 µs
6-DOF arm, 360 angles each	360⁶ ≈ 2×10¹⁵	~23 days
Rubik's cube	4.3×10¹⁹	~1.4 billion years
Chess (mid-game)	~10⁴⁷	Heat death of universe

The algorithms of Chapter 2 are theoretically clean but practically limited to small state spaces. Everything in Chapters 4–6 of LaValle (configuration spaces, sampling-based planning) is a response to this explosion — planning in continuous spaces without ever discretizing them fully.

Approaches to state space explosion

Symmetry reduction. Many states are equivalent under symmetry. The 15-tile puzzle has 8 symmetries — factor them out, reduce |X| by 8×.
Pattern databases. Solve sub-problems exactly and use the results as heuristics. The pattern database for 7 of 15 tiles in the sliding puzzle reduces A* runtime from infeasible to seconds.
Sampling-based methods (LaValle Ch. 5). Instead of discretizing state space, sample random configurations and build a roadmap. PRM and RRT handle 100-dimensional spaces routinely by abandoning completeness for computational tractability.
Hierarchical planning. Plan coarsely at a high level, then refine locally. The high-level graph might have 100 nodes; the low-level millions, but only queried near the path.

The complete cheat sheet

Concept	Formula
Transition function	x_k+1 = f(x_k, u_k)
Cost functional	L = ∑ l(x_k, u_k) + l_F(x_F)
Terminal cost trick	l_F(x) = 0 if x ∈ X_G, else ∞
Bellman equation	G_k(x) = min_u{l(x,u) + G_k+1(f(x,u))}
Stationary fixed point	G(x) = min_u{l(x,u) + G(f(x,u))}
Optimal policy	u(x) = argmin_u{l(x,u) + G(f(x,u))}
Admissibility	Ĝ(x) ≤ G(x) for all x
A* sort key	f(x) = C(x) + Ĝ(x)
Dijkstra update	C(x') ← min(C(x'), C*(x) + l(x,u))

A robot arm has 5 joints, each discretized to 100 angles. How many states does the Formulation 2.1 state space have?

500 — 5 × 100 10¹⁰ — 100⁵ 10⁵⁰ — 100⁵⁰

Chapter 11: Connections — Where This Leads

Chapter 2 is the bedrock. Every algorithm in the rest of LaValle either directly extends these ideas or needs them as prerequisite. This chapter maps the road ahead and closes the loop on the full arc of discrete planning.

What Chapter 2 didn't cover

Sections 2.4–2.5: Logic-based planning (STRIPS/PDDL). States as sets of propositions; actions with preconditions and effects. Important for task planning ("pick up block A if arm is empty and block is clear"), less central for motion planning.
Bidirectional search. Run forward from x_I and backward from X_G simultaneously, meeting in the middle. Reduces O(b^d) to O(b^d/2). Tricky to implement correctly for weighted graphs (the meeting condition is subtle).
Anytime algorithms. Return a suboptimal solution immediately, then improve as computation continues. Useful when you need a plan now, not a perfect plan later.

How Chapter 2 extends to the rest of the book

Chapter	Extension of Ch. 2	Key new idea
Ch. 4: Configuration Space	X becomes continuous C-space	Robot geometry → obstacle regions
Ch. 5: Sampling-Based	Graph search on random roadmaps	RRT, PRM — random exploration
Ch. 10: MDPs	f(x,u) becomes probabilistic	Stochastic transitions, expected cost
Ch. 11: POMDPs	State not fully observable	Belief space planning
Ch. 13: Differential Constraints	U(x) constrained by dynamics	Kinodynamic planning

The thread from discrete planning to deep RL

The Bellman equation G*(x) = min_u{l(x,u) + G*(f(x,u))} is deterministic. Replace f(x,u) with a transition probability P(x'|x,u) and you get the MDP Bellman equation. Value iteration works almost identically — replace min with expected value. Reinforcement learning is value iteration when the transition function and cost are unknown: estimated from experience rather than given analytically.

The full arc: Formulation 2.1 (deterministic, known) → MDP (stochastic, known) → POMDP (partial observability) → RL (unknown model, stochastic) → Deep RL (function approximation for enormous state spaces). Every step relaxes one assumption. The Bellman equation remains the organizing principle throughout.

Related lessons on this site

Lesson	Connection to Ch. 2
MDP	Extends Formulation 2.1 with stochastic transitions and expected costs
RL Algorithms	Q-learning = value iteration with unknown model, estimated online
Bayes Filter	State estimation in Ch. 10 (POMDPs) requires belief-space tracking

What you now know from Chapter 2. Formulation 2.1 (five components). Formulation 2.2 (cost functional L = Σl + l_F, Bellman principle). Formulation 2.3 (variable-length plans via u_T). The Bellman equation and backward value iteration. Forward search (BFS, DFS, Dijkstra, A*) as instances of the master algorithm differing only in Q ordering. Their complexities, optimality guarantees, and failure conditions. This is the complete toolkit for deterministic discrete planning — every later chapter is an extension of this foundation.

The deterministic Bellman equation is G*(x) = min_u{l(x,u) + G*(f(x,u))}. In a stochastic MDP where u from x leads to x' with probability P(x'|x,u), what does it become?

G*(x) = min_u{l(x,u) + max_x' G*(x')} — maximize over next states G*(x) = l(x,u*) + G*(f(x,u*)) — same equation, just plug in optimal action G*(x) = min_u{l(x,u) + ∑_x' P(x'|x,u) G*(x')} — expected value over next states