LaValle, Chapter 2

Discrete Planning

State spaces, graph search, value iteration, Dijkstra, A* — the algorithmic foundations everything else builds on.

Prerequisites: Basic graph theory (nodes, edges, paths) + Python. That's it.
12
Chapters
6+
Simulations
12
Quizzes

Chapter 0: The Problem

A warehouse robot wakes up at the loading dock — call it cell (0,0). Its job: deliver a pallet to shelf #247, sitting at cell (8,8) on a 9×9 grid. Between them: rows of shelves, some aisles blocked by pallets stacked on the floor. The robot can move up, down, left, or right — one tile per step. It cannot pass through blocked cells.

The question sounds simple: how does the robot find a path? But three deeper questions hide inside it, and Chapter 2 answers all three. First: how do we represent this problem mathematically — precisely enough for a computer to reason about it? Second: how do we search for a path without looping forever or wasting exponential work? Third: when some routes are cheaper than others — slippery floors, detour penalties — how do we find the optimal path?

The canvas below is the warehouse. Click any gray cell to toggle a wall (blocked aisle). The green cell is the start; the red cell is the goal. Try building mazes. Notice which configurations trap the robot completely — no path exists at all. Detecting "no path" is just as important as finding a path: a robot that searches forever for an impossible route is useless in production.

Warehouse Grid — Place Obstacles

Click any gray cell to toggle a wall. Green = start (0,0). Red = goal (8,8). Every cell is a state. Every valid move is an action. This IS the state transition graph.

9×9 grid · 81 states · up to 4 actions per state
The grid IS a graph. Every cell is a node. Every valid move between adjacent, unblocked cells is a directed edge. Finding a path from start to goal = finding a path in this graph. All of Chapter 2's machinery is just making this intuition precise enough to implement correctly and to prove things about.

Why discrete planning is hard (even in theory)

The naive approach — enumerate all possible paths, compute their costs, return the minimum — is computationally catastrophic. In a 9×9 grid, the number of self-avoiding paths from corner to corner can be astronomical (tens of millions). We need smarter algorithms that avoid ever explicitly enumerating paths.

The key insight: instead of thinking about paths (exponentially many), think about states (only 81). There are only 81 possible positions the robot can be in. If we track which states we've visited and build up costs incrementally, the computation is polynomial in |X| — not exponential in path length. This is the fundamental reason graph search algorithms work.

Thinking in states, not paths. A maze with 81 cells might have 108 distinct self-avoiding paths from start to goal. Enumerating all of them is hopeless. But there are only 81 states. If we visit each state at most once, the algorithm terminates in at most 81 steps. Tracking states (not paths) converts an exponential problem into a polynomial one.

One more thing worth noticing: the number of states explodes in harder problems. Our 9×9 grid has 81 states — trivial for any computer. A Rubik's cube has 4.3 × 1019 states. A robot arm with 6 joints, each discretized to 360 angles, has 3606 ≈ 2.2 × 1015 states. The algorithms we build need to work on spaces too large to enumerate explicitly — which is why the theory matters.

Three problems disguised as one

The warehouse scenario neatly illustrates the three sub-problems Chapter 2 addresses. Notice how each one builds on the previous:

ProblemQuestionAnswer algorithm
FeasibilityDoes any path exist from start to goal?BFS or DFS
Minimum stepsWhich path uses the fewest moves?BFS (wavefront guarantees it)
Minimum costWhich path has lowest total cost (muddy/paved mix)?Dijkstra or A*

These three problems share the same mathematical framework — Formulation 2.1 — but differ in what they optimize. A key lesson from LaValle: choosing the right algorithm starts with identifying which problem you actually have. Running Dijkstra when you want minimum steps is correct but wasteful. Running BFS when you want minimum cost gives wrong answers.

Chapter 2 also draws a boundary LaValle is careful to respect: these algorithms assume the world is deterministic and fully observable. The robot knows exactly where it is (full observability), and actions always have their intended effect (determinism). Relax either assumption — the robot's sensors are noisy, or the floor is slippery — and you're in Chapters 10–12 (probabilistic planning). The discrete planning of Chapter 2 is the foundation: master it, and the probabilistic extensions become natural generalizations.

If a 9×9 grid has no obstacles, approximately how many edges does its state transition graph have? (Interior cells have 4 neighbors; edge cells have 2–3.)

Chapter 1: Formulation 2.1 — The Five Components

Before writing a single line of search code, LaValle insists on nailing the problem down precisely. Formulation 2.1 gives every discrete planning problem exactly five ingredients. Get these five right and the algorithm almost writes itself.

SymbolNameMeaning
XState spaceAll possible situations the system can be in
U(x)Action spaceActions available at state x (varies by state)
f(x, u)Transition functionNext state when taking action u from state x
xIInitial stateWhere we start
XGGoal setThe set of states we want to reach

Each component — with physical meaning

1. The state space X. For our 9×9 warehouse grid: X = {(i, j) : i, j ∈ {0, …, 8}, (i, j) not a wall}. With no walls, |X| = 81. But the concept scales brutally: a Rubik's cube has |X| = 43,252,003,274,489,856,000 states. A 19×19 Go board has more states than atoms in the observable universe. The state must capture all and only the information needed to determine what happens next.

2. The action space U(x). At most interior cells: U(x) = {up, down, left, right} = {(−1,0), (1,0), (0,−1), (0,1)}. But at a boundary or next to a wall, some moves leave X, so they're removed from U(x). This dependence on x is critical — it's why we write U(x) and not just U. At different states, different actions are legal.

3. The transition function f(x, u) = x + u. For our grid this is pure vector addition. Worked example: x = (3, 4), u = (0, 1) → f((3,4), (0,1)) = (3, 5). The robot moved one tile up. For more complex systems — robot arms, game boards, autonomous vehicles — f is much more involved, but the concept is identical: it maps (state, action) to next state.

4. The initial state xI. The loading dock, xI = (0, 0). This is where planning starts. In some problems, xI is also a set (multiple possible starts), but Formulation 2.1 takes it as a single state.

5. The goal set XG. For one target shelf: XG = {(8, 8)}. But goals can be sets: any charging dock, any empty parking space, any configuration with the cargo delivered. Planning succeeds as soon as the robot reaches any x ∈ XG, regardless of which goal state.

The state must include ALL relevant information — and NOTHING else. A robot planning in a Paris warehouse shouldn't encode whether the lights are on in Beijing. Irrelevant state bloat converts tractable problems into impossible ones. LaValle calls this the Markov property: f(x, u) must depend only on the current state x and action u — not on history. If your state doesn't satisfy this, you have the wrong state space.

The state transition graph

Once you have X, U(x), and f, you implicitly define a directed graph. Vertices are states in X. There is a directed edge from x to x' if and only if there exists some u ∈ U(x) such that f(x, u) = x'. This is the state transition graph, and it is the central object of Chapter 2.

G = (X, E)    where    (x, x') ∈ E   ⇔   ∃ u ∈ U(x) : f(x, u) = x'

Planning = find a directed path in G from xI to any vertex in XG. That's the whole game. BFS, DFS, Dijkstra, A*, and value iteration are all just different strategies for searching this graph.

The transition graph is usually implicit

For a 9×9 grid, you could in principle build the full transition graph in memory: 81 vertices, ~288 edges, an adjacency list of ~1KB. But for the Rubik's cube with 4.3 × 1019 states, building the graph explicitly would require more RAM than exists on Earth. Instead, the graph is kept implicit — defined by the functions X, U(x), and f(x,u) rather than a stored data structure.

This is why Formulation 2.1 lists X, U(x), f rather than "an adjacency list." The functions are a compact description of a potentially enormous graph. The search algorithms — BFS, Dijkstra, A* — only generate the nodes they need. A* on the Rubik's cube might explore a few million nodes out of 4.3 × 1019 before finding the solution. The rest of the graph is never instantiated.

Implicit vs explicit graphs. An explicit graph (adjacency list/matrix) stores all nodes and edges up front. An implicit graph is described by (X, U(x), f) and nodes are generated on demand. All real planning problems use implicit graphs — the state space is too large for explicit storage. This is why Formulation 2.1 is stated the way it is.
State Transition Graph — 5×5 Grid

A 5×5 grid with the transition graph overlaid. Arrows show valid moves (directed edges). Click a cell to place a wall — watch its edges disappear and edges pointing to it vanish too.

Click a cell to place/remove a wall

Enumerating all edges of the transition graph

For a 3×3 grid with no walls, let's explicitly list the state transition graph. This helps build intuition before we move to larger spaces:

python
# Build the explicit state transition graph for a 3×3 grid
prob = GridPlanningProblem(rows=3, cols=3)

edges = []
for x in prob.states():
    for u in prob.actions(x):
        xp = prob.f(x, u)
        edges.append((x, xp))

print(f"|V| = {len(prob.states())}")   # 9 states
print(f"|E| = {len(edges)}")           # 24 directed edges

# Show adjacency for (1,1) — the center cell
center = (1, 1)
neighbors = [prob.f(center, u) for u in prob.actions(center)]
print(f"Neighbors of {center}: {neighbors}")
# → [(0,1), (2,1), (1,0), (1,2)] — all 4 directions, center has them all

corner = (0, 0)
neighbors_corner = [prob.f(corner, u) for u in prob.actions(corner)]
print(f"Neighbors of {corner}: {neighbors_corner}")
# → [(1,0), (0,1)] — only 2 directions; corner has fewer options

The center cell (1,1) has 4 edges. Corner cells have 2 edges. Edge cells have 3 edges. For N×N: total edges ≈ 2×N×(N−1)×2 = 4N(N−1). For N=9: 4×9×8 = 288 edges — matching our earlier estimate.

Worked example: f in action

f((5, 3), (−1, 0)) = ?  Add component-wise: (5 + (−1), 3 + 0) = (4, 3). The robot moved left one tile, from column 5 to column 4, same row 3. This is the transition function doing its one job.

An important subtlety: f is only applied when u ∈ U(x). Before calling f, the algorithm checks "is this action legal?" If (5,3) is adjacent to a wall on the left, then (−1,0) ∉ U((5,3)) and f is never called. The state space definition handles this implicitly by only including valid actions in U(x).

python
class GridPlanningProblem:
    """Formulation 2.1 for a rectangular 2D grid."""

    def __init__(self, rows, cols, walls=None, x_I=(0,0), X_G=None):
        self.rows  = rows
        self.cols  = cols
        self.walls = walls or set()   # set of (i,j) tuples
        self.x_I   = x_I
        self.X_G   = X_G or {(rows-1, cols-1)}

    # State space X — all non-wall cells (implicit)
    def in_bounds(self, x):
        i, j = x
        return 0 <= i < self.rows and 0 <= j < self.cols

    # Action space U(x) — only moves that stay in X
    MOVES = [(-1,0), (1,0), (0,-1), (0,1)]  # up, down, left, right

    def actions(self, x):
        i, j = x
        return [u for u in self.MOVES
                if self.in_bounds((i+u[0], j+u[1]))
                and (i+u[0], j+u[1]) not in self.walls]

    # Transition function f(x, u) = x + u (vector addition)
    def f(self, x, u):
        return (x[0] + u[0], x[1] + u[1])

    # Edge cost l(x, u) — uniform for now, overridden later
    def cost(self, x, u):
        return 1

# Example usage — Formulation 2.1 for a 9×9 warehouse grid
prob = GridPlanningProblem(
    rows=9, cols=9,
    walls={(3,3),(3,4),(4,3),(5,5),(6,4)},
    x_I=(0,0),
    X_G={(8,8)}
)

# Check: what actions are available at (3,2)?
print("U((3,2)):", prob.actions((3,2)))
# → [(-1,0), (1,0), (0,-1)]  — (0,1) blocked by wall at (3,3)

# Apply transition: f((3,2), (-1,0))
print("f((3,2),(-1,0)):", prob.f((3,2), (-1,0)))
# → (2, 2)  — moved up

The Markov property — why state design is hard

The Markov property says: everything you need to know about the future is encoded in the current state x. History doesn't matter — only where you are right now. For our grid robot, position (i, j) satisfies this: knowing you're at (3, 4) tells you exactly what moves are available and where they lead, regardless of how you got there.

But Markov breaks down in surprising ways. Consider a robot that needs to carry a box from shelf A to dock B. Just knowing the robot's position isn't enough — the state must also include whether it's currently holding the box. Fail to include "holding" in x, and f(x, u) is undefined (does "move right" take the box along or leave it behind?). The fix: x = (robot_position, is_holding). State space doubles, but now f is well-defined.

The Markov property is a design constraint on your state space. If two scenarios look identical in your state representation but lead to different futures, your state is underdefined. Keep adding information to x until f is a deterministic function of (x, u) alone — then you have a valid Formulation 2.1.

Graphs derived from Formulation 2.1

Formulation 2.1 can represent a surprising variety of problems beyond navigation. The key is choosing X, U(x), and f correctly:

ProblemState xAction uf(x, u)
Grid navigation(row, col)(Δrow, Δcol)x + u
Puzzle (8-tile)Tile configurationSlide a tileNew configuration
Rubik's cubeFace permutationRotate a faceNew permutation
Pick-and-place robot(position, holding?)Move/grab/releaseUpdated (pos, holding)
Turn-based gameBoard position + turnLegal moveNew board + other turn

Once you've cast a problem into Formulation 2.1, the same search algorithms — BFS, Dijkstra, A* — apply without modification. This universality is the deepest insight of Chapter 2.

State space size — a reality check

The state space size |X| is the single most important factor in planning difficulty. Here's how different problems compare:

Problem|X| (states)Tractable?
9×9 warehouse grid81Instant
100×100 city grid10,000Milliseconds
10×10×10 3D maze1,000Instant
Rubik's cube4.3 × 1019Only with heuristics
15-tile sliding puzzle1013Only with A* + pattern DBs
Chess (mid-game positions)~1047Requires game-specific methods

Discrete planning is only feasible when |X| is manageable — or when heuristics focus the search on a small fraction of X. The choice of state representation is therefore crucial: a bad representation inflates |X| unnecessarily, while a clever one keeps it tight.

In Formulation 2.1, f(x, u) = x + u (vector addition). What is f((5, 3), (−1, 0))?

Chapter 2: Forward Search — The Master Algorithm

We have a state space, a transition function, and a goal. Now: how do we find the path? LaValle's Figure 2.4 presents a single forward search algorithm that subsumes BFS, DFS, Dijkstra, and A* as special cases. The only difference between them is how one data structure — the priority queue Q — is sorted. Learn this one algorithm, understand all four.

Three categories for every state

During the search, every state in X belongs to exactly one of three categories. Understanding these categories is key to understanding why the algorithm is correct.

CategoryMeaningWhere it lives
UnvisitedNever encountered. The algorithm doesn't know it exists yet.Not in Q, not in visited set
AliveDiscovered — added to Q — but not yet fully processed. May have unvisited neighbors.In Q
DeadFully processed. Every neighbor has been discovered. Will never be revisited.Removed from Q, in visited set

The algorithm starts with only xI alive. Every other state is unvisited. The algorithm ends when either (a) a goal state is removed from Q — SUCCESS — or (b) Q is empty and no goal was found — FAILURE.

The algorithm (LaValle Figure 2.4)

1. Initialize
Mark xI as visited. Insert xI into Q. Set parent[xI] = null.
2. Is Q empty?
If yes → FAILURE (no path from xI to XG exists). Return null.
3. x ← GetFirst(Q)
Remove the "best" element from Q. The definition of "best" is what distinguishes BFS / DFS / Dijkstra / A*.
4. Is x ∈ XG?
If yes → SUCCESS. Trace parent pointers from x back to xI for the full path.
5. Expand x
For each u ∈ U(x): compute x' = f(x, u). If x' is unvisited: mark visited, set parent[x'] = x, insert x' into Q.
↻ back to step 2
Parent pointers are how you recover the actual plan. The loop above determines whether a path exists. To find the path itself, record parent[x'] = x each time you discover x' from x. At SUCCESS: trace backwards — goal → parent[goal] → parent[parent[goal]] → … → xI. Reverse that list and you have the action sequence.

What "systematic" means — and why it matters

LaValle uses the word systematic precisely. On a finite graph: systematic means eventually visiting every state reachable from xI. On an infinite graph: systematic means finding a solution in finite time, if one of finite length exists. The visited-state check at step 5 is exactly what makes the algorithm systematic — without it, cycles would cause infinite loops.

Being systematic is a guarantee, not a hope. A systematic algorithm can tell you definitively "no path exists" — because it will have exhausted every reachable state. A non-systematic algorithm that loops on cycles can never safely report failure. This is why the visited-state check is non-negotiable in production planners.

Completeness guarantee: The forward search algorithm with the visited-state check is complete on finite graphs — it finds a solution if one exists, and reports failure if none exists. This is the first correctness property we demand of any planning algorithm.

Forward search vs backward search — a preview

Everything so far searches forward from xI toward XG. LaValle also discusses backward search: start from XG and search backward toward xI. At each step, instead of asking "which states can I reach from x?", ask "which states can reach x?"

Backward search uses a backward action space U−1(x): the set of actions that could lead into state x. For our grid, backward actions are the same as forward actions (the grid is undirected), but in general problems (one-way doors, asymmetric transitions), they differ. Chapter 8 of this lesson covers backward search in detail.

The payoff: bidirectional search runs both simultaneously, meeting in the middle. Theoretical complexity drops from O(bd) to O(bd/2) — an exponential improvement. On large graphs, bidirectional BFS can be 1000× faster than forward BFS alone.

Forward Search Step-Through

Watch BFS (FIFO queue) explore a 7×7 grid. Orange = alive (in Q). Teal = dead (processed). Gray = unvisited. Path shown in yellow when goal is reached.

Ready — click Step or Run

Concrete trace on the warehouse grid

Let's trace forward search (BFS) on a tiny 3×3 grid, no walls. Start xI = (0,0), goal XG = {(2,2)}. Neighbors explored in order: right, down (so (i,j+1) before (i+1,j)).

Iterx extractedQ beforeStates discoveredQ after
InitxI=(0,0)[(0,0)]
1(0,0)[(0,0)](0,1),(1,0)[(0,1),(1,0)]
2(0,1)[(0,1),(1,0)](0,2),(1,1)[(1,0),(0,2),(1,1)]
3(1,0)[(1,0),(0,2),(1,1)](2,0),(1,1)→skip[(0,2),(1,1),(2,0)]
4(0,2)[(0,2),(1,1),(2,0)](1,2)[(1,1),(2,0),(1,2)]
5(1,1)[(1,1),(2,0),(1,2)](2,1),(1,2)→skip[(2,0),(1,2),(2,1)]
6(2,0)[(2,0),(1,2),(2,1)](2,1)→skip[(1,2),(2,1)]
7(1,2)[(1,2),(2,1)](2,2)[(2,1),(2,2)]
8(2,1)[(2,1),(2,2)][(2,2)]
9(2,2)[(2,2)]GOAL! Return

Parent pointers: parent[(0,1)]=(0,0), parent[(1,0)]=(0,0), parent[(0,2)]=(0,1), parent[(1,1)]=(0,1), parent[(2,0)]=(1,0), parent[(1,2)]=(0,2), parent[(2,1)]=(1,1), parent[(2,2)]=(1,2).

Trace back: (2,2) → parent=(1,2) → parent=(0,2) → parent=(0,1) → parent=(0,0). Reversed: (0,0)→(0,1)→(0,2)→(1,2)→(2,2). A 4-step path — the optimal for a 3×3 grid with no walls.

Recovering the path — parent pointer trace

The algorithm tells you the path exists; parent pointers tell you what it is. The trick: every time you discover a new state x' from x, record parent[x'] = x. At the end, you have a chain from goal back to start. Trace it:

goal → parent[goal] → parent[parent[goal]] → … → xI

Reverse this list and you have the forward path. The action at each step is uk = xk+1 − xk (for grid problems). The plan length K equals the path length minus 1.

A subtle point: parent pointers implicitly define a shortest-path tree rooted at xI. Every state in the tree has a unique parent — the state from which it was first discovered. This tree covers all reachable states, not just the goal. If you need paths to multiple goals, BFS builds this tree for free.

python
from collections import deque

def forward_search(prob, mode='bfs'):
    """
    Generic forward search (LaValle Figure 2.4).
    mode='bfs'  → FIFO queue (breadth-first, shortest path)
    mode='dfs'  → LIFO stack (depth-first, any path)
    """
    x_I = prob.x_I
    X_G = prob.X_G

    visited = {x_I}
    parent  = {x_I: None}

    if mode == 'bfs':
        Q = deque([x_I])
        def get_first(): return Q.popleft()  # FIFO
        def insert(x):  Q.append(x)
    else:
        Q = [x_I]
        def get_first(): return Q.pop()    # LIFO
        def insert(x):  Q.append(x)

    while Q:                           # step 2: is Q empty?
        x = get_first()              # step 3: GetFirst

        if x in X_G:                  # step 4: goal check
            path = []
            cur  = x
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return list(reversed(path))  # path from x_I to goal

        for u in prob.actions(x):    # step 5: expand
            xp = prob.f(x, u)
            if xp not in visited:
                visited.add(xp)
                parent[xp] = x
                insert(xp)

    return None  # FAILURE — no path exists

Putting it together — full demo on the warehouse grid

python
# Complete end-to-end planning on the 9×9 warehouse grid
prob = GridPlanningProblem(
    rows=9, cols=9,
    walls={(3,3),(3,4),(3,5),(4,3),(5,3),(5,4),(5,5)},  # U-shaped obstacle
    x_I=(0,0),
    X_G={(8,8)}
)

# Run BFS (forward search with FIFO)
path = forward_search(prob, mode='bfs')

if path:
    print(f"Path found! Length = {len(path)-1} steps")
    print(f"Path: {' → '.join(str(s) for s in path)}")
    # Reconstruct actions from path
    actions = [(path[k+1][0]-path[k][0],
                path[k+1][1]-path[k][1])
               for k in range(len(path)-1)]
    direction = {(-1,0):'↑',(1,0):'↓',(0,-1):'←',(0,1):'→'}
    print(f"Actions: {''.join(direction[a] for a in actions)}")
else:
    print("No path exists — goal is unreachable")
In the forward search algorithm, what happens if you remove the "if x' not visited" check on a graph with a cycle A → B → A?

Chapter 3: BFS vs DFS — Same Algorithm, Different Queue

Here is the key insight that makes the master algorithm so powerful: BFS and DFS are the same algorithm. Change one data structure — the order of Q — and the entire character of the search transforms. Understanding why this is true, and what the consequences are, is the subject of this chapter.

BFS — FIFO Queue

States discovered earliest are processed first. The effect: the search expands in concentric wavefronts. All states reachable in 1 step from xI are processed before any state reachable in 2 steps. All 2-step states before 3-step states. And so on.

  • Guarantees shortest path (fewest steps)
  • Systematic on finite AND infinite graphs
  • Time: O(|V| + |E|)
  • Memory: O(|V|) — stores entire active wavefront

DFS — LIFO Stack

Most recently discovered state is processed first. The effect: the search plunges deep — it follows one direction as far as possible before backtracking to try another. Like exploring a maze by always turning left.

  • Does NOT guarantee shortest path
  • Systematic only on finite graphs
  • Time: O(|V| + |E|)
  • Memory: O(b·d) — one path + backtrack info
Why DFS fails on infinite graphs (LaValle Figure 2.3a). Imagine an infinite grid. DFS might choose to dive right: (0,0) → (1,0) → (2,0) → (3,0) → … and never return. If the goal is one step up at (0,1), DFS will never find it — it explores an infinite branch first. BFS is immune: it visits ALL neighbors at distance k before moving to distance k+1. Since the goal is at distance 1, BFS finds it on the very first expansion.

When DFS is preferred in practice

Despite its drawbacks, DFS is preferred in several scenarios:

LaValle's perspective: DFS is not used in the primary planning algorithms of Chapter 2 — BFS, Dijkstra, and A* are preferred for their optimality and completeness guarantees. DFS appears mainly as a theoretical foil: understanding why DFS fails on infinite graphs clarifies exactly what BFS provides. The failure case is instructive.

Why BFS guarantees shortest path (the proof)

It's worth understanding why BFS gives the shortest path, not just that it does. The argument is short: BFS processes all states at distance d before any state at distance d+1. So when BFS first reaches the goal, it has explored all paths of length ≤ d — and found none that work (else it would have stopped). Therefore the d-step path it found must be optimal.

DFS doesn't have this property because it might find a long path — say d = 20 — even though a 3-step path exists. DFS just happens not to have explored that 3-step path yet when it found the 20-step one.

BFS optimality theorem: If all edge costs are equal (unit cost), BFS finds the path with the minimum number of edges. The proof is by induction on the wavefront depth: BFS visits all depth-k states before depth-(k+1) states, so the first time it reaches the goal, no shorter path could exist.

Concrete trace: 5×5 grid with one obstacle

Start (0,0), Goal (4,4), wall at (2,2). Neighbors explored in priority order: down, right, up, left.

StepBFS — Queue [FIFO]DFS — Stack [LIFO]
InitQ = [(0,0)]S = [(0,0)]
1Pop (0,0); push (1,0),(0,1). Q=[(1,0),(0,1)]Pop (0,0); push (0,1),(1,0). S=[(0,1),(1,0)]
2Pop (1,0); push (2,0),(1,1). Q=[(0,1),(2,0),(1,1)]Pop (1,0); push (1,1),(2,0). S=[(0,1),(1,1),(2,0)]
3Pop (0,1); push (0,2). Q=[(2,0),(1,1),(0,2)]Pop (2,0); push (3,0). S=[(0,1),(1,1),(3,0)]
4Pop (2,0); push (3,0). Q=[(1,1),(0,2),(3,0)]Pop (3,0); push (4,0). S=[(0,1),(1,1),(4,0)]
Wavefront expands uniformly outwardDives: (4,0)→(4,1)→(4,2)→(4,3)→(4,4) GOAL
Final8-step optimal path10-step non-optimal path

BFS finds the 8-step shortest path. DFS finds a valid but longer 10-step path — it dived south then east along the bottom edge rather than taking the diagonal shortcut. Both are correct plans; only BFS is guaranteed to be the shortest.

Memory tradeoff matters in practice. BFS uses O(bd) memory where b = branching factor and d = depth to goal. DFS uses O(b·d). For a 1000×1000 grid where goal depth is 500 steps: BFS might store ~4×500 = 2,000 alive states in its frontier. DFS stores ~500 states (one path). For very deep searches in large spaces, DFS's memory efficiency is decisive.
BFS vs DFS — Side by Side

Same 7×7 grid, same start and goal. Left: BFS expands in uniform waves. Right: DFS dives aggressively. Orange = alive. Teal = dead. Counters show nodes expanded so far.

Ready — click Step Both or Run

Iterative Deepening DFS (IDDFS) — the best of both worlds

A natural question: can we get BFS's optimality guarantee with DFS's memory efficiency? Yes — Iterative Deepening Depth-First Search (IDDFS) does exactly this. It runs DFS with a depth limit d = 1, 2, 3, … until the goal is found. At each iteration, any path longer than the current limit is cut off.

IDDFS is optimal (finds minimum-step path), complete, and uses only O(bd) memory. The cost: it re-expands states on each iteration. But the work is dominated by the final iteration, so the total extra work is only a constant factor (about b/(b−1)). For b=4 (grid), IDDFS does 33% more work than BFS while using a fraction of the memory.

AlgorithmOptimal?Complete?TimeSpace
BFSYes (unit cost)Yes (finite+infinite)O(bd)O(bd)
DFSNoYes (finite only)O(bm) m=max depthO(bm)
IDDFSYes (unit cost)Yes (finite+infinite)O(bd)O(bd)
DijkstraYes (weighted)Yes (non-neg costs)O((V+E)log V)O(V)
A*Yes (admissible h)Yes (non-neg costs)O(bd) worstO(bd)
python
def iddfs(prob):
    """Iterative deepening DFS — optimal, complete, O(bd) memory."""
    def dls(x, goal, depth, path, visited):
        """Depth-limited search."""
        if x in goal: return path[:]  # found!
        if depth == 0: return None      # depth limit hit
        for u in prob.actions(x):
            xp = prob.f(x, u)
            if xp not in visited:
                visited.add(xp)
                path.append(xp)
                result = dls(xp, goal, depth-1, path, visited)
                if result is not None: return result
                path.pop()
                visited.discard(xp)
        return None

    for d in range(1, 10000):       # depth limit grows: 1, 2, 3, ...
        result = dls(
            prob.x_I, prob.X_G, d,
            path=[prob.x_I],
            visited={prob.x_I}
        )
        if result is not None: return result
    return None
A robot searches a 100×100 grid. Both BFS and DFS find the goal. Which algorithm's resulting plan has fewer steps?

Chapter 4: Dijkstra's Algorithm — Optimal Search

BFS finds the path with fewest steps. But what if steps have different costs? Crossing a muddy aisle costs 3 time units; crossing a paved aisle costs 1. Two 8-step paths might cost 12 and 32 respectively. BFS cannot distinguish them. Dijkstra's algorithm sorts Q by cost instead of order, finding the minimum-cost path.

Dijkstra = forward search where Q is a min-heap sorted by cost-to-come C(x). The cost-to-come C(x) is the total cost of the cheapest path found so far from xI to x. When x is extracted from Q (the global minimum), its cost becomes proven optimal — written C*(x).

The update rule

C*(xI) = 0
C(x') ← min( C(x'), C*(x) + l(x, u) )    when discovering x' via x

Here l(x, u) is the edge cost — the cost of taking action u at state x. When x' is first discovered, C(x') = C*(x) + l(x, u). If x' is later discovered again via a different, cheaper path, C(x') is updated downward (decrease-key in the heap).

Why it's optimal — the inductive argument

The proof is elegant and worth knowing. Claim: when a state x is extracted from Q (minimum C element), C(x) = C*(x). Proof by induction:

Dijkstra requires all edge costs to be non-negative. If l(x, u) can be negative, the inductive proof breaks. A state might be extracted "optimally" with cost 5, but a later path through a cost-7 alive state plus a −3 edge gives cost 4 — better. Dijkstra would have already committed to 5. For negative edges, use Bellman-Ford (O(|V|·|E|), much slower) or restructure the problem to eliminate negatives.

Real application: road network routing

Dijkstra is the backbone of GPS routing. A city road network has intersections as states and roads as edges, with travel time as edge cost. Hundreds of millions of intersections — far too large to enumerate explicitly. In practice, Google Maps and similar systems use Dijkstra with two key optimizations: (1) bidirectional Dijkstra runs two simultaneous searches (forward from source, backward from destination) that meet in the middle, reducing search to roughly √|V| nodes each; (2) highway hierarchies precompute long-distance routes so queries stay local. Both are Formulation 2.2 under the hood.

The Open Source Routing Machine (OSRM), used by many navigation apps, uses Contraction Hierarchies: a preprocessing step that ranks nodes by importance and adds "shortcut" edges (if the fastest A→B→C path is known, add direct A→C shortcut). After preprocessing, Dijkstra on the contracted graph answers continent-scale queries in under a millisecond. The core algorithm is still Dijkstra; the speedup comes from a smarter graph representation.

Dijkstra vs BFS — when does it matter?

Dijkstra reduces to BFS when all edge costs are equal (l(x,u) = constant). In that case, cost-to-come is just the number of steps, so the min-heap ordering is identical to FIFO ordering. Running Dijkstra on unit-cost graphs is correct but slower than BFS (heap operations are O(log n) vs O(1) queue operations). Choose BFS when costs are uniform; choose Dijkstra when they're not.

Worked trace: 4×4 weighted grid

Horizontal edges cost 1. Vertical edges cost 2. Start (0,0), Goal (3,3). We process states in cost order:

ExtractC*(x)Updates to neighbors
(0,0)0(0,1)→C=1, (1,0)→C=2
(0,1)1(0,2)→C=2, (1,1)→C=3
(1,0)2(1,1)→min(3,4)=3 no update, (2,0)→C=4
(0,2)2(0,3)→C=3, (1,2)→C=4
(1,1)3(1,2)→min(4,5)=4, (2,1)→C=5
(0,3)3(1,3)→C=5
Eventually (3,3) extracted with C*=6

The minimum-cost path goes right three times then down three times: (0,0)→(0,1)→(0,2)→(0,3)→(1,3)→(2,3)→(3,3). Cost = 3 horizontal moves × 1 + 3 vertical moves × 2 = 3 + 6 = 9. Any other path to (3,3) would mix horizontal/vertical differently — but since horizontal is cheaper, the optimal strategy is to exhaust horizontal movement first. Dijkstra discovers this automatically.

Contrast with BFS: it would find a path with 6 steps (the minimum step count), but that path might mix horizontal and vertical moves — say 3+3 — costing the same 9. Or it might find a different 6-step path that cost 11. BFS cannot distinguish; Dijkstra can.

Dijkstra Step-Through — Weighted Grid

Each cell shows its current C(x). Orange = alive (in heap). Teal = dead (settled). The min-heap extracts the cheapest alive state each step. Path shown in yellow at the end.

Ready. Darker cells = higher move cost (1, 2, or 3).
python
import heapq
from math import inf

def dijkstra(prob):
    """
    Dijkstra's algorithm. Returns (path, cost) or (None, inf).
    prob.cost(x, u) must return non-negative edge cost l(x, u).
    """
    x_I = prob.x_I
    X_G = prob.X_G

    C      = {x_I: 0}       # cost-to-come estimates
    parent = {x_I: None}   # for path reconstruction
    heap   = [(0, x_I)]      # min-heap: (cost, state)

    while heap:
        c, x = heapq.heappop(heap)

        # Skip stale entries (x was already settled with lower cost)
        if c > C.get(x, inf):
            continue

        # Goal check — C*(x) is now proven optimal
        if x in X_G:
            path = []
            cur  = x
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return list(reversed(path)), c

        # Expand x
        for u in prob.actions(x):
            xp = prob.f(x, u)
            new_c = c + prob.cost(x, u)   # C*(x) + l(x,u)
            if new_c < C.get(xp, inf):
                C[xp]      = new_c
                parent[xp] = x
                heapq.heappush(heap, (new_c, xp))

    return None, inf  # FAILURE — no path exists

Dijkstra's complexity

With a binary heap: O((|V| + |E|) log |V|). With a Fibonacci heap (more complex implementation): O(|V| log |V| + |E|) — faster for dense graphs. For our 9×9 grid: |V| = 81, |E| ≈ 288, so Dijkstra finishes in microseconds. For a city map with 106 intersections and 2.5×106 roads, it takes a few hundred milliseconds — still practical.

Data structure for QComplexityBest for
Binary heap (Python heapq)O((|V| + |E|) log |V|)Sparse graphs (most planning)
Fibonacci heapO(|V| log |V| + |E|)Dense graphs, large |E|/|V|
Array (unsorted)O(|V|²)Very dense graphs, small |V|
Dijkstra runs on a graph where one edge has cost −1; all others cost +1. Can the algorithm still guarantee finding the optimal path?

Chapter 5: A* Search — Heuristic Guidance

Dijkstra is optimal but blind. It expands states in all directions equally — even away from the goal. If you're navigating from New York to Los Angeles, Dijkstra spends time exploring Boston. A* search adds a compass: a heuristic estimate of the remaining cost, so the algorithm spends time expanding states that look promising.

A* sorts Q by:

f(x) = C*(x) + &Ĝ;*(x)

where C*(x) is the proven cost-to-come (same as Dijkstra) and Ĝ*(x) is a heuristic estimate of cost-to-go — how much it costs to reach the goal from x. We can't compute the true G*(x) exactly (that's the whole problem!), so we estimate it with a function we can evaluate cheaply.

Admissibility — the critical requirement

A heuristic is admissible if it never overestimates the true cost-to-go:

&Ĝ;*(x) ≤ G*(x)    for all x ∈ X

If Ĝ* is admissible, A* is guaranteed to find the optimal path. The proof uses the same inductive argument as Dijkstra, extended: when a state x is extracted with minimum f = C*(x) + Ĝ*(x), any cheaper path would require a node y on the frontier with f(y) ≤ f(x) — but x was the minimum, so no such y exists.

Manhattan distance — the canonical grid heuristic

For our warehouse grid with unit costs and goal g = (gi, gj):

&Ĝ;*(x) = |xi − gi| + |xj − gj|

This is admissible because the robot must take at least this many steps — it can't teleport, and grid movement is along axes only. Manhattan distance is exact when there are no walls (every step makes optimal progress). With walls, the true cost is larger, so the estimate remains a lower bound — admissible by construction.

Precomputed heuristics — pattern databases

For Rubik's cube and sliding puzzles, Manhattan distance is too weak — it ignores interactions between tiles. The state-of-the-art uses pattern databases: precompute the exact optimal cost to solve a sub-problem (e.g., just the corner cubie configuration) for every possible configuration of that sub-problem. Store this in a table. At runtime, look up the table value as Ĝ*. This is admissible (solving just corners is easier than the full cube) and extremely tight — usually within 2× of the true G*.

For our warehouse grid, we don't need this — Manhattan distance is tight and free to compute. But the pattern database idea illustrates a general principle: any precomputed lower bound on G*(x) is an admissible heuristic.

Designing admissible heuristics

A useful recipe for building admissible heuristics: solve a relaxed version of the problem. Remove constraints from the original problem, solve the easier version, and use that optimal cost as Ĝ*. Since the relaxed problem is easier, its optimal cost ≤ the original's — admissibility by construction.

Original problemRelaxationResulting heuristic
Grid with wallsRemove all wallsManhattan distance
Grid with wallsAllow diagonal movesChebyshev distance = max(|Δr|, |Δc|)
Grid with wallsAllow teleportationh = 0 (admissible but useless)
Rubik's cubeSolve each face independentlyPattern database heuristic

Tighter relaxations give better heuristics. Manhattan distance (remove walls, keep grid) is tighter than Euclidean distance (allow straight-line movement through walls). Both admissible; Manhattan gives fewer expanded nodes because it's closer to the true G*.

A spectrum of algorithms

Sort key for QAlgorithmOptimal?Behavior
insertion orderBFS (unit costs)Yes (fewest steps)Uniform wavefront
C*(x)DijkstraYes (minimum cost)Uniform cost wavefront
C*(x) + Ĝ*(x), admissibleA*Yes (minimum cost)Directed toward goal
Ĝ*(x) onlyGreedy best-firstNoFastest but can get stuck
C*(x) + Ĝ*(x), inadmissibleWeighted A*No (bounded suboptimal)Faster but potentially suboptimal
The efficiency win is dramatic. On a large grid labyrinth, Dijkstra might expand 5,000 cells before finding the goal. A* with Manhattan distance might expand only 200 — a 25× speedup — finding the same optimal path. The heuristic doesn't change correctness; it redirects effort toward cells that are actually on or near the optimal path.

What admissibility buys you — exactly

If Ĝ*(x) = 0 for all x → A* degenerates to Dijkstra (the heuristic provides no guidance). If Ĝ*(x) = G*(x) exactly → A* expands only nodes on the optimal path (perfect, oracle heuristic — impossible in general). Admissibility is the sweet spot: any admissible Ĝ* gives you optimality. Better admissible heuristics (closer to G*) give faster search. The challenge is designing heuristics that are tight but still admissible.

Consistency — a stronger condition

A heuristic is consistent (also called monotone) if for every state x and action u leading to x':

&Ĝ;*(x) ≤ l(x, u) + &Ĝ;*(x')

This is the triangle inequality for heuristics: the estimated cost from x cannot be more than one step's cost plus the estimated cost from x'. Consistency implies admissibility (but not vice versa). The practical benefit: with a consistent heuristic, A* never needs to re-open already-closed states. Each state is settled exactly once — like Dijkstra — making the implementation simpler and avoiding a correctness edge case.

Manhattan distance on a grid with unit costs is consistent: |xi − gi| + |xj − gj| ≤ 1 + |x'i − gi| + |x'j − gj| holds because one step can decrease Manhattan distance by at most 1. Any admissible heuristic derived from a problem relaxation is typically consistent too.

Consistency = admissibility + monotonicity. In practice, almost all admissible heuristics people use are also consistent. The distinction matters theoretically — inconsistent admissible heuristics require more careful implementation — but in grid planning, you rarely encounter the distinction.
Inadmissible heuristics: a calculated risk. If you use Ĝ*(x) = 2 × Manhattan_distance — overestimates by 2× — A* is no longer optimal. It might find a path that costs up to 2× the true optimum (weighted A* with weight w=2 has this guarantee). In practice, inadmissible heuristics are used when speed matters more than optimality. Real-time systems often accept 5% suboptimality for 10× speedup.
Dijkstra vs A* — Expansion Comparison

Same maze. Left: Dijkstra expands uniformly. Right: A* (Manhattan heuristic) expands toward the goal. Both find the same optimal path. Counters show nodes expanded — watch A* do more with less.

Ready — Dijkstra (left) vs A* with Manhattan (right)
python
import heapq
from math import inf

def manhattan(x, goal):
    return abs(x[0] - goal[0]) + abs(x[1] - goal[1])

def astar(prob, heuristic=None):
    """
    A* search with a pluggable heuristic.
    heuristic(x) → estimated cost-to-go from x.
    Must be admissible for optimal guarantee.
    """
    x_I  = prob.x_I
    goal = list(prob.X_G)[0]   # single goal for simplicity
    h    = heuristic or (lambda x: manhattan(x, goal))

    C      = {x_I: 0}
    parent = {x_I: None}
    heap   = [(h(x_I), 0, x_I)]  # (f = C + h, C, state)

    while heap:
        f, c, x = heapq.heappop(heap)

        if c > C.get(x, inf):    # stale entry
            continue

        if x == goal:
            path = []
            cur  = x
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return list(reversed(path)), c

        for u in prob.actions(x):
            xp    = prob.f(x, u)
            new_c = c + prob.cost(x, u)
            if new_c < C.get(xp, inf):
                C[xp]      = new_c
                parent[xp] = x
                # Sort by f = cost-to-come + heuristic
                heapq.heappush(heap, (new_c + h(xp), new_c, xp))

    return None, inf

# Inadmissible heuristic — faster but may be suboptimal
def weighted_astar(prob, w=2.0):
    goal = list(prob.X_G)[0]
    h = lambda x: w * manhattan(x, goal)  # overestimates by w×
    return astar(prob, heuristic=h)        # still complete, not optimal

A* in practice — when is it dramatically faster?

A* truly shines when the state space has a natural metric and the goal is in a "known direction." Grid planning is the canonical case: Manhattan distance tells the algorithm "the goal is down and to the right, so expand downward and rightward states first." Dijkstra doesn't know this — it expands a full disc of equal-cost states in every direction.

A* performs poorly when: (a) the heuristic is weak (Ĝ*(x) ≈ 0 everywhere → degenerates to Dijkstra), or (b) the optimal path requires going "backwards" relative to the goal — e.g., a maze that forces a long detour away from the goal before arriving. In case (b), Ĝ* misleads the search initially, and A* may expand more nodes than Dijkstra before discovering the true path requires backtracking.

Worked numerical example: A* vs Dijkstra

5×5 grid, unit costs, wall at (1,2), (2,2), (3,2). Start (0,0), Goal (4,4). f(x) = C*(x) + Manhattan(x, (4,4)).

StepDijkstra extracts (C*)A* extracts (f = C* + h)
1(0,0) C=0(0,0) f=0+8=8
2(1,0) C=1 or (0,1) C=1(1,0) f=1+7=8 or (0,1) f=1+7=8
3(2,0),(1,1),(0,2) C=2 (all tied)(1,1) f=2+6=8 — same or (2,0) f=2+6=8
Expands all of left side of gridFocuses on lower-right quadrant
Final~20 nodes expanded~12 nodes expanded

A* discovers the wall at column 2 and routes around it to the right — fewer wasted expansions because it knew to go right and down from the start. Dijkstra expends equal effort exploring the left side of the grid even though it leads away from the goal.

A* memory usage warning. A* stores all generated states in memory — worst case O(bd). For deep searches (Rubik's cube at depth 20, say), this can exhaust memory before finding the goal. Variants like IDA* (iterative deepening A*) and SMA* (simplified memory-bounded A*) address this at the cost of some speed. In robotics and game planning, IDA* is common precisely because it uses O(bd) memory.
Your heuristic is h(x) = 2 × Manhattan_distance(x, goal). All edge costs are 1. Is this admissible, and what does A* do with it?

Chapter 6: Optimal Planning — The Cost Functional

So far we've asked: does a path exist? Now we ask: among all paths, which is best? This is Formulation 2.2 — optimization layered on top of Formulation 2.1. It introduces a cost functional that evaluates the quality of any complete plan.

What is a plan?

A plan of length K is a sequence of K actions (u1, u2, …, uK) applied starting from xI. Applying them produces a state sequence: x1 = xI, and xk+1 = f(xk, uk) for k = 1, …, K. The final state xF = xK+1.

The cost functional L assigns a number to any plan:

L(πK)  =  k=1K l(xk, uk)  +  lF(xF)

Two components:

The terminal cost trick — encoding feasibility as optimization

LaValle defines the terminal cost with a beautiful trick:

lF(x) = 0   if   x ∈ XG,     lF(x) = ∞   otherwise
This converts "plan must end in XG" into unconstrained optimization. Instead of minimizing L subject to the constraint xF ∈ XG, we minimize L freely. Any plan that misses the goal gets L = ∞ and automatically loses. Any plan that reaches the goal gets L = (running cost). We never need to explicitly enforce the goal constraint — the ∞ penalty does it for us.

Special cases — same algorithm, different objectives

The power of Formulation 2.2 is that one algorithm (Dijkstra / A*) solves all these problems by swapping in a different l:

l(x, u)L minimizesAlgorithm used
l(x, u) ≡ 0Nothing — just reach the goalBFS (any path)
l(x, u) ≡ 1Number of steps in the planBFS (fewest steps)
l(x, u) = Euclidean distanceTotal path length in metersDijkstra / A*
l(x, u) = energy(u)Total energy consumedDijkstra / A*
l(x, u) = time(x, u)Total travel timeDijkstra / A*
l(x, u) = danger(x)Total danger exposureDijkstra / A*

Dynamic programming principle behind L

The cost functional L has a recursive structure that is the foundation of dynamic programming and value iteration. Notice: if you split the plan πK into a first step (x1, u1) and the remaining plan πK-1 from x2:

L(πK) = l(x1, u1) + L(πK-1 starting from x2)

This is Bellman's principle of optimality: the tail of an optimal plan is itself optimal. If π* is optimal from xI to XG, then the sub-plan from x2 onward must be optimal from x2 to XG. Otherwise, we could replace the tail with a cheaper one, contradicting optimality of π*.

Bellman's principle unlocks backward computation. Instead of searching forward from xI, we can compute the optimal cost-to-go G*(x) for every state simultaneously by working backward from XG. This is value iteration — the subject of Chapter 7 of this lesson. Dijkstra and value iteration are two different computational strategies for the same mathematical object.

The cost functional as a language

One underappreciated feature of Formulation 2.2 is that it provides a language for expressing what you want — separate from how you compute it. You write down l(x, u) and lF(x) to describe your objective. Then you hand it to Dijkstra (or A* or value iteration) and get the answer. The algorithm doesn't care what l means physically — fuel, time, money, danger — it just minimizes it.

This separation of concerns — "what to optimize" vs "how to optimize it" — is the hallmark of good mathematical formalism. It's why Formulation 2.2 is worth memorizing: once you've cast your problem into it, the solution method follows automatically.

Multi-objective planning — a preview

What if you want to minimize both total time AND total energy? These are two different cost functionals Ltime and Lenergy. Optimizing both simultaneously is a multi-objective planning problem — beyond Formulation 2.2, which handles only one scalar objective. The standard approach: form a weighted combination L = w1Ltime + w2Lenergy and solve with Dijkstra. Different weights give different Pareto-optimal plans. LaValle covers multi-objective extensions in later chapters.

Why "plans" not "paths" — variable-length plans

In Formulation 2.2, we optimize over plans of a fixed length K. For each fixed K, this is a finite optimization — tractable by dynamic programming. But we don't know K in advance. LaValle handles this in two ways: (a) take the minimum over all K, which leads to the backward induction / value iteration framework (Chapter 7 of this lesson), or (b) use Dijkstra/A* which implicitly searches over all lengths simultaneously via the cost-to-come.

Comparing plans of different lengths is non-trivial. A 3-step plan costing 6 vs a 7-step plan costing 4: the shorter plan costs more per step but less overall? No — the 7-step plan costs less in total (L=4 vs L=6). We minimize L, not L/K. Length itself doesn't matter; total cost does. (Unless you add a per-step budget constraint — that's a different formulation.)
Cost-Aware Path Planning

Click cells to cycle their move cost: 1 → 2 → 3 → wall → 1. Numbers shown in cells. The optimal path (yellow) updates via Dijkstra as you change costs. L(π*) displayed below.

Click cells to toggle cost 1→2→3→wall→1 · L(π*) = —

Formulation 2.2 — the full code version

python
from math import inf
import heapq

class WeightedGridProblem(GridPlanningProblem):
    """Formulation 2.2: adds edge costs l(x,u) and terminal cost l_F."""

    def __init__(self, rows, cols, walls=None,
                 cell_costs=None, x_I=(0,0), X_G=None):
        super().__init__(rows, cols, walls, x_I, X_G)
        # cell_costs: dict (i,j) → cost to ENTER that cell
        self.cell_costs = cell_costs or {}

    # Running cost l(x, u): cost to enter the state we move to
    def cost(self, x, u):
        xp = self.f(x, u)
        return self.cell_costs.get(xp, 1)  # default cost 1

    # Terminal cost l_F(x): 0 if goal, ∞ otherwise
    def l_F(self, x):
        return 0 if x in self.X_G else inf

    # Cost functional L(π_K)
    def L(self, plan):
        """Evaluate cost functional on a plan (sequence of states)."""
        if len(plan) < 2: return self.l_F(plan[0])
        total = 0
        for k in range(len(plan) - 1):
            u = (plan[k+1][0] - plan[k][0],
                 plan[k+1][1] - plan[k][1])
            total += self.cost(plan[k], u)     # running cost
        total += self.l_F(plan[-1])            # terminal cost
        return total

# Example: robot must avoid expensive muddy zones
prob = WeightedGridProblem(
    rows=7, cols=7,
    cell_costs={(2,3):5, (3,3):5, (4,3):5},  # muddy corridor
    x_I=(0,0), X_G={(6,6)}
)
path, cost = dijkstra(prob)
print(f"Optimal cost: L(π*) = {cost}")
print(f"L evaluated directly: {prob.L(path)}")  # same number

Connection to Dijkstra

Dijkstra computes exactly L(π*) — the minimum value of L over all plans reaching XG. When Dijkstra extracts a goal state g from Q with cost C*(g), that IS the minimum of L. The lF = ∞ for non-goal states means any plan ending outside XG has L = ∞ — Dijkstra would never output such a plan as optimal. Formulation 2.2 and Dijkstra are two sides of the same coin.

What makes a good cost function — practical considerations

Choosing l(x, u) well is as much art as science. A few pitfalls to avoid:

Summary of Chapter 6. Formulation 2.2 adds a cost functional L to Formulation 2.1. L = Σ l(x,u) + lF(xF). The terminal cost trick (lF = ∞ outside XG) converts constrained feasibility into unconstrained minimization. Dijkstra computes L(π*) exactly. Bellman's optimality principle gives L its recursive structure, previewing value iteration in the next chapter.
Plan π reaches xG with total running cost Σl = 15. Plan π' does NOT reach xG and has total running cost Σl = 3. Using the Formulation 2.2 cost functional, which plan wins?

Chapter 7: Backward Value Iteration — The Key Algorithm

Forward search (Dijkstra, A*) starts from xI and races toward XG. There is a completely different strategy: start from XG and compute, for every state simultaneously, the optimal cost to reach the goal. This is backward value iteration — LaValle's central algorithm for optimal planning under a fixed horizon.

The result is a function G*: X → ℝ ∪ {∞} called the optimal cost-to-go. Once you have G*, extracting the optimal plan from any starting state is trivial — one greedy step at a time. Computing G* is the hard part; using it is free.

Defining G* precisely

Fix a horizon K (plan length). The optimal cost-to-go from stage k is:

G*k(xk)  =  minuk,…,uK  &Bigl[  i=kK l(xi, ui)  +  lF(xF)  &Bigr;

Read this as: starting at state xk at time k, what is the minimum cost achievable over the remaining K−k+1 steps? The boundary condition anchors the recursion:

G*K+1(x) = lF(x)  =  0   if   x ∈ XG,    ∞   otherwise

Deriving the Bellman equation — every step shown

Starting from the definition of G*k(xk), we derive the recursion that makes computation possible. The derivation has four steps — follow each carefully.

Step 1. Write out the minimization in full:
G*k(xk) = minuk,…,uK &Bigl[ l(xk,uk) + l(xk+1,uk+1) + … + l(xK,uK) + lF(xF) &Bigr]
Step 2. Pull the first term outside the min — it only depends on uk, not on the future choices uk+1,…,uK:
= minuk &Bigl[ l(xk,uk)  +  minuk+1,…,uK &Bigl[ l(xk+1,uk+1) + … + lF(xF) &Bigr] &Bigr]
Step 3. Recognize the inner minimization. It is exactly G*k+1(xk+1) — the optimal cost-to-go from the next state at the next time step. And xk+1 = f(xk, uk):
= minuk &Bigl[ l(xk,uk)  +  G*k+1(f(xk, uk)) &Bigr]
Step 4 — The Bellman Equation (LaValle eq. 2.11). This IS the algorithm:
G*k(x)  =  minu ∈ U(x) &Bigl[  l(x, u)  +  G*k+1(f(x, u))  &Bigr]
Compute G*K+1 first (boundary), then G*K, then G*K−1, all the way back to G*1. Each step touches every state once. Each state does |U(x)| comparisons. Total: O(K · |X| · |U|).

Worked example — 5 states, 4 steps (LaValle Example 2.3)

States {a, b, c, d, e}. Horizon K = 4. Start xI = a. Goal XG = {d}. We compute G*5 through G*1 using the Bellman equation. The complete table (every cell computed from the recurrence):

StateG*5G*4G*3G*2G*1
a646
b4264
c135
d024
e

Walk through key cells:

Complexity: O(K · |X| · |U|). Each of the K backward iterations visits every state (|X| = 5 here) and tries every action (|U| ≤ 4 for a grid). Total work scales linearly in K. Compare to brute-force enumeration of all paths, which is exponential in K. Value iteration is the efficient alternative.

The algorithm in code

python
from math import inf

def backward_value_iteration(prob, K):
    """
    Backward value iteration (LaValle eq. 2.11).
    Returns G[k][x] = G*_k(x) for k = 1..K+1.
    prob.X_G, prob.states(), prob.actions(x), prob.f(x,u), prob.cost(x,u)
    """
    states = list(prob.states())

    # G[K+1] = boundary condition: l_F(x)
    G = [{x: (0 if x in prob.X_G else inf) for x in states}]

    # Backward induction: compute G_K, G_{K-1}, ..., G_1
    for k in range(K, 0, -1):
        G_next = G[-1]            # G*_{k+1}
        G_k = {}
        for x in states:
            best = inf
            for u in prob.actions(x):
                xp = prob.f(x, u)
                val = prob.cost(x, u) + G_next.get(xp, inf)
                if val < best:
                    best = val
            G_k[x] = best
        G.append(G_k)

    G.reverse()
    return G

# Extract optimal plan greedily from G*
def extract_plan(prob, G, x_I, K):
    path, x = [x_I], x_I
    for k in range(K):
        best_u, best_val = None, inf
        for u in prob.actions(x):
            xp  = prob.f(x, u)
            val = prob.cost(x, u) + G[k+1].get(xp, inf)
            if val < best_val:
                best_u, best_val = u, val
        if best_u is None: break
        x = prob.f(x, best_u)
        path.append(x)
        if x in prob.X_G: break
    return path
Value Iteration — Wavefront from Goal

Each cell shows its current G* value. Click Step to run one backward iteration — watch the wavefront of finite values expand outward from the goal (red). Cells that can never reach the goal stay dark. After all steps, click Show Plan to highlight the optimal path from the green start.

Ready — G* initialized: goal=0, all others=∞
In backward value iteration, why do we compute G*K+1 FIRST (the boundary) and then G*K, G*K−1, … G*1 — rather than going forward from G*1?

Chapter 8: Variable-Length Plans — Convergence to Stationarity

Chapter 7's backward value iteration required knowing K in advance. But the warehouse robot doesn't know how many steps it will need — it just needs to reach the goal. Formulation 2.3 fixes this by adding a termination action uT: once applied, the robot stays put and accumulates no further cost.

This elegant addition converts the fixed-horizon problem into a variable-length one. Plans (u1, u2) and (u1, u2, uT, uT, …) become equivalent — the termination action pads any plan to any length without changing its cost. Now we can optimize over all plan lengths simultaneously.

The termination action — making it precise

Termination action uT: At any state x, applying uT sends f(x, uT) = x (self-loop) and l(x, uT) = 0 (zero cost). Once you reach xG ∈ XG, you apply uT forever. The robot is "done." The terminal cost lF is collected once, at the first application of uT at the goal.

The stationary algorithm

With uT in the action space, run backward value iterations without stopping at K. Instead, run until the values stop changing:

G*k−1(x) = G*k(x)    for all x ∈ X

When this holds, the iterations have converged. The common value is simply called G*(x) — the stationary optimal cost-to-go. It satisfies:

G*(x) = minu ∈ U(x) &Bigl[ l(x,u) + G*(f(x,u)) &Bigr]

Notice: no subscript k. This is the fixed-point equation — G* is its own "next iteration."

Traced example — 5-state graph to stationarity

Stateiter 0iter 1iter 2iter 3iter 4 (stable)
a644
b4222
c1111
d00000
e

Between iterations 3 and 4, values stop changing — convergence reached. G*(a) = 4: three steps, not four. The extra flexibility of variable-length plans found a cheaper route than the fixed-horizon K=4 case.

Recovering the optimal plan from G* (eq. 2.19)

Once G* is known, the optimal action at any state x is the greedy minimizer:

u*(x) = argminu ∈ U(x) &Bigl[ l(x, u) + G*(f(x, u)) &Bigr]

Trace from xI = a:

The policy is a map, not a sequence. u*(x) is defined for every reachable state x — not just states on the path from xI. If the robot is knocked off course and ends up at b directly, u*(b) = go to c is still optimal. A policy is robust to deviations; a fixed action sequence is not. This robustness is the key advantage of backward value iteration over forward path-finding.
python
from math import inf

def stationary_value_iteration(prob, tol=1e-9, max_iter=10000):
    """Variable-length backward value iteration (Formulation 2.3).
    Runs until convergence. Returns G_star: dict {state: G*(state)}."""
    states = list(prob.states())
    G = {x: (0 if x in prob.X_G else inf) for x in states}

    for _ in range(max_iter):
        G_new, delta = {}, 0
        for x in states:
            best = 0 if x in prob.X_G else inf
            for u in prob.actions(x):
                xp  = prob.f(x, u)
                val = prob.cost(x, u) + G.get(xp, inf)
                if val < best: best = val
            G_new[x] = best
            if G[x] < inf and G_new[x] < inf:
                delta = max(delta, abs(G_new[x] - G[x]))
        G = G_new
        if delta <= tol: break
    return G

def optimal_policy(prob, G_star):
    policy = {}
    for x in prob.states():
        if x in prob.X_G or G_star[x] == inf: continue
        best_u, best_val = None, inf
        for u in prob.actions(x):
            xp  = prob.f(x, u)
            val = prob.cost(x, u) + G_star.get(xp, inf)
            if val < best_val: best_u, best_val = u, val
        policy[x] = best_u
    return policy
Convergence to Stationarity

Values propagate outward from the goal (red) each iteration. The bar chart on the right tracks max delta per iteration — convergence happens when it hits zero. Click Step to advance one iteration and watch cells stop changing.

Iteration 0 — max Δ = ∞
Stationary value iteration converges when G*k−1(x) = G*k(x) for all x. What does this mean geometrically in the grid?

Chapter 9: Showcase — BFS · Dijkstra · A* Face-Off

Everything we've built — BFS, Dijkstra, A* — runs side by side on the same maze. All three start simultaneously. Watch the wavefronts diverge. Count the expanded nodes. The differences that were abstract in earlier chapters become viscerally clear when you see them in motion.

The core insight in one picture: BFS expands a uniform disc. Dijkstra expands a cost-weighted disc. A* expands an ellipse pointed at the goal. Same path found; dramatically different work done.

Algorithm comparison table

AlgorithmQ sorted byOptimal?RequiresComplexity
BFSFIFO (insertion order)Yes (unit cost)O(V+E)
Dijkstramin C*(x)YesNon-neg costsO(V log V + E)
A*min C*(x) + Ĝ*(x)YesAdmissible hO(V log V + E)
Greedy best-firstmin Ĝ*(x) onlyNoO(V log V + E)
DFSLIFONoO(V+E)

Dijkstra as smart forward value iteration

There is a deep connection between backward value iteration (Chapter 7) and Dijkstra. Both compute optimal costs — but from opposite directions and with different data structures.

Dijkstra is essentially forward value iteration with a priority queue that ensures states are processed in order of increasing C*(x). When x is popped from the heap, C*(x) is final — exactly like a state whose G* has stabilized in backward VI. The three categories (unvisited / alive / dead) in forward search map directly to (∞ / tentative / converged) in value iteration.

Why Dijkstra requires non-negative costs — revisited. In value iteration terms: when a state becomes "dead," we need to guarantee no future path can improve its cost. Non-negative edges ensure any extension of the current minimum-cost path only increases cost. Negative edges could create a longer path that is cheaper — the "dead" state would need reopening, breaking Dijkstra's one-pass structure.
Showcase — Three Algorithms, One Maze

BFS (left) · Dijkstra (center) · A* with Manhattan (right). All three search the same maze simultaneously. Orange = alive. Teal = dead. Yellow = final path. A* typically finishes with fewest expansions. Click New Maze to generate a different layout.

Ready — click Step All or Run
BFS: 0 expanded Dijkstra: 0 expanded A*: 0 expanded

What the showcase teaches

The takeaway: A* is not always better than Dijkstra. It wins when the heuristic is informative and the goal is far. When the heuristic is weak or the problem has unusual structure (winding paths that force detours away from the goal), Dijkstra may be preferable. Always profile before committing to A*.
On a grid where ALL edge costs are 1, which algorithm finds the path with the fewest edges fastest (with the lowest overhead)?

Chapter 10: Complexity & the State Space Explosion

Discrete planning algorithms are polynomially efficient in the size of the state space |X|. The problem is that |X| grows exponentially in the number of variables describing the state. A robot with n joints, each discretized to D angles, has |X| = Dn states. This is the curse of dimensionality — the central challenge of motion planning.

Complexity of the core algorithms

AlgorithmTimeSpaceNotes
BFSO(|V|+|E|)O(|V|)All reachable states in queue
DFSO(|V|+|E|)O(b·d)d = depth, b = branching factor
DijkstraO((|V|+|E|) log |V|)O(|V|)log factor from heap ops
A*O(bd) worstO(bd)O(d) best with perfect heuristic
Backward VI (fixed K)O(K·|X|·|U|)O(|X|)Two arrays: current and next G*
Stationary VIO(D·|X|·|U|)O(|X|)D = graph diameter

State space explosion — the real enemy

Problem|X|At 109 states/sec
9×9 grid81< 1 µs
6-DOF arm, 360 angles each3606 ≈ 2×1015~23 days
Rubik's cube4.3×1019~1.4 billion years
Chess (mid-game)~1047Heat death of universe

The algorithms of Chapter 2 are theoretically clean but practically limited to small state spaces. Everything in Chapters 4–6 of LaValle (configuration spaces, sampling-based planning) is a response to this explosion — planning in continuous spaces without ever discretizing them fully.

Approaches to state space explosion

The complete cheat sheet

ConceptFormula
Transition functionxk+1 = f(xk, uk)
Cost functionalL = ∑ l(xk, uk) + lF(xF)
Terminal cost tricklF(x) = 0 if x ∈ XG, else ∞
Bellman equationG*k(x) = minu{l(x,u) + G*k+1(f(x,u))}
Stationary fixed pointG*(x) = minu{l(x,u) + G*(f(x,u))}
Optimal policyu*(x) = argminu{l(x,u) + G*(f(x,u))}
AdmissibilityĜ*(x) ≤ G*(x) for all x
A* sort keyf(x) = C*(x) + Ĝ*(x)
Dijkstra updateC(x') ← min(C(x'), C*(x) + l(x,u))
A robot arm has 5 joints, each discretized to 100 angles. How many states does the Formulation 2.1 state space have?

Chapter 11: Connections — Where This Leads

Chapter 2 is the bedrock. Every algorithm in the rest of LaValle either directly extends these ideas or needs them as prerequisite. This chapter maps the road ahead and closes the loop on the full arc of discrete planning.

What Chapter 2 didn't cover

How Chapter 2 extends to the rest of the book

ChapterExtension of Ch. 2Key new idea
Ch. 4: Configuration SpaceX becomes continuous C-spaceRobot geometry → obstacle regions
Ch. 5: Sampling-BasedGraph search on random roadmapsRRT, PRM — random exploration
Ch. 10: MDPsf(x,u) becomes probabilisticStochastic transitions, expected cost
Ch. 11: POMDPsState not fully observableBelief space planning
Ch. 13: Differential ConstraintsU(x) constrained by dynamicsKinodynamic planning

The thread from discrete planning to deep RL

The Bellman equation G*(x) = minu{l(x,u) + G*(f(x,u))} is deterministic. Replace f(x,u) with a transition probability P(x'|x,u) and you get the MDP Bellman equation. Value iteration works almost identically — replace min with expected value. Reinforcement learning is value iteration when the transition function and cost are unknown: estimated from experience rather than given analytically.

The full arc: Formulation 2.1 (deterministic, known) → MDP (stochastic, known) → POMDP (partial observability) → RL (unknown model, stochastic) → Deep RL (function approximation for enormous state spaces). Every step relaxes one assumption. The Bellman equation remains the organizing principle throughout.

Related lessons on this site

LessonConnection to Ch. 2
MDPExtends Formulation 2.1 with stochastic transitions and expected costs
RL AlgorithmsQ-learning = value iteration with unknown model, estimated online
Bayes FilterState estimation in Ch. 10 (POMDPs) requires belief-space tracking
What you now know from Chapter 2. Formulation 2.1 (five components). Formulation 2.2 (cost functional L = Σl + lF, Bellman principle). Formulation 2.3 (variable-length plans via uT). The Bellman equation and backward value iteration. Forward search (BFS, DFS, Dijkstra, A*) as instances of the master algorithm differing only in Q ordering. Their complexities, optimality guarantees, and failure conditions. This is the complete toolkit for deterministic discrete planning — every later chapter is an extension of this foundation.
The deterministic Bellman equation is G*(x) = minu{l(x,u) + G*(f(x,u))}. In a stochastic MDP where u from x leads to x' with probability P(x'|x,u), what does it become?