LaValle, Chapter 1

Introduction to Planning

What does it mean to plan? State, time, actions, and plans — the ingredients that unify robot arms, Rubik's cubes, and autonomous cars.

Prerequisites: Curiosity. No math required for this chapter — it's all motivation and vocabulary.
8
Chapters
5
Simulations
8
Quizzes

Chapter 0: The Problem

You have a complex mechanical system — a robot arm, a self-driving car, a protein that needs to fold correctly — and you want it to reach some goal. It starts in one configuration. It must arrive in another. Between the start and the goal lies a vast, tangled space of possibilities. Which sequence of moves, if any, gets you there?

That question — and all its variants — is planning. It sounds simple until you try to pin it down formally. What exactly is a "configuration"? What counts as a valid "move"? What does "best" path mean when we have multiple paths to choose from? LaValle's textbook is an attempt to answer all these questions with mathematical precision across a startling range of problem domains.

Chapter 1 is deliberately broad: it refuses to commit to a specific algorithm. Instead it surveys the landscape — the kinds of problems that planning applies to, the vocabulary needed to describe any planning problem, and the four-part structure of the book. Think of this chapter as building a shared language so that later chapters can be precise without being overwhelming.

Planning ≠ problem solving (but it's close). The term "planning" historically emphasized the temporal aspect — producing a sequence of actions that unfolds over time. "Problem solving" is the older AI term. LaValle treats them as nearly synonymous, but reserves "planning" for cases where the system must reason about its future state, not just a single-step decision. If the answer is a single choice, that's classification or optimization. If the answer is a multi-step sequence where each step changes what's possible next, that's planning.

Here is the core puzzle planning must solve: a system has a current state. The planner applies actions. Each action moves the system to a new state. The goal is to find a sequence of actions — a plan — that drives the system from the initial state to some desired goal state. Simple to say. Technically rich to formalize and solve.

Navigation Puzzle — A Mini Planning Problem

A robot (orange) must reach the goal (teal) by choosing actions. Click Step to try a random action, or Reset to restart. Notice: every cell is a state; every arrow is an action; the sequence of moves is a plan.

Steps taken: 0

Notice what just happened: even for a tiny 5×5 grid, a random sequence of actions often fails — it loops, wastes steps, or gets stuck. A plan (press "Show Plan") finds the shortest route instantly. The difference between random action and planned action is the central theme of this entire textbook.

Planning is about anticipating consequences. A random action-taker looks at the present. A planner looks at the future — it asks "if I take this action, what state will I be in next?" — and chains those predictions forward until it finds a sequence that reaches the goal. That forward simulation of consequences is what separates planning from reflexive behavior.
What is the fundamental output of a planning algorithm?

Chapter 1: Motivational Examples

Before building any theory, LaValle grounds the subject with a collection of problems that are startlingly diverse yet share a common mathematical structure. The goal is to show that planning is not a narrow robotics concern — it appears everywhere that a system must navigate a complex space of configurations to reach a target.

The Piano Mover's Problem

Move a grand piano from one room to another in a house with narrow doorways. The piano is rigid and large; the hallways curve and branch. You cannot disassemble the piano. Every possible orientation and position of the piano in the floor plan is a state. The set of all valid positions — those where the piano doesn't overlap any wall — forms the free configuration space. A plan is a continuous path through that space from start to goal.

This problem is the origin of modern motion planning. It was formalized by Schwartz and Sharir in the early 1980s, and proving that it is solvable at all was a major theoretical milestone. The piano has 3 degrees of freedom in 2D (x, y, rotation), giving a 3-dimensional configuration space — already too large to enumerate by brute force.

Rubik's Cube

There are 4.3 × 1019 distinct states of a standard 3×3×3 Rubik's cube. From any scrambled position, "God's algorithm" — the optimal solution — requires at most 20 moves. Finding it requires searching an immense discrete state space. This is pure combinatorial planning: no geometry, no physics, just a state graph with an enormous number of nodes.

Robot Navigation

A wheeled robot in an office building must navigate from reception to a specific room, avoiding walls, doors, and people. Unlike the piano, the robot has its own sensors — and often incomplete knowledge of the map. This problem mixes motion planning (finding a collision-free path) with decision-making under uncertainty (what to do when sensors are unreliable).

Protein Folding

A protein is a chain of amino acids. Its biological function depends entirely on how it folds into a 3D structure. The number of possible conformations is astronomical (Levinthal's paradox: a chain of 100 amino acids with 2 bond angles each has 2100 ≈ 1030 conformations). Planning algorithms that search this space efficiently are essential to computational biology.

Virtual Prototyping

Before building a car engine, engineers simulate assembly and disassembly in CAD software. Can part X be inserted into the housing without colliding with part Y? This is motion planning in a high-dimensional space of rigid body configurations — the same mathematical machinery as the piano mover's problem, but with thousands of parts.

Autonomous Driving

An autonomous car must plan a trajectory through traffic: avoiding other vehicles, obeying traffic laws, predicting pedestrian behavior, and smoothly merging. The state space includes the car's position, velocity, heading, and the positions of all nearby agents. The car must plan under uncertainty (other drivers are unpredictable), with continuous dynamics (physical constraints on acceleration), in real time.

Video Game AI

A game character needs to navigate a dungeon, avoid enemies, find items, and complete quests. Simple navigation uses A* search on a waypoint graph. More sophisticated game AI plans sequences of actions (patrol, hide, attack) given goals — this is exactly hierarchical task planning, the discrete planning of Chapter 2 applied to character behavior trees.

One framework, many domains. Piano moving, protein folding, and video game AI feel completely different. But they are all instances of the same mathematical abstraction: a system with a state, a set of actions that transition between states, and a goal. LaValle's contribution is identifying this common structure and building algorithms that exploit it — so improvements in motion planning transfer directly to robotics, biology, and game AI.
Problem Zoo — Comparing Planning Domains

Each domain has different state space size, action types, and constraints. Click a card to see where it falls on two axes: state space size vs. whether actions are continuous or discrete.

What do the piano mover's problem and the Rubik's cube have in common, despite being so different on the surface?

Chapter 2: State + Time + Actions

Every planning problem, regardless of domain, can be described using four fundamental ingredients. LaValle calls them the "basic ingredients" — and understanding them precisely is what allows us to write down a general theory instead of solving each problem from scratch.

State

A state is a complete description of the system at a single instant — everything the planner needs to know to predict what happens next. For the warehouse robot, the state is the robot's grid cell (row, column). For the Rubik's cube, the state is the arrangement of all 54 colored stickers. For a robot arm, the state is the vector of joint angles (θ₁, θ₂, ..., θₙ).

The set of all possible states is the state space, written X. This set can be finite (Rubik's cube has exactly 4.3 × 1019 states), countably infinite (a 1D integer lattice), or uncountably infinite (the real-valued joint angles of a robot arm). The character of X — its size and structure — determines which algorithms can work.

State must be Markov. A valid state must contain all information needed to determine the next state given an action. If you need to remember history to predict the future — for instance, if tire wear depends on total distance traveled — then you must include that history in the state. The Markov property (future depends only on current state, not on how we got there) is a design choice, not a law of nature. Choosing your state representation is the first creative act in problem formulation.

Time

Time in a planning problem can be discrete (step 1, step 2, step 3, ...) or continuous (the real line t ∈ ℝ). Discrete time is the norm for combinatorial problems (Rubik's moves, robot navigation on a grid). Continuous time appears in physics-based problems: a robot arm sweeping through space, a ball rolling down a ramp. The distinction matters algorithmically — discrete problems use graph search; continuous problems require differential equations.

Actions

An action is a transformation of the state. When the warehouse robot moves right, the action changes its state from (r, c) to (r, c+1) — if that cell is unblocked. The full description of how actions transform states is the state transition function, written f(x, u) = x', where x is the current state, u is the action, and x' is the next state.

The set of all available actions in state x is written U(x) — the action space at x. It can depend on x: the robot cannot move right if there's a wall to its right. For continuous systems, U(x) is often a set of differential equations (how velocity and acceleration constrain the next position), not a finite list.

State Transition Demo

Watch how state + action → next state. Drag the slider to pick a current state (robot column position). Buttons apply actions. The arrow shows the transition f(x, u) = x'.

Current position 3
f(3, right) = 4

The Goal

The goal is a subset XG ⊆ X of states the planner is trying to reach. For the warehouse robot, XG = {(8, 8)}. For the Rubik's cube, XG = {solved configuration}. For a robot arm trying to grasp an object, XG might be a set of joint configurations that all place the end-effector at the target location.

Note the asymmetry: the initial state xI is a single state. The goal XG can be a set. This flexibility matters: sometimes you don't care exactly how the goal is reached, just that some goal criterion is satisfied.

These four — state, time, actions, goal — are all you need to state any planning problem. The mathematical machinery of the rest of the book is just elaborations: what kind of state space (discrete or continuous)? What kind of time (discrete or continuous)? What kind of actions (deterministic or stochastic)? What kind of goal (single state, set, or optimality criterion)? Every chapter adds one more dimension to this basic framework.
A robot arm has 6 joints, each with a continuous angle in [0°, 360°]. What is the state space X?

Chapter 3: Plans vs Policies

Once we agree on state, time, actions, and a goal, we need to describe what the algorithm actually produces. The output of a planner is either a plan or a policy — and the distinction is fundamental to the structure of the whole book.

A Plan: Sequence of Actions

A plan is a finite sequence of actions: (u₁, u₂, u₃, ..., uₙ). It is computed before execution. You hand it to the robot, the robot executes action u₁ first, then u₂, and so on. If the world is perfectly predictable — the same action always produces the same state transition — this works perfectly. Apply the plan; reach the goal.

Plans are simple, cheap to execute, and easy to verify (just check the sequence). The catch: they assume the world is deterministic and that execution is perfect. Drop either assumption and a fixed sequence breaks. If the robot's wheel slips on step 3, the rest of the plan executes from the wrong state and may miss the goal entirely.

python
# A plan: just a list of actions to execute in order
plan = ['right', 'right', 'up', 'right', 'up']

def execute_plan(state, plan, transition):
    for action in plan:
        state = transition(state, action)   # one step forward
    return state

# Fine if the world is deterministic. Catastrophic if a step fails.

A Policy: State → Action Mapping

A policy (also called a feedback plan or strategy) maps every state to an action: π(x) = u. Instead of a fixed sequence, the robot looks at its current state and asks "what does my policy tell me to do here?" This lets the robot recover from deviations — if a wheel slips and it ends up in the wrong state, it just looks up π(wrong state) and continues.

Policies are robust. They are also more expensive to compute: instead of finding one path, you must fill in an action for every reachable state (and sometimes every state in X). For the 9×9 grid, that means computing an action for each of 81 cells. For a 1019-state space, storing the full policy is impossible — you must find compact representations.

python
# A policy: a function from state to action (or a lookup table)
policy = {
    (0,0): 'right', (0,1): 'right', (0,2): 'up',
    (1,2): 'up',  (2,2): 'right',
    # ... one entry per reachable state
}

def execute_policy(state, policy, transition, goal):
    while state != goal:
        action = policy[state]             # look up current state
        state = transition(state, action)  # move
    return state

# Robust to perturbations: wrong state → correct action automatically.
Plan vs policy is the first fork in the road. If your world is deterministic and you trust your actuators, a plan is enough. If your robot operates in the real world — with slippage, sensor noise, and unpredictable people — you need a policy that can respond to whatever state you actually find yourself in. Chapters 2–8 focus on plans. Chapters 9–12 introduce the decision-theoretic machinery needed to compute policies under uncertainty.
Plan vs Policy — Side-by-Side

Both solve the same grid. Left: plan (fixed sequence, shown in order). Right: policy (arrows at every cell, always points toward goal). Toggle noise to see how each handles a missed step.

Noise: OFF — plans and policies behave identically
A robot executes a fixed plan: [forward, forward, turn-left, forward]. Halfway through, a gust of wind pushes it slightly off course. What is the most likely outcome?

Chapter 4: Discrete vs Continuous

Perhaps the most consequential classification in all of planning: is the state space discrete or continuous? This single distinction drives almost every algorithmic choice in the book, because discrete and continuous problems live in different mathematical worlds and require fundamentally different tools.

Discrete Planning

In a discrete planning problem, both the state space X and the action space U are finite (or countably infinite). The state transition function f(x, u) maps pairs (state, action) to a next state. This is a directed graph: states are nodes, transitions are edges. Any graph search algorithm — BFS, DFS, Dijkstra, A* — can in principle solve it.

The Rubik's cube is discrete (4.3 × 1019 states, 18 moves). The 8-puzzle is discrete (9!/2 = 181,440 reachable states). The warehouse grid robot is discrete. Discrete planning is mathematically clean: you can write down the graph explicitly (for small problems) or navigate it implicitly (for large ones). The challenge is purely computational — the graph is enormous.

Continuous Planning

In a continuous planning problem, X is a continuous space (the real line, a rotation group, a manifold). The robot arm with 6 joints has X = [0, 2π]⁶ — a 6-dimensional torus of uncountably many states. You cannot enumerate them. You cannot build the graph. You must use algorithms that work with the geometry of the space directly: sampling-based planners (Chapter 5), potential fields (Chapter 7), or trajectory optimization (Chapter 14).

The piano mover's problem is continuous. Autonomous driving is continuous. Robot arm motion planning is continuous. These problems require different mathematical tools — algebraic geometry for exact methods, random sampling for approximate methods, differential equations for dynamic systems.

Discrete vs Continuous State Space

Left panel: discrete grid — every state is a distinct node. Right panel: continuous space — the robot can be anywhere. Drag the dot to see the difference in how "nearby" states are defined.

Resolution (discrete side) 6
Discretization bridges the two worlds. A common engineering trick: take a continuous problem, discretize the state space into a fine grid, and apply discrete search. This works — up to a point. A 6-joint arm discretized at 10 angles per joint gives 10⁶ = 1,000,000 states (manageable). At 100 angles per joint: 10¹² states (not manageable). Sampling-based methods (Chapter 5) handle the continuous case without discretizing at all — they sample random configurations and build a sparse graph on the fly.
PropertyDiscreteContinuous
State spaceFinite or countableUncountably infinite (ℝⁿ or manifold)
RepresentationExplicit graphGeometric/algebraic description
Algorithm familyGraph search (BFS, A*, Dijkstra)Sampling, potential fields, trajectory opt.
ExamplesRubik's cube, grid navigation, 8-puzzleRobot arms, autonomous driving, protein folding
LaValle coverageChapter 2Chapters 3–8, 13–15
A chess-playing program considers all possible board positions. Is this a discrete or continuous planning problem?

Chapter 5: Predictability & Sensing

So far we've assumed the world behaves exactly as expected: apply action u in state x, get state f(x, u) with certainty. But real robots operate in a fundamentally uncertain world. LaValle identifies two orthogonal axes of uncertainty that together define the planning problem's difficulty.

Axis 1: Predictability (Determinism vs Stochasticity)

In a deterministic system, every action has a unique outcome: f(x, u) is a function that returns exactly one state. Apply "move right" and the robot moves right. No exceptions, no surprises. Most of Part I of the textbook assumes determinism.

In a stochastic (or nondeterministic) system, an action can lead to multiple possible outcomes. A command to "move right" might move the robot right with probability 0.85, leave it in place (wheel slip) with probability 0.10, or move it right-and-slightly-forward with probability 0.05. Now the planner must reason about distributions over future states, not single future states. This is the domain of Markov decision processes (MDPs) and the subject of Chapters 9–11.

Nondeterminism vs stochasticity. LaValle distinguishes two flavors of unpredictable outcomes. In nondeterministic systems, we know which states are possible but not their probabilities (adversarial: assume worst case). In stochastic systems, we have probability distributions over outcomes. Nondeterminism → minimax planning (Chapter 10). Stochasticity → expected-cost planning / MDPs (Chapter 9).

Axis 2: Sensing (Full vs Partial Observability)

A system is fully observable if the robot always knows its current state exactly. The warehouse grid robot with a GPS locator: it always knows its (row, column). Full observability means f(x, u) is all the information you need.

A system is partially observable if the robot has sensors that give only noisy or incomplete information about the current state. A robot navigating by sonar can tell that there's a wall 2 meters ahead, but that observation is consistent with many different positions in the building. The robot must maintain a belief — a distribution over possible states — and plan over beliefs, not states. This gives rise to POMDPs (Partially Observable MDPs), covered in Chapter 11.

The Four Planning Regimes

The two axes — predictability and observability — create four regimes. Click each quadrant to see which part of the book handles it.

PredictabilityObservabilityProblem typeChapter
DeterministicFullClassical planning2–8
NondeterministicFullWorst-case planning10
StochasticFullMDP9
StochasticPartialPOMDP11
Sensors collapse uncertainty. If your sensor is perfect (full observability), you always know your state and the four-quadrant table collapses to two rows (deterministic vs stochastic). If your sensor is perfect AND your actions are deterministic, you're in the top-left quadrant: classical planning. Every bit of sensor quality you add moves you toward the easier cases. Every bit of actuator reliability you add does the same.
A drone navigates using GPS (position known precisely) but its rotors sometimes produce 10–20% less thrust than commanded (due to wind). Which quadrant does this problem occupy?

Chapter 6: The Four Parts

LaValle's book has 15 chapters organized into four parts. Understanding the structure of the book is itself useful: it tells you which tools belong to which problems, and which foundational ideas each later chapter depends on.

Book Structure Overview

The four parts of Planning Algorithms, and the dependencies between them. Click a part to highlight it.

Click a part button to see what it covers

Part I: Introductory Material (Chapters 1–2)

Chapters 1 and 2 introduce the vocabulary and the simplest case: discrete planning. Chapter 1 (this chapter) surveys motivational examples and establishes the basic ingredients — state, time, actions, plans. Chapter 2 formalizes discrete planning and covers graph search: BFS, DFS, Dijkstra, A*, and value iteration. Everything in Parts II–IV builds on these foundations.

Part II: Motion Planning (Chapters 3–8)

This part tackles planning in continuous spaces — the geometric and topological challenges of moving a robot body through a world of obstacles. Chapter 3 covers the geometric representations and rigid-body transformations needed to describe robot configurations. Chapter 4 introduces the configuration space abstraction — the single most important idea in motion planning. Chapters 5–8 develop sampling-based methods (RRT, PRM), combinatorial planners, potential fields, and feedback motion planning.

Part III: Decision-Theoretic Planning (Chapters 9–12)

This part handles uncertainty — both in outcomes (stochastic actions) and in observations (partial sensing). Chapter 9 covers Markov Decision Processes (MDPs) — the foundational model for stochastic sequential decision-making. Chapter 10 addresses nondeterministic planning (worst-case guarantees). Chapter 11 introduces POMDPs — planning when you can't observe your state directly. Chapter 12 covers game-theoretic planning (multiple agents with conflicting objectives).

Part IV: Differential Constraints (Chapters 13–15)

Real vehicles have physics. A car cannot instantly change direction. A quadrotor has rotor dynamics. Chapter 13 introduces differential models — state transition functions defined by differential equations (ẋ = f(x, u)). Chapter 14 covers trajectory optimization and optimal control. Chapter 15 addresses planning under differential constraints with motion primitives and kinodynamic planning.

The book's arc in one sentence. Part I gives the language. Part II adds geometry. Part III adds uncertainty. Part IV adds physics. Each part generalizes the previous one — a fully general planner (Part IV + Part III combined) handles geometry, uncertainty, AND dynamics simultaneously. That is the unsolved grand challenge of planning.
Part I: Discrete (Ch 1–2)
State spaces, graph search, value iteration. The algorithmic core.
↓ add geometry
Part II: Motion Planning (Ch 3–8)
Continuous C-space, RRT/PRM, potential fields. Geometry meets search.
↓ add uncertainty
Part III: Decision-Theoretic (Ch 9–12)
MDPs, POMDPs, games. Stochastic outcomes and partial sensing.
↓ add physics
Part IV: Differential Constraints (Ch 13–15)
Dynamics, trajectory optimization, kinodynamic planning.
A self-driving car planner must handle: noisy GPS (partial observability), unpredictable pedestrians (stochastic outcomes), AND physical acceleration limits (differential constraints). Which part of LaValle's book is MOST relevant?

Chapter 7: Connections

Planning algorithms are not an isolated field. They draw from and contribute to a rich ecosystem of adjacent disciplines. Understanding these connections helps you recognize when a planning idea has appeared in another guise — and import solutions across domain boundaries.

Planning and Artificial Intelligence

Classical AI planning — STRIPS, PDDL, planning-as-satisfiability — grew from the same roots as LaValle's work. The STRIPS system (1971) was an early formalization of state-space search for AI agents. LaValle's formulation generalizes it: STRIPS is discrete, deterministic, and fully observable — Chapter 2 territory. Modern AI planning adds probability (Chapter 9) and partial observability (Chapter 11).

Planning and Control Theory

Control theory asks: given a dynamical system (ẋ = f(x, u)), design u(t) to stabilize x at a desired setpoint. Planning asks: find a sequence u₁, u₂, ..., uₙ to move x from xI to XG. These overlap significantly in Part IV. Optimal control (Pontryagin, Bellman) directly informs kinodynamic planning. The Hamilton-Jacobi-Bellman equation from control theory is, in discrete form, exactly the value iteration of Chapter 2.

Planning and Machine Learning

Reinforcement learning (RL) is planning where the transition function f(x, u) is unknown and must be learned from experience. An RL agent learns a policy π(x) by interacting with the environment — this is Chapter 9's MDP framework, but with an unknown model. Deep RL (DQN, AlphaGo) uses neural networks to approximate the policy or value function. AlphaGo's tree search is essentially A* with learned heuristics — Part I algorithms with learned components.

Planning and Robotics

Robotics is the original motivation for motion planning. Every component of a robotics stack has a planning interpretation: localization maintains a belief over states (Chapter 11), mapping builds the world model, path planning is Chapter 5, trajectory tracking is Chapter 14, and task planning is Chapter 2. Modern "end-to-end" robotic learning attempts to collapse all these layers into a single learned model — essentially using RL (Part III) to bypass explicit planning.

LaValle's unifying framework is genuinely powerful. When you see a new problem — protein docking, supply chain optimization, video game AI, surgical robotics — ask: what is the state? What are the actions? Is it discrete or continuous? Deterministic or stochastic? Fully or partially observable? The answer places it precisely in the LaValle taxonomy and tells you which chapter to turn to.
The Planning Landscape

How planning algorithms relate to adjacent fields. Nodes are fields; edges show which mathematical ideas flow between them.

Where to Go Next

TopicResourceConnection
Discrete planning in depthLaValle Ch 2BFS, A*, value iteration on graphs
Geometric transformationsLaValle Ch 3Robot bodies and rigid-body math
Configuration spaceLaValle Ch 4The abstraction that unifies motion planning
Sampling-based plannersLaValle Ch 5RRT, PRM — practical motion planning
MDPs and RLSutton & Barto; LaValle Ch 9Planning under stochastic outcomes
Optimal controlBertsekas; LaValle Ch 14Differential constraints + cost minimization
"Planning is the problem of computing a course of action that will achieve a desired goal state given a model of the world."
— Steven LaValle, Planning Algorithms (2006)
Reinforcement learning (RL) differs from classical planning (Chapter 2) primarily because in RL: