What does it mean to plan? State, time, actions, and plans — the ingredients that unify robot arms, Rubik's cubes, and autonomous cars.
You have a complex mechanical system — a robot arm, a self-driving car, a protein that needs to fold correctly — and you want it to reach some goal. It starts in one configuration. It must arrive in another. Between the start and the goal lies a vast, tangled space of possibilities. Which sequence of moves, if any, gets you there?
That question — and all its variants — is planning. It sounds simple until you try to pin it down formally. What exactly is a "configuration"? What counts as a valid "move"? What does "best" path mean when we have multiple paths to choose from? LaValle's textbook is an attempt to answer all these questions with mathematical precision across a startling range of problem domains.
Chapter 1 is deliberately broad: it refuses to commit to a specific algorithm. Instead it surveys the landscape — the kinds of problems that planning applies to, the vocabulary needed to describe any planning problem, and the four-part structure of the book. Think of this chapter as building a shared language so that later chapters can be precise without being overwhelming.
Here is the core puzzle planning must solve: a system has a current state. The planner applies actions. Each action moves the system to a new state. The goal is to find a sequence of actions — a plan — that drives the system from the initial state to some desired goal state. Simple to say. Technically rich to formalize and solve.
A robot (orange) must reach the goal (teal) by choosing actions. Click Step to try a random action, or Reset to restart. Notice: every cell is a state; every arrow is an action; the sequence of moves is a plan.
Notice what just happened: even for a tiny 5×5 grid, a random sequence of actions often fails — it loops, wastes steps, or gets stuck. A plan (press "Show Plan") finds the shortest route instantly. The difference between random action and planned action is the central theme of this entire textbook.
Before building any theory, LaValle grounds the subject with a collection of problems that are startlingly diverse yet share a common mathematical structure. The goal is to show that planning is not a narrow robotics concern — it appears everywhere that a system must navigate a complex space of configurations to reach a target.
Move a grand piano from one room to another in a house with narrow doorways. The piano is rigid and large; the hallways curve and branch. You cannot disassemble the piano. Every possible orientation and position of the piano in the floor plan is a state. The set of all valid positions — those where the piano doesn't overlap any wall — forms the free configuration space. A plan is a continuous path through that space from start to goal.
This problem is the origin of modern motion planning. It was formalized by Schwartz and Sharir in the early 1980s, and proving that it is solvable at all was a major theoretical milestone. The piano has 3 degrees of freedom in 2D (x, y, rotation), giving a 3-dimensional configuration space — already too large to enumerate by brute force.
There are 4.3 × 1019 distinct states of a standard 3×3×3 Rubik's cube. From any scrambled position, "God's algorithm" — the optimal solution — requires at most 20 moves. Finding it requires searching an immense discrete state space. This is pure combinatorial planning: no geometry, no physics, just a state graph with an enormous number of nodes.
A wheeled robot in an office building must navigate from reception to a specific room, avoiding walls, doors, and people. Unlike the piano, the robot has its own sensors — and often incomplete knowledge of the map. This problem mixes motion planning (finding a collision-free path) with decision-making under uncertainty (what to do when sensors are unreliable).
A protein is a chain of amino acids. Its biological function depends entirely on how it folds into a 3D structure. The number of possible conformations is astronomical (Levinthal's paradox: a chain of 100 amino acids with 2 bond angles each has 2100 ≈ 1030 conformations). Planning algorithms that search this space efficiently are essential to computational biology.
Before building a car engine, engineers simulate assembly and disassembly in CAD software. Can part X be inserted into the housing without colliding with part Y? This is motion planning in a high-dimensional space of rigid body configurations — the same mathematical machinery as the piano mover's problem, but with thousands of parts.
An autonomous car must plan a trajectory through traffic: avoiding other vehicles, obeying traffic laws, predicting pedestrian behavior, and smoothly merging. The state space includes the car's position, velocity, heading, and the positions of all nearby agents. The car must plan under uncertainty (other drivers are unpredictable), with continuous dynamics (physical constraints on acceleration), in real time.
A game character needs to navigate a dungeon, avoid enemies, find items, and complete quests. Simple navigation uses A* search on a waypoint graph. More sophisticated game AI plans sequences of actions (patrol, hide, attack) given goals — this is exactly hierarchical task planning, the discrete planning of Chapter 2 applied to character behavior trees.
Each domain has different state space size, action types, and constraints. Click a card to see where it falls on two axes: state space size vs. whether actions are continuous or discrete.
Every planning problem, regardless of domain, can be described using four fundamental ingredients. LaValle calls them the "basic ingredients" — and understanding them precisely is what allows us to write down a general theory instead of solving each problem from scratch.
A state is a complete description of the system at a single instant — everything the planner needs to know to predict what happens next. For the warehouse robot, the state is the robot's grid cell (row, column). For the Rubik's cube, the state is the arrangement of all 54 colored stickers. For a robot arm, the state is the vector of joint angles (θ₁, θ₂, ..., θₙ).
The set of all possible states is the state space, written X. This set can be finite (Rubik's cube has exactly 4.3 × 1019 states), countably infinite (a 1D integer lattice), or uncountably infinite (the real-valued joint angles of a robot arm). The character of X — its size and structure — determines which algorithms can work.
Time in a planning problem can be discrete (step 1, step 2, step 3, ...) or continuous (the real line t ∈ ℝ). Discrete time is the norm for combinatorial problems (Rubik's moves, robot navigation on a grid). Continuous time appears in physics-based problems: a robot arm sweeping through space, a ball rolling down a ramp. The distinction matters algorithmically — discrete problems use graph search; continuous problems require differential equations.
An action is a transformation of the state. When the warehouse robot moves right, the action changes its state from (r, c) to (r, c+1) — if that cell is unblocked. The full description of how actions transform states is the state transition function, written f(x, u) = x', where x is the current state, u is the action, and x' is the next state.
The set of all available actions in state x is written U(x) — the action space at x. It can depend on x: the robot cannot move right if there's a wall to its right. For continuous systems, U(x) is often a set of differential equations (how velocity and acceleration constrain the next position), not a finite list.
Watch how state + action → next state. Drag the slider to pick a current state (robot column position). Buttons apply actions. The arrow shows the transition f(x, u) = x'.
The goal is a subset XG ⊆ X of states the planner is trying to reach. For the warehouse robot, XG = {(8, 8)}. For the Rubik's cube, XG = {solved configuration}. For a robot arm trying to grasp an object, XG might be a set of joint configurations that all place the end-effector at the target location.
Note the asymmetry: the initial state xI is a single state. The goal XG can be a set. This flexibility matters: sometimes you don't care exactly how the goal is reached, just that some goal criterion is satisfied.
Once we agree on state, time, actions, and a goal, we need to describe what the algorithm actually produces. The output of a planner is either a plan or a policy — and the distinction is fundamental to the structure of the whole book.
A plan is a finite sequence of actions: (u₁, u₂, u₃, ..., uₙ). It is computed before execution. You hand it to the robot, the robot executes action u₁ first, then u₂, and so on. If the world is perfectly predictable — the same action always produces the same state transition — this works perfectly. Apply the plan; reach the goal.
Plans are simple, cheap to execute, and easy to verify (just check the sequence). The catch: they assume the world is deterministic and that execution is perfect. Drop either assumption and a fixed sequence breaks. If the robot's wheel slips on step 3, the rest of the plan executes from the wrong state and may miss the goal entirely.
python # A plan: just a list of actions to execute in order plan = ['right', 'right', 'up', 'right', 'up'] def execute_plan(state, plan, transition): for action in plan: state = transition(state, action) # one step forward return state # Fine if the world is deterministic. Catastrophic if a step fails.
A policy (also called a feedback plan or strategy) maps every state to an action: π(x) = u. Instead of a fixed sequence, the robot looks at its current state and asks "what does my policy tell me to do here?" This lets the robot recover from deviations — if a wheel slips and it ends up in the wrong state, it just looks up π(wrong state) and continues.
Policies are robust. They are also more expensive to compute: instead of finding one path, you must fill in an action for every reachable state (and sometimes every state in X). For the 9×9 grid, that means computing an action for each of 81 cells. For a 1019-state space, storing the full policy is impossible — you must find compact representations.
python # A policy: a function from state to action (or a lookup table) policy = { (0,0): 'right', (0,1): 'right', (0,2): 'up', (1,2): 'up', (2,2): 'right', # ... one entry per reachable state } def execute_policy(state, policy, transition, goal): while state != goal: action = policy[state] # look up current state state = transition(state, action) # move return state # Robust to perturbations: wrong state → correct action automatically.
Both solve the same grid. Left: plan (fixed sequence, shown in order). Right: policy (arrows at every cell, always points toward goal). Toggle noise to see how each handles a missed step.
Perhaps the most consequential classification in all of planning: is the state space discrete or continuous? This single distinction drives almost every algorithmic choice in the book, because discrete and continuous problems live in different mathematical worlds and require fundamentally different tools.
In a discrete planning problem, both the state space X and the action space U are finite (or countably infinite). The state transition function f(x, u) maps pairs (state, action) to a next state. This is a directed graph: states are nodes, transitions are edges. Any graph search algorithm — BFS, DFS, Dijkstra, A* — can in principle solve it.
The Rubik's cube is discrete (4.3 × 1019 states, 18 moves). The 8-puzzle is discrete (9!/2 = 181,440 reachable states). The warehouse grid robot is discrete. Discrete planning is mathematically clean: you can write down the graph explicitly (for small problems) or navigate it implicitly (for large ones). The challenge is purely computational — the graph is enormous.
In a continuous planning problem, X is a continuous space (the real line, a rotation group, a manifold). The robot arm with 6 joints has X = [0, 2π]⁶ — a 6-dimensional torus of uncountably many states. You cannot enumerate them. You cannot build the graph. You must use algorithms that work with the geometry of the space directly: sampling-based planners (Chapter 5), potential fields (Chapter 7), or trajectory optimization (Chapter 14).
The piano mover's problem is continuous. Autonomous driving is continuous. Robot arm motion planning is continuous. These problems require different mathematical tools — algebraic geometry for exact methods, random sampling for approximate methods, differential equations for dynamic systems.
Left panel: discrete grid — every state is a distinct node. Right panel: continuous space — the robot can be anywhere. Drag the dot to see the difference in how "nearby" states are defined.
| Property | Discrete | Continuous |
|---|---|---|
| State space | Finite or countable | Uncountably infinite (ℝⁿ or manifold) |
| Representation | Explicit graph | Geometric/algebraic description |
| Algorithm family | Graph search (BFS, A*, Dijkstra) | Sampling, potential fields, trajectory opt. |
| Examples | Rubik's cube, grid navigation, 8-puzzle | Robot arms, autonomous driving, protein folding |
| LaValle coverage | Chapter 2 | Chapters 3–8, 13–15 |
So far we've assumed the world behaves exactly as expected: apply action u in state x, get state f(x, u) with certainty. But real robots operate in a fundamentally uncertain world. LaValle identifies two orthogonal axes of uncertainty that together define the planning problem's difficulty.
In a deterministic system, every action has a unique outcome: f(x, u) is a function that returns exactly one state. Apply "move right" and the robot moves right. No exceptions, no surprises. Most of Part I of the textbook assumes determinism.
In a stochastic (or nondeterministic) system, an action can lead to multiple possible outcomes. A command to "move right" might move the robot right with probability 0.85, leave it in place (wheel slip) with probability 0.10, or move it right-and-slightly-forward with probability 0.05. Now the planner must reason about distributions over future states, not single future states. This is the domain of Markov decision processes (MDPs) and the subject of Chapters 9–11.
A system is fully observable if the robot always knows its current state exactly. The warehouse grid robot with a GPS locator: it always knows its (row, column). Full observability means f(x, u) is all the information you need.
A system is partially observable if the robot has sensors that give only noisy or incomplete information about the current state. A robot navigating by sonar can tell that there's a wall 2 meters ahead, but that observation is consistent with many different positions in the building. The robot must maintain a belief — a distribution over possible states — and plan over beliefs, not states. This gives rise to POMDPs (Partially Observable MDPs), covered in Chapter 11.
The two axes — predictability and observability — create four regimes. Click each quadrant to see which part of the book handles it.
| Predictability | Observability | Problem type | Chapter |
|---|---|---|---|
| Deterministic | Full | Classical planning | 2–8 |
| Nondeterministic | Full | Worst-case planning | 10 |
| Stochastic | Full | MDP | 9 |
| Stochastic | Partial | POMDP | 11 |
LaValle's book has 15 chapters organized into four parts. Understanding the structure of the book is itself useful: it tells you which tools belong to which problems, and which foundational ideas each later chapter depends on.
The four parts of Planning Algorithms, and the dependencies between them. Click a part to highlight it.
Chapters 1 and 2 introduce the vocabulary and the simplest case: discrete planning. Chapter 1 (this chapter) surveys motivational examples and establishes the basic ingredients — state, time, actions, plans. Chapter 2 formalizes discrete planning and covers graph search: BFS, DFS, Dijkstra, A*, and value iteration. Everything in Parts II–IV builds on these foundations.
This part tackles planning in continuous spaces — the geometric and topological challenges of moving a robot body through a world of obstacles. Chapter 3 covers the geometric representations and rigid-body transformations needed to describe robot configurations. Chapter 4 introduces the configuration space abstraction — the single most important idea in motion planning. Chapters 5–8 develop sampling-based methods (RRT, PRM), combinatorial planners, potential fields, and feedback motion planning.
This part handles uncertainty — both in outcomes (stochastic actions) and in observations (partial sensing). Chapter 9 covers Markov Decision Processes (MDPs) — the foundational model for stochastic sequential decision-making. Chapter 10 addresses nondeterministic planning (worst-case guarantees). Chapter 11 introduces POMDPs — planning when you can't observe your state directly. Chapter 12 covers game-theoretic planning (multiple agents with conflicting objectives).
Real vehicles have physics. A car cannot instantly change direction. A quadrotor has rotor dynamics. Chapter 13 introduces differential models — state transition functions defined by differential equations (ẋ = f(x, u)). Chapter 14 covers trajectory optimization and optimal control. Chapter 15 addresses planning under differential constraints with motion primitives and kinodynamic planning.
Planning algorithms are not an isolated field. They draw from and contribute to a rich ecosystem of adjacent disciplines. Understanding these connections helps you recognize when a planning idea has appeared in another guise — and import solutions across domain boundaries.
Classical AI planning — STRIPS, PDDL, planning-as-satisfiability — grew from the same roots as LaValle's work. The STRIPS system (1971) was an early formalization of state-space search for AI agents. LaValle's formulation generalizes it: STRIPS is discrete, deterministic, and fully observable — Chapter 2 territory. Modern AI planning adds probability (Chapter 9) and partial observability (Chapter 11).
Control theory asks: given a dynamical system (ẋ = f(x, u)), design u(t) to stabilize x at a desired setpoint. Planning asks: find a sequence u₁, u₂, ..., uₙ to move x from xI to XG. These overlap significantly in Part IV. Optimal control (Pontryagin, Bellman) directly informs kinodynamic planning. The Hamilton-Jacobi-Bellman equation from control theory is, in discrete form, exactly the value iteration of Chapter 2.
Reinforcement learning (RL) is planning where the transition function f(x, u) is unknown and must be learned from experience. An RL agent learns a policy π(x) by interacting with the environment — this is Chapter 9's MDP framework, but with an unknown model. Deep RL (DQN, AlphaGo) uses neural networks to approximate the policy or value function. AlphaGo's tree search is essentially A* with learned heuristics — Part I algorithms with learned components.
Robotics is the original motivation for motion planning. Every component of a robotics stack has a planning interpretation: localization maintains a belief over states (Chapter 11), mapping builds the world model, path planning is Chapter 5, trajectory tracking is Chapter 14, and task planning is Chapter 2. Modern "end-to-end" robotic learning attempts to collapse all these layers into a single learned model — essentially using RL (Part III) to bypass explicit planning.
How planning algorithms relate to adjacent fields. Nodes are fields; edges show which mathematical ideas flow between them.
| Topic | Resource | Connection |
|---|---|---|
| Discrete planning in depth | LaValle Ch 2 | BFS, A*, value iteration on graphs |
| Geometric transformations | LaValle Ch 3 | Robot bodies and rigid-body math |
| Configuration space | LaValle Ch 4 | The abstraction that unifies motion planning |
| Sampling-based planners | LaValle Ch 5 | RRT, PRM — practical motion planning |
| MDPs and RL | Sutton & Barto; LaValle Ch 9 | Planning under stochastic outcomes |
| Optimal control | Bertsekas; LaValle Ch 14 | Differential constraints + cost minimization |
"Planning is the problem of computing a course of action that will achieve a desired goal state given a model of the world."
— Steven LaValle, Planning Algorithms (2006)