When you don't know where you are — planning in information space, localization, SLAM, and pursuit-evasion.
You're a robot dropped into a building. Your task: reach the exit. You have a map of the hallways, a motor, and a sensor that tells you how far the nearest wall is. But you don't know your starting position. The sensor gives you distance readings — but multiple rooms could produce identical readings. Where are you?
This is the core challenge of Chapter 12: planning when your state is unknown. In every prior chapter, the robot knew its position. Planning was hard enough then — now we strip away that luxury. The robot must simultaneously figure out where it is and what to do. These problems are entangled: a good action depends on your location, but your location depends on which actions you've taken and what you've observed in response.
The canvas below illustrates the predicament. A robot (blue) starts at an unknown position in a corridor. Black walls give distance readings. Multiple candidate positions (shown in gray) are all consistent with the first reading. Watch what happens when the robot moves and takes new measurements — some hypotheses survive, others are eliminated.
Three corridors are identical in shape. The robot (blue dot) gets a distance reading but can't distinguish which corridor it's in. Move the robot and watch hypotheses (gray) collapse.
LaValle groups this chapter around three compounding problems. Localization: the map is known, but the robot's position is unknown — estimate it from observations. Mapping: the robot's path is known, but the environment is unknown — build the map from observations. SLAM (Simultaneous Localization and Mapping): neither is known — solve both at once. Each is harder than the last.
| Problem | Known | Unknown | Key technique |
|---|---|---|---|
| Localization | Map | Robot pose | Particle filter, Bayes filter |
| Mapping | Trajectory | Environment | Occupancy grid, ray tracing |
| SLAM | Nothing | Both | EKF-SLAM, graph SLAM |
| Pursuit-evasion | Map | Intruder position | Visibility planning, lion & man |
Chapter 11 introduced the information space (I-space): instead of planning over physical states X, we plan over information states η — all the knowledge the robot currently holds. An information state summarizes the full history of observations and actions into a compact form that's sufficient for decision-making.
The key insight: sensing uncertainty doesn't make planning impossible. It makes the space of things to plan over larger and stranger. Instead of asking "what action is optimal in state x?", we ask: "what action is optimal given information state η?" The planning algorithm's structure is identical — only the state space has changed from X to I.
LaValle's formulation (Section 12.1) frames I-space planning as a standard planning problem where the state is now an information state η ∈ Ihist. The transition: when you apply action u in information state η and observe y, the new information state is η' = ηtrans(η, u, y). The goal is a set of information states IG that are "good enough" — perhaps all η where the robot is confident it's at the goal.
This is called the I-state transition equation. It's the counterpart to x' = f(x, u) in deterministic planning, but now the new state depends on the observation y as well — because observations update our knowledge. Planning in I-space means finding an action policy π(η) that drives information states toward IG.
Two flavors of I-space appear in Chapter 12. In the nondeterministic case (Section 12.1.2), we assume worst-case nature: the robot tracks the set of all states still consistent with every observation. This set is the nondeterministic I-state, sometimes written ηnd ⊆ X. It shrinks as observations rule out impossible states and grows when actions could send the robot anywhere in a set.
In the probabilistic case (Section 12.1.3), we maintain a full probability distribution b(x) = P(x | history) — the belief state. Belief state updates follow Bayes' rule: apply action, propagate uncertainty forward, then condition on the new observation. This is the foundation of all the filter-based methods that follow.
Localization is the problem where the map is given and the robot must determine its own position. Think of waking up in a hotel room in a foreign city. You can see out the window. You know the city's street layout. But you don't know which room you're in. Every observation narrows down the possibilities until — ideally — only one location is consistent.
Formally: the robot operates in a known environment X with known map M. It takes action uk and receives observation yk at each step. The goal is to maintain a belief bk(x) = P(xk = x | y0..k, u0..k-1) that concentrates on the robot's true position over time.
Global localization: the robot starts completely lost — initial belief is uniform across all positions. Must figure out location from scratch. Hardest case.
Position tracking: the robot knows where it started (initial belief is sharp), but sensor and motion noise accumulate over time. The filter prevents drift from growing unbounded. Easier — the filter never has to "find" the robot, only keep track.
Kidnapped robot: the robot knew where it was, but was suddenly moved without warning. The belief still peaks at the old location, but the true position has changed. The filter must detect the inconsistency and relocalize. Hardest case for tracking-based systems.
The observation model P(y | x) is the heart of localization — it says how likely observation y is if the robot is truly at position x. For a range sensor (laser or sonar), a simple model: expected range rexp(x, θ) is computed by ray-casting through the known map in direction θ. Actual measurement rmeas is modeled as Gaussian: P(rmeas | x) = Ν(rexp(x,θ), σ2). A robot at the right position will get readings close to expected. A robot in the wrong position gets large discrepancies — low likelihood — and its weight in the belief drops.
The product is over all range readings i (multiple beams). This model assumes independent noise on each beam — a simplification, but it works well in practice.
The motion model P(x' | x, u) says: if I'm at x and apply action u, where do I end up? For a wheeled robot with velocity commands (v, ω), the odometry model computes the expected new pose and adds Gaussian noise in translation and rotation. The noise parameters are tuned to match the actual wheel slippage and encoder error of the specific robot.
The Kalman filter is elegant but works only for Gaussian distributions over continuous state. For localization on a grid — where the belief can be multimodal (three corridors, all look identical) — we need something more flexible. Enter the particle filter, also called Monte Carlo Localization (MCL).
A particle filter represents the belief not as a formula but as a cloud of samples. Each particle is a hypothesis: "the robot might be here." Particles in likely locations multiply; particles in unlikely locations die out. After enough observations, the cloud collapses onto the true location.
The simulation below runs a full particle filter on a 1D corridor. The robot (true position: red line) moves along a corridor with walls at known locations. The sensor measures distance to the nearest wall (noisy). Particles (blue dots) start uniformly distributed. Watch them converge.
The corridor has walls at fixed positions. Robot (red) moves right with small noise. Sensor measures nearest wall distance. N particles (blue) track the belief. Watch convergence from global uncertainty.
After weighting, particles near the wrong location have tiny weights. If you never resample, you'd have 200 particles with weights [0.001, 0.001, ..., 0.9, ...] — still wasting compute on dead hypotheses. Resampling with replacement from the weight distribution draws more samples from high-weight regions and fewer from low-weight regions. The effect: computation automatically concentrates where it matters.
python import numpy as np def particle_filter_step(particles, weights, action, obs, walls, sigma): # 1. Apply motion model with noise motion_noise = 0.05 particles = particles + action + np.random.randn(len(particles)) * motion_noise particles = np.clip(particles, 0, 1) # stay in corridor [0,1] # 2. Weight by sensor model P(obs | particle) for i, p in enumerate(particles): expected = min(abs(p - w) for w in walls) weights[i] = np.exp(-0.5 * ((obs - expected) / sigma)**2) weights += 1e-10 # prevent all-zero weights /= weights.sum() # 3. Resample indices = np.random.choice(len(particles), len(particles), p=weights) particles = particles[indices] weights = np.ones(len(particles)) / len(particles) return particles, weights
Flip the localization problem: now the robot knows exactly where it is (GPS, perfect odometry) but has never seen the environment before. The environment is unknown. The task: build a map from observations. This is the mapping problem.
The most common representation for 2D environments is the occupancy grid. The environment is discretized into a grid of cells. Each cell stores a probability P(occupied | observations so far) — the probability that there's an obstacle there. Start all cells at 0.5 (unknown). As the laser scanner sweeps, update each cell using Bayes' rule.
Working with raw probabilities causes numerical issues (products of many small numbers underflow). Instead, work in log-odds: l(x) = log(p/(1-p)). Positive l means likely occupied; negative l means likely free; zero means unknown. The Bayes update becomes addition in log-odds — cheap and numerically stable.
A laser ray that hits a cell increments l (occupied evidence). A ray that passes through a cell decrements l (free-space evidence). After many sweeps, occupied cells lock in at high positive l, free cells at high negative l, and only truly ambiguous cells stay near zero.
A robot (blue) at the center sweeps a laser. White = free, dark = occupied, gray = unknown. Click to add a wall obstacle, then rescan to see the grid update.
The standard occupancy grid has three hidden assumptions. First, cell independence: each cell is updated independently, ignoring that neighboring cells tend to be correlated (walls are contiguous). Second, known pose: the robot knows exactly where it is when each scan is taken. Third, static environment: obstacles don't move between scans. Violate any of these and the map degrades: ghosting from moving objects, blurring from unknown pose, jagged edges from independent cell updates.
You need the map to localize. You need to localize to build the map. Both are unknown at once. This is Simultaneous Localization and Mapping (SLAM) — one of the central open problems in robotics for decades, now largely solved in practice but still theoretically rich.
The key insight that unlocks SLAM: landmarks. If the robot can identify the same physical feature (a corner, a door frame, a unique reflector) across multiple observations from different positions, it gains a constraint: "wherever I was when I saw landmark 3 the first time, and wherever I was the second time, these two poses are geometrically related by the landmark's position." Accumulate enough such constraints and the system is overdetermined — you can solve for all poses and landmark positions simultaneously.
The original SLAM solution (Smith, Self, Cheeseman, 1987) used the Extended Kalman Filter. The state vector x = [robot pose, landmark1 position, ..., landmarkN position]. The covariance matrix P captures uncertainty in all variables and — crucially — the correlations between them. When you see landmark 3 again, you update not just its position estimate but also the robot pose and all other landmarks (because if you were wrong about landmark 3's position, you were probably wrong about nearby landmarks too).
EKF-SLAM has O(N2) complexity per step where N is the number of landmarks — the covariance matrix is N×N and every update touches it all. For 100 landmarks: 10,000 numbers updated per step. Fine. For 10,000 landmarks: 100 million numbers per step. Intractable. This motivated the shift to sparse graph-based methods.
Modern SLAM (2005+) builds an explicit pose graph: each robot pose is a node, each observation constraint is an edge. After collecting data, run a global nonlinear optimizer (Gauss-Newton, Levenberg-Marquardt) to find the best consistent assignment. The dramatic moment: loop closure. The robot returns to a previously-visited location. Recognizing this (via scan matching or visual place recognition) adds a long-range edge in the graph. The optimizer then "pulls" the entire trajectory into geometric consistency — the accumulated drift from odometry errors is corrected globally in one shot.
The robot traverses a loop. Odometry drift accumulates (red path). When loop closure is detected, graph optimization corrects all poses at once (blue corrected path). Toggle between drifted and corrected.
Imagine a security guard (the pursuer) searching a museum at night. Somewhere in the building is an intruder (the evader) who moves adversarially — always trying to avoid detection. The guard can see everything in its line of sight. Can the guard guarantee to spot the intruder, no matter how fast or clever the intruder is?
This is the pursuit-evasion problem (Section 12.4 in LaValle). It's a game between the pursuer and an adversarial evader. Sensing is via visibility — the pursuer sees any point in its unobstructed line of sight. The evader is "caught" when it enters the pursuer's visible region. The question is combinatorial: does a winning strategy for the pursuer even exist in a given environment, and if so, what is it?
The classical version (continuous, circular arena): a lion and a man both run at speed 1 inside a circle. Can the lion always catch the man? Surprisingly: in the continuous-time, continuous-space version, the lion can always match the man's radius from the center and converge — but the geometry is subtle. The discrete grid version used in robotics is cleaner: the evader moves one cell per step, the pursuer moves one cell per step, and the pursuer wins when it enters the evader's cell or the evader enters the pursuer's visible region.
A region is cleared if the pursuer can guarantee no evader is hidden there. A region becomes contaminated if the evader could have slipped back in while the pursuer was looking elsewhere. The pursuit strategy must systematically expand the cleared region without allowing recontamination.
In polygonal environments, this leads to a beautiful structure: the pursuer must guard certain critical edges (corridors, doorways) while sweeping the cleared region. LaValle shows (via results from Suzuki and Tanigawa, 1992) that the minimum number of pursuers needed to clear a polygonal environment equals the minimum number of "guards" needed to block all escapes during the sweep — a quantity related to the topology of the free space.
Pursuer (blue) sweeps through a L-shaped room. Green = cleared region. Red = contaminated. The evader (red dot) tries to stay in uncleared territory. Step through the pursuit strategy.
A robot arm must pick up a bolt. The bolt's orientation is unknown — it might be at any angle in [0°, 360°]. The gripper has no force sensor, only a camera that's occluded by the gripper itself during grasping. How does the robot guarantee a successful grasp?
This problem (Section 12.5) is manipulation under uncertainty. Unlike localization — where we track a continuous state — manipulation under uncertainty often uses a nondeterministic I-state: we track the set of all possible object configurations still consistent with what the robot has done and observed. A good grasp strategy must succeed for every configuration in this set.
A polygonal object rests on a table with unknown orientation θ ∈ [0°, 360°). Push it against a flat fence. After the push, the longest flat face of the polygon is flush against the fence — orientation is now determined by geometry, not initial conditions. Push in a perpendicular direction against a second fence. Now the position is also fixed. The robot has localized the object using only two pushes and no sensors.
LaValle formalizes this as I-space planning. The I-state is a subset of configuration space — the set of all (x, y, θ) values still possible. Each action reduces this set (ideally). The goal is an I-state equal to a single configuration: {(x*, y*, θ*)}. The sequence of actions that achieves this is called a sensorless plan.
The same principle extends to grasping. A parallel-jaw gripper closing on an object with unknown orientation: some gripper widths will definitely miss (too narrow), some will definitely catch (the polygon's narrowest chord is less than gripper width), some will sometimes catch and sometimes not. A compliant grasp strategy uses finger forces and geometry to guarantee the object is captured and re-oriented to a fixed configuration inside the gripper, regardless of initial orientation — no vision required.
All previous planning problems had information as a side effect: we acted to reach a goal, and observations updated our belief as a byproduct. But sometimes gathering information is the goal itself. An exploration robot must visit unknown terrain. A scientific instrument must be pointed to gather the most informative data. A surgeon must probe to determine tumor margins before cutting.
Information gathering planning (Section 12.6) explicitly optimizes the information value of actions. The robot chooses actions not to minimize distance to a goal, but to minimize uncertainty — to drive the belief distribution toward certainty as efficiently as possible.
Entropy H(b) of a belief distribution b quantifies how uncertain the robot is: H(b) = -∑x b(x) log b(x). A uniform belief (completely lost) has maximum entropy. A delta distribution (perfectly localized) has zero entropy. An information-gathering policy minimizes expected entropy after observations.
The information gain of an action u in belief state b is I(b, u) = H(b) - Ey[H(b' | y)]: entropy before minus expected entropy after taking action u and seeing observation y. Choose the action that maximizes information gain — this is the one-step greedy information gathering policy.
In a partially-known map, the robot must balance exploration (visiting unknown regions to reduce uncertainty) with exploitation (navigating efficiently through what's already known). Frontier-based exploration is a classic heuristic: identify the boundaries between known-free and unknown regions (frontiers), navigate to the nearest frontier, and repeat. Simple, effective in practice, and provably complete for finite environments.
Gray = unknown, white = explored free, dark = wall. Blue robot navigates to nearest frontier (orange). Watch the known region expand. Entropy decreases with each exploration step.
Combining information gathering with SLAM yields Active SLAM: the robot plans its trajectory to build the best possible map in minimum time. At each step, the robot chooses its next waypoint by balancing three criteria: information gain (how much will this reduce map uncertainty?), loop closure opportunity (does this allow me to revisit a known area and correct drift?), and travel cost (how far is it?). Active SLAM turns map building from passive data collection into deliberate scientific inquiry.
Chapter 12 sits at the intersection of robotics, estimation theory, game theory, and information theory. The ideas here are foundational to almost every real deployed robot and to several currently active research fronts.
| Concept | LaValle source | Chapter 12 role |
|---|---|---|
| I-space framework | Ch 11 | Foundation — planning is done in I-space |
| Stochastic transitions | Ch 10 (MDPs) | Needed for probabilistic belief propagation |
| POMDPs | Ch 10.6 | Localization is a POMDP; belief state = I-state |
| Combinatorial planning | Ch 6 | Pursuit-evasion uses visibility graph methods |
| Sampling-based planning | Ch 5 | Particle filters ARE sampling-based belief propagation |
FastSLAM (2002) replaced EKF-SLAM's O(N²) covariance with a Rao-Blackwellized particle filter: each particle maintains its own set of independent landmark Kalman filters. O(N log N) per step. Enabled real-time SLAM for the first time on large maps.
iSAM / g²o / GTSAM (2008-2012) brought sparse matrix methods to graph SLAM. The pose graph is solved incrementally as new constraints arrive, rather than from scratch. Now the dominant paradigm in production SLAM systems (Google Cartographer, LiDAR-based autonomous vehicle stacks).
Neural SLAM / NeRF-based SLAM (2020+) replaces the occupancy grid and landmark map with neural implicit representations. The robot builds a differentiable 3D model of its environment. Gradient descent on rendered images vs. camera frames jointly optimizes the map and robot trajectory.
| Limitation | What breaks | Modern fix |
|---|---|---|
| Grid localization scales poorly | O(|X|) per step — exponential in dimensions | Particle filter, KD-tree belief |
| EKF-SLAM is O(N²) | Slow for large maps (N > 1000 landmarks) | FastSLAM, graph SLAM |
| Pursuit-evasion is PSPACE-hard | Exponential in environment complexity | Heuristic guards, learned policies |
| Occupancy grids ignore structure | Cells treated independently — walls aren't smooth | Gaussian process maps, TSDF, NeRF |
| Sensorless manipulation assumes rigid bodies | Deformable objects, slipping, contact uncertainty | Compliant control, tactile sensing, sim-to-real |
"The goal of robotics is to close the loop between sensing and acting in an uncertain world. Every technique in Chapter 12 is a different way to maintain that loop in the face of ignorance — and to act optimally despite it."
— paraphrasing LaValle's framing throughout Ch 12