LaValle, Chapter 11

Sensors & Information Spaces

When the robot doesn't know where it is — how to reason, plan, and act using only what sensors reveal.

Prerequisites: Probability (distributions, Bayes' rule) + Chapter 10 MDPs. That's it.
10
Chapters
4+
Simulations
10
Quizzes

Chapter 0: The Robot Doesn't Know Where It Is

You're a robot placed blindfolded into a long hallway. Someone spins you around and puts you down. You know roughly what the hallway looks like — three doors on the left, a window at the far end — but you have no idea which part of the hallway you're in. Your GPS is offline. Your odometer has drift. The only thing you can do is listen: does the floor creak? Is there a draft from the window?

Every answer slightly updates your mental map of where you might be. You accumulate evidence. You navigate based on that accumulated evidence — not on a certain position, because you never have one. This is the central challenge of Chapter 11: how do you plan and act when you can only observe the world, not measure your state directly?

In Chapters 2 and 10, the robot always knew its state x. Feedback plans were maps from known states to actions: π(x) = u. Chapter 11 strips away that assumption. Now you only receive a sensor observation y — a noisy, incomplete glimpse of x. The question becomes: what do you do with a sequence of observations instead of a known state?

Blindfolded Robot — Sensor Observations vs. True Position

The robot (blue) is placed at a random hidden position in a 7-cell corridor. Three cells have landmarks (marked). The sensor returns "landmark nearby" or "open corridor" — but with noise. Click "New Observation" to get a reading and see how much uncertainty remains.

Place a robot and take observations.
The fundamental shift. When state is known, planning is hard enough — search over actions, compute costs. When state is hidden, planning gets a second dimension of hardness: you must simultaneously reason about what state you're in AND what actions to take given that uncertainty. Chapter 11 is about the first half — how to represent and update your uncertainty.

What changes from Chapter 10

In Chapter 10, the state x was always visible. The policy π(x) could condition on x directly. Now the state xk is hidden — you never observe it. Instead you receive observations yk from a sensor, where y is drawn from a distribution P(y | x) that depends on the hidden state. The robot must act based on something other than x.

That "something other" is the subject of this entire chapter. LaValle calls it the information state, or I-state. It encodes everything the robot knows from its entire history of observations and actions. Getting this concept right is the master key to the rest of Part IV of the textbook.

FeatureCh 10 (MDP)Ch 11 (Partial Observability)
State visibilityxk is knownxk is hidden
SensorPerfect / not neededyk ~ P(y | xk)
Policy inputπ(xk)π(ηk) — the I-state
Uncertainty sourceTransition noise onlyTransition noise + sensor noise
A robot's odometer drifts by ±10cm per meter traveled, but its GPS gives its exact position every 30 seconds. Between GPS fixes, the robot is uncertain about its state. Does Chapter 11 apply in those intervals?

Chapter 1: Observation Models

Before we can reason about uncertainty, we need to formalize how sensors work. A sensor takes the true hidden state x and produces an observation y. The crucial insight: sensors are not perfect measurement tools — they are probabilistic channels. The same state x can produce different observations on different trials.

LaValle models this with an observation space Y (the set of all possible sensor readings) and an observation mapping P(y | x) — a conditional probability distribution. For each state x, P(· | x) is a distribution over observations y ∈ Y. Some observations are more likely than others given a particular state.

Three sensor archetypes. (1) Perfect sensor: P(y|x) = 1 if y=x, 0 otherwise. Observation reveals state exactly — this recovers the fully observable MDP of Chapter 10. (2) Noisy sensor: P(y|x) is a distribution around x. A range sensor returns true distance ± Gaussian noise. (3) Lossy/categorical sensor: Y is coarser than X. A bump sensor returns only "wall" or "open" — many distinct states map to the same observation. Chapter 11 handles all three.

Example: Binary hallway sensor

The robot is in a hallway with N cells. Three cells (positions 2, 4, 6) have doorways. The sensor returns y ∈ {door, open}. If the robot is at a doorway, the sensor returns "door" with probability 0.9 and "open" with 0.1 (missed detection). If the robot is at a non-doorway cell, it returns "open" with probability 0.8 and "door" with 0.2 (false alarm). Formally:

P(door | x=doorway) = 0.9     P(open | x=doorway) = 0.1
P(door | x=open  ) = 0.2     P(open | x=open  ) = 0.8

This matrix — one row per observation, one column per state — is the sensor model. It is the robot's model of how its hardware works. The robot uses this model to infer: "given that I saw y=door, how likely is it that I'm at each possible cell?" That inference is Bayesian — it's what drives belief updating.

Why the observation model is separate from the transition model

In Chapter 10, one model (the MDP transition P(x' | x, u)) governed everything. Chapter 11 splits this into two models: the motion model P(xk+1 | xk, uk) governing how state evolves, and the sensor model P(yk | xk) governing what the sensor returns. This factorization lets us swap sensors without changing the environment model, and vice versa. It's also the structure that makes Bayes filtering tractable.

The generative model. Fully specifying (X, U, Y, P(x'|x,u), P(y|x), xI) gives you a complete generative model. You can simulate the world: (1) start at xI, (2) draw y0 ~ P(·|x0), (3) apply action u0, (4) draw x1 ~ P(·|x0,u0), (5) draw y1 ~ P(·|x1), and so on. The robot receives the y sequence and must infer x from it.
python
import numpy as np

class HallwaySensor:
    def __init__(self, n_cells=7, doorways=(1,3,5),
                 p_hit=0.9, p_false=0.2):
        self.n = n_cells
        self.doorways = set(doorways)
        self.p_hit   = p_hit    # P(door | at doorway)
        self.p_false = p_false  # P(door | not at doorway)

    def observe(self, x):
        """Sample an observation from P(y | x)."""
        at_door = x in self.doorways
        p_door = self.p_hit if at_door else self.p_false
        return 'door' if np.random.random() < p_door else 'open'

    def likelihood(self, y, x):
        """P(y | x) — the sensor likelihood."""
        at_door = x in self.doorways
        p_door = self.p_hit if at_door else self.p_false
        return p_door if y == 'door' else 1.0 - p_door
A light sensor over 5 cells returns "bright" with P=0.95 when over a white cell and P=0.1 when over a black cell. The robot observes "bright." Which state is more likely: white cell or black cell?

Chapter 2: Information States — The Master Concept

Here is the central question of Chapter 11: the robot can't condition its policy on x (hidden), so what does it condition on? The answer is the information state, or I-state ηk. Informally, the I-state is everything the robot knows at time k that is relevant for future decisions.

At time k, the robot has received observations y0, y1, ..., yk and has applied actions u0, u1, ..., uk-1. Define the history ηk = (y0, u0, y1, u1, ..., yk). This history captures everything the robot has experienced. A policy π that maps each history to an action is called a history-based policy.

I-state = sufficient statistic of history. The full history grows unboundedly as k increases. In practice, we want a compact summary — something that captures everything from the history that matters for choosing the next action, and discards the rest. This compact summary IS the I-state. The belief distribution over states is one such summary. So is a Kalman filter's mean and covariance. Both are I-states.

The I-space

The set of all possible I-states is the information space (I-space), written Ι. Just as X is the state space and U is the action space, Ι is the space of all possible information states. A policy is now a mapping π : Ι → U — from I-states to actions — instead of a mapping from states to actions.

This is a profound structural shift. In Chapter 10, the policy lived in X × U. In Chapter 11, the policy lives in Ι × U. The challenge: Ι can be enormous. If X has n states, the history-based I-space after k steps has |Y|k×|U|k-1 elements — exponential in k. This is the "curse of history."

I-state transitions

When the robot applies action uk and receives observation yk+1, its I-state updates. There is an I-state transition function:

ηk+1 = τ(ηk, uk, yk+1)

This function τ is the "engine" that keeps the I-state current. For the raw history I-state, τ simply appends: τ(history, u, y) = (history, u, y). For the belief-state I-state (covered in Chapter 4), τ runs Bayesian filtering — a much more compact operation. The choice of I-state representation determines the form of τ.

I-state ηk
Current information — compact summary of all history
↓ apply action uk, receive yk+1
ηk+1 = τ(ηk, uk, yk+1)
I-state transition updates the summary
↓ choose next action
uk+1 = π(ηk+1)
Policy maps I-state to action
↻ repeat
Why does the I-state framework unify everything? Every approach to partial observability — Bayes filters, Kalman filters, particle filters, rule-based approaches — is really an I-state representation choice. They differ only in what information they carry and how τ is computed. Seeing this unifies a literature that otherwise looks fragmented.
Which of the following is a valid I-state for a robot that has moved for 5 steps in a 2D office?

Chapter 3: Derived I-Spaces — History, Belief, and Beyond

The raw history I-state is complete but impractical — it grows without bound. Chapter 11 introduces several derived I-spaces that are more compact while preserving the information needed for optimal planning. Each derived I-space trades off completeness for tractability in a different way.

The history I-space Hk

The most basic derived I-space is the history space. At stage k, ηk = (y0, u0, y1, ..., uk-1, yk). This is the "keep everything" approach. It is complete — no information is lost — but the I-space Hk = Y × (U × Y)k grows exponentially. It is practical only for very short horizons or tiny discrete problems.

The sufficient I-space and belief states

A sufficient I-state is one that captures all the information from the history that matters for future expected reward — and nothing more. For fully probabilistic problems, the belief state achieves this. The belief state bk is a probability distribution over X: bk(x) = P(xk = x | y0, u0, ..., yk). It is the posterior distribution over the hidden state given all observations.

Belief is a sufficient statistic for history. This is a theorem, not an assumption. Given bk, the future expected reward under any policy depends on history only through bk. You can throw away the raw history once you've computed the belief. This is why Bayesian filtering is so powerful — it compresses unbounded history into a fixed-size distribution.

The belief I-space

The belief I-space Β is the set of all probability distributions over X. For a discrete X with n states, this is the (n-1)-dimensional probability simplex: b ∈ Β ⇔ b(x) ≥ 0 for all x and ∑x b(x) = 1. The I-state transition τ for belief states is Bayesian filtering:

bk+1(x') = η · P(yk+1 | x') · ∑x P(x'|x, uk) · bk(x)

where η is a normalizing constant. This two-step operation — predict then update — is the core of Bayes filtering. The predict step propagates the belief through the motion model; the update step incorporates the new observation via the sensor model.

The nondeterministic I-space (when probabilities are unavailable)

Sometimes we don't have probability models — we only know which transitions and observations are possible. The nondeterministic I-state lk ⊆ X is the set of states the robot might currently occupy, given what it has observed. Update rule: lk+1 = {x' : ∃ x ∈ lk, P(x'|x,uk) > 0, and P(yk+1|x') > 0}. This is the "set-membership" or "worst-case" I-state — no probabilities, just possibility.

I-Space TypeI-StateSizeRequires
HistoryFull (y,u) sequenceExponential in kNothing beyond observations
NondeterministicSubset l ⊆ X2|X|Feasibility model only
Belief (probabilistic)Distribution b over X|X|-1 dimensionsP(x'|x,u) and P(y|x)
Gaussian (Kalman)Mean μ + covariance ΣO(n2)Linear Gaussian models
After receiving observation y5, the belief state b5 has been computed. Does the robot still need to store (y0, u0, ..., y4, u4) to make the next optimal decision?

Chapter 4: Showcase — Robot in Hallway

This is the payoff chapter. You will run a robot through a 7-cell hallway, take sensor observations, and watch the belief histogram update in real time. The robot starts with a uniform prior — it thinks it might be anywhere. Each observation shrinks or reshapes the belief. Multiple observations pin it down.

Three cells (positions 1, 3, 5) are doorways. The sensor returns "door" with probability 0.85 at doorways and "door" with probability 0.1 at open cells. After each observation, Bayes' rule updates the belief. The motion model: when the robot moves right, it reaches the next cell with probability 0.8 or stays with probability 0.2 (wheels slip on the linoleum).

Belief Histogram — Bayesian Localization in a Discrete Hallway

Orange cells = doorways. Blue dot = robot (true hidden position). Teal bars = current belief b(x). Take observations and move to watch beliefs update.

Last observation: —
Entropy: — bits
Watch entropy fall. Information entropy H(b) = -∑x b(x) log2 b(x) measures how spread out the belief is. Uniform belief (no idea where you are) has maximum entropy = log2(7) ≈ 2.81 bits. As observations arrive, entropy drops toward 0 (certainty). You'll see this in the status bar. Entropy is the formal measure of remaining uncertainty.

Tracing one update by hand

Say the robot observes y = "door." The prior is uniform: b(x) = 1/7 for each cell x ∈ {0,1,2,3,4,5,6}. The sensor model gives P(door|x=1) = 0.85 (doorway), P(door|x=0) = 0.10 (open). The Bayesian update is:

b'(x) = P(door | x) · b(x)  /  ∑x'' P(door | x'') · b(x'')

Numerator for x=1 (doorway): 0.85 × (1/7) = 0.1214. Numerator for x=0 (open): 0.10 × (1/7) = 0.0143. Normalizer η = 3 × 0.1214 + 4 × 0.0143 = 0.3643 + 0.0571 = 0.4214. After normalization: b'(1) = 0.1214/0.4214 = 0.288, b'(0) = 0.0143/0.4214 = 0.034. The three doorways together hold 3 × 0.288 = 0.864 of the probability mass — a single observation concentrated most of the belief onto doorway cells.

python
import numpy as np

def bayes_update(belief, obs, sensor_model):
    """Multiply belief by likelihood and renormalize."""
    likelihood = np.array([sensor_model[obs][x]
                           for x in range(len(belief))])
    posterior = likelihood * belief
    return posterior / posterior.sum()  # normalize

def motion_update(belief, transition_matrix):
    """Predict: convolve belief through motion model."""
    return transition_matrix.T @ belief   # P(x'|x) * b(x), summed over x

# Hallway: 7 cells, doorways at 1,3,5
doorways = {1, 3, 5}
sensor_model = {
    'door': [0.85 if x in doorways else 0.10 for x in range(7)],
    'open': [0.15 if x in doorways else 0.90 for x in range(7)],
}
belief = np.ones(7) / 7  # uniform prior

# After observing 'door':
belief = bayes_update(belief, 'door', sensor_model)
print(belief.round(3))  # [.034, .288, .034, .288, .034, .288, .034]

entropy = -np.sum(belief * np.log2(belief + 1e-10))
print(f"Entropy: {entropy:.2f} bits")  # 1.81 bits (down from 2.81)
The robot observes "open" twice in a row. What happens to the belief mass at doorway cells relative to open cells?

Chapter 5: Continuous Information Spaces

So far we've worked with discrete state spaces — a finite number of cells, a finite belief vector. But most robots operate in continuous spaces: position x ∈ ℝ2, orientation θ ∈ S1, velocity v ∈ ℝ3. When X is continuous, the belief state b(x) is not a vector but a probability density function (PDF) over X.

The Bayesian update generalizes to continuous X by replacing sums with integrals. The predict step convolves the prior with the motion model:

bk+1(x') = ∫X P(x'|x, uk) · bk(x) dx

The update step multiplies by the sensor likelihood and renormalizes:

bk+1(x') = P(yk+1 | x') · bk+1(x')  /  ∫X P(yk+1|x'') · bk+1(x'') dx''
The integral is usually intractable. For general continuous X with non-Gaussian models, there is no closed-form solution to the Bayesian update integral. This is the fundamental computational challenge of continuous partially observable planning. Three practical solutions: (1) Kalman filter — exact when models are linear and Gaussian (Chapter 6). (2) Particle filter — approximate via Monte Carlo samples. (3) Grid approximation — discretize X and use the discrete update.

The nondeterministic continuous I-space

When probabilistic models aren't available, the nondeterministic I-state generalizes from subsets of a discrete X to subsets of continuous X. Lk ⊆ X is a region in state space. If state dynamics are bounded — say, the robot's position can drift by at most ε per step — then Lk+1 is the ε-expansion of the motion-applied Lk, intersected with the observation-consistent set. This is called bounded-uncertainty planning and is useful in robust control applications.

Gaussian belief as a continuous I-state

For linear systems with Gaussian noise, the belief density at every step is a Gaussian: bk(x) = 𝒩(x ; μk, Σk). This is remarkable: no matter how many steps occur, the belief is fully described by just the mean vector μk and the covariance matrix Σk. The I-space is ℝn × 𝒮n (means × positive-definite matrices) — infinite-dimensional X, yet finite-dimensional I-space. This is the Kalman filter.

Gaussian Belief Spreading and Contracting

The 1D belief starts as a narrow Gaussian at x=0. Each "predict" step spreads it (motion noise). Each "update" step contracts it (observation narrows uncertainty). Adjust the noise parameters to see how they affect localization.

Motion noise σQ 0.5
Sensor noise σR 1.0
Mean: 0.00, Std: 1.00
A robot's true state is 1D. The motion model adds Gaussian noise with variance Q=4. The sensor measures with variance R=1. After a predict step, then an update step, which variance dominates the posterior?

Chapter 6: Kalman Filter as I-State Computation

The Kalman filter is not a mystery — it is exactly the Bayesian belief update for linear Gaussian systems, executed in closed form. Chapter 11 presents it as a special case of the I-state framework. This grounds the Kalman filter in planning theory and clarifies what assumptions it makes.

The linear Gaussian model

State transition: xk+1 = A · xk + B · uk + wk, where wk ~ 𝒩(0, Q). Observation: yk = C · xk + vk, where vk ~ 𝒩(0, R). Here A is the state transition matrix, B maps control input to state, C maps state to observation space, Q is process noise covariance, and R is measurement noise covariance. Both noise terms are zero-mean Gaussians — hence "linear Gaussian."

Why Gaussian belief stays Gaussian. Start with a Gaussian prior b0(x) = 𝒩(μ0, Σ0). Apply a linear transformation: A·x + w where w is Gaussian. The result is still Gaussian — linear maps of Gaussians are Gaussian, and the sum of two Gaussians is Gaussian. Multiply by a Gaussian likelihood: the product of two Gaussians is proportional to a Gaussian. So the belief stays Gaussian forever. This closure is what makes the Kalman filter exact and efficient.

The two-step update — as I-state transitions

Predict stepmotion): Given I-state (μk, Σk) and action uk, compute the predicted I-state after motion but before the new observation:

μk+1 = A · μk + B · uk      Σk+1 = A · Σk · AT + Q

Update stepsensor): Incorporate new observation yk+1 to compute the posterior I-state:

K = Σ · CT · (C · Σ · CT + R)−1
μk+1 = μk+1 + K · (yk+1 − C · μk+1)     Σk+1 = (I − K · C) · Σk+1

Here K is the Kalman gain — a trust slider between what the motion model predicts and what the sensor measures. K near I (identity) means "trust the sensor"; K near 0 means "trust the prediction." K is determined automatically by the ratio of prediction uncertainty Σ to measurement uncertainty R.

The Kalman filter IS the I-state transition. Every time the robot acts and observes, τ runs one predict-update cycle. The I-space for the Kalman filter is ℝn × 𝒮+n (means and positive-definite covariances). A policy for a partially observable linear-Gaussian problem conditions its action choices on (μk, Σk) — the Kalman I-state.
python
import numpy as np

def kalman_predict(mu, Sigma, A, B, u, Q):
    mu_pred   = A @ mu + B @ u
    Sig_pred  = A @ Sigma @ A.T + Q
    return mu_pred, Sig_pred

def kalman_update(mu_pred, Sig_pred, y, C, R):
    S = C @ Sig_pred @ C.T + R          # innovation covariance
    K = Sig_pred @ C.T @ np.linalg.inv(S)  # Kalman gain
    innov = y - C @ mu_pred             # observation residual
    mu_post   = mu_pred + K @ innov
    Sig_post  = (np.eye(len(mu_pred)) - K @ C) @ Sig_pred
    return mu_post, Sig_post

# 1D example: tracking a robot moving at constant velocity
A = np.array([[1, 1], [0, 1]])  # position, velocity
B = np.zeros((2, 1))
C = np.array([[1, 0]])           # observe position only
Q = np.diag([0.1, 0.01])         # small process noise
R = np.array([[1.0]])              # measurement noise

mu = np.array([0.0, 1.0])         # initial: pos=0, vel=1
Sigma = np.eye(2) * 0.5

for k in range(10):
    mu, Sigma = kalman_predict(mu, Sigma, A, B, np.zeros(1), Q)
    y_obs = np.array([k + 1.0 + np.random.randn() * 1.0])
    mu, Sigma = kalman_update(mu, Sigma, y_obs, C, R)
    print(ff"k={k}: pos estimate={mu[0]:.2f} ± {np.sqrt(Sigma[0,0]):.2f}")
In the Kalman filter, the covariance Σk after the update step is always ______ the predicted covariance Σk.

Chapter 7: Computing Probabilistic I-States

For most real systems — nonlinear dynamics, non-Gaussian noise, discrete mixtures — the Kalman filter doesn't apply. Chapter 11 surveys three families of algorithms for computing belief states in harder settings.

Grid-based belief approximation

Discretize the continuous state space X into a grid of cells. Each cell holds a probability mass value. The belief update runs exactly as in the discrete hallway: multiply by sensor likelihood, renormalize. This is simple and correct, but scales poorly: a 2D grid at 0.1m resolution covering a 10m×10m room needs 10,000 cells. A 3D space with heading needs millions. Grid methods are practical only for low-dimensional spaces (≤ 3D).

Particle filters — Monte Carlo belief

A particle filter represents the belief as a set of M weighted samples (particles): {(x(i), w(i))}i=1..M. Each particle x(i) is a hypothesis about the current state; its weight w(i) is proportional to how well it explains the observations. The algorithm:

1. Sample
For each particle i, sample x(i)k+1 ~ P(x'|x(i)k, uk)
2. Weight
Set w(i) = P(yk+1 | x(i)k+1) for each particle
3. Resample
Draw M new particles with probability proportional to w(i)
↻ next step

The set of particles approximates the belief: bk(x) ≈ (1/M) ∑i δ(x - x(i)k) after resampling. As M → ∞, the approximation converges. In practice, M=500-2000 particles is sufficient for many problems. Particle filters handle nonlinear motion and sensor models effortlessly.

Particle collapse (degeneracy). If all particles drift far from the true state — due to a poor prior, a sudden large motion, or a rare observation — the weights become extremely skewed: a few particles get nearly all the weight. After resampling, only those particles survive, and the approximation collapses to a few points. Solutions: (1) increase M, (2) add "injection" of random particles to recover from kidnapping, (3) use better proposal distributions (guided sampling).

The POMDP value function is piecewise linear and convex over beliefs

When solving a Partially Observable MDP (POMDP) — choosing actions optimally given a belief state — the value function V(b) maps belief states to expected future reward. LaValle (pp. 599-607) establishes a key structural result: for finite-horizon POMDPs with finite X, U, Y, the value function at each time step is piecewise linear and convex (PWLC) over the belief simplex Β. This means V(b) = maxii · b) for a set of α-vectors, one per "policy tree." The complexity grows exponentially in horizon — solving POMDPs exactly is PSPACE-hard — but approximate methods (PBVI, SARSOP, POMCP) exploit this structure.

python
import numpy as np

class ParticleFilter:
    def __init__(self, n_particles, state_dim, motion_fn, sensor_fn):
        self.M = n_particles
        # Initialize particles uniformly in state space
        self.particles = np.random.uniform(0, 1, (n_particles, state_dim))
        self.weights   = np.ones(n_particles) / n_particles
        self.motion_fn = motion_fn   # samples x' ~ P(x'|x,u)
        self.sensor_fn = sensor_fn   # returns P(y|x)

    def update(self, u, y):
        # 1. Propagate through motion model
        self.particles = np.array([
            self.motion_fn(x, u) for x in self.particles
        ])
        # 2. Compute weights from sensor likelihood
        self.weights = np.array([
            self.sensor_fn(y, x) for x in self.particles
        ])
        self.weights += 1e-300         # avoid zero weights
        self.weights /= self.weights.sum()  # normalize
        # 3. Resample (systematic resampling)
        indices = np.random.choice(self.M, self.M, p=self.weights)
        self.particles = self.particles[indices]
        self.weights = np.ones(self.M) / self.M

    def mean(self):
        return self.particles.mean(axis=0)
A particle filter uses M=100 particles. After an update, 98 particles have weight near 0 and 2 particles have nearly all the weight. What is this phenomenon called and what is the standard fix?

Chapter 8: Game-Theoretic Information Spaces

Everything so far assumed one decision-maker (the robot) facing a stochastic but non-adversarial environment. Chapter 11 extends the I-state framework to multi-player games — situations where other rational agents are acting simultaneously and their actions affect you.

The two-player zero-sum case

Consider a pursuit-evasion game: a pursuer (robot) tries to catch an evader. Both move simultaneously. Neither can see the other's true state. The pursuer has a belief about where the evader is; the evader has a belief about where the pursuer is. Each player's I-state must model not just the environment but the other player's possible actions and beliefs.

For a two-player zero-sum game with partial observability, the I-state for player 1 is η1,k = (y1,0, u1,0, ..., y1,k) — player 1's own observation-action history. Player 2 has η2,k separately. The minimax value function over belief states generalizes the POMDP value function, but now with an outer max (player 1 maximizes) and inner min (player 2 minimizes).

I-states in games are player-specific. There is no single "world belief state." Each player has their own I-state — their own posterior over the joint state of the world (including the other player's state and potentially the other player's beliefs). This leads to the notion of interactive belief states in the game theory literature, where a player may model the other player's model, which models the first player's model, ad infinitum (the "I believe you believe I believe…" regress). Finite-depth truncations make this tractable.

Nondeterministic game I-states

When probability models are unavailable, nondeterministic game I-states are sets. The pursuer maintains l1,k ⊆ X — the set of states the evader might be in, given what the pursuer has observed. The evader maintains l2,k ⊆ X for the pursuer. Planning becomes a minimax search over these sets: the pursuer picks an action to shrink its uncertainty about the evader, while the evader acts to maximize its uncertainty from the pursuer's perspective.

Visibility-based pursuit evasion

LaValle's canonical example (pp. 624-629) is pursuit-evasion in a polygonal environment. The pursuer (a robot with a limited field of view) tries to guarantee that the evader is visible. The I-state is the contamination region — the set of locations where an evader could be hiding without the pursuer having seen it. The pursuer's goal: shrink the contamination region to empty. This is equivalent to a coverage problem on the nondeterministic I-space.

SettingI-State TypeValue FunctionSolution Method
POMDP (1 agent)Belief b over XPWLC over ΒValue iteration on α-vectors
Zero-sum game (2 agents)Each player's own beliefMinimax-PWLCMinimax value iteration
Pursuit-evasion (nondeterministic)Contamination set L ⊆ XSet containmentRetrograde analysis on 2|X|
In the pursuit-evasion I-space framework, the pursuer's I-state is the "contamination region." When the contamination region becomes empty, what does that mean?

Chapter 9: Connections — Where to Go Next

Chapter 11 establishes the theoretical framework for planning under partial observability. The I-state is the unifying concept. But the framework is deliberately general — it doesn't prescribe how to solve the resulting I-space planning problems. That is the subject of later chapters in LaValle and of a large research literature.

What Chapter 11 built

The three lasting ideas. (1) I-states replace states: when state is hidden, you plan in the information space. The structure of the problem doesn't change — you still have a space, transitions, and a goal — but the space is Ι instead of X. (2) Belief is the sufficient I-state: for probabilistic problems, the posterior distribution is all you need. (3) I-spaces nest within each other: history is the most expressive I-space; Gaussian (Kalman) and set-membership (nondeterministic) are compressed, problem-specific special cases.

Known limitations of the Chapter 11 framework

The pure Bayesian approach requires a model: P(x'|x,u) and P(y|x). If the model is wrong — and it always is, to some degree — the belief can diverge from reality without any signal. This is the model mismatch problem. Robust methods (nondeterministic I-states, worst-case planning) guard against this, at the cost of being overly conservative. Active research is on data-driven models that learn P(x'|x,u) and P(y|x) from experience.

Computing the optimal policy in the POMDP I-space is PSPACE-hard in general. The exact algorithms (enumeration of α-vectors) are exponential in horizon. Approximation algorithms like PBVI, SARSOP, and Monte Carlo tree search (POMCP) make large-scale POMDPs tractable — but Chapter 11's exact framework is the foundation they build on.

Comparison of I-state representations

RepresentationExpressivenessTractabilityBest for
Full history ηkMaximumExponential in kVery short horizons, small Y
Nondeterministic l ⊆ XFeasibility only2|X|Guaranteed safe behaviors, no prob. model
Belief b over X (discrete)Sufficient|X|-simplexSmall discrete POMDP
Gaussian (μ, Σ)Sufficient (linear Gaussian)O(n2)Linear systems, robotics localization
Particle set {x(i)}ApproximateO(M · |X|)Nonlinear non-Gaussian systems

Related micro-lessons

The I-state framework connects directly to several estimation and planning lessons:

The key takeaway from Chapter 11. Every practical localization, filtering, and planning system is implicitly choosing an I-state representation and an I-state transition function τ. The Kalman filter chose (μ, Σ). The particle filter chose a set of samples. SLAM chose a joint distribution over pose and map. Understanding them all as I-states makes it clear what assumptions each makes and when each breaks down. That understanding is what Chapter 11 gives you.
"The information space is not a computational convenience. It is the correct space in which to pose the problem."
— Steven M. LaValle, Planning Algorithms, §11.1
A robot uses a particle filter with M=1000 particles to estimate its position in a building. As it navigates, the particles cluster tightly around one room. This is analogous to which change in the belief I-state?