Thrun et al., Chapter 4

Nonparametric Filters

When Gaussians aren't enough: histogram filters and particle filters for arbitrary belief shapes.

Prerequisites: Chapter 2 (Bayes filter), Chapter 3 (Gaussian filters), basic probability.
11
Chapters
2
Simulations
11
Quizzes

Chapter 0: Beyond Gaussians

The Kalman filter from Chapter 3 is elegant, but it makes a strong assumption: the belief is always a single Gaussian. That means it can only represent one hypothesis about where the robot is. What if the robot is genuinely uncertain between two distinct locations?

Picture a robot in a long hallway with identical doors. After sensing one door, it might be at door 1 or door 2 or door 3. This is a multi-modal belief — multiple peaks of probability. A single Gaussian cannot represent this. It would either pick one peak and ignore the others, or smear into a wide blob that doesn't represent any of them well.

The core problem: Gaussians are unimodal. Real robot beliefs are often multi-modal. We need representations that can capture arbitrary belief shapes — multiple peaks, heavy tails, irregular contours. These are called nonparametric representations because they don't assume a fixed functional form.

This chapter introduces two nonparametric approaches:

Histogram filters decompose the state space into grid cells and assign a probability to each cell. Think of it as a bar chart over locations. They can represent any shape, but their resolution is limited by computation.

Particle filters represent the belief using a set of random samples (particles). Where particles cluster densely, the belief is high. They can also represent any shape, and they focus computation where it matters most.

Check: Why can't a Kalman filter handle a robot that might be at door 1 OR door 3?

Chapter 1: The Histogram Filter

The simplest nonparametric filter chops the state space into a grid and maintains one probability per cell. For a discrete state space (like "which room am I in?"), this is exact. For a continuous state space (like x-y position), it's an approximation — the finer the grid, the better.

pseudocode
# Discrete Bayes Filter (Histogram Filter)
Input: {pk,t-1}, ut, zt

for all grid cells k dok,t = Σi p(Xt=xk | ut, Xt-1=xi) · pi,t-1   # prediction
  pk,t = η · p(zt | Xt=xk) · p̅k,t              # update
endfor

return {pk,t}

This is just the Bayes filter from Chapter 2, with the integral replaced by a sum over grid cells. Each cell k stores a probability pk,t. The prediction step sums up contributions from all cells (weighted by the motion model). The update step multiplies by the measurement likelihood.

AspectHistogram Filter
RepresentationOne probability per grid cell
Belief shapeArbitrary (piecewise constant)
ComputationO(K2) prediction, O(K) update, K = number of cells
LimitationScales poorly to high dimensions (curse of dimensionality)
Think of it this way: A histogram filter is like polling every location: "What's the probability the robot is here?" after each motion and measurement. It's thorough but expensive — if you need a fine 3D grid (x, y, θ) at 15cm and 5° resolution in a 50m × 50m building, that's millions of cells.
Check: What is the main computational bottleneck of histogram filters?

Chapter 2: Continuous State Histograms

When the state space is continuous (like a robot's x-y-θ pose), we discretize it into a grid:

range(Xt) = x1,t ∪ x2,t ∪ … ∪ xK,t

Each region xk,t gets a single probability value. Within each region, the belief is assumed uniform. This means the posterior is a piecewise constant approximation to the true continuous density.

The conditional probabilities are approximated by evaluating at the center of each cell:

p(zt | xk,t) ≈ p(zt | x̂k,t)

where x̂k,t is the mean of region xk,t.

Resolution trade-off: Fine grids give accurate beliefs but cost more to compute. Coarse grids are fast but lose information. For a 3D robot pose with 15cm spatial and 5° angular resolution in a 30m × 30m building, you need roughly 30/0.15 × 30/0.15 × 360/5 ≈ 2.9 million cells. Each Bayes filter cycle touches every cell.

Two strategies for managing resolution:

Fixed fine-grained grids: High accuracy, high cost. Best with pre-caching (compute expensive ray-casts once) and selective updating (only update cells with non-negligible probability).

Variable-resolution grids: Coarse where the belief is spread out, fine where it's concentrated. Tree-based structures (e.g., quad-trees) adapt resolution dynamically. This is the adaptive approach.

Check: In a histogram filter for continuous states, what approximation is used within each cell?

Chapter 3: Binary Bayes Filters

A special case worth highlighting: what if the state is binary and doesn't change over time? For example, "Is this grid cell occupied or free?" This is the foundation of occupancy grid mapping (Chapter 9).

For binary static state, the Bayes filter simplifies dramatically. Using the log-odds representation:

lt = log p(x) / (1 - p(x))

The update becomes a simple addition:

lt = lt-1 + log p(x|zt) / (1 - p(x|zt)) - l0

where l0 is the prior log-odds. Each measurement just adds its "evidence" to the running total. This is fast, numerically stable, and avoids the probabilities-near-zero problem.

Why log-odds? Probabilities near 0 or 1 cause numerical issues. Log-odds map the interval (0,1) to (-∞, +∞), making addition the natural update operation. To recover the probability: p = 1 - 1/(1 + el).
Check: What advantage does the log-odds representation have for binary Bayes filters?

Chapter 4: The Particle Filter Idea

The particle filter takes a completely different approach to representing beliefs. Instead of dividing space into cells, it uses a collection of random samples called particles:

Xt = { xt[1], xt[2], …, xt[M] }

Each particle xt[m] is a concrete hypothesis: "The robot might be here, in this exact pose." Where particles cluster, probability is high. Where they're sparse, probability is low.

The key insight: Instead of maintaining probabilities everywhere (like a histogram), particles concentrate computation where the belief actually is. If the belief has two peaks, particles cluster at both peaks. Empty regions get no particles and no computation. This is why particle filters scale so much better than histograms to high-dimensional spaces.

How many particles do we need? In practice, M = 100 to 10,000 depending on the problem. More particles give better approximation. Some implementations adapt M based on the complexity of the current belief.

The magic: as M → ∞, the particle approximation converges to the true posterior. For finite M, there's some approximation error, but in practice even a few hundred particles work remarkably well.

Check: How do particle filters concentrate computation where it matters?

Chapter 5: PF Algorithm

The particle filter algorithm has three elegant steps for each time step:

pseudocode
# Particle Filter
Input: Xt-1, ut, zt

# Step 1: Predict — sample new poses from motion model
for m = 1 to M do
  xt[m] ~ p(xt | ut, xt-1[m])        # sample from motion model
  wt[m] = p(zt | xt[m])             # importance weight
endfor

# Step 2: Resample — survival of the fittest
for m = 1 to M do
  draw i with probability ∝ wt[i]
  add xt[i] to new Xt
endfor

return Xt

Step 1 — Predict: Each old particle is "moved" by sampling from the motion model. If the robot moved forward, each particle moves forward plus some random noise. This creates the predicted particle set.

Step 1b — Weight: Each predicted particle is weighted by how well the measurement matches. A particle that's consistent with the sensor reading gets a high weight; one that contradicts it gets a low weight.

Step 2 — Resample: Draw M new particles from the weighted set, with replacement. High-weight particles get duplicated; low-weight particles vanish. This is Darwinian selection — hypotheses that explain the data survive, others die.

The resampling trick: After resampling, all particles have equal weight. The distribution of particles approximates bel(xt). Resampling prevents "particle starvation" — without it, a few particles would accumulate all the weight while the rest become irrelevant.
Check: What is the role of the importance weight w[m] in the particle filter?

Chapter 6: PF Demo

Let's watch a particle filter track a robot in a 1D hallway. The robot starts with global uncertainty (particles spread everywhere), then moves and senses landmarks. Watch how particles cluster around the true position after a few measurements.

1D Particle Filter Localization

Green line = true position. Orange ticks = particles. The robot senses distance to landmarks (blue triangles). Click Step to advance, or Auto-run to animate.

t = 0
Particles M200
What to notice: (1) Initially, particles are spread uniformly — the robot has no idea where it is. (2) After the first measurement, particles cluster near positions consistent with the reading. Multiple clusters form because several positions match. (3) After a few more steps of moving and sensing, one cluster dominates — the robot has localized itself. This is multi-modal to unimodal convergence, something a Kalman filter could never do.
Check: After resampling, what happens to particles in low-probability regions?

Chapter 7: Importance Sampling

The theoretical foundation of particle filters is importance sampling. The idea: we want to sample from a target distribution f (the posterior), but we can't do it directly. Instead, we sample from a proposal distribution g (the motion model), then correct using weights.

w[m] = f(x[m]) / g(x[m])

In the particle filter:

• The target f is the full posterior bel(xt) = η p(zt|xt) bel̅(xt)

• The proposal g is the prediction bel̅(xt) = p(xt|ut, xt-1)

• The weight simplifies to w = f/g = p(zt|xt) — the measurement likelihood

Why this works: We sample particles from the motion model (easy to do), then weight them by the measurement likelihood (easy to evaluate). The weighted particles approximate the posterior. Resampling converts weighted particles into unweighted ones, concentrating them in high-posterior regions. This two-step process — sample from proposal, correct by weights — is the essence of importance sampling.

Alternative proposals (beyond the motion model) can improve particle filter performance. The optimal proposal incorporates the measurement into the proposal, generating particles that are more likely to be in high-weight regions. This reduces the variance of the weights but is harder to sample from.

Check: In the standard particle filter, what is the proposal distribution?

Chapter 8: PF Properties

Particle filters have several important properties that make them widely used:

PropertyDescription
ConsistencyAs M → ∞, the particle approximation converges to the true posterior
Multi-modalCan represent any number of peaks, irregular shapes, heavy tails
AnytimeMore particles = better approximation; can trade quality for speed
SimpleOnly need to sample from motion model and evaluate measurement likelihood
Resource-adaptiveCan adjust M based on available computation time

Known limitations:

Particle depletion: In high-dimensional spaces, particles spread too thin. If no particle is near the true state, the filter cannot recover. This is the curse of dimensionality for particle filters.

Sample impoverishment: After many resamplings, the particle set may lose diversity — many particles become copies of the same few ancestors. Techniques like low-variance resampling and roughening help.

No global localization recovery: Once all particles cluster in the wrong place, the filter is stuck. Adding random particles occasionally (as in MCL, Chapter 8) helps.

Rule of thumb: The number of particles needed grows exponentially with the dimension of the state space. For a 3D robot pose (x, y, θ), a few hundred particles suffice. For a 100-dimensional SLAM state, particle filters struggle — which is why EKF or information-form methods dominate in SLAM.
Check: What is "particle depletion" and when does it occur?

Chapter 9: Comparison

Let's compare all the filter families we've seen:

FilterBelief ShapeState SpaceScalingBest For
Kalman (Ch 3)Unimodal GaussianContinuousO(n2)Linear/mildly nonlinear tracking
Histogram (Ch 4)Arbitrary (discrete)Discrete/griddedO(K2)Low-dimensional, global localization
Particle (Ch 4)Arbitrary (sampled)ContinuousO(M)Multi-modal, moderate dimensions
How to choose: If the belief is roughly Gaussian and the system is mildly nonlinear, use an EKF — it's the cheapest and most accurate. If you need multi-modal beliefs and the state is low-dimensional (≤ 3D), a histogram filter or particle filter works. For moderate dimensions (3-6D) with multi-modal beliefs, particle filters are the standard choice. For very high dimensions, you'll likely need parametric methods (EKF, information filter) with clever approximations.
Check: For global robot localization in a 2D map (multi-modal belief), which filter is most appropriate?

Chapter 10: Summary

Nonparametric filters break free of the Gaussian assumption. They can represent arbitrary beliefs — multi-modal, skewed, heavy-tailed — at the cost of more computation.

MethodWhen to use
Histogram filterDiscrete or low-dimensional continuous states; global localization on a grid
Binary Bayes / log-oddsStatic binary state estimation (e.g., occupancy mapping)
Particle filterMulti-modal continuous beliefs; moderate dimensions; the workhorse of robot localization
What comes next: With filters covered (Chapters 2-4), the next two chapters develop the motion model p(xt|ut,xt-1) and measurement model p(zt|xt) — the two probability distributions that every Bayes filter needs. Chapter 5 covers robot motion; Chapter 6 covers sensor measurements.
"The particle filter is to nonlinear estimation
what Monte Carlo is to integration:
simple, general, and surprisingly effective."
Check: What is the fundamental trade-off in nonparametric filters compared to Gaussian filters?