When Gaussians aren't enough: histogram filters and particle filters for arbitrary belief shapes.
The Kalman filter from Chapter 3 is elegant, but it makes a strong assumption: the belief is always a single Gaussian. That means it can only represent one hypothesis about where the robot is. What if the robot is genuinely uncertain between two distinct locations?
Picture a robot in a long hallway with identical doors. After sensing one door, it might be at door 1 or door 2 or door 3. This is a multi-modal belief — multiple peaks of probability. A single Gaussian cannot represent this. It would either pick one peak and ignore the others, or smear into a wide blob that doesn't represent any of them well.
This chapter introduces two nonparametric approaches:
Histogram filters decompose the state space into grid cells and assign a probability to each cell. Think of it as a bar chart over locations. They can represent any shape, but their resolution is limited by computation.
Particle filters represent the belief using a set of random samples (particles). Where particles cluster densely, the belief is high. They can also represent any shape, and they focus computation where it matters most.
The simplest nonparametric filter chops the state space into a grid and maintains one probability per cell. For a discrete state space (like "which room am I in?"), this is exact. For a continuous state space (like x-y position), it's an approximation — the finer the grid, the better.
pseudocode # Discrete Bayes Filter (Histogram Filter) Input: {pk,t-1}, ut, zt for all grid cells k do p̅k,t = Σi p(Xt=xk | ut, Xt-1=xi) · pi,t-1 # prediction pk,t = η · p(zt | Xt=xk) · p̅k,t # update endfor return {pk,t}
This is just the Bayes filter from Chapter 2, with the integral replaced by a sum over grid cells. Each cell k stores a probability pk,t. The prediction step sums up contributions from all cells (weighted by the motion model). The update step multiplies by the measurement likelihood.
| Aspect | Histogram Filter |
|---|---|
| Representation | One probability per grid cell |
| Belief shape | Arbitrary (piecewise constant) |
| Computation | O(K2) prediction, O(K) update, K = number of cells |
| Limitation | Scales poorly to high dimensions (curse of dimensionality) |
When the state space is continuous (like a robot's x-y-θ pose), we discretize it into a grid:
Each region xk,t gets a single probability value. Within each region, the belief is assumed uniform. This means the posterior is a piecewise constant approximation to the true continuous density.
The conditional probabilities are approximated by evaluating at the center of each cell:
where x̂k,t is the mean of region xk,t.
Two strategies for managing resolution:
Fixed fine-grained grids: High accuracy, high cost. Best with pre-caching (compute expensive ray-casts once) and selective updating (only update cells with non-negligible probability).
Variable-resolution grids: Coarse where the belief is spread out, fine where it's concentrated. Tree-based structures (e.g., quad-trees) adapt resolution dynamically. This is the adaptive approach.
A special case worth highlighting: what if the state is binary and doesn't change over time? For example, "Is this grid cell occupied or free?" This is the foundation of occupancy grid mapping (Chapter 9).
For binary static state, the Bayes filter simplifies dramatically. Using the log-odds representation:
The update becomes a simple addition:
where l0 is the prior log-odds. Each measurement just adds its "evidence" to the running total. This is fast, numerically stable, and avoids the probabilities-near-zero problem.
The particle filter takes a completely different approach to representing beliefs. Instead of dividing space into cells, it uses a collection of random samples called particles:
Each particle xt[m] is a concrete hypothesis: "The robot might be here, in this exact pose." Where particles cluster, probability is high. Where they're sparse, probability is low.
How many particles do we need? In practice, M = 100 to 10,000 depending on the problem. More particles give better approximation. Some implementations adapt M based on the complexity of the current belief.
The magic: as M → ∞, the particle approximation converges to the true posterior. For finite M, there's some approximation error, but in practice even a few hundred particles work remarkably well.
The particle filter algorithm has three elegant steps for each time step:
pseudocode # Particle Filter Input: Xt-1, ut, zt # Step 1: Predict — sample new poses from motion model for m = 1 to M do xt[m] ~ p(xt | ut, xt-1[m]) # sample from motion model wt[m] = p(zt | xt[m]) # importance weight endfor # Step 2: Resample — survival of the fittest for m = 1 to M do draw i with probability ∝ wt[i] add xt[i] to new Xt endfor return Xt
Step 1 — Predict: Each old particle is "moved" by sampling from the motion model. If the robot moved forward, each particle moves forward plus some random noise. This creates the predicted particle set.
Step 1b — Weight: Each predicted particle is weighted by how well the measurement matches. A particle that's consistent with the sensor reading gets a high weight; one that contradicts it gets a low weight.
Step 2 — Resample: Draw M new particles from the weighted set, with replacement. High-weight particles get duplicated; low-weight particles vanish. This is Darwinian selection — hypotheses that explain the data survive, others die.
Let's watch a particle filter track a robot in a 1D hallway. The robot starts with global uncertainty (particles spread everywhere), then moves and senses landmarks. Watch how particles cluster around the true position after a few measurements.
Green line = true position. Orange ticks = particles. The robot senses distance to landmarks (blue triangles). Click Step to advance, or Auto-run to animate.
The theoretical foundation of particle filters is importance sampling. The idea: we want to sample from a target distribution f (the posterior), but we can't do it directly. Instead, we sample from a proposal distribution g (the motion model), then correct using weights.
In the particle filter:
• The target f is the full posterior bel(xt) = η p(zt|xt) bel̅(xt)
• The proposal g is the prediction bel̅(xt) = p(xt|ut, xt-1)
• The weight simplifies to w = f/g = p(zt|xt) — the measurement likelihood
Alternative proposals (beyond the motion model) can improve particle filter performance. The optimal proposal incorporates the measurement into the proposal, generating particles that are more likely to be in high-weight regions. This reduces the variance of the weights but is harder to sample from.
Particle filters have several important properties that make them widely used:
| Property | Description |
|---|---|
| Consistency | As M → ∞, the particle approximation converges to the true posterior |
| Multi-modal | Can represent any number of peaks, irregular shapes, heavy tails |
| Anytime | More particles = better approximation; can trade quality for speed |
| Simple | Only need to sample from motion model and evaluate measurement likelihood |
| Resource-adaptive | Can adjust M based on available computation time |
Known limitations:
Particle depletion: In high-dimensional spaces, particles spread too thin. If no particle is near the true state, the filter cannot recover. This is the curse of dimensionality for particle filters.
Sample impoverishment: After many resamplings, the particle set may lose diversity — many particles become copies of the same few ancestors. Techniques like low-variance resampling and roughening help.
No global localization recovery: Once all particles cluster in the wrong place, the filter is stuck. Adding random particles occasionally (as in MCL, Chapter 8) helps.
Let's compare all the filter families we've seen:
| Filter | Belief Shape | State Space | Scaling | Best For |
|---|---|---|---|---|
| Kalman (Ch 3) | Unimodal Gaussian | Continuous | O(n2) | Linear/mildly nonlinear tracking |
| Histogram (Ch 4) | Arbitrary (discrete) | Discrete/gridded | O(K2) | Low-dimensional, global localization |
| Particle (Ch 4) | Arbitrary (sampled) | Continuous | O(M) | Multi-modal, moderate dimensions |
Nonparametric filters break free of the Gaussian assumption. They can represent arbitrary beliefs — multi-modal, skewed, heavy-tailed — at the cost of more computation.
| Method | When to use |
|---|---|
| Histogram filter | Discrete or low-dimensional continuous states; global localization on a grid |
| Binary Bayes / log-odds | Static binary state estimation (e.g., occupancy mapping) |
| Particle filter | Multi-modal continuous beliefs; moderate dimensions; the workhorse of robot localization |