Ch 4: Nonparametric Filters — Probabilistic Robotics

Chapter 0: Beyond Gaussians

The Kalman filter from Chapter 3 is elegant, but it makes a strong assumption: the belief is always a single Gaussian. That means it can only represent one hypothesis about where the robot is. What if the robot is genuinely uncertain between two distinct locations?

Picture a robot in a long hallway with identical doors. After sensing one door, it might be at door 1 or door 2 or door 3. This is a multi-modal belief — multiple peaks of probability. A single Gaussian cannot represent this. It would either pick one peak and ignore the others, or smear into a wide blob that doesn't represent any of them well.

The core problem: Gaussians are unimodal. Real robot beliefs are often multi-modal. We need representations that can capture arbitrary belief shapes — multiple peaks, heavy tails, irregular contours. These are called nonparametric representations because they don't assume a fixed functional form.

This chapter introduces two nonparametric approaches:

Histogram filters decompose the state space into grid cells and assign a probability to each cell. Think of it as a bar chart over locations. They can represent any shape, but their resolution is limited by computation.

Particle filters represent the belief using a set of random samples (particles). Where particles cluster densely, the belief is high. They can also represent any shape, and they focus computation where it matters most.

Check: Why can't a Kalman filter handle a robot that might be at door 1 OR door 3?

The Kalman filter is too slow Gaussians are unimodal — they can only represent a single peak, not multiple hypotheses The Kalman filter doesn't use sensor data

Chapter 1: The Histogram Filter

The simplest nonparametric filter chops the state space into a grid and maintains one probability per cell. For a discrete state space (like "which room am I in?"), this is exact. For a continuous state space (like x-y position), it's an approximation — the finer the grid, the better.

pseudocode
# Discrete Bayes Filter (Histogram Filter)
Input: {p_k,t-1}, u_t, z_t

for all grid cells k do
  p̅_k,t = Σ_i p(X_t=x_k | u_t, X_t-1=x_i) · p_i,t-1   # prediction
  p_k,t = η · p(z_t | X_t=x_k) · p̅_k,t              # update
endfor

return {p_k,t}

This is just the Bayes filter from Chapter 2, with the integral replaced by a sum over grid cells. Each cell k stores a probability p_k,t. The prediction step sums up contributions from all cells (weighted by the motion model). The update step multiplies by the measurement likelihood.

Aspect	Histogram Filter
Representation	One probability per grid cell
Belief shape	Arbitrary (piecewise constant)
Computation	O(K²) prediction, O(K) update, K = number of cells
Limitation	Scales poorly to high dimensions (curse of dimensionality)

Think of it this way: A histogram filter is like polling every location: "What's the probability the robot is here?" after each motion and measurement. It's thorough but expensive — if you need a fine 3D grid (x, y, θ) at 15cm and 5° resolution in a 50m × 50m building, that's millions of cells.

Check: What is the main computational bottleneck of histogram filters?

The prediction step requires summing over all cells for each cell — O(K²) The measurement update is too noisy Normalizing the probabilities takes too long

Chapter 2: Continuous State Histograms

When the state space is continuous (like a robot's x-y-θ pose), we discretize it into a grid:

range(X_t) = x_1,t ∪ x_2,t ∪ … ∪ x_K,t

Each region x_k,t gets a single probability value. Within each region, the belief is assumed uniform. This means the posterior is a piecewise constant approximation to the true continuous density.

The conditional probabilities are approximated by evaluating at the center of each cell:

p(z_t | x_k,t) ≈ p(z_t | x̂_k,t)

where x̂_k,t is the mean of region x_k,t.

Resolution trade-off: Fine grids give accurate beliefs but cost more to compute. Coarse grids are fast but lose information. For a 3D robot pose with 15cm spatial and 5° angular resolution in a 30m × 30m building, you need roughly 30/0.15 × 30/0.15 × 360/5 ≈ 2.9 million cells. Each Bayes filter cycle touches every cell.

Two strategies for managing resolution:

Fixed fine-grained grids: High accuracy, high cost. Best with pre-caching (compute expensive ray-casts once) and selective updating (only update cells with non-negligible probability).

Variable-resolution grids: Coarse where the belief is spread out, fine where it's concentrated. Tree-based structures (e.g., quad-trees) adapt resolution dynamically. This is the adaptive approach.

Check: In a histogram filter for continuous states, what approximation is used within each cell?

A Gaussian centered at the cell mean A uniform (constant) probability within each cell A particle approximation within each cell

Chapter 3: Binary Bayes Filters

A special case worth highlighting: what if the state is binary and doesn't change over time? For example, "Is this grid cell occupied or free?" This is the foundation of occupancy grid mapping (Chapter 9).

For binary static state, the Bayes filter simplifies dramatically. Using the log-odds representation:

l_t = log p(x) / (1 - p(x))

The update becomes a simple addition:

l_t = l_t-1 + log p(x|z_t) / (1 - p(x|z_t)) - l₀

where l₀ is the prior log-odds. Each measurement just adds its "evidence" to the running total. This is fast, numerically stable, and avoids the probabilities-near-zero problem.

Why log-odds? Probabilities near 0 or 1 cause numerical issues. Log-odds map the interval (0,1) to (-∞, +∞), making addition the natural update operation. To recover the probability: p = 1 - 1/(1 + e^l).

Check: What advantage does the log-odds representation have for binary Bayes filters?

Updates become simple additions — just add new measurement evidence It makes the filter work for continuous states It eliminates the need for a prior

Chapter 4: The Particle Filter Idea

The particle filter takes a completely different approach to representing beliefs. Instead of dividing space into cells, it uses a collection of random samples called particles:

X_t = { x_t^[1], x_t^[2], …, x_t^[M] }

Each particle x_t^[m] is a concrete hypothesis: "The robot might be here, in this exact pose." Where particles cluster, probability is high. Where they're sparse, probability is low.

The key insight: Instead of maintaining probabilities everywhere (like a histogram), particles concentrate computation where the belief actually is. If the belief has two peaks, particles cluster at both peaks. Empty regions get no particles and no computation. This is why particle filters scale so much better than histograms to high-dimensional spaces.

How many particles do we need? In practice, M = 100 to 10,000 depending on the problem. More particles give better approximation. Some implementations adapt M based on the complexity of the current belief.

The magic: as M → ∞, the particle approximation converges to the true posterior. For finite M, there's some approximation error, but in practice even a few hundred particles work remarkably well.

Check: How do particle filters concentrate computation where it matters?

Particles cluster in high-probability regions; low-probability regions have few or no particles They use finer grid cells near the robot They ignore all measurements far from the current estimate

Chapter 5: PF Algorithm

The particle filter algorithm has three elegant steps for each time step:

pseudocode
# Particle Filter
Input: X_t-1, u_t, z_t

# Step 1: Predict — sample new poses from motion model
for m = 1 to M do
  x_t^[m] ~ p(x_t | u_t, x_t-1^[m])        # sample from motion model
  w_t^[m] = p(z_t | x_t^[m])             # importance weight
endfor

# Step 2: Resample — survival of the fittest
for m = 1 to M do
  draw i with probability ∝ w_t^[i]
  add x_t^[i] to new X_t
endfor

return X_t

Step 1 — Predict: Each old particle is "moved" by sampling from the motion model. If the robot moved forward, each particle moves forward plus some random noise. This creates the predicted particle set.

Step 1b — Weight: Each predicted particle is weighted by how well the measurement matches. A particle that's consistent with the sensor reading gets a high weight; one that contradicts it gets a low weight.

Step 2 — Resample: Draw M new particles from the weighted set, with replacement. High-weight particles get duplicated; low-weight particles vanish. This is Darwinian selection — hypotheses that explain the data survive, others die.

The resampling trick: After resampling, all particles have equal weight. The distribution of particles approximates bel(x_t). Resampling prevents "particle starvation" — without it, a few particles would accumulate all the weight while the rest become irrelevant.

Check: What is the role of the importance weight w^[m] in the particle filter?

It determines how fast the particle moves It measures how well the particle's position explains the current measurement — the resampling probability It controls the number of particles

Chapter 6: PF Demo

Let's watch a particle filter track a robot in a 1D hallway. The robot starts with global uncertainty (particles spread everywhere), then moves and senses landmarks. Watch how particles cluster around the true position after a few measurements.

1D Particle Filter Localization

Green line = true position. Orange ticks = particles. The robot senses distance to landmarks (blue triangles). Click Step to advance, or Auto-run to animate.

t = 0

Particles M200

What to notice: (1) Initially, particles are spread uniformly — the robot has no idea where it is. (2) After the first measurement, particles cluster near positions consistent with the reading. Multiple clusters form because several positions match. (3) After a few more steps of moving and sensing, one cluster dominates — the robot has localized itself. This is multi-modal to unimodal convergence, something a Kalman filter could never do.

Check: After resampling, what happens to particles in low-probability regions?

They are eliminated (not resampled) — computation focuses on high-probability regions They stay but get higher weights They are moved to the nearest high-probability region

Chapter 7: Importance Sampling

The theoretical foundation of particle filters is importance sampling. The idea: we want to sample from a target distribution f (the posterior), but we can't do it directly. Instead, we sample from a proposal distribution g (the motion model), then correct using weights.

w^[m] = f(x^[m]) / g(x^[m])

In the particle filter:

• The target f is the full posterior bel(x_t) = η p(z_t|x_t) bel̅(x_t)

• The proposal g is the prediction bel̅(x_t) = p(x_t|u_t, x_t-1)

• The weight simplifies to w = f/g = p(z_t|x_t) — the measurement likelihood

Why this works: We sample particles from the motion model (easy to do), then weight them by the measurement likelihood (easy to evaluate). The weighted particles approximate the posterior. Resampling converts weighted particles into unweighted ones, concentrating them in high-posterior regions. This two-step process — sample from proposal, correct by weights — is the essence of importance sampling.

Alternative proposals (beyond the motion model) can improve particle filter performance. The optimal proposal incorporates the measurement into the proposal, generating particles that are more likely to be in high-weight regions. This reduces the variance of the weights but is harder to sample from.

Check: In the standard particle filter, what is the proposal distribution?

The full posterior The motion model p(x_t | u_t, x_t-1) — we move particles according to the dynamics The measurement model

Chapter 8: PF Properties

Particle filters have several important properties that make them widely used:

Property	Description
Consistency	As M → ∞, the particle approximation converges to the true posterior
Multi-modal	Can represent any number of peaks, irregular shapes, heavy tails
Anytime	More particles = better approximation; can trade quality for speed
Simple	Only need to sample from motion model and evaluate measurement likelihood
Resource-adaptive	Can adjust M based on available computation time

Known limitations:

Particle depletion: In high-dimensional spaces, particles spread too thin. If no particle is near the true state, the filter cannot recover. This is the curse of dimensionality for particle filters.

Sample impoverishment: After many resamplings, the particle set may lose diversity — many particles become copies of the same few ancestors. Techniques like low-variance resampling and roughening help.

No global localization recovery: Once all particles cluster in the wrong place, the filter is stuck. Adding random particles occasionally (as in MCL, Chapter 8) helps.

Rule of thumb: The number of particles needed grows exponentially with the dimension of the state space. For a 3D robot pose (x, y, θ), a few hundred particles suffice. For a 100-dimensional SLAM state, particle filters struggle — which is why EKF or information-form methods dominate in SLAM.

Check: What is "particle depletion" and when does it occur?

When no particles are near the true state in high-dimensional spaces — the filter can't recover When too many particles are added When the sensor noise is zero

Chapter 9: Comparison

Let's compare all the filter families we've seen:

Filter	Belief Shape	State Space	Scaling	Best For
Kalman (Ch 3)	Unimodal Gaussian	Continuous	O(n²)	Linear/mildly nonlinear tracking
Histogram (Ch 4)	Arbitrary (discrete)	Discrete/gridded	O(K²)	Low-dimensional, global localization
Particle (Ch 4)	Arbitrary (sampled)	Continuous	O(M)	Multi-modal, moderate dimensions

How to choose: If the belief is roughly Gaussian and the system is mildly nonlinear, use an EKF — it's the cheapest and most accurate. If you need multi-modal beliefs and the state is low-dimensional (≤ 3D), a histogram filter or particle filter works. For moderate dimensions (3-6D) with multi-modal beliefs, particle filters are the standard choice. For very high dimensions, you'll likely need parametric methods (EKF, information filter) with clever approximations.

Check: For global robot localization in a 2D map (multi-modal belief), which filter is most appropriate?

Kalman filter Information filter Particle filter (or histogram filter) — they can represent multiple hypotheses

Chapter 10: Summary

Nonparametric filters break free of the Gaussian assumption. They can represent arbitrary beliefs — multi-modal, skewed, heavy-tailed — at the cost of more computation.

Method	When to use
Histogram filter	Discrete or low-dimensional continuous states; global localization on a grid
Binary Bayes / log-odds	Static binary state estimation (e.g., occupancy mapping)
Particle filter	Multi-modal continuous beliefs; moderate dimensions; the workhorse of robot localization

What comes next: With filters covered (Chapters 2-4), the next two chapters develop the motion model p(x_t|u_t,x_t-1) and measurement model p(z_t|x_t) — the two probability distributions that every Bayes filter needs. Chapter 5 covers robot motion; Chapter 6 covers sensor measurements.

"The particle filter is to nonlinear estimation
what Monte Carlo is to integration:
simple, general, and surprisingly effective."

Check: What is the fundamental trade-off in nonparametric filters compared to Gaussian filters?

They can represent arbitrary belief shapes but require more computation They are faster but less accurate They only work in 1D