The Kalman filter and its extensions: when beliefs are Gaussians, inference is elegant and exact.
The Bayes filter from Chapter 2 is beautiful in theory but impractical for continuous state spaces — the integral in the prediction step doesn't have a closed form in general. We need a tractable representation for beliefs.
Enter the Gaussian (normal distribution). A multivariate Gaussian is fully described by just two things: a mean vector μ (where the peak is) and a covariance matrix Σ (how spread out and correlated the uncertainty is). For a 3D robot pose (x, y, θ), that's just 3 numbers for the mean and 6 for the covariance (since Σ is symmetric).
The Gaussian representation is enormously efficient. While a histogram filter might need millions of grid cells to represent a 3D belief, the Kalman filter needs only 9 numbers (3 for μ, 6 for Σ). The trade-off: it can only represent unimodal beliefs. It cannot handle the "I might be at position A or position B" scenario from Chapter 2. For that, we'll need particle filters (Chapter 4).
The Kalman filter assumes the world is a linear Gaussian system. This means:
Motion model: The next state is a linear function of the current state plus the control, with additive Gaussian noise:
Measurement model: The observation is a linear function of the state, with additive Gaussian noise:
| Symbol | Name | Size | What it does |
|---|---|---|---|
| At | State transition matrix | n × n | How state evolves (dynamics) |
| Bt | Control input matrix | n × m | How controls affect state |
| Rt | Process noise covariance | n × n | How much motion noise |
| Ct | Measurement matrix | k × n | What we observe from state |
| Qt | Measurement noise covariance | k × k | How much sensor noise |
Here is the complete Kalman filter. It's the Bayes filter, specialized to linear Gaussians.
pseudocode # Kalman Filter Input: μt-1, Σt-1, ut, zt # Prediction step μ̅t = At μt-1 + Bt ut Σ̅t = At Σt-1 AtT + Rt # Kalman gain Kt = Σ̅t CtT (Ct Σ̅t CtT + Qt)-1 # Update step μt = μ̅t + Kt (zt − Ct μ̅t) Σt = (I − Kt Ct) Σ̅t return μt, Σt
Four matrix operations. That's the entire Kalman filter. Let's understand each line:
Prediction: The mean moves according to the dynamics (Aμ + Bu). The covariance grows: the old covariance is rotated/stretched by A, then process noise R is added. Bigger R = more motion uncertainty.
Kalman gain K: This is the magic number. It determines how much to trust the measurement vs the prediction. If the measurement is very precise (Q is small), K is large — trust the sensor. If the prediction is very precise (Σ̅ is small), K is small — trust the dynamics.
Update: The mean gets "pulled" toward the measurement by the innovation (z - Cμ̅), weighted by K. The covariance shrinks — the measurement always reduces uncertainty.
Let's watch the Kalman filter track a moving object. The object moves along a 1D line with constant velocity plus random acceleration noise. The robot observes the position with Gaussian noise. Click Step to advance time, or Auto-run to animate.
True state (teal), noisy measurements (orange dots), KF estimate ± 2σ (shaded band). Watch the filter fuse predictions and measurements.
Where do those Kalman filter equations come from? They are the Bayes filter, with Gaussian algebra doing the heavy lifting.
Prediction step: The old belief is N(μt-1, Σt-1). The motion model says xt = Axt-1 + But + ε. Since a linear transformation of a Gaussian is Gaussian, and the sum of independent Gaussians is Gaussian:
The mean shifts by the dynamics, and the covariance is stretched by A and inflated by R.
Update step: We have bel̅(xt) = N(μ̅, Σ̅) and the measurement zt = Cxt + δ. The key mathematical fact: the product of two Gaussians is a Gaussian. Multiplying the prediction (prior) by the likelihood gives:
Where K = Σ̅CT(CΣ̅CT + Q)-1 falls out of the algebra as the optimal weighting.
Real-world robots are rarely linear. A wheeled robot turning in a circle has nonlinear motion. A camera projecting 3D points onto a 2D image is nonlinear. The standard Kalman filter breaks down because the prediction and update steps rely on linearity.
The Extended Kalman Filter (EKF) handles this by linearizing: at each time step, it approximates the nonlinear functions with their first-order Taylor expansions around the current estimate.
The EKF replaces the constant matrices A and C with Jacobians — matrices of partial derivatives evaluated at the current mean:
pseudocode # Extended Kalman Filter Input: μt-1, Σt-1, ut, zt # Prediction step (use nonlinear g, but Jacobian G for covariance) μ̅t = g(ut, μt-1) Gt = ∂g/∂x |μt-1 Σ̅t = Gt Σt-1 GtT + Rt # Kalman gain (use Jacobian H for measurement) Ht = ∂h/∂x |μ̅t Kt = Σ̅t HtT (Ht Σ̅t HtT + Qt)-1 # Update step (use nonlinear h for innovation) μt = μ̅t + Kt (zt − h(μ̅t)) Σt = (I − Kt Ht) Σ̅t return μt, Σt
Notice the pattern: the mean is propagated through the true nonlinear functions g and h. The covariance is propagated through the linearized versions (Jacobians G and H). This hybrid approach retains the accuracy of the nonlinear model for the best estimate while using linear algebra for the uncertainty.
There's another way to represent a Gaussian: instead of (μ, Σ), use the information form (ξ, Ω) where:
Same Gaussian, different parameterization. So why bother? Because the update step becomes trivially simple:
pseudocode # Information Filter Update (measurement incorporation) Ωt = Ω̅t + CT Q-1 C ξt = ξ̅t + CT Q-1 zt
That's it. Just addition. No matrix inversion, no Kalman gain computation. The measurement information is simply added to the existing information.
The catch: the prediction step becomes expensive in the information form (it requires a matrix inversion), while in the moment form (μ, Σ) the prediction is cheap. So neither form is universally better — it depends on whether your bottleneck is prediction (use moments) or updates (use information).
| Operation | Moment form (μ, Σ) | Information form (ξ, Ω) |
|---|---|---|
| Prediction | Cheap (matrix multiply) | Expensive (requires inversion) |
| Update | Moderate (Kalman gain) | Cheap (just addition) |
| Best for | Tracking with few sensors | SLAM, multi-sensor fusion |
Just as the EKF extends the Kalman filter to nonlinear systems, the Extended Information Filter (EIF) extends the information filter. The same linearization trick applies: replace the constant matrices A and C with Jacobians G and H.
The EIF is particularly important for SLAM (Chapters 10-12). The information matrix Ω in a SLAM problem has a special structure: it is sparse. Most landmarks are not directly connected to each other — they only connect through the robot's pose. This sparsity makes storage and certain operations much cheaper.
We'll revisit this in great detail in Chapters 11-12. For now, the key takeaway: the information form and the moment form are dual representations of the same Gaussian. The information form has computational advantages for problems with many measurements and sparse structure.
Let's compare the four Gaussian filters we've seen:
| Filter | Linearity | Representation | Prediction | Update |
|---|---|---|---|---|
| KF | Linear | Moments (μ, Σ) | O(n2) | O(kn2) |
| EKF | Nonlinear | Moments (μ, Σ) | O(n2) + Jacobian | O(kn2) + Jacobian |
| IF | Linear | Information (ξ, Ω) | O(n3) | O(k2n) |
| EIF | Nonlinear | Information (ξ, Ω) | O(n3) | O(k2n) + Jacobian |
Where n is the state dimension and k is the measurement dimension. The moment form is cheaper for prediction-heavy problems. The information form is cheaper for update-heavy problems with sparse structure.
Gaussian filters are the workhorse of probabilistic robotics. Their efficiency and mathematical elegance make them the first choice whenever the Gaussian assumption is reasonable.
| Algorithm | When to use |
|---|---|
| Kalman Filter | Linear dynamics + linear sensing + Gaussian noise (gold standard) |
| Extended KF | Mildly nonlinear systems with small uncertainty |
| Information Filter | Multi-sensor fusion with linear models |
| Extended IF | SLAM and other problems with sparse information structure |