Thrun et al., Chapter 3

Gaussian Filters

The Kalman filter and its extensions: when beliefs are Gaussians, inference is elegant and exact.

Prerequisites: Chapter 2 (Bayes filter), basic linear algebra (matrices, inverses).
11
Chapters
2
Simulations
11
Quizzes

Chapter 0: Why Gaussian?

The Bayes filter from Chapter 2 is beautiful in theory but impractical for continuous state spaces — the integral in the prediction step doesn't have a closed form in general. We need a tractable representation for beliefs.

Enter the Gaussian (normal distribution). A multivariate Gaussian is fully described by just two things: a mean vector μ (where the peak is) and a covariance matrix Σ (how spread out and correlated the uncertainty is). For a 3D robot pose (x, y, θ), that's just 3 numbers for the mean and 6 for the covariance (since Σ is symmetric).

p(x) = det(2πΣ) exp(-½(x-μ)TΣ-1(x-μ))
The magical property: If you start with a Gaussian belief and apply a linear motion model with Gaussian noise, and observe through a linear measurement model with Gaussian noise, then the updated belief is also Gaussian. The entire Bayes filter reduce to simple matrix algebra. This is the Kalman filter.

The Gaussian representation is enormously efficient. While a histogram filter might need millions of grid cells to represent a 3D belief, the Kalman filter needs only 9 numbers (3 for μ, 6 for Σ). The trade-off: it can only represent unimodal beliefs. It cannot handle the "I might be at position A or position B" scenario from Chapter 2. For that, we'll need particle filters (Chapter 4).

Check: What is the key limitation of representing beliefs as Gaussians?

Chapter 1: Linear Gaussian Systems

The Kalman filter assumes the world is a linear Gaussian system. This means:

Motion model: The next state is a linear function of the current state plus the control, with additive Gaussian noise:

xt = At xt-1 + Bt ut + εt,    εt ~ N(0, Rt)

Measurement model: The observation is a linear function of the state, with additive Gaussian noise:

zt = Ct xt + δt,    δt ~ N(0, Qt)
SymbolNameSizeWhat it does
AtState transition matrixn × nHow state evolves (dynamics)
BtControl input matrixn × mHow controls affect state
RtProcess noise covariancen × nHow much motion noise
CtMeasurement matrixk × nWhat we observe from state
QtMeasurement noise covariancek × kHow much sensor noise
Example: A robot moving in 1D. State x = [position, velocity]. If the robot accelerates with command u, then A = [[1, dt], [0, 1]], B = [[0.5 dt2], [dt]], and C = [1, 0] (we observe position but not velocity). The noise matrices R and Q capture how uncertain the motion and sensing are.
Check: In the linear Gaussian model, what does the matrix Ct represent?

Chapter 2: The Kalman Filter Algorithm

Here is the complete Kalman filter. It's the Bayes filter, specialized to linear Gaussians.

pseudocode
# Kalman Filter
Input: μt-1, Σt-1, ut, zt

# Prediction step
μ̅t = At μt-1 + Bt ut
Σ̅t = At Σt-1 AtT + Rt

# Kalman gain
Kt = Σ̅t CtT (Ct Σ̅t CtT + Qt)-1

# Update step
μt = μ̅t + Kt (zt − Ct μ̅t)
Σt = (I − Kt Ct) Σ̅t

return μt, Σt

Four matrix operations. That's the entire Kalman filter. Let's understand each line:

Prediction: The mean moves according to the dynamics (Aμ + Bu). The covariance grows: the old covariance is rotated/stretched by A, then process noise R is added. Bigger R = more motion uncertainty.

Kalman gain K: This is the magic number. It determines how much to trust the measurement vs the prediction. If the measurement is very precise (Q is small), K is large — trust the sensor. If the prediction is very precise (Σ̅ is small), K is small — trust the dynamics.

Update: The mean gets "pulled" toward the measurement by the innovation (z - Cμ̅), weighted by K. The covariance shrinks — the measurement always reduces uncertainty.

K is a trust slider: When K = 0, we ignore the measurement entirely. When K = I (identity), we replace the prediction entirely with the measurement. In practice, K finds the optimal balance. This is the sense in which the Kalman filter is "optimal" — among all linear estimators, it minimizes the mean squared error.
Check: If the measurement noise Q is very large, what happens to the Kalman gain K?

Chapter 3: KF Demo

Let's watch the Kalman filter track a moving object. The object moves along a 1D line with constant velocity plus random acceleration noise. The robot observes the position with Gaussian noise. Click Step to advance time, or Auto-run to animate.

1D Kalman Filter Tracking

True state (teal), noisy measurements (orange dots), KF estimate ± 2σ (shaded band). Watch the filter fuse predictions and measurements.

t = 0
Meas. noise σz2.0
What to notice: (1) The KF estimate (warm line) stays between the true state and the noisy measurements — it's smoother than the measurements but tracks the truth. (2) The uncertainty band narrows quickly in the first few steps, then stabilizes at the steady-state Kalman gain. (3) Increasing measurement noise makes the filter rely more on the dynamics model and less on the sensor.
Check: What happens to the KF uncertainty band as more measurements arrive?

Chapter 4: KF Derivation

Where do those Kalman filter equations come from? They are the Bayes filter, with Gaussian algebra doing the heavy lifting.

Prediction step: The old belief is N(μt-1, Σt-1). The motion model says xt = Axt-1 + But + ε. Since a linear transformation of a Gaussian is Gaussian, and the sum of independent Gaussians is Gaussian:

bel̅(xt) = N(Aμt-1 + But,   AΣt-1AT + R)

The mean shifts by the dynamics, and the covariance is stretched by A and inflated by R.

Update step: We have bel̅(xt) = N(μ̅, Σ̅) and the measurement zt = Cxt + δ. The key mathematical fact: the product of two Gaussians is a Gaussian. Multiplying the prediction (prior) by the likelihood gives:

μt = μ̅ + K(z - Cμ̅),    Σt = (I - KC)Σ̅

Where K = Σ̅CT(CΣ̅CT + Q)-1 falls out of the algebra as the optimal weighting.

The product-of-Gaussians trick: If p(x) = N(μ1, Σ1) and p(z|x) = N(Cx, Q), then p(x|z) is Gaussian with precisely the Kalman filter's update equations. This fact — that Gaussians are "closed" under Bayesian updating with linear observations — is what makes the Kalman filter exact. No approximation is involved, as long as the linearity and Gaussianity assumptions hold.
Check: Why is the Kalman filter exact (no approximation) for linear Gaussian systems?

Chapter 5: The EKF Idea

Real-world robots are rarely linear. A wheeled robot turning in a circle has nonlinear motion. A camera projecting 3D points onto a 2D image is nonlinear. The standard Kalman filter breaks down because the prediction and update steps rely on linearity.

The Extended Kalman Filter (EKF) handles this by linearizing: at each time step, it approximates the nonlinear functions with their first-order Taylor expansions around the current estimate.

xt = g(ut, xt-1) + εt     (nonlinear motion)
zt = h(xt) + δt     (nonlinear measurement)

The EKF replaces the constant matrices A and C with Jacobians — matrices of partial derivatives evaluated at the current mean:

Gt = ∂g / ∂x |μt-1     (replaces A)
Ht = ∂h / ∂x |μ̅t     (replaces C)
Key insight: The EKF pretends the system is locally linear at each time step. Near the current estimate, a smooth nonlinear function looks like a straight line. The Jacobian captures the slope of that line. This is an approximation — it works well when the nonlinearity is mild and the uncertainty is small, but can diverge if the system is highly nonlinear or the belief is very spread out.
Check: How does the EKF handle nonlinear motion and measurement models?

Chapter 6: EKF Algorithm

pseudocode
# Extended Kalman Filter
Input: μt-1, Σt-1, ut, zt

# Prediction step (use nonlinear g, but Jacobian G for covariance)
μ̅t = g(ut, μt-1)
Gt = ∂g/∂x |μt-1
Σ̅t = Gt Σt-1 GtT + Rt

# Kalman gain (use Jacobian H for measurement)
Ht = ∂h/∂x |μ̅t
Kt = Σ̅t HtT (Ht Σ̅t HtT + Qt)-1

# Update step (use nonlinear h for innovation)
μt = μ̅t + Kt (zt − h(μ̅t))
Σt = (I − Kt Ht) Σ̅t

return μt, Σt

Notice the pattern: the mean is propagated through the true nonlinear functions g and h. The covariance is propagated through the linearized versions (Jacobians G and H). This hybrid approach retains the accuracy of the nonlinear model for the best estimate while using linear algebra for the uncertainty.

EKF vs KF: The only difference is that A is replaced by the Jacobian G (evaluated at μt-1) and C is replaced by the Jacobian H (evaluated at μ̅t). The structure — predict, compute gain, update — is identical. If g and h happen to be linear, the Jacobians are constant and the EKF reduces to the standard KF.
Practical note: Computing Jacobians by hand is tedious and error-prone. Modern implementations often use automatic differentiation or numerical Jacobians (finite differences). Some newer filters like the Unscented Kalman Filter (UKF) avoid Jacobians entirely by using carefully chosen "sigma points" to propagate the mean and covariance through the nonlinear function. The UKF often performs better than the EKF for highly nonlinear systems.
Check: In the EKF, the mean is propagated through ___ and the covariance through ___.

Chapter 7: The Information Filter

There's another way to represent a Gaussian: instead of (μ, Σ), use the information form (ξ, Ω) where:

Ω = Σ-1    (information matrix / precision)
ξ = Σ-1μ    (information vector)

Same Gaussian, different parameterization. So why bother? Because the update step becomes trivially simple:

pseudocode
# Information Filter Update (measurement incorporation)
Ωt = Ω̅t + CT Q-1 C
ξt = ξ̅t + CT Q-1 zt

That's it. Just addition. No matrix inversion, no Kalman gain computation. The measurement information is simply added to the existing information.

Key insight: In the information form, incorporating measurements is just addition. This is because information from independent sources is additive. Two sensors that each contribute information simply add their contributions. This property makes the information filter ideal for multi-sensor fusion and for SLAM, where many landmark measurements must be combined efficiently.

The catch: the prediction step becomes expensive in the information form (it requires a matrix inversion), while in the moment form (μ, Σ) the prediction is cheap. So neither form is universally better — it depends on whether your bottleneck is prediction (use moments) or updates (use information).

OperationMoment form (μ, Σ)Information form (ξ, Ω)
PredictionCheap (matrix multiply)Expensive (requires inversion)
UpdateModerate (Kalman gain)Cheap (just addition)
Best forTracking with few sensorsSLAM, multi-sensor fusion
Check: What is the main advantage of the information form over the moment form?

Chapter 8: Extended Information Filter

Just as the EKF extends the Kalman filter to nonlinear systems, the Extended Information Filter (EIF) extends the information filter. The same linearization trick applies: replace the constant matrices A and C with Jacobians G and H.

The EIF is particularly important for SLAM (Chapters 10-12). The information matrix Ω in a SLAM problem has a special structure: it is sparse. Most landmarks are not directly connected to each other — they only connect through the robot's pose. This sparsity makes storage and certain operations much cheaper.

Why sparsity matters: In the moment form (EKF-SLAM), the covariance matrix is dense — every landmark is correlated with every other landmark through the robot's pose uncertainty. In the information form (EIF-SLAM), the information matrix is sparse — only landmarks observed from the same pose are directly linked. This sparsity is the foundation of efficient SLAM algorithms like the Sparse Extended Information Filter (Chapter 12).

We'll revisit this in great detail in Chapters 11-12. For now, the key takeaway: the information form and the moment form are dual representations of the same Gaussian. The information form has computational advantages for problems with many measurements and sparse structure.

Check: Why is the information form (EIF) attractive for SLAM?

Chapter 9: Comparison

Let's compare the four Gaussian filters we've seen:

FilterLinearityRepresentationPredictionUpdate
KFLinearMoments (μ, Σ)O(n2)O(kn2)
EKFNonlinearMoments (μ, Σ)O(n2) + JacobianO(kn2) + Jacobian
IFLinearInformation (ξ, Ω)O(n3)O(k2n)
EIFNonlinearInformation (ξ, Ω)O(n3)O(k2n) + Jacobian

Where n is the state dimension and k is the measurement dimension. The moment form is cheaper for prediction-heavy problems. The information form is cheaper for update-heavy problems with sparse structure.

The bottom line: All Gaussian filters share the same limitation — they represent beliefs as single Gaussians. This means they cannot handle multi-modal beliefs (e.g., "I might be in the kitchen or the bedroom"). For that, we need nonparametric filters (Chapter 4): histograms or particles.
Check: When is the EKF preferred over the standard KF?

Chapter 10: Summary

Gaussian filters are the workhorse of probabilistic robotics. Their efficiency and mathematical elegance make them the first choice whenever the Gaussian assumption is reasonable.

AlgorithmWhen to use
Kalman FilterLinear dynamics + linear sensing + Gaussian noise (gold standard)
Extended KFMildly nonlinear systems with small uncertainty
Information FilterMulti-sensor fusion with linear models
Extended IFSLAM and other problems with sparse information structure
What comes next: Chapter 4 introduces nonparametric filters that break free of the Gaussian assumption. Histogram filters discretize the state space; particle filters use random samples. These can represent arbitrary belief shapes — multiple peaks, heavy tails, irregular contours — at the cost of more computation.
"The Kalman filter is to estimation theory
what the transistor is to electronics."
— paraphrasing the community consensus
Check: What is the fundamental limitation shared by ALL Gaussian filters?