Ch 3: Gaussian Filters — Probabilistic Robotics

Chapter 0: Why Gaussian?

The Bayes filter from Chapter 2 is beautiful in theory but impractical for continuous state spaces — the integral in the prediction step doesn't have a closed form in general. We need a tractable representation for beliefs.

Enter the Gaussian (normal distribution). A multivariate Gaussian is fully described by just two things: a mean vector μ (where the peak is) and a covariance matrix Σ (how spread out and correlated the uncertainty is). For a 3D robot pose (x, y, θ), that's just 3 numbers for the mean and 6 for the covariance (since Σ is symmetric).

p(x) = det(2πΣ)^-½ exp(-½(x-μ)^TΣ^-1(x-μ))

The magical property: If you start with a Gaussian belief and apply a linear motion model with Gaussian noise, and observe through a linear measurement model with Gaussian noise, then the updated belief is also Gaussian. The entire Bayes filter reduce to simple matrix algebra. This is the Kalman filter.

The Gaussian representation is enormously efficient. While a histogram filter might need millions of grid cells to represent a 3D belief, the Kalman filter needs only 9 numbers (3 for μ, 6 for Σ). The trade-off: it can only represent unimodal beliefs. It cannot handle the "I might be at position A or position B" scenario from Chapter 2. For that, we'll need particle filters (Chapter 4).

Check: What is the key limitation of representing beliefs as Gaussians?

Gaussians require too much memory Gaussians are unimodal — they can only represent one "peak" of probability Gaussians only work in 1D

Chapter 1: Linear Gaussian Systems

The Kalman filter assumes the world is a linear Gaussian system. This means:

Motion model: The next state is a linear function of the current state plus the control, with additive Gaussian noise:

x_t = A_t x_t-1 + B_t u_t + ε_t, ε_t ~ N(0, R_t)

Measurement model: The observation is a linear function of the state, with additive Gaussian noise:

z_t = C_t x_t + δ_t, δ_t ~ N(0, Q_t)

Symbol	Name	Size	What it does
A_t	State transition matrix	n × n	How state evolves (dynamics)
B_t	Control input matrix	n × m	How controls affect state
R_t	Process noise covariance	n × n	How much motion noise
C_t	Measurement matrix	k × n	What we observe from state
Q_t	Measurement noise covariance	k × k	How much sensor noise

Example: A robot moving in 1D. State x = [position, velocity]. If the robot accelerates with command u, then A = [[1, dt], [0, 1]], B = [[0.5 dt²], [dt]], and C = [1, 0] (we observe position but not velocity). The noise matrices R and Q capture how uncertain the motion and sensing are.

Check: In the linear Gaussian model, what does the matrix C_t represent?

The state transition dynamics The process noise covariance The measurement matrix — it maps from state space to observation space

Chapter 2: The Kalman Filter Algorithm

Here is the complete Kalman filter. It's the Bayes filter, specialized to linear Gaussians.

pseudocode
# Kalman Filter
Input: μ_t-1, Σ_t-1, u_t, z_t

# Prediction step
μ̅_t = A_t μ_t-1 + B_t u_t
Σ̅_t = A_t Σ_t-1 A_t^T + R_t

# Kalman gain
K_t = Σ̅_t C_t^T (C_t Σ̅_t C_t^T + Q_t)^-1

# Update step
μ_t = μ̅_t + K_t (z_t − C_t μ̅_t)
Σ_t = (I − K_t C_t) Σ̅_t

return μ_t, Σ_t

Four matrix operations. That's the entire Kalman filter. Let's understand each line:

Prediction: The mean moves according to the dynamics (Aμ + Bu). The covariance grows: the old covariance is rotated/stretched by A, then process noise R is added. Bigger R = more motion uncertainty.

Kalman gain K: This is the magic number. It determines how much to trust the measurement vs the prediction. If the measurement is very precise (Q is small), K is large — trust the sensor. If the prediction is very precise (Σ̅ is small), K is small — trust the dynamics.

Update: The mean gets "pulled" toward the measurement by the innovation (z - Cμ̅), weighted by K. The covariance shrinks — the measurement always reduces uncertainty.

K is a trust slider: When K = 0, we ignore the measurement entirely. When K = I (identity), we replace the prediction entirely with the measurement. In practice, K finds the optimal balance. This is the sense in which the Kalman filter is "optimal" — among all linear estimators, it minimizes the mean squared error.

Check: If the measurement noise Q is very large, what happens to the Kalman gain K?

K becomes small — we trust the measurement less K becomes large — we trust the measurement more K stays the same regardless of Q

Chapter 3: KF Demo

Let's watch the Kalman filter track a moving object. The object moves along a 1D line with constant velocity plus random acceleration noise. The robot observes the position with Gaussian noise. Click Step to advance time, or Auto-run to animate.

1D Kalman Filter Tracking

True state (teal), noisy measurements (orange dots), KF estimate ± 2σ (shaded band). Watch the filter fuse predictions and measurements.

t = 0

Meas. noise σ_z2.0

What to notice: (1) The KF estimate (warm line) stays between the true state and the noisy measurements — it's smoother than the measurements but tracks the truth. (2) The uncertainty band narrows quickly in the first few steps, then stabilizes at the steady-state Kalman gain. (3) Increasing measurement noise makes the filter rely more on the dynamics model and less on the sensor.

Check: What happens to the KF uncertainty band as more measurements arrive?

It narrows and stabilizes at a steady-state width It keeps growing forever It shrinks to exactly zero

Chapter 4: KF Derivation

Where do those Kalman filter equations come from? They are the Bayes filter, with Gaussian algebra doing the heavy lifting.

Prediction step: The old belief is N(μ_t-1, Σ_t-1). The motion model says x_t = Ax_t-1 + Bu_t + ε. Since a linear transformation of a Gaussian is Gaussian, and the sum of independent Gaussians is Gaussian:

bel̅(x_t) = N(Aμ_t-1 + Bu_t, AΣ_t-1A^T + R)

The mean shifts by the dynamics, and the covariance is stretched by A and inflated by R.

Update step: We have bel̅(x_t) = N(μ̅, Σ̅) and the measurement z_t = Cx_t + δ. The key mathematical fact: the product of two Gaussians is a Gaussian. Multiplying the prediction (prior) by the likelihood gives:

μ_t = μ̅ + K(z - Cμ̅), Σ_t = (I - KC)Σ̅

Where K = Σ̅C^T(CΣ̅C^T + Q)^-1 falls out of the algebra as the optimal weighting.

The product-of-Gaussians trick: If p(x) = N(μ₁, Σ₁) and p(z|x) = N(Cx, Q), then p(x|z) is Gaussian with precisely the Kalman filter's update equations. This fact — that Gaussians are "closed" under Bayesian updating with linear observations — is what makes the Kalman filter exact. No approximation is involved, as long as the linearity and Gaussianity assumptions hold.

Check: Why is the Kalman filter exact (no approximation) for linear Gaussian systems?

Because it uses very many particles Because the product of Gaussians with linear transformations is always Gaussian — the closed-form update is exact Because it iterates until convergence

Chapter 5: The EKF Idea

Real-world robots are rarely linear. A wheeled robot turning in a circle has nonlinear motion. A camera projecting 3D points onto a 2D image is nonlinear. The standard Kalman filter breaks down because the prediction and update steps rely on linearity.

The Extended Kalman Filter (EKF) handles this by linearizing: at each time step, it approximates the nonlinear functions with their first-order Taylor expansions around the current estimate.

x_t = g(u_t, x_t-1) + ε_t (nonlinear motion)

z_t = h(x_t) + δ_t (nonlinear measurement)

The EKF replaces the constant matrices A and C with Jacobians — matrices of partial derivatives evaluated at the current mean:

G_t = ∂g / ∂x |_{μ_t-1} (replaces A)

H_t = ∂h / ∂x |_{μ̅_t} (replaces C)

Key insight: The EKF pretends the system is locally linear at each time step. Near the current estimate, a smooth nonlinear function looks like a straight line. The Jacobian captures the slope of that line. This is an approximation — it works well when the nonlinearity is mild and the uncertainty is small, but can diverge if the system is highly nonlinear or the belief is very spread out.

Check: How does the EKF handle nonlinear motion and measurement models?

It linearizes them using first-order Taylor expansion (Jacobians) at the current estimate It uses particle sampling to approximate the nonlinearity It discretizes the state space into a grid

Chapter 6: EKF Algorithm

pseudocode
# Extended Kalman Filter
Input: μ_t-1, Σ_t-1, u_t, z_t

# Prediction step (use nonlinear g, but Jacobian G for covariance)
μ̅_t = g(u_t, μ_t-1)
G_t = ∂g/∂x |_{μ_t-1}
Σ̅_t = G_t Σ_t-1 G_t^T + R_t

# Kalman gain (use Jacobian H for measurement)
H_t = ∂h/∂x |_{μ̅_t}
K_t = Σ̅_t H_t^T (H_t Σ̅_t H_t^T + Q_t)^-1

# Update step (use nonlinear h for innovation)
μ_t = μ̅_t + K_t (z_t − h(μ̅_t))
Σ_t = (I − K_t H_t) Σ̅_t

return μ_t, Σ_t

Notice the pattern: the mean is propagated through the true nonlinear functions g and h. The covariance is propagated through the linearized versions (Jacobians G and H). This hybrid approach retains the accuracy of the nonlinear model for the best estimate while using linear algebra for the uncertainty.

EKF vs KF: The only difference is that A is replaced by the Jacobian G (evaluated at μ_t-1) and C is replaced by the Jacobian H (evaluated at μ̅_t). The structure — predict, compute gain, update — is identical. If g and h happen to be linear, the Jacobians are constant and the EKF reduces to the standard KF.

Practical note: Computing Jacobians by hand is tedious and error-prone. Modern implementations often use automatic differentiation or numerical Jacobians (finite differences). Some newer filters like the Unscented Kalman Filter (UKF) avoid Jacobians entirely by using carefully chosen "sigma points" to propagate the mean and covariance through the nonlinear function. The UKF often performs better than the EKF for highly nonlinear systems.

Check: In the EKF, the mean is propagated through ___ and the covariance through ___.

The nonlinear function; its Jacobian (linearized version) The Jacobian; the nonlinear function Both through the Jacobian

Chapter 7: The Information Filter

There's another way to represent a Gaussian: instead of (μ, Σ), use the information form (ξ, Ω) where:

Ω = Σ^-1 (information matrix / precision)

ξ = Σ^-1μ (information vector)

Same Gaussian, different parameterization. So why bother? Because the update step becomes trivially simple:

pseudocode
# Information Filter Update (measurement incorporation)
Ω_t = Ω̅_t + C^T Q^-1 C
ξ_t = ξ̅_t + C^T Q^-1 z_t

That's it. Just addition. No matrix inversion, no Kalman gain computation. The measurement information is simply added to the existing information.

Key insight: In the information form, incorporating measurements is just addition. This is because information from independent sources is additive. Two sensors that each contribute information simply add their contributions. This property makes the information filter ideal for multi-sensor fusion and for SLAM, where many landmark measurements must be combined efficiently.

The catch: the prediction step becomes expensive in the information form (it requires a matrix inversion), while in the moment form (μ, Σ) the prediction is cheap. So neither form is universally better — it depends on whether your bottleneck is prediction (use moments) or updates (use information).

Operation	Moment form (μ, Σ)	Information form (ξ, Ω)
Prediction	Cheap (matrix multiply)	Expensive (requires inversion)
Update	Moderate (Kalman gain)	Cheap (just addition)
Best for	Tracking with few sensors	SLAM, multi-sensor fusion

Check: What is the main advantage of the information form over the moment form?

Measurement updates are additive — just add new sensor information Predictions are faster It can represent multi-modal distributions

Chapter 8: Extended Information Filter

Just as the EKF extends the Kalman filter to nonlinear systems, the Extended Information Filter (EIF) extends the information filter. The same linearization trick applies: replace the constant matrices A and C with Jacobians G and H.

The EIF is particularly important for SLAM (Chapters 10-12). The information matrix Ω in a SLAM problem has a special structure: it is sparse. Most landmarks are not directly connected to each other — they only connect through the robot's pose. This sparsity makes storage and certain operations much cheaper.

Why sparsity matters: In the moment form (EKF-SLAM), the covariance matrix is dense — every landmark is correlated with every other landmark through the robot's pose uncertainty. In the information form (EIF-SLAM), the information matrix is sparse — only landmarks observed from the same pose are directly linked. This sparsity is the foundation of efficient SLAM algorithms like the Sparse Extended Information Filter (Chapter 12).

We'll revisit this in great detail in Chapters 11-12. For now, the key takeaway: the information form and the moment form are dual representations of the same Gaussian. The information form has computational advantages for problems with many measurements and sparse structure.

Check: Why is the information form (EIF) attractive for SLAM?

SLAM has only one landmark The information matrix is naturally sparse in SLAM, making storage and updates efficient The information form doesn't need linearization

Chapter 9: Comparison

Let's compare the four Gaussian filters we've seen:

Filter	Linearity	Representation	Prediction	Update
KF	Linear	Moments (μ, Σ)	O(n²)	O(kn²)
EKF	Nonlinear	Moments (μ, Σ)	O(n²) + Jacobian	O(kn²) + Jacobian
IF	Linear	Information (ξ, Ω)	O(n³)	O(k²n)
EIF	Nonlinear	Information (ξ, Ω)	O(n³)	O(k²n) + Jacobian

Where n is the state dimension and k is the measurement dimension. The moment form is cheaper for prediction-heavy problems. The information form is cheaper for update-heavy problems with sparse structure.

The bottom line: All Gaussian filters share the same limitation — they represent beliefs as single Gaussians. This means they cannot handle multi-modal beliefs (e.g., "I might be in the kitchen or the bedroom"). For that, we need nonparametric filters (Chapter 4): histograms or particles.

Check: When is the EKF preferred over the standard KF?

When the state space is discrete When the motion or measurement models are nonlinear When we need multi-modal beliefs

Chapter 10: Summary

Gaussian filters are the workhorse of probabilistic robotics. Their efficiency and mathematical elegance make them the first choice whenever the Gaussian assumption is reasonable.

Algorithm	When to use
Kalman Filter	Linear dynamics + linear sensing + Gaussian noise (gold standard)
Extended KF	Mildly nonlinear systems with small uncertainty
Information Filter	Multi-sensor fusion with linear models
Extended IF	SLAM and other problems with sparse information structure

What comes next: Chapter 4 introduces nonparametric filters that break free of the Gaussian assumption. Histogram filters discretize the state space; particle filters use random samples. These can represent arbitrary belief shapes — multiple peaks, heavy tails, irregular contours — at the cost of more computation.

"The Kalman filter is to estimation theory
what the transistor is to electronics."
— paraphrasing the community consensus

Check: What is the fundamental limitation shared by ALL Gaussian filters?

They can only represent unimodal (single-peaked) beliefs They require too much memory They don't work in real-time