The Complete Beginner's Path

Understand the Kalman Filter

The algorithm that guided Apollo to the Moon, helps your phone know where you are, and underpins the perception stack of modern self-driving cars.

Prerequisites: Basic algebra + Intuition for uncertainty. That's it.
11 Chapters · 10+ Simulations · 0 Assumed Knowledge

Chapter 0: Why Estimation?

Imagine you're tracking a ball rolling across a table. You have two sources of information: a physics model that predicts where the ball should be, and a camera that measures where it actually is. The problem? The physics model is approximate (friction, air resistance, imperfect initial conditions), and the camera is noisy (pixel jitter, lighting changes, motion blur).

Neither source alone is reliable. The physics model drifts over time. The camera jumps around randomly. But what if you could combine them in the smartest possible way? That's what the Kalman filter does: it fuses noisy predictions with noisy measurements to produce an estimate that's better than either alone.

The core idea: You have two uncertain guesses. The Kalman filter combines them, weighting each by how confident you are, to produce an optimal estimate. It's like asking two friends for directions — you trust the local more than the tourist.
Noisy Measurements vs Truth

The teal line is the true trajectory. The red dots are noisy camera measurements. Notice how scattered they are.

Check: Why can't we just use the raw camera measurements?

Chapter 1: Gaussians — The Language of Uncertainty

Everything the Kalman filter does is expressed in the language of Gaussian distributions (bell curves). A Gaussian is defined by just two numbers: the mean (your best guess) and the variance (how uncertain you are). A small variance means you're confident. A large variance means you're shrugging your shoulders.

Why Gaussians? Because they have a magical property: the product of two Gaussian bell curves is, after renormalizing, another Gaussian, and a linear transformation of a Gaussian stays Gaussian. This means the Kalman filter can fuse predictions and measurements using simple arithmetic: no expensive computations, no approximations. It's exact.

p(x) = (1 / √(2πσ²)) · exp( −(x − μ)² / (2σ²) )
Interactive Gaussian

Drag the sliders to change the mean (μ) and standard deviation (σ). Watch how the bell curve reshapes.

Key insight: Small σ = tall, narrow peak = "I'm very sure." Large σ = short, wide hump = "I have no idea." The Kalman filter is fundamentally about shrinking σ by combining information.
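The density formula above fits in a few lines of code; a minimal sketch showing how σ controls the peak height:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Evaluate the Gaussian density N(mu, sigma^2) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# Small sigma: tall, narrow peak ("I'm very sure").
# Large sigma: short, wide hump ("I have no idea").
print(gaussian_pdf(0.0, 0.0, 0.5))   # high peak at the mean
print(gaussian_pdf(0.0, 0.0, 2.0))   # low peak at the mean
```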
Check: What do the two parameters of a Gaussian represent?

Chapter 2: The State — What Are We Tracking?

Before we can filter anything, we need to decide: what are we tracking? For a ball rolling on a table, we care about its position and its velocity. Together, these form the state vector x. The state is not what we observe — it's what we believe about the world.

We also keep track of our uncertainty about the state: the covariance matrix P. If P is small, we're confident. If P is large, we're unsure. The Kalman filter updates both x and P at every step.

State Vector x
x = [position, velocity]ᵀ
Covariance P
P = [[σ²pos, σpv], [σpv, σ²vel]]
Together
A Gaussian belief: N(x, P)
Example (1D motion): If x = [5, 2], we believe the ball is at position 5 moving at speed 2. The covariance P tells us how confident we are in each of those numbers, and whether errors in position and velocity are correlated.
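In code, the belief from the example above is just a vector and a matrix; a sketch with illustrative variance and correlation values:

```python
import numpy as np

# State: position 5, velocity 2.
x = np.array([5.0, 2.0])

# Covariance: variance 1.0 in position, 0.5 in velocity, plus a small
# positive correlation between the two errors (illustrative numbers).
P = np.array([[1.0, 0.3],
              [0.3, 0.5]])

# Together they define the Gaussian belief N(x, P).
# Diagonal entries are variances; off-diagonals encode correlation.
sigma_pos = np.sqrt(P[0, 0])   # std dev of the position estimate
print(sigma_pos)               # 1.0
```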
Check: What does the state vector represent?

Chapter 3: The Predict Step

At each timestep, before we look at any measurement, we use physics to predict where the ball should be. If the ball was at position 5 with velocity 2, then after one timestep it should be at position 7. This is the predict step.

The state transition matrix F encodes our motion model. For constant velocity: new_position = old_position + velocity × dt. But our model isn't perfect, so we add process noise Q to account for things we can't model (wind, bumps, etc.). Uncertainty always grows during prediction.

x̂⁻ = F · x̂ + B · u       P⁻ = F · P · Fᵀ + Q
Intuition: Prediction is like dead reckoning. A sailor says "I was here, going this fast, so I must be there now." It works short-term but drifts without corrections. The longer you predict without measuring, the more uncertain you become.
Predict step grows uncertainty (blue), Update step shrinks it (green)
Symbol | Meaning
F | State transition matrix (physics model)
B | Control input matrix
u | Control input (e.g., applied force)
Q | Process noise covariance (model uncertainty)
P⁻ | Predicted covariance (uncertainty after predict)
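One predict step for the constant-velocity model can be sketched as follows (dt and the noise values are illustrative):

```python
import numpy as np

dt = 1.0
F = np.array([[1.0, dt],
              [0.0, 1.0]])         # constant-velocity physics
Q = np.diag([0.01, 0.01])          # process noise (illustrative)

x = np.array([5.0, 2.0])           # position 5, velocity 2
P = np.diag([1.0, 0.5])            # current uncertainty

x_pred = F @ x                     # position advances by velocity * dt
P_pred = F @ P @ F.T + Q           # uncertainty grows

print(x_pred)                      # [7. 2.]
print(np.trace(P_pred) > np.trace(P))  # True: prediction inflates uncertainty
```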
Check: What happens to uncertainty during the predict step?

Chapter 4: The Measurement

Now the camera takes a picture and says "I see the ball at position 7.3." But we know the camera is noisy — it might be off by a bit. The measurement z has its own uncertainty, captured by the measurement noise covariance R.

The observation matrix H maps our state to what the sensor actually sees. If we're tracking [position, velocity] but the camera only measures position, then H = [1, 0] — it picks out just the position part.

z = H · x + v      where v ~ N(0, R)
Example: Our state is [pos, vel] = [7, 2]. The camera measures position only, so H = [1, 0]. The true measurement would be 7, but the camera reports z = 7.3 because of noise with variance R.
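Simulating that camera reading takes one line of linear algebra plus a noise draw; a sketch (the noise level is illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)

x = np.array([7.0, 2.0])       # true state: [position, velocity]
H = np.array([[1.0, 0.0]])     # camera sees position only
R = np.array([[0.09]])         # measurement variance (std dev 0.3)

# z = H x + v, with v ~ N(0, R)
v = rng.normal(0.0, np.sqrt(R[0, 0]))
z = H @ x + v                  # a value near 7, perturbed by the noise draw

print(z)
```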
Measurement Noise

The teal line is the true position. Red dots are noisy measurements. Adjust R to see the effect.

Check: What does the matrix H do?

Chapter 5: The Kalman Gain

Here's the million-dollar question: how much should we trust the measurement versus our prediction? The Kalman gain K answers this. In the 1D case it's a number between 0 and 1 that acts as a trust slider; in general it's an n×m matrix playing the same role.

When your sensor is very accurate (R is small), K is large — you trust the measurement. When your prediction is very confident (P is small), K is small — you trust the prediction. The formula automatically balances these:

K = P⁻ Hᵀ (H P⁻ Hᵀ + R)⁻¹
K is a TRUST SLIDER. K near 1 = "I believe the sensor." K near 0 = "I believe my prediction." The beauty is that K is computed automatically from the uncertainties — you never have to tune it by hand.
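The 1D special case makes the trust-slider behavior easy to verify:

```python
def kalman_gain_1d(P_pred, R):
    """1D Kalman gain: fraction of trust placed in the measurement."""
    return P_pred / (P_pred + R)

print(kalman_gain_1d(2.0, 2.0))    # 0.5 -> equally torn
print(kalman_gain_1d(2.0, 0.01))   # ~1  -> near-perfect sensor, trust it
print(kalman_gain_1d(0.01, 2.0))   # ~0  -> confident prediction, ignore sensor
```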
Interactive Kalman Gain

Adjust measurement noise R and prediction uncertainty P to see how the Kalman gain K responds. (1D case: K = P / (P + R))

Check: If your sensor is very noisy (large R), what happens to K?

Chapter 6: The Update Step

Now we fuse the prediction and measurement. The innovation (or residual) is the difference between what we measured and what we expected: y = z − H x̂⁻. The Kalman gain K scales this innovation and adds it to our prediction:

x̂ = x̂⁻ + K(z − H x̂⁻)       P = (I − K H) P⁻

The updated covariance P is always smaller than the predicted covariance P⁻. This is the magic: every measurement, no matter how noisy, makes our estimate more certain. Information only accumulates; it is never destroyed.

Key insight: The posterior Gaussian is ALWAYS narrower than either the prediction or the measurement alone. Two uncertain estimates combined yield a more certain result. This is provably optimal for linear Gaussian systems.
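The fusion claim is easy to check in 1D, where the update reduces to shifting the mean toward the measurement and shrinking the variance:

```python
def fuse_1d(mu_pred, var_pred, z, var_meas):
    """1D Kalman update: fuse prediction N(mu_pred, var_pred) with measurement z."""
    K = var_pred / (var_pred + var_meas)   # Kalman gain
    mu = mu_pred + K * (z - mu_pred)       # shift toward the measurement
    var = (1 - K) * var_pred               # always shrinks
    return mu, var

# Same numbers as the sliders below: predict N(-1, 1.5^2), measure 1.0 with sigma 1.0.
mu, var = fuse_1d(mu_pred=-1.0, var_pred=1.5**2, z=1.0, var_meas=1.0**2)
print(mu, var)   # posterior mean sits between -1 and 1; variance beats both inputs
```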
Gaussian Fusion

The blue Gaussian is the prediction. The red is the measurement. The green is the fused result — always the narrowest!

Check: After the update step, how does the uncertainty compare?

Chapter 7: The Full Algorithm

That's it. The entire Kalman filter is just two steps repeated forever: Predict (use physics to guess forward, uncertainty grows) then Update (incorporate a measurement, uncertainty shrinks). The predict-update cycle is the heartbeat of the filter.

Initialize
Set x̂ (initial guess) and P (initial uncertainty)
Predict
x̂⁻ = F x̂ + B u    P⁻ = F P Fᵀ + Q
Kalman Gain
K = P⁻ Hᵀ (H P⁻ Hᵀ + R)⁻¹
Update
x̂ = x̂⁻ + K(z − H x̂⁻)    P = (I − KH) P⁻
↓ repeat

The Complete KF in Python

Python
import numpy as np

def kalman_filter(x, P, F, H, Q, R, measurements):
    """Run a Kalman filter over a sequence of measurements."""
    estimates = []
    for z in measurements:
        # ── Predict ──
        x_pred = F @ x                    # state prediction
        P_pred = F @ P @ F.T + Q          # covariance prediction

        # ── Update ──
        y = z - H @ x_pred                # innovation (residual)
        S = H @ P_pred @ H.T + R          # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
        x = x_pred + K @ y                # updated state
        P = (np.eye(len(x)) - K @ H) @ P_pred  # updated covariance

        estimates.append(x.copy())
    return estimates
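A quick smoke test of the filter on simulated 1D constant-velocity motion; the matrices and noise levels here are illustrative, and the function is repeated so the snippet runs standalone:

```python
import numpy as np

def kalman_filter(x, P, F, H, Q, R, measurements):
    # Same function as the listing above, repeated for a self-contained run.
    estimates = []
    for z in measurements:
        x_pred = F @ x
        P_pred = F @ P @ F.T + Q
        y = z - H @ x_pred
        S = H @ P_pred @ H.T + R
        K = P_pred @ H.T @ np.linalg.inv(S)
        x = x_pred + K @ y
        P = (np.eye(len(x)) - K @ H) @ P_pred
        estimates.append(x.copy())
    return estimates

rng = np.random.default_rng(0)
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity model
H = np.array([[1.0, 0.0]])              # observe position only
Q = np.diag([1e-4, 1e-4])               # small process noise (illustrative)
R = np.array([[0.25]])                  # camera variance (std dev 0.5)

# Ground truth: start at 0 with velocity 1; camera adds N(0, 0.25) noise.
measurements = [np.array([t * 1.0]) + rng.normal(0.0, 0.5, size=1)
                for t in range(20)]

estimates = kalman_filter(np.array([0.0, 0.0]), np.eye(2) * 10.0,
                          F, H, Q, R, measurements)
print(estimates[-1])   # final [position, velocity], near [19, 1]
```

Note that the filter recovers the velocity even though the camera never measures it directly: the off-diagonal terms of P couple position evidence to the velocity estimate.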

Variable Reference

Variable | Shape | Meaning
x | n×1 | State estimate (what we believe)
P | n×n | State covariance (our uncertainty)
F | n×n | State transition (physics model)
H | m×n | Observation matrix (state → measurement)
Q | n×n | Process noise (model uncertainty)
R | m×m | Measurement noise (sensor uncertainty)
K | n×m | Kalman gain (trust slider)
z | m×1 | Measurement vector (what we observe)
Check: In what order do the two main steps execute?

Chapter 8: Track a Moving Object

Time to see the Kalman filter in action. Below, a ball bounces around the canvas. The gray circle is the true ball. The red dots are noisy camera measurements. The green circle is the Kalman filter's estimate, and the green ellipse shows the uncertainty.

Play with the sliders to see what happens when the sensor is noisy or the model is uncertain. Watch how the filter smoothly tracks the ball even when individual measurements are wildly off.

Live Kalman Tracker
Experiment: Crank R way up (noisy sensor). Notice the green estimate barely flinches — it trusts the model. Now crank R way down (perfect sensor). The estimate snaps to each measurement. That's the Kalman gain at work.
Check: When you increase measurement noise R, the Kalman filter...

Chapter 9: Sensor Fusion

Real systems rarely have just one sensor. Your phone has GPS, an accelerometer, a gyroscope, a barometer, and Wi-Fi signal strength — all measuring different things with different noise characteristics. The Kalman filter handles this naturally: each sensor provides a measurement z with its own H and R, and the filter fuses them all.

Below, two sensors track a 1D target. Sensor A (like GPS) has low update rate but decent accuracy. Sensor B (like an accelerometer) updates fast but drifts. The Kalman filter combines both to outperform either alone.

Two-Sensor Fusion
Why not just average? Averaging treats both sensors equally. The Kalman filter weights them by their precision: a sensor with half the variance gets twice the influence. Under the linear-Gaussian assumptions this weighting is provably optimal; no unbiased estimator achieves lower variance.
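The precision-weighting claim can be seen directly in the 1D static case, fusing two readings of the same quantity:

```python
def fuse_two_sensors(z_a, var_a, z_b, var_b):
    """Precision-weighted fusion of two 1D readings of the same quantity."""
    w_a = (1 / var_a) / (1 / var_a + 1 / var_b)   # weight proportional to precision
    mu = w_a * z_a + (1 - w_a) * z_b
    var = 1 / (1 / var_a + 1 / var_b)             # fused variance beats both inputs
    return mu, var

# Sensor A has a quarter the variance of B, so it gets 4x the weight.
mu, var = fuse_two_sensors(z_a=10.0, var_a=1.0, z_b=14.0, var_b=4.0)
print(mu, var)   # ~10.8, 0.8: closer to A, and more certain than either sensor
```

A plain average would have returned 12.0 with variance 1.25, worse than sensor A alone.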
Check: When fusing two sensors, the Kalman filter weights each sensor by...

Chapter 10: Limitations & Beyond

The Kalman filter is optimal — but only under two assumptions: the system is linear and the noise is Gaussian. What if your system is nonlinear (a rocket re-entering the atmosphere, a car turning)? What if the noise distribution has heavy tails or multiple modes?

Several extensions address these limitations. Each trades off accuracy against computational cost. The original KF remains the gold standard when its assumptions hold — which is surprisingly often in practice.

Filter | Handles | How | Cost
KF | Linear + Gaussian | Exact equations | Very low
EKF | Mildly nonlinear | Linearize with Jacobians | Low
UKF | More nonlinear | Sigma points (no Jacobians) | Medium
Particle | Any distribution | Monte Carlo sampling | High
Where from here? The Kalman filter is a gateway to a huge family of estimation techniques. If you understand predict→update and the idea of fusing uncertain information, you have the conceptual foundation for all of them.
In the wild: Apollo navigation (1960s), GPS receivers, drone autopilots, self-driving cars (sensor fusion), financial time series, weather prediction, robot SLAM, phone orientation tracking — the Kalman filter is everywhere.
"An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem."
— John Tukey

You now understand optimal estimation. Every sensor reading you encounter is noisy. Now you know how to see through the noise.