The algorithm that guided Apollo to the Moon also helps your phone know where you are, and forms the backbone of every self-driving car's perception system.
Imagine you're tracking a ball rolling across a table. You have two sources of information: a physics model that predicts where the ball should be, and a camera that measures where it actually is. The problem? The physics model is approximate (friction, air resistance, imperfect initial conditions), and the camera is noisy (pixel jitter, lighting changes, motion blur).
Neither source alone is reliable. The physics model drifts over time. The camera jumps around randomly. But what if you could combine them in the smartest possible way? That's what the Kalman filter does: it fuses noisy predictions with noisy measurements to produce an estimate that's better than either alone.
The teal line is the true trajectory. The red dots are noisy camera measurements. Notice how scattered they are.
Everything the Kalman filter does is expressed in the language of Gaussian distributions (bell curves). A Gaussian is defined by just two numbers: the mean (your best guess) and the variance (how uncertain you are). A small variance means you're confident. A large variance means you're shrugging your shoulders.
Why Gaussians? Because they have a magical property: the product of two Gaussians is (up to a scaling factor) another Gaussian. This means the Kalman filter can fuse predictions and measurements using simple arithmetic on means and variances — no expensive computations, no approximations. It's exact.
Drag the sliders to change the mean (μ) and standard deviation (σ). Watch how the bell curve reshapes.
Before we can filter anything, we need to decide: what are we tracking? For a ball rolling on a table, we care about its position and its velocity. Together, these form the state vector x. The state is not what we observe — it's what we believe about the world.
We also keep track of our uncertainty about the state: the covariance matrix P. If P is small, we're confident. If P is large, we're unsure. The Kalman filter updates both x and P at every step.
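As a concrete sketch in NumPy (the numbers here are illustrative, not from any real sensor), the state vector and covariance matrix for the rolling ball might be set up like this:

```python
import numpy as np

# State vector x: what we believe — position and velocity.
x = np.array([[5.0],   # position
              [2.0]])  # velocity

# Covariance matrix P: how uncertain we are about each component.
# Large diagonal entries = low confidence; off-diagonal entries
# capture correlation between position and velocity errors.
P = np.array([[10.0,  0.0],
              [ 0.0, 10.0]])

print(x.shape, P.shape)  # (2, 1) (2, 2)
```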
At each timestep, before we look at any measurement, we use physics to predict where the ball should be. If the ball was at position 5 with velocity 2, then after one timestep it should be at position 7. This is the predict step.
The state transition matrix F encodes our motion model. For constant velocity: new_position = old_position + velocity × dt. But our model isn't perfect, so we add process noise Q to account for things we can't model (wind, bumps, etc.). Uncertainty always grows during prediction.
| Symbol | Meaning |
|---|---|
| F | State transition matrix (physics model) |
| B | Control input matrix |
| u | Control input (e.g., applied force) |
| Q | Process noise covariance (model uncertainty) |
| P¯ | Predicted covariance (uncertainty after predict) |
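A minimal sketch of the predict step in NumPy, using the constant-velocity model above (dt and the noise values are chosen for illustration):

```python
import numpy as np

dt = 1.0
F = np.array([[1.0, dt],
              [0.0, 1.0]])      # constant velocity: pos += vel * dt
Q = np.array([[0.1, 0.0],
              [0.0, 0.1]])      # process noise (illustrative values)

x = np.array([[5.0], [2.0]])    # position 5, velocity 2
P = np.eye(2)                   # current uncertainty

x_pred = F @ x                  # position becomes 5 + 2*1 = 7
P_pred = F @ P @ F.T + Q        # uncertainty grows during prediction

print(x_pred.ravel())  # [7. 2.]
```

Note that the trace of `P_pred` is always larger than that of `P` here: adding Q guarantees the prediction is less certain than the previous estimate.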
Now the camera takes a picture and says "I see the ball at position 7.3." But we know the camera is noisy — it might be off by a bit. The measurement z has its own uncertainty, captured by the measurement noise covariance R.
The observation matrix H maps our state to what the sensor actually sees. If we're tracking [position, velocity] but the camera only measures position, then H = [1, 0] — it picks out just the position part.
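In code, H is just a matrix multiply that projects the state into measurement space (a sketch with made-up numbers):

```python
import numpy as np

H = np.array([[1.0, 0.0]])   # camera measures position only

x = np.array([[7.0],         # position
              [2.0]])        # velocity

z_expected = H @ x           # what the sensor *should* report
print(z_expected)            # [[7.]]
```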
The teal line is the true position. Red dots are noisy measurements. Adjust R to see the effect.
Here's the million-dollar question: how much should we trust the measurement vs our prediction? The Kalman gain K answers this. In the 1D case it's a number between 0 and 1 that acts as a trust slider (in the general case, it's a matrix playing the same role).
When your sensor is very accurate (R is small), K is large — you trust the measurement. When your prediction is very confident (P is small), K is small — you trust the prediction. The formula automatically balances these:
Adjust measurement noise R and prediction uncertainty P to see how the Kalman gain K responds. (1D case: K = P / (P + R))
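The 1D formula above is short enough to sketch directly (the example values are arbitrary):

```python
def kalman_gain_1d(P, R):
    """1D Kalman gain: fraction of trust placed in the measurement."""
    return P / (P + R)

# Accurate sensor (small R): trust the measurement heavily.
print(kalman_gain_1d(P=4.0, R=1.0))   # 0.8
# Confident prediction (small P): mostly ignore the measurement.
print(kalman_gain_1d(P=1.0, R=4.0))   # 0.2
```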
Now we fuse the prediction and measurement. The innovation (or residual) is the difference between what we measured and what we expected: y = z − H x̂¯. The Kalman gain K scales this innovation and adds it to our prediction:
The updated covariance P is always smaller than the predicted covariance P¯. This is the magic: every measurement, no matter how noisy, makes our estimate more certain. Information only accumulates; it is never destroyed.
The blue Gaussian is the prediction. The red is the measurement. The green is the fused result — always the narrowest!
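The 1D version of this fusion is just a few lines. A sketch (the prediction and measurement values are invented for illustration) showing that the fused variance is always smaller than either input's:

```python
def fuse_gaussians(mu1, var1, mu2, var2):
    """Fuse two 1D Gaussians; the result is another Gaussian."""
    K = var1 / (var1 + var2)      # same form as the 1D Kalman gain
    mu = mu1 + K * (mu2 - mu1)    # gain-weighted blend of the means
    var = (1 - K) * var1          # always smaller than var1 and var2
    return mu, var

# Prediction N(7.0, 2.0) fused with measurement N(7.3, 1.0):
mu, var = fuse_gaussians(7.0, 2.0, 7.3, 1.0)
print(mu, var)   # mean lands between 7.0 and 7.3; variance drops below 1.0
```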
That's it. The entire Kalman filter is just two steps repeated forever: Predict (use physics to guess forward, uncertainty grows) then Update (incorporate a measurement, uncertainty shrinks). The predict-update cycle is the heartbeat of the filter.
```python
import numpy as np

def kalman_filter(x, P, F, H, Q, R, measurements):
    """Run a Kalman filter over a sequence of measurements."""
    estimates = []
    for z in measurements:
        # ── Predict ──
        x_pred = F @ x                        # state prediction
        P_pred = F @ P @ F.T + Q              # covariance prediction

        # ── Update ──
        y = z - H @ x_pred                    # innovation (residual)
        S = H @ P_pred @ H.T + R              # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
        x = x_pred + K @ y                    # updated state
        P = (np.eye(len(x)) - K @ H) @ P_pred # updated covariance

        estimates.append(x.copy())
    return estimates
```
| Variable | Shape | Meaning |
|---|---|---|
| x | n×1 | State estimate (what we believe) |
| P | n×n | State covariance (our uncertainty) |
| F | n×n | State transition (physics model) |
| H | m×n | Observation matrix (state → measurement) |
| Q | n×n | Process noise (model uncertainty) |
| R | m×m | Measurement noise (sensor uncertainty) |
| K | n×m | Kalman gain (trust slider) |
| z | m×1 | Measurement vector (what we observe) |
Time to see the Kalman filter in action. Below, a ball bounces around the canvas. The gray circle is the true ball. The red dots are noisy camera measurements. The green circle is the Kalman filter's estimate, and the green ellipse shows the uncertainty.
Play with the sliders to see what happens when the sensor is noisy or the model is uncertain. Watch how the filter smoothly tracks the ball even when individual measurements are wildly off.
Real systems rarely have just one sensor. Your phone has GPS, an accelerometer, a gyroscope, a barometer, and Wi-Fi signal strength — all measuring different things with different noise characteristics. The Kalman filter handles this naturally: each sensor provides a measurement z with its own H and R, and the filter fuses them all.
Below, two sensors track a 1D target. Sensor A (like GPS) has low update rate but decent accuracy. Sensor B (like an accelerometer) updates fast but drifts. The Kalman filter combines both to outperform either alone.
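Sequential fusion is a natural way to sketch this: each sensor simply triggers its own update with its own H and R. Below is a minimal illustration (the sensor labels, measurements, and noise values are all invented for this example):

```python
import numpy as np

def update(x, P, z, H, R):
    """One Kalman update step; works for any sensor's H and R."""
    y = z - H @ x
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ y
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# Track [position, velocity]; both sensors here measure position,
# but with very different noise levels.
x = np.array([[0.0], [0.0]])
P = np.eye(2) * 10.0

H = np.array([[1.0, 0.0]])
z_a, R_a = np.array([[5.1]]), np.array([[0.5]])   # accurate sensor
z_b, R_b = np.array([[4.2]]), np.array([[4.0]])   # noisy sensor

x, P = update(x, P, z_a, H, R_a)   # fuse sensor A
x, P = update(x, P, z_b, H, R_b)   # then fuse sensor B
print(x[0, 0], P[0, 0])            # estimate near 5; uncertainty shrunk
```

Because each update only shrinks the covariance, every extra sensor — even a noisy one — tightens the estimate.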
The Kalman filter is optimal — but only under two assumptions: the system is linear and the noise is Gaussian. What if your system is nonlinear (a rocket re-entering the atmosphere, a car turning)? What if the noise distribution has heavy tails or multiple modes?
Several extensions address these limitations. Each trades off accuracy against computational cost. The original KF remains the gold standard when its assumptions hold — which is surprisingly often in practice.
| Filter | Handles | How | Cost |
|---|---|---|---|
| KF | Linear + Gaussian | Exact equations | Very low |
| EKF | Mildly nonlinear | Linearize with Jacobians | Low |
| UKF | More nonlinear | Sigma points (no Jacobians) | Medium |
| Particle | Any distribution | Monte Carlo sampling | High |
You now understand optimal estimation. Every sensor reading you encounter is noisy. Now you know how to see through the noise.