Beyond Linearity

Understand the Extended Kalman Filter

The Kalman filter assumes straight lines. Real systems curve, twist, and wrap around. The EKF handles all of that by approximating curves with tangent lines at each step.

Prerequisites: Kalman Filter basics + Derivatives (calculus I). Familiarity with matrices helps.
9 Chapters · 6+ Simulations · 1 Live Robot Demo

Chapter 0: When the World Is Nonlinear

The standard Kalman filter is beautiful — but it makes a critical assumption: the system must be linear. That means the next state is a matrix times the current state, and measurements are a matrix times the state. In math: x' = Fx and z = Hx.

Real systems almost never obey this. A radar station measures range and bearing to an aircraft — that's a square root and an arctangent, not a matrix multiply. A robot turns by an angle and drives forward — that involves sines and cosines. A satellite re-entering the atmosphere experiences exponential drag. All nonlinear.

The problem: The Kalman filter's predict and update equations assume linearity. If you plug nonlinear functions into F and H, the math breaks — Gaussians pushed through nonlinear functions are no longer Gaussian. We need a workaround.
Linear vs Nonlinear Transformation

The blue Gaussian is the input. Watch how a linear function preserves the bell shape, but a nonlinear function distorts it. Toggle between them.

Examples of nonlinear systems in the wild:

System              | Nonlinearity
Radar tracking      | Range = sqrt(x² + y²), bearing = atan2(y, x)
Robot navigation    | Motion involves sin(θ), cos(θ)
Spacecraft re-entry | Drag is exponential in altitude
Pendulum            | Restoring force is sin(θ), not θ
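The claim that a Gaussian stops being Gaussian under a nonlinear map is easy to check numerically. A minimal sketch (assuming NumPy; the maps and numbers are illustrative, not from any of the systems above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Gaussian input: samples around x = 2 with standard deviation 0.8
x = rng.normal(2.0, 0.8, 100_000)

# A linear map keeps the output Gaussian; a radar-style range does not
linear = 3.0 * x + 1.0
nonlinear = np.sqrt(x**2 + 4.0)

def skewness(s):
    """Third standardized moment: approximately 0 for a Gaussian."""
    return float(np.mean(((s - s.mean()) / s.std())**3))

print(f"linear skew:    {skewness(linear):+.3f}")     # near zero
print(f"nonlinear skew: {skewness(nonlinear):+.3f}")  # clearly nonzero
```

The linear output keeps the input's (near-zero) skew exactly; the square root bends the bell shape and leaves a measurably skewed distribution.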
Check: Why can't we use a standard Kalman filter for radar tracking?

Chapter 1: Linearization — Tangent Lines to the Rescue

The key idea of the EKF is simple: if the function is curved, approximate it with a straight line at the current operating point. This is exactly what a first-order Taylor expansion does. Near any point, a smooth curve looks like its tangent line.

For a 1D function f(x), the linearization around point a is: f(x) ≈ f(a) + f'(a)(x - a). The derivative f'(a) is the slope of the tangent line. For multi-variable functions, we use the Jacobian matrix instead of a simple derivative.

f(x) ≈ f(a) + f'(a) · (x − a)
Intuition: Imagine you're standing on a hillside and the fog is thick. You can only see the ground right at your feet. The slope you feel under your boots is the derivative. You assume the hill continues at that slope — that's linearization. It works well nearby but fails for distant points.
Function and Tangent Line

Drag the slider to move the linearization point along the curve. The orange line is the tangent. Notice how it matches well nearby but diverges further away.

The EKF strategy: At each timestep, linearize the nonlinear functions around the current best estimate. Then run the normal Kalman filter equations using these linear approximations. Re-linearize at the next step around the new estimate.
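The formula above can be checked in a few lines. A sketch using sin as the curved function (the choice of function and points is illustrative):

```python
import math

def f(x):
    return math.sin(x)                # a smooth nonlinear function

def tangent(x, a):
    """First-order Taylor expansion: f(a) + f'(a)(x - a)."""
    return math.sin(a) + math.cos(a) * (x - a)

a = 1.0
errors = []
for dx in (0.1, 0.5, 1.5):
    x = a + dx
    errors.append(abs(f(x) - tangent(x, a)))
    print(f"distance {dx}: tangent-line error {errors[-1]:.4f}")
```

The error is tiny near the linearization point and grows quickly with distance, which is exactly the behavior the slider demo shows.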
Check: Linearization approximates a curve with...

Chapter 2: Jacobians — Derivatives for Vectors

When your function takes a vector in and produces a vector out, a single derivative number isn't enough. You need the Jacobian matrix: a matrix where each entry is a partial derivative. Row i, column j contains ∂fᵢ/∂xⱼ — how much the i-th output changes when you wiggle the j-th input.

J = ∂f / ∂x = [ [∂f₁/∂x₁, ∂f₁/∂x₂], [∂f₂/∂x₁, ∂f₂/∂x₂] ]

The Jacobian is the multi-dimensional version of "the slope." It tells you the best linear approximation of a vector function near a point. In the EKF, we compute two Jacobians at every step:

FJ — Process Jacobian
∂f / ∂x evaluated at current estimate
HJ — Measurement Jacobian
∂h / ∂x evaluated at predicted state
Example: If f(x, y) = [x² + y, sin(x)], the Jacobian is [[2x, 1], [cos(x), 0]]. Each entry is a partial derivative. At the point (1, 3), the Jacobian evaluates to [[2, 1], [0.54, 0]].
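A sketch that verifies this worked example numerically: the analytic Jacobian should match a finite-difference approximation (the `jacobian_numeric` helper is an assumption for illustration, not part of the lesson):

```python
import numpy as np

def f(v):
    x, y = v
    return np.array([x**2 + y, np.sin(x)])

def jacobian_analytic(v):
    x, _ = v
    return np.array([[2 * x, 1.0],
                     [np.cos(x), 0.0]])

def jacobian_numeric(func, v, eps=1e-6):
    """Finite differences: wiggle input j, record the change in every output."""
    v = np.asarray(v, dtype=float)
    m = len(func(v))
    J = np.zeros((m, len(v)))
    for j in range(len(v)):
        d = np.zeros_like(v)
        d[j] = eps
        J[:, j] = (func(v + d) - func(v - d)) / (2 * eps)
    return J

p = np.array([1.0, 3.0])
print(jacobian_analytic(p))       # [[2, 1], [0.5403, 0]], matching the text
print(jacobian_numeric(f, p))     # agrees to many decimal places
```

Finite differences are also a practical sanity check for hand-derived Jacobians in real EKF code.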
Jacobian Explorer

For f(x) = [r · cos(θ), r · sin(θ)] (polar to Cartesian), explore how the Jacobian changes with θ.

Check: What does entry (i, j) of the Jacobian matrix represent?

Chapter 3: EKF Predict

In the standard KF, the predict step is: x̄ = Fx. The state transition is a matrix multiply. In the EKF, we replace this with the actual nonlinear function f(x). But for the covariance, we still need a matrix — so we use the Jacobian FJ evaluated at the current estimate.

x̄ = f(x̂)       P̄ = FJ · P · FJᵀ + Q
Key difference from KF: The state is propagated through the true nonlinear function f(x). Only the covariance uses the linearized Jacobian FJ. This gives us the best of both worlds: accurate state prediction + tractable uncertainty propagation.

Side by Side: KF vs EKF Predict

           | Kalman Filter    | Extended Kalman Filter
State      | x̄ = F · x        | x̄ = f(x)
Covariance | P̄ = F P Fᵀ + Q   | P̄ = FJ P FJᵀ + Q
F meaning  | Constant matrix  | FJ = ∂f/∂x at current x

EKF Predict in Code

Python
def ekf_predict(x, P, f, F_jacobian, Q):
    """EKF predict step."""
    x_pred = f(x)                        # nonlinear state transition
    F_j = F_jacobian(x)                  # Jacobian at current estimate
    P_pred = F_j @ P @ F_j.T + Q        # covariance propagation
    return x_pred, P_pred
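A hypothetical usage sketch (the model and every number are illustrative, not from the lesson): a state [position, velocity] where quadratic drag makes the transition nonlinear. The predict steps are inlined, mirroring ekf_predict above, so the snippet runs on its own:

```python
import numpy as np

dt, c = 0.1, 0.05    # timestep and drag coefficient (illustrative values)

def f(x):
    """State [position, velocity]; quadratic drag slows the velocity."""
    p, v = x
    return np.array([p + v * dt, v - c * v**2 * dt])

def F_jacobian(x):
    _, v = x
    return np.array([[1.0, dt],
                     [0.0, 1.0 - 2 * c * v * dt]])

x = np.array([0.0, 10.0])
P = np.eye(2) * 0.5
Q = np.eye(2) * 0.01

# Same steps as ekf_predict above
x_pred = f(x)
F_j = F_jacobian(x)
P_pred = F_j @ P @ F_j.T + Q

print(x_pred)   # [1.  9.5]: position advances, drag removes 0.5 from velocity
```

Note the split: the state goes through the true nonlinear f, while the covariance only ever sees the Jacobian.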
Check: In the EKF predict, the state is propagated using...

Chapter 4: EKF Update

Same story for the update step. In the KF, the expected measurement is z̄ = Hx. In the EKF, we use the nonlinear measurement function h(x) to compute what we expect to see. But the Kalman gain uses the Jacobian HJ to propagate uncertainty.

y = z − h(x̄)      S = HJ P̄ HJᵀ + R      K = P̄ HJᵀ S⁻¹
x̂ = x̄ + K y      P = (I − K HJ) P̄
Why h(x) matters: Consider a radar measuring range r = sqrt(x² + y²). If the predicted state is (3, 4), the expected range is h(x) = 5, not [some matrix] · [3, 4]. The nonlinear function gives the right expected measurement; the Jacobian handles the uncertainty math around it.

Side by Side: KF vs EKF Update

            | Kalman Filter    | Extended Kalman Filter
Innovation  | y = z − H x̄      | y = z − h(x̄)
Innov. cov. | S = H P̄ Hᵀ + R   | S = HJ P̄ HJᵀ + R
Gain        | K = P̄ Hᵀ S⁻¹     | K = P̄ HJᵀ S⁻¹
H meaning   | Constant matrix  | HJ = ∂h/∂x at predicted x̄

EKF Update in Code

Python
import numpy as np

def ekf_update(x_pred, P_pred, z, h, H_jacobian, R):
    """EKF update step."""
    H_j = H_jacobian(x_pred)             # Jacobian at predicted state
    y = z - h(x_pred)                    # innovation
    S = H_j @ P_pred @ H_j.T + R        # innovation covariance
    K = P_pred @ H_j.T @ np.linalg.inv(S)  # Kalman gain
    x = x_pred + K @ y                  # updated state
    P = (np.eye(len(x)) - K @ H_j) @ P_pred
    return x, P
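A usage sketch with the radar-range model from the callout above (the measurement z = 5.2 is illustrative); the update steps are inlined, mirroring ekf_update above, so the snippet runs on its own:

```python
import numpy as np

def h(x):
    """Radar range to the origin: sqrt(x² + y²)."""
    return np.array([np.hypot(x[0], x[1])])

def H_jacobian(x):
    r = np.hypot(x[0], x[1])
    return np.array([[x[0] / r, x[1] / r]])

x_pred = np.array([3.0, 4.0])        # predicted position
P_pred = np.eye(2)
R = np.array([[0.1]])
z = np.array([5.2])                  # measured range (slightly long)

# Same steps as ekf_update above
H_j = H_jacobian(x_pred)
y = z - h(x_pred)                    # innovation: 5.2 - 5.0 = 0.2
S = H_j @ P_pred @ H_j.T + R
K = P_pred @ H_j.T @ np.linalg.inv(S)
x_new = x_pred + K @ y

print(h(x_pred))                      # [5.]: expected measurement via nonlinear h
print(np.hypot(x_new[0], x_new[1]))   # range of updated state moves toward 5.2
```

The expected measurement comes from the true h, exactly as the callout argues; the Jacobian only enters through S and K.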
Check: In the EKF update, the innovation y is computed as...

Chapter 5: Computing Jacobians in Practice

Let's work through a concrete example: a robot at position (x, y) with heading θ observes a landmark at (lx, ly). The measurement is range and bearing:

h(x) = [ sqrt((lx−x)² + (ly−y)²) , atan2(ly−y, lx−x) − θ ]

To get HJ, we take partial derivatives with respect to x, y, and θ. Let dx = lx − x, dy = ly − y, and q = dx² + dy²:

HJ = [ [−dx/√q, −dy/√q, 0], [dy/q, −dx/q, −1] ]
Step by step:
1. ∂range/∂x = −dx / sqrt(q)  (chain rule on the square root)
2. ∂range/∂y = −dy / sqrt(q)
3. ∂range/∂θ = 0  (range doesn't depend on heading)
4. ∂bearing/∂x = dy / q  (derivative of atan2)
5. ∂bearing/∂y = −dx / q
6. ∂bearing/∂θ = −1  (bearing is relative to heading)
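These six derivatives can be sanity-checked with finite differences; a sketch (the robot pose and landmark position are illustrative):

```python
import numpy as np

def h(state, lm):
    """Range and bearing from robot [x, y, theta] to landmark [lx, ly]."""
    dx, dy = lm[0] - state[0], lm[1] - state[1]
    return np.array([np.hypot(dx, dy),
                     np.arctan2(dy, dx) - state[2]])

def H_jacobian(state, lm):
    dx, dy = lm[0] - state[0], lm[1] - state[1]
    q = dx**2 + dy**2
    sq = np.sqrt(q)
    return np.array([[-dx / sq, -dy / sq,  0.0],
                     [ dy / q,  -dx / q,  -1.0]])

state = np.array([1.0, 2.0, 0.3])
lm = np.array([4.0, 6.0])            # dx = 3, dy = 4, so q = 25

# Finite-difference check of the analytic Jacobian
eps = 1e-6
J_num = np.zeros((2, 3))
for j in range(3):
    d = np.zeros(3)
    d[j] = eps
    J_num[:, j] = (h(state + d, lm) - h(state - d, lm)) / (2 * eps)

print(H_jacobian(state, lm))   # first row: [-0.6, -0.8, 0]
print(np.allclose(H_jacobian(state, lm), J_num, atol=1e-5))
```

With dx = 3, dy = 4 the range row is [−0.6, −0.8, 0] and the bearing row is [0.16, −0.12, −1], matching the formula above.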
Range-Bearing Jacobian Visualizer

Move the robot (drag the slider) and see how the Jacobian entries change. The landmark is fixed at the orange star.


Process Model Jacobian (Robot Motion)

For a differential-drive robot with state [x, y, θ] and control [v, ω]:

f(x) = [x + v·cos(θ)·dt,   y + v·sin(θ)·dt,   θ + ω·dt]
FJ = [ [1, 0, −v·sin(θ)·dt], [0, 1, v·cos(θ)·dt], [0, 0, 1] ]
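A sketch evaluating this motion model and its Jacobian at a sample pose (timestep, pose, and control values are illustrative):

```python
import numpy as np

dt = 0.1

def f(x, u):
    """Differential-drive motion: state [x, y, theta], control [v, omega]."""
    v, w = u
    return np.array([x[0] + v * np.cos(x[2]) * dt,
                     x[1] + v * np.sin(x[2]) * dt,
                     x[2] + w * dt])

def F_jacobian(x, u):
    v, _ = u
    return np.array([[1.0, 0.0, -v * np.sin(x[2]) * dt],
                     [0.0, 1.0,  v * np.cos(x[2]) * dt],
                     [0.0, 0.0,  1.0]])

x = np.array([0.0, 0.0, np.pi / 4])   # facing 45 degrees
u = np.array([2.0, 0.5])              # forward speed 2, turn rate 0.5

print(f(x, u))          # advances ~0.141 in both x and y, turns by 0.05 rad
print(F_jacobian(x, u))
```

The only non-trivial entries are in the θ column: heading error bleeds into position through −v·sin(θ)·dt and v·cos(θ)·dt.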
Check: Why is ∂range/∂θ = 0?

Chapter 6: EKF Robot Localization

This is the showcase. A robot moves in 2D space with 3 known landmarks. It can only measure the bearing (angle) to each landmark — not the distance. The EKF estimates the robot's position and heading from these bearing-only measurements.

Use the buttons to drive the robot. Watch the green uncertainty ellipse shrink as the filter incorporates measurements. The blue lines show bearing measurements to landmarks.

EKF Robot Localization
Observe: When the robot is far from landmarks, the ellipse is large. When it moves to a spot where the landmarks are spread out in different directions, the ellipse shrinks rapidly — the geometry of the landmarks matters. This is called observability.
Check: Why does the uncertainty ellipse shrink faster when landmarks are spread out?

Chapter 7: Consistency & Divergence

The EKF is an approximation. It assumes the nonlinear function is well-described by its tangent line within the region of uncertainty. When this assumption breaks, the filter can diverge: the estimated covariance shrinks (the filter thinks it's certain), but the actual error grows. The filter becomes overconfident and wrong.

When the EKF breaks:
1. High curvature: If the function bends sharply within one standard deviation, the tangent-line approximation is poor.
2. Wrong linearization point: If the estimate is far from truth, the Jacobian is evaluated at the wrong point.
3. Large uncertainty: The tangent line only works "locally." If P is huge, the Gaussian spreads over a region where the tangent is a bad fit.
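Point 3 is easy to quantify: compare the EKF's linearized mean against a Monte Carlo ground truth as the input spread grows. A sketch (the function and numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def g(x):
    return np.sin(x)     # curvature within one sigma breaks the tangent line

mu = 1.0
mean_errors = []
for sigma in (0.1, 0.5, 1.5):
    ekf_mean = g(mu)                       # EKF: push the mean through g
    ekf_var = (np.cos(mu) * sigma)**2      # EKF: scale variance by the slope²
    s = g(rng.normal(mu, sigma, 200_000))  # ground truth by sampling
    mean_errors.append(abs(s.mean() - ekf_mean))
    print(f"sigma={sigma}: mean error {mean_errors[-1]:.4f}, "
          f"var error {abs(s.var() - ekf_var):.4f}")
```

For small sigma the tangent-line moments are excellent; as sigma grows, the Gaussian spreads over the curved part of sin and the linearized mean and variance drift away from the truth.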
EKF Divergence Demo

Increase the nonlinearity to see how the tangent-line approximation degrades. The teal region is the true transformed distribution (sampled). The orange ellipse is the EKF's Gaussian approximation.


This is precisely why the Unscented Kalman Filter (UKF) was invented. Instead of linearizing, the UKF passes carefully chosen sample points through the true nonlinear function, then reconstructs a Gaussian from the results. No Jacobians required, and it captures more of the curvature.

Aspect            | EKF                                | UKF
Approach          | Linearize with Jacobian            | Sample with sigma points
Accuracy          | 1st order (tangent line)           | 2nd order (captures curvature)
Jacobians needed? | Yes                                | No
When it fails     | High curvature, large uncertainty  | Extreme nonlinearity, multimodal
Check: EKF divergence happens when...

Chapter 8: Connections & Where to Go

The EKF sits at the center of a family of nonlinear estimation algorithms. Understanding it gives you the vocabulary and intuition to explore the rest.

Kalman Filter
The foundation. Linear + Gaussian = exact solution. The EKF reduces to the KF when f and h are linear.
↓ generalize
Extended Kalman Filter
You are here. Linearize with Jacobians. Works for mildly nonlinear systems.
↓ avoid Jacobians
Unscented Kalman Filter
Sigma-point sampling. Better accuracy, no derivatives needed. Handles stronger nonlinearity.
↓ arbitrary distributions
Particle Filter
Monte Carlo sampling. Handles any distribution, any nonlinearity. Expensive but powerful.
EKF + Mapping = SLAM. When the landmarks are unknown and must be estimated simultaneously with the robot's pose, you get Simultaneous Localization and Mapping. The EKF was the original SLAM algorithm (EKF-SLAM), augmenting the state vector with landmark positions.

The EKF Algorithm Summary

Initialize
Set x̂ (initial guess), P (initial covariance)
Predict
x̄ = f(x̂)    FJ = ∂f/∂x    P̄ = FJ P FJᵀ + Q
Gain
HJ = ∂h/∂x    S = HJ P̄ HJᵀ + R    K = P̄ HJᵀ S⁻¹
Update
y = z − h(x̄)    x̂ = x̄ + K y    P = (I − K HJ) P̄
↓ repeat
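The four boxes above fit in one function. A condensed sketch on a made-up scalar system (nonlinear drift f(x) = x + sin(x)·dt, squared-state measurement h(x) = x²; all numbers are illustrative):

```python
import numpy as np

def ekf_step(x, P, z, f, F_jac, h, H_jac, Q, R):
    """One full EKF cycle: predict, then update."""
    x_bar = f(x)                           # predict state (true nonlinear f)
    F = F_jac(x)                           # process Jacobian at current estimate
    P_bar = F @ P @ F.T + Q                # predict covariance
    H = H_jac(x_bar)                       # measurement Jacobian at prediction
    y = z - h(x_bar)                       # innovation (true nonlinear h)
    S = H @ P_bar @ H.T + R                # innovation covariance
    K = P_bar @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x_bar + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_bar
    return x_new, P_new

dt = 0.1

def f(x):     return x + np.sin(x) * dt
def F_jac(x): return np.array([[1 + np.cos(x[0]) * dt]])
def h(x):     return x**2
def H_jac(x): return np.array([[2 * x[0]]])

x, P = np.array([1.0]), np.eye(1)
Q, R = np.eye(1) * 0.01, np.eye(1) * 0.1
x, P = ekf_step(x, P, np.array([1.3]), f, F_jac, h, H_jac, Q, R)
print(x, P)   # estimate moves toward sqrt(1.3) ≈ 1.14; covariance shrinks
```

One measurement of x² = 1.3 pulls the estimate from 1.0 toward √1.3, and the covariance shrinks accordingly — the full predict/update loop in miniature.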
"All models are wrong, but some are useful."
— George Box

The EKF's model of nonlinear functions is wrong (tangent lines aren't curves). But it's useful enough to navigate robots, track missiles, and land spacecraft. You now understand why — and when it breaks.