Beyond Linearization

Understand the Unscented Kalman Filter

No Jacobians. No tangent-line approximations. Just carefully chosen sample points that capture what a Gaussian looks like after it's been squeezed through a nonlinear function.

Prerequisites: Kalman Filter basics + Familiarity with EKF limitations.
9 chapters · 6+ simulations · 0 Jacobians

Chapter 0: Beyond Linearization

The Extended Kalman Filter (EKF) handles nonlinear systems by computing Jacobians — local tangent-line approximations. It works, but it has real problems: you need to derive the Jacobian analytically, and the first-order linear approximation can be wildly inaccurate when the function curves sharply.

In 1995, Simon Julier and Jeffrey Uhlmann had a striking insight: it's easier to approximate a probability distribution than to approximate an arbitrary nonlinear function. Instead of linearizing the function, why not pick a small set of sample points that capture the distribution's statistics, push them through the exact nonlinear function, and reconstruct a Gaussian from the results?

The core idea: Approximating a nonlinear function with a tangent line is fragile. Approximating a Gaussian with a handful of carefully chosen points is robust. The UKF approximates the distribution, not the function.
EKF Approach
Linearize f(x) with Jacobian → propagate Gaussian through the linear approximation
vs
UKF Approach
Pick sigma points → propagate each through exact f(x) → reconstruct Gaussian
Why "unscented"? Uhlmann reportedly chose the name because it "doesn't stink" — a tongue-in-cheek jab at the EKF's linearization assumptions. The name has no mathematical meaning.
Check: What does the UKF approximate?

Chapter 1: Sigma Points

A Gaussian in n dimensions is fully described by its mean (n numbers) and covariance (an n×n matrix). The UKF selects 2n+1 deterministic sample points — called sigma points — that exactly match these first two moments. For a 2D state, that's just 5 points.

The sigma points are the mean itself, plus pairs of points offset along the columns of a matrix square root of the covariance, scaled by a factor that depends on the tuning parameters α and κ (β enters only through the weights).

χ₀ = μ      χᵢ = μ + (√((n+λ)P))ᵢ      χᵢ₊ₙ = μ − (√((n+λ)P))ᵢ      for i = 1, …, n

where λ = α²(n + κ) − n, and (√((n+λ)P))ᵢ is the i-th column of a matrix square root of (n+λ)P (in practice, its Cholesky factor).
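The construction above fits in a few lines. A minimal sketch, using the Cholesky factor as the matrix square root:

```python
import numpy as np

def sigma_points(mu, P, alpha=1.0, beta=2.0, kappa=0.0):
    """Generate the 2n+1 scaled sigma points of N(mu, P).
    beta only affects the weights, not the point locations."""
    n = len(mu)
    lam = alpha**2 * (n + kappa) - n
    L = np.linalg.cholesky((n + lam) * P)   # matrix square root of (n+lam)*P
    pts = [mu]                              # chi_0 = mu
    for i in range(n):
        pts.append(mu + L[:, i])            # chi = mu + i-th column
        pts.append(mu - L[:, i])            # chi = mu - i-th column
    return np.array(pts)

mu = np.array([0.0, 0.0])
P = np.array([[1.5, 0.4], [0.4, 0.8]])
chis = sigma_points(mu, P, kappa=1.0)       # n=2 -> 5 deterministic points
print(chis.shape)  # (5, 2)
```

With the satellite weights 1/(2(n+λ)), the weighted outer products of these points reproduce P exactly, which is the whole point of the construction.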

Interactive Sigma Points

A 2D Gaussian with 2n+1 = 5 sigma points. Drag the sliders to change the covariance shape. The orange dots are sigma points; the teal ellipse is the 2-sigma contour.

Key property: The sigma points are not random samples. They are deterministic. The same input always produces the same points. This makes the UKF reproducible and avoids the sampling noise of particle filters.
Check: For a state of dimension n=3, how many sigma points does the UKF use?

Chapter 2: The Unscented Transform

The unscented transform is the heart of the UKF. Given a Gaussian and a nonlinear function, it approximates the output distribution in three steps:

  1. Generate sigma points from the input Gaussian.
  2. Propagate each sigma point through the nonlinear function.
  3. Reconstruct the output Gaussian from the weighted transformed points.
μy = Σᵢ wᵢᵐ f(χᵢ)      Py = Σᵢ wᵢᶜ (f(χᵢ) − μy)(f(χᵢ) − μy)ᵀ

Each sigma point carries two weights: wm for the mean and wc for the covariance. The center point typically gets a different weight than the satellite points.
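The three steps translate directly into code. A minimal sketch (Cholesky factor as the matrix square root, standard scaled weights):

```python
import numpy as np

def unscented_transform(mu, P, f, alpha=1.0, beta=2.0, kappa=0.0):
    """Approximate the mean and covariance of f(x) for x ~ N(mu, P)."""
    n = len(mu)
    lam = alpha**2 * (n + kappa) - n
    L = np.linalg.cholesky((n + lam) * P)          # step 1: sigma points
    pts = [mu] + [mu + L[:, i] for i in range(n)] + [mu - L[:, i] for i in range(n)]

    wm = np.full(2*n + 1, 1 / (2 * (n + lam)))     # weights: center point differs
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = wm[0] + 1 - alpha**2 + beta

    ys = np.array([f(p) for p in pts])             # step 2: exact f, no Jacobian
    mu_y = wm @ ys                                 # step 3: weighted mean
    P_y = sum(w * np.outer(d, d) for w, d in zip(wc, ys - mu_y))
    return mu_y, P_y

# Sanity check: for a linear f the transform is exact (A mu and A P A^T)
A = np.array([[1.0, 2.0], [0.0, 1.0]])
mu = np.array([1.0, -1.0])
P = np.array([[1.0, 0.3], [0.3, 2.0]])
m, C = unscented_transform(mu, P, lambda x: A @ x)
print(np.allclose(m, A @ mu), np.allclose(C, A @ P @ A.T))  # True True
```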

Unscented Transform: Banana Nonlinearity

Sigma points from a 2D Gaussian pass through a nonlinear "banana" transform. The teal ellipse is the input. The orange points become purple points after the transform. The green ellipse is the reconstructed output Gaussian.

Why it works: A Gaussian passed through a nonlinear function is no longer Gaussian. But the unscented transform captures the true mean and covariance to at least second-order accuracy — one order better than the EKF's first-order Jacobian approximation.
Check: What accuracy does the unscented transform achieve for Gaussian inputs?

Chapter 3: UKF Predict

The predict step applies the unscented transform to the process model f(x). We generate sigma points from the current belief, push them through f, and reconstruct the predicted Gaussian. Then add process noise Q.

1. Generate Sigma Points
χ = sigmaPoints(μ, P)
2. Propagate
χᵢ* = f(χᵢ) for each i
3. Reconstruct
μ¯ = Σ wᵢᵐ χᵢ*    P¯ = Σ wᵢᶜ (χᵢ* − μ¯)(χᵢ* − μ¯)ᵀ + Q

UKF Predict in Python

Python
import numpy as np

def ukf_predict(x, P, f, Q, alpha=1e-3, beta=2, kappa=0):
    """UKF predict step."""
    n = len(x)
    lam = alpha**2 * (n + kappa) - n

    # Generate sigma points
    sqrt_P = np.linalg.cholesky((n + lam) * P)
    sigmas = [x.copy()]
    for i in range(n):
        sigmas.append(x + sqrt_P[:, i])
        sigmas.append(x - sqrt_P[:, i])

    # Weights
    wm = [lam / (n + lam)] + [1 / (2*(n + lam))] * 2*n
    wc = [lam / (n + lam) + (1 - alpha**2 + beta)] + [1 / (2*(n + lam))] * 2*n

    # Propagate through f
    sigmas_f = [f(s) for s in sigmas]

    # Reconstruct mean and covariance
    x_pred = sum(w * s for w, s in zip(wm, sigmas_f))
    P_pred = Q.copy()
    for i, (w, s) in enumerate(zip(wc, sigmas_f)):
        d = s - x_pred
        P_pred += w * np.outer(d, d)

    return x_pred, P_pred
Notice: No Jacobians anywhere! The function f is used as a black box. This means you can use the UKF even when f is a simulation, a neural network, or any function you can evaluate but cannot differentiate.
Check: In the UKF predict step, what happens to the sigma points?

Chapter 4: UKF Update

The update step is similar: generate sigma points from the predicted state, push them through the measurement model h(x), and compute the cross-covariance between state and measurement spaces. This gives us the Kalman gain.

1. Sigma Points
χ = sigmaPoints(μ¯, P¯)
2. Predicted Measurements
Zᵢ = h(χᵢ) for each i
3. Measurement Statistics
μz = Σ wᵢᵐ Zᵢ    S = Σ wᵢᶜ (Zᵢ − μz)(Zᵢ − μz)ᵀ + R
4. Cross-Covariance & Gain
Pxz = Σ wᵢᶜ (χᵢ − μ¯)(Zᵢ − μz)ᵀ    K = Pxz S⁻¹
5. Update
μ = μ¯ + K(z − μz)    P = P¯ − K S Kᵀ

UKF Update in Python

Python
def ukf_update(x_pred, P_pred, z, h, R, alpha=1e-3, beta=2, kappa=0):
    """UKF update step."""
    n = len(x_pred)
    lam = alpha**2 * (n + kappa) - n

    # Regenerate sigma points from predicted state
    sqrt_P = np.linalg.cholesky((n + lam) * P_pred)
    sigmas = [x_pred.copy()]
    for i in range(n):
        sigmas.append(x_pred + sqrt_P[:, i])
        sigmas.append(x_pred - sqrt_P[:, i])

    wm = [lam / (n + lam)] + [1/(2*(n+lam))] * 2*n
    wc = [lam / (n + lam) + 1 - alpha**2 + beta] + [1/(2*(n+lam))] * 2*n

    # Push through measurement model
    z_sigmas = [h(s) for s in sigmas]
    z_mean = sum(w*zs for w, zs in zip(wm, z_sigmas))

    # Innovation covariance S and cross-covariance Pxz
    S = R.copy()
    Pxz = np.zeros((n, len(z)))
    for w, s, zs in zip(wc, sigmas, z_sigmas):
        dz = zs - z_mean
        S  += w * np.outer(dz, dz)
        dx = s - x_pred
        Pxz += w * np.outer(dx, dz)

    # Kalman gain and update
    K = Pxz @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - z_mean)
    P_new = P_pred - K @ S @ K.T
    return x_new, P_new
Same structure as the KF: Predict grows uncertainty, Update shrinks it. The only difference is how we propagate through nonlinear functions — sigma points instead of matrices or Jacobians.
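Since predict and update share their sigma-point machinery, the two listings condense into a single self-contained step. A sketch; the constant-velocity model and position-only measurement in the demo are hypothetical, just to exercise the code:

```python
import numpy as np

def ukf_step(x, P, z, f, h, Q, R, alpha=1e-3, beta=2.0, kappa=0.0):
    """One full predict + update cycle, condensing the two listings above."""
    def sigmas_and_weights(mu, C):
        n = len(mu)
        lam = alpha**2 * (n + kappa) - n
        L = np.linalg.cholesky((n + lam) * C)
        pts = np.vstack([mu, mu + L.T, mu - L.T])      # 2n+1 sigma points
        wm = np.full(2*n + 1, 1 / (2 * (n + lam)))
        wc = wm.copy()
        wm[0] = lam / (n + lam)
        wc[0] = wm[0] + 1 - alpha**2 + beta
        return pts, wm, wc

    # Predict: push sigma points through the process model
    pts, wm, wc = sigmas_and_weights(x, P)
    Xf = np.array([f(p) for p in pts])
    x_pred = wm @ Xf
    P_pred = Q + sum(w * np.outer(d, d) for w, d in zip(wc, Xf - x_pred))

    # Update: regenerate sigma points, push through the measurement model
    pts, wm, wc = sigmas_and_weights(x_pred, P_pred)
    Zs = np.array([h(p) for p in pts])
    z_mean = wm @ Zs
    S = R + sum(w * np.outer(dz, dz) for w, dz in zip(wc, Zs - z_mean))
    Pxz = sum(w * np.outer(dx, dz)
              for w, dx, dz in zip(wc, pts - x_pred, Zs - z_mean))
    K = Pxz @ np.linalg.inv(S)
    return x_pred + K @ (z - z_mean), P_pred - K @ S @ K.T

# Toy model: 1D constant velocity, position-only measurement
f = lambda s: np.array([s[0] + s[1], s[1]])
h = lambda s: np.array([s[0]])
x, P = np.array([0.0, 1.0]), np.eye(2)
Q, R = 0.01 * np.eye(2), np.array([[0.1]])
x, P = ukf_step(x, P, np.array([1.2]), f, h, Q, R, alpha=1.0)
print(x)  # ~[1.19, 1.09]: prediction [1, 1] pulled toward the measurement 1.2
```

Because this toy model is linear, the result matches the standard KF exactly, which makes it a convenient correctness check before swapping in a nonlinear f or h.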
Check: What additional quantity does the UKF update compute that the standard KF does not?

Chapter 5: Choosing Parameters

The UKF has three tuning parameters that control how the sigma points are placed and weighted:

Parameter | Typical Value | Effect
α | 10⁻³ to 1 | Controls the spread of the sigma points. Small α keeps them close to the mean.
β | 2 (optimal for Gaussians) | Incorporates prior knowledge about the distribution. β = 2 is optimal when the state is truly Gaussian.
κ | 0 or 3 − n | Secondary scaling parameter. κ = 3 − n matches the fourth moment (kurtosis) of a Gaussian; κ ≥ 0 keeps the scaled covariance positive semi-definite.
λ = α²(n + κ) − n

The composite parameter λ determines the distance of the sigma points from the mean. Larger λ pushes them further out, capturing more of the tails. Smaller λ clusters them near the center, which is safer for highly nonlinear functions.
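The relationship between the parameters, λ, and the weights is easy to tabulate. A small sketch for n = 2:

```python
import numpy as np

def ukf_weights(n, alpha, beta=2.0, kappa=0.0):
    """Return (lambda, mean weights, covariance weights) for given tuning."""
    lam = alpha**2 * (n + kappa) - n
    wm = np.full(2*n + 1, 1 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)            # center weight: negative when lam < 0
    wc[0] = wm[0] + 1 - alpha**2 + beta
    return lam, wm, wc

for alpha in (0.5, 1.0, 2.0):
    lam, wm, wc = ukf_weights(n=2, alpha=alpha)
    print(f"alpha={alpha}: lambda={lam:+.2f}, w0m={wm[0]:+.2f}, "
          f"spread={np.sqrt(2 + lam):.2f}")
# alpha=0.5 gives lambda=-1.50 and a negative center weight w0m=-3.00
```

Note that the mean weights always sum to 1 regardless of the tuning, so a negative center weight is compensated by heavier satellite weights.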

Parameter Explorer

See how α, β, and κ affect sigma point placement and weights for n=2. The center weight can even go negative when λ < 0 (i.e., for small α).

In practice: Many implementations just use α=1, β=2, κ=0 and it works fine. The UKF is surprisingly robust to these parameters. Only tune them if you see numerical issues (e.g., non-positive-definite covariance).
Check: What is the optimal value of beta for Gaussian distributions?

Chapter 6: UKF vs EKF Comparison

When the nonlinearity is mild, EKF and UKF produce nearly identical results. But as the function curves more sharply, the EKF's tangent-line approximation falls apart while the UKF's sigma points continue to track accurately. Below, both filters track the same nonlinear system.

Side-by-Side: EKF vs UKF

A target moves in a circle. Both filters use range+bearing measurements. The gray path is truth, red is the EKF, green is the UKF. Increase nonlinearity to see the EKF diverge.

When EKF fails: The EKF linearizes around its current estimate. If that estimate is wrong (because the linearization was bad), the next linearization is even worse. This positive feedback loop causes divergence. The UKF avoids this because it never linearizes.
Feature | EKF | UKF
Requires Jacobian | Yes | No
Accuracy | 1st order | 2nd order+
Black-box f, h | No | Yes
Computation | Jacobian eval | 2n+1 function evals
Divergence risk | Higher | Lower
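The accuracy gap in the table shows up in a two-line experiment: push x ~ N(0, 1) through f(x) = x². The true output mean is E[x²] = 1. The EKF linearizes at the mean, where the tangent has slope zero, and predicts mean f(0) = 0; the sigma points recover the correct value. A sketch with n = 1 and κ = 2 so that n + κ = 3:

```python
import numpy as np

n, kappa = 1, 2.0                       # n + kappa = 3 matches Gaussian kurtosis
lam = 1.0**2 * (n + kappa) - n          # alpha = 1 -> lambda = 2
pts = np.array([0.0, np.sqrt(n + lam), -np.sqrt(n + lam)])  # sigma points of N(0,1)
wm = np.array([lam / (n + lam), 1 / (2*(n + lam)), 1 / (2*(n + lam))])  # 2/3, 1/6, 1/6

f = lambda x: x**2
ekf_mean = f(0.0)          # tangent line at mu = 0 has slope 0 -> predicts 0
ukf_mean = wm @ f(pts)     # 2/3*0 + 1/6*3 + 1/6*3 = 1.0 = E[x^2]
print(ekf_mean, ukf_mean)  # 0.0 1.0
```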
Check: Why does the EKF diverge under strong nonlinearity?

Chapter 7: Square-Root UKF

The standard UKF recomputes the Cholesky decomposition of P at every step. Rounding errors can cause P to lose positive semi-definiteness, crashing the filter. The square-root UKF (SR-UKF) propagates the Cholesky factor S directly, where P = SSᵀ.

Standard UKF
Store P → Cholesky at each step → risk of non-PSD P
vs
Square-Root UKF
Store S (where P = SSᵀ) → update S directly via QR/Cholesky updates → P always PSD by construction

The SR-UKF uses QR decomposition and rank-1 Cholesky updates instead of full matrix operations. This guarantees numerical stability and is roughly the same computational cost.
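The rank-1 Cholesky update mentioned above can be sketched directly: given the factor S, produce S′ with S′S′ᵀ = SSᵀ + w·vvᵀ without refactoring the full matrix. This is the classic textbook recurrence, not a particular library's API:

```python
import numpy as np

def cholupdate(S, v, w):
    """Rank-1 update of a lower-triangular Cholesky factor.
    Returns S' with S' S'^T = S S^T + w * v v^T. Negative w performs a
    downdate (valid as long as the result stays positive definite)."""
    S = S.copy()
    v = np.sqrt(abs(w)) * v.astype(float)   # fold |w| into v
    sign = 1.0 if w >= 0 else -1.0
    for i in range(len(v)):
        r = np.sqrt(S[i, i]**2 + sign * v[i]**2)
        c, s = r / S[i, i], v[i] / S[i, i]
        S[i, i] = r
        S[i+1:, i] = (S[i+1:, i] + sign * s * v[i+1:]) / c
        v[i+1:] = c * v[i+1:] - s * S[i+1:, i]
    return S

# O(n^2) per update, versus O(n^3) for a full re-factorization
P = np.array([[4.0, 2.0], [2.0, 3.0]])
S = np.linalg.cholesky(P)
v = np.array([1.0, 1.0])
S2 = cholupdate(S, v, 1.0)
print(np.allclose(S2 @ S2.T, P + np.outer(v, v)))  # True
```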

When to use it: If your state dimension is large (n > 10) or your system runs for many thousands of timesteps, prefer the SR-UKF. For small problems or short runs, the standard UKF is usually fine.

Key SR-UKF Operations

Operation | Standard UKF | SR-UKF
Store uncertainty | P (n×n) | S (n×n, lower triangular)
Sigma point spread | chol((n+λ)P) | √(n+λ) · S (already available)
Covariance update | Weighted outer products | QR decomposition + Cholesky rank-1 update
Guaranteed PSD | No | Yes, by construction
Check: What is the main advantage of the square-root UKF?

Chapter 8: Connections

The UKF sits in a family of estimation algorithms, each making different tradeoffs between accuracy, generality, and computational cost. Understanding where it fits helps you choose the right tool.

Filter | Handles | Approach | Cost
Kalman Filter | Linear + Gaussian | Exact matrix equations | Very low
EKF | Mildly nonlinear | Jacobian linearization | Low
UKF | Moderately nonlinear | Sigma points | Medium
Particle Filter | Any distribution | Monte Carlo sampling | High
The estimation ladder: Start with the KF if your system is linear. Move to the EKF if it's mildly nonlinear and you can compute Jacobians. Use the UKF when Jacobians are unavailable or the nonlinearity is too strong. Resort to particle filters when the distribution is truly non-Gaussian (multimodal, heavy-tailed).
UKF in SLAM: The UKF is widely used in simultaneous localization and mapping (SLAM), where a robot must estimate its own position while building a map of the environment. The measurement model (range and bearing to landmarks) is highly nonlinear, making the UKF a natural choice over the EKF.


"It is easier to approximate a probability distribution than to approximate an arbitrary nonlinear function or transformation."
— Simon Julier & Jeffrey Uhlmann

You now understand the unscented Kalman filter. No Jacobians, no linearization — just carefully chosen points that capture the shape of uncertainty.