No Jacobians. No tangent-line approximations. Just carefully chosen sample points that capture what a Gaussian looks like after it's been squeezed through a nonlinear function.
The Extended Kalman Filter (EKF) handles nonlinear systems by computing Jacobians — local tangent-line approximations. It works, but it has real problems: you need to derive the Jacobian analytically, and the first-order linear approximation can be wildly inaccurate when the function curves sharply.
In 1995, Simon Julier and Jeffrey Uhlmann had a striking insight: it's easier to approximate a probability distribution than to approximate an arbitrary nonlinear function. Instead of linearizing the function, why not pick a small set of sample points that capture the distribution's statistics, push them through the exact nonlinear function, and reconstruct a Gaussian from the results?
A Gaussian in n dimensions is fully described by its mean (n numbers) and covariance (an n×n matrix). The UKF selects 2n+1 deterministic sample points — called sigma points — that exactly match these first two moments. For a 2D state, that's just 5 points.
The sigma points are the mean itself, plus pairs of points offset along each column of the matrix square root of the scaled covariance:

χ₀ = μ
χᵢ = μ + (√((n+λ)P))ᵢ  for i = 1, …, n
χₙ₊ᵢ = μ − (√((n+λ)P))ᵢ  for i = 1, …, n

where λ = α²(n + κ) − n is a composite scaling factor built from the tuning parameters α and κ, and (√((n+λ)P))ᵢ is the i-th column of the matrix square root of (n+λ)P — in practice, its Cholesky factor.
A 2D Gaussian with 2n+1 = 5 sigma points. Drag the sliders to change the covariance shape. The orange dots are sigma points; the teal ellipse is the 2-sigma contour.
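The moment-matching claim is easy to verify numerically. Here is a minimal sketch (the helper name is illustrative, not from any library): build the five sigma points for an arbitrary 2D Gaussian and confirm that their weighted mean and covariance recover μ and P exactly.

```python
import numpy as np

# Illustrative helper: build the 2n+1 sigma points and their
# weights for a Gaussian N(x, P).
def sigma_points(x, P, alpha=1e-3, beta=2, kappa=0):
    n = len(x)
    lam = alpha**2 * (n + kappa) - n
    sqrt_P = np.linalg.cholesky((n + lam) * P)
    pts = [x.copy()]
    for i in range(n):
        pts.append(x + sqrt_P[:, i])
        pts.append(x - sqrt_P[:, i])
    wm = np.full(2 * n + 1, 1 / (2 * (n + lam)))
    wm[0] = lam / (n + lam)
    wc = wm.copy()
    wc[0] += 1 - alpha**2 + beta
    return np.array(pts), wm, wc

x = np.array([1.0, -2.0])
P = np.array([[2.0, 0.5], [0.5, 1.0]])
pts, wm, wc = sigma_points(x, P)

mean = wm @ pts                       # weighted mean of the 5 points
diff = pts - mean
cov = (wc[:, None] * diff).T @ diff   # weighted covariance
print(np.allclose(mean, x), np.allclose(cov, P))  # both True
```

The center deviation is zero, so the extra (1 − α² + β) term in the center covariance weight does not disturb the reconstruction; both moments come back exactly.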
The unscented transform is the heart of the UKF. Given a Gaussian and a nonlinear function g, it approximates the output distribution in three steps:

1. Generate the 2n+1 sigma points from the input mean and covariance.
2. Push each sigma point through the exact nonlinear function g.
3. Reconstruct the output mean and covariance as weighted sums of the transformed points.
Each sigma point carries two weights: wm for the mean and wc for the covariance. The 2n satellite points share a single value, wm,i = wc,i = 1/(2(n+λ)), while the center point gets its own: wm,0 = λ/(n+λ) and wc,0 = λ/(n+λ) + (1 − α² + β).
Sigma points from a 2D Gaussian pass through a nonlinear "banana" transform. The teal ellipse is the input. The orange points become purple points after the transform. The green ellipse is the reconstructed output Gaussian.
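As a concrete sketch, the quadratic map below is an assumed stand-in for the demo's banana transform. The transformed mean picks up the curvature term 0.5·P₀₀ that a tangent-line approximation would miss; for a quadratic function the unscented estimate of the mean is in fact exact.

```python
import numpy as np

def unscented_transform(x, P, g, alpha=1e-3, beta=2, kappa=0):
    """Push a Gaussian (x, P) through g and reconstruct a Gaussian."""
    n = len(x)
    lam = alpha**2 * (n + kappa) - n
    sqrt_P = np.linalg.cholesky((n + lam) * P)
    pts = [x.copy()]
    for i in range(n):
        pts.append(x + sqrt_P[:, i])
        pts.append(x - sqrt_P[:, i])
    wm = np.full(2 * n + 1, 1 / (2 * (n + lam)))
    wm[0] = lam / (n + lam)
    wc = wm.copy()
    wc[0] += 1 - alpha**2 + beta
    pts_g = np.array([g(p) for p in pts])
    mean = wm @ pts_g
    diff = pts_g - mean
    cov = (wc[:, None] * diff).T @ diff
    return mean, cov

def banana(p):
    """Assumed quadratic warp: bends the y-axis by the square of x."""
    return np.array([p[0], p[1] + 0.5 * p[0]**2])

x = np.array([1.0, 0.0])
P = np.array([[0.5, 0.0], [0.0, 0.2]])
mean, cov = unscented_transform(x, P, banana)
print(mean)  # second component is 0.5*(1.0**2 + 0.5) = 0.75, not 0.5
```

Simply evaluating the function at the mean would give 0.5 for the second component; the sigma points recover the true value 0.75 because they sample the spread of the input.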
The predict step applies the unscented transform to the process model f(x). We generate sigma points from the current belief, push them through f, and reconstruct the predicted Gaussian. Then add process noise Q.
```python
import numpy as np

def ukf_predict(x, P, f, Q, alpha=1e-3, beta=2, kappa=0):
    """UKF predict step."""
    n = len(x)
    lam = alpha**2 * (n + kappa) - n

    # Generate sigma points
    sqrt_P = np.linalg.cholesky((n + lam) * P)
    sigmas = [x.copy()]
    for i in range(n):
        sigmas.append(x + sqrt_P[:, i])
        sigmas.append(x - sqrt_P[:, i])

    # Weights
    wm = [lam / (n + lam)] + [1 / (2 * (n + lam))] * (2 * n)
    wc = [lam / (n + lam) + (1 - alpha**2 + beta)] + [1 / (2 * (n + lam))] * (2 * n)

    # Propagate through f
    sigmas_f = [f(s) for s in sigmas]

    # Reconstruct mean and covariance
    x_pred = sum(w * s for w, s in zip(wm, sigmas_f))
    P_pred = Q.copy()
    for w, s in zip(wc, sigmas_f):
        d = s - x_pred
        P_pred += w * np.outer(d, d)
    return x_pred, P_pred
```
The update step is similar: generate sigma points from the predicted state, push them through the measurement model h(x), and compute the cross-covariance between state and measurement spaces. This gives us the Kalman gain.
```python
def ukf_update(x_pred, P_pred, z, h, R, alpha=1e-3, beta=2, kappa=0):
    """UKF update step."""
    n = len(x_pred)
    lam = alpha**2 * (n + kappa) - n

    # Regenerate sigma points from the predicted state
    sqrt_P = np.linalg.cholesky((n + lam) * P_pred)
    sigmas = [x_pred.copy()]
    for i in range(n):
        sigmas.append(x_pred + sqrt_P[:, i])
        sigmas.append(x_pred - sqrt_P[:, i])

    wm = [lam / (n + lam)] + [1 / (2 * (n + lam))] * (2 * n)
    wc = [lam / (n + lam) + (1 - alpha**2 + beta)] + [1 / (2 * (n + lam))] * (2 * n)

    # Push through the measurement model
    z_sigmas = [h(s) for s in sigmas]
    z_mean = sum(w * zs for w, zs in zip(wm, z_sigmas))

    # Innovation covariance S and cross-covariance Pxz
    S = R.copy()
    Pxz = np.zeros((n, len(z)))
    for w, s, zs in zip(wc, sigmas, z_sigmas):
        dz = zs - z_mean
        S += w * np.outer(dz, dz)
        dx = s - x_pred
        Pxz += w * np.outer(dx, dz)

    # Kalman gain and update
    K = Pxz @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - z_mean)
    P_new = P_pred - K @ S @ K.T
    return x_new, P_new
```
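To see the two steps working together, here is a deliberately tiny self-contained run: a 1-D random walk with a direct noisy position measurement. The model, noise values, and measurement are all made up for illustration. The posterior mean is pulled toward the measurement and the variance shrinks below the prior's.

```python
import numpy as np

def sigma_pts(x, P, lam):
    """Generate the 2n+1 sigma points for N(x, P)."""
    n = len(x)
    sqrt_P = np.linalg.cholesky((n + lam) * P)
    pts = [x.copy()]
    for i in range(n):
        pts.append(x + sqrt_P[:, i])
        pts.append(x - sqrt_P[:, i])
    return pts

alpha, beta, kappa, n = 1.0, 2.0, 2.0, 1
lam = alpha**2 * (n + kappa) - n
wm = np.array([lam / (n + lam)] + [1 / (2 * (n + lam))] * (2 * n))
wc = wm.copy()
wc[0] += 1 - alpha**2 + beta

x, P = np.array([0.0]), np.array([[1.0]])   # prior belief
Q, R = np.array([[0.01]]), np.array([[0.1]])
f = h = lambda s: s                          # identity process and measurement models

# Predict
pts = sigma_pts(x, P, lam)
pts_f = [f(p) for p in pts]
x = sum(w * p for w, p in zip(wm, pts_f))
P = Q + sum(w * np.outer(p - x, p - x) for w, p in zip(wc, pts_f))

# Update with a measurement z = 0.5
z = np.array([0.5])
pts = sigma_pts(x, P, lam)
zs = [h(p) for p in pts]
z_mean = sum(w * s for w, s in zip(wm, zs))
S = R + sum(w * np.outer(s - z_mean, s - z_mean) for w, s in zip(wc, zs))
Pxz = sum(w * np.outer(p - x, s - z_mean) for w, p, s in zip(wc, pts, zs))
K = Pxz @ np.linalg.inv(S)
x = x + K @ (z - z_mean)
P = P - K @ S @ K.T
print(x, P)  # mean pulled toward 0.5, variance well below 1
```

Because both models are linear here, the sigma points add nothing over a plain Kalman filter — the point is only to show the two steps chaining.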
The UKF has three tuning parameters that control how the sigma points are placed and weighted:
| Parameter | Typical Value | Effect |
|---|---|---|
| α | 10⁻³ to 1 | Controls the spread of sigma points. Small α keeps them close to the mean. |
| β | 2 (optimal for Gaussians) | Incorporates prior knowledge about the distribution. β=2 is optimal when the state is truly Gaussian. |
| κ | 0 or 3−n | Secondary scaling parameter. κ = 3−n matches the fourth moment of a Gaussian; κ ≥ 0 guarantees a positive semi-definite covariance. |
The composite parameter λ determines the distance of the sigma points from the mean. Larger λ pushes them further out, capturing more of the tails. Smaller λ clusters them near the center, which is safer for highly nonlinear functions.
See how α, β, and κ affect sigma point placement and weights for n=2. The center weight can even go negative for large λ.
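The α row above has a simple numerical illustration (setup assumed, unit covariance): with κ = 0, the distance of the satellite points from the mean scales linearly with α, and a tiny α drives the center weight far negative.

```python
import numpy as np

def spread_and_center_weight(alpha, n=2, kappa=0):
    """Satellite-point distance from the mean, and the center mean weight."""
    lam = alpha**2 * (n + kappa) - n
    sqrt_P = np.linalg.cholesky((n + lam) * np.eye(n))  # unit covariance
    return np.linalg.norm(sqrt_P[:, 0]), lam / (n + lam)

wide, w_wide = spread_and_center_weight(1.0)
narrow, w_narrow = spread_and_center_weight(1e-3)
print(wide / narrow)  # spread shrinks by exactly the alpha ratio: 1000
print(w_narrow)       # hugely negative center weight for tiny alpha
```

Negative weights are legal — the reconstruction formulas still work — but they are one reason the covariance can lose positive semi-definiteness numerically, which motivates the square-root form discussed below.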
When the nonlinearity is mild, EKF and UKF produce nearly identical results. But as the function curves more sharply, the EKF's tangent-line approximation falls apart while the UKF's sigma points continue to track accurately. Below, both filters track the same nonlinear system.
A target moves in a circle. Both filters use range+bearing measurements. The gray path is truth, red is the EKF, green is the UKF. Increase nonlinearity to see the EKF diverge.
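The divergence has a scalar analogue that can be checked in closed form (a hypothetical 1-D setup, not the demo's range-bearing model): propagate N(μ, σ²) through sin. The true output mean is sin(μ)·e^(−σ²/2); the EKF-style estimate sin(μ) ignores the spread entirely, while three sigma points land much closer.

```python
import numpy as np

mu, sigma = 1.0, 0.5
true_mean = np.sin(mu) * np.exp(-sigma**2 / 2)  # E[sin(X)] for X ~ N(mu, sigma^2)

ekf_mean = np.sin(mu)  # tangent-line propagation: just evaluate at the mean

# Unscented transform, n = 1, with alpha = 1 and kappa = 2 so that n + kappa = 3
lam = 1.0**2 * (1 + 2) - 1                  # lambda = 2
spread = np.sqrt((1 + lam) * sigma**2)
pts = np.array([mu, mu + spread, mu - spread])
wm = np.array([lam / (1 + lam), 1 / (2 * (1 + lam)), 1 / (2 * (1 + lam))])
ut_mean = wm @ np.sin(pts)

print(abs(ekf_mean - true_mean))  # ~0.099: the tangent line misses the shrinkage
print(abs(ut_mean - true_mean))   # orders of magnitude smaller
```

The gap widens as σ grows, which is exactly the "increase nonlinearity" behavior in the demo above.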
| Feature | EKF | UKF |
|---|---|---|
| Requires Jacobian | Yes | No |
| Accuracy | 1st order | 2nd order+ |
| Black-box f, h | No | Yes |
| Computation | Jacobian eval | 2n+1 function evals |
| Divergence risk | Higher | Lower |
The standard UKF recomputes the Cholesky decomposition of P at every step. Rounding errors can cause P to lose positive semi-definiteness, crashing the filter. The square-root UKF (SR-UKF) propagates the Cholesky factor S directly, where P = SSᵀ.
The SR-UKF uses QR decomposition and rank-1 Cholesky updates instead of full matrix operations. This guarantees numerical stability and is roughly the same computational cost.
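The rank-1 update is the one ingredient NumPy and SciPy don't ship, so SR-UKF implementations typically roll their own. A standard sketch: given the lower-triangular factor L of A, produce the factor of A + vvᵀ without refactorizing.

```python
import numpy as np

def cholupdate(L, v):
    """Rank-1 update: return lower-triangular L2 with L2 @ L2.T == L @ L.T + outer(v, v)."""
    L, v = L.copy(), v.copy()
    n = len(v)
    for i in range(n):
        # Givens-style rotation folding v[i] into the diagonal entry
        r = np.hypot(L[i, i], v[i])
        c, s = r / L[i, i], v[i] / L[i, i]
        L[i, i] = r
        if i + 1 < n:
            L[i+1:, i] = (L[i+1:, i] + s * v[i+1:]) / c
            v[i+1:] = c * v[i+1:] - s * L[i+1:, i]
    return L

A = np.array([[4.0, 1.0], [1.0, 3.0]])
v = np.array([0.5, -1.0])
L2 = cholupdate(np.linalg.cholesky(A), v)
print(np.allclose(L2 @ L2.T, A + np.outer(v, v)))  # True
```

Downdates — subtracting vvᵀ, needed when a covariance weight is negative — follow the same pattern with hyperbolic rotations, and can legitimately fail when the result would not be positive semi-definite.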
| Operation | Standard UKF | SR-UKF |
|---|---|---|
| Store uncertainty | P (n×n) | S (n×n, lower triangular) |
| Sigma point spread | chol((n+λ)P) | √(n+λ) · S (already available) |
| Covariance update | Weighted outer products | QR decomposition + Cholesky rank-1 update |
| Guaranteed PSD | No | Yes, by construction |
The UKF sits in a family of estimation algorithms, each making different tradeoffs between accuracy, generality, and computational cost. Understanding where it fits helps you choose the right tool.
| Filter | Handles | Approach | Cost |
|---|---|---|---|
| Kalman Filter | Linear + Gaussian | Exact matrix equations | Very low |
| EKF | Mildly nonlinear | Jacobian linearization | Low |
| UKF | Moderately nonlinear | Sigma points | Medium |
| Particle Filter | Any distribution | Monte Carlo sampling | High |
You now understand the unscented Kalman filter. No Jacobians, no linearization — just carefully chosen points that capture the shape of uncertainty.