State Estimation Workbook (Advanced)

Chapter 0: Multi-Dimensional KF

A 1D Kalman filter tracks a single number — maybe temperature, or altitude. But real systems have multiple coupled variables. A falling ball has both position and velocity. A car has position, velocity, and heading. To track these, we need matrix equations.

The simplest multi-dimensional case: track position and velocity in 1D. The state vector is x = [position, velocity]^T. Physics tells us: position' = position + velocity × dt, and velocity' = velocity (constant velocity model). In matrix form:

State transition (constant velocity, dt = 1):
F = [[1, dt], [0, 1]]    x' = F · x

Measurement (observe position only):
H = [[1, 0]]    z = H · x + noise

Predict:
x̄ = F · x     P̄ = F · P · F^T + Q

Update:
K = P̄ · H^T · (H · P̄ · H^T + R)^-1
x = x̄ + K · (z − H · x̄)
P = (I − K · H) · P̄

The key insight: The 2×2 covariance matrix P tracks not just uncertainty in each variable, but also how they're correlated. The off-diagonal terms P[0,1] = P[1,0] capture position-velocity correlation. When you observe position, the Kalman gain uses this correlation to also improve your velocity estimate — even though you never directly measured velocity.

Exercise 0.1: Predict Step — State Derive

State x = [10, 2]^T (position=10m, velocity=2m/s), dt = 1s. Compute the predicted state x̄ = F · x. What is the predicted position?

meters

Show derivation

x̄ = F · x = [[1, 1], [0, 1]] · [10, 2]^T

x̄[0] = 1×10 + 1×2 = 12

x̄[1] = 0×10 + 1×2 = 2

Position advanced by velocity × dt = 2 × 1 = 2 meters. Velocity stays at 2 m/s (constant velocity model). This is just x' = x + v·dt encoded as a matrix multiply.

Exercise 0.2: Predict Step — Covariance Derive

Current covariance P = [[4, 0], [0, 1]] (position variance=4, velocity variance=1, no correlation). Process noise Q = [[0.1, 0], [0, 0.1]]. dt = 1. Compute P̄ = F·P·F^T + Q. What is P̄[0,0] (predicted position variance)?

Show derivation

F · P = [[1,1],[0,1]] · [[4,0],[0,1]] = [[4,1],[0,1]]

F · P · F^T = [[4,1],[0,1]] · [[1,0],[1,1]] = [[5,1],[1,1]]

P̄ = [[5,1],[1,1]] + [[0.1,0],[0,0.1]] = [[5.1,1],[1,1.1]]

Position variance grew from 4 to 5.1. The extra 1 came from velocity uncertainty "leaking" into position uncertainty (since position depends on velocity). Notice the off-diagonal term is now 1 — the predict step created a correlation between position and velocity errors even though there was none before. This is the magic of multi-dimensional filtering.

Exercise 0.3: Off-Diagonal Meaning Trace

After the predict step above, P̄[0,1] = 1. What does this positive off-diagonal covariance mean physically?

Position and velocity are independent If position error is positive (we're ahead), velocity error is likely also positive (we're going faster than estimated) The filter is diverging and needs to be reset Velocity is increasing over time

Show reasoning

Positive cross-covariance means the errors are positively correlated. If the ball is ahead of where we think (positive position error), it's probably because it was going faster than we estimated (positive velocity error). The filter exploits this: when a position measurement says "you're ahead," the Kalman gain corrects velocity upward too.

Exercise 0.4: Kalman Gain Derive

Using P̄ = [[5.1, 1], [1, 1.1]] and H = [[1, 0]], R = [2] (measurement noise variance). Compute S = H·P̄·H^T + R, then K = P̄·H^T·S^-1. What is K[0] (the position component of the Kalman gain)?

H^T = [[1],[0]]. P̄·H^T = [[5.1],[1]]. S is a scalar here.

Show derivation

H · P̄ · H^T = [1, 0] · [[5.1,1],[1,1.1]] · [[1],[0]] = [5.1, 1] · [[1],[0]] = 5.1

S = 5.1 + 2 = 7.1

K = P̄ · H^T / S = [[5.1],[1]] / 7.1 = [[0.718],[0.141]]

K[0] = 0.718: we trust the measurement quite a bit for position (71.8% of the residual corrects position). K[1] = 0.141: the position measurement also corrects velocity by 14.1% of the residual. This is cross-state correction in action — we never measured velocity, but the correlation in P̄ lets us improve it.

Exercise 0.5: Full Update Derive

Predicted state x̄ = [12, 2]^T, Kalman gain K = [0.718, 0.141]^T, measurement z = 13.5. Compute the updated position estimate.

meters

Show derivation

Residual = z − H · x̄ = 13.5 − [1,0] · [12, 2]^T = 13.5 − 12 = 1.5

x = x̄ + K · residual = [12, 2]^T + [0.718, 0.141]^T × 1.5

x = [12 + 1.077, 2 + 0.211]^T = [13.077, 2.211]^T

The measurement says we're at 13.5, the prediction said 12. The filter compromises at 13.077, closer to the measurement because K[0] = 0.718 is relatively high. And velocity was bumped from 2.0 to 2.211 — the filter inferred "if we're further ahead than predicted, we must be going faster."

Exercise 0.6: Implement kfPredict() Build

Implement the predict step for a 2D state. F and P are 2×2 arrays ([[a,b],[c,d]]). Return { x: [x0,x1], P: [[...],[...]] }.

Show solution

javascript
function kfPredict(x, P, F, Q) {
  // x' = F * x
  const xp = [
    F[0][0]*x[0] + F[0][1]*x[1],
    F[1][0]*x[0] + F[1][1]*x[1]
  ];
  // P' = F*P*F^T + Q  (2x2 multiply)
  const FP = [
    [F[0][0]*P[0][0]+F[0][1]*P[1][0], F[0][0]*P[0][1]+F[0][1]*P[1][1]],
    [F[1][0]*P[0][0]+F[1][1]*P[1][0], F[1][0]*P[0][1]+F[1][1]*P[1][1]]
  ];
  // FP * F^T
  const Pp = [
    [FP[0][0]*F[0][0]+FP[0][1]*F[0][1]+Q[0][0], FP[0][0]*F[1][0]+FP[0][1]*F[1][1]+Q[0][1]],
    [FP[1][0]*F[0][0]+FP[1][1]*F[0][1]+Q[1][0], FP[1][0]*F[1][0]+FP[1][1]*F[1][1]+Q[1][1]]
  ];
  return { x: xp, P: Pp };
}

Chapter 1: EKF Jacobians

When the motion model is nonlinear, the Kalman filter's F matrix doesn't exist — there's no single matrix that maps old state to new state. Instead, we have a function x' = f(x, u). The EKF linearizes this function at the current state by computing its Jacobian: a matrix of partial derivatives.

Consider a robot in 2D. Its state is [x, y, θ]^T (position and heading). Given control inputs v (forward velocity) and ω (angular velocity), the motion model is:

f(x, y, θ) = [x + v·cos(θ)·dt, y + v·sin(θ)·dt, θ + ω·dt]^T

Jacobian F_J = ∂f/∂[x, y, θ]:
F_J = [[1, 0, −v·sin(θ)·dt], [0, 1, v·cos(θ)·dt], [0, 0, 1]]

Why the Jacobian matters: The Jacobian replaces the F matrix in the covariance predict step: P̄ = F_J · P · F_J^T + Q. The state is predicted using the nonlinear function x' = f(x), but uncertainty is propagated using the linear approximation. This is the core EKF trick.

Exercise 1.1: Compute the Jacobian Derive

For the robot model f = [x + v·cos(θ)·dt, y + v·sin(θ)·dt, θ + ω·dt], compute ∂f₁/∂θ (how does the x-update change when θ changes). Evaluate at θ = π/6 (30°), v = 2 m/s, dt = 0.1s.

Show derivation

∂f₁/∂θ = ∂(x + v·cos(θ)·dt)/∂θ = −v·sin(θ)·dt

= −2 × sin(π/6) × 0.1 = −2 × 0.5 × 0.1 = −0.1

Negative sign: if the robot turns more (increases θ), the x-component of its motion decreases (the robot moves less in x and more in y). At θ=30°, this sensitivity is −0.1 meters per radian of heading change.

Exercise 1.2: Full Jacobian Evaluation Derive

Evaluate F_J[1,2] (row 2, col 3 — ∂f₂/∂θ) at θ = π/4 (45°), v = 3 m/s, dt = 0.5s.

Show derivation

∂f₂/∂θ = ∂(y + v·sin(θ)·dt)/∂θ = v·cos(θ)·dt

= 3 × cos(π/4) × 0.5 = 3 × 0.7071 × 0.5 = 1.061

Positive: turning more (increasing θ) makes the robot move more in the y-direction at this heading. At 45°, the sensitivity is symmetric between x and y — both partial derivatives have the same magnitude (1.061).

Exercise 1.3: Jacobian of a Simpler Model Trace

For a 1D model f(x) = x + x²·dt, the Jacobian is a scalar. What is it?

2x · dt x² · dt 1 + 2x · dt 1 + x² · dt

Show derivation

df/dx = d(x + x²·dt)/dx = 1 + 2x·dt

The 1 comes from the linear term x (its derivative is 1). The 2x·dt comes from the quadratic term. At x=0, the Jacobian is 1 — the system looks linear. As |x| grows, the Jacobian deviates from 1, meaning the linearization captures less of the true dynamics.

Exercise 1.4: Nonlinear Predict Derive

Robot at state [3, 1, π/3]^T (x=3, y=1, θ=60°). Controls: v=2 m/s, ω=0.1 rad/s, dt=0.5s. Compute the predicted y-coordinate using f₂ = y + v·sin(θ)·dt.

meters

Show derivation

f₂ = y + v·sin(θ)·dt = 1 + 2×sin(60°)×0.5

= 1 + 2 × 0.866 × 0.5 = 1 + 0.866 = 1.866

Exercise 1.5: Implement robotJacobian() Build

Return the 3×3 Jacobian of the robot motion model as a nested array.

Show solution

javascript
function robotJacobian(theta, v, dt) {
  return [
    [1, 0, -v * Math.sin(theta) * dt],
    [0, 1,  v * Math.cos(theta) * dt],
    [0, 0,  1]
  ];
}

Exercise 1.6: Find the Bug Debug

This EKF predict step has a bug. The state prediction is correct but the covariance blows up. Click the buggy line.

function ekfPredict(x, P, f, F_jacobian, Q) {
  const xp = f(x);               // nonlinear predict
  const F = F_jacobian(x);        // Jacobian at current state
  const FP = matMul(F, P);
  const Pp = matMul(FP, F) + Q;   // should be F^T, not F!
  return { x: xp, P: Pp };
}

Show explanation

Line 5 is the bug. The covariance predict formula is P̄ = F · P · F^T + Q. The code computes F · P · F instead of F · P · F^T. When F is not symmetric (it rarely is), multiplying by F instead of F^T gives a non-symmetric P̄, which is not a valid covariance matrix. The result grows without bound because the eigenvalues of FPF can be larger than those of FPF^T.

Chapter 2: EKF Range-Bearing

You're building a robot that observes landmarks. Your sensor gives you range (distance) and bearing (angle) to a landmark at known position (l_x, l_y). The measurement model is:

h(x) = [√((l_x−x)² + (l_y−y)²), atan2(l_y−y, l_x−x) − θ]^T

Measurement Jacobian H = ∂h/∂[x, y, θ]:
Let Δx = l_x−x, Δy = l_y−y, r = √(Δx² + Δy²)
H = [[−Δx/r, −Δy/r, 0], [Δy/r², −Δx/r², −1]]

Why this is hard: Square roots and arctangents are aggressively nonlinear. The Jacobian depends on the relative position of the robot to the landmark, which means H changes at every single timestep. There's no pre-computed measurement matrix — you must re-evaluate H at the current state estimate before every update.

Exercise 2.1: Compute Range Derive

Robot at (3, 4), landmark at (6, 8). What is the expected range measurement?

meters

Show derivation

Δx = 6−3 = 3, Δy = 8−4 = 4

r = √(3² + 4²) = √(9 + 16) = √25 = 5

Exercise 2.2: Compute Bearing Derive

Same setup: robot at (3, 4) with heading θ = 0 rad, landmark at (6, 8). What is the expected bearing measurement (in radians)?

Bearing = atan2(Δy, Δx) − θ. atan2(4, 3) ≈ 0.9273 rad.

radians

Show derivation

bearing = atan2(4, 3) − 0 = 0.9273 rad ≈ 53.1°

The landmark is roughly 53 degrees to the left of the robot's forward direction (which is along the x-axis when θ=0).

Exercise 2.3: Measurement Jacobian H[0,0] Derive

Compute H[0,0] = −Δx/r for robot at (3,4), landmark at (6,8).

Show derivation

H[0,0] = −Δx/r = −3/5 = −0.6

Negative because if the robot moves in +x (Δx decreases), range decreases. The magnitude 0.6 says: 1 meter of robot x-motion changes the range by 0.6 meters. This makes sense — the landmark is not directly in front of the robot, so x-motion only partially closes the gap.

Exercise 2.4: Full H Matrix Derive

Compute the full 2×3 measurement Jacobian H for robot at (3,4), landmark at (6,8). What is H[1,1] = −Δx/r²?

Show derivation

H[1,1] = −Δx/r² = −3/25 = −0.12

Full H: [[−0.6, −0.8, 0], [0.16, −0.12, −1]]

The bearing row has much smaller magnitudes (0.16, 0.12) than the range row (0.6, 0.8) because bearing changes slowly with position but directly with heading (−1 in the θ column).

Exercise 2.5: Innovation Derive

Predicted state: robot at (3, 4, 0). Expected measurement: h(x) = [5, 0.9273]^T. Actual measurement: z = [5.2, 0.95]^T. What is the range component of the innovation (residual)?

meters

Show derivation

y = z − h(x̄) = [5.2 − 5, 0.95 − 0.9273] = [0.2, 0.0227]

The range residual is +0.2m: the measurement says the landmark is slightly farther than we predicted. The bearing residual is +0.023 rad (~1.3°): the landmark appears slightly more to the left than expected. Both are small, suggesting our prediction is good.

Exercise 2.6: Implement rangeBearingH() Build

Compute the 2×3 measurement Jacobian for range-bearing observations.

Show solution

javascript
function rangeBearingH(rx, ry, rtheta, lx, ly) {
  const dx = lx - rx, dy = ly - ry;
  const r2 = dx*dx + dy*dy;
  const r  = Math.sqrt(r2);
  return [
    [-dx/r, -dy/r, 0],
    [dy/r2, -dx/r2, -1]
  ];
}

Chapter 3: UKF Sigma Points

The UKF avoids Jacobians entirely. Instead, it picks 2n+1 deterministic sample points (sigma points) that capture the mean and covariance of the state distribution, pushes each one through the exact nonlinear function, and reconstructs a Gaussian from the transformed points.

Sigma point generation (n = state dimension):
λ = α²(n + κ) − n
χ₀ = μ
χ_i = μ + (√((n+λ)P))_i for i = 1..n
χ_i+n = μ − (√((n+λ)P))_i for i = 1..n

Weights:
w₀^m = λ/(n+λ) w_i^m = 1/(2(n+λ)) for i>0
w₀^c = λ/(n+λ) + (1−α²+β) w_i^c = w_i^m for i>0

Simplified version: Many implementations use κ as the only tuning parameter with α=1, β=0, giving λ=κ. Then: scale = √(n+κ), w₀ = κ/(n+κ), w_i = 1/(2(n+κ)). This is what we'll use for the exercises.

Exercise 3.1: How Many Sigma Points? Trace

For a state dimension n=2, how many sigma points does the UKF generate?

4 5 (2n+1) 8 2

Exercise 3.2: Compute Sigma Points Derive

n=2, μ=[1, 2]^T, P=[[4, 0],[0, 1]], κ=1 (using simplified α=1, β=0). The matrix √((n+κ)P) = √(3P). Since P is diagonal, √(3P) = [[√12, 0],[0, √3]] = [[3.464, 0],[0, 1.732]]. What is sigma point χ₁?

χ₁ = μ + column 1 of √(3P) = [1 + 3.464, 2 + 0]. Give the first component.

Show derivation

√(3P) = √(3 × [[4,0],[0,1]]) = [[√12, 0],[0, √3]] = [[3.464, 0],[0, 1.732]]

χ₀ = [1, 2]

χ₁ = [1 + 3.464, 2 + 0] = [4.464, 2]

χ₂ = [1 + 0, 2 + 1.732] = [1, 3.732]

χ₃ = [1 − 3.464, 2 − 0] = [−2.464, 2]

χ₄ = [1 − 0, 2 − 1.732] = [1, 0.268]

The sigma points form a cross pattern: the center point plus 4 points along the principal axes of the covariance ellipse. More variance in x (=4) means the x-axis points are spread further apart (3.464 from center) than the y-axis points (1.732).

Exercise 3.3: Sigma Point Weights Derive

For n=2, κ=1 (simplified): compute w₀ and w_i (for i=1..4). Verify they sum to 1. What is w₀?

Show derivation

w₀ = κ/(n+κ) = 1/(2+1) = 1/3 ≈ 0.333

w_i = 1/(2(n+κ)) = 1/(2×3) = 1/6 ≈ 0.167

Check: 1/3 + 4 × 1/6 = 1/3 + 4/6 = 1/3 + 2/3 = 1 ✓

Exercise 3.4: Propagate and Recover Mean Derive

Pass each sigma point through f(x,y) = [x², y]. Using χ₀=[1,2], χ₁=[4.464,2], χ₂=[1,3.732], χ₃=[−2.464,2], χ₄=[1,0.268] and weights w₀=1/3, w_i=1/6. Compute the recovered mean of the first component: μ' = ∑ w_i · f(χ_i)[0].

Show derivation

f(χ₀)[0] = 1² = 1

f(χ₁)[0] = 4.464² = 19.927

f(χ₂)[0] = 1² = 1

f(χ₃)[0] = (−2.464)² = 6.071

f(χ₄)[0] = 1² = 1

μ'[0] = (1/3)×1 + (1/6)×(19.927 + 1 + 6.071 + 1)

= 0.333 + (1/6)×28 = 0.333 + 4.667 = 5.0

The true mean of x² when x~N(1,4) is E[x²] = Var(x) + E[x]² = 4 + 1 = 5. The UKF gets this exactly — because x² is a second-order polynomial, and sigma points capture the first two moments perfectly. An EKF would get this wrong (it would predict 1² = 1 and miss the variance term).

Exercise 3.5: Implement sigmaPoints() Build

Generate 2n+1 sigma points for a 2D state. Use simplified κ tuning. Assume diagonal P for simplicity (so matrix sqrt is just element-wise sqrt of scaled diagonal).

Show solution

javascript
function sigmaPoints(mu, P, kappa) {
  var n = 2;
  var s = n + kappa;
  var s0 = Math.sqrt(s * P[0][0]);
  var s1 = Math.sqrt(s * P[1][1]);
  return [
    [mu[0], mu[1]],
    [mu[0]+s0, mu[1]],
    [mu[0], mu[1]+s1],
    [mu[0]-s0, mu[1]],
    [mu[0], mu[1]-s1]
  ];
}

Chapter 4: UKF vs EKF

Both the EKF and UKF handle nonlinear systems, but they approximate different things. The EKF linearizes the function (tangent line). The UKF samples the distribution (sigma points through the true function). When does each win?

Same problem, two approaches:
System: x' = x + x²·dt, state x ~ N(μ, σ²)

EKF: Linearize at μ: F_J = 1 + 2μ·dt
Predicted mean = f(μ) = μ + μ²·dt
Predicted variance = F_J² · σ² = (1+2μ·dt)² · σ²

UKF: True E[x'] = E[x + x²·dt] = μ + (μ² + σ²)·dt
The UKF captures the σ²·dt term; the EKF misses it.

When the EKF breaks: The EKF approximation degrades when (1) the function is highly curved (large second derivatives), (2) the uncertainty is large (the linearization point is far from the true state), or (3) both. In these cases, the tangent line diverges quickly from the true curve, and the predicted Gaussian is wrong in both mean and variance.

Exercise 4.1: EKF Mean Error Derive

For f(x) = x + x²·dt with μ=2, σ²=1, dt=0.1. EKF predicts mean = f(μ) = 2 + 4×0.1 = 2.4. The true mean = μ + (μ²+σ²)·dt = 2 + 5×0.1 = 2.5. What is the EKF's error in the predicted mean?

Show derivation

EKF mean = f(μ) = 2 + 4 × 0.1 = 2.4

True mean = μ + (μ² + σ²) × dt = 2 + (4+1)×0.1 = 2.5

Error = |2.4 − 2.5| = 0.1

The EKF misses the σ²·dt = 0.1 term because linearization throws away the curvature (second derivative) information. The UKF captures this term automatically through sigma point propagation. The error is σ² × d²f/dx² × dt / 2 — it grows with both uncertainty and curvature.

Exercise 4.2: When Does It Matter? Trace

If σ² = 0.01 instead of 1 (very tight uncertainty), the EKF mean error drops to 0.001. In which scenario is the EKF-UKF difference most critical?

Tracking a car on a straight highway with good GPS Tracking a missile with precise inertial sensors Re-acquiring a lost target after GPS dropout (large σ², nonlinear dynamics) Filtering temperature readings from a thermocouple

Show reasoning

The EKF-UKF gap is proportional to σ² × curvature. It matters most when uncertainty is large AND the dynamics are nonlinear. GPS dropout causes large σ²; re-acquiring a target involves highly nonlinear bearing/range measurements. The other scenarios either have small σ² (precise sensors) or linear dynamics (straight road, temperature).

Exercise 4.3: Computational Cost Derive

For a state dimension n=6 (e.g., position + velocity in 3D): the EKF must compute a 6×6 Jacobian (36 partial derivatives) and do matrix multiplies. The UKF must generate 2n+1 sigma points and propagate each. How many sigma points?

sigma points

Show derivation

2n + 1 = 2×6 + 1 = 13 sigma points

13 function evaluations vs. 36 partial derivatives. The UKF is often computationally cheaper because function evaluations are simpler than analytical derivatives. And you never need to derive the Jacobian symbolically — just evaluate f(x) at 13 points. For systems where the Jacobian is hard to derive (or impossible, like neural network dynamics), the UKF wins by default.

Exercise 4.4: EKF vs UKF Decision Design

Order these steps for choosing between EKF and UKF in a new robotics project.

Test with representative noise levels Check if Jacobian is analytically derivable Assess degree of nonlinearity Choose EKF if mild, UKF if strong nonlinearity

Show reasoning

First assess nonlinearity (how curved are the dynamics?). Then check if the Jacobian is tractable. Then test both with real noise levels. Finally choose based on results. Many practitioners start with EKF because it's simpler, then switch to UKF only if linearization error causes problems. The UKF has no downside except slightly more code complexity and Cholesky decomposition overhead.

Chapter 5: Particle Filters

The Kalman filter (and EKF/UKF) assume the belief is Gaussian. But what if the posterior is bimodal ("the robot is either in the kitchen or the bedroom") or has sharp edges? Particle filters represent the belief as a set of weighted samples (particles). No Gaussian assumption at all.

Particle filter cycle:
1. Sample: draw N particles from prior or propagate existing ones through f(x)
2. Weight: w_i = p(z | x_i) — how well does each particle explain the measurement?
3. Normalize: w_i = w_i / ∑ w_j
4. Resample: draw N new particles with probability proportional to weights

Effective sample size: N_eff = 1 / ∑ w_i²

Weight degeneracy: After a few steps without resampling, one particle gets weight ~1 and all others get ~0. The effective sample size drops to 1. This is the fundamental failure mode of particle filters. Resampling (step 4) combats this by duplicating high-weight particles and killing low-weight ones.

Exercise 5.1: Weight Update Derive

5 particles at positions [1, 3, 5, 7, 9]. Measurement z=4 with Gaussian likelihood p(z|x) = exp(−(z−x)²/8). Compute the unnormalized weight for particle at x=3.

Show derivation

w = exp(−(4−3)²/8) = exp(−1/8) = exp(−0.125) = 0.8825

This particle is 1 unit from the measurement. The (z−x)²/8 denominator comes from 2σ² where σ²=4. The closer a particle is to the measurement, the higher its weight.

Exercise 5.2: Normalize Weights Derive

Unnormalized weights for particles at [1,3,5,7,9] with z=4: [0.3247, 0.8825, 0.8825, 0.3247, 0.0439]. What is the normalized weight of the particle at x=5?

Show derivation

Sum = 0.3247 + 0.8825 + 0.8825 + 0.3247 + 0.0439 = 2.4583

w₃ (x=5) = 0.8825 / 2.4583 = 0.359

Particles at x=3 and x=5 get equal weight (both are 1 unit from z=4). Together they hold 72% of the total weight. The particle at x=9 is 5 units away and gets only 1.8% — it's practically dead.

Exercise 5.3: Effective Sample Size Derive

Normalized weights: [0.132, 0.359, 0.359, 0.132, 0.018]. Compute N_eff = 1/∑w_i².

Show derivation

∑w_i² = 0.132² + 0.359² + 0.359² + 0.132² + 0.018²

= 0.0174 + 0.1289 + 0.1289 + 0.0174 + 0.0003 = 0.2929

N_eff = 1/0.2929 = 3.41

Out of 5 particles, only ~3.4 are "effective." If all weights were equal (0.2 each), N_eff would be 5. The closer N_eff is to 1, the more degenerate the particle set. A common rule: resample when N_eff < N/2.

Exercise 5.4: Systematic Resampling Trace

Cumulative weights: [0.132, 0.491, 0.850, 0.982, 1.0]. Systematic resampling with u₀=0.1 and step=0.2 gives thresholds [0.1, 0.3, 0.5, 0.7, 0.9]. How many times is particle 2 (cumulative range 0.132 to 0.491) selected?

1 time (threshold 0.3 falls in [0.132, 0.491]) 2 times (thresholds 0.3 and 0.5 fall in this range) 0 times

Show reasoning

Thresholds: 0.1, 0.3, 0.5, 0.7, 0.9. Cumulative ranges: particle 1=[0, 0.132], particle 2=(0.132, 0.491], particle 3=(0.491, 0.850], particle 4=(0.850, 0.982], particle 5=(0.982, 1.0].

0.1 → particle 1, 0.3 → particle 2, 0.5 → particle 3, 0.7 → particle 3, 0.9 → particle 4. Particle 2 is selected once. Particle 3 gets two copies (it had weight 0.359). Particle 5 (weight 0.018) gets zero copies — it's been killed by resampling.

Exercise 5.5: Implement particleWeights() Build

Compute normalized particle weights given positions, a measurement, and measurement variance.

Show solution

javascript
function particleWeights(positions, z, sigma2) {
  const raw = positions.map(x =>
    Math.exp(-(z - x) * (z - x) / (2 * sigma2))
  );
  const sum = raw.reduce((a, b) => a + b, 0);
  return raw.map(w => w / sum);
}

Exercise 5.6: Degeneracy Detection Derive

After 10 timesteps without resampling, weights become [0.97, 0.01, 0.01, 0.005, 0.005]. Compute N_eff. Should you resample?

Show derivation

∑w_i² = 0.97² + 0.01² + 0.01² + 0.005² + 0.005² = 0.9409 + 0.0001 + 0.0001 + 0.000025 + 0.000025 = 0.9412

N_eff = 1/0.9412 = 1.062

N_eff = 1.06 out of 5 — catastrophic degeneracy. One particle has 97% of the weight. You absolutely must resample. After resampling, you'd have ~5 copies of the dominant particle (with slight perturbation) and zero copies of the others. This is why particle filters need hundreds or thousands of particles for real problems.

Chapter 6: Sensor Fusion

Real robots don't rely on a single sensor. A self-driving car fuses GPS (absolute but noisy), IMU (fast but drifts), wheel encoders (precise but slips), cameras (rich but slow), and LiDAR. The Kalman filter framework handles this naturally: different sensors just mean different H matrices and noise covariances R, applied in separate update steps.

Multi-sensor update strategy:
Option 1: Sequential updates — apply each sensor's update one after another
Option 2: Stacked measurement — concatenate all measurements into one z, stack H matrices

Typical sensor characteristics:
GPS: position σ = 5m, rate = 1 Hz, no drift
IMU: acceleration σ = 0.1 m/s², rate = 100 Hz, drifts
Magnetometer: heading σ = 10°, rate = 10 Hz, susceptible to interference

Complementary filtering: GPS gives good long-term position (no drift) but poor short-term (noisy, slow). IMU gives good short-term (smooth, fast) but poor long-term (drift accumulates). Fusing them gives the best of both: drift-free, smooth, fast position estimates. This is the fundamental insight behind all INS/GPS systems.

Exercise 6.1: GPS-Only Uncertainty Derive

GPS gives position with σ = 5m. After one GPS measurement, what is the position variance? (Assume no prior knowledge, so the update is just the measurement.)

m²

Show derivation

σ² = 5² = 25 m²

With no prior, the posterior is just the measurement distribution. After multiple GPS readings, the variance decreases, but never below the measurement noise floor (since each reading is independent).

Exercise 6.2: Fuse GPS + IMU Position Derive

GPS says position = 100m (σ = 5m). IMU dead-reckoning says position = 102m (σ = 3m). Using the 1D Kalman fusion formula, what is the fused position estimate?

Fused = (σ₂² · z₁ + σ₁² · z₂) / (σ₁² + σ₂²)

meters

Show derivation

Fused = (9 × 100 + 25 × 102) / (25 + 9) = (900 + 2550) / 34 = 3450 / 34 = 101.47

The fused estimate is closer to the IMU reading (102) because the IMU has lower variance (9 vs 25). The fusion trusts the more precise sensor more. The fused variance is 1/(1/25 + 1/9) = 25×9/34 = 6.62 m² (σ = 2.57m) — better than either sensor alone.

Exercise 6.3: Fused Variance Derive

Compute the fused variance from the GPS+IMU fusion above. σ_GPS=5m, σ_IMU=3m.

m²

Show derivation

σ_fused² = 1/(1/σ₁² + 1/σ₂²) = 1/(1/25 + 1/9) = 1/(0.04 + 0.111) = 1/0.151 = 6.62 m²

σ_fused = √6.62 = 2.57 m

The fused uncertainty (2.57m) is better than both GPS (5m) and IMU (3m). This is the fundamental benefit of sensor fusion: combining independent measurements always reduces uncertainty. The fused variance is always less than the smallest individual variance.

Exercise 6.4: State Vector Design Trace

You want to fuse GPS (position), IMU (acceleration), and magnetometer (heading). What's the minimum state vector to track all relevant quantities?

[x, y] — just position [x, y, vx, vy] — position and velocity [x, y, vx, vy, θ] — position, velocity, and heading [x, y, vx, vy, ax, ay, θ] — position, velocity, acceleration, heading

Show reasoning

[x, y, vx, vy, θ] is the minimum. GPS directly observes [x, y]. IMU acceleration integrates to velocity, so we need [vx, vy] in the state to fuse IMU. Magnetometer observes θ directly. We don't need acceleration in the state because we can treat it as a control input rather than a state variable (a common design choice). However, if you want to estimate IMU bias, you'd add [b_ax, b_ay] for a 7-element state.

Exercise 6.5: IMU Drift Derive

An IMU measures acceleration with bias b = 0.01 m/s². After integrating for t = 60 seconds, what is the position drift error? (Drift from constant bias: Δx = ½ b t²)

meters

Show derivation

Δx = ½ × 0.01 × 60² = 0.005 × 3600 = 18 m

A tiny 0.01 m/s² bias causes 18 meters of drift in just one minute. Drift grows quadratically with time — at 10 minutes it would be 1800m. This is why IMU-only navigation fails quickly and GPS corrections are essential. It's also why estimating the bias as part of the state vector is critical for any production system.

Chapter 7: Observability

Can you estimate the full state from the measurements you have? If you only measure position, can you figure out velocity? If you only measure voltage in a circuit, can you infer current? The answer depends on observability — a fundamental property of the system.

Observability matrix:
O = [H; H·F; H·F²; ...; H·F^n-1]

The system is observable if and only if rank(O) = n (full rank).

For n=2: O = [[H], [H·F]] — a (2×2) matrix for scalar measurements.

Why this matters for filters: If a state is unobservable, the Kalman filter cannot reduce its uncertainty — the corresponding diagonal of P will grow without bound. Designing a sensor suite that makes all states observable is a prerequisite for any estimation system.

Exercise 7.1: Position from Position Derive

State = [position, velocity], F = [[1,1],[0,1]], H = [1, 0] (measure position). Compute the observability matrix O = [H; H·F]. Is the system observable?

rank of O

Show derivation

H = [1, 0]

H · F = [1, 0] · [[1,1],[0,1]] = [1, 1]

O = [[1, 0], [1, 1]]

det(O) = 1×1 − 0×1 = 1 ≠ 0 → rank = 2

Rank 2 = n, so the system is fully observable. Even though we only measure position, the dynamics couple position and velocity. By observing how position changes over time, we can infer velocity. This is exactly what the Kalman filter does — the off-diagonal terms in P capture this position-velocity coupling.

Exercise 7.2: Velocity from Velocity Derive

Same F = [[1,1],[0,1]], but now H = [0, 1] (measure velocity only). Compute O = [H; H·F] and its rank.

rank of O

Show derivation

H = [0, 1]

H · F = [0, 1] · [[1,1],[0,1]] = [0, 1]

O = [[0, 1], [0, 1]]

Rows are identical → rank = 1 < n = 2

Not observable! Measuring velocity alone cannot determine position, because in the constant-velocity model, velocity doesn't depend on position. The velocity measurement gives us v directly, but there's no way to figure out x from repeated v measurements — x could be any value. Physically: knowing how fast you're going doesn't tell you where you are.

Exercise 7.3: Making It Observable Trace

The system with H=[0,1] measuring velocity is unobservable. Which change would make it observable?

Increase the measurement rate Change the dynamics to include drag: velocity' = 0.9 × velocity + 0.1 × position Decrease measurement noise R Increase process noise Q

Show reasoning

If F = [[1,1],[0.1,0.9]], then H·F = [0,1]·[[1,1],[0.1,0.9]] = [0.1, 0.9]. Now O = [[0,1],[0.1,0.9]], which has det = −0.1 ≠ 0 → rank 2. The system is observable because velocity now depends on position (drag/spring force), so observing velocity changes reveals position information. Measurement rate and noise affect estimation quality, not observability itself — observability is a structural property of (F, H).

Exercise 7.4: 3-State Observability Derive

State = [x, v, a] (position, velocity, acceleration), F = [[1,1,0.5],[0,1,1],[0,0,1]], H = [1,0,0]. Compute O = [H; HF; HF²]. What is the (2,3) entry of O (row 2, column 3)?

Show derivation

H = [1, 0, 0]

HF = [1,0,0] · [[1,1,0.5],[0,1,1],[0,0,1]] = [1, 1, 0.5]

HF² = HF · F = [1,1,0.5] · [[1,1,0.5],[0,1,1],[0,0,1]] = [1, 2, 2]

O = [[1,0,0],[1,1,0.5],[1,2,2]]

O(2,3) = 0.5. The full O has rank 3 (det = 1×(1×2 − 0.5×2) − 0×(...) + 0×(...) = 1), so the system is fully observable. Position measurements alone can recover velocity and acceleration, because F couples them.

Exercise 7.5: Implement obsMatrix() Build

Build the observability matrix for a 2-state system. Return [[H], [HF]] as a 2×2 array.

Show solution

javascript
function obsMatrix(H, F) {
  const HF = [
    H[0]*F[0][0] + H[1]*F[1][0],
    H[0]*F[0][1] + H[1]*F[1][1]
  ];
  return [[H[0], H[1]], [HF[0], HF[1]]];
}

Chapter 9: Capstone — INS/GPS Fusion

Everything comes together in a full inertial navigation system (INS) fused with GPS. This is the system inside every drone, self-driving car, and smartphone. The state vector is large, the dynamics are nonlinear, and sensor fusion is essential for robustness.

Full 9-state INS/GPS state vector:
x = [p_x, p_y, p_z, v_x, v_y, v_z, φ, θ, ψ]^T
(3D position, 3D velocity, roll/pitch/yaw)

Predict (IMU-driven):
position' = position + velocity · dt
velocity' = velocity + R(φ,θ,ψ) · a_body · dt + g · dt
attitude' = attitude + ω_body · dt

GPS Update: H_GPS = [[I₃, 0₃, 0₃]] (observes only position)

GPS outage: When GPS is lost, the filter runs on IMU prediction only. Position uncertainty grows quadratically (IMU drift). A typical MEMS IMU drifts ~18m after 60 seconds (from the bias calculation in Ch6). After 5 minutes: 18 × 25 = 450m. This is why autonomous vehicles need redundant absolute position sensors (LiDAR localization, visual odometry).

Exercise 9.1: State Dimension Trace

A production INS adds 3 accelerometer biases and 3 gyroscope biases to the state. What is the total state dimension?

9 12 15 (9 states + 3 accel biases + 3 gyro biases) 21

Show reasoning

9 base states (pos, vel, attitude) + 3 accelerometer biases + 3 gyroscope biases = 15. The biases are slowly time-varying, modeled as random walks in the state transition. By estimating them online, the filter can subtract the bias from future IMU readings, dramatically reducing drift. Some systems also add 3 magnetometer biases for 18 total states.

Exercise 9.2: Predict Step — Position Derive

Current state: p = [100, 200, 50]m, v = [10, 5, −2] m/s. dt = 0.01s (100Hz IMU). What is the predicted p_x?

meters

Show derivation

p_x' = p_x + v_x × dt = 100 + 10 × 0.01 = 100.1 m

At 100Hz, each predict step moves only 0.1m. This tiny step is why IMU integration is smooth — each step is essentially linear. The nonlinearity accumulates over many steps, which is where drift comes from.

Exercise 9.3: Velocity Update with Gravity Derive

IMU reads a_body = [0, 0, 9.81] m/s² (stationary, sensing gravity). Gravity vector g = [0, 0, −9.81] m/s². The rotation R = I (level). Current v_z = 0 m/s, dt = 0.01s. What is predicted v_z?

v' = v + R · a_body · dt + g · dt

m/s

Show derivation

v_z' = 0 + (1 × 9.81) × 0.01 + (−9.81) × 0.01 = 0.0981 − 0.0981 = 0

When stationary, the IMU reads +g upward (it measures the reaction force against gravity). Adding the gravity compensation g = [0,0,−9.81] cancels this out, giving zero net acceleration. This is the most common source of bugs in INS code: forgetting to subtract gravity, causing the filter to think the vehicle is accelerating upward at 9.81 m/s².

Exercise 9.4: GPS Update Derive

After 100 predict steps (1 second), predicted position = [101, 205, 49.8]. GPS measurement = [100.5, 204.8, 50.1] with R = diag(25, 25, 25) (5m σ each axis). Predicted P for position block = diag(2, 2, 2). What is the Kalman gain K for the x-component? (1D formula per axis since everything is diagonal.)

Show derivation

K_x = P_xx / (P_xx + R_xx) = 2 / (2 + 25) = 2/27 = 0.074

Updated p_x = 101 + 0.074 × (100.5 − 101) = 101 + 0.074 × (−0.5) = 101 − 0.037 = 100.963

K = 0.074: the filter barely moves toward the GPS reading. Why? Because the IMU-propagated uncertainty (P=2 m², σ=1.4m) is much smaller than GPS noise (R=25 m², σ=5m). After only 1 second, the IMU is still more trustworthy. After 60 seconds without GPS, P would grow to ~324 m² (σ=18m), and K would jump to ~0.93. This is the GPS/IMU handoff in action.

Exercise 9.5: GPS Outage Drift Derive

GPS is lost for 120 seconds. IMU accelerometer bias = 0.05 m/s². How much position drift accumulates? (Δx = ½ b t²)

meters

Show derivation

Δx = ½ × 0.05 × 120² = 0.025 × 14400 = 360 m

360 meters of drift in 2 minutes from a 0.05 m/s² bias. This is why GPS outages are critical for autonomous vehicles. Solutions: (1) estimate and subtract the bias (the 15-state filter), (2) use additional absolute position sources (LiDAR, visual odometry), (3) use a higher-quality IMU with smaller bias (tactical grade: 0.001 m/s², giving only 7.2m drift in 2 min).

Exercise 9.6: P Matrix Size Derive

For the 15-state INS/GPS filter, the covariance matrix P is 15×15. How many unique elements does P have? (P is symmetric.)

elements

Show derivation

Unique elements = n(n+1)/2 = 15 × 16 / 2 = 120

120 values to track, and FPF^T involves 15×15 matrix multiplications at 100Hz. That's ~675,000 multiply-adds per second just for covariance propagation. This is why embedded INS systems use the square-root form (Cholesky) or the Joseph form for numerical stability — and why the UKF with 31 sigma points is sometimes preferred over the EKF for this problem.

Exercise 9.7: Design the Pipeline Design

Order the INS/GPS filter pipeline for one cycle at 100Hz (IMU rate).

Read IMU (accel + gyro) EKF predict with IMU Check if GPS available GPS update (if available) Output fused state estimate

Show reasoning

1. Read IMU — get latest accelerometer and gyroscope measurements. 2. EKF predict — propagate state and covariance forward using IMU data. 3. Check GPS — GPS arrives at 1Hz, so 99 out of 100 cycles have no GPS. 4. GPS update (if available) — correct position using GPS measurement. 5. Output state — publish the current best estimate to downstream systems. The predict step runs every cycle (100Hz), but the GPS update only fires ~once per second.

State Estimation

Chapter 0: Multi-Dimensional KF

Chapter 1: EKF Jacobians

Chapter 2: EKF Range-Bearing

Chapter 3: UKF Sigma Points

Chapter 4: UKF vs EKF

Chapter 5: Particle Filters

Chapter 6: Sensor Fusion

Chapter 7: Observability

Chapter 8: Process & Measurement Noise Tuning

Chapter 9: Capstone — INS/GPS Fusion