Workbook — State Estimation (Advanced)

State Estimation

Multi-dimensional Kalman filters, EKF Jacobians, UKF sigma points, particle filters, sensor fusion, observability, and noise tuning. Every calculation worked by hand with instant feedback.

Prerequisites: Kalman Filter basics (1D predict/update) + Matrix multiplication + Partial derivatives. Familiarity with the EKF/UKF lessons helps.
10
Chapters
56
Exercises
5
Exercise Types
Mastery
0 / 56 exercises (0%)
0
Day Streak
Best: 0

Chapter 0: Multi-Dimensional KF

A 1D Kalman filter tracks a single number — maybe temperature, or altitude. But real systems have multiple coupled variables. A falling ball has both position and velocity. A car has position, velocity, and heading. To track these, we need matrix equations.

The simplest multi-dimensional case: track position and velocity in 1D. The state vector is x = [position, velocity]T. Physics tells us: position' = position + velocity × dt, and velocity' = velocity (constant velocity model). In matrix form:

State transition (constant velocity, dt = 1):
F = [[1, dt], [0, 1]]    x' = F · x

Measurement (observe position only):
H = [[1, 0]]    z = H · x + noise

Predict:
x̄ = F · x     P̄ = F · P · FT + Q

Update:
K = P̄ · HT · (H · P̄ · HT + R)-1
x = x̄ + K · (z − H · x̄)
P = (I − K · H) · P̄
The key insight: The 2×2 covariance matrix P tracks not just uncertainty in each variable, but also how they're correlated. The off-diagonal terms P[0,1] = P[1,0] capture position-velocity correlation. When you observe position, the Kalman gain uses this correlation to also improve your velocity estimate — even though you never directly measured velocity.
Exercise 0.1: Predict Step — State Derive

State x = [10, 2]T (position=10m, velocity=2m/s), dt = 1s. Compute the predicted state x̄ = F · x. What is the predicted position?

meters
Show derivation
x̄ = F · x = [[1, 1], [0, 1]] · [10, 2]T
x̄[0] = 1×10 + 1×2 = 12
x̄[1] = 0×10 + 1×2 = 2

Position advanced by velocity × dt = 2 × 1 = 2 meters. Velocity stays at 2 m/s (constant velocity model). This is just x' = x + v·dt encoded as a matrix multiply.

Exercise 0.2: Predict Step — Covariance Derive

Current covariance P = [[4, 0], [0, 1]] (position variance=4, velocity variance=1, no correlation). Process noise Q = [[0.1, 0], [0, 0.1]]. dt = 1. Compute P̄ = F·P·FT + Q. What is P̄[0,0] (predicted position variance)?

Show derivation
F · P = [[1,1],[0,1]] · [[4,0],[0,1]] = [[4,1],[0,1]]
F · P · FT = [[4,1],[0,1]] · [[1,0],[1,1]] = [[5,1],[1,1]]
P̄ = [[5,1],[1,1]] + [[0.1,0],[0,0.1]] = [[5.1,1],[1,1.1]]

Position variance grew from 4 to 5.1. The extra 1 came from velocity uncertainty "leaking" into position uncertainty (since position depends on velocity). Notice the off-diagonal term is now 1 — the predict step created a correlation between position and velocity errors even though there was none before. This is the magic of multi-dimensional filtering.

Exercise 0.3: Off-Diagonal Meaning Trace
After the predict step above, P̄[0,1] = 1. What does this positive off-diagonal covariance mean physically?
Show reasoning

Positive cross-covariance means the errors are positively correlated. If the ball is ahead of where we think (positive position error), it's probably because it was going faster than we estimated (positive velocity error). The filter exploits this: when a position measurement says "you're ahead," the Kalman gain corrects velocity upward too.

Exercise 0.4: Kalman Gain Derive

Using P̄ = [[5.1, 1], [1, 1.1]] and H = [[1, 0]], R = [2] (measurement noise variance). Compute S = H·P̄·HT + R, then K = P̄·HT·S-1. What is K[0] (the position component of the Kalman gain)?

HT = [[1],[0]]. P̄·HT = [[5.1],[1]]. S is a scalar here.

Show derivation
H · P̄ · HT = [1, 0] · [[5.1,1],[1,1.1]] · [[1],[0]] = [5.1, 1] · [[1],[0]] = 5.1
S = 5.1 + 2 = 7.1
K = P̄ · HT / S = [[5.1],[1]] / 7.1 = [[0.718],[0.141]]

K[0] = 0.718: we trust the measurement quite a bit for position (71.8% of the residual corrects position). K[1] = 0.141: the position measurement also corrects velocity by 14.1% of the residual. This is cross-state correction in action — we never measured velocity, but the correlation in P̄ lets us improve it.

Exercise 0.5: Full Update Derive

Predicted state x̄ = [12, 2]T, Kalman gain K = [0.718, 0.141]T, measurement z = 13.5. Compute the updated position estimate.

meters
Show derivation
Residual = z − H · x̄ = 13.5 − [1,0] · [12, 2]T = 13.5 − 12 = 1.5
x = x̄ + K · residual = [12, 2]T + [0.718, 0.141]T × 1.5
x = [12 + 1.077, 2 + 0.211]T = [13.077, 2.211]T

The measurement says we're at 13.5, the prediction said 12. The filter compromises at 13.077, closer to the measurement because K[0] = 0.718 is relatively high. And velocity was bumped from 2.0 to 2.211 — the filter inferred "if we're further ahead than predicted, we must be going faster."

Exercise 0.6: Implement kfPredict() Build

Implement the predict step for a 2D state. F and P are 2×2 arrays ([[a,b],[c,d]]). Return { x: [x0,x1], P: [[...],[...]] }.

Show solution
javascript
function kfPredict(x, P, F, Q) {
  // x' = F * x
  const xp = [
    F[0][0]*x[0] + F[0][1]*x[1],
    F[1][0]*x[0] + F[1][1]*x[1]
  ];
  // P' = F*P*F^T + Q  (2x2 multiply)
  const FP = [
    [F[0][0]*P[0][0]+F[0][1]*P[1][0], F[0][0]*P[0][1]+F[0][1]*P[1][1]],
    [F[1][0]*P[0][0]+F[1][1]*P[1][0], F[1][0]*P[0][1]+F[1][1]*P[1][1]]
  ];
  // FP * F^T
  const Pp = [
    [FP[0][0]*F[0][0]+FP[0][1]*F[0][1]+Q[0][0], FP[0][0]*F[1][0]+FP[0][1]*F[1][1]+Q[0][1]],
    [FP[1][0]*F[0][0]+FP[1][1]*F[0][1]+Q[1][0], FP[1][0]*F[1][0]+FP[1][1]*F[1][1]+Q[1][1]]
  ];
  return { x: xp, P: Pp };
}

Chapter 1: EKF Jacobians

When the motion model is nonlinear, the Kalman filter's F matrix doesn't exist — there's no single matrix that maps old state to new state. Instead, we have a function x' = f(x, u). The EKF linearizes this function at the current state by computing its Jacobian: a matrix of partial derivatives.

Consider a robot in 2D. Its state is [x, y, θ]T (position and heading). Given control inputs v (forward velocity) and ω (angular velocity), the motion model is:

f(x, y, θ) = [x + v·cos(θ)·dt,   y + v·sin(θ)·dt,   θ + ω·dt]T

Jacobian FJ = ∂f/∂[x, y, θ]:
FJ = [[1, 0, −v·sin(θ)·dt], [0, 1, v·cos(θ)·dt], [0, 0, 1]]
Why the Jacobian matters: The Jacobian replaces the F matrix in the covariance predict step: P̄ = FJ · P · FJT + Q. The state is predicted using the nonlinear function x' = f(x), but uncertainty is propagated using the linear approximation. This is the core EKF trick.
Exercise 1.1: Compute the Jacobian Derive

For the robot model f = [x + v·cos(θ)·dt, y + v·sin(θ)·dt, θ + ω·dt], compute ∂f1/∂θ (how does the x-update change when θ changes). Evaluate at θ = π/6 (30°), v = 2 m/s, dt = 0.1s.

Show derivation
∂f1/∂θ = ∂(x + v·cos(θ)·dt)/∂θ = −v·sin(θ)·dt
= −2 × sin(π/6) × 0.1 = −2 × 0.5 × 0.1 = −0.1

Negative sign: if the robot turns more (increases θ), the x-component of its motion decreases (the robot moves less in x and more in y). At θ=30°, this sensitivity is −0.1 meters per radian of heading change.

Exercise 1.2: Full Jacobian Evaluation Derive

Evaluate FJ[1,2] (row 2, col 3 — ∂f2/∂θ) at θ = π/4 (45°), v = 3 m/s, dt = 0.5s.

Show derivation
∂f2/∂θ = ∂(y + v·sin(θ)·dt)/∂θ = v·cos(θ)·dt
= 3 × cos(π/4) × 0.5 = 3 × 0.7071 × 0.5 = 1.061

Positive: turning more (increasing θ) makes the robot move more in the y-direction at this heading. At 45°, the sensitivity is symmetric between x and y — both partial derivatives have the same magnitude (1.061).

Exercise 1.3: Jacobian of a Simpler Model Trace
For a 1D model f(x) = x + x²·dt, the Jacobian is a scalar. What is it?
Show derivation
df/dx = d(x + x²·dt)/dx = 1 + 2x·dt

The 1 comes from the linear term x (its derivative is 1). The 2x·dt comes from the quadratic term. At x=0, the Jacobian is 1 — the system looks linear. As |x| grows, the Jacobian deviates from 1, meaning the linearization captures less of the true dynamics.

Exercise 1.4: Nonlinear Predict Derive

Robot at state [3, 1, π/3]T (x=3, y=1, θ=60°). Controls: v=2 m/s, ω=0.1 rad/s, dt=0.5s. Compute the predicted y-coordinate using f2 = y + v·sin(θ)·dt.

meters
Show derivation
f2 = y + v·sin(θ)·dt = 1 + 2×sin(60°)×0.5
= 1 + 2 × 0.866 × 0.5 = 1 + 0.866 = 1.866
Exercise 1.5: Implement robotJacobian() Build

Return the 3×3 Jacobian of the robot motion model as a nested array.

Show solution
javascript
function robotJacobian(theta, v, dt) {
  return [
    [1, 0, -v * Math.sin(theta) * dt],
    [0, 1,  v * Math.cos(theta) * dt],
    [0, 0,  1]
  ];
}
Exercise 1.6: Find the Bug Debug

This EKF predict step has a bug. The state prediction is correct but the covariance blows up. Click the buggy line.

function ekfPredict(x, P, f, F_jacobian, Q) {
  const xp = f(x);               // nonlinear predict
  const F = F_jacobian(x);        // Jacobian at current state
  const FP = matMul(F, P);
  const Pp = matMul(FP, F) + Q;   // should be F^T, not F!
  return { x: xp, P: Pp };
}
Show explanation

Line 5 is the bug. The covariance predict formula is P̄ = F · P · FT + Q. The code computes F · P · F instead of F · P · FT. When F is not symmetric (it rarely is), multiplying by F instead of FT gives a non-symmetric P̄, which is not a valid covariance matrix. The result grows without bound because the eigenvalues of FPF can be larger than those of FPFT.

Chapter 2: EKF Range-Bearing

You're building a robot that observes landmarks. Your sensor gives you range (distance) and bearing (angle) to a landmark at known position (lx, ly). The measurement model is:

h(x) = [√((lx−x)² + (ly−y)²),   atan2(ly−y, lx−x) − θ]T

Measurement Jacobian H = ∂h/∂[x, y, θ]:
Let Δx = lx−x, Δy = ly−y, r = √(Δx² + Δy²)
H = [[−Δx/r, −Δy/r, 0], [Δy/r², −Δx/r², −1]]
Why this is hard: Square roots and arctangents are aggressively nonlinear. The Jacobian depends on the relative position of the robot to the landmark, which means H changes at every single timestep. There's no pre-computed measurement matrix — you must re-evaluate H at the current state estimate before every update.
Exercise 2.1: Compute Range Derive

Robot at (3, 4), landmark at (6, 8). What is the expected range measurement?

meters
Show derivation
Δx = 6−3 = 3,   Δy = 8−4 = 4
r = √(3² + 4²) = √(9 + 16) = √25 = 5
Exercise 2.2: Compute Bearing Derive

Same setup: robot at (3, 4) with heading θ = 0 rad, landmark at (6, 8). What is the expected bearing measurement (in radians)?

Bearing = atan2(Δy, Δx) − θ. atan2(4, 3) ≈ 0.9273 rad.

radians
Show derivation
bearing = atan2(4, 3) − 0 = 0.9273 rad ≈ 53.1°

The landmark is roughly 53 degrees to the left of the robot's forward direction (which is along the x-axis when θ=0).

Exercise 2.3: Measurement Jacobian H[0,0] Derive

Compute H[0,0] = −Δx/r for robot at (3,4), landmark at (6,8).

Show derivation
H[0,0] = −Δx/r = −3/5 = −0.6

Negative because if the robot moves in +x (Δx decreases), range decreases. The magnitude 0.6 says: 1 meter of robot x-motion changes the range by 0.6 meters. This makes sense — the landmark is not directly in front of the robot, so x-motion only partially closes the gap.

Exercise 2.4: Full H Matrix Derive

Compute the full 2×3 measurement Jacobian H for robot at (3,4), landmark at (6,8). What is H[1,1] = −Δx/r²?

Show derivation
H[1,1] = −Δx/r² = −3/25 = −0.12
Full H: [[−0.6, −0.8, 0], [0.16, −0.12, −1]]

The bearing row has much smaller magnitudes (0.16, 0.12) than the range row (0.6, 0.8) because bearing changes slowly with position but directly with heading (−1 in the θ column).

Exercise 2.5: Innovation Derive

Predicted state: robot at (3, 4, 0). Expected measurement: h(x) = [5, 0.9273]T. Actual measurement: z = [5.2, 0.95]T. What is the range component of the innovation (residual)?

meters
Show derivation
y = z − h(x̄) = [5.2 − 5, 0.95 − 0.9273] = [0.2, 0.0227]

The range residual is +0.2m: the measurement says the landmark is slightly farther than we predicted. The bearing residual is +0.023 rad (~1.3°): the landmark appears slightly more to the left than expected. Both are small, suggesting our prediction is good.

Exercise 2.6: Implement rangeBearingH() Build

Compute the 2×3 measurement Jacobian for range-bearing observations.

Show solution
javascript
function rangeBearingH(rx, ry, rtheta, lx, ly) {
  const dx = lx - rx, dy = ly - ry;
  const r2 = dx*dx + dy*dy;
  const r  = Math.sqrt(r2);
  return [
    [-dx/r, -dy/r, 0],
    [dy/r2, -dx/r2, -1]
  ];
}

Chapter 3: UKF Sigma Points

The UKF avoids Jacobians entirely. Instead, it picks 2n+1 deterministic sample points (sigma points) that capture the mean and covariance of the state distribution, pushes each one through the exact nonlinear function, and reconstructs a Gaussian from the transformed points.

Sigma point generation (n = state dimension):
λ = α²(n + κ) − n
χ0 = μ
χi = μ + (√((n+λ)P))i   for i = 1..n
χi+n = μ − (√((n+λ)P))i   for i = 1..n

Weights:
w0m = λ/(n+λ)    wim = 1/(2(n+λ)) for i>0
w0c = λ/(n+λ) + (1−α²+β)    wic = wim for i>0
Simplified version: Many implementations use κ as the only tuning parameter with α=1, β=0, giving λ=κ. Then: scale = √(n+κ), w0 = κ/(n+κ), wi = 1/(2(n+κ)). This is what we'll use for the exercises.
Exercise 3.1: How Many Sigma Points? Trace
For a state dimension n=2, how many sigma points does the UKF generate?
Exercise 3.2: Compute Sigma Points Derive

n=2, μ=[1, 2]T, P=[[4, 0],[0, 1]], κ=1 (using simplified α=1, β=0). The matrix √((n+κ)P) = √(3P). Since P is diagonal, √(3P) = [[√12, 0],[0, √3]] = [[3.464, 0],[0, 1.732]]. What is sigma point χ1?

χ1 = μ + column 1 of √(3P) = [1 + 3.464, 2 + 0]. Give the first component.

Show derivation
√(3P) = √(3 × [[4,0],[0,1]]) = [[√12, 0],[0, √3]] = [[3.464, 0],[0, 1.732]]
χ0 = [1, 2]
χ1 = [1 + 3.464, 2 + 0] = [4.464, 2]
χ2 = [1 + 0, 2 + 1.732] = [1, 3.732]
χ3 = [1 − 3.464, 2 − 0] = [−2.464, 2]
χ4 = [1 − 0, 2 − 1.732] = [1, 0.268]

The sigma points form a cross pattern: the center point plus 4 points along the principal axes of the covariance ellipse. More variance in x (=4) means the x-axis points are spread further apart (3.464 from center) than the y-axis points (1.732).

Exercise 3.3: Sigma Point Weights Derive

For n=2, κ=1 (simplified): compute w0 and wi (for i=1..4). Verify they sum to 1. What is w0?

Show derivation
w0 = κ/(n+κ) = 1/(2+1) = 1/3 ≈ 0.333
wi = 1/(2(n+κ)) = 1/(2×3) = 1/6 ≈ 0.167
Check: 1/3 + 4 × 1/6 = 1/3 + 4/6 = 1/3 + 2/3 = 1 ✓
Exercise 3.4: Propagate and Recover Mean Derive

Pass each sigma point through f(x,y) = [x², y]. Using χ0=[1,2], χ1=[4.464,2], χ2=[1,3.732], χ3=[−2.464,2], χ4=[1,0.268] and weights w0=1/3, wi=1/6. Compute the recovered mean of the first component: μ' = ∑ wi · f(χi)[0].

Show derivation
f(χ0)[0] = 1² = 1
f(χ1)[0] = 4.464² = 19.927
f(χ2)[0] = 1² = 1
f(χ3)[0] = (−2.464)² = 6.071
f(χ4)[0] = 1² = 1
μ'[0] = (1/3)×1 + (1/6)×(19.927 + 1 + 6.071 + 1)
= 0.333 + (1/6)×28 = 0.333 + 4.667 = 5.0

The true mean of x² when x~N(1,4) is E[x²] = Var(x) + E[x]² = 4 + 1 = 5. The UKF gets this exactly — because x² is a second-order polynomial, and sigma points capture the first two moments perfectly. An EKF would get this wrong (it would predict 1² = 1 and miss the variance term).

Exercise 3.5: Implement sigmaPoints() Build

Generate 2n+1 sigma points for a 2D state. Use simplified κ tuning. Assume diagonal P for simplicity (so matrix sqrt is just element-wise sqrt of scaled diagonal).

Show solution
javascript
function sigmaPoints(mu, P, kappa) {
  var n = 2;
  var s = n + kappa;
  var s0 = Math.sqrt(s * P[0][0]);
  var s1 = Math.sqrt(s * P[1][1]);
  return [
    [mu[0], mu[1]],
    [mu[0]+s0, mu[1]],
    [mu[0], mu[1]+s1],
    [mu[0]-s0, mu[1]],
    [mu[0], mu[1]-s1]
  ];
}

Chapter 4: UKF vs EKF

Both the EKF and UKF handle nonlinear systems, but they approximate different things. The EKF linearizes the function (tangent line). The UKF samples the distribution (sigma points through the true function). When does each win?

Same problem, two approaches:
System: x' = x + x²·dt,   state x ~ N(μ, σ²)

EKF: Linearize at μ: FJ = 1 + 2μ·dt
Predicted mean = f(μ) = μ + μ²·dt
Predicted variance = FJ² · σ² = (1+2μ·dt)² · σ²

UKF: True E[x'] = E[x + x²·dt] = μ + (μ² + σ²)·dt
The UKF captures the σ²·dt term; the EKF misses it.
When the EKF breaks: The EKF approximation degrades when (1) the function is highly curved (large second derivatives), (2) the uncertainty is large (the linearization point is far from the true state), or (3) both. In these cases, the tangent line diverges quickly from the true curve, and the predicted Gaussian is wrong in both mean and variance.
Exercise 4.1: EKF Mean Error Derive

For f(x) = x + x²·dt with μ=2, σ²=1, dt=0.1. EKF predicts mean = f(μ) = 2 + 4×0.1 = 2.4. The true mean = μ + (μ²+σ²)·dt = 2 + 5×0.1 = 2.5. What is the EKF's error in the predicted mean?

Show derivation
EKF mean = f(μ) = 2 + 4 × 0.1 = 2.4
True mean = μ + (μ² + σ²) × dt = 2 + (4+1)×0.1 = 2.5
Error = |2.4 − 2.5| = 0.1

The EKF misses the σ²·dt = 0.1 term because linearization throws away the curvature (second derivative) information. The UKF captures this term automatically through sigma point propagation. The error is σ² × d²f/dx² × dt / 2 — it grows with both uncertainty and curvature.

Exercise 4.2: When Does It Matter? Trace
If σ² = 0.01 instead of 1 (very tight uncertainty), the EKF mean error drops to 0.001. In which scenario is the EKF-UKF difference most critical?
Show reasoning

The EKF-UKF gap is proportional to σ² × curvature. It matters most when uncertainty is large AND the dynamics are nonlinear. GPS dropout causes large σ²; re-acquiring a target involves highly nonlinear bearing/range measurements. The other scenarios either have small σ² (precise sensors) or linear dynamics (straight road, temperature).

Exercise 4.3: Computational Cost Derive

For a state dimension n=6 (e.g., position + velocity in 3D): the EKF must compute a 6×6 Jacobian (36 partial derivatives) and do matrix multiplies. The UKF must generate 2n+1 sigma points and propagate each. How many sigma points?

sigma points
Show derivation
2n + 1 = 2×6 + 1 = 13 sigma points

13 function evaluations vs. 36 partial derivatives. The UKF is often computationally cheaper because function evaluations are simpler than analytical derivatives. And you never need to derive the Jacobian symbolically — just evaluate f(x) at 13 points. For systems where the Jacobian is hard to derive (or impossible, like neural network dynamics), the UKF wins by default.

Exercise 4.4: EKF vs UKF Decision Design

Order these steps for choosing between EKF and UKF in a new robotics project.

?
?
?
?
Test with representative noise levels Check if Jacobian is analytically derivable Assess degree of nonlinearity Choose EKF if mild, UKF if strong nonlinearity
Show reasoning

First assess nonlinearity (how curved are the dynamics?). Then check if the Jacobian is tractable. Then test both with real noise levels. Finally choose based on results. Many practitioners start with EKF because it's simpler, then switch to UKF only if linearization error causes problems. The UKF has no downside except slightly more code complexity and Cholesky decomposition overhead.

Chapter 5: Particle Filters

The Kalman filter (and EKF/UKF) assume the belief is Gaussian. But what if the posterior is bimodal ("the robot is either in the kitchen or the bedroom") or has sharp edges? Particle filters represent the belief as a set of weighted samples (particles). No Gaussian assumption at all.

Particle filter cycle:
1. Sample: draw N particles from prior or propagate existing ones through f(x)
2. Weight: wi = p(z | xi) — how well does each particle explain the measurement?
3. Normalize: wi = wi / ∑ wj
4. Resample: draw N new particles with probability proportional to weights

Effective sample size: Neff = 1 / ∑ wi²
Weight degeneracy: After a few steps without resampling, one particle gets weight ~1 and all others get ~0. The effective sample size drops to 1. This is the fundamental failure mode of particle filters. Resampling (step 4) combats this by duplicating high-weight particles and killing low-weight ones.
Exercise 5.1: Weight Update Derive

5 particles at positions [1, 3, 5, 7, 9]. Measurement z=4 with Gaussian likelihood p(z|x) = exp(−(z−x)²/8). Compute the unnormalized weight for particle at x=3.

Show derivation
w = exp(−(4−3)²/8) = exp(−1/8) = exp(−0.125) = 0.8825

This particle is 1 unit from the measurement. The (z−x)²/8 denominator comes from 2σ² where σ²=4. The closer a particle is to the measurement, the higher its weight.

Exercise 5.2: Normalize Weights Derive

Unnormalized weights for particles at [1,3,5,7,9] with z=4: [0.3247, 0.8825, 0.8825, 0.3247, 0.0439]. What is the normalized weight of the particle at x=5?

Show derivation
Sum = 0.3247 + 0.8825 + 0.8825 + 0.3247 + 0.0439 = 2.4583
w3 (x=5) = 0.8825 / 2.4583 = 0.359

Particles at x=3 and x=5 get equal weight (both are 1 unit from z=4). Together they hold 72% of the total weight. The particle at x=9 is 5 units away and gets only 1.8% — it's practically dead.

Exercise 5.3: Effective Sample Size Derive

Normalized weights: [0.132, 0.359, 0.359, 0.132, 0.018]. Compute Neff = 1/∑wi².

Show derivation
∑wi² = 0.132² + 0.359² + 0.359² + 0.132² + 0.018²
= 0.0174 + 0.1289 + 0.1289 + 0.0174 + 0.0003 = 0.2929
Neff = 1/0.2929 = 3.41

Out of 5 particles, only ~3.4 are "effective." If all weights were equal (0.2 each), Neff would be 5. The closer Neff is to 1, the more degenerate the particle set. A common rule: resample when Neff < N/2.

Exercise 5.4: Systematic Resampling Trace
Cumulative weights: [0.132, 0.491, 0.850, 0.982, 1.0]. Systematic resampling with u0=0.1 and step=0.2 gives thresholds [0.1, 0.3, 0.5, 0.7, 0.9]. How many times is particle 2 (cumulative range 0.132 to 0.491) selected?
Show reasoning

Thresholds: 0.1, 0.3, 0.5, 0.7, 0.9. Cumulative ranges: particle 1=[0, 0.132], particle 2=(0.132, 0.491], particle 3=(0.491, 0.850], particle 4=(0.850, 0.982], particle 5=(0.982, 1.0].

0.1 → particle 1, 0.3 → particle 2, 0.5 → particle 3, 0.7 → particle 3, 0.9 → particle 4. Particle 2 is selected once. Particle 3 gets two copies (it had weight 0.359). Particle 5 (weight 0.018) gets zero copies — it's been killed by resampling.

Exercise 5.5: Implement particleWeights() Build

Compute normalized particle weights given positions, a measurement, and measurement variance.

Show solution
javascript
function particleWeights(positions, z, sigma2) {
  const raw = positions.map(x =>
    Math.exp(-(z - x) * (z - x) / (2 * sigma2))
  );
  const sum = raw.reduce((a, b) => a + b, 0);
  return raw.map(w => w / sum);
}
Exercise 5.6: Degeneracy Detection Derive

After 10 timesteps without resampling, weights become [0.97, 0.01, 0.01, 0.005, 0.005]. Compute Neff. Should you resample?

Show derivation
∑wi² = 0.97² + 0.01² + 0.01² + 0.005² + 0.005² = 0.9409 + 0.0001 + 0.0001 + 0.000025 + 0.000025 = 0.9412
Neff = 1/0.9412 = 1.062

Neff = 1.06 out of 5 — catastrophic degeneracy. One particle has 97% of the weight. You absolutely must resample. After resampling, you'd have ~5 copies of the dominant particle (with slight perturbation) and zero copies of the others. This is why particle filters need hundreds or thousands of particles for real problems.

Chapter 6: Sensor Fusion

Real robots don't rely on a single sensor. A self-driving car fuses GPS (absolute but noisy), IMU (fast but drifts), wheel encoders (precise but slips), cameras (rich but slow), and LiDAR. The Kalman filter framework handles this naturally: different sensors just mean different H matrices and noise covariances R, applied in separate update steps.

Multi-sensor update strategy:
Option 1: Sequential updates — apply each sensor's update one after another
Option 2: Stacked measurement — concatenate all measurements into one z, stack H matrices

Typical sensor characteristics:
GPS: position σ = 5m, rate = 1 Hz, no drift
IMU: acceleration σ = 0.1 m/s², rate = 100 Hz, drifts
Magnetometer: heading σ = 10°, rate = 10 Hz, susceptible to interference
Complementary filtering: GPS gives good long-term position (no drift) but poor short-term (noisy, slow). IMU gives good short-term (smooth, fast) but poor long-term (drift accumulates). Fusing them gives the best of both: drift-free, smooth, fast position estimates. This is the fundamental insight behind all INS/GPS systems.
Exercise 6.1: GPS-Only Uncertainty Derive

GPS gives position with σ = 5m. After one GPS measurement, what is the position variance? (Assume no prior knowledge, so the update is just the measurement.)

Show derivation
σ² = 5² = 25 m²

With no prior, the posterior is just the measurement distribution. After multiple GPS readings, the variance decreases, but never below the measurement noise floor (since each reading is independent).

Exercise 6.2: Fuse GPS + IMU Position Derive

GPS says position = 100m (σ = 5m). IMU dead-reckoning says position = 102m (σ = 3m). Using the 1D Kalman fusion formula, what is the fused position estimate?

Fused = (σ2² · z1 + σ1² · z2) / (σ1² + σ2²)

meters
Show derivation
Fused = (9 × 100 + 25 × 102) / (25 + 9) = (900 + 2550) / 34 = 3450 / 34 = 101.47

The fused estimate is closer to the IMU reading (102) because the IMU has lower variance (9 vs 25). The fusion trusts the more precise sensor more. The fused variance is 1/(1/25 + 1/9) = 25×9/34 = 6.62 m² (σ = 2.57m) — better than either sensor alone.

Exercise 6.3: Fused Variance Derive

Compute the fused variance from the GPS+IMU fusion above. σGPS=5m, σIMU=3m.

Show derivation
σfused² = 1/(1/σ1² + 1/σ2²) = 1/(1/25 + 1/9) = 1/(0.04 + 0.111) = 1/0.151 = 6.62 m²
σfused = √6.62 = 2.57 m

The fused uncertainty (2.57m) is better than both GPS (5m) and IMU (3m). This is the fundamental benefit of sensor fusion: combining independent measurements always reduces uncertainty. The fused variance is always less than the smallest individual variance.

Exercise 6.4: State Vector Design Trace
You want to fuse GPS (position), IMU (acceleration), and magnetometer (heading). What's the minimum state vector to track all relevant quantities?
Show reasoning

[x, y, vx, vy, θ] is the minimum. GPS directly observes [x, y]. IMU acceleration integrates to velocity, so we need [vx, vy] in the state to fuse IMU. Magnetometer observes θ directly. We don't need acceleration in the state because we can treat it as a control input rather than a state variable (a common design choice). However, if you want to estimate IMU bias, you'd add [bax, bay] for a 7-element state.

Exercise 6.5: IMU Drift Derive

An IMU measures acceleration with bias b = 0.01 m/s². After integrating for t = 60 seconds, what is the position drift error? (Drift from constant bias: Δx = ½ b t²)

meters
Show derivation
Δx = ½ × 0.01 × 60² = 0.005 × 3600 = 18 m

A tiny 0.01 m/s² bias causes 18 meters of drift in just one minute. Drift grows quadratically with time — at 10 minutes it would be 1800m. This is why IMU-only navigation fails quickly and GPS corrections are essential. It's also why estimating the bias as part of the state vector is critical for any production system.

Chapter 7: Observability

Can you estimate the full state from the measurements you have? If you only measure position, can you figure out velocity? If you only measure voltage in a circuit, can you infer current? The answer depends on observability — a fundamental property of the system.

Observability matrix:
O = [H; H·F; H·F²; ...; H·Fn-1]

The system is observable if and only if rank(O) = n (full rank).

For n=2: O = [[H], [H·F]] — a (2×2) matrix for scalar measurements.
Why this matters for filters: If a state is unobservable, the Kalman filter cannot reduce its uncertainty — the corresponding diagonal of P will grow without bound. Designing a sensor suite that makes all states observable is a prerequisite for any estimation system.
Exercise 7.1: Position from Position Derive

State = [position, velocity], F = [[1,1],[0,1]], H = [1, 0] (measure position). Compute the observability matrix O = [H; H·F]. Is the system observable?

rank of O
Show derivation
H = [1, 0]
H · F = [1, 0] · [[1,1],[0,1]] = [1, 1]
O = [[1, 0], [1, 1]]
det(O) = 1×1 − 0×1 = 1 ≠ 0 → rank = 2

Rank 2 = n, so the system is fully observable. Even though we only measure position, the dynamics couple position and velocity. By observing how position changes over time, we can infer velocity. This is exactly what the Kalman filter does — the off-diagonal terms in P capture this position-velocity coupling.

Exercise 7.2: Velocity from Velocity Derive

Same F = [[1,1],[0,1]], but now H = [0, 1] (measure velocity only). Compute O = [H; H·F] and its rank.

rank of O
Show derivation
H = [0, 1]
H · F = [0, 1] · [[1,1],[0,1]] = [0, 1]
O = [[0, 1], [0, 1]]
Rows are identical → rank = 1 < n = 2

Not observable! Measuring velocity alone cannot determine position, because in the constant-velocity model, velocity doesn't depend on position. The velocity measurement gives us v directly, but there's no way to figure out x from repeated v measurements — x could be any value. Physically: knowing how fast you're going doesn't tell you where you are.

Exercise 7.3: Making It Observable Trace
The system with H=[0,1] measuring velocity is unobservable. Which change would make it observable?
Show reasoning

If F = [[1,1],[0.1,0.9]], then H·F = [0,1]·[[1,1],[0.1,0.9]] = [0.1, 0.9]. Now O = [[0,1],[0.1,0.9]], which has det = −0.1 ≠ 0 → rank 2. The system is observable because velocity now depends on position (drag/spring force), so observing velocity changes reveals position information. Measurement rate and noise affect estimation quality, not observability itself — observability is a structural property of (F, H).

Exercise 7.4: 3-State Observability Derive

State = [x, v, a] (position, velocity, acceleration), F = [[1,1,0.5],[0,1,1],[0,0,1]], H = [1,0,0]. Compute O = [H; HF; HF²]. What is the (2,3) entry of O (row 2, column 3)?

Show derivation
H = [1, 0, 0]
HF = [1,0,0] · [[1,1,0.5],[0,1,1],[0,0,1]] = [1, 1, 0.5]
HF² = HF · F = [1,1,0.5] · [[1,1,0.5],[0,1,1],[0,0,1]] = [1, 2, 2]
O = [[1,0,0],[1,1,0.5],[1,2,2]]

O(2,3) = 0.5. The full O has rank 3 (det = 1×(1×2 − 0.5×2) − 0×(...) + 0×(...) = 1), so the system is fully observable. Position measurements alone can recover velocity and acceleration, because F couples them.

Exercise 7.5: Implement obsMatrix() Build

Build the observability matrix for a 2-state system. Return [[H], [HF]] as a 2×2 array.

Show solution
javascript
function obsMatrix(H, F) {
  const HF = [
    H[0]*F[0][0] + H[1]*F[1][0],
    H[0]*F[0][1] + H[1]*F[1][1]
  ];
  return [[H[0], H[1]], [HF[0], HF[1]]];
}

Chapter 8: Process & Measurement Noise Tuning

Q (process noise) and R (measurement noise) are the most important tuning knobs in any Kalman filter. Get them wrong and the filter either lags behind reality or jitters with every noisy measurement. Understanding their effect on the steady-state Kalman gain K is essential.

Intuition for 1D:
K = P/(P + R) where P evolves as P → P + Q (predict) → P(1−K) (update)

Steady-state gain (1D, continuous-time):
K = (−R + √(R² + 4QR)) / (2R)   (simplified from algebraic Riccati)

Limiting cases:
Q → 0: K → 0 (trust model completely)
Q → ∞: K → 1 (trust measurement completely)
R → 0: K → 1 (perfect sensor)
R → ∞: K → 0 (useless sensor)
The Q/R ratio is what matters. K depends on Q/R, not Q and R individually. Doubling both Q and R gives the same steady-state gain. Think of it as the ratio of "how uncertain the model is" to "how uncertain the sensor is." High Q/R means K is high (trust sensor). Low Q/R means K is low (trust model).
Exercise 8.1: Q Too Small Trace
You set Q = 0.001 and R = 1. The true system has occasional sudden jumps. What symptom will you see?
Show reasoning

Small Q tells the filter "the model is very reliable, almost no process noise." So K becomes small (trust model, ignore measurements). When a sudden jump happens, the filter's prediction is now far from the truth, but it stubbornly trusts its (wrong) model and only slowly incorporates the new measurements. This manifests as lag — the estimate trails behind reality.

Exercise 8.2: Steady-State K for Different Q/R Derive

For a simple 1D system, the discrete-time steady-state equation gives P satisfying P = P + Q − P²/(P+R). This simplifies to P² + RP − QR − Q·P = 0. For Q=1, R=4: solve for K = P/(P+R).

From steady-state: P² = QR + QP → P² − QP − QR = 0. Use quadratic formula.

Show derivation
P² − QP − QR = 0 → P² − P − 4 = 0
P = (1 + √(1 + 16)) / 2 = (1 + √17) / 2 = (1 + 4.123) / 2 = 2.56
K = P/(P+R) = 2.56/(2.56+4) = 2.56/6.56 = 0.390

K ≈ 0.39: the filter gives 39% weight to measurements. With Q/R = 0.25, the model is moderately trusted. If we made Q=4, R=1 (Q/R=4), we'd get K ≈ 0.78 — heavily trusting measurements.

Exercise 8.3: R Too Small Trace
You set R = 0.001 (claiming the sensor is nearly perfect) but the true sensor noise is σ = 2m. What happens?
Show reasoning

Tiny R tells the filter "the sensor is perfect." So K → 1, and the estimate snaps to every raw measurement. Since the true noise is σ=2m, the output jitters wildly. This is worse than just using raw measurements, because the filter also claims very high confidence (small P) while being noisy — it's both wrong and overconfident.

Exercise 8.4: Q/R Ratio Experiment Derive

Compute K for three Q/R ratios: Q=0.01 R=1, Q=1 R=1, Q=100 R=1. Use the formula K = P/(P+R) where P = (Q+√(Q²+4QR))/2. What is K when Q=R=1?

Show derivation
P = (Q + √(Q² + 4QR)) / 2 = (1 + √(1 + 4)) / 2 = (1 + 2.236) / 2 = 1.618
K = 1.618 / (1.618 + 1) = 1.618 / 2.618 = 0.618

0.618 is the golden ratio minus 1. When Q=R, the filter splits trust roughly 62/38 toward measurements. For Q=0.01: P≈0.099, K≈0.09 (model wins). For Q=100: P≈10.5, K≈0.91 (sensor wins). The Q/R ratio is the master tuning knob.

Exercise 8.5: Find the Bug Debug

This filter is over-smoothing (K is too low). The programmer claims Q=1, R=1 should give K≈0.62. Click the line that makes K smaller than expected.

function kalmanUpdate(x, P, z, H, R, Q) {
  P = P + Q;                 // predict covariance
  const S = H * P * H + R;   // innovation covariance
  const K = P * H / S;       // Kalman gain
  x = x + K * (z - H * x);  // state update
  P = (1 - K * H) * P + Q;  // DOUBLE-COUNTING Q!
  return { x, P, K };
}
Show explanation

Line 6 is the bug. The update step P = (1−KH)P should NOT add Q again. Q is already added in the predict step (line 2). Adding it a second time inflates P, which at steady state forces a larger P but also means line 2 of the next iteration adds Q on top of an already-too-large P. The net effect is that S in line 3 becomes larger, making K smaller than expected. The fix: remove the "+ Q" from line 6.

Exercise 8.6: Adaptive Noise Trace
Some systems use "adaptive" Q: if the innovation (z − Hx) is consistently larger than expected (given S), what should you do?
Show reasoning

Large innovations mean the predictions are worse than the filter thinks. This could mean either Q is too small (model isn't as reliable as assumed) or R is too small (sensor noise is underestimated). Increasing Q is the safer choice because it makes the filter more responsive without making it overfit to noise. In practice, the innovation covariance ∑(y·yT)/N should approximately equal S. If it's consistently larger, Q should increase.

Chapter 9: Capstone — INS/GPS Fusion

Everything comes together in a full inertial navigation system (INS) fused with GPS. This is the system inside every drone, self-driving car, and smartphone. The state vector is large, the dynamics are nonlinear, and sensor fusion is essential for robustness.

Full 9-state INS/GPS state vector:
x = [px, py, pz, vx, vy, vz, φ, θ, ψ]T
     (3D position, 3D velocity, roll/pitch/yaw)

Predict (IMU-driven):
position' = position + velocity · dt
velocity' = velocity + R(φ,θ,ψ) · abody · dt + g · dt
attitude' = attitude + ωbody · dt

GPS Update: HGPS = [[I3, 03, 03]] (observes only position)
GPS outage: When GPS is lost, the filter runs on IMU prediction only. Position uncertainty grows quadratically (IMU drift). A typical MEMS IMU drifts ~18m after 60 seconds (from the bias calculation in Ch6). After 5 minutes: 18 × 25 = 450m. This is why autonomous vehicles need redundant absolute position sensors (LiDAR localization, visual odometry).
Exercise 9.1: State Dimension Trace
A production INS adds 3 accelerometer biases and 3 gyroscope biases to the state. What is the total state dimension?
Show reasoning

9 base states (pos, vel, attitude) + 3 accelerometer biases + 3 gyroscope biases = 15. The biases are slowly time-varying, modeled as random walks in the state transition. By estimating them online, the filter can subtract the bias from future IMU readings, dramatically reducing drift. Some systems also add 3 magnetometer biases for 18 total states.

Exercise 9.2: Predict Step — Position Derive

Current state: p = [100, 200, 50]m, v = [10, 5, −2] m/s. dt = 0.01s (100Hz IMU). What is the predicted px?

meters
Show derivation
px' = px + vx × dt = 100 + 10 × 0.01 = 100.1 m

At 100Hz, each predict step moves only 0.1m. This tiny step is why IMU integration is smooth — each step is essentially linear. The nonlinearity accumulates over many steps, which is where drift comes from.

Exercise 9.3: Velocity Update with Gravity Derive

IMU reads abody = [0, 0, 9.81] m/s² (stationary, sensing gravity). Gravity vector g = [0, 0, −9.81] m/s². The rotation R = I (level). Current vz = 0 m/s, dt = 0.01s. What is predicted vz?

v' = v + R · abody · dt + g · dt

m/s
Show derivation
vz' = 0 + (1 × 9.81) × 0.01 + (−9.81) × 0.01 = 0.0981 − 0.0981 = 0

When stationary, the IMU reads +g upward (it measures the reaction force against gravity). Adding the gravity compensation g = [0,0,−9.81] cancels this out, giving zero net acceleration. This is the most common source of bugs in INS code: forgetting to subtract gravity, causing the filter to think the vehicle is accelerating upward at 9.81 m/s².

Exercise 9.4: GPS Update Derive

After 100 predict steps (1 second), predicted position = [101, 205, 49.8]. GPS measurement = [100.5, 204.8, 50.1] with R = diag(25, 25, 25) (5m σ each axis). Predicted P for position block = diag(2, 2, 2). What is the Kalman gain K for the x-component? (1D formula per axis since everything is diagonal.)

Show derivation
Kx = Pxx / (Pxx + Rxx) = 2 / (2 + 25) = 2/27 = 0.074
Updated px = 101 + 0.074 × (100.5 − 101) = 101 + 0.074 × (−0.5) = 101 − 0.037 = 100.963

K = 0.074: the filter barely moves toward the GPS reading. Why? Because the IMU-propagated uncertainty (P=2 m², σ=1.4m) is much smaller than GPS noise (R=25 m², σ=5m). After only 1 second, the IMU is still more trustworthy. After 60 seconds without GPS, P would grow to ~324 m² (σ=18m), and K would jump to ~0.93. This is the GPS/IMU handoff in action.

Exercise 9.5: GPS Outage Drift Derive

GPS is lost for 120 seconds. IMU accelerometer bias = 0.05 m/s². How much position drift accumulates? (Δx = ½ b t²)

meters
Show derivation
Δx = ½ × 0.05 × 120² = 0.025 × 14400 = 360 m

360 meters of drift in 2 minutes from a 0.05 m/s² bias. This is why GPS outages are critical for autonomous vehicles. Solutions: (1) estimate and subtract the bias (the 15-state filter), (2) use additional absolute position sources (LiDAR, visual odometry), (3) use a higher-quality IMU with smaller bias (tactical grade: 0.001 m/s², giving only 7.2m drift in 2 min).

Exercise 9.6: P Matrix Size Derive

For the 15-state INS/GPS filter, the covariance matrix P is 15×15. How many unique elements does P have? (P is symmetric.)

elements
Show derivation
Unique elements = n(n+1)/2 = 15 × 16 / 2 = 120

120 values to track, and FPFT involves 15×15 matrix multiplications at 100Hz. That's ~675,000 multiply-adds per second just for covariance propagation. This is why embedded INS systems use the square-root form (Cholesky) or the Joseph form for numerical stability — and why the UKF with 31 sigma points is sometimes preferred over the EKF for this problem.

Exercise 9.7: Design the Pipeline Design

Order the INS/GPS filter pipeline for one cycle at 100Hz (IMU rate).

?
?
?
?
?
Read IMU (accel + gyro) EKF predict with IMU Check if GPS available GPS update (if available) Output fused state estimate
Show reasoning

1. Read IMU — get latest accelerometer and gyroscope measurements. 2. EKF predict — propagate state and covariance forward using IMU data. 3. Check GPS — GPS arrives at 1Hz, so 99 out of 100 cycles have no GPS. 4. GPS update (if available) — correct position using GPS measurement. 5. Output state — publish the current best estimate to downstream systems. The predict step runs every cycle (100Hz), but the GPS update only fires ~once per second.