SLAM & Navigation Workbook — Engineermaxxing

Chapter 0: EKF-SLAM State Vector

You're building a robot that needs to map an unknown environment while simultaneously figuring out where it is in that map. This is the Simultaneous Localization and Mapping (SLAM) problem. The robot's belief about the world is encoded in a single state vector — its own pose plus the positions of every landmark it has ever seen.

In 2D EKF-SLAM, the state vector stacks the robot pose and all landmark positions:

State vector:
x = [x_r, y_r, θ_r, x₁, y₁, x₂, y₂, …, x_N, y_N]^T

Dimensions:
Robot pose: 3 elements (x, y, heading)
Each landmark: 2 elements (x, y position)
Total state size: 3 + 2N (for N landmarks)

Covariance matrix:
Σ ∈ ℝ^{(3+2N) × (3+2N)}
Encodes uncertainty in every state element AND cross-correlations between them.

The covariance is the soul of EKF-SLAM. When the robot observes a landmark, the covariance builds cross-correlations between the robot pose and that landmark. Later, re-observing any correlated landmark tightens the uncertainty on everything — the robot pose AND all other landmarks. This is why loop closure is so powerful: one observation propagates corrections through the entire map.

Exercise 0.1: State Vector Size Derive

A 2D robot has observed 3 landmarks. What is the size of the state vector?

State = [robot pose (3)] + [landmarks (2 each)].

elements

Show derivation

State size = 3 + 2N = 3 + 2 × 3 = 9

x = [x_r, y_r, θ_r, x₁, y₁, x₂, y₂, x₃, y₃]^T. Nine numbers fully describe the robot's belief about the world.

Exercise 0.2: Covariance Matrix Size Derive

For 50 landmarks in 2D, how many elements does the full covariance matrix Σ contain?

elements

Show derivation

State size = 3 + 2 × 50 = 103

Covariance = 103 × 103 = 10,609 elements

This is why EKF-SLAM doesn't scale well. The covariance matrix grows quadratically with the number of landmarks. At 1000 landmarks, you'd have a 2003×2003 matrix with ~4 million elements — and you need to invert a portion of it at every update step.

Exercise 0.3: Covariance Sparsity Trace

In EKF-SLAM, after several observations, which of the following is true about the covariance matrix Σ?

It is sparse — most off-diagonal blocks are zero because landmarks far apart are uncorrelated It is diagonal — each landmark's uncertainty is independent It is fully dense — every landmark becomes correlated with every other through the shared robot pose, even if they were never observed together It is block-diagonal — only landmarks observed at the same time are correlated

Show explanation

The covariance matrix becomes fully dense over time. Here's why: when the robot observes landmark A, the update step creates correlations between the robot pose and A. Later, when it observes landmark B, the update creates correlations between the pose and B. But since the robot pose is correlated with A, B now becomes correlated with A — even though they were never observed at the same time.

This "correlation contagion" through the shared robot pose is both the power and the curse of EKF-SLAM. Power: loop closure corrections propagate everywhere. Curse: O(N²) memory and O(N²) updates.

Exercise 0.4: 3D SLAM State Derive

In 3D SLAM, the robot pose has 6 DOF (x, y, z, roll, pitch, yaw) and each landmark has 3 coordinates (x, y, z). For a robot that has mapped 200 landmarks, what is the covariance matrix size?

rows (= columns)

Show derivation

State size = 6 + 3 × 200 = 606

Covariance Σ ∈ ℝ^{606 × 606} = 367,236 elements

In practice, 3D EKF-SLAM is rarely used beyond ~100 landmarks due to the O(N²) cost. Modern 3D SLAM systems use graph-based or factor graph approaches that exploit the sparsity of the information matrix (inverse covariance) instead.

Exercise 0.5: Implement stateSize() Build

Write a function that returns the state vector dimension and covariance element count for a SLAM problem.

Return an array [stateVecSize, covElements].

Show solution

javascript
function stateSize(poseDim, landmarkDim, numLandmarks) {
  const n = poseDim + landmarkDim * numLandmarks;
  return [n, n * n];
}

Exercise 0.6: Unique Covariance Entries Derive

The covariance matrix is symmetric (Σ = Σ^T). For a 2D SLAM problem with 10 landmarks, how many unique entries does Σ have? (Include the diagonal.)

unique entries

Show derivation

State size n = 3 + 2 × 10 = 23

Unique entries = n(n+1)/2 = 23 × 24 / 2 = 276

Exploiting symmetry halves the storage. In practice, implementations store only the upper triangle. For n=23 this saves 253 entries; for n=1003 (500 landmarks in 2D) it saves ~500,000 entries.

Chapter 1: Landmark Observation

Your robot sees a landmark with its range-bearing sensor. The sensor reports: "I see something 5 meters away at 30 degrees to my left." But the state vector stores landmark positions in global coordinates. How do you connect the raw sensor measurement to the state? That's the observation model — the function h(x) that predicts what the sensor should see given the current state estimate.

Range-bearing observation model:
h(x) = [r, φ]^T where:
r = √((x_j - x_r)² + (y_j - y_r)²)    // range to landmark j
φ = atan2(y_j - y_r, x_j - x_r) - θ_r    // bearing relative to robot heading

Innovation (surprise):
ν = z - h(x̂)    // actual measurement minus predicted measurement

Innovation = surprise. If the innovation ν is near zero, the observation matches what we expected — no new information. If it's large, either our state estimate is wrong (time to correct!) or the measurement is an outlier (time to reject!). The magnitude of the innovation relative to its expected uncertainty is the key to data association.

Exercise 1.1: Predicted Range Derive

Robot at (2, 3, π/4). Landmark at (5, 7). What is the predicted range r?

meters

Show derivation

Δx = 5 - 2 = 3

Δy = 7 - 3 = 4

r = √(3² + 4²) = √(9 + 16) = √25 = 5.0 m

Exercise 1.2: Predicted Bearing Derive

Same setup: robot at (2, 3, π/4), landmark at (5, 7). What is the predicted bearing φ in radians? (Bearing = atan2(Δy, Δx) - θ_r.)

radians

Show derivation

atan2(4, 3) = 0.9273 rad

φ = 0.9273 - π/4 = 0.9273 - 0.7854 = 0.1419 rad ≈ 0.14 rad (≈ 8.1°)

The landmark is slightly to the left of the robot's heading. If the robot were facing directly at the landmark, φ would be zero.

Exercise 1.3: Innovation Vector Derive

The sensor measures z = [5.3, 0.05]^T (range 5.3m, bearing 0.05 rad). Using the predicted h(x) = [5.0, 0.14]^T from above, what is the innovation ν?

range innovation (meters)

Show derivation

ν = z - h(x̂) = [5.3 - 5.0, 0.05 - 0.14]^T = [0.3, -0.09]^T

The range innovation is +0.3m (landmark is farther than expected), and the bearing innovation is -0.09 rad (landmark is more to the right than expected). These residuals will drive the EKF update to shift the state estimate.

Exercise 1.4: Observation Jacobian H Trace

The Jacobian H = ∂h/∂x maps state perturbations to measurement perturbations. For a 2D SLAM with 3 landmarks, H is a 2×9 matrix (2 measurement dims, 9 state dims). When observing landmark 2, which columns of H are nonzero?

All 9 columns — every state element affects the measurement Columns 1-3 (robot pose) and columns 6-7 (landmark 2) — only 5 nonzero columns Only columns 6-7 (landmark 2 position) Only columns 1-3 (robot pose)

Show explanation

The predicted measurement h(x) depends on the robot pose (x_r, y_r, θ_r) and the observed landmark's position (x₂, y₂). It does not depend on any other landmark's position. So H has nonzero entries only in columns 1-3 (pose) and columns 6-7 (landmark 2). The other 4 columns are all zeros.

This sparse structure is critical: it means each observation only directly updates the robot pose and the observed landmark. However, because the covariance Σ has off-diagonal correlations, the correction indirectly propagates to all other landmarks through the Kalman gain K = ΣH^TS^-1.

Exercise 1.5: Jacobian Entries Derive

For the range measurement r = √(Δx² + Δy²), compute ∂r/∂x_r. (Let Δx = x_j - x_r, Δy = y_j - y_r, q = Δx² + Δy².)

Δx / √q -Δx / √q Δy / √q 1 / √q

Show derivation

r = √q = (q)^1/2

∂r/∂x_r = ½ q^-1/2 · ∂q/∂x_r

∂q/∂x_r = 2Δx · (-1) = -2Δx // chain rule on (x_j - x_r)²

∂r/∂x_r = ½ · q^-1/2 · (-2Δx) = -Δx / √q

The negative sign makes sense: increasing x_r (moving the robot right) decreases Δx when the landmark is to the right, which decreases the range. Note that ∂r/∂x_j = +Δx/√q — same magnitude, opposite sign.

Exercise 1.6: Implement observationModel() Build

Write a function that computes the predicted measurement [range, bearing] given a robot pose and landmark position.

Use Math.sqrt, Math.atan2. Bearing = atan2(dy,dx) - theta.

Show solution

javascript
function observationModel(xr, yr, thr, xj, yj) {
  const dx = xj - xr;
  const dy = yj - yr;
  const range = Math.sqrt(dx * dx + dy * dy);
  const bearing = Math.atan2(dy, dx) - thr;
  return [range, bearing];
}

Chapter 2: Data Association

Your robot sees a blip on its sensor. Is it landmark #3 that you mapped two minutes ago, or a completely new landmark? Get it wrong and your map breaks catastrophically — matching to the wrong landmark introduces phantom correlations that can never be corrected. This is the data association problem, and it's often harder than the estimation itself.

The standard approach uses Mahalanobis distance — the innovation scaled by its expected uncertainty:

Mahalanobis distance:
d² = ν^T S^-1 ν

where:
ν = z - h(x̂, j) // innovation for candidate landmark j
S = H Σ H^T + R // innovation covariance
R = measurement noise covariance

Decision rule (nearest neighbor):
Match to landmark j* = argmin_j d²_j, provided d²_j* < χ² threshold

Mahalanobis ≠ Euclidean. Euclidean distance treats all directions equally. Mahalanobis stretches space according to uncertainty — a 1-meter innovation along a high-uncertainty direction is less surprising than a 1-meter innovation along a low-uncertainty direction. It's the natural distance metric for Gaussians.

Exercise 2.1: Mahalanobis Distance (1D) Derive

Innovation ν = 0.5m. Innovation variance S = 0.25 m². What is d²?

(dimensionless)

Show derivation

d² = ν² / S = 0.5² / 0.25 = 0.25 / 0.25 = 1.0

d² = 1.0 means the innovation is exactly 1 standard deviation. For a χ² distribution with 1 DOF, this corresponds to the 68th percentile — quite plausible. The typical gate threshold at 95% confidence is d² < 3.84.

Exercise 2.2: 2D Mahalanobis Derive

Innovation ν = [0.3, -0.09]^T. Innovation covariance S = [[0.25, 0], [0, 0.01]]. Compute d² = ν^TS^-1ν.

Since S is diagonal, S^-1 = diag(1/0.25, 1/0.01) = diag(4, 100).

(dimensionless)

Show derivation

S^-1ν = [4 × 0.3, 100 × (-0.09)]^T = [1.2, -9.0]^T

d² = ν^T(S^-1ν) = 0.3 × 1.2 + (-0.09) × (-9.0) = 0.36 + 0.81 = 1.17

Even though the bearing innovation (-0.09 rad) looks small, its contribution (0.81) is larger than the range innovation's contribution (0.36) because bearing has much lower variance (0.01 vs 0.25). Mahalanobis distance respects this — small innovations in precise dimensions count more.

Exercise 2.3: Gate Test Trace

Your robot sees a landmark. Candidate match A has d² = 1.17, candidate B has d² = 8.5. The χ² gate at 95% confidence with 2 DOF is 5.99. What should the robot do?

Match to A (closest) and also match to B (backup) Match to A (passes gate) and reject B (d² = 8.5 > 5.99, fails gate) Reject both — Mahalanobis distances above 1.0 indicate bad matches Initialize a new landmark — both matches are rejected

Show explanation

Candidate A: d² = 1.17 < 5.99 — passes the gate. This is a statistically plausible match.

Candidate B: d² = 8.5 > 5.99 — fails the gate. At 95% confidence, a true match would produce d² this large only 1.4% of the time. Reject it.

If no candidates pass the gate, the observation is treated as a new landmark and initialized. Here, A passes, so we match to A.

Exercise 2.4: New Landmark Decision Trace

Why is aggressive nearest-neighbor data association dangerous in ambiguous environments (e.g., many similar-looking trees)?

It uses too much memory It makes the filter converge too slowly A single wrong match injects false correlations into the covariance that permanently corrupt the map — the EKF has no mechanism to undo a bad association It makes the robot move in circles

Show explanation

A single wrong association is catastrophic in EKF-SLAM. When you match observation z to the wrong landmark j, the update step "corrects" the state as if j generated z — this shifts both the robot pose and landmark j to explain a measurement that came from somewhere else entirely. Worse, these false corrections propagate through the covariance to every other landmark.

This is why robust data association methods like Joint Compatibility Branch and Bound (JCBB) exist — they evaluate the consistency of the entire set of associations jointly, not each one independently.

Exercise 2.5: Implement mahalanobis() Build

Write a function that computes the Mahalanobis distance squared for a 2D innovation with a diagonal covariance.

For diagonal S, d² = nu0²/s00 + nu1²/s11.

Show solution

javascript
function mahalanobis(nu0, nu1, s00, s11) {
  return (nu0 * nu0) / s00 + (nu1 * nu1) / s11;
}

Exercise 2.6: Find the Association Bug Debug

This code finds the nearest neighbor match but has a critical bug. The robot keeps matching to wrong landmarks. Click the buggy line.

function nearestNeighbor(z, landmarks, S, gate) {
  let bestIdx = -1;
  let bestDist = Infinity;
  for (let j = 0; j < landmarks.length; j++) {
    const nu = innovate(z, landmarks[j]);
    const d2 = nu[0]*nu[0] + nu[1]*nu[1];
    if (d2 < bestDist && d2 < gate) {
      bestDist = d2; bestIdx = j;
    }
  }
  return bestIdx;
}

Show explanation

Line 6 is the bug. It computes the Euclidean distance (nu[0]² + nu[1]²) instead of the Mahalanobis distance (nu^T S^-1 nu). Without scaling by S^-1, the function treats range and bearing innovations as equally weighted — but range is in meters and bearing is in radians. A 0.5m range error and a 0.5 rad bearing error are very different levels of surprise!

The fix: const d2 = nu[0]*nu[0]/S[0][0] + nu[1]*nu[1]/S[1][1]; (assuming diagonal S).

Chapter 3: Loop Closure

Your robot has been driving for 10 minutes, mapping as it goes. Odometry drift has been accumulating — small errors in each motion step add up to meters of error. Then the robot recognizes a place it visited at the start. This loop closure event is the single most powerful source of information in SLAM: it constrains the accumulated drift to be consistent with the re-observation.

Odometry drift model (random walk):
Position error after k steps ≈ σ_step · √k    // if errors are i.i.d.

Worst case (systematic bias):
Position error after k steps ≈ ε · k    // linear drift

Loop closure constraint:
x_end ≈ x_start + Δz_closure    // relative measurement between start and end

Loop closure is a global constraint. Without loop closure, odometry drift makes the map grow increasingly wrong — landmarks placed at step 100 might be 1 meter off from landmarks placed at step 1. The closure observation says "these two positions are the same place," which means ALL intermediate poses and landmarks must be adjusted to make the trajectory consistent. This is why closing the loop can dramatically improve the entire map at once.

Exercise 3.1: Linear Drift Derive

A robot moves with 0.1 m/step drift (systematic error). After 100 steps, what is the accumulated position error?

meters

Show derivation

Error = ε × k = 0.1 × 100 = 10.0 m

10 meters of drift after 100 steps! For a warehouse robot moving 0.5 m/step, that's only 50 meters of travel. Without loop closure or absolute positioning, the map is unusable after a few minutes.

Exercise 3.2: Random Walk Drift Derive

If each step has independent random error with σ = 0.05 m, what is the expected position error (1σ) after 400 steps?

meters

Show derivation

Variance after k steps = k × σ² = 400 × 0.0025 = 1.0

Standard deviation = √1.0 = 1.0 m

Random walk drift grows as √k, not k — much slower than systematic drift. This is why wheel encoders with zero-mean noise accumulate error slower than sensors with a constant bias. But even √k diverges eventually; you always need loop closure or external corrections for long-duration navigation.

Exercise 3.3: Loop Closure Error Distribution Derive

A robot traverses a square loop (100 steps per side, 400 total). Systematic drift is 0.02 m/step in the forward direction only. When it returns to the start, what is the position error? (Hint: drift accumulates in the direction of travel, which changes at each corner.)

meters

Show derivation

Forward drift per side: 0.02 × 100 = 2.0 m. But on a square loop, the drift directions are:

Side 1 (east): drift = (+2, 0)

Side 2 (north): drift = (0, +2)

Side 3 (west): drift = (-2, 0)

Side 4 (south): drift = (0, -2)

Total: (0, 0) — the drifts cancel!

On a closed loop with constant forward bias, the drift cancels if the robot turns enough. This is a special case. In practice, heading drift (which rotates the direction of subsequent drift) prevents this cancellation and is usually the dominant error source. A 1° heading bias causes the entire trajectory to curve, producing meters of closure error.

Exercise 3.4: Heading Drift Impact Derive

Now consider heading drift: each step adds 0.01 rad of heading error (~0.57°). After 100 straight-line steps of 0.5 m each, by how much has the lateral (perpendicular) position drifted? (Approximate: lateral error ≈ distance × sin(accumulated heading error).)

meters

Show derivation

Each step k has accumulated heading error δθ = 0.01k rad. The lateral displacement at step k is approximately 0.5 × sin(0.01k) ≈ 0.5 × 0.01k = 0.005k (small angle).

Total lateral drift ≈ ∑_k=1¹⁰⁰ 0.005k = 0.005 × 100 × 101 / 2 = 0.005 × 5050 = 25.25 m

Using the exact sin: ∑ 0.5 sin(0.01k) for k=1..100 ≈ 24.5 m. The heading drift causes massive lateral error — nearly 50% of the total distance traveled (50m). This is why gyroscope calibration and heading correction are critical for odometry-based navigation.

Exercise 3.5: Loop Closure Correction Trace

After loop closure, how should the accumulated error be distributed among the intermediate poses?

Correct only the final pose — move it to match the closure constraint Correct only the first and last poses Distribute the correction proportionally across all intermediate poses — each pose absorbs a fraction of the total error, weighted by its uncertainty Discard all intermediate poses and restart mapping from the closure point

Show explanation

The correct approach distributes the error across all poses proportionally to their uncertainty. In EKF-SLAM, this happens automatically through the covariance — the Kalman gain allocates corrections based on how uncertain each state element is. In graph SLAM (Chapter 4), this is done by pose graph optimization, which minimizes the total weighted error across all constraints.

Correcting only the final pose would create a discontinuity — landmarks mapped near the end would suddenly jump. Distributing smoothly preserves the local consistency of the map.

Chapter 4: Graph SLAM

EKF-SLAM maintains a single giant state vector and covariance. Graph SLAM takes a different approach: it stores each robot pose as a node and each constraint (odometry, observation, loop closure) as an edge. The full map is recovered by optimizing the entire graph at once. This is more flexible, more scalable, and handles nonlinearity better.

Pose graph structure:
Nodes: x₀, x₁, ..., x_T (robot poses at each timestep)
Edges: z_ij = measured relative transform from pose i to pose j

Information form (dual of covariance):
Ω = Σ^-1 // information matrix (inverse covariance)
ξ = Σ^-1 μ // information vector

Key insight: Ω is sparse!
An edge between pose i and j only adds entries at (i,i), (i,j), (j,i), (j,j) in Ω.

Covariance is dense, information is sparse. In the covariance form (Σ), every landmark becomes correlated with every other — the matrix is fully dense. In the information form (Ω = Σ^-1), only directly connected variables have nonzero entries. This sparsity is why graph SLAM can handle millions of poses while EKF-SLAM chokes at hundreds of landmarks.

Exercise 4.1: Information Matrix Structure Derive

3 poses (1D for simplicity). Odometry constraints: x₀→x₁ and x₁→x₂. Loop closure: x₀→x₂. Each constraint has information ω = 1. The prior on x₀ adds ω₀ = 10 to Ω[0,0]. How many nonzero entries does Ω (3×3) have?

nonzero entries

Show derivation

Each constraint between i and j adds entries at (i,i), (i,j), (j,i), (j,j):

x₀→x₁: affects (0,0), (0,1), (1,0), (1,1)

x₁→x₂: affects (1,1), (1,2), (2,1), (2,2)

x₀→x₂: affects (0,0), (0,2), (2,0), (2,2)

Prior on x₀: affects (0,0)

Every entry in the 3×3 matrix gets at least one contribution. Ω[0,0] gets contributions from three sources (odo01 + loop02 + prior), Ω[1,1] gets two (odo01 + odo12), etc. All 9 entries are nonzero because the loop closure connects x₀ to x₂, filling in the corners. Without the loop closure, Ω[0,2] and Ω[2,0] would be zero (tridiagonal structure).

Exercise 4.2: Build Ω by Hand Derive

Same setup: 3 poses, ω = 1 for each constraint, prior ω₀ = 10. Each edge (i,j) with information ω adds: +ω to Ω[i,i], +ω to Ω[j,j], -ω to Ω[i,j] and Ω[j,i]. What is Ω[0,0]?

Show derivation

Ω[0,0] = ω_prior + ω_odo01 + ω_loop02 = 10 + 1 + 1 = 12

The full matrix:
Ω = [[12, -1, -1], [-1, 2, -1], [-1, -1, 2]]
The diagonal entries are the sum of all information connected to that pose. The off-diagonal entries are the negated information of the direct constraint. This structure makes Ω diagonally dominant — a property that ensures numerical stability when solving the linear system.

Exercise 4.3: Solve the System Derive

Given Ω from above and ξ = [10, 1, 3]^T (information vector), solve μ = Ω^-1ξ for μ[0] (use the fact that x₀ is well-constrained with prior=10, so μ[0] ≈ ξ[0]/Ω[0,0]).

(approximate)

Show derivation

Solving Ωμ = ξ for [[12,-1,-1],[-1,2,-1],[-1,-1,2]]μ = [10,1,3] via Cramer's rule:

det(Ω) = 12(2×2 - (-1)(-1)) - (-1)((-1)×2 - (-1)(-1)) + (-1)((-1)(-1) - 2×(-1))

= 12(4-1) - (-1)(-2-1) + (-1)(1+2) = 36 - 3 - 3 = 30

For μ[0], replace column 0 with ξ:

det = 10(4-1) - (-1)(1×2 - (-1)×3) + (-1)(1×(-1) - 2×3) = 30 + 5 + 7 = 42

μ[0] = 42 / 30 = 1.4

The exact solution is μ = [1.4, 2.0, 2.6]. The strong prior (ω=10) on x₀ pulls it toward ξ[0]/Ω[0,0] = 10/12 ≈ 0.83, but the odometry and loop closure constraints push it higher. The full solution balances all constraints simultaneously.

Exercise 4.4: Sparsity Ratio Derive

A robot traverses 1000 poses in a straight line (chain graph). Each pose is 3D (3 elements). Ω is block-tridiagonal. What fraction of the 3000×3000 matrix is nonzero? (Count nonzero blocks, not individual entries.)

A chain of 1000 nodes has 999 edges. Each edge creates 2 off-diagonal blocks, plus diagonal blocks. Count total nonzero 3×3 blocks.

percent nonzero

Show derivation

Diagonal blocks: 1000 (one per pose)

Off-diagonal blocks: 2 × 999 = 1998 (upper and lower triangle)

Total nonzero blocks = 1000 + 1998 = 2998

Total possible blocks = 1000 × 1000 = 1,000,000

Sparsity = 2998 / 1,000,000 = 0.2998% ≈ 0.3%

Only 0.3% of the matrix is nonzero! This extreme sparsity is what makes graph SLAM tractable at large scale. Sparse Cholesky solvers exploit this structure to solve the optimization in O(N) time for chain graphs, compared to O(N³) for dense systems.

Exercise 4.5: Impact of Loop Closures on Sparsity Trace

The same 1000-pose chain now has 5 loop closures (e.g., pose 0↔500, 100↔600, etc.). How does this affect the information matrix?

It becomes dense — any loop closure destroys all sparsity It adds only 10 more nonzero blocks (5 edges × 2 off-diagonal blocks each) — sparsity is barely affected, but the graph becomes harder to order for Cholesky due to fill-in It removes the tridiagonal structure entirely It has no effect on Ω

Show explanation

Each loop closure adds exactly one edge to the graph, which means 2 additional nonzero off-diagonal blocks in Ω. Five loop closures add only 10 blocks to the existing 2998 — the matrix stays 99.7% sparse.

However, these long-range connections create fill-in during Cholesky factorization. A good variable ordering (like COLAMD or nested dissection) minimizes this fill-in, keeping the solve fast. This is why graph SLAM implementations spend significant effort on variable ordering — it determines whether the solve takes milliseconds or seconds.

Exercise 4.6: Implement buildOmega() Build

Build a 1D information matrix for a chain of poses with optional loop closures.

Return a flat array of n*n elements (row-major).

Show solution

javascript
function buildOmega(n, edges) {
  const O = new Array(n * n).fill(0);
  for (const e of edges) {
    O[e.i * n + e.i] += e.omega;
    O[e.j * n + e.j] += e.omega;
    O[e.i * n + e.j] -= e.omega;
    O[e.j * n + e.i] -= e.omega;
  }
  return O;
}

Chapter 5: Visual Odometry

Instead of wheel encoders, you estimate motion from camera images. Two consecutive frames share visible features — by tracking how those features moved in the image, you can recover the camera's 3D rotation and translation. The key mathematical object is the essential matrix E, which encodes the geometric relationship between two views.

Epipolar constraint:
x'^T E x = 0

where x, x' are corresponding points in normalized image coordinates,
E = [t]_× R (translation cross-product matrix times rotation)

Essential matrix properties:
E ∈ ℝ^3×3, rank 2, 5 DOF
(3 for rotation + 3 for translation - 1 for scale ambiguity = 5)

Scale ambiguity. The essential matrix can only recover translation direction, not magnitude. A camera that moved 1 meter seeing objects 10 meters away looks identical (in terms of feature displacement) to a camera that moved 10 meters seeing objects 100 meters away. This is why monocular visual odometry always drifts in scale — you need stereo cameras, IMU, or known object sizes to resolve it.

Exercise 5.1: Essential Matrix DOF Trace

The essential matrix E is 3×3 (9 entries). Why does it have only 5 degrees of freedom?

Because 4 entries are always zero Because 4 entries are redundant due to symmetry Rotation has 3 DOF, translation has 3 DOF, but E is defined only up to scale (det constraint + 2 singular value constraints remove 4 DOF from 9), yielding 5 Because it is symmetric

Show explanation

E = [t]_×R has 6 underlying parameters (3 rotation + 3 translation). But scaling E by any constant c gives cE, which satisfies the same epipolar constraints (x'^T(cE)x = 0 iff x'^TEx = 0). This removes 1 DOF, leaving 5.

Algebraically, E has two equal nonzero singular values and one zero singular value. This imposes constraints on the 9 entries: the essential matrix manifold is a 5-dimensional surface in ℝ⁹.

Exercise 5.2: Minimum Points Derive

Each point correspondence x ↔ x' gives one equation (x'^TEx = 0). Since E has 5 DOF and we solve up to scale (so effectively need to determine 5 ratios), what is the minimum number of point correspondences needed to solve for E?

points

Show derivation

5 DOF → 5 equations → 5 point correspondences minimum

This is the 5-point algorithm (Nister, 2004). Each correspondence provides one linear equation in the entries of E. Five correspondences give 5 equations for 5 unknowns (after fixing the scale). In practice, the 5-point algorithm uses polynomial solvers and can return up to 10 solutions, which are disambiguated using additional correspondences or geometric checks.

Exercise 5.3: Epipolar Constraint Check Derive

Given E = [[0, -1, 0], [1, 0, -0.5], [0, 0.3, 0]] and correspondences x = [0.5, 0.3, 1]^T, x' = [0.6, 0.2, 1]^T. Compute x'^TEx. Is the epipolar constraint satisfied?

x'^TEx value

Show derivation

Ex = [[0,-1,0],[1,0,-0.5],[0,0.3,0]] · [0.5, 0.3, 1]^T

= [0×0.5 + (-1)×0.3 + 0×1, 1×0.5 + 0×0.3 + (-0.5)×1, 0×0.5 + 0.3×0.3 + 0×1]

= [-0.3, 0.0, 0.09]^T

x'^T(Ex) = 0.6×(-0.3) + 0.2×0.0 + 1×0.09 = -0.18 + 0 + 0.09 = -0.09

The result is -0.09, not zero. This means the correspondence either has noise or doesn't perfectly satisfy the epipolar constraint for this E. In RANSAC, you'd compare |x'^TEx| against a threshold (typically derived from the Sampson distance). A residual of 0.09 is moderate — whether it's an inlier depends on the noise level and the threshold you set.

Exercise 5.4: Fundamental vs Essential Trace

The fundamental matrix F relates pixel coordinates: p'^TFp = 0. The essential matrix E relates normalized coordinates: x'^TEx = 0. Given the camera intrinsic matrix K, how are they related?

F = E (they're the same thing) E = K'^T F K (since x = K^-1p and x' = K'^-1p') F = K E K^T F = E + K

Show explanation

Normalized coordinates: x = K^-1p. Substituting into x'^TEx = 0: (K'^-1p')^T E (K^-1p) = p'^T K'^-T E K^-1 p = 0. So F = K'^-TEK^-1, or equivalently E = K'^TFK.

When both cameras have the same intrinsics (K' = K), this simplifies to E = K^TFK. The essential matrix removes the camera-specific calibration, leaving only the pure geometric relationship. This is why calibrated cameras should use E (5 DOF, more constrained) while uncalibrated cameras must use F (7 DOF).

Exercise 5.5: Translation Scale Derive

A monocular VO system estimates translation direction t̂ = [0.8, 0, 0.6]^T (unit vector). A stereo system with 10cm baseline resolves the scale: actual translation = 2.5 meters. What is the full translation vector?

t_x (meters)

Show derivation

t = ||t|| × t̂ = 2.5 × [0.8, 0, 0.6]^T = [2.0, 0, 1.5]^T

The monocular system gives the direction [0.8, 0, 0.6] but cannot determine the scale factor 2.5. The stereo baseline (or IMU, or known object size) provides the missing scale. Without it, all distances are relative — the VO trajectory could be scaled by any positive constant and still be consistent with the observations.

Chapter 6: Feature Matching & RANSAC

You've extracted 500 feature matches between two images. But 40% are wrong — mismatches from repeated textures, occlusions, or detector noise. If you feed these to the 5-point algorithm, the outliers will corrupt the essential matrix. You need RANSAC (Random Sample Consensus) — sample minimal subsets, fit a model, count inliers, and keep the best model.

RANSAC iteration formula:
n = log(1 - p) / log(1 - (1 - ε)^s)

where:
n = number of iterations needed
p = desired probability of finding an outlier-free sample (e.g., 0.99)
ε = outlier ratio
s = minimal sample size (e.g., 5 for the 5-point algorithm)

RANSAC grows exponentially with outlier ratio. With 10% outliers and s=5, you need ~6 iterations. With 50% outliers and s=5, you need ~145 iterations. With 90% outliers, you need ~460,000 iterations. This exponential scaling is why keeping the outlier ratio low (through good feature detection and matching) is more important than a clever RANSAC variant.

Exercise 6.1: RANSAC Iterations (50% inliers) Derive

50% inliers (ε = 0.5), 5-point model (s = 5), 99% confidence (p = 0.99). How many RANSAC iterations needed?

iterations

Show derivation

(1 - ε)^s = 0.5⁵ = 0.03125

1 - (1 - ε)^s = 1 - 0.03125 = 0.96875

n = log(1 - 0.99) / log(0.96875) = log(0.01) / log(0.96875)

= (-4.605) / (-0.03174) = 145.1 → 146 iterations

Each iteration: sample 5 random correspondences, compute E, count inliers (points with |x'^TEx| < threshold). After 146 iterations, there's a 99% chance at least one sample was all-inlier. With 500 correspondences, each iteration takes microseconds — 146 iterations is fast.

Exercise 6.2: RANSAC Iterations (80% inliers) Derive

Same setup but with better matching: 80% inliers (ε = 0.2), s = 5, p = 0.99. How many iterations?

iterations

Show derivation

(1 - ε)^s = 0.8⁵ = 0.32768

n = log(0.01) / log(1 - 0.32768) = (-4.605) / (-0.3979) = 11.6 → 12 iterations

With 80% inliers instead of 50%, the iteration count drops from 146 to just 12 — a 12× reduction. This is why feature matching quality matters more than RANSAC sophistication. Better descriptors (SuperPoint, DISK) with better matchers (SuperGlue, LightGlue) can push inlier rates above 90%, making RANSAC nearly instantaneous.

Exercise 6.3: 8-Point vs 5-Point Derive

The 8-point algorithm needs s=8 samples. At ε=0.5 and p=0.99, how many iterations does it need? Compare to the 5-point result.

iterations

Show derivation

(1 - 0.5)⁸ = 0.5⁸ = 0.00390625

n = log(0.01) / log(1 - 0.00390625) = (-4.605) / (-0.003914) = 1177

1177 vs 146 — the 8-point algorithm needs 8× more RANSAC iterations at 50% inliers. Each additional sample point in the minimal set multiplies the iteration count because it's exponentially harder to draw a clean sample. This is why the 5-point algorithm (despite being algebraically harder) is preferred in practice: fewer samples = fewer RANSAC iterations = faster runtime.

Exercise 6.4: Implement ransacIterations() Build

Write a function that computes the number of RANSAC iterations needed.

n = log(1-p) / log(1 - w^s), then Math.ceil().

Show solution

javascript
function ransacIterations(inlierRatio, sampleSize, confidence) {
  const w = Math.pow(inlierRatio, sampleSize);
  return Math.ceil(Math.log(1 - confidence) / Math.log(1 - w));
}

Exercise 6.5: Find the RANSAC Bug Debug

This RANSAC loop has a subtle bug that causes it to sometimes return a model fitted to outliers. Click the buggy line.

function ransac(points, fitModel, countInliers, n) {
  let bestModel = null;
  let bestCount = 0;
  for (let i = 0; i < n; i++) {
    const sample = randomSample(points, 5);
    const model = fitModel(sample);
    const count = countInliers(model, sample);
    if (count > bestCount) {
      bestCount = count; bestModel = model;
    }
  }
  return bestModel;
}

Show explanation

Line 7 is the bug. It counts inliers against the sample (the 5 sampled points) instead of against ALL points. The model is fitted to exactly the sample, so it will always have 5 "inliers" from the sample — even if the model is complete nonsense for the rest of the data. The fix: countInliers(model, points). RANSAC's power comes from evaluating each hypothesis against the full dataset, not just the minimal sample used to generate it.

Exercise 6.6: Inlier Threshold Trace

The RANSAC inlier threshold τ determines what counts as an inlier. For the epipolar constraint (x'^TEx = 0), what happens if τ is too large vs too small?

Too large or too small both reduce accuracy equally Too large: fewer iterations needed. Too small: more iterations needed Too large: outliers are accepted as inliers, corrupting the model. Too small: true inliers are rejected, reducing the effective inlier count and requiring exponentially more iterations The threshold doesn't matter — RANSAC is robust to it

Show explanation

The inlier threshold is a critical hyperparameter. Too large (τ = 10 pixels): gross outliers sneak in and the "best" model may fit a mixture of good and bad correspondences. Too small (τ = 0.1 pixels): even well-matched points get rejected due to noise, making the effective inlier ratio drop and requiring many more iterations.

A good rule of thumb: set τ = 2-3× the measurement noise standard deviation. For feature matching with sub-pixel accuracy, this is typically 1-3 pixels. For the Sampson distance (a better geometric error than x'^TEx), τ² follows a χ²(1) distribution — set τ² = 3.84 for 95% confidence.

Chapter 7: Factor Graphs

A factor graph is a bipartite graph with two types of nodes: variable nodes (the unknowns: poses, landmarks) and factor nodes (the constraints: odometry, observations, priors). It's a more general representation than a pose graph — it can represent any probabilistic inference problem, and modern SLAM backends (GTSAM, Ceres, g2o) are all factor graph solvers.

Factor graph joint probability:
p(x) ∝ ∏_k f_k(x_k)

where x_k ⊆ x is the subset of variables connected to factor k.

For Gaussian factors (standard in SLAM):
f_k(x_k) = exp(-½ ||h_k(x_k) - z_k||²_{Σ_k})

MAP estimation = nonlinear least squares:
x* = argmin_x ∑_k ||h_k(x_k) - z_k||²_{Σ_k}

Factor graphs separate structure from algebra. The graph structure tells you which variables interact. The factor functions tell you how. This separation is powerful: you can change from EKF to smoothing to incremental updates (iSAM2) without changing the graph structure — only the solver changes.

Exercise 7.1: Factor Count Derive

A robot moves through 5 poses, observing 3 landmarks. It has: 1 prior factor on pose 0, 4 odometry factors (between consecutive poses), and 6 observation factors (each landmark seen from 2 poses). How many total factors and variables?

total factors

Show derivation

Prior factors: 1

Odometry factors: 4

Observation factors: 6

Total factors = 1 + 4 + 6 = 11

Variable nodes: 5 poses + 3 landmarks = 8

The system is over-determined: 11 factors constraining 8 variables (each with multiple DOF). This redundancy is what makes the estimate robust — if one factor is noisy, the others compensate.

Exercise 7.2: Variable Elimination Order Trace

In a chain of 5 poses (x₀-x₁-x₂-x₃-x₄), eliminating a variable means marginalizing it out, which connects all its remaining neighbors. If you eliminate x₂ first, what new edge is created?

x₀ — x₄ x₀ — x₃ x₁ — x₃ (the neighbors of x₂ become directly connected) No new edges — elimination only removes edges

Show explanation

When you eliminate x₂, all its neighbors must become connected (this is the fill-in caused by elimination). x₂'s neighbors are x₁ and x₃, so a new edge x₁—x₃ is created. The chain becomes x₀-x₁-x₃-x₄ with the extra x₁—x₃ edge.

For a chain graph, the optimal elimination order is from one end to the other (e.g., x₀, x₁, x₂, x₃, x₄). This produces zero fill-in because each node has only one remaining neighbor when eliminated. Eliminating from the middle causes fill-in, making the computation more expensive.

Exercise 7.3: Fill-in Count Derive

In a star graph (central node c connected to leaves a, b, d, e — no edges between leaves), how many fill-in edges are created by eliminating c first?

fill-in edges

Show derivation

c has 4 neighbors: a, b, d, e

Eliminating c connects all pairs: C(4,2) = 4! / (2! × 2!) = 6

Fill-in edges: a-b, a-d, a-e, b-d, b-e, d-e

Eliminating the central node first is the worst possible choice — it creates maximum fill-in, turning the sparse star into a dense clique. The optimal order eliminates leaves first (zero fill-in per leaf), then the center last. This principle generalizes: always eliminate nodes with few connections first. Variable ordering algorithms like COLAMD automate this heuristic.

Exercise 7.4: Degrees of Freedom Derive

A 2D SLAM factor graph has 100 poses (each 3 DOF: x, y, θ), 50 landmarks (each 2 DOF), 99 odometry factors (each constrains 3 DOF), 1 prior factor (3 DOF), and 200 observation factors (each constrains 2 DOF). How many total constraint DOF does the system have?

constraint DOF

Show derivation

Odometry: 99 × 3 = 297

Prior: 1 × 3 = 3

Observations: 200 × 2 = 400

Total constraint DOF = 297 + 3 + 400 = 700

Variable DOF = 100 × 3 + 50 × 2 = 400

Redundancy = 700 - 400 = 300 (heavily over-determined)

The system is over-determined by 300 DOF — there are 1.75× more constraints than unknowns. This redundancy is what makes the least-squares solution robust to noise. Each additional observation factor adds 2 constraints, further improving the estimate. An under-determined system (fewer constraints than unknowns) would have infinitely many solutions — you'd need more observations or stronger priors.

Exercise 7.5: Arrange the SLAM Pipeline Design

Put these factor graph SLAM steps in the correct order.

→

Add prior on x₀ Add odometry factor Add observation factors Data association Optimize graph (least squares)

Show explanation

The correct order: Prior → Odometry → Data association → Observation factors → Optimize

First, anchor the graph with a prior on the initial pose. Then add the odometry constraint for the new motion. Before adding observations, perform data association to determine which landmark each measurement corresponds to (or if it's a new landmark). Then add the observation factors with the correct landmark IDs. Finally, optimize the entire graph to get the MAP estimate.

Chapter 8: Occupancy Grid Maps

SLAM tells you where landmarks are. But for navigation, you need to know where the robot can and cannot go. An occupancy grid divides the world into cells, each storing the probability that it's occupied. The key trick is using log-odds to make updates additive instead of multiplicative.

Log-odds representation:
l = log(p / (1 - p)) // logit function
p = 1 / (1 + e^-l) // sigmoid (inverse logit)

Bayesian update in log-odds:
l_new = l_old + log(p_sensor / (1 - p_sensor)) - log(p_prior / (1 - p_prior))

With uniform prior (p_prior = 0.5, l_prior = 0):
l_new = l_old + log(p_sensor / (1 - p_sensor))

Log-odds make life simple. In probability space, Bayesian updates require multiplication and renormalization. In log-odds space, updates are just addition. This is not only computationally cheaper but also numerically stable — probabilities near 0 or 1 cause floating-point issues, while log-odds handle extreme values gracefully (l = ±10 represents p ≈ 0.99995).

Exercise 8.1: Log-Odds Conversion Derive

Convert p = 0.9 to log-odds l.

log-odds

Show derivation

l = log(p / (1 - p)) = log(0.9 / 0.1) = log(9) = 2.197 ≈ 2.20

Reference values: p=0.5 → l=0, p=0.73 → l=1, p=0.88 → l=2, p=0.95 → l=3, p=0.99 → l=4.6.

Exercise 8.2: Sequential Updates Derive

Starting from l = 0 (p = 0.5 prior), observe the cell as occupied (p_sensor = 0.9), then observe it as free (p_sensor = 0.3). What is the final log-odds and probability?

final probability

Show derivation

Initial: l = 0

Update 1 (occupied, p=0.9): l = 0 + log(0.9/0.1) = 0 + 2.197 = 2.197

Update 2 (free, p=0.3): l = 2.197 + log(0.3/0.7) = 2.197 + (-0.847) = 1.350

Final p = 1 / (1 + e^-1.350) = 1 / (1 + 0.259) = 1 / 1.259 ≈ 0.79

One occupied and one free observation leaves the cell more likely occupied (p ≈ 0.79) because the occupied sensor (p=0.9, contributing +2.20 log-odds) was more confident than the free sensor (p=0.3, contributing -0.85 log-odds). The occupied observation "won" by a margin of 1.35 log-odds. To bring the cell back to 50/50, you'd need about 2.6 more free observations at p=0.3 each.

Exercise 8.3: How Many Observations to Be Sure? Derive

Starting from l=0, how many consecutive "occupied" observations (each with p_sensor = 0.7) are needed to reach p ≥ 0.95?

Each observation adds l_obs = log(0.7/0.3) ≈ 0.847 to the log-odds. Need l ≥ log(0.95/0.05) = log(19) ≈ 2.944.

observations

Show derivation

l_obs = log(0.7 / 0.3) = log(2.333) = 0.847

Need: n × 0.847 ≥ 2.944

n ≥ 2.944 / 0.847 = 3.47 → n = 4

Check: 4 × 0.847 = 3.389 → p = sigmoid(3.389) = 0.967 ≥ 0.95 ✓

With a relatively weak sensor (70% accuracy), you need 4 consistent observations to reach 95% confidence. A stronger sensor (p=0.9) would need only 2 observations (2 × 2.197 = 4.394, p = 0.988). This is why LiDAR-based mapping (very confident per-ray) converges faster than camera-based mapping (less certain per pixel).

Exercise 8.4: Grid Resolution Tradeoff Derive

A 100m × 100m environment mapped at 5cm resolution. How many grid cells? How much memory at 1 byte per cell?

million cells

Show derivation

Cells per side = 100 / 0.05 = 2000

Total cells = 2000 × 2000 = 4,000,000 = 4M cells

Memory = 4M × 1 byte = 4 MB

4 MB is manageable. But at 1cm resolution: 10,000 × 10,000 = 100M cells = 100 MB. And in 3D at 1cm: 10,000³ = 10¹² cells = 1 TB — impossible. This is why 3D maps use octrees (only allocate cells near surfaces), point clouds, or signed distance fields instead of dense grids.

Exercise 8.5: Implement logOddsUpdate() Build

Write a function that performs a log-odds occupancy update and returns the new probability.

Update: l_new = l_old + log(p/(1-p)). Convert back: p = 1/(1+exp(-l)).

Show solution

javascript
function logOddsUpdate(currentLogOdds, sensorProb) {
  const update = Math.log(sensorProb / (1 - sensorProb));
  const newL = currentLogOdds + update;
  const newP = 1 / (1 + Math.exp(-newL));
  return [newL, newP];
}

Exercise 8.6: Clamping Log-Odds Trace

In practice, log-odds values are clamped to a range like [-5, 5] (corresponding to p ∈ [0.0067, 0.9933]). Why is clamping necessary?

To save memory — smaller numbers fit in fewer bits Without clamping, a cell observed as occupied 100 times would require 100+ free observations to flip back — the map becomes unable to adapt to a dynamic environment where objects move It prevents integer overflow It makes the map look better visually

Show explanation

In a static world, unclamped log-odds are fine — cells converge to their true occupancy. But in the real world, objects move. A chair observed 50 times at location A has l=50×2.2 = 110 (p≈1). When the chair moves, clearing it requires 50+ free observations to bring l back below 0. With clamping at l=5, you need at most ~6 free observations to flip the cell — the map responds to changes in seconds instead of minutes.

Chapter 9: Capstone: Full SLAM Pipeline

You're building a mobile robot with a 2D LiDAR that needs to explore and map an indoor office (50m × 50m). Design the complete SLAM system: state representation, motion model, measurement model, data association, loop closure, and map output. Each exercise in this capstone integrates concepts from the previous chapters.

This is the integration test. Each previous chapter taught one component in isolation. Now you put them all together. A real SLAM system is exactly this sequence of design decisions, each affecting the others. Get the state representation wrong and the Jacobians break. Get data association wrong and loop closure fails. Get the map resolution wrong and path planning is impossible.

Exercise 9.1: State Vector Design Derive

Your robot uses a graph-based SLAM approach. After a 5-minute exploration at 10 Hz (10 poses/second), with keyframe selection keeping every 10th pose, and the LiDAR detecting an average of 2 new landmarks per keyframe: how many variables (poses + landmarks) are in the factor graph?

variables

Show derivation

Total poses at 10 Hz for 300s = 3000

Keyframes = 3000 / 10 = 300

Landmarks = 300 × 2 = 600

Total variables = 300 + 600 = 900

In 2D, each pose has 3 DOF and each landmark 2 DOF, so the total DOF is 300×3 + 600×2 = 900 + 1200 = 2100. The information matrix is 2100×2100 but extremely sparse — easily solvable in real-time with modern sparse solvers like GTSAM's iSAM2.

Exercise 9.2: Motion Model Noise Derive

Your robot uses differential-drive odometry. The wheels have 1% distance error and 0.5°/m heading error. For a 0.5m straight-line motion: what is the position uncertainty σ_pos and heading uncertainty σ_θ?

σ_pos (meters)

Show derivation

σ_pos = 1% × 0.5 m = 0.005 m = 5 mm

σ_θ = 0.5°/m × 0.5 m = 0.25° = 0.00436 rad

These per-step errors seem tiny, but they accumulate. After 100 steps (50m): σ_pos ≈ 0.005 × √100 = 0.05m (random walk). The heading error is more dangerous: σ_θ ≈ 0.00436 × √100 = 0.0436 rad ≈ 2.5°. A 2.5° heading error over the next 50m of travel causes ~2.2m of lateral drift. This is why heading estimation is the bottleneck in odometry-based SLAM.

Exercise 9.3: LiDAR Observation Rate Derive

Your LiDAR has 360° FOV, 1° angular resolution, 12m max range. In a typical office, about 30% of rays hit a wall/obstacle within range. How many valid range readings per scan? If each landmark is a distinct corner detected from clustered range discontinuities, and corners are found every ~20 valid rays on average, how many landmarks per scan?

landmarks per scan

Show derivation

Total rays = 360 / 1 = 360

Valid rays = 360 × 0.3 = 108

Corners (landmarks) = 108 / 20 = 5.4

About 5 landmarks per scan is typical for indoor environments. This means your factor graph gets ~5 observation factors per keyframe. Over 300 keyframes, that's ~1500 observation factors — providing dense constraints that make the optimization well-conditioned.

Exercise 9.4: Map Memory Budget Derive

The 50m×50m office is mapped as an occupancy grid at 2cm resolution. Each cell stores an int8 log-odds value. How much memory for the grid? If you also store the factor graph with 900 variables (2100 DOF total) as double-precision floats for the mean vector and sparse information matrix (~3 nonzero blocks per variable, each 3×3 or 2×2), approximately how much total memory for the SLAM state?

MB (grid only)

Show derivation

Grid: (50/0.02) × (50/0.02) = 2500 × 2500 = 6,250,000 cells

At 1 byte each: 6.25 MB

Factor graph mean: 2100 doubles × 8 bytes = 16.8 KB

Sparse Ω: ~900 × 3 blocks × ~6 entries/block × 8 bytes ≈ 130 KB

Total: ~6.25 MB (grid dominates)

The occupancy grid dominates memory by 50×. The factor graph itself is tiny. This is why SLAM research focuses on efficient map representations — the map is the bottleneck, not the pose graph. At 1cm resolution, the grid would be 25 MB; at 0.5cm, 100 MB.

Exercise 9.5: Loop Closure Detection Trace

Your robot has been exploring for 3 minutes and returns to the lobby. The LiDAR scan looks similar to the one from minute 0. What sequence of checks should you perform before accepting this as a loop closure?

Just match the scans — if ICP converges, accept it Check scan similarity, but only accept if the robot is within 1 meter of the original position (1) Place recognition to identify candidate matches, (2) geometric verification via scan matching/ICP, (3) Mahalanobis gate test on the relative pose constraint, (4) consistency check — does this closure agree with the current graph estimate? Use GPS to confirm the position match

Show explanation

Loop closure verification requires multiple independent checks because a false positive is catastrophic (Ch. 2):

(1) Place recognition: Scan descriptors (scan context, BoW) identify candidate locations that "look like" the current scan. Fast but noisy — gives candidates, not confirmations.

(2) Geometric verification: ICP or scan matching computes the precise relative transform. If the alignment error is high, reject.

(3) Mahalanobis gate: Is the relative transform consistent with what the current pose graph predicts? If the graph says we should be 10m away but the closure says 0m, something is wrong.

(4) Consistency check: Does adding this constraint make the overall graph residual increase dramatically? If so, either this is a false closure or the graph has other errors that need resolution first.

Exercise 9.6: Arrange the Full SLAM Pipeline Design

Put these components of a complete SLAM system in the correct processing order for each new sensor frame.

→

Motion prediction (odometry) Feature extraction Data association Loop closure detection Graph optimization Update occupancy grid

Show explanation

The correct order: Motion prediction → Feature extraction → Data association → Loop closure detection → Graph optimization → Update occupancy grid

(1) Predict the new pose from odometry. (2) Extract features/landmarks from the new scan. (3) Match features to known landmarks (data association). (4) Check if this location has been visited before (loop closure — uses the current scan and map). (5) Add all new factors and optimize the graph. (6) With the optimized poses, ray-trace through the scans to update the occupancy grid.

The map update comes LAST because it depends on the optimized poses — updating the map with pre-optimization poses would project scans to wrong locations.

Exercise 9.7: System Latency Budget Derive

Your SLAM system must run at 10 Hz (100ms budget per frame). Typical timings: odometry prediction 0.5ms, feature extraction 15ms, data association 5ms, loop closure check 10ms (amortized), graph optimization 30ms (incremental), map update 20ms. Total? Does it fit?

ms total

Show derivation

Total = 0.5 + 15 + 5 + 10 + 30 + 20 = 80.5 ms

Budget = 100 ms → 80.5 < 100 ✓ (19.5 ms headroom)

It fits with 19.5ms of headroom — about 20% margin. In practice, graph optimization can spike during loop closures (full relinearization instead of incremental), and feature extraction varies with scene complexity. The 20% margin is tight but workable. If it overruns, common optimizations: (1) run loop closure in a separate thread, (2) use iSAM2 for incremental optimization, (3) update the map at a lower rate (every 5th frame).

The proof of work. If you completed every exercise in this workbook — computed Jacobians, debugged data association, derived RANSAC bounds, built occupancy grids, and designed a full SLAM pipeline — you understand the mathematical machinery that makes robots navigate autonomously. These are the exact calculations that robotics engineers work through when building real SLAM systems. "What I cannot create, I do not understand."

Related Lessons

Topic	Lesson
Classical SLAM fundamentals	Classical SLAM — From Absolute Zero
Visual-Inertial Odometry	Classical VIO — From Absolute Zero
EKF foundations	Extended Kalman Filter — From Absolute Zero
Bayesian estimation	Bayesian Estimation — From Absolute Zero
Modern SLAM (learning-based)	Modern SLAM — From Absolute Zero