Thrun et al., Chapter 13

Mapping with Unknown Data Association

The EM algorithm for mapping: alternating localization and mapping, grid-based EM, and layered representations.

Prerequisites: Chapters 7-9 (localization, mapping), basic EM concept.
10
Chapters
0
Simulations
10
Quizzes

Chapter 0: The Data Association Problem

All SLAM algorithms in Chapters 10-12 assume known correspondences, or use maximum likelihood to determine them on-the-fly. But what happens when landmarks are genuinely ambiguous? When the robot returns to a place and can't tell which previously seen feature matches which current observation?

The challenge: In a rectangular building with four identical-looking corners, the robot visits each corner but cannot tell them apart. The four corner detections might correspond to 1, 2, 3, or 4 distinct corners. Without resolving this ambiguity, the map is fundamentally wrong. This is the data association problem at its hardest.

This chapter introduces Expectation-Maximization (EM) for mapping — an algorithm that can handle ambiguous correspondences by iterating between localization and mapping. Unlike EKF/EIF SLAM, which commit to a single correspondence hypothesis, EM maintains a probability distribution over correspondences.

The trade-off: EM is a batch algorithm (requires multiple passes over the data) and only finds the most likely map (no uncertainty estimate). But it can handle raw sensor data and ambiguous correspondences that would break Kalman-based approaches.

Check: What problem does EM mapping address that EKF/EIF SLAM cannot?

Chapter 1: EM: The Basic Idea

EM mapping finds the most likely map by alternating two steps:

m* = argmaxm p(m | d)

where d is all data (measurements and controls). Since searching the space of all maps is intractable, EM performs hill climbing:

E-Step (Expectation / Localization): Given the current map estimate m[i], localize the robot at every point in time. Compute a probability distribution over poses for each time step.

M-Step (Maximization / Mapping): Given the pose distributions from the E-step, compute a new map m[i+1] that maximizes the expected log-likelihood.

The chicken-and-egg solution: We can't localize without a map, and we can't map without knowing poses. EM breaks the deadlock: start with any map (even a terrible one from raw odometry), localize against it, build a better map from the localization, then localize again in the improved map, and so on. Each iteration is guaranteed to improve (or maintain) the map likelihood.
Check: What does the E-step of EM mapping compute?

Chapter 2: E-Step: Localization

The E-step runs a full forwards-backwards localization pass over the data. For each time step τ, it computes:

p(sτ | m[i], z1:T, u1:T) ∝ p(sτ | m[i], zτ+1:T, uτ+1:T) · p(sτ | m[i], z1:τ, u1:τ)

The first factor is the backward pass (future data constraining past poses). The second is the forward pass (past data). Combining them gives the best possible pose estimate using ALL data.

Why forwards-backwards? In standard online localization, we only use data up to the current time. But EM is a batch algorithm with access to all data. Data collected after time τ also carries information about the pose at time τ. The backwards pass captures this, producing significantly sharper pose distributions.

The result is a probability distribution Isτ over poses for every time step. This distribution serves as a "soft assignment" for the M-step — each pose is weighted by its probability rather than committed to a single value.

Check: Why does the E-step use a backwards pass in addition to the forward pass?

Chapter 3: M-Step: Mapping

The M-step builds a new map that maximizes the expected log-likelihood under the pose distributions from the E-step:

m[i+1] = argmaxm Στ ∫ Isτ log p(zτ | sτ, m) dsτ

For occupancy grid maps, this decomposes per-cell: each cell's occupancy is determined by the weighted evidence from all times when the robot might have observed it, weighted by the pose probabilities.

The soft assignment advantage: Unlike maximum likelihood SLAM, which commits to single poses and can make hard errors, EM "spreads" each observation across all plausible poses. A measurement might have come from pose A (probability 0.7) or pose B (probability 0.3). Both contribute to the map, proportionally. This soft assignment naturally handles ambiguous correspondences.
Check: What does the M-step maximize?

Chapter 4: Convergence

EM is guaranteed to improve the map likelihood at each iteration (or leave it unchanged at a fixed point). It converges to a local maximum of p(m | d).

In practice, 5-20 iterations typically suffice. The initial map (from raw odometry) is usually terrible, but each iteration produces a noticeably better map. The improvement is often dramatic after just 2-3 iterations.

Local maxima: EM finds a local maximum, not necessarily the global one. In highly symmetric environments (e.g., a circular corridor), EM may converge to a map that's rotated or reflected relative to the true map. In practice, this is rarely a problem for environments with sufficient asymmetry.
IterationMap QualityDescription
0TerribleRaw odometry; drift makes everything misaligned
1RoughMajor structures visible; local alignment improved
2-3GoodWalls are straight; rooms recognizable
5-10ExcellentNear-optimal; convergence nearly complete
Check: What guarantee does EM provide about convergence?

Chapter 5: Grid-Based EM

The E-step requires representing the pose distribution over a 3D space (x, y, θ). Grid-based EM discretizes this on a grid, similar to grid localization (Chapter 8).

The computational cost is dominated by the E-step: for each time step, the localization runs over the full grid. For T time steps and K grid cells, the E-step is O(T · K). The M-step is O(T · K · M) where M is the number of map cells.

Practical trick: Instead of running the full grid localization at every iteration, use a coarse grid initially and refine it in later iterations. Early iterations only need rough pose estimates; fine resolution matters only near convergence. This multi-resolution approach can speed up EM by an order of magnitude.
Check: What dominates the computational cost of grid-based EM?

Chapter 6: Layered EM

Standard EM doesn't scale to large environments because the grid localization becomes too expensive. Layered EM addresses this by decomposing the problem into layers:

1. Build small local maps from consecutive data segments. Each local map covers a small area and is internally consistent.

2. Run EM on the relative positions of local maps, not individual sensor readings. This is a much smaller optimization problem.

Why layers help: Within a 10-meter local map, odometry is accurate enough to build a good map without EM. The errors accumulate over longer distances. By building accurate local maps first, then aligning them globally with EM, we split the problem into two manageable pieces.

The perceptual model for layered EM compares overlapping local maps: if local map A and local map B are placed at the correct relative positions, their overlapping regions should agree. The measurement likelihood measures this agreement.

Check: What is the key idea of layered EM?

Chapter 7: Local Maps

Each local map is built from a short segment of data (e.g., 50-200 scans) using standard occupancy grid mapping (Chapter 9). The segment is short enough that odometry errors are small, so the local map is accurate.

The local map serves as the "measurement" for the higher-level EM. Instead of matching raw sensor readings to a global map, EM matches local maps to each other. This is both faster and more robust.

Local map size trade-off: Larger local maps cover more area but accumulate more odometry error. Smaller local maps are more accurate but provide less context for alignment. A good rule of thumb: each local map should be small enough that the worst-case odometry error is less than half the sensor range.
Check: Why must local maps be small?

Chapter 8: Examples

EM mapping has produced some of the largest and most accurate maps in robotics research:

EnvironmentChallengeResult
Rectangular corridorFour identical corners; high symmetryEM correctly identifies all four corners
Large office buildingMultiple loops; long corridorsConsistent metric map with aligned loops
Multi-floor buildingRepeated structure across floorsLayered EM aligns floor plans correctly
Comparison with EKF SLAM: EKF SLAM requires carefully engineered feature detectors and fails with ambiguous landmarks. EM mapping works with raw laser scans and handles ambiguity naturally. The cost: EM is batch (multiple passes over data) while EKF is online (single pass).
Check: What type of sensor data can EM mapping use that EKF SLAM cannot?

Chapter 9: Summary

ConceptKey Idea
EM mappingAlternate localization (E) and mapping (M) to find the most likely map
E-stepForwards-backwards localization over all data, producing pose distributions
M-stepBuild map from data weighted by pose distributions
ConvergenceMonotonically improves map likelihood; converges to local maximum
Layered EMBuild local maps, then align them globally
What comes next: EM is powerful but slow (multiple passes over all data). Chapter 14 introduces fast incremental mapping — real-time algorithms based on gradient descent that can build maps as the robot moves, with techniques for detecting and correcting loop closures on-the-fly.
"Give me a bad map and I'll localize roughly.
From those rough poses I'll build a better map.
A few rounds later, both are good."
Check: What is the main limitation of EM mapping compared to EKF SLAM?