Thrun et al., Chapter 13

Mapping with Unknown Data Association

The EM algorithm for mapping: alternating localization and mapping, grid-based EM, and layered representations.

Prerequisites: Chapters 7-9 (localization, mapping), basic EM concept.

Chapters

Simulations

Quizzes

Chapter 0: The Data Association Problem

All SLAM algorithms in Chapters 10-12 assume known correspondences, or use maximum likelihood to determine them on-the-fly. But what happens when landmarks are genuinely ambiguous? When the robot returns to a place and can't tell which previously seen feature matches which current observation?

The challenge: In a rectangular building with four identical-looking corners, the robot visits each corner but cannot tell them apart. The four corner detections might correspond to 1, 2, 3, or 4 distinct corners. Without resolving this ambiguity, the map is fundamentally wrong. This is the data association problem at its hardest.

This chapter introduces Expectation-Maximization (EM) for mapping — an algorithm that can handle ambiguous correspondences by iterating between localization and mapping. Unlike EKF/EIF SLAM, which commit to a single correspondence hypothesis, EM maintains a probability distribution over correspondences.

The trade-off: EM is a batch algorithm (requires multiple passes over the data) and only finds the most likely map (no uncertainty estimate). But it can handle raw sensor data and ambiguous correspondences that would break Kalman-based approaches.

Check: What problem does EM mapping address that EKF/EIF SLAM cannot?

Ambiguous data association — when the robot cannot tell which features correspond to which landmarks Large maps Real-time operation

Chapter 1: EM: The Basic Idea

EM mapping finds the most likely map by alternating two steps:

m* = argmax_m p(m | d)

where d is all data (measurements and controls). Since searching the space of all maps is intractable, EM performs hill climbing:

E-Step (Expectation / Localization): Given the current map estimate m^[i], localize the robot at every point in time. Compute a probability distribution over poses for each time step.

M-Step (Maximization / Mapping): Given the pose distributions from the E-step, compute a new map m^[i+1] that maximizes the expected log-likelihood.

The chicken-and-egg solution: We can't localize without a map, and we can't map without knowing poses. EM breaks the deadlock: start with any map (even a terrible one from raw odometry), localize against it, build a better map from the localization, then localize again in the improved map, and so on. Each iteration is guaranteed to improve (or maintain) the map likelihood.

Check: What does the E-step of EM mapping compute?

A probability distribution over robot poses at each time step, given the current map estimate The most likely map The sensor noise parameters

Chapter 2: E-Step: Localization

The E-step runs a full forwards-backwards localization pass over the data. For each time step τ, it computes:

p(s_τ | m^[i], z_1:T, u_1:T) ∝ p(s_τ | m^[i], z_τ+1:T, u_τ+1:T) · p(s_τ | m^[i], z_1:τ, u_1:τ)

The first factor is the backward pass (future data constraining past poses). The second is the forward pass (past data). Combining them gives the best possible pose estimate using ALL data.

Why forwards-backwards? In standard online localization, we only use data up to the current time. But EM is a batch algorithm with access to all data. Data collected after time τ also carries information about the pose at time τ. The backwards pass captures this, producing significantly sharper pose distributions.

The result is a probability distribution I_{s_τ} over poses for every time step. This distribution serves as a "soft assignment" for the M-step — each pose is weighted by its probability rather than committed to a single value.

Check: Why does the E-step use a backwards pass in addition to the forward pass?

Future measurements also constrain past poses, and the backwards pass captures this information The backwards pass is faster It corrects for sensor noise

Chapter 3: M-Step: Mapping

The M-step builds a new map that maximizes the expected log-likelihood under the pose distributions from the E-step:

m^[i+1] = argmax_m Σ_τ ∫ I_{s_τ} log p(z_τ | s_τ, m) ds_τ

For occupancy grid maps, this decomposes per-cell: each cell's occupancy is determined by the weighted evidence from all times when the robot might have observed it, weighted by the pose probabilities.

The soft assignment advantage: Unlike maximum likelihood SLAM, which commits to single poses and can make hard errors, EM "spreads" each observation across all plausible poses. A measurement might have come from pose A (probability 0.7) or pose B (probability 0.3). Both contribute to the map, proportionally. This soft assignment naturally handles ambiguous correspondences.

Check: What does the M-step maximize?

The expected log-likelihood of the sensor data, weighted by the pose distributions from the E-step The number of landmarks detected The robot's speed

Chapter 4: Convergence

EM is guaranteed to improve the map likelihood at each iteration (or leave it unchanged at a fixed point). It converges to a local maximum of p(m | d).

In practice, 5-20 iterations typically suffice. The initial map (from raw odometry) is usually terrible, but each iteration produces a noticeably better map. The improvement is often dramatic after just 2-3 iterations.

Local maxima: EM finds a local maximum, not necessarily the global one. In highly symmetric environments (e.g., a circular corridor), EM may converge to a map that's rotated or reflected relative to the true map. In practice, this is rarely a problem for environments with sufficient asymmetry.

Iteration	Map Quality	Description
0	Terrible	Raw odometry; drift makes everything misaligned
1	Rough	Major structures visible; local alignment improved
2-3	Good	Walls are straight; rooms recognizable
5-10	Excellent	Near-optimal; convergence nearly complete

Check: What guarantee does EM provide about convergence?

Each iteration improves (or maintains) the map likelihood, converging to a local maximum It always finds the global optimum It converges in exactly 3 iterations

Chapter 5: Grid-Based EM

The E-step requires representing the pose distribution over a 3D space (x, y, θ). Grid-based EM discretizes this on a grid, similar to grid localization (Chapter 8).

The computational cost is dominated by the E-step: for each time step, the localization runs over the full grid. For T time steps and K grid cells, the E-step is O(T · K). The M-step is O(T · K · M) where M is the number of map cells.

Practical trick: Instead of running the full grid localization at every iteration, use a coarse grid initially and refine it in later iterations. Early iterations only need rough pose estimates; fine resolution matters only near convergence. This multi-resolution approach can speed up EM by an order of magnitude.

Check: What dominates the computational cost of grid-based EM?

The E-step (localization): running grid localization forwards and backwards over the full dataset The M-step (mapping) Data loading

Chapter 6: Layered EM

Standard EM doesn't scale to large environments because the grid localization becomes too expensive. Layered EM addresses this by decomposing the problem into layers:

1. Build small local maps from consecutive data segments. Each local map covers a small area and is internally consistent.

2. Run EM on the relative positions of local maps, not individual sensor readings. This is a much smaller optimization problem.

Why layers help: Within a 10-meter local map, odometry is accurate enough to build a good map without EM. The errors accumulate over longer distances. By building accurate local maps first, then aligning them globally with EM, we split the problem into two manageable pieces.

The perceptual model for layered EM compares overlapping local maps: if local map A and local map B are placed at the correct relative positions, their overlapping regions should agree. The measurement likelihood measures this agreement.

Check: What is the key idea of layered EM?

Build accurate local maps first, then use EM to align their relative positions globally Use multiple sensors Run EM on each grid cell independently

Chapter 7: Local Maps

Each local map is built from a short segment of data (e.g., 50-200 scans) using standard occupancy grid mapping (Chapter 9). The segment is short enough that odometry errors are small, so the local map is accurate.

The local map serves as the "measurement" for the higher-level EM. Instead of matching raw sensor readings to a global map, EM matches local maps to each other. This is both faster and more robust.

Local map size trade-off: Larger local maps cover more area but accumulate more odometry error. Smaller local maps are more accurate but provide less context for alignment. A good rule of thumb: each local map should be small enough that the worst-case odometry error is less than half the sensor range.

Check: Why must local maps be small?

To keep odometry errors small enough that the local map is internally accurate To save memory Because the sensor range is short

Chapter 8: Examples

EM mapping has produced some of the largest and most accurate maps in robotics research:

Environment	Challenge	Result
Rectangular corridor	Four identical corners; high symmetry	EM correctly identifies all four corners
Large office building	Multiple loops; long corridors	Consistent metric map with aligned loops
Multi-floor building	Repeated structure across floors	Layered EM aligns floor plans correctly

Comparison with EKF SLAM: EKF SLAM requires carefully engineered feature detectors and fails with ambiguous landmarks. EM mapping works with raw laser scans and handles ambiguity naturally. The cost: EM is batch (multiple passes over data) while EKF is online (single pass).

Check: What type of sensor data can EM mapping use that EKF SLAM cannot?

Raw sensor data (laser scans) without feature extraction, including ambiguous observations GPS data Camera images

Chapter 9: Summary

Concept	Key Idea
EM mapping	Alternate localization (E) and mapping (M) to find the most likely map
E-step	Forwards-backwards localization over all data, producing pose distributions
M-step	Build map from data weighted by pose distributions
Convergence	Monotonically improves map likelihood; converges to local maximum
Layered EM	Build local maps, then align them globally

What comes next: EM is powerful but slow (multiple passes over all data). Chapter 14 introduces fast incremental mapping — real-time algorithms based on gradient descent that can build maps as the robot moves, with techniques for detecting and correcting loop closures on-the-fly.

"Give me a bad map and I'll localize roughly.
From those rough poses I'll build a better map.
A few rounds later, both are good."

Check: What is the main limitation of EM mapping compared to EKF SLAM?

EM is a batch algorithm requiring multiple passes over all data, not suitable for real-time operation EM cannot handle loop closure EM requires feature extraction