Ch 11: The Extended Information Form — Probabilistic Robotics

Chapter 0: Why Information Form?

EKF SLAM (Chapter 10) maintains a covariance matrix Σ_t that becomes dense as landmarks correlate. Updating this dense matrix costs O(N²) per step. Can we do better?

The information form is the dual of the covariance form. Instead of μ and Σ, it maintains:

Ω = Σ^-1 (information matrix)
ξ = Σ^-1μ (information vector)

The key insight: While the covariance matrix of online SLAM is dense, the information matrix of full SLAM is naturally sparse. Non-zero entries appear only between poses connected by motion and between poses and the landmarks they observed. This sparsity enables efficient algorithms.

The EIF SLAM algorithm exploits this sparsity to solve the full SLAM problem: estimating the entire robot path x_1:t and the map m simultaneously. It accumulates information lazily (cheap) and resolves it into estimates at the end (one-time cost).

Check: What is the key structural advantage of the information form for full SLAM?

The information matrix is naturally sparse — non-zero only between directly connected variables The information matrix is smaller The information form doesn't need linearization

Chapter 1: Intuitive Description

Think of the information matrix as a graph of constraints. Each non-zero off-diagonal block represents a constraint between two variables (poses or landmarks):

• Motion constraints link consecutive poses x_t-1 and x_t.

• Measurement constraints link a pose x_t to the landmark m_j it observed.

Each constraint says: "these two variables should satisfy this geometric relationship." The information matrix accumulates all constraints into a single linear system.

Constraint graph: Imagine nodes for every pose and every landmark. Draw an edge between pose t and pose t+1 (motion). Draw an edge between pose t and landmark j whenever the robot observed j at time t. The non-zero pattern of Ω is exactly this graph. Since each pose is linked only to its neighbors and to the few landmarks it sees, the graph — and hence Ω — is sparse.

The key difference from EKF SLAM: EKF SLAM processes information eagerly, updating the dense covariance at each step (expensive). EIF SLAM accumulates information lazily, adding constraints to the sparse information matrix (cheap). Resolution into actual pose/map estimates happens only at the end.

Check: What does a non-zero off-diagonal entry in the information matrix represent?

A direct constraint between two variables (e.g., a motion link or a measurement link) The Euclidean distance between two landmarks The probability that two landmarks are the same

Chapter 2: Sparsity

Why is Ω sparse in full SLAM? The log-posterior decomposes as a sum of independent terms:

log p(y_0:t | z, u, c) = const. + log p(x₀) + Σ_t [log p(x_t|x_t-1,u_t) + Σ_i log p(z_tⁱ|y_t,c_tⁱ)]

Each term involves only a few variables: x_t-1 and x_t for motion, or x_t and m_j for measurement. After linearization, each term contributes a local quadratic to the information matrix, affecting only the rows/columns of the involved variables.

Sparsity quantified: If there are T poses and N landmarks, and the robot sees at most k landmarks per step, the information matrix has O(T + N) diagonal blocks and O(T + kT) off-diagonal non-zero blocks. Compare this to EKF SLAM's dense (3+3N) × (3+3N) covariance. When N >> kT, EIF is dramatically sparser.

Important caveat: this sparsity holds for full SLAM only. In online SLAM, integrating out past poses destroys the sparsity by introducing links between landmarks that were observed at the same (now-eliminated) pose. This is why the EKF's covariance is dense.

Check: Why does marginalizing out past poses destroy sparsity?

Landmarks that were linked indirectly through a shared pose become directly linked when that pose is removed The matrix gets smaller New landmarks are added

Chapter 3: The EIF SLAM Algorithm

EIF SLAM has three phases, run iteratively until convergence:

Phase	What It Does	Complexity
Construct	Build Ω and ξ from linearized motion/measurement models	O(T) — linear in time steps
Reduce	Eliminate landmark variables, leaving only poses	O(T) — linear
Solve	Invert the reduced system to get poses; back-substitute for landmarks	O(T³) worst case, but exploits sparsity

The iteration: Construct uses the current pose estimates for linearization. Reduce removes landmarks via variable elimination. Solve recovers new pose and landmark estimates. These steps repeat 2-3 times with updated estimates, improving the linearization. Convergence is typically fast — 2-3 iterations suffice.

Check: What are the three phases of EIF SLAM?

Construct (build information matrix), Reduce (eliminate landmarks), Solve (recover estimates) Predict, Update, Resample E-step, M-step, Convergence check

Chapter 4: Construct Step

EIF construct builds Ω and ξ by processing all data sequentially:

Initialization: Set Ω_x0,x0 = ∞ · I to fix the initial pose at the origin. This anchors the coordinate system.

For each control u_t: Linearize the motion model at the current estimate. Add a quadratic constraint between x_t-1 and x_t to Ω and ξ.

For each measurement z_tⁱ: Linearize the measurement model. Add a quadratic constraint between x_t and m_j to Ω and ξ.

The beauty of additivity: Each constraint is a local addition to Ω and ξ. Motion constraints affect a 6×6 block (two consecutive poses). Measurement constraints affect a 5×5 block (one pose and one landmark). The total construction time is linear in the number of measurements and controls.

The result: a large sparse matrix Ω and vector ξ, defined over all poses and all landmarks. If the environment has 10,000 poses and 500 landmarks, the matrix is 31,500 × 31,500, but most of it is zero.

Check: Why is the Construct step O(T)?

Each control and measurement adds only a small local block to Ω — no global matrix operations needed The matrix is diagonal Only the most recent data is used

Chapter 5: Reduce Step

After construction, Ω contains both pose and landmark variables. The Reduce step eliminates all landmark variables, leaving a system defined only over poses.

This is variable elimination (Schur complement): for each landmark m_j, compute the set of poses τ(j) that observed it, then update the submatrix connecting those poses:

Ω̃ ← Ω̃ - Ω̃_τ(j),j Ω̃_j,j^-1 Ω̃_j,τ(j)

and similarly for ξ̃. Then remove the rows/columns for m_j.

What variable elimination does: It takes the indirect links between poses (through a shared landmark) and converts them into direct links. If poses 5, 12, and 37 all saw landmark j, eliminating j creates direct links between poses 5-12, 5-37, and 12-37. The landmark information is "absorbed" into the pose-to-pose constraints.

After reducing all landmarks, the remaining system involves only the T+1 poses. For environments without loops, this system is tridiagonal (each pose linked only to its predecessor). With loops, it's still sparse.

Check: What happens when a landmark is eliminated in the Reduce step?

The poses that observed it become directly linked, absorbing the landmark's information The landmark is deleted from the map Nothing changes — the matrix stays the same

Chapter 6: Solve Step

The Solve step recovers the actual estimates from the reduced information system:

1. Pose recovery: Invert the reduced matrix Ω̃ (poses only) to get Σ̃ = Ω̃^-1. Compute pose means: μ_0:t = Σ̃ ξ̃.

2. Landmark recovery: For each landmark j, compute its position from the recovered poses: μ_j = Ω_j,j^-1(ξ_j + Ω_j,τ(j) μ_τ(j)).

The bottleneck: Inverting Ω̃ is the most expensive step. For environments without large loops, Ω̃ is banded and can be inverted in O(T) time. For environments with loops (the robot returns to previously visited areas), the bandwidth increases and inversion can be O(T³) in the worst case. In practice, sparse solvers handle this efficiently.

Landmark recovery is cheap: each landmark's estimate is computed independently from the recovered poses, in O(1) per landmark.

Check: When is the Solve step most expensive?

When the environment has large loops, making the reduced information matrix less sparse When there are many landmarks When the robot moves slowly

Chapter 7: Data Association

Unknown correspondences in EIF SLAM are handled similarly to EKF SLAM: maximum likelihood correspondence. For each measurement, find the landmark that minimizes the Mahalanobis distance.

The challenge: computing Mahalanobis distances in the information form requires access to the covariance (the inverse of Ω), which is expensive to compute for the full system. In practice, local submatrices are extracted and inverted.

Practical approach: Use the current pose estimates from the Solve step to predict expected measurements. Match observations to predictions using Euclidean distance as a first filter, then refine with Mahalanobis distance using local covariance blocks. This avoids inverting the full information matrix for data association.

As with EKF SLAM, wrong correspondences can be devastating. The iterative nature of EIF SLAM provides a partial defense: after each iteration, the improved estimates may reveal and correct earlier association errors.

Check: What makes data association harder in the information form than the covariance form?

Computing Mahalanobis distances requires the covariance, which is the inverse of the information matrix There are more landmarks The information form doesn't store landmarks

Chapter 8: Efficiency

How does EIF SLAM compare to EKF SLAM?

Aspect	EKF SLAM	EIF SLAM
Problem type	Online SLAM	Full SLAM
Representation	Dense covariance (3+3N)²	Sparse information matrix
Per-step cost	O(N²)	O(1) per constraint (lazy)
Total cost	O(TN²)	O(T) + solve cost
Incremental?	Yes	No (batch)
Max landmarks	~1,000	~100,000+

The trade-off: EIF SLAM is a batch algorithm. It processes all data at once, not incrementally. This means it can't be used for real-time SLAM during operation. But it can handle much larger maps because each constraint is cheap to add. It's ideal for offline map construction from recorded data.

Check: Why can EIF SLAM handle more landmarks than EKF SLAM?

Adding constraints to the sparse information matrix is O(1), while EKF updates the dense covariance at O(N²) EIF uses less memory EIF doesn't need linearization

Chapter 9: Summary

Concept	Key Idea
Information form	Ω = Σ^-1; naturally sparse for full SLAM
Constraint graph	Non-zero entries = direct constraints between variables
Construct	Build Ω lazily by adding local constraints; O(T)
Reduce	Eliminate landmarks via variable elimination; O(T)
Solve	Invert reduced system for poses; back-substitute for landmarks
Iteration	Repeat 2-3 times to improve linearization

What comes next: EIF SLAM is a batch algorithm — not suitable for real-time operation. Chapter 12 introduces the Sparse Extended Information Filter (SEIF), which adapts the information form for online SLAM by maintaining an approximately sparse information matrix through sparsification.

"The covariance tells you everything about the uncertainty.
The information matrix tells you everything about the constraints.
Same information, different perspective — and the perspective changes everything."

Check: What makes EIF SLAM a batch algorithm rather than an online one?

It accumulates all data before solving, and the Solve step processes the entire dataset at once It only works with recorded data, not live sensors It is too slow for real-time