FutureMapping 2 — Veanors

Chapter 0: The Problem

You're building a lightweight home robot — or AR glasses — that needs to understand the 3D space around it in real-time. It must track its own position, build a map, recognize objects, and do all of this within a tiny power budget. This is Spatial AI.

Current prototype systems like SemanticFusion require heavy computing resources (a full desktop GPU) while delivering a fraction of the capability needed. Orders of magnitude of improvement are still required before such systems fit into real products. Why is it so hard?

The core computation in Spatial AI is incremental estimation: continuously fusing noisy measurements from cameras, IMUs, and other sensors into a persistent model of the world. The standard approach — bundle adjustment, pose graph optimization — formulates this as a global least-squares problem. You write down a giant system of equations and solve it all at once.

The bottleneck: Global solvers require a "god's eye view" of the entire problem structure. They need centralized access to all variables and measurements, large matrix factorizations, and high-bandwidth data transfer. This is fundamentally mismatched with the direction hardware is heading: massively parallel, distributed processors with local memory (like Graphcore's IPU) where moving data is far more expensive than computing.

We need an algorithm that:

Operates with purely local computation and storage — each node only knows about its direct neighbors
Communicates only via local message passing — no global coordination
Is naturally incremental — new measurements are absorbed without re-solving from scratch
Is distributed — maps directly onto graph processor architectures
Is flexible — handles heterogeneous factor graphs with mixed geometric and semantic measurements

Why is standard bundle adjustment poorly suited for next-generation graph processor hardware?

It requires centralized access to the full problem structure and large data transfers, which is mismatched with distributed, local-memory architectures
It's too simple to benefit from parallelism
It doesn't use Gaussian distributions

Chapter 1: The Key Insight

The answer is Gaussian Belief Propagation (GBP) — an algorithm where each node in a factor graph computes using only local information, exchanging simple Gaussian messages with its immediate neighbors, yet the whole system converges to the globally optimal estimates.

Think of it this way. In a traditional solver, there's a central authority that collects all measurements, builds a giant matrix, and solves the whole system. In GBP, there's no central authority at all. Each variable node is like a person in a network who:

Receives "opinions" (Gaussian messages) from each of its neighboring factors
Combines them by multiplying the Gaussians (adding precisions)
Sends its updated belief back out

After enough rounds of message passing, every node's belief converges to the correct marginal distribution — the same answer that the global solver would give.

Why this matters now: GBP has been known for decades but was never seriously used in robotics. The reason? On CPUs, global solvers (Cholesky, conjugate gradient) are faster. But hardware is changing. Graph processors like Graphcore's IPU have thousands of cores, each with local memory, connected by a communication fabric. GBP's "purely local compute + message passing" maps perfectly onto this architecture. The algorithm was waiting for its hardware.

A Gaussian message is beautifully simple: just an information vector η and a precision matrix Λ. To combine messages, you just add η's and Λ's. To send a message, you do a small local matrix computation. No node ever needs to know the global structure of the graph.

What makes GBP "purely local"?

Each node only communicates with its direct neighbors and never needs knowledge of the rest of the graph
It only processes one measurement at a time
It runs on a single processor core

Chapter 2: Factor Graphs for SLAM

Before diving into GBP itself, we need to understand the data structure it operates on: the factor graph.

A factor graph is an undirected bipartite graph with two types of nodes:

Variable nodes x_i — things we want to estimate (robot poses, landmark positions, calibration parameters)
Factor nodes f_s — constraints that connect variables (sensor measurements, motion models, priors)

Each factor f_s is connected to a subset of variables x_s and represents the probability of obtaining measurement z_s given those variable values: f_s(x_s) = p(z_s | x_s).

The joint probability

The key property: given the variables, the measurements are assumed independent of each other. So the joint probability over all variables is, up to a normalizing constant, simply the product of all factors:

p(x) ∝ ∏_s f_s(x_s)

This is the entire probabilistic model in one equation. Every measurement, every prior, every constraint is one factor in this product.

Gaussian factors

In Spatial AI, almost all factors are Gaussian. A Gaussian factor has the form:

f_s(x_s) = K exp(−½ [z_s − h_s(x_s)]^T Λ_s [z_s − h_s(x_s)])

Where h_s is the measurement function, z_s is the actual measurement, and Λ_s is the measurement precision (inverse covariance). Three ingredients define each factor: the measurement function h_s, the measured value z_s, and the precision Λ_s.
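
As a minimal sketch of how the three ingredients fit together, the Python below evaluates scalar Gaussian factors and multiplies them into an unnormalized joint. The class name GaussianFactor, its field names, and the example values (a prior with precision 10 and an odometry factor with precision 4) are assumptions made for illustration, and the constant K is dropped.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class GaussianFactor:
    """Hypothetical container for one scalar Gaussian factor f_s."""
    var_ids: Sequence[int]    # indices of the variables x_s this factor touches
    h: Callable[..., float]   # measurement function h_s
    z: float                  # measured value z_s
    lam: float                # measurement precision (inverse variance)

    def energy(self, x: Sequence[float]) -> float:
        """0.5 * [z - h(x_s)] * lam * [z - h(x_s)] for a scalar measurement."""
        r = self.z - self.h(*(x[i] for i in self.var_ids))
        return 0.5 * r * self.lam * r

    def value(self, x: Sequence[float]) -> float:
        """Unnormalized factor value exp(-energy); the constant K is omitted."""
        return math.exp(-self.energy(x))


def joint_unnormalized(factors, x):
    """Product of all factor values, proportional to p(x)."""
    return math.prod(f.value(x) for f in factors)


factors = [
    GaussianFactor((0,), lambda x0: x0, z=0.0, lam=10.0),            # prior on x0
    GaussianFactor((0, 1), lambda x0, x1: x1 - x0, z=1.0, lam=4.0),  # odometry
]

# Energies add where factor values multiply: 0.05 (prior) + 0.02 (odometry).
total_energy = sum(f.energy([0.1, 1.0]) for f in factors)
print(total_energy, joint_unnormalized(factors, [0.1, 1.0]))
```

Working with summed energies instead of multiplied probabilities is what later turns inference into a least-squares problem.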

Factor Graph for SLAM

A robot moves through 4 poses observing 3 landmarks. Blue circles = pose variables, green circles = landmark variables, orange squares = measurement factors.

Information form: Gaussians can be represented as (μ, Σ) — mean and covariance — or (η, Λ) — information vector and precision matrix — where η = Λμ. GBP uses the information form because multiplying Gaussians (fusing information) becomes simple addition: η_combined = η₁ + η₂, Λ_combined = Λ₁ + Λ₂. It can also represent "zero information" (infinite covariance) naturally with η = 0, Λ = 0.
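
To make the information form concrete, here is a small scalar sketch of the conversion and of fusion by addition. The function names (to_information, to_moments, fuse) and the numbers are illustrative assumptions, not taken from the paper.

```python
def to_information(mu, sigma2):
    """Convert a 1D Gaussian (mean, variance) to information form (eta, lam)."""
    lam = 1.0 / sigma2
    return lam * mu, lam


def to_moments(eta, lam):
    """Convert (eta, lam) back to (mean, variance)."""
    if lam == 0.0:
        raise ValueError("zero precision carries no information; the mean is undefined")
    return eta / lam, 1.0 / lam


def fuse(*messages):
    """Multiply Gaussians in information form by adding their parameters."""
    eta = sum(m[0] for m in messages)
    lam = sum(m[1] for m in messages)
    return eta, lam


a = to_information(mu=1.0, sigma2=0.5)    # eta = 2, lam = 2
b = to_information(mu=2.0, sigma2=0.25)   # eta = 8, lam = 4
no_info = (0.0, 0.0)                      # "zero information" message

eta, lam = fuse(a, b, no_info)            # eta = 10, lam = 6
mu, sigma2 = to_moments(eta, lam)
print(mu, sigma2)                         # mean 10/6, variance 1/6
```

The fused mean lands closer to the more precise input, and the zero-information message leaves the result unchanged.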

Why does GBP use the information form (η, Λ) rather than the mean/covariance form (μ, Σ)?

Because multiplying Gaussians (fusing information from multiple sources) reduces to simple addition of η's and Λ's
Because it uses less memory
Because it makes visualization easier

Chapter 3: From Global to Local

In standard SLAM, you formulate the estimation problem as minimizing the total energy — the sum of squared residuals from all factors:

x* = argmin_x ∑_s (z_s − h_s(x_s))^T Λ_s (z_s − h_s(x_s))

This is equivalent to finding the mode of the joint Gaussian distribution (since minimizing the negative log of a product of Gaussians gives this sum of quadratics).

The global approach

Standard solvers like Gauss-Newton or Levenberg-Marquardt linearize all factors, stack the Jacobians into a giant matrix J, form the information matrix H = J^TΛJ (also called the Hessian approximation), and solve the linear system Hx = b. This requires:

Access to all Jacobians and measurements at once
Forming and factorizing H — a sparse but potentially huge matrix
Global reordering strategies (variable elimination, Cholesky, etc.)
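
The global approach can be sketched for a linear 1D problem, where h_s(x) = Jx and a single solve suffices. The problem below (three poses, a prior on x₀, two odometry factors, and a position measurement on x₂), its numerical values, and the dense solve helper are assumptions for illustration; a production solver would use sparse factorization instead of dense elimination.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


# One Jacobian row per factor, stacked over the variables (x0, x1, x2).
J = [
    [1.0, 0.0, 0.0],    # prior on x0
    [-1.0, 1.0, 0.0],   # odometry x1 - x0
    [0.0, -1.0, 1.0],   # odometry x2 - x1
    [0.0, 0.0, 1.0],    # position measurement on x2
]
z = [0.0, 1.0, 1.0, 2.1]       # measurements
lam = [10.0, 4.0, 4.0, 8.0]    # per-factor precisions (diagonal of Lambda)

n, m = 3, len(z)
# H = J^T Lambda J and b = J^T Lambda z need every factor at once.
H = [[sum(lam[s] * J[s][i] * J[s][j] for s in range(m)) for j in range(n)]
     for i in range(n)]
b = [sum(lam[s] * J[s][i] * z[s] for s in range(m)) for i in range(n)]

x_star = solve(H, b)
print(x_star)
```

Note that building H and b touches every factor in the graph, which is exactly the centralized access pattern listed above.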

The local alternative

GBP replaces this global solve with iterative local message passing. Instead of forming H and solving Hx = b directly, each node computes small local messages and passes them to neighbors. After enough iterations, the marginal beliefs at each variable node converge to the same answer that the global solver would give.

The trade-off: The global solver converges in one shot (for linear problems) but requires centralized computation. GBP requires multiple iterations of message passing but each iteration is purely local. On a graph processor with thousands of cores running in parallel, those local iterations can be blazingly fast — and you never need to move data off-chip.

Global Solve vs. Local Message Passing

Left: global solver forms and inverts a large matrix. Right: GBP passes local messages that gradually propagate information.

What does GBP replace the global matrix factorization with?

Iterative local message passing between neighboring nodes, where each node only does small local computations
A faster matrix factorization
Sampling from the posterior distribution

Chapter 4: GBP Derivation

Now we derive the actual message update rules. This is the mathematical core of GBP. There are two types of messages:

Message from variable to factor

A variable node x_m sends a message to factor f_s by multiplying together all incoming messages from other factors (every factor neighbor except f_s). In the information form, multiplying Gaussians = adding parameters:

η_m→s = ∑_{l ∈ n(x_m) \ f_s} η_l→m

Λ_m→s = ∑_{l ∈ n(x_m) \ f_s} Λ_l→m

In words: "Dear factor f_s, here's what everyone else thinks about me." The variable node excludes f_s's own message to avoid circular reasoning.
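
A minimal sketch of this rule for scalar messages follows. The dictionary layout and the factor names ("prior", "odom01", "odom12") are hypothetical, chosen only to show the exclusion of the target factor.

```python
def variable_to_factor(incoming, target):
    """Sum all incoming factor messages except the one from `target`.

    incoming maps a factor id to its (eta, lam) message into this variable.
    """
    eta = sum(e for f, (e, _) in incoming.items() if f != target)
    lam = sum(l for f, (_, l) in incoming.items() if f != target)
    return eta, lam


incoming = {
    "prior": (0.0, 10.0),
    "odom01": (2.0, 2.0),
    "odom12": (3.0, 1.5),
}

# The message to odom12 leaves out what odom12 itself said.
msg = variable_to_factor(incoming, "odom12")
print(msg)   # eta = 0 + 2, lam = 10 + 2
```

A variable whose only neighbor is the target factor sends (0, 0), the zero-information message.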

Message from factor to variable

A factor f_s sends a message to variable x_m in three steps:

Linearize the factor around the current variable estimates x₀ to get a local Gaussian: η'_s = J^TΛ_s[Jx₀ + z_s − h_s(x₀)] and Λ'_s = J^TΛ_sJ, where J is the Jacobian of h_s evaluated at x₀
Add the incoming variable-to-factor messages from every connected variable except x_m to the corresponding blocks of the linearized factor
Marginalize out all variables except x_m, which is a Schur complement on the combined precision matrix

For a factor connecting two variables x_a (input) and x_b (output), after adding the message from x_a:

η_s→b = η'_b − Λ'_ba(Λ'_aa + Λ_a→s)⁻¹(η'_a + η_a→s)

Λ_s→b = Λ'_bb − Λ'_ba(Λ'_aa + Λ_a→s)⁻¹Λ'_ab

This is exactly the Schur complement — marginalizing out x_a from the joint Gaussian over (x_a, x_b). The key insight: this is a small local matrix operation, not a global one.
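
The two formulas above can be sketched for a scalar odometry factor h(x_a, x_b) = x_b − x_a, which is linear, so the linearization point drops out. The function names and the example numbers are assumptions for illustration.

```python
def linearize_odometry(z, lam_s):
    """Information form of the factor h(x_a, x_b) = x_b - x_a with Jacobian J = [-1, 1].

    For a linear h, eta' = J^T lam_s [J x0 + z - h(x0)] reduces to J^T lam_s z.
    """
    J = (-1.0, 1.0)
    eta = (J[0] * lam_s * z, J[1] * lam_s * z)
    Lam = ((J[0] * lam_s * J[0], J[0] * lam_s * J[1]),
           (J[1] * lam_s * J[0], J[1] * lam_s * J[1]))
    return eta, Lam


def factor_to_variable(eta, Lam, msg_a):
    """Add the message from x_a, then marginalize x_a out (scalar Schur complement)."""
    (eta_a, eta_b), ((L_aa, L_ab), (L_ba, L_bb)) = eta, Lam
    eta_in, lam_in = msg_a
    inv = 1.0 / (L_aa + lam_in)          # inverse of a 1x1 block
    eta_out = eta_b - L_ba * inv * (eta_a + eta_in)
    lam_out = L_bb - L_ba * inv * L_ab
    return eta_out, lam_out


# x_a is believed to be at 2.0 with precision 3; odometry says x_b - x_a = 0.5
# with precision 2.
eta, Lam = linearize_odometry(z=0.5, lam_s=2.0)
eta_out, lam_out = factor_to_variable(eta, Lam, msg_a=(6.0, 3.0))
print(eta_out, lam_out, eta_out / lam_out)   # mean 2.5, precision 1.2
```

The outgoing mean is the incoming mean shifted by the measured displacement, and the outgoing variance is the sum of the two variances (1/3 + 1/2), which matches a precision of 1.2.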

Belief update

The belief at variable x_m — its current best estimate — is the product of all incoming messages:

η_m = ∑_{s ∈ n(x_m)} η_s→m

Λ_m = ∑_{s ∈ n(x_m)} Λ_s→m

The mean estimate is then μ_m = Λ_m⁻¹η_m.

GBP Message Passing Animation

Messages flow on a factor graph. Variable nodes (circles) combine and relay incoming messages. Factor nodes (squares) linearize and marginalize.

The Schur complement is the key: The factor-to-variable message requires marginalizing out other variables from a joint Gaussian. For Gaussians, marginalization is a closed-form matrix operation (the Schur complement), and it's local: it only involves the variables connected to that factor. A factor connecting two d-dimensional variables works with a 2d×2d joint precision matrix and inverts only the d×d block of the variable being marginalized out (a single scalar division when d = 1), regardless of how large the overall graph is.

In the variable-to-factor message, why does variable x_m exclude factor f_s's own message?

To avoid circular reasoning: sending f_s's own information back to it would double-count that measurement
To reduce computation
Because f_s's message hasn't arrived yet

Chapter 5: Trees vs Loopy Graphs

GBP's behavior depends critically on the structure of the factor graph.

Tree-structured graphs: exact in 2 passes

If the factor graph is a tree (no loops), GBP gives the exact marginal distributions in just two passes: one sweep from leaves to root, then one sweep from root back to leaves. Every variable node gets its exact posterior. This is the classic result from Pearl's work on belief propagation.

Why? In a tree, each branch contributes independent information. The messages from different branches never contain overlapping information, so there's no double-counting.

Loopy graphs: iterative approximation

Real SLAM graphs have loops. When a robot revisits a place (loop closure), the factor graph gains a cycle. On loopy graphs, GBP is no longer exact — messages can "go around in circles," recounting the same information. However:

In practice, loopy GBP often converges to excellent approximations
For Gaussian models with walk-summable precision matrices (a class that includes all diagonally dominant ones), convergence is guaranteed
Damping helps: instead of sending the full new message, send a weighted average of the old and new message

η^new = (1 − α)η^old + αη^computed

With damping factor α ∈ (0, 1]. Smaller α = more conservative updates = more stable but slower convergence.
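
The damping rule can be sketched in a few lines. Following the formula above, only η is damped here; the function name damp_eta is an assumption, and the computed message is held fixed at 1.0 purely to show how α controls the step size (in a real run it would change every round).

```python
def damp_eta(eta_old, eta_computed, alpha):
    """Blend the previous message with the newly computed one.

    alpha = 1 sends the new message unchanged; smaller alpha moves more slowly.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return (1.0 - alpha) * eta_old + alpha * eta_computed


eta = 0.0
history = []
for _ in range(3):
    eta = damp_eta(eta, 1.0, alpha=0.3)
    history.append(eta)

print(history)   # approaches 1.0 geometrically: about 0.3, 0.51, 0.657
```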

When GBP converges on loopy graphs: Weiss and Freeman proved that when loopy GBP converges for Gaussian models, the means are exact — they match the true marginal means from the global solution. The variances may be approximate (typically overconfident). For SLAM, the means are what we care most about (where is the robot? where are the landmarks?), so this is a very favorable property.
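
The exact-means property can be checked on a tiny loopy graph. The sketch below runs synchronous scalar GBP on a hypothetical triangle (three poses, a prior, two odometry factors, and a deliberately inconsistent loop closure) and then tests whether the resulting means satisfy the global normal equations Hx = b. The factor tuple layout, the run_gbp helper, and all numbers are assumptions for illustration; no damping is used because this small graph converges without it.

```python
def run_gbp(n_vars, factors, iters):
    """Synchronous scalar GBP. Each factor is (var_ids, J, z, lam) with h(x) = J . x."""
    # f2v[(s, i)] is the (eta, lam) message from factor s to variable i.
    f2v = {(s, i): (0.0, 0.0) for s, f in enumerate(factors) for i in f[0]}

    def beliefs():
        b = [(0.0, 0.0)] * n_vars
        for (_, i), (eta, lam) in f2v.items():
            b[i] = (b[i][0] + eta, b[i][1] + lam)
        return b

    for _ in range(iters):
        belief, new = beliefs(), {}
        for s, (vs, J, z, lam) in enumerate(factors):
            eta = [j * lam * z for j in J]                    # J^T lam z
            Lam = [[ja * lam * jb for jb in J] for ja in J]   # J^T lam J
            if len(vs) == 1:                                  # unary: nothing to marginalize
                new[(s, vs[0])] = (eta[0], Lam[0][0])
                continue
            for out in (0, 1):
                a = 1 - out
                # Variable-to-factor message: belief minus this factor's own message.
                own = f2v[(s, vs[a])]
                eta_in = belief[vs[a]][0] - own[0]
                lam_in = belief[vs[a]][1] - own[1]
                inv = 1.0 / (Lam[a][a] + lam_in)
                new[(s, vs[out])] = (
                    eta[out] - Lam[out][a] * inv * (eta[a] + eta_in),
                    Lam[out][out] - Lam[out][a] * inv * Lam[a][out],
                )
        f2v = new
    return beliefs()


factors = [
    ((0,), (1.0,), 0.0, 10.0),         # prior on x0
    ((0, 1), (-1.0, 1.0), 1.0, 4.0),   # odometry x1 - x0
    ((1, 2), (-1.0, 1.0), 1.0, 4.0),   # odometry x2 - x1
    ((0, 2), (-1.0, 1.0), 2.3, 4.0),   # loop closure x2 - x0, inconsistent by 0.3
]
means = [eta / lam for eta, lam in run_gbp(3, factors, iters=100)]

# Global normal equations H x = b, used here only to check the GBP means.
H = [[0.0] * 3 for _ in range(3)]
b = [0.0] * 3
for vs, J, z, lam in factors:
    for a, i in enumerate(vs):
        b[i] += J[a] * lam * z
        for c, j in enumerate(vs):
            H[i][j] += J[a] * lam * J[c]
residual = [sum(H[i][j] * means[j] for j in range(3)) - b[i] for i in range(3)]
print(means, residual)
```

For this graph the normal equations have the solution (0, 1.1, 2.2), so the loop-closure error is spread evenly over the two odometry steps. The sketch does not compare variances, which is where loopy GBP is expected to be approximate.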

Convergence strategies

The paper discusses several approaches to improve convergence:

Damping — the simplest fix, controls step size
Message scheduling — choose which messages to send first (e.g., residual BP: send the most-changed messages first)
Robust factors — use robust kernels (Huber, etc.) to downweight outlier measurements
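
One way to realize a robust Huber factor inside a Gaussian framework is to rescale the factor's precision so that the quadratic energy equals the Huber energy at the current residual. The sketch below assumes that approach for a scalar residual; the function name huber_scale and the threshold parameter n_sigma are hypothetical, and the scale follows from setting ½kM² = N·M − ½N² for Mahalanobis distance M above the threshold N.

```python
import math


def huber_scale(residual, lam, n_sigma):
    """Factor by which to scale the precision of a scalar Gaussian factor.

    Inside the threshold the factor is left alone; beyond it the precision is
    reduced so that 0.5 * k * M^2 equals the Huber energy N*M - 0.5*N^2.
    """
    m = math.sqrt(residual * lam * residual)   # Mahalanobis distance M
    if m <= n_sigma:
        return 1.0
    return 2.0 * n_sigma / m - (n_sigma / m) ** 2


inlier = huber_scale(residual=0.1, lam=4.0, n_sigma=2.0)    # M = 0.2, untouched
outlier = huber_scale(residual=2.0, lam=4.0, n_sigma=2.0)   # M = 4, downweighted
print(inlier, outlier)
```

The scale is continuous at the threshold (it equals 1 when M = N) and shrinks toward zero as the residual grows, so outliers send progressively weaker messages.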

When loopy GBP converges on a Gaussian factor graph, what can we say about the accuracy of its estimates?

The means are exact (match the global solution), but the variances may be approximate (typically overconfident)
Both means and variances are exact
Everything is approximate

Chapter 6: 1D SLAM Example

Let's make GBP concrete with a simple 1D localization problem from the paper. A robot moves along a line, making position measurements and odometry readings.

Setup

The robot has 3 pose variables (x₀, x₁, x₂) connected by:

A prior on x₀: the robot starts at position 0 (with some uncertainty)
Odometry factors: x₁ − x₀ ≈ 1.0, x₂ − x₁ ≈ 1.0 (measured motion)
A landmark observation: x₂ sees a landmark at a known position

Tracing the messages

Step 1: The prior factor sends a message to x₀: "I think you're at position 0 with precision 10."

Step 2: x₀ relays this to the odometry factor connecting x₀ and x₁. The factor linearizes, adds x₀'s message, marginalizes out x₀, and sends to x₁: "Based on the prior on x₀ and the odometry, I think you're at position 1.0 with precision 5." (Precision decreases because odometry noise adds uncertainty: variances add along the chain, so the figure of 5 corresponds to an odometry precision of 10, since 1/(1/10 + 1/10) = 5.)

Step 3: This continues to x₂, with precision decreasing at each step. When the landmark observation factor adds its information, x₂'s belief tightens up. On the backward pass, this improved information flows back to x₁ and x₀.

1D GBP SLAM

Three robot poses on a line with odometry and a landmark observation. Beliefs (Gaussian curves) tighten as messages propagate.

Worked numerical example: Prior on x₀: η=0, Λ=10 (mean 0, high confidence). Odometry x₀→x₁: measured displacement 1.0, precision 4. After message from prior through odometry to x₁: the message carries η≈2.86, Λ≈2.86 (mean ≈1.0, lower precision because uncertainty accumulates: the variances add, 1/10 + 1/4 = 0.35, so Λ = 1/0.35 ≈ 2.86 and η = Λμ ≈ 2.86). The landmark observation at x₂ with precision 8 saying "you're at 2.1" then pulls x₂ toward 2.1, and the backward messages pull x₁ and x₀ slightly too.
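
The numbers in this example can be traced with a short script that does one forward and one backward sweep along the chain. It assumes both odometry factors have precision 4 and treats the landmark observation as a direct position measurement on x₂ (2.1 with precision 8); the helper name across_odometry is hypothetical.

```python
def across_odometry(msg_in, z, lam, forward):
    """Factor-to-variable message across an odometry factor x_next - x_prev = z.

    msg_in is the (eta, lam) variable-to-factor message from the variable being
    marginalized out: x_prev when forward is True, x_next otherwise.
    """
    # Jacobian entries of h = x_next - x_prev for the incoming and outgoing variable.
    j_in, j_out = (-1.0, 1.0) if forward else (1.0, -1.0)
    eta_in, lam_in = msg_in
    inv = 1.0 / (j_in * lam * j_in + lam_in)
    cross = j_out * lam * j_in
    eta_out = j_out * lam * z - cross * inv * (j_in * lam * z + eta_in)
    lam_out = j_out * lam * j_out - cross * inv * cross
    return eta_out, lam_out


prior = (0.0, 10.0)            # (eta, lam) from the prior factor on x0
landmark = (8.0 * 2.1, 8.0)    # unary measurement on x2: eta = lam * z
ODOM_Z, ODOM_LAM = 1.0, 4.0

# Forward sweep: precision drops as odometry noise accumulates.
m01 = across_odometry(prior, ODOM_Z, ODOM_LAM, forward=True)     # into x1
m12 = across_odometry(m01, ODOM_Z, ODOM_LAM, forward=True)       # into x2

# Backward sweep: the landmark information flows back toward x0.
m21 = across_odometry(landmark, ODOM_Z, ODOM_LAM, forward=False)  # into x1
m10 = across_odometry(m21, ODOM_Z, ODOM_LAM, forward=False)       # into x0

beliefs = [
    (prior[0] + m10[0], prior[1] + m10[1]),
    (m01[0] + m21[0], m01[1] + m21[1]),
    (m12[0] + landmark[0], m12[1] + landmark[1]),
]
means = [eta / lam for eta, lam in beliefs]
print(m01, m12)   # precisions about 2.86, then about 1.67
print(means)      # about 0.014, 1.048, 2.083
```

Because this graph is a chain (a tree), the two sweeps are enough: the means are the exact least-squares solution, with x₂ pulled most of the way toward 2.1 and the earlier poses nudged slightly.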

In the 1D SLAM example, why does precision (confidence) decrease as messages propagate along the odometry chain?

Each odometry factor adds its own measurement noise, so uncertainty accumulates along the chain, just like dead reckoning drift
Because the robot is moving further from the origin
Because GBP is approximate

Chapter 7: 2D SLAM Simulation

The paper presents a full 2D SLAM simulation where a robot moves in a square, observing landmarks along the way. This demonstrates GBP's key properties on a realistic-scale problem.

The setup

A robot follows a square trajectory with ~20 poses. At each pose, it observes nearby landmarks (2D bearing/range measurements). There are odometry factors between consecutive poses and landmark observation factors. Critically, when the robot returns to its starting area, loop closure factors create cycles in the graph.

What happens

Before loop closure, uncertainty grows along the trajectory (just like the 1D case). When the loop closure measurements arrive, GBP propagates this information back around the loop, tightening all pose estimates. The paper shows convergence in ~15-20 iterations of message passing with damping.

2D SLAM with GBP

A robot (blue) traces a square path, observing landmarks (green). Ellipses show pose uncertainty. Red = initial noisy estimates, blue = GBP-refined estimates.

Incremental updates: A major advantage of GBP: when a new measurement arrives, you just add a new factor node and start passing messages from it. There's no need to re-solve the entire system. The new information gradually diffuses through the graph via message passing. This is exactly the incremental behavior Spatial AI needs.

What happens to pose uncertainty when loop closure measurements are added and GBP iterates?

The loop closure information propagates back through the graph via messages, tightening (reducing uncertainty of) all pose estimates along the loop
Only the poses near the loop closure improve
Uncertainty increases because of the loop

Chapter 8: Why GBP for Spatial AI

The paper makes a systematic case for GBP as the right algorithmic framework for Spatial AI. Here are the key arguments:

1. Perfect match for graph processors

Graph processors like Graphcore's IPU have thousands of cores, each with local memory, connected by a fast communication fabric. GBP's "local compute + message passing" maps directly onto this: each variable and factor node lives on a core, messages are passed via the communication fabric. No global memory access, no centralized coordination.

2. Naturally distributed

GBP works with arbitrary, asynchronous message schedules. Nodes don't need to synchronize. This means it works across multiple chips, multiple devices, even across a network of robots doing cooperative mapping.

3. Inherently incremental

New measurements = new factor nodes. Just add them to the graph and start passing messages. No need to re-solve from scratch. Old parts of the graph that haven't changed don't need recomputation — their messages are already correct.

4. Handles heterogeneous graphs

A real Spatial AI system has geometric measurements (odometry, visual features), semantic labels (from neural networks), calibration parameters, object models — all in one factor graph. GBP doesn't care what the factors represent. Every factor is just a Gaussian constraint. Geometric and semantic estimation happen together, naturally.

5. Attention-driven computation

GBP can focus computation where it matters most. By choosing which messages to send first (residual scheduling), the algorithm automatically spends more computation on parts of the graph that have changed or are uncertain. "Just-in-time" estimation — compute accurate estimates for the parts you need right now, let other parts coast.

The vision: Imagine messages continually bubbling around a large factor graph on a graph processor. The graph is changing continually with new measurements. It may never reach full convergence, but it's always close — with controlled accuracy. Computation flows like attention: intense where needed, relaxed where things are stable. This is fundamentally different from the "stop everything and solve" approach of traditional SLAM.

What makes GBP "naturally incremental"?

New measurements just add new factor nodes, and messages from them propagate through the graph without re-solving the entire system
It stores a history of all previous solutions
It only processes the most recent measurement

Chapter 9: Connections

What GBP builds on

Pearl's Belief Propagation (1988): The foundational message-passing algorithm for probabilistic inference on graphical models. GBP is BP specialized to the Gaussian case.

Weiss & Freeman (2001): Proved that loopy BP on Gaussian models gives exact means when it converges, which is the theoretical justification for using GBP on loopy SLAM graphs.

Loopy SAM (Ranganathan et al., 2007): First demonstration of GBP for robot mapping. Showed promising results but didn't gain traction because CPU-based global solvers were faster at the time.

iSAM2 (Kaess et al., 2012): Incremental SLAM using Bayes trees — shares GBP's goal of incremental, partially-local updates, but still requires global knowledge of the tree structure.

What came from this

FutureMapping 1 (Davison, 2018): The predecessor paper that laid out the vision for Spatial AI on graph processors. FutureMapping 2 provides the concrete algorithmic framework (GBP) to realize that vision.

Gaussian Splatting SLAM (2023-24): Modern systems like MonoGS, SplaTAM, and Gaussian-SLAM use 3D Gaussians as the scene representation — fitting naturally into the GBP framework where each Gaussian splat could be a variable node optimized via local message passing.

Graphcore IPU for Spatial AI: The specific hardware platform envisioned in this paper. IPUs have thousands of cores with in-processor memory and BSP (Bulk Synchronous Parallel) communication — exactly the architecture GBP is designed for.

NeRF-SLAM / Neural SLAM: Recent work combining neural scene representations with SLAM estimation. Factor graphs with neural factors (NeRF rendering losses) could use GBP for distributed optimization.

Broader connections

Expectation Propagation (EP): A generalization where each factor's message can approximate a non-Gaussian factor with a Gaussian. GBP is a special case of EP where all factors are already Gaussian.

Loopy BP in coding theory: LDPC and turbo code decoding use loopy BP on binary factor graphs — the same algorithm, different domain. Decades of convergence analysis from coding theory apply to GBP.

Variational inference: Fixed points of belief propagation correspond to stationary points of the Bethe free energy, a variational approximation to the true free energy (the negative log partition function). This connects GBP to the broader VI framework.

The big picture: FutureMapping 2 is not claiming GBP is novel. It's arguing that GBP — a well-understood algorithm from the probabilistic graphical models literature — is the right tool for Spatial AI at this specific moment in computing history, as hardware shifts from centralized GPUs/CPUs to distributed graph processors. The algorithm was waiting for its hardware to arrive.

Cheat sheet

Core idea

Replace global matrix solves with local Gaussian message passing on factor graphs

Messages

var→factor: multiply (add) incoming messages. factor→var: linearize + Schur complement

Convergence

Exact on trees. On loopy graphs: means exact, variances approximate. Use damping.

Properties

Purely local, naturally incremental, distributed, asynchronous, heterogeneous

Hardware match

Maps perfectly onto graph processors (Graphcore IPU) with local memory + message passing

Why did GBP not gain traction for SLAM before this paper, despite being demonstrated in 2007?

On CPUs, global solvers (Cholesky, conjugate gradient) were faster; GBP's advantage only emerges on distributed graph processors, which didn't exist yet
Because GBP doesn't work on loopy graphs
Because SLAM was already solved

FutureMapping 2: Gaussian Belief Propagation for Spatial AI