Davison & Ortiz — Imperial College London, 2019

FutureMapping 2: Gaussian Belief Propagation for Spatial AI

A detailed tutorial arguing that Gaussian Belief Propagation — local message passing on factor graphs — is the right algorithmic framework for distributed, incremental Spatial AI on next-generation graph processors.

Prerequisites: Factor graphs + Gaussian distributions + Basic linear algebra
10
Chapters
6
Simulations

Chapter 0: The Problem

You're building a lightweight home robot — or AR glasses — that needs to understand the 3D space around it in real-time. It must track its own position, build a map, recognize objects, and do all of this within a tiny power budget. This is Spatial AI.

Current prototype systems like SemanticFusion require heavy computing resources (a full desktop GPU) while delivering a fraction of the capability needed. There are orders of magnitude of improvement still required to fit real products. Why is it so hard?

The core computation in Spatial AI is incremental estimation: continuously fusing noisy measurements from cameras, IMUs, and other sensors into a persistent model of the world. The standard approach — bundle adjustment, pose graph optimization — formulates this as a global least-squares problem. You write down a giant system of equations and solve it all at once.

The bottleneck: Global solvers require a "god's eye view" of the entire problem structure. They need centralized access to all variables and measurements, large matrix factorizations, and high-bandwidth data transfer. This is fundamentally mismatched with the direction hardware is heading: massively parallel, distributed processors with local memory (like Graphcore's IPU) where moving data is far more expensive than computing.

We need an algorithm that:

Why is standard bundle adjustment poorly suited for next-generation graph processor hardware?

Chapter 1: The Key Insight

The answer is Gaussian Belief Propagation (GBP) — an algorithm where each node in a factor graph computes using only local information, exchanging simple Gaussian messages with its immediate neighbors, yet the whole system converges to the globally optimal estimates.

Think of it this way. In a traditional solver, there's a central authority that collects all measurements, builds a giant matrix, and solves the whole system. In GBP, there's no central authority at all. Each variable node is like a person in a network who:

  1. Receives "opinions" (Gaussian messages) from each of its neighboring factors
  2. Combines them by multiplying the Gaussians (adding precisions)
  3. Sends its updated belief back out

After enough rounds of message passing, every node's belief converges to the correct marginal distribution — the same answer that the global solver would give.

Why this matters now: GBP has been known for decades but was never seriously used in robotics. The reason? On CPUs, global solvers (Cholesky, conjugate gradient) are faster. But hardware is changing. Graph processors like Graphcore's IPU have thousands of cores, each with local memory, connected by a communication fabric. GBP's "purely local compute + message passing" maps perfectly onto this architecture. The algorithm was waiting for its hardware.

A Gaussian message is beautifully simple: just an information vector η and a precision matrix Λ. To combine messages, you just add η's and Λ's. To send a message, you do a small local matrix computation. No node ever needs to know the global structure of the graph.

What makes GBP "purely local"?

Chapter 2: Factor Graphs for SLAM

Before diving into GBP itself, we need to understand the data structure it operates on: the factor graph.

A factor graph is an undirected bipartite graph with two types of nodes:

Each factor fs is connected to a subset of variables xs and represents the probability of obtaining measurement zs given those variable values: fs(xs) = p(zs | xs).

The joint probability

The key property: all factors are independent of each other. So the total joint probability over all variables is simply the product of all factors:

p(x) = ∏s fs(xs)

This is the entire probabilistic model in one equation. Every measurement, every prior, every constraint is one factor in this product.

Gaussian factors

In Spatial AI, almost all factors are Gaussian. A Gaussian factor has the form:

fs(xs) = K exp(−½ [zs − hs(xs)]T Λs [zs − hs(xs)])

Where hs is the measurement function, zs is the actual measurement, and Λs is the measurement precision (inverse covariance). Three ingredients define each factor: the measurement function hs, the measured value zs, and the precision Λs.

Factor Graph for SLAM

A robot moves through 4 poses observing 3 landmarks. Blue circles = pose variables, green circles = landmark variables, orange squares = measurement factors. Click nodes to highlight their connections.

Information form: Gaussians can be represented as (μ, Σ) — mean and covariance — or (η, Λ) — information vector and precision matrix — where η = Λμ. GBP uses the information form because multiplying Gaussians (fusing information) becomes simple addition: ηcombined = η1 + η2, Λcombined = Λ1 + Λ2. It can also represent "zero information" (infinite covariance) naturally with η = 0, Λ = 0.
Why does GBP use the information form (η, Λ) rather than the mean/covariance form (μ, Σ)?

Chapter 3: From Global to Local

In standard SLAM, you formulate the estimation problem as minimizing the total energy — the sum of squared residuals from all factors:

x* = argminxs (zs − hs(xs))T Λs (zs − hs(xs))

This is equivalent to finding the mode of the joint Gaussian distribution (since minimizing the negative log of a product of Gaussians gives this sum of quadratics).

The global approach

Standard solvers like Gauss-Newton or Levenberg-Marquardt linearize all factors, stack the Jacobians into a giant matrix J, form the information matrix H = JTΛJ (also called the Hessian approximation), and solve the linear system Hx = b. This requires:

The local alternative

GBP replaces this global solve with iterative local message passing. Instead of forming H and solving Hx = b directly, each node computes small local messages and passes them to neighbors. After enough iterations, the marginal beliefs at each variable node converge to the same answer that the global solver would give.

The trade-off: The global solver converges in one shot (for linear problems) but requires centralized computation. GBP requires multiple iterations of message passing but each iteration is purely local. On a graph processor with thousands of cores running in parallel, those local iterations can be blazingly fast — and you never need to move data off-chip.
Global Solve vs. Local Message Passing

Left: global solver forms and inverts a large matrix. Right: GBP passes local messages that gradually propagate information. Press Play to animate message passing rounds.

Round 0
What does GBP replace the global matrix factorization with?

Chapter 4: GBP Derivation

Now we derive the actual message update rules. This is the mathematical core of GBP. There are two types of messages:

Message from variable to factor

A variable node xm sends a message to factor fs by multiplying together all incoming messages from other factors (every factor neighbor except fs). In the information form, multiplying Gaussians = adding parameters:

ηm→s = ∑l ∈ n(xm) \ fs ηl→m
Λm→s = ∑l ∈ n(xm) \ fs Λl→m

In words: "Dear factor fs, here's what everyone else thinks about me." The variable node excludes fs's own message to avoid circular reasoning.

Message from factor to variable

A factor fs sends a message to variable xm by:

  1. Linearize the factor around current variable estimates to get a local Gaussian: ηs = JTΛs[Jx0 + zs − hs(x0)] and Λ's = JTΛsJ
  2. Add incoming messages from all input variables to the linearized factor
  3. Marginalize out all variables except xm — this is a Schur complement on the combined precision matrix

For a factor connecting two variables xa (input) and xb (output), after adding the message from xa:

ηs→b = η'b − Λ'ba(Λ'aa + Λa→s)−1(η'a + ηa→s)
Λs→b = Λ'bb − Λ'ba(Λ'aa + Λa→s)−1Λ'ab

This is exactly the Schur complement — marginalizing out xa from the joint Gaussian over (xa, xb). The key insight: this is a small local matrix operation, not a global one.

Belief update

The belief at variable xm — its current best estimate — is the product of all incoming messages:

ηm = ∑s ∈ n(xm) ηs→m   Λm = ∑s ∈ n(xm) Λs→m

The mean estimate is then μm = Λm−1ηm.

GBP Message Passing Animation

Watch messages flow on a factor graph. Variable nodes (circles) combine and relay incoming messages. Factor nodes (squares) linearize and marginalize. Press Step to advance one message passing round, or Play for continuous animation.

Iteration 0 — press Step
The Schur complement is the key: The factor-to-variable message requires marginalizing out other variables from a joint Gaussian. For Gaussians, marginalization is a closed-form matrix operation (the Schur complement), and it's local — only involves the variables connected to that factor. A factor connecting 2 variables does a tiny 2×2 (or d×d) matrix inversion, regardless of how large the overall graph is.
In the variable-to-factor message, why does variable xm exclude factor fs's own message?

Chapter 5: Trees vs Loopy Graphs

GBP's behavior depends critically on the structure of the factor graph.

Tree-structured graphs: exact in 2 passes

If the factor graph is a tree (no loops), GBP gives the exact marginal distributions in just two passes: one sweep from leaves to root, then one sweep from root back to leaves. Every variable node gets its exact posterior. This is the classic result from Pearl's work on belief propagation.

Why? In a tree, each branch contributes independent information. The messages from different branches never contain overlapping information, so there's no double-counting.

Loopy graphs: iterative approximation

Real SLAM graphs have loops. When a robot revisits a place (loop closure), the factor graph gains a cycle. On loopy graphs, GBP is no longer exact — messages can "go around in circles," recounting the same information. However:

ηnew = (1 − α)ηold + αηcomputed

With damping factor α ∈ (0, 1]. Smaller α = more conservative updates = more stable but slower convergence.

When GBP converges on loopy graphs: Weiss and Freeman proved that when loopy GBP converges for Gaussian models, the means are exact — they match the true marginal means from the global solution. The variances may be approximate (typically overconfident). For SLAM, the means are what we care most about (where is the robot? where are the landmarks?), so this is a very favorable property.

Convergence strategies

The paper discusses several approaches to improve convergence:

When loopy GBP converges on a Gaussian factor graph, what can we say about the accuracy of its estimates?

Chapter 6: 1D SLAM Example

Let's make GBP concrete with a simple 1D localization problem from the paper. A robot moves along a line, making position measurements and odometry readings.

Setup

The robot has 3 pose variables (x0, x1, x2) connected by:

Tracing the messages

Step 1: The prior factor sends a message to x0: "I think you're at position 0 with precision 10."

Step 2: x0 relays this to the odometry factor connecting x0 and x1. The factor linearizes, adds x0's message, marginalizes out x0, and sends to x1: "Based on the prior on x0 and the odometry, I think you're at position 1.0 with precision 5." (Precision decreases because odometry noise adds uncertainty.)

Step 3: This continues to x2, with precision decreasing at each step. When the landmark observation factor adds its information, x2's belief tightens up. On the backward pass, this improved information flows back to x1 and x0.

1D GBP SLAM

Three robot poses on a line with odometry and a landmark observation. Watch beliefs (Gaussian curves) tighten as messages propagate. Step through message passing rounds or adjust measurement noise.

Iteration 0
Odom. noise0.50
Worked numerical example: Prior on x0: η=0, Λ=10 (mean 0, high confidence). Odometry x0→x1: measured displacement 1.0, precision 4. After message from prior through odometry to x1: the message carries η≈4.0, Λ≈2.86 (mean ≈1.0, lower precision because uncertainty accumulates). The landmark observation at x2 with precision 8 saying "you're at 2.1" then pulls x2 toward 2.1, and the backward messages pull x1 and x0 slightly too.
In the 1D SLAM example, why does precision (confidence) decrease as messages propagate along the odometry chain?

Chapter 7: 2D SLAM Simulation

The paper presents a full 2D SLAM simulation where a robot moves in a square, observing landmarks along the way. This demonstrates GBP's key properties on a realistic-scale problem.

The setup

A robot follows a square trajectory with ~20 poses. At each pose, it observes nearby landmarks (2D bearing/range measurements). There are odometry factors between consecutive poses and landmark observation factors. Critically, when the robot returns to its starting area, loop closure factors create cycles in the graph.

What happens

Before loop closure, uncertainty grows along the trajectory (just like the 1D case). When the loop closure measurements arrive, GBP propagates this information back around the loop, tightening all pose estimates. The paper shows convergence in ~15-20 iterations of message passing with damping.

2D SLAM with GBP

A robot (blue) traces a square path, observing landmarks (green). Ellipses show pose uncertainty. Press Play to watch GBP converge after loop closure. Red = initial noisy estimates, blue = GBP-refined estimates.

Ready — press Play
Damping α0.30
Incremental updates: A major advantage of GBP: when a new measurement arrives, you just add a new factor node and start passing messages from it. There's no need to re-solve the entire system. The new information gradually diffuses through the graph via message passing. This is exactly the incremental behavior Spatial AI needs.
What happens to pose uncertainty when loop closure measurements are added and GBP iterates?

Chapter 8: Why GBP for Spatial AI

The paper makes a systematic case for GBP as the right algorithmic framework for Spatial AI. Here are the key arguments:

1. Perfect match for graph processors

Graph processors like Graphcore's IPU have thousands of cores, each with local memory, connected by a fast communication fabric. GBP's "local compute + message passing" maps directly onto this: each variable and factor node lives on a core, messages are passed via the communication fabric. No global memory access, no centralized coordination.

2. Naturally distributed

GBP works with arbitrary, asynchronous message schedules. Nodes don't need to synchronize. This means it works across multiple chips, multiple devices, even across a network of robots doing cooperative mapping.

3. Inherently incremental

New measurements = new factor nodes. Just add them to the graph and start passing messages. No need to re-solve from scratch. Old parts of the graph that haven't changed don't need recomputation — their messages are already correct.

4. Handles heterogeneous graphs

A real Spatial AI system has geometric measurements (odometry, visual features), semantic labels (from neural networks), calibration parameters, object models — all in one factor graph. GBP doesn't care what the factors represent. Every factor is just a Gaussian constraint. Geometric and semantic estimation happen together, naturally.

5. Attention-driven computation

GBP can focus computation where it matters most. By choosing which messages to send first (residual scheduling), the algorithm automatically spends more computation on parts of the graph that have changed or are uncertain. "Just-in-time" estimation — compute accurate estimates for the parts you need right now, let other parts coast.

The vision: Imagine messages continually bubbling around a large factor graph on a graph processor. The graph is changing continually with new measurements. It may never reach full convergence, but it's always close — with controlled accuracy. Computation flows like attention: intense where needed, relaxed where things are stable. This is fundamentally different from the "stop everything and solve" approach of traditional SLAM.
What makes GBP "naturally incremental"?

Chapter 9: Connections

What GBP builds on

Pearl's Belief Propagation (1988): The foundational message-passing algorithm for probabilistic inference on graphical models. GBP is BP specialized to the Gaussian case.

Weiss & Freeman (2001): Proved that loopy GBP with Gaussians gives exact means when it converges — the theoretical justification for using GBP on loopy SLAM graphs.

Loopy SAM (Ranganathan et al., 2007): First demonstration of GBP for robot mapping. Showed promising results but didn't gain traction because CPU-based global solvers were faster at the time.

iSAM2 (Kaess et al., 2012): Incremental SLAM using Bayes trees — shares GBP's goal of incremental, partially-local updates, but still requires global knowledge of the tree structure.

What came from this

FutureMapping 1 (Davison, 2018): The predecessor paper that laid out the vision for Spatial AI on graph processors. FutureMapping 2 provides the concrete algorithmic framework (GBP) to realize that vision.

Gaussian Splatting SLAM (2023-24): Modern systems like MonoGS, SplaTAM, and Gaussian-SLAM use 3D Gaussians as the scene representation — fitting naturally into the GBP framework where each Gaussian splat could be a variable node optimized via local message passing.

Graphcore IPU for Spatial AI: The specific hardware platform envisioned in this paper. IPUs have thousands of cores with in-processor memory and BSP (Bulk Synchronous Parallel) communication — exactly the architecture GBP is designed for.

NeRF-SLAM / Neural SLAM: Recent work combining neural scene representations with SLAM estimation. Factor graphs with neural factors (NeRF rendering losses) could use GBP for distributed optimization.

Broader connections

Expectation Propagation (EP): A generalization where each factor's message can approximate a non-Gaussian factor with a Gaussian. GBP is a special case of EP where all factors are already Gaussian.

Loopy BP in coding theory: LDPC and turbo code decoding use loopy BP on binary factor graphs — the same algorithm, different domain. Decades of convergence analysis from coding theory apply to GBP.

Variational inference: GBP can be viewed as minimizing the Bethe free energy — a variational approximation to the true log partition function. This connects it to the broader VI framework.

The big picture: FutureMapping 2 is not claiming GBP is novel. It's arguing that GBP — a well-understood algorithm from the probabilistic graphical models literature — is the right tool for Spatial AI at this specific moment in computing history, as hardware shifts from centralized GPUs/CPUs to distributed graph processors. The algorithm was waiting for its hardware to arrive.

Cheat sheet

Core idea
Replace global matrix solves with local Gaussian message passing on factor graphs
Messages
var→factor: multiply (add) incoming messages. factor→var: linearize + Schur complement
Convergence
Exact on trees. On loopy graphs: means exact, variances approximate. Use damping.
Properties
Purely local, naturally incremental, distributed, asynchronous, heterogeneous
Hardware match
Maps perfectly onto graph processors (Graphcore IPU) with local memory + message passing
Why did GBP not gain traction for SLAM before this paper, despite being demonstrated in 2007?