A detailed tutorial arguing that Gaussian Belief Propagation — local message passing on factor graphs — is the right algorithmic framework for distributed, incremental Spatial AI on next-generation graph processors.
You're building a lightweight home robot — or AR glasses — that needs to understand the 3D space around it in real-time. It must track its own position, build a map, recognize objects, and do all of this within a tiny power budget. This is Spatial AI.
Current prototype systems like SemanticFusion require heavy computing resources (a full desktop GPU) while delivering a fraction of the capability needed. There are orders of magnitude of improvement still required to fit real products. Why is it so hard?
The core computation in Spatial AI is incremental estimation: continuously fusing noisy measurements from cameras, IMUs, and other sensors into a persistent model of the world. The standard approach — bundle adjustment, pose graph optimization — formulates this as a global least-squares problem. You write down a giant system of equations and solve it all at once.
We need an algorithm that:
The answer is Gaussian Belief Propagation (GBP) — an algorithm where each node in a factor graph computes using only local information, exchanging simple Gaussian messages with its immediate neighbors, yet the whole system converges to the globally optimal estimates.
Think of it this way. In a traditional solver, there's a central authority that collects all measurements, builds a giant matrix, and solves the whole system. In GBP, there's no central authority at all. Each variable node is like a person in a network who:
After enough rounds of message passing, every node's belief converges to the correct marginal distribution — the same answer that the global solver would give.
A Gaussian message is beautifully simple: just an information vector η and a precision matrix Λ. To combine messages, you just add η's and Λ's. To send a message, you do a small local matrix computation. No node ever needs to know the global structure of the graph.
Before diving into GBP itself, we need to understand the data structure it operates on: the factor graph.
A factor graph is an undirected bipartite graph with two types of nodes:
Each factor fs is connected to a subset of variables xs and represents the probability of obtaining measurement zs given those variable values: fs(xs) = p(zs | xs).
The key property: all factors are independent of each other. So the total joint probability over all variables is simply the product of all factors:
This is the entire probabilistic model in one equation. Every measurement, every prior, every constraint is one factor in this product.
In Spatial AI, almost all factors are Gaussian. A Gaussian factor has the form:
Where hs is the measurement function, zs is the actual measurement, and Λs is the measurement precision (inverse covariance). Three ingredients define each factor: the measurement function hs, the measured value zs, and the precision Λs.
A robot moves through 4 poses observing 3 landmarks. Blue circles = pose variables, green circles = landmark variables, orange squares = measurement factors. Click nodes to highlight their connections.
In standard SLAM, you formulate the estimation problem as minimizing the total energy — the sum of squared residuals from all factors:
This is equivalent to finding the mode of the joint Gaussian distribution (since minimizing the negative log of a product of Gaussians gives this sum of quadratics).
Standard solvers like Gauss-Newton or Levenberg-Marquardt linearize all factors, stack the Jacobians into a giant matrix J, form the information matrix H = JTΛJ (also called the Hessian approximation), and solve the linear system Hx = b. This requires:
GBP replaces this global solve with iterative local message passing. Instead of forming H and solving Hx = b directly, each node computes small local messages and passes them to neighbors. After enough iterations, the marginal beliefs at each variable node converge to the same answer that the global solver would give.
Left: global solver forms and inverts a large matrix. Right: GBP passes local messages that gradually propagate information. Press Play to animate message passing rounds.
Now we derive the actual message update rules. This is the mathematical core of GBP. There are two types of messages:
A variable node xm sends a message to factor fs by multiplying together all incoming messages from other factors (every factor neighbor except fs). In the information form, multiplying Gaussians = adding parameters:
In words: "Dear factor fs, here's what everyone else thinks about me." The variable node excludes fs's own message to avoid circular reasoning.
A factor fs sends a message to variable xm by:
For a factor connecting two variables xa (input) and xb (output), after adding the message from xa:
This is exactly the Schur complement — marginalizing out xa from the joint Gaussian over (xa, xb). The key insight: this is a small local matrix operation, not a global one.
The belief at variable xm — its current best estimate — is the product of all incoming messages:
The mean estimate is then μm = Λm−1ηm.
Watch messages flow on a factor graph. Variable nodes (circles) combine and relay incoming messages. Factor nodes (squares) linearize and marginalize. Press Step to advance one message passing round, or Play for continuous animation.
GBP's behavior depends critically on the structure of the factor graph.
If the factor graph is a tree (no loops), GBP gives the exact marginal distributions in just two passes: one sweep from leaves to root, then one sweep from root back to leaves. Every variable node gets its exact posterior. This is the classic result from Pearl's work on belief propagation.
Why? In a tree, each branch contributes independent information. The messages from different branches never contain overlapping information, so there's no double-counting.
Real SLAM graphs have loops. When a robot revisits a place (loop closure), the factor graph gains a cycle. On loopy graphs, GBP is no longer exact — messages can "go around in circles," recounting the same information. However:
With damping factor α ∈ (0, 1]. Smaller α = more conservative updates = more stable but slower convergence.
The paper discusses several approaches to improve convergence:
Let's make GBP concrete with a simple 1D localization problem from the paper. A robot moves along a line, making position measurements and odometry readings.
The robot has 3 pose variables (x0, x1, x2) connected by:
Step 1: The prior factor sends a message to x0: "I think you're at position 0 with precision 10."
Step 2: x0 relays this to the odometry factor connecting x0 and x1. The factor linearizes, adds x0's message, marginalizes out x0, and sends to x1: "Based on the prior on x0 and the odometry, I think you're at position 1.0 with precision 5." (Precision decreases because odometry noise adds uncertainty.)
Step 3: This continues to x2, with precision decreasing at each step. When the landmark observation factor adds its information, x2's belief tightens up. On the backward pass, this improved information flows back to x1 and x0.
Three robot poses on a line with odometry and a landmark observation. Watch beliefs (Gaussian curves) tighten as messages propagate. Step through message passing rounds or adjust measurement noise.
The paper presents a full 2D SLAM simulation where a robot moves in a square, observing landmarks along the way. This demonstrates GBP's key properties on a realistic-scale problem.
A robot follows a square trajectory with ~20 poses. At each pose, it observes nearby landmarks (2D bearing/range measurements). There are odometry factors between consecutive poses and landmark observation factors. Critically, when the robot returns to its starting area, loop closure factors create cycles in the graph.
Before loop closure, uncertainty grows along the trajectory (just like the 1D case). When the loop closure measurements arrive, GBP propagates this information back around the loop, tightening all pose estimates. The paper shows convergence in ~15-20 iterations of message passing with damping.
A robot (blue) traces a square path, observing landmarks (green). Ellipses show pose uncertainty. Press Play to watch GBP converge after loop closure. Red = initial noisy estimates, blue = GBP-refined estimates.
The paper makes a systematic case for GBP as the right algorithmic framework for Spatial AI. Here are the key arguments:
Graph processors like Graphcore's IPU have thousands of cores, each with local memory, connected by a fast communication fabric. GBP's "local compute + message passing" maps directly onto this: each variable and factor node lives on a core, messages are passed via the communication fabric. No global memory access, no centralized coordination.
GBP works with arbitrary, asynchronous message schedules. Nodes don't need to synchronize. This means it works across multiple chips, multiple devices, even across a network of robots doing cooperative mapping.
New measurements = new factor nodes. Just add them to the graph and start passing messages. No need to re-solve from scratch. Old parts of the graph that haven't changed don't need recomputation — their messages are already correct.
A real Spatial AI system has geometric measurements (odometry, visual features), semantic labels (from neural networks), calibration parameters, object models — all in one factor graph. GBP doesn't care what the factors represent. Every factor is just a Gaussian constraint. Geometric and semantic estimation happen together, naturally.
GBP can focus computation where it matters most. By choosing which messages to send first (residual scheduling), the algorithm automatically spends more computation on parts of the graph that have changed or are uncertain. "Just-in-time" estimation — compute accurate estimates for the parts you need right now, let other parts coast.
Pearl's Belief Propagation (1988): The foundational message-passing algorithm for probabilistic inference on graphical models. GBP is BP specialized to the Gaussian case.
Weiss & Freeman (2001): Proved that loopy GBP with Gaussians gives exact means when it converges — the theoretical justification for using GBP on loopy SLAM graphs.
Loopy SAM (Ranganathan et al., 2007): First demonstration of GBP for robot mapping. Showed promising results but didn't gain traction because CPU-based global solvers were faster at the time.
iSAM2 (Kaess et al., 2012): Incremental SLAM using Bayes trees — shares GBP's goal of incremental, partially-local updates, but still requires global knowledge of the tree structure.
FutureMapping 1 (Davison, 2018): The predecessor paper that laid out the vision for Spatial AI on graph processors. FutureMapping 2 provides the concrete algorithmic framework (GBP) to realize that vision.
Gaussian Splatting SLAM (2023-24): Modern systems like MonoGS, SplaTAM, and Gaussian-SLAM use 3D Gaussians as the scene representation — fitting naturally into the GBP framework where each Gaussian splat could be a variable node optimized via local message passing.
Graphcore IPU for Spatial AI: The specific hardware platform envisioned in this paper. IPUs have thousands of cores with in-processor memory and BSP (Bulk Synchronous Parallel) communication — exactly the architecture GBP is designed for.
NeRF-SLAM / Neural SLAM: Recent work combining neural scene representations with SLAM estimation. Factor graphs with neural factors (NeRF rendering losses) could use GBP for distributed optimization.
Expectation Propagation (EP): A generalization where each factor's message can approximate a non-Gaussian factor with a Gaussian. GBP is a special case of EP where all factors are already Gaussian.
Loopy BP in coding theory: LDPC and turbo code decoding use loopy BP on binary factor graphs — the same algorithm, different domain. Decades of convergence analysis from coding theory apply to GBP.
Variational inference: GBP can be viewed as minimizing the Bethe free energy — a variational approximation to the true log partition function. This connects it to the broader VI framework.