Estimation & Robotics

Factor Graphs

The modern SLAM back-end. Stop filtering one state at a time — instead optimize the whole trajectory and map at once as a giant, sparse least-squares problem. Pose-graph optimization, bundle adjustment, and the loop closure that snaps a drifted map back into place.

Prerequisites: A measurement gives a noisy constraint between things + Least squares minimizes squared error. That’s it.

Chapters

Simulations

Assumed Knowledge

Chapter 0: Filtering vs Smoothing

A filter — Kalman, particle — is recursive: it keeps only the latest state estimate, folds in each new measurement, and throws the past away. That’s efficient and real-time, but it has two costs. It accumulates linearization error (each step linearizes once, forever), and when a robot returns to a place it’s seen before — a loop closure — a filter can’t easily go back and fix the whole drifted path between then and now, because it already discarded those past states.

Smoothing takes the opposite approach: keep all the states — the entire trajectory of poses, plus the map — and find the configuration of everything at once that best explains all the measurements. It’s a single, giant optimization over the whole history. More expensive, but far more accurate: it can re-linearize everything, and a loop closure constraint can ripple back and correct the entire trajectory. This is how modern SLAM back-ends work, and the data structure that makes it tractable is the factor graph.

The trap: “filtering and smoothing give the same answer, smoothing is just slower.” No — smoothing is more accurate, because filtering permanently bakes in early linearization errors and can’t exploit later information (like a loop closure) to fix the past. Smoothing keeps the whole problem open and re-solves it, so a discovery now can correct a mistake from long ago.

Filter (latest state) vs smoother (whole trajectory)

A filter tracks only the current pose (one highlighted dot); a smoother holds the entire trajectory in memory and optimizes it jointly. Drag the timeline — the filter forgets the past; the smoother keeps it all, ready to correct.

time0.60

What is the key advantage of smoothing over filtering?

It uses less memory It keeps and re-optimizes the whole trajectory, so it’s more accurate and a loop closure can correct the entire past path It needs no measurements

Chapter 1: The Factor Graph

A factor graph is a bipartite graph with two kinds of nodes. Variables — the unknowns we’re solving for: the robot’s poses at each timestep, and the positions of landmarks in the map. And factors — the constraints, one per measurement, each connecting the variables it relates. An odometry factor connects two consecutive poses (“I moved roughly this much between them”). A landmark observation factor connects a pose to a landmark (“from here, I saw that landmark over there”). A prior factor pins down a single variable (“I started at the origin”). And a loop closure factor connects two non-consecutive poses (“I’ve been here before”).

So the graph literally is the problem: the variables are what you want to find, and the factors are everything you know about them. Each factor encodes a measurement and its uncertainty. The whole estimation task becomes: find the assignment of all the variables that’s most consistent with all the factors. If this feels like a graph of nodes passing information — it is; factor graphs are cousins of the probabilistic graphical models and message-passing you saw with GNNs. But here the goal is to optimize the variables, not classify them.

A SLAM factor graph

Big nodes = robot poses (a trajectory); small nodes = landmarks; squares = factors (odometry between poses, observations pose→landmark, a prior). Click factor types to highlight them. The graph IS the estimation problem.

In a factor graph, what are the variables and the factors?

Variables are measurements; factors are sensors Variables are the unknowns to solve (poses, landmarks); factors are the measurement constraints connecting them Both are just timestamps

Chapter 2: Every Factor Is an Error

Concretely, each factor defines an error (a residual): the difference between what the measurement said and what the current variable estimates would predict. An odometry factor’s error is “the relative pose my variables imply between these two poses” minus “the relative pose odometry measured.” A landmark factor’s error is “where my variables say that landmark should appear from this pose” minus “where I actually observed it.” If the variables are perfect and noise-free, every error is zero. They never are, so each factor contributes a nonzero residual.

And each error is weighted by the measurement’s uncertainty — a precise sensor’s factor is stiff (small errors cost a lot), a noisy sensor’s factor is loose (errors are tolerated). The total cost of a configuration is the sum of all the squared, weighted errors across every factor. Minimizing that sum is the goal. This is exactly maximum-a-posteriori estimation under Gaussian noise: the most probable trajectory and map, given all measurements, is the one that minimizes this weighted sum of squared residuals. Every factor “pulls” the variables toward satisfying its measurement; the solution is the equilibrium of all those pulls.

A factor’s residual

A measurement (teal) says where a landmark should be relative to a pose; the current estimate (orange) disagrees. The gap is the residual — its squared, uncertainty-weighted length is this factor’s cost. Drag the estimate and watch the cost rise and fall.

estimate position0.35

What is the total cost that factor-graph optimization minimizes?

The number of factors The sum of all factors’ squared, uncertainty-weighted errors (residuals) — i.e. the MAP estimate under Gaussian noise The total travel distance

Chapter 3: The Optimization

So we have a big nonlinear least-squares problem: find the variable values (all poses, all landmarks — possibly thousands of dimensions) that minimize the sum of squared residuals. We solve it iteratively with Gauss-Newton or Levenberg-Marquardt. The recipe: linearize every factor’s error around the current estimate; this turns the problem into a single big linear least-squares system; solve that system for the best update step; apply the step; and repeat until the total error stops shrinking. Each iteration nudges all the variables together toward the configuration that satisfies all the constraints best.

You start from an initial guess (often the raw odometry, which has drifted), and the optimizer pulls the whole trajectory and map into global consistency. Because it re-linearizes at every iteration — unlike a filter, which linearizes once and moves on — smoothing avoids the filter’s accumulated linearization error. The catch: nonlinear least squares can get stuck in local minima, so a decent initial guess matters (Levenberg-Marquardt adds damping to stay robust). But given a reasonable start, this iterate-linearize-solve loop reliably finds the most consistent explanation of all your data.

Gauss-Newton: iterate toward minimum error

Total error vs. iteration: linearize, solve, step, repeat. The error plunges as the variables snap into consistency. Press iterate and watch the cost fall; a drifted start converges to the consistent solution.

How is the factor-graph optimization solved?

By a single closed-form matrix inverse Iteratively (Gauss-Newton/LM): linearize all factors, solve a big linear system for the update, apply, repeat — re-linearizing each step By gradient-free random search

Chapter 4: Sparsity — why it’s fast

A trajectory with 10,000 poses and 50,000 landmarks has a state of hundreds of thousands of dimensions. Solving a dense linear system that big each iteration would be hopeless — cost grows cubically. What saves us is sparsity. Each factor touches only a few variables: an odometry factor connects just two consecutive poses; an observation connects one pose and one landmark. No factor touches everything. So the linear system’s matrix is overwhelmingly zeros — each variable is directly linked to only its handful of neighbors.

And sparse linear systems can be solved dramatically faster than dense ones, with sparse factorization methods (sparse Cholesky, QR) that exploit the structure — near-linear in the number of variables, not cubic. This is the entire reason large-scale SLAM and bundle adjustment are tractable: the problem is huge but locally connected, and the optimizer exploits that local structure. The factor graph isn’t just a nice picture — its sparsity pattern is exactly the sparsity of the matrix you solve, so the graph structure directly determines how fast you can solve it. Good ordering of the variables (to keep the factorization sparse) is a real part of the craft.

The information matrix is sparse

The matrix linking variables: a nonzero block only where two variables share a factor. Mostly zeros — each pose links to its neighbors and a few landmarks. Drag the trajectory length: the matrix grows, but stays sparse (a thin band + loop-closure off-diagonals).

trajectory length12

Why is large-scale factor-graph optimization tractable?

The state is low-dimensional Each factor touches only a few variables, so the linear system is sparse — solvable far faster than a dense system It ignores most measurements

Chapter 5: Loop Closure — the killer feature

This is where smoothing earns its keep. As a robot drives, odometry drifts — small errors accumulate, so its estimated trajectory slowly diverges from reality. By the time it returns to its starting point, its map might be visibly warped, the start and end not lining up. Then it recognizes a place it’s seen before — a loop closure — and adds a single new factor: “pose 1000 is right next to pose 5.”

That one constraint is electric. The optimizer re-solves the whole graph, and the loop-closure error — the gap between where the loop should close and where it currently does — distributes back across the entire loop, correcting every pose in between. The warped trajectory snaps into a consistent shape; accumulated drift is wiped out in one optimization. A filter can’t do this — it discarded those intermediate poses. Because the smoother kept the whole graph, a discovery in the present rewrites the entire past into consistency. Watching a drifted SLAM map snap straight the instant a loop closes is one of the most satisfying sights in robotics, and it’s pure factor-graph optimization.

Why the correction spreads: the loop-closure factor pulls pose 1000 toward pose 5, but the odometry factors all along the loop pull each pose toward its neighbors. The optimizer finds the compromise that minimizes all errors together — so the single big loop-closure error gets shared out as many tiny corrections along the whole trajectory. The drift doesn’t vanish at one point; it’s smeared away everywhere.

Drift, then loop closure snaps it straight

A trajectory that should be a closed loop has drifted open (start and end don’t meet). Press close loop: a loop-closure factor is added and the graph re-optimizes — the drift distributes back and the loop snaps shut.

When a loop closure is added, how does it correct accumulated drift?

It only fixes the final pose Re-optimizing the whole graph distributes the loop-closure error back across all the intermediate poses, snapping the trajectory consistent It deletes the drifted poses

Chapter 6: Bundle Adjustment

Bundle adjustment is the factor graph specialized to vision — the engine behind structure-from-motion and visual SLAM. The variables are camera poses and 3-D points (the reconstructed scene). The factors are reprojection errors: project each 3-D point through each camera that saw it, and compare to where it actually appeared in that image. The error is the pixel distance between the projected point and the observed pixel. Minimize the sum of all these reprojection errors over all cameras and all points, jointly.

The name comes from the “bundles” of light rays from each 3-D point to each camera, all being adjusted into agreement. It’s the same nonlinear least-squares-on-a-sparse-factor-graph machinery, and it’s what turns a set of overlapping photos into a precise 3-D reconstruction (and what runs in the back-end of visual SLAM systems like ORB-SLAM). The sparsity here is beautiful: each 3-D point is seen by only some cameras, so the camera-point structure is extremely sparse — which is exactly why optimizing thousands of cameras and millions of points is feasible at all. Bundle adjustment is, in a sense, the original and most important factor-graph optimization.

Bundle adjustment: reprojection error

A 3-D point, projected into two cameras (lines), should land where it was observed (teal marks); the current estimate (orange) is off by the reprojection error. Drag the point estimate — bundle adjustment moves cameras AND points to minimize the total pixel error.

3-D point estimate0.35

What does bundle adjustment minimize?

The number of cameras The total reprojection error — the pixel distance between projected 3-D points and their observed positions — over all cameras and points jointly The image resolution

Chapter 7: Pose-Graph SLAM, Live (showcase)

The full pose-graph SLAM experience: a robot drives a loop, odometry drifts, and the estimated trajectory warps away from the true path. Add loop closures and run the optimizer, and watch the whole trajectory pull back into a consistent map. Crank the odometry drift and see how much a single loop closure can rescue.

Drifted trajectory → optimized map

Teal = true loop, orange = drifted estimate from odometry. Press optimize to add a loop closure and re-solve the factor graph — the orange estimate snaps toward the truth. Increase the drift and watch how badly raw odometry warps, and how well the optimization recovers it.

odometry drift0.50

Notice the optimization doesn’t just fix the endpoint — it redistributes the correction along the whole loop, so the entire estimated path bends back toward the truth. That global consistency, from a sparse least-squares solve over the whole trajectory, is the modern SLAM back-end in a nutshell. The front-end (odometry, place recognition) feeds factors in; the factor-graph back-end keeps the map globally consistent.

Chapter 8: Incremental Solving & Tools

Re-optimizing the entire graph from scratch every time a new measurement arrives would be wasteful — most of the solution barely changes. Incremental smoothing (iSAM, iSAM2) updates the solution efficiently as new factors stream in, re-solving only the parts of the graph that are actually affected, using a clever data structure (the Bayes tree) to track dependencies. This gives near-real-time smoothing — the accuracy of full optimization at close to filtering speed — which is why modern SLAM systems can smooth, not just filter, online.

In practice you don’t write the solver yourself; you use a library. GTSAM (factor graphs and iSAM), g2o (general graph optimization), and Ceres (Google’s nonlinear least-squares) are the workhorses — you define your variables and factors, and they handle the sparse linear algebra and iteration. And the broad lesson generalizes far beyond robotics: any problem that’s a sum of local error terms over shared variables — calibration, sensor fusion, structure-from-motion, even some learning problems — is a factor graph, and can be solved by this same sparse nonlinear-least-squares machinery. Filtering gives you the present cheaply; smoothing gives you the whole consistent history, and factor graphs are how.

Filtering vs smoothing accuracy over a loop

Estimation error along a looping trajectory. The filter (orange) drifts and stays drifted — it can’t fix the past. The smoother (teal) drifts too, until the loop closes, then snaps the whole path back to low error. Drag through the trajectory.

progress around loop0.50

What does incremental smoothing (iSAM) provide?

A way to avoid measurements Efficient online updates as new factors arrive — re-solving only the affected part — giving full-smoothing accuracy near real time A replacement for the factor graph

Chapter 9: Cheat Sheet & Connections

filtering → smoothing

keep the whole trajectory + map, not just the latest state

↓ model as a factor graph

variables + factors

poses/landmarks (unknowns) + measurement constraints (errors)

↓ minimize Σ squared weighted residuals (MAP)

nonlinear least squares

Gauss-Newton / LM; sparse → fast; loop closures correct the whole path

↓ online

iSAM + tools (GTSAM/g2o/Ceres)

incremental smoothing near real time; bundle adjustment for vision

Concept	What it is
Factor graph	variables (unknowns) + factors (measurement error terms)
Cost	Σ of squared, uncertainty-weighted residuals = MAP estimate
Solver	Gauss-Newton / LM, exploiting sparsity
Loop closure	one factor → re-optimize → drift corrected across the whole path
Bundle adjustment	cameras + 3-D points, minimize reprojection error
iSAM	incremental smoothing for online, near-real-time use

Keep exploring

→ Modern SLAM — where factor-graph back-ends live
→ Classical SLAM — the filtering-based predecessor
→ Kalman Filter / Particle Filter — the filtering alternatives
→ Multi-View Geometry — the geometry behind bundle adjustment

“What I cannot create, I do not understand.” You just rebuilt the modern SLAM back-end: model the whole trajectory and map as a factor graph of variables and measurement-error factors, minimize the sum of squared residuals by sparse nonlinear least squares, and let a single loop-closure factor re-optimize the entire past into consistency. Smoothing, not filtering — keep everything, and solve it all at once.