The algorithm that lets robots explore unknown environments — building a map while simultaneously figuring out where they are within it.
Imagine you wake up in a dark building with a flashlight. You want to draw a floor plan (a map). But to mark anything on the map, you need to know where you are. And to know where you are, you need... a map. It's a chicken-and-egg problem.
SLAM (Simultaneous Localization And Mapping) solves both at the same time. As the robot moves, it observes landmarks, estimates its own pose, and builds a map — all in one continuous loop. It's one of the fundamental problems in robotics.
A robot (green) moves through unknown space, sensing landmarks (yellow). It must estimate both its own path and the landmark positions.
A "map" in robotics isn't necessarily a picture. It's a representation of the environment that the robot can use for navigation and planning. There are several common types, each with tradeoffs.
Toggle between three representations of the same room. Each shows the same space differently.
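One of those representations, the occupancy grid, is simple enough to sketch in a few lines. This is an illustrative toy (the class name, cell resolution, and sensor weight are made up, not from any particular library): each cell stores the log-odds of being occupied, and every sensor hit or miss adds or subtracts evidence.

```python
import math

# Toy occupancy grid: each cell holds log-odds of occupancy.
# The 0.85 "weight" is an assumed inverse sensor model, purely illustrative.
class OccupancyGrid:
    def __init__(self, width, height):
        self.logodds = [[0.0] * width for _ in range(height)]

    def update(self, row, col, hit, weight=0.85):
        # evidence from one sensor reading, in log-odds form
        delta = math.log(weight / (1 - weight))
        self.logodds[row][col] += delta if hit else -delta

    def probability(self, row, col):
        # convert log-odds back to a probability
        return 1.0 / (1.0 + math.exp(-self.logodds[row][col]))

grid = OccupancyGrid(10, 10)
for _ in range(3):
    grid.update(2, 3, hit=True)   # three laser hits on the same cell
print(round(grid.probability(2, 3), 3))  # → 0.995
```

Working in log-odds means updates are just additions, and repeated evidence saturates the cell toward 0 or 1 — which is why occupancy grids hold up well under noisy sensors.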
A robot's pose in 2D is three numbers: position (x, y) and heading angle θ. Together, they describe where the robot is and which direction it's facing. This is a rigid body transform — it tells us how to convert from the robot's frame to the world frame.
When the robot moves, its new pose depends on the old pose plus a motion command (drive forward, turn). But real motors are imperfect, so every motion adds noise. Over time, this odometry drift accumulates and the robot becomes lost.
Watch how odometry (red dashed) drifts away from the true path (teal). Without corrections, errors compound.
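The drift in that demo comes straight from the motion model. Here is a minimal sketch (all noise magnitudes are invented for illustration): a pose (x, y, θ) is advanced by a drive-and-turn command, with Gaussian noise on each component standing in for imperfect motors.

```python
import math
import random

# Sketch of noisy 2D pose composition; noise scales are assumptions.
def step(pose, d, dtheta, noise=0.02, rng=random):
    x, y, theta = pose
    d_actual = d + rng.gauss(0.0, noise * abs(d))                # slip on distance
    dtheta_actual = dtheta + rng.gauss(0.0, noise * abs(dtheta) + 1e-3)  # slip on turn
    x += d_actual * math.cos(theta)
    y += d_actual * math.sin(theta)
    theta = (theta + dtheta_actual) % (2 * math.pi)
    return (x, y, theta)

# Drive a 1 m square: without noise the robot returns exactly home;
# with noise it ends up slightly off, and the offset grows every lap.
random.seed(0)
pose = (0.0, 0.0, 0.0)
for _ in range(4):
    pose = step(pose, d=1.0, dtheta=math.pi / 2)
print(pose)  # near (0, 0, 0), but not exactly — that gap is odometry drift
```

Note the compounding: an early heading error rotates every subsequent translation, which is why drift grows faster than the per-step noise alone suggests.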
Before the robot can localize or map, it needs to find distinctive features in its sensor data. These are things like corners, edges, blobs — anything that looks unique and can be reliably re-detected later. This is called the frontend of a SLAM system.
Each detected feature gets a descriptor — a compact numerical summary that captures what it looks like. Think of it as a fingerprint. When the robot sees a feature again from a different viewpoint, the descriptor helps match it to the stored version.
Features are detected at corners and distinctive points (yellow). Move the threshold slider to see how detection sensitivity changes.
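Descriptor matching can be sketched with binary descriptors, as used by detectors like ORB. The 8-bit values below are toy stand-ins (real ORB descriptors are 256-bit): matching means finding the stored descriptor with the smallest Hamming distance to the query.

```python
# Toy descriptor matching via Hamming distance (bit-count of XOR).
# The names and 8-bit descriptors are illustrative placeholders.
def hamming(a, b):
    # number of differing bits between two integer-encoded descriptors
    return bin(a ^ b).count("1")

stored = {"corner_A": 0b10110010, "corner_B": 0b01001101}
query = 0b10110011  # re-detected feature, one bit flipped by viewpoint change

best = min(stored, key=lambda name: hamming(stored[name], query))
print(best, hamming(stored[best], query))  # → corner_A 1
```

Binary descriptors are popular precisely because this comparison is a single XOR plus popcount — cheap enough to match thousands of features per frame.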
The robot sees a corner. Is it the same corner it saw 30 seconds ago, or a completely different one? This matching problem is called data association, and getting it wrong is catastrophic — if you match to the wrong landmark, your map becomes inconsistent.
The simplest approach is nearest neighbor: match each observation to the closest known landmark. But this fails when landmarks sit close together relative to the robot's own uncertainty. More robust methods include JCBB (Joint Compatibility Branch and Bound), which evaluates all candidate matches jointly rather than one at a time.
Current observations (teal) must match to known landmarks (yellow). Lines show matches. Red lines are wrong matches. Click "Scramble" to see how ambiguity arises.
| Method | Approach | Robustness |
|---|---|---|
| Nearest Neighbor | Match to closest landmark | Low |
| Mahalanobis Gate | NN but weighted by uncertainty | Medium |
| JCBB | Check all matches jointly | High |
| RANSAC | Random sample consensus | High |
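The first two rows of the table can be combined into one small sketch. This is illustrative only (landmark names, covariance, and the chi-square gate value are assumptions): an observation matches its nearest landmark only if the uncertainty-weighted Mahalanobis distance falls inside the gate; otherwise it is treated as a new landmark.

```python
import numpy as np

# Nearest-neighbor matching with a Mahalanobis gate (toy numbers).
def mahalanobis2(z, landmark, cov):
    # squared Mahalanobis distance: error weighted by inverse covariance
    d = np.asarray(z) - np.asarray(landmark)
    return float(d @ np.linalg.inv(cov) @ d)

def gated_nn(z, landmarks, cov, gate=5.99):  # 5.99 ≈ 95% chi-square, 2 dof
    dists = {name: mahalanobis2(z, pos, cov) for name, pos in landmarks.items()}
    name = min(dists, key=dists.get)
    return name if dists[name] < gate else None  # None = likely new landmark

landmarks = {"tree": (2.0, 1.0), "post": (2.5, 1.2)}
cov = np.diag([0.05, 0.05])                      # assumed observation covariance
print(gated_nn((2.05, 1.02), landmarks, cov))    # → tree
print(gated_nn((9.0, 9.0), landmarks, cov))      # → None (outside the gate)
```

The gate is what separates this from plain nearest neighbor: a far-away observation is rejected instead of being force-matched to whatever happens to be closest.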
The classic approach. We augment the robot's state vector with every landmark position. If we have N landmarks, the state is [x, y, θ, l1x, l1y, ..., lNx, lNy] — a vector of size 3+2N. The covariance matrix is (3+2N) × (3+2N), tracking correlations between everything.
The EKF (Extended Kalman Filter) handles the nonlinear motion and observation models by linearizing them at each step. Predict: move robot, grow uncertainty. Update: observe a landmark, shrink uncertainty — for the robot and all correlated landmarks.
Watch the state vector grow as new landmarks are discovered. Uncertainty ellipses (cyan) show correlations. Click "Step" or toggle auto-play.
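The predict step can be sketched concretely. This toy (one landmark already in the state, made-up noise values, straight-line motion with no turn for brevity) shows the key mechanics: the robot's uncertainty grows, the Jacobian couples heading error into position, and the landmark rows of the state are left untouched by motion.

```python
import numpy as np

# One EKF-SLAM predict step on a 5-dimensional state (toy numbers).
x = np.array([0.0, 0.0, 0.0, 2.0, 1.0])   # [x, y, theta, l1x, l1y]
P = np.eye(5) * 0.01                       # joint covariance, (3+2N) x (3+2N)

d = 1.0                                    # command: drive 1 m straight ahead
x[0] += d * np.cos(x[2])
x[1] += d * np.sin(x[2])                   # theta unchanged (no turn)

# Jacobian of the motion model w.r.t. the full state; landmarks are static,
# so their block is identity.
F = np.eye(5)
F[0, 2] = -d * np.sin(x[2])                # heading error leaks into x
F[1, 2] =  d * np.cos(x[2])                # heading error leaks into y
Q = np.zeros((5, 5))
Q[:3, :3] = np.diag([0.02, 0.02, 0.01])    # assumed motion noise, robot only

P = F @ P @ F.T + Q                        # uncertainty grows on predict
print(np.trace(P[:3, :3]))                 # larger than the 0.03 it started at
```

The update step runs the same linearization in reverse: an observation shrinks the covariance, and because the off-diagonal blocks of P encode correlations, it shrinks it for every correlated landmark at once.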
FastSLAM takes a clever shortcut using Rao-Blackwellization. The insight: if you knew the robot's path exactly, the landmark positions would be conditionally independent. You wouldn't need that massive joint covariance matrix.
So FastSLAM uses particles to represent the robot's pose distribution. Each particle carries its own mini-EKF for every landmark. Since landmarks are independent given the path, each of those filters only needs a 2×2 covariance — no quadratic blowup.
Each green dot is a particle (robot pose hypothesis). After resampling, particles cluster around the true pose. Landmarks shown in yellow.
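The data layout is the whole trick, so here is a sketch of it (class name, particle count, and all numbers are illustrative): a particle is one pose hypothesis plus a dictionary of tiny per-landmark filters, and resampling duplicates high-weight particles while letting low-weight ones die out.

```python
import random

# Toy FastSLAM particle: pose hypothesis + independent 2x2 landmark filters.
class Particle:
    def __init__(self, x, y, theta):
        self.pose = (x, y, theta)
        self.weight = 1.0
        # landmark id -> (mean [lx, ly], 2x2 covariance) — one mini-EKF each,
        # instead of one giant joint covariance over everything
        self.landmarks = {}

random.seed(1)
particles = [Particle(random.gauss(0, 0.1), random.gauss(0, 0.1), 0.0)
             for _ in range(100)]
particles[0].landmarks["door"] = ([2.0, 1.0], [[0.1, 0.0], [0.0, 0.1]])

# Resampling: draw with replacement, proportional to weight.
weights = [p.weight for p in particles]
particles = random.choices(particles, weights=weights, k=len(particles))
print(len(particles))  # → 100
```

With M particles and N landmarks, storage and update cost scale as O(M·N) small filters rather than one O(N²) covariance — the tradeoff is that pose uncertainty is now only as good as the particle set.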
The modern workhorse. Instead of maintaining a filter, we build a graph. Each node is a robot pose (at some time) or a landmark position. Each edge is a constraint — an odometry measurement between poses, or a landmark observation. This forms a factor graph.
SLAM becomes an optimization problem: find the node positions that best satisfy all constraints simultaneously. This is solved with iterative least-squares methods like Gauss-Newton. The key advantage: we can re-optimize the entire trajectory when new information arrives (like a loop closure).
Blue nodes = robot poses. Yellow nodes = landmarks. Edges = constraints. Click "Optimize" to see least-squares adjustment pull everything into consistency.
| Approach | State | Updates | Scalability |
|---|---|---|---|
| EKF-SLAM | Current only | Recursive | O(N²) |
| FastSLAM | Path samples (particles) | Sequential | O(M·N) |
| Graph-SLAM | All history | Batch optimize | Sparse O(N) |
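A 1D toy makes the optimization concrete. In this sketch (all measurements invented) three poses sit on a line, connected by two odometry edges, an anchor fixing pose 0, and a relative edge from pose 0 to pose 2 whose measurement disagrees slightly with the odometry chain. The residuals happen to be linear here, so a single least-squares solve is one full Gauss-Newton step.

```python
import numpy as np

# Toy 1D pose graph: rows of A are edges, b holds their measurements.
A = np.array([[ 1.0,  0.0, 0.0],   # anchor: x0 = 0
              [-1.0,  1.0, 0.0],   # odometry: x1 - x0 = 1.1
              [ 0.0, -1.0, 1.0],   # odometry: x2 - x1 = 0.9
              [-1.0,  0.0, 1.0]])  # relative edge: x2 - x0 = 2.2
b = np.array([0.0, 1.1, 0.9, 2.2])

# Least squares finds the poses that best satisfy all edges at once.
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x)  # x0 ≈ 0, x1 ≈ 1.167, x2 ≈ 2.133 — a compromise between edges
```

No single assignment satisfies every edge exactly (the odometry chain says x2 = 2.0, the direct edge says 2.2), so the optimizer spreads the disagreement across all constraints — exactly what happens, at much larger scale, when a loop closure edge is added to a real pose graph.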
The robot has been exploring for 10 minutes. Odometry has drifted badly. Then it returns to a place it visited at the start — and recognizes it. This is loop closure, and it's the most impactful event in SLAM. A single loop closure can correct accumulated drift across the entire trajectory.
Detection uses place recognition: comparing the current scene's features against a database of all previously visited places. Bag-of-words (DBoW) and learned descriptors are common approaches. Once a loop is detected, a new constraint is added to the graph, and re-optimization snaps everything into alignment.
The red path shows drifted odometry forming a loop. Click "Close Loop" to see the constraint propagate corrections backward through the entire trajectory.
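The detection side can be sketched with a toy bag-of-words comparison (place names, vocabulary size, histograms, and the 0.9 threshold are all illustrative): each visited place is a histogram of visual-word counts, and a high cosine similarity with the current scene flags a revisit.

```python
import math

# Toy place recognition: cosine similarity between bag-of-words histograms.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

database = {                       # one histogram per previously visited place
    "start_hallway": [5, 0, 2, 1],
    "lab_door":      [0, 4, 0, 3],
}
current = [4, 1, 2, 1]             # the scene in front of the robot right now

best = max(database, key=lambda p: cosine(database[p], current))
if cosine(database[best], current) > 0.9:       # assumed revisit threshold
    print("loop closure with", best)            # → loop closure with start_hallway
    # next: add this edge to the pose graph and re-optimize
```

Real systems like DBoW use vocabularies of tens of thousands of words plus geometric verification before accepting a match — a false loop closure would warp the whole map, so the threshold is deliberately conservative.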
Time to see it all come together. Below is a 2D room with landmarks. A robot explores the space, observing landmarks within its sensor range. Watch the map build in real-time. Toggle between the EKF view (uncertainty ellipses) and the graph view (nodes and edges). Use arrow keys or click "Auto-explore" to drive the robot.
Arrow keys to drive manually. Sensor range shown as a circle around the robot.
Classical SLAM is the foundation, but the field has exploded in the last decade. Modern systems combine visual features, inertial measurements, and even neural networks. Here's where SLAM connects to everything else.
| System | What It Adds | When To Use |
|---|---|---|
| Visual SLAM (ORB-SLAM) | Camera features + bundle adjustment | Camera-only systems |
| VIO (Visual-Inertial Odometry) | Tightly coupled camera + IMU | Drones, AR/VR headsets |
| LiDAR SLAM (LOAM) | Point cloud matching | Self-driving cars, surveying |
| NeRF/3DGS SLAM | Neural scene representation | Dense 3D reconstruction |
You now understand the core concepts behind every SLAM system. From vacuum robots to self-driving cars, the chicken-and-egg problem is the same — only the sensors and scale change.