The Complete Beginner's Path

Understand Classical
SLAM

The algorithm that lets robots explore unknown environments — building a map while simultaneously figuring out where they are within it.

Prerequisites: Basic probability + Intuition for coordinates. That's it.
11
Chapters
8+
Simulations
0
Assumed Knowledge

Chapter 0: The Chicken-and-Egg Problem

Imagine you wake up in a dark building with a flashlight. You want to draw a floor plan (a map). But to mark anything on the map, you need to know where you are. And to know where you are, you need... a map. It's a chicken-and-egg problem.

SLAM (Simultaneous Localization And Mapping) solves both at the same time. As the robot moves, it observes landmarks, estimates its own pose, and builds a map — all in one continuous loop. It's one of the fundamental problems in robotics.

The core idea: You can't separate "where am I?" from "what does the world look like?" They're coupled. Improving your map improves your position estimate, and improving your position estimate improves your map. SLAM exploits this feedback loop.
The SLAM Problem

A robot (green) moves through unknown space, sensing landmarks (yellow). It must estimate both its own path and the landmark positions.

Check: Why is SLAM called a "chicken-and-egg" problem?
🔗 Pattern Recognition
SLAM Extends the Kalman Filter State Vector
Kalman Filter
Track ONE thing (e.g. a ball's position) with state x and covariance P. Predict → Update loop.
EKF-SLAM
Track the robot AND all landmarks jointly. Same predict → update loop, but the state vector keeps growing. → Kalman Filter lesson

The deep connection: SLAM is just a Kalman filter where you keep appending new states every time you discover something. The predict/update structure is identical — only the dimensionality grows.

If a Kalman filter tracks a single ball with a 4D state [x, y, vx, vy], what would the state look like if you tracked 3 balls simultaneously? How is that analogous to SLAM?

Chapter 1: What Is a Map?

A "map" in robotics isn't necessarily a picture. It's a representation of the environment that the robot can use for navigation and planning. There are several common types, each with tradeoffs.

Landmark Map
A list of distinctive point features with (x,y) coordinates. Sparse, compact, great for localization.
Occupancy Grid
A 2D grid where each cell stores the probability of being occupied. Dense, good for path planning.
Topological Map
A graph of places and connections. "Kitchen connects to hallway." Abstract, ignores geometry.
Map Representations

Toggle between three representations of the same room. Each shows the same space differently.

Classical SLAM typically uses landmark maps — a sparse set of distinctive points. This keeps the math tractable. Modern SLAM systems often produce dense maps (point clouds, meshes) as a byproduct.
Check: Which map type stores a probability for each grid cell?

Chapter 2: Robot Pose & Coordinates

A robot's pose in 2D is three numbers: position (x, y) and heading angle θ. Together, they describe where the robot is and which direction it's facing. This is a rigid body transform — it tells us how to convert from the robot's frame to the world frame.

pose = (x, y, θ)     where θ is the heading angle

When the robot moves, its new pose depends on the old pose plus a motion command (drive forward, turn). But real motors are imperfect, so every motion adds noise. Over time, this odometry drift accumulates and the robot becomes lost.

Odometry Drift

Watch how odometry (red dashed) drifts away from the true path (teal). Without corrections, errors compound.

Motion noise0.030
Key problem: Odometry drift is why robots can't just dead-reckon forever. After 100 meters, a 1% error means you're a full meter off. SLAM corrects this drift by observing landmarks.
Check: What three quantities define a robot's 2D pose?

Chapter 3: Feature Extraction (Frontend)

Before the robot can localize or map, it needs to find distinctive features in its sensor data. These are things like corners, edges, blobs — anything that looks unique and can be reliably re-detected later. This is called the frontend of a SLAM system.

Each detected feature gets a descriptor — a compact numerical summary that captures what it looks like. Think of it as a fingerprint. When the robot sees a feature again from a different viewpoint, the descriptor helps match it to the stored version.

Raw Sensor Data
Camera image, lidar scan, sonar pings
Feature Detection
Find corners (Harris), blobs (SIFT), edges (Canny)
Descriptor Extraction
Compute a fingerprint vector for each feature
Feature List
[(x1,y1,d1), (x2,y2,d2), ...]
Corner Detection

Features are detected at corners and distinctive points (yellow). Move the threshold slider to see how detection sensitivity changes.

Detection threshold0.50
Good features are distinctive (not all walls look the same), repeatable (detected from different angles), and compact (quick to compare). Corners are ideal; flat surfaces are terrible.
Check: What is a feature descriptor?

Chapter 4: Data Association

The robot sees a corner. Is it the same corner it saw 30 seconds ago, or a completely different one? This matching problem is called data association, and getting it wrong is catastrophic — if you match to the wrong landmark, your map becomes inconsistent.

The simplest approach is nearest neighbor: match each observation to the closest known landmark. But this fails when landmarks are close together. More robust methods include JCBB (Joint Compatibility Branch and Bound), which considers all matches jointly.

Data Association Challenge

Current observations (teal) must match to known landmarks (yellow). Lines show matches. Red lines are wrong matches. Click "Scramble" to see how ambiguity arises.

Wrong matches are poison. A single incorrect data association can corrupt the entire map. This is why robust matching algorithms and outlier rejection are critical in real SLAM systems.
MethodApproachRobustness
Nearest NeighborMatch to closest landmarkLow
Mahalanobis GateNN but weighted by uncertaintyMedium
JCBBCheck all matches jointlyHigh
RANSACRandom sample consensusHigh
Check: What is the data association problem?
Checkpoint — Before you move on
A robot observes 3 features. Two match known landmarks easily, but the third is equidistant from two known landmarks. Explain: (1) Why is this ambiguity dangerous? (2) How would a Mahalanobis gate help? (3) What information besides distance could resolve it?
✓ Gate cleared
Model Answer

1. Why it's dangerous: If you match to the wrong landmark, the EKF update will "pull" both the robot pose and the landmark estimate toward an incorrect configuration. Worse, because EKF-SLAM maintains correlations between ALL landmarks, this one bad match can corrupt estimates for landmarks you aren't even currently observing. The covariance becomes inconsistent — the filter thinks it's certain about wrong positions.

2. Mahalanobis gate: Instead of raw Euclidean distance, the Mahalanobis distance weights by the innovation covariance S = H P HT + R. A landmark with high uncertainty (large P block) will have a larger "gate" around it, while a well-localized landmark has a tight gate. This prevents matching to a precise landmark that happens to be nearby in space but would require an implausible robot pose correction.

3. Beyond distance: Feature descriptors! Two landmarks may be spatially close but look completely different (a corner vs. an edge). JCBB goes further — it checks whether the JOINT set of matches is mutually consistent, rejecting individual matches that would require contradictory robot poses.

Chapter 5: EKF-SLAM

The classic approach. We augment the robot's state vector with every landmark position. If we have N landmarks, the state is [x, y, θ, l1x, l1y, ..., lNx, lNy] — a vector of size 3+2N. The covariance matrix is (3+2N) × (3+2N), tracking correlations between everything.

The EKF (Extended Kalman Filter) handles the nonlinear motion and observation models by linearizing them at each step. Predict: move robot, grow uncertainty. Update: observe a landmark, shrink uncertainty — for the robot and all correlated landmarks.

State: x = [robot pose, landmark 1, landmark 2, ...]
Covariance: P is (3+2N) × (3+2N) — grows quadratically!
EKF-SLAM: Growing State Vector

Watch the state vector grow as new landmarks are discovered. Uncertainty ellipses (cyan) show correlations. Click "Step" or toggle auto-play.

Landmarks: 0 | State dim: 3
The O(N²) curse: Every update touches the entire covariance matrix. With 1000 landmarks, that's a million-entry matrix to update per step. EKF-SLAM doesn't scale to large environments.
Check: Why does EKF-SLAM struggle with large environments?
🔨 Derivation Why does EKF-SLAM covariance grow quadratically? ✓ ATTEMPTED

In EKF-SLAM, the state vector is x = [r, l1, l2, ..., lN] where r is the 3D robot pose and each li is a 2D landmark position. The covariance matrix P tracks uncertainty and correlations between all state elements.

Your task: Show that (1) the covariance matrix is (3+2N) × (3+2N), (2) explain why the off-diagonal blocks are non-zero and what they physically mean, and (3) derive why a single EKF update step costs O(N²) operations.

P has block structure: Prr (3×3 robot uncertainty), Prli (3×2 cross-correlation between robot and landmark i), and Plilj (2×2 cross-correlation between landmarks). How many such blocks exist?
When you observe landmark i from robot pose r, the EKF update adjusts BOTH the landmark and the robot pose. Since r is shared across all observations, updating r changes the "context" for all other landmarks. The Kalman gain K multiplied through the full state creates cross-correlations: "if the robot is actually 1m to the left, then ALL landmarks must also shift."
The EKF update is: P ← (I - KH)P. The Kalman gain K = PHTS-1 requires multiplying P (size (3+2N)×(3+2N)) by HT. Even though H is sparse (only involves the observed landmark), the update (I-KH)P touches every element of P because K has non-zero entries for all state elements.

Step 1: State and covariance dimensions.

State x = [xr, yr, θr, x1, y1, x2, y2, ..., xN, yN]T has dimension d = 3 + 2N. The covariance P is d×d = (3+2N)² entries.

Step 2: Why off-diagonals are non-zero.

Consider observing landmark j. The observation model is z = h(xr, lj) + noise. The Jacobian H has non-zero blocks only for xr and lj. BUT the Kalman gain K = PHTS-1 multiplies the FULL covariance P by HT. Since Prli blocks are non-zero (for previously observed landmarks), K has non-zero entries for ALL landmarks, not just the observed one.

Physically: Plilj ≠ 0 means "landmarks i and j were both observed from the same uncertain robot pose, so their errors are correlated." If we learn landmark i is 1m east of where we thought, we gain information about landmark j too.

Step 3: O(N²) per update.

The covariance update P ← P - KSKT (Joseph form) requires forming KSKT which is (3+2N)×(3+2N). Even with sparse H, K is a (3+2N)×2 matrix (2 because observation is [range, bearing]), so KSKT is a rank-2 update to the full P. Computing this outer product is O((3+2N)²) = O(N²).

The key insight: Correlations are both EKF-SLAM's strength (observing one landmark improves ALL estimates) and its curse (you can never drop the off-diagonal blocks without losing optimality, and they grow as O(N²)).

💻 Build It Implement EKF-SLAM Landmark Initialization ✓ ATTEMPTED
When the robot observes a feature that doesn't match any known landmark, it must augment the state vector and covariance matrix. Given a range-bearing observation (r, φ) from the robot's current pose, initialize the new landmark position and expand P.
signature def augment_state(mu, P, z_range, z_bearing, R_obs): """ Augment EKF-SLAM state with a new landmark. Args: mu: np.array, shape (3+2N,) -- current state [x,y,theta, lm1x,lm1y, ...] P: np.array, shape (3+2N, 3+2N) -- current covariance z_range: float -- measured range to new landmark z_bearing: float -- measured bearing to new landmark (relative to robot heading) R_obs: np.array, shape (2,2) -- observation noise covariance [[sigma_r^2, 0],[0, sigma_phi^2]] Returns: mu_new: np.array, shape (3+2N+2,) -- augmented state P_new: np.array, shape (3+2N+2, 3+2N+2) -- augmented covariance """
Test case
Robot at (5, 3, π/4), observes landmark at range=2, bearing=0 (straight ahead).
Expected landmark position: (5 + 2cos(π/4), 3 + 2sin(π/4)) = (6.41, 4.41).
New P should be (5×5) if starting with 1 existing landmark.
The landmark position in world coords is: lx = xr + r·cos(θr + φ), ly = yr + r·sin(θr + φ). The Jacobian Gr = ∂l/∂(x,y,θ) is needed for cross-correlations: Pnew row/col = Gr · Prr.
python
import numpy as np

def augment_state(mu, P, z_range, z_bearing, R_obs):
    n = len(mu)
    xr, yr, theta = mu[0], mu[1], mu[2]

    # 1. Landmark position from inverse observation model
    angle = theta + z_bearing
    lx = xr + z_range * np.cos(angle)
    ly = yr + z_range * np.sin(angle)

    # 2. Augment state vector
    mu_new = np.append(mu, [lx, ly])

    # 3. Jacobian of inverse obs model w.r.t. robot pose
    Gr = np.array([
        [1, 0, -z_range * np.sin(angle)],
        [0, 1,  z_range * np.cos(angle)]
    ])

    # 4. Jacobian w.r.t. observation [range, bearing]
    Gz = np.array([
        [np.cos(angle), -z_range * np.sin(angle)],
        [np.sin(angle),  z_range * np.cos(angle)]
    ])

    # 5. Augment covariance
    P_new = np.zeros((n + 2, n + 2))
    P_new[:n, :n] = P  # existing block unchanged

    # Cross-correlation: new landmark with existing state
    Prr = P[:3, :3]
    cross = Gr @ P[:3, :]  # (2 x n) cross-correlation row
    P_new[n:, :n] = cross
    P_new[:n, n:] = cross.T

    # New landmark self-covariance
    P_new[n:, n:] = Gr @ Prr @ Gr.T + Gz @ R_obs @ Gz.T

    return mu_new, P_new
Bonus: What happens to the cross-correlation row (P_new[n:, :n]) as the robot's heading uncertainty grows? Why does this mean landmarks initialized during high uncertainty are "less useful" for future corrections?
⚔ Adversarial: The hidden power of correlations
You run EKF-SLAM and observe 50 landmarks. The covariance matrix P shows strong off-diagonal correlations between landmarks 1-20 (observed early) but near-zero correlations between landmarks 40-50 (observed recently from uncertain poses). You then get a PERFECT GPS fix on the robot's current position.

Chapter 6: Particle Filter SLAM (FastSLAM)

FastSLAM takes a clever shortcut using Rao-Blackwellization. The insight: if you knew the robot's path exactly, the landmark positions would be conditionally independent. You wouldn't need that massive joint covariance matrix.

So FastSLAM uses particles to represent the robot's pose distribution. Each particle carries its own mini-EKF for each landmark. Since landmarks are independent given the path, each mini-EKF is just 2×2 — no quadratic blowup.

Particle (path hypothesis)
Each particle is one possible robot trajectory
Per-particle landmark EKFs
Each landmark: independent 2×2 EKF. O(N) per particle.
Resampling
Keep particles that match observations. Kill unlikely ones.
FastSLAM Particles

Each green dot is a particle (robot pose hypothesis). After resampling, particles cluster around the true pose. Landmarks shown in yellow.

Num particles50
Scaling: EKF-SLAM is O(N²) per update. FastSLAM is O(M·N) where M is the number of particles. With M=100 particles and N=1000 landmarks, that's 100K instead of 1M operations.
Check: What does Rao-Blackwellization exploit in FastSLAM?

Chapter 7: Graph-Based SLAM

The modern workhorse. Instead of maintaining a filter, we build a graph. Each node is a robot pose (at some time) or a landmark position. Each edge is a constraint — an odometry measurement between poses, or a landmark observation. This forms a factor graph.

SLAM becomes an optimization problem: find the node positions that best satisfy all constraints simultaneously. This is solved with iterative least-squares methods like Gauss-Newton. The key advantage: we can re-optimize the entire trajectory when new information arrives (like a loop closure).

x* = argmin Σedges || zij − h(xi, xj) ||²Σ
Factor Graph

Blue nodes = robot poses. Yellow nodes = landmarks. Edges = constraints. Click "Optimize" to see least-squares adjustment pull everything into consistency.

Factor graphs are sparse. Each constraint only connects a few nodes. This means the system matrix (Hessian) is sparse, and we can use efficient sparse solvers. Libraries like g2o, GTSAM, and Ceres make this practical.
ApproachStateUpdatesScalability
EKF-SLAMCurrent onlyRecursiveO(N²)
FastSLAMCurrent + particlesSequentialO(M·N)
Graph-SLAMAll historyBatch optimizeSparse O(N)
Check: In graph-based SLAM, what do the edges represent?
🔨 Derivation The Information Filter — Why Graph-SLAM is Sparse ✓ ATTEMPTED

The EKF maintains covariance P (dense). The information filter maintains the inverse: the information matrix Ω = P-1 and information vector ξ = P-1μ. In SLAM, Ω is naturally sparse — zero entries between landmarks that were never observed together.

Your task: Show that (1) Ωij = 0 when landmarks i and j were never co-observed from the same robot pose, (2) explain why this sparsity makes graph-SLAM efficient, and (3) show the connection between Ω and the graph's adjacency structure.

In a Gaussian, the precision matrix (inverse covariance) encodes conditional independence. Ωij = 0 means "i and j are conditionally independent given all other variables." Two landmarks are conditionally independent unless they share a direct constraint (were observed from the same pose).
The factor graph has an edge between nodes i and j if and only if there's a measurement connecting them. The information matrix has Ωij ≠ 0 in exactly the same pattern. The matrix IS the adjacency structure of the graph (weighted by measurement precision).
Optimizing the graph requires solving Ω·Δx = ξ − Ω·x. If Ω is sparse (each row has ~5-10 non-zeros instead of N), sparse Cholesky factorization solves this in O(N) to O(N¹·&sup5;) instead of O(N³). This is why graph-SLAM scales to millions of poses.

Step 1: Conditional independence → sparse precision.

In a Gaussian N(μ, P), the precision matrix Ω = P-1 has a beautiful property: Ωij = 0 ⇔ xi ⊥ xj | x\{i,j\} (conditional independence). Landmark i and landmark j are conditionally independent given the robot poses they were observed from. If they were NEVER observed from the same pose, they have no direct constraint — only indirect ones through shared poses.

Step 2: Factor graph = sparsity pattern.

Each factor (measurement) adds to Ω only in the rows/columns of the connected nodes. An odometry factor between poses xk and xk+1 adds to Ωk,k, Ωk,k+1, Ωk+1,k, Ωk+1,k+1. A landmark observation from pose xk to landmark lj adds to their mutual entries. The rest stays zero.

Step 3: Sparsity → efficiency.

The least-squares normal equations give HTΣ-1H Δx* = HTΣ-1e where H is the Jacobian. The matrix HTΣ-1H IS the information matrix Ω. With N poses and M landmarks, Ω is (3N+2M)×(3N+2M) but each row has only ~10 non-zeros (degree of the graph). Sparse Cholesky (e.g., CHOLMOD) exploits this for near-linear solve time.

The key insight: EKF-SLAM maintains P (dense, because all landmarks are marginally correlated). Graph-SLAM works with Ω (sparse, because most landmarks are conditionally independent). Same information, dual representation — but sparsity changes the computational game entirely.

🔨 Derivation Loop Closure Constraint Math ✓ ATTEMPTED

The robot is at pose xk (current) and recognizes it's revisiting the location of pose x0 (from the past). The relative transform between xk and x0 is measured as zloop with covariance Σloop.

Your task: Write the loop closure constraint as a factor in the least-squares objective, and show why it causes ALL intermediate poses x1...xk-1 to shift when you re-optimize.

The loop closure adds a term: ||zloop − (xk ⊖ x0)||²Σloop to the objective. Here xk ⊖ x0 is the relative transform (what the pose difference SHOULD be if the graph is consistent).
Before loop closure, the only path from x0 to xk goes through x1, x2, ..., xk-1 (the odometry chain). If the loop closure says "xk should actually be right next to x0" but odometry says otherwise, the error gets distributed across ALL intermediate edges. Each intermediate pose shifts proportionally.

Full objective function:

x* = argmin Σi=0..k-1 ||zodom,i − (xi+1 ⊖ xi)||²Σodom + ||zloop − (xk ⊖ x0)||²Σloop

Why ALL poses shift: The Gauss-Newton system HTΩH Δx = HTΩe has H as a banded matrix (tridiagonal-ish) for the odometry chain, PLUS an entry connecting x0 and xk. Solving this linear system propagates the loop closure correction through the chain. The correction at pose xi is proportional to i/k — poses near x0 barely move, poses near xk move the most. This is because odometry between consecutive poses is "stiff" (low noise), so bending the chain is distributed uniformly.

The key insight: A loop closure with covariance Σloop much smaller than accumulated odometry uncertainty (Σodom × k) effectively "snaps" the trajectory back. The tighter the loop constraint, the more dramatic the correction.

Chapter 8: Loop Closure

The robot has been exploring for 10 minutes. Odometry has drifted badly. Then it returns to a place it visited at the start — and recognizes it. This is loop closure, and it's the most impactful event in SLAM. A single loop closure can correct accumulated drift across the entire trajectory.

Detection uses place recognition: comparing the current scene's features against a database of all previously visited places. Bag-of-words (DBoW) and learned descriptors are common approaches. Once a loop is detected, a new constraint is added to the graph, and re-optimization snaps everything into alignment.

Loop Closure Correction

The red path shows drifted odometry forming a loop. Click "Close Loop" to see the constraint propagate corrections backward through the entire trajectory.

This is why graph-SLAM wins: When a loop closure arrives, graph-SLAM re-optimizes all poses. EKF-SLAM can only update the current state. FastSLAM struggles because particles can't "go back in time." The ability to retroactively fix the past is graph-SLAM's superpower.
Check: Why is loop closure so important?
💥 Break-It Lab What Dies When You Remove SLAM Components? ✓ ATTEMPTED
A robot drives in a large loop, observing landmarks along the way. The system includes: loop closure detection, data association, and landmark observations. Toggle each off to see what breaks.
Remove Loop Closure ACTIVE
Failure mode: Without loop closure, odometry drift accumulates unbounded. The robot returns to the start but the map shows a gap — the endpoint doesn't match the startpoint. Over long trajectories, the map "unravels" like a loose spring. The error grows as O(√t) with time.
Wrong Data Association (50% errors) ACTIVE
Failure mode: Matching observations to wrong landmarks corrupts the map catastrophically. The EKF "confidently" updates positions toward incorrect values, and because of correlations, one bad match poisons nearby landmarks. The map doesn't just drift — it folds onto itself. Unrecoverable without rejecting bad matches.
Reduce Observations (1/5th landmarks visible) ACTIVE
Failure mode: With few landmark observations, the robot relies mostly on odometry. Uncertainty grows between observations, and when it finally does see a landmark, the correction is large and jerky (high innovation). The map becomes sparse and poorly constrained — landmark estimates have high variance, making future data association harder (vicious cycle).
⚔ Adversarial: The mysterious 2-meter offset
Your SLAM system maps a 200m indoor corridor loop. The map looks perfect — walls are straight, corners are sharp. But when the robot returns to the starting position, the map endpoint is 2 meters away from the start. The raw odometry error over 200m is only 0.5% (1m expected). Loop closure is enabled and "fires" (you see it in the log). What's failing?

Chapter 9: 2D SLAM Simulator

Time to see it all come together. Below is a 2D room with landmarks. A robot explores the space, observing landmarks within its sensor range. Watch the map build in real-time. Toggle between the EKF view (uncertainty ellipses) and the graph view (nodes and edges). Use arrow keys or click "Auto-explore" to drive the robot.

Live 2D SLAM
Landmarks found: 0
Sensor range100
Observation noise5

Arrow keys to drive manually. Sensor range shown as a circle around the robot.

Try this: Drive the robot in a big loop, then return to the start. Watch how re-observing old landmarks tightens the entire map. That's loop closure in action!
🏗 Design Challenge You're the Architect: Warehouse Drone SLAM ✓ ATTEMPTED
A logistics company needs autonomous inventory-scanning drones flying through a 10,000 m² warehouse. The drones must build and maintain a map, localize within 10cm accuracy, and handle dynamic obstacles (forklift drivers, moving pallets). Design the SLAM system.
Sensors
2D lidar (30Hz, 30m range) + wheel/prop odometry + downward camera
Compute
Jetson Orin NX (8GB, 70 TOPS) — shared with obstacle avoidance
Environment
10,000 m², metal shelves (reflective for lidar), repetitive aisles
Dynamics
~20 forklifts moving at any time, pallets relocated daily
Accuracy
<10cm position, <2° heading, must not drift over 8-hour shifts
Startup
Cold start (no prior map) must work; subsequent flights use saved map
1. EKF-SLAM, particle filter, or graph-based? The warehouse has ~50,000 potential features. What's feasible on Orin?
2. How do you handle perceptual aliasing? Every aisle looks identical to lidar. What breaks nearest-neighbor matching?
3. Dynamic obstacles: forklifts appear as "landmarks" then disappear. How do you prevent them from corrupting the map?
4. The 8-hour shift requirement means the map must be updated over time (shelves move). Lifelong SLAM or periodic re-mapping?
5. Multi-drone: 5 drones share the warehouse. How do they share map updates without bandwidth explosion?

1. Graph-SLAM (specifically Cartographer/LOAM variants). EKF-SLAM is O(N²) and dies at 50k features. Particle filters need too many particles for 10cm accuracy in 10,000m². Graph-SLAM with sliding window (keep last 500 poses, marginalize old ones) stays real-time on Orin. Use scan matching (ICP/NDT) instead of individual features to handle repetitive geometry.

2. Multi-modal constraints + aisle numbering. Pure lidar geometry is ambiguous in repetitive aisles. Solutions: (a) downward camera reads floor markings/barcodes for absolute localization (like waypoints), (b) count aisle transitions as topological constraints, (c) use multi-hypothesis tracking — maintain 2-3 loop closure candidates until one can be disambiguated.

3. Dynamic object filtering. (a) Compare current scan to static map — points that don't match the map are classified as dynamic and excluded from SLAM. (b) Use occupancy grid with decay: cells revert to "unknown" after not being observed as occupied for N scans. (c) Temporal consistency check: landmarks must persist for >5 observations before becoming part of the permanent map.

4. Hybrid: persistent static layer + ephemeral dynamic layer. Walls and shelving racks form the static map (updated monthly). Pallet positions form a dynamic layer (updated per-flight). The static map provides localization; the dynamic layer provides obstacle avoidance and inventory tracking. This is essentially "lifelong SLAM with semantic labels."

5. Centralized map server with delta updates. Each drone sends only its new pose graph nodes + scan descriptors to a central server. Server performs multi-robot graph optimization and broadcasts compressed map deltas. Bandwidth: ~50KB/s per drone (pose + descriptor), not full scans. The drones download "regions of interest" when entering a new area.

Chapter 10: Connections & Beyond

Classical SLAM is the foundation, but the field has exploded in the last decade. Modern systems combine visual features, inertial measurements, and even neural networks. Here's where SLAM connects to everything else.

SystemWhat It AddsWhen To Use
Visual SLAM (ORB-SLAM)Camera features + bundle adjustmentCamera-only systems
VIO (Visual-Inertial Odometry)Tightly coupled camera + IMUDrones, AR/VR headsets
LiDAR SLAM (LOAM)Point cloud matchingSelf-driving cars, surveying
NeRF/3DGS SLAMNeural scene representationDense 3D reconstruction
Related microLessons: The EKF is the heart of EKF-SLAM (see Kalman Filter lesson). Bayes filters are the general framework that all SLAM filters instantiate. Particle filters power FastSLAM.
Open problems: Long-term SLAM (maps that update over months), dynamic environments (moving objects), multi-robot SLAM (sharing maps), and semantic SLAM (understanding what objects are, not just where).
🔗 Pattern Recognition
Factor Graphs = Message Passing = Attention
Graph-SLAM
Pose nodes exchange information through shared measurements. Each optimization iteration propagates corrections through the graph. Constraints = edges.
Transformer Attention
Token nodes exchange information through attention weights. Each layer propagates context through the sequence. Similarity = edge weight. → Transformer lesson

The deep structure: both are message passing on graphs. In SLAM, messages are corrections weighted by measurement precision. In attention, messages are value vectors weighted by query-key similarity. Both iteratively refine node estimates by aggregating neighbor information. Graph neural networks, belief propagation, and loopy BP all share this backbone.

Modern SLAM systems (like those using learned features) sometimes use a GNN to predict loop closures. Why is this a natural fit given the message-passing structure above?

"SLAM is not one problem, but a family of problems — and no single solution dominates all domains."
— Cadena et al., "Past, Present, and Future of SLAM"

You now understand the core concepts behind every SLAM system. From vacuum robots to self-driving cars, the chicken-and-egg problem is the same — only the sensors and scale change.