microSLAM — Simultaneous Localization and Mapping

Chapter 0: The Chicken-and-Egg Problem

Imagine you wake up in a dark building with a flashlight. You want to draw a floor plan (a map). But to mark anything on the map, you need to know where you are. And to know where you are, you need... a map. It's a chicken-and-egg problem.

SLAM (Simultaneous Localization And Mapping) solves both at the same time. As the robot moves, it observes landmarks, estimates its own pose, and builds a map — all in one continuous loop. It's one of the fundamental problems in robotics.

The core idea: You can't separate "where am I?" from "what does the world look like?" They're coupled. Improving your map improves your position estimate, and improving your position estimate improves your map. SLAM exploits this feedback loop.

The SLAM Problem

A robot (green) moves through unknown space, sensing landmarks (yellow). It must estimate both its own path and the landmark positions.

Check: Why is SLAM called a "chicken-and-egg" problem?

Because robots look like chickens You need a map to localize, but need your location to build the map Because the algorithm only works with two sensors

🔗 Pattern Recognition

SLAM Extends the Kalman Filter State Vector

Kalman Filter

Track ONE thing (e.g. a ball's position) with state x and covariance P. Predict → Update loop.

EKF-SLAM

Track the robot AND all landmarks jointly. Same predict → update loop, but the state vector keeps growing. → Kalman Filter lesson

The deep connection: SLAM is just a Kalman filter where you keep appending new states every time you discover something. The predict/update structure is identical — only the dimensionality grows.

If a Kalman filter tracks a single ball with a 4D state [x, y, vx, vy], what would the state look like if you tracked 3 balls simultaneously? How is that analogous to SLAM?

Chapter 1: What Is a Map?

A "map" in robotics isn't necessarily a picture. It's a representation of the environment that the robot can use for navigation and planning. There are several common types, each with tradeoffs.

Landmark Map

A list of distinctive point features with (x,y) coordinates. Sparse, compact, great for localization.

↓

Occupancy Grid

A 2D grid where each cell stores the probability of being occupied. Dense, good for path planning.

↓

Topological Map

A graph of places and connections. "Kitchen connects to hallway." Abstract, ignores geometry.

Map Representations

Toggle between three representations of the same room. Each shows the same space differently.

Classical SLAM typically uses landmark maps — a sparse set of distinctive points. This keeps the math tractable. Modern SLAM systems often produce dense maps (point clouds, meshes) as a byproduct.

Check: Which map type stores a probability for each grid cell?

Landmark map Occupancy grid Topological map

Chapter 2: Robot Pose & Coordinates

A robot's pose in 2D is three numbers: position (x, y) and heading angle θ. Together, they describe where the robot is and which direction it's facing. This is a rigid body transform — it tells us how to convert from the robot's frame to the world frame.

pose = (x, y, θ) where θ is the heading angle

When the robot moves, its new pose depends on the old pose plus a motion command (drive forward, turn). But real motors are imperfect, so every motion adds noise. Over time, this odometry drift accumulates and the robot becomes lost.

Odometry Drift

Watch how odometry (red dashed) drifts away from the true path (teal). Without corrections, errors compound.

Motion noise0.030

Key problem: Odometry drift is why robots can't just dead-reckon forever. After 100 meters, a 1% error means you're a full meter off. SLAM corrects this drift by observing landmarks.

Check: What three quantities define a robot's 2D pose?

Position x, position y, and heading θ Speed, acceleration, and jerk Roll, pitch, and yaw

Chapter 3: Feature Extraction (Frontend)

Before the robot can localize or map, it needs to find distinctive features in its sensor data. These are things like corners, edges, blobs — anything that looks unique and can be reliably re-detected later. This is called the frontend of a SLAM system.

Each detected feature gets a descriptor — a compact numerical summary that captures what it looks like. Think of it as a fingerprint. When the robot sees a feature again from a different viewpoint, the descriptor helps match it to the stored version.

Raw Sensor Data

Camera image, lidar scan, sonar pings

↓

Feature Detection

Find corners (Harris), blobs (SIFT), edges (Canny)

↓

Descriptor Extraction

Compute a fingerprint vector for each feature

↓

Feature List

[(x₁,y₁,d₁), (x₂,y₂,d₂), ...]

Corner Detection

Features are detected at corners and distinctive points (yellow). Move the threshold slider to see how detection sensitivity changes.

Detection threshold0.50

Good features are distinctive (not all walls look the same), repeatable (detected from different angles), and compact (quick to compare). Corners are ideal; flat surfaces are terrible.

Check: What is a feature descriptor?

The raw pixel values of the image A compact numerical fingerprint for re-identifying a feature The GPS coordinate of the feature

Chapter 4: Data Association

The robot sees a corner. Is it the same corner it saw 30 seconds ago, or a completely different one? This matching problem is called data association, and getting it wrong is catastrophic — if you match to the wrong landmark, your map becomes inconsistent.

The simplest approach is nearest neighbor: match each observation to the closest known landmark. But this fails when landmarks are close together. More robust methods include JCBB (Joint Compatibility Branch and Bound), which considers all matches jointly.

Data Association Challenge

Current observations (teal) must match to known landmarks (yellow). Lines show matches. Red lines are wrong matches. Click "Scramble" to see how ambiguity arises.

Wrong matches are poison. A single incorrect data association can corrupt the entire map. This is why robust matching algorithms and outlier rejection are critical in real SLAM systems.

Method	Approach	Robustness
Nearest Neighbor	Match to closest landmark	Low
Mahalanobis Gate	NN but weighted by uncertainty	Medium
JCBB	Check all matches jointly	High
RANSAC	Random sample consensus	High

Check: What is the data association problem?

Matching current observations to previously seen landmarks Transmitting data between two robots Converting sensor data to a database format

Checkpoint — Before you move on

A robot observes 3 features. Two match known landmarks easily, but the third is equidistant from two known landmarks. Explain: (1) Why is this ambiguity dangerous? (2) How would a Mahalanobis gate help? (3) What information besides distance could resolve it?

✓ Gate cleared

Model Answer

1. Why it's dangerous: If you match to the wrong landmark, the EKF update will "pull" both the robot pose and the landmark estimate toward an incorrect configuration. Worse, because EKF-SLAM maintains correlations between ALL landmarks, this one bad match can corrupt estimates for landmarks you aren't even currently observing. The covariance becomes inconsistent — the filter thinks it's certain about wrong positions.

2. Mahalanobis gate: Instead of raw Euclidean distance, the Mahalanobis distance weights by the innovation covariance S = H P H^T + R. A landmark with high uncertainty (large P block) will have a larger "gate" around it, while a well-localized landmark has a tight gate. This prevents matching to a precise landmark that happens to be nearby in space but would require an implausible robot pose correction.

3. Beyond distance: Feature descriptors! Two landmarks may be spatially close but look completely different (a corner vs. an edge). JCBB goes further — it checks whether the JOINT set of matches is mutually consistent, rejecting individual matches that would require contradictory robot poses.

Chapter 5: EKF-SLAM

The classic approach. We augment the robot's state vector with every landmark position. If we have N landmarks, the state is [x, y, θ, l_1x, l_1y, ..., l_Nx, l_Ny] — a vector of size 3+2N. The covariance matrix is (3+2N) × (3+2N), tracking correlations between everything.

The EKF (Extended Kalman Filter) handles the nonlinear motion and observation models by linearizing them at each step. Predict: move robot, grow uncertainty. Update: observe a landmark, shrink uncertainty — for the robot and all correlated landmarks.

State: x = [robot pose, landmark 1, landmark 2, ...]
Covariance: P is (3+2N) × (3+2N) — grows quadratically!

EKF-SLAM: Growing State Vector

Watch the state vector grow as new landmarks are discovered. Uncertainty ellipses (cyan) show correlations. Click "Step" or toggle auto-play.

Landmarks: 0 | State dim: 3

The O(N²) curse: Every update touches the entire covariance matrix. With 1000 landmarks, that's a million-entry matrix to update per step. EKF-SLAM doesn't scale to large environments.

Check: Why does EKF-SLAM struggle with large environments?

The robot moves too fast The covariance matrix grows quadratically with the number of landmarks It can only handle one sensor type

🔨 Derivation Why does EKF-SLAM covariance grow quadratically? ▶ ✓ ATTEMPTED

In EKF-SLAM, the state vector is x = [r, l₁, l₂, ..., l_N] where r is the 3D robot pose and each l_i is a 2D landmark position. The covariance matrix P tracks uncertainty and correlations between all state elements.

Your task: Show that (1) the covariance matrix is (3+2N) × (3+2N), (2) explain why the off-diagonal blocks are non-zero and what they physically mean, and (3) derive why a single EKF update step costs O(N²) operations.

P has block structure: P_rr (3×3 robot uncertainty), P_{rl_i} (3×2 cross-correlation between robot and landmark i), and P_{l_il_j} (2×2 cross-correlation between landmarks). How many such blocks exist?

When you observe landmark i from robot pose r, the EKF update adjusts BOTH the landmark and the robot pose. Since r is shared across all observations, updating r changes the "context" for all other landmarks. The Kalman gain K multiplied through the full state creates cross-correlations: "if the robot is actually 1m to the left, then ALL landmarks must also shift."

The EKF update is: P ← (I - KH)P. The Kalman gain K = PH^TS^-1 requires multiplying P (size (3+2N)×(3+2N)) by H^T. Even though H is sparse (only involves the observed landmark), the update (I-KH)P touches every element of P because K has non-zero entries for all state elements.

Step 1: State and covariance dimensions.

State x = [x_r, y_r, θ_r, x₁, y₁, x₂, y₂, ..., x_N, y_N]^T has dimension d = 3 + 2N. The covariance P is d×d = (3+2N)² entries.

Step 2: Why off-diagonals are non-zero.

Consider observing landmark j. The observation model is z = h(x_r, l_j) + noise. The Jacobian H has non-zero blocks only for x_r and l_j. BUT the Kalman gain K = PH^TS^-1 multiplies the FULL covariance P by H^T. Since P_{rl_i} blocks are non-zero (for previously observed landmarks), K has non-zero entries for ALL landmarks, not just the observed one.

Physically: P_{l_il_j} ≠ 0 means "landmarks i and j were both observed from the same uncertain robot pose, so their errors are correlated." If we learn landmark i is 1m east of where we thought, we gain information about landmark j too.

Step 3: O(N²) per update.

The covariance update P ← P - KSK^T (Joseph form) requires forming KSK^T which is (3+2N)×(3+2N). Even with sparse H, K is a (3+2N)×2 matrix (2 because observation is [range, bearing]), so KSK^T is a rank-2 update to the full P. Computing this outer product is O((3+2N)²) = O(N²).

The key insight: Correlations are both EKF-SLAM's strength (observing one landmark improves ALL estimates) and its curse (you can never drop the off-diagonal blocks without losing optimality, and they grow as O(N²)).

💻 Build It Implement EKF-SLAM Landmark Initialization ▶ ✓ ATTEMPTED

When the robot observes a feature that doesn't match any known landmark, it must augment the state vector and covariance matrix. Given a range-bearing observation (r, φ) from the robot's current pose, initialize the new landmark position and expand P.

signature def augment_state(mu, P, z_range, z_bearing, R_obs): """ Augment EKF-SLAM state with a new landmark. Args: mu: np.array, shape (3+2N,) -- current state [x,y,theta, lm1x,lm1y, ...] P: np.array, shape (3+2N, 3+2N) -- current covariance z_range: float -- measured range to new landmark z_bearing: float -- measured bearing to new landmark (relative to robot heading) R_obs: np.array, shape (2,2) -- observation noise covariance [[sigma_r^2, 0],[0, sigma_phi^2]] Returns: mu_new: np.array, shape (3+2N+2,) -- augmented state P_new: np.array, shape (3+2N+2, 3+2N+2) -- augmented covariance """

Test case

Robot at (5, 3, π/4), observes landmark at range=2, bearing=0 (straight ahead).
Expected landmark position: (5 + 2cos(π/4), 3 + 2sin(π/4)) = (6.41, 4.41).
New P should be (5×5) if starting with 1 existing landmark.

The landmark position in world coords is: l_x = x_r + r·cos(θ_r + φ), l_y = y_r + r·sin(θ_r + φ). The Jacobian G_r = ∂l/∂(x,y,θ) is needed for cross-correlations: P_{new row/col} = G_r · P_rr.

python
import numpy as np

def augment_state(mu, P, z_range, z_bearing, R_obs):
    n = len(mu)
    xr, yr, theta = mu[0], mu[1], mu[2]

    # 1. Landmark position from inverse observation model
    angle = theta + z_bearing
    lx = xr + z_range * np.cos(angle)
    ly = yr + z_range * np.sin(angle)

    # 2. Augment state vector
    mu_new = np.append(mu, [lx, ly])

    # 3. Jacobian of inverse obs model w.r.t. robot pose
    Gr = np.array([
        [1, 0, -z_range * np.sin(angle)],
        [0, 1,  z_range * np.cos(angle)]
    ])

    # 4. Jacobian w.r.t. observation [range, bearing]
    Gz = np.array([
        [np.cos(angle), -z_range * np.sin(angle)],
        [np.sin(angle),  z_range * np.cos(angle)]
    ])

    # 5. Augment covariance
    P_new = np.zeros((n + 2, n + 2))
    P_new[:n, :n] = P  # existing block unchanged

    # Cross-correlation: new landmark with existing state
    Prr = P[:3, :3]
    cross = Gr @ P[:3, :]  # (2 x n) cross-correlation row
    P_new[n:, :n] = cross
    P_new[:n, n:] = cross.T

    # New landmark self-covariance
    P_new[n:, n:] = Gr @ Prr @ Gr.T + Gz @ R_obs @ Gz.T

    return mu_new, P_new

Bonus: What happens to the cross-correlation row (P_new[n:, :n]) as the robot's heading uncertainty grows? Why does this mean landmarks initialized during high uncertainty are "less useful" for future corrections?

⚔ Adversarial: The hidden power of correlations

You run EKF-SLAM and observe 50 landmarks. The covariance matrix P shows strong off-diagonal correlations between landmarks 1-20 (observed early) but near-zero correlations between landmarks 40-50 (observed recently from uncertain poses). You then get a PERFECT GPS fix on the robot's current position.

Only the robot pose estimate improves Landmarks 40-50 improve most because they have high uncertainty Landmarks 1-20 improve most because they have strong correlations with the robot

Chapter 6: Particle Filter SLAM (FastSLAM)

FastSLAM takes a clever shortcut using Rao-Blackwellization. The insight: if you knew the robot's path exactly, the landmark positions would be conditionally independent. You wouldn't need that massive joint covariance matrix.

So FastSLAM uses particles to represent the robot's pose distribution. Each particle carries its own mini-EKF for each landmark. Since landmarks are independent given the path, each mini-EKF is just 2×2 — no quadratic blowup.

Particle (path hypothesis)

Each particle is one possible robot trajectory

↓

Per-particle landmark EKFs

Each landmark: independent 2×2 EKF. O(N) per particle.

↓

Resampling

Keep particles that match observations. Kill unlikely ones.

FastSLAM Particles

Each green dot is a particle (robot pose hypothesis). After resampling, particles cluster around the true pose. Landmarks shown in yellow.

Num particles50

Scaling: EKF-SLAM is O(N²) per update. FastSLAM is O(M·N) where M is the number of particles. With M=100 particles and N=1000 landmarks, that's 100K instead of 1M operations.

Check: What does Rao-Blackwellization exploit in FastSLAM?

Landmarks move with the robot Landmarks are conditionally independent given the robot's path The robot has perfect odometry

Chapter 7: Graph-Based SLAM

The modern workhorse. Instead of maintaining a filter, we build a graph. Each node is a robot pose (at some time) or a landmark position. Each edge is a constraint — an odometry measurement between poses, or a landmark observation. This forms a factor graph.

SLAM becomes an optimization problem: find the node positions that best satisfy all constraints simultaneously. This is solved with iterative least-squares methods like Gauss-Newton. The key advantage: we can re-optimize the entire trajectory when new information arrives (like a loop closure).

x* = argmin Σ_edges || z_ij − h(x_i, x_j) ||²_Σ

Factor Graph

Blue nodes = robot poses. Yellow nodes = landmarks. Edges = constraints. Click "Optimize" to see least-squares adjustment pull everything into consistency.

Factor graphs are sparse. Each constraint only connects a few nodes. This means the system matrix (Hessian) is sparse, and we can use efficient sparse solvers. Libraries like g2o, GTSAM, and Ceres make this practical.

Approach	State	Updates	Scalability
EKF-SLAM	Current only	Recursive	O(N²)
FastSLAM	Current + particles	Sequential	O(M·N)
Graph-SLAM	All history	Batch optimize	Sparse O(N)

Check: In graph-based SLAM, what do the edges represent?

The robot's velocity Walls in the environment Constraints (odometry or observation measurements)

🔨 Derivation The Information Filter — Why Graph-SLAM is Sparse ▶ ✓ ATTEMPTED

The EKF maintains covariance P (dense). The information filter maintains the inverse: the information matrix Ω = P^-1 and information vector ξ = P^-1μ. In SLAM, Ω is naturally sparse — zero entries between landmarks that were never observed together.

Your task: Show that (1) Ω_ij = 0 when landmarks i and j were never co-observed from the same robot pose, (2) explain why this sparsity makes graph-SLAM efficient, and (3) show the connection between Ω and the graph's adjacency structure.

In a Gaussian, the precision matrix (inverse covariance) encodes conditional independence. Ω_ij = 0 means "i and j are conditionally independent given all other variables." Two landmarks are conditionally independent unless they share a direct constraint (were observed from the same pose).

The factor graph has an edge between nodes i and j if and only if there's a measurement connecting them. The information matrix has Ω_ij ≠ 0 in exactly the same pattern. The matrix IS the adjacency structure of the graph (weighted by measurement precision).

Optimizing the graph requires solving Ω·Δx = ξ − Ω·x. If Ω is sparse (each row has ~5-10 non-zeros instead of N), sparse Cholesky factorization solves this in O(N) to O(N¹·⁵) instead of O(N³). This is why graph-SLAM scales to millions of poses.

Step 1: Conditional independence → sparse precision.

In a Gaussian N(μ, P), the precision matrix Ω = P^-1 has a beautiful property: Ω_ij = 0 ⇔ x_i ⊥ x_j | x_\{i,j\} (conditional independence). Landmark i and landmark j are conditionally independent given the robot poses they were observed from. If they were NEVER observed from the same pose, they have no direct constraint — only indirect ones through shared poses.

Step 2: Factor graph = sparsity pattern.

Each factor (measurement) adds to Ω only in the rows/columns of the connected nodes. An odometry factor between poses x_k and x_k+1 adds to Ω_k,k, Ω_k,k+1, Ω_k+1,k, Ω_k+1,k+1. A landmark observation from pose x_k to landmark l_j adds to their mutual entries. The rest stays zero.

Step 3: Sparsity → efficiency.

The least-squares normal equations give H^TΣ^-1H Δx* = H^TΣ^-1e where H is the Jacobian. The matrix H^TΣ^-1H IS the information matrix Ω. With N poses and M landmarks, Ω is (3N+2M)×(3N+2M) but each row has only ~10 non-zeros (degree of the graph). Sparse Cholesky (e.g., CHOLMOD) exploits this for near-linear solve time.

The key insight: EKF-SLAM maintains P (dense, because all landmarks are marginally correlated). Graph-SLAM works with Ω (sparse, because most landmarks are conditionally independent). Same information, dual representation — but sparsity changes the computational game entirely.

🔨 Derivation Loop Closure Constraint Math ▶ ✓ ATTEMPTED

The robot is at pose x_k (current) and recognizes it's revisiting the location of pose x₀ (from the past). The relative transform between x_k and x₀ is measured as z_loop with covariance Σ_loop.

Your task: Write the loop closure constraint as a factor in the least-squares objective, and show why it causes ALL intermediate poses x₁...x_k-1 to shift when you re-optimize.

The loop closure adds a term: ||z_loop − (x_k ⊖ x₀)||²_{Σ_loop} to the objective. Here x_k ⊖ x₀ is the relative transform (what the pose difference SHOULD be if the graph is consistent).

Before loop closure, the only path from x₀ to x_k goes through x₁, x₂, ..., x_k-1 (the odometry chain). If the loop closure says "x_k should actually be right next to x₀" but odometry says otherwise, the error gets distributed across ALL intermediate edges. Each intermediate pose shifts proportionally.

Full objective function:

x* = argmin Σ_i=0..k-1 ||z_odom,i − (x_i+1 ⊖ x_i)||²_{Σ_odom} + ||z_loop − (x_k ⊖ x₀)||²_{Σ_loop}

Why ALL poses shift: The Gauss-Newton system H^TΩH Δx = H^TΩe has H as a banded matrix (tridiagonal-ish) for the odometry chain, PLUS an entry connecting x₀ and x_k. Solving this linear system propagates the loop closure correction through the chain. The correction at pose x_i is proportional to i/k — poses near x₀ barely move, poses near x_k move the most. This is because odometry between consecutive poses is "stiff" (low noise), so bending the chain is distributed uniformly.

The key insight: A loop closure with covariance Σ_loop much smaller than accumulated odometry uncertainty (Σ_odom × k) effectively "snaps" the trajectory back. The tighter the loop constraint, the more dramatic the correction.

Chapter 8: Loop Closure

The robot has been exploring for 10 minutes. Odometry has drifted badly. Then it returns to a place it visited at the start — and recognizes it. This is loop closure, and it's the most impactful event in SLAM. A single loop closure can correct accumulated drift across the entire trajectory.

Detection uses place recognition: comparing the current scene's features against a database of all previously visited places. Bag-of-words (DBoW) and learned descriptors are common approaches. Once a loop is detected, a new constraint is added to the graph, and re-optimization snaps everything into alignment.

Loop Closure Correction

The red path shows drifted odometry forming a loop. Click "Close Loop" to see the constraint propagate corrections backward through the entire trajectory.

This is why graph-SLAM wins: When a loop closure arrives, graph-SLAM re-optimizes all poses. EKF-SLAM can only update the current state. FastSLAM struggles because particles can't "go back in time." The ability to retroactively fix the past is graph-SLAM's superpower.

Check: Why is loop closure so important?

It corrects accumulated drift across the entire trajectory It adds more landmarks to the map It makes the robot drive faster

💥 Break-It Lab What Dies When You Remove SLAM Components? ▶ ✓ ATTEMPTED

A robot drives in a large loop, observing landmarks along the way. The system includes: loop closure detection, data association, and landmark observations. Toggle each off to see what breaks.

Remove Loop Closure ACTIVE

Failure mode: Without loop closure, odometry drift accumulates unbounded. The robot returns to the start but the map shows a gap — the endpoint doesn't match the startpoint. Over long trajectories, the map "unravels" like a loose spring. The error grows as O(√t) with time.

Wrong Data Association (50% errors) ACTIVE

Failure mode: Matching observations to wrong landmarks corrupts the map catastrophically. The EKF "confidently" updates positions toward incorrect values, and because of correlations, one bad match poisons nearby landmarks. The map doesn't just drift — it folds onto itself. Unrecoverable without rejecting bad matches.

Reduce Observations (1/5th landmarks visible) ACTIVE

Failure mode: With few landmark observations, the robot relies mostly on odometry. Uncertainty grows between observations, and when it finally does see a landmark, the correction is large and jerky (high innovation). The map becomes sparse and poorly constrained — landmark estimates have high variance, making future data association harder (vicious cycle).

⚔ Adversarial: The mysterious 2-meter offset

Your SLAM system maps a 200m indoor corridor loop. The map looks perfect — walls are straight, corners are sharp. But when the robot returns to the starting position, the map endpoint is 2 meters away from the start. The raw odometry error over 200m is only 0.5% (1m expected). Loop closure is enabled and "fires" (you see it in the log). What's failing?

The IMU has too much bias The map needs more landmarks The loop closure constraint has an incorrect relative transform (perceptual aliasing or wrong match) The covariance matrix overflowed

Chapter 9: 2D SLAM Simulator

Time to see it all come together. Below is a 2D room with landmarks. A robot explores the space, observing landmarks within its sensor range. Watch the map build in real-time. Toggle between the EKF view (uncertainty ellipses) and the graph view (nodes and edges). Use arrow keys or click "Auto-explore" to drive the robot.

Live 2D SLAM

Landmarks found: 0

Sensor range100

Observation noise5

Arrow keys to drive manually. Sensor range shown as a circle around the robot.

Try this: Drive the robot in a big loop, then return to the start. Watch how re-observing old landmarks tightens the entire map. That's loop closure in action!

🏗 Design Challenge You're the Architect: Warehouse Drone SLAM ▶ ✓ ATTEMPTED

A logistics company needs autonomous inventory-scanning drones flying through a 10,000 m² warehouse. The drones must build and maintain a map, localize within 10cm accuracy, and handle dynamic obstacles (forklift drivers, moving pallets). Design the SLAM system.

Sensors

2D lidar (30Hz, 30m range) + wheel/prop odometry + downward camera

Compute

Jetson Orin NX (8GB, 70 TOPS) — shared with obstacle avoidance

Environment

10,000 m², metal shelves (reflective for lidar), repetitive aisles

Dynamics

~20 forklifts moving at any time, pallets relocated daily

Accuracy

<10cm position, <2° heading, must not drift over 8-hour shifts

Startup

Cold start (no prior map) must work; subsequent flights use saved map

1. EKF-SLAM, particle filter, or graph-based? The warehouse has ~50,000 potential features. What's feasible on Orin?

2. How do you handle perceptual aliasing? Every aisle looks identical to lidar. What breaks nearest-neighbor matching?

3. Dynamic obstacles: forklifts appear as "landmarks" then disappear. How do you prevent them from corrupting the map?

4. The 8-hour shift requirement means the map must be updated over time (shelves move). Lifelong SLAM or periodic re-mapping?

5. Multi-drone: 5 drones share the warehouse. How do they share map updates without bandwidth explosion?

1. Graph-SLAM (specifically Cartographer/LOAM variants). EKF-SLAM is O(N²) and dies at 50k features. Particle filters need too many particles for 10cm accuracy in 10,000m². Graph-SLAM with sliding window (keep last 500 poses, marginalize old ones) stays real-time on Orin. Use scan matching (ICP/NDT) instead of individual features to handle repetitive geometry.

2. Multi-modal constraints + aisle numbering. Pure lidar geometry is ambiguous in repetitive aisles. Solutions: (a) downward camera reads floor markings/barcodes for absolute localization (like waypoints), (b) count aisle transitions as topological constraints, (c) use multi-hypothesis tracking — maintain 2-3 loop closure candidates until one can be disambiguated.

3. Dynamic object filtering. (a) Compare current scan to static map — points that don't match the map are classified as dynamic and excluded from SLAM. (b) Use occupancy grid with decay: cells revert to "unknown" after not being observed as occupied for N scans. (c) Temporal consistency check: landmarks must persist for >5 observations before becoming part of the permanent map.

4. Hybrid: persistent static layer + ephemeral dynamic layer. Walls and shelving racks form the static map (updated monthly). Pallet positions form a dynamic layer (updated per-flight). The static map provides localization; the dynamic layer provides obstacle avoidance and inventory tracking. This is essentially "lifelong SLAM with semantic labels."

5. Centralized map server with delta updates. Each drone sends only its new pose graph nodes + scan descriptors to a central server. Server performs multi-robot graph optimization and broadcasts compressed map deltas. Bandwidth: ~50KB/s per drone (pose + descriptor), not full scans. The drones download "regions of interest" when entering a new area.

Chapter 10: Connections & Beyond

Classical SLAM is the foundation, but the field has exploded in the last decade. Modern systems combine visual features, inertial measurements, and even neural networks. Here's where SLAM connects to everything else.

System	What It Adds	When To Use
Visual SLAM (ORB-SLAM)	Camera features + bundle adjustment	Camera-only systems
VIO (Visual-Inertial Odometry)	Tightly coupled camera + IMU	Drones, AR/VR headsets
LiDAR SLAM (LOAM)	Point cloud matching	Self-driving cars, surveying
NeRF/3DGS SLAM	Neural scene representation	Dense 3D reconstruction

Related microLessons: The EKF is the heart of EKF-SLAM (see Kalman Filter lesson). Bayes filters are the general framework that all SLAM filters instantiate. Particle filters power FastSLAM.

Open problems: Long-term SLAM (maps that update over months), dynamic environments (moving objects), multi-robot SLAM (sharing maps), and semantic SLAM (understanding what objects are, not just where).

🔗 Pattern Recognition

Factor Graphs = Message Passing = Attention

Graph-SLAM

Pose nodes exchange information through shared measurements. Each optimization iteration propagates corrections through the graph. Constraints = edges.

Transformer Attention

Token nodes exchange information through attention weights. Each layer propagates context through the sequence. Similarity = edge weight. → Transformer lesson

The deep structure: both are message passing on graphs. In SLAM, messages are corrections weighted by measurement precision. In attention, messages are value vectors weighted by query-key similarity. Both iteratively refine node estimates by aggregating neighbor information. Graph neural networks, belief propagation, and loopy BP all share this backbone.

Modern SLAM systems (like those using learned features) sometimes use a GNN to predict loop closures. Why is this a natural fit given the message-passing structure above?

"SLAM is not one problem, but a family of problems — and no single solution dominates all domains."

— Cadena et al., "Past, Present, and Future of SLAM"

You now understand the core concepts behind every SLAM system. From vacuum robots to self-driving cars, the chicken-and-egg problem is the same — only the sensors and scale change.

Understand ClassicalSLAM