The Bayes filter: the universal algorithm that all probabilistic robotics builds upon.
A robot is moving through a corridor. It cannot directly observe where it is. All it has are noisy sensor readings and the knowledge of what motor commands it has issued. From this imperfect information, it must figure out its location.
This is the state estimation problem: infer the hidden state of the world from a sequence of actions and observations. The state might be the robot's position and orientation, or it might include the positions of obstacles, the charge level of the battery, or the locations of other agents. The key property is that the state is not directly observable — it must be inferred.
This chapter derives the Bayes filter from scratch. By the end, you'll understand the two-step predict-update cycle that underlies every probabilistic robotics algorithm — from the simplest histogram filter to the most sophisticated SLAM system.
Before diving into the filter, let's make sure the probability toolbox is sharp. We need just a few key concepts.
Random variable: A quantity whose value is uncertain. We write X for the variable and x for a specific value it might take. P(X = x) or just p(x) is the probability that X takes value x.
Joint probability: p(x, y) is the probability that X = x AND Y = y simultaneously.
Conditional probability: p(x | y) is the probability of x given that we know y. Defined as p(x | y) = p(x, y) / p(y).
Marginalization (sum rule): If we know the joint p(x, y) but only care about x, we sum out y:
For continuous variables, replace the sum with an integral. This is called the law of total probability. It says: to find p(x), consider all the ways x could happen (all possible values of y) and add up their contributions.
Bayes' rule: The star of the show. It lets us "flip" a conditional:
| Term | Name | In robotics |
|---|---|---|
| p(x | y) | Posterior | Belief about state given data |
| p(y | x) | Likelihood | Sensor model: "how likely is this reading if the robot is at x?" |
| p(x) | Prior | What we believed before this observation |
| p(y) | Evidence | Normalizing constant (often computed by summing numerator) |
The state xt at time t is a complete description of the world, as far as the robot cares. For a mobile robot on a flat floor, the state might be (x, y, θ) — position and heading. For a robot arm, it might be joint angles and velocities. The state is the minimal information needed to predict the future (given future actions).
The robot never observes the state directly. Instead, it has two types of data:
Sensor data zt
Observations of the environment: laser scans, camera images, sonar pings. These carry information about the state but are corrupted by noise.
Control data ut
Motor commands or odometry: "drive forward 1 meter," "turn left 30 degrees." These change the state but imperfectly due to actuator noise.
The belief bel(xt) is the robot's internal probability distribution over the state at time t, given all the data so far:
This is a function — for every possible state xt, it gives a probability (or probability density). A sharp, peaked belief means "I'm pretty sure where I am." A broad, flat belief means "I have no idea."
At every time step, the robot goes through the same cycle:
Two probabilistic models govern this loop:
Motion model p(xt | ut, xt-1): Given the previous state and the control command, what is the distribution over the new state? This encodes how the robot moves and how noisy its actuators are. If the robot commands "drive 1 meter forward," this distribution might be a Gaussian centered at 1 meter with some standard deviation reflecting wheel slip.
Measurement model p(zt | xt): Given the current state, what sensor readings would we expect? This encodes the physics of the sensor. For a laser pointing at a wall 3 meters away, this might be a Gaussian centered at 3 meters with some sensor noise.
Here it is — the most important algorithm in this entire book. Every filter in every subsequent chapter is a special case of this.
pseudocode # Bayes Filter Input: bel(xt-1), ut, zt # Prediction step (incorporate motion) for each xt: bel̅(xt) = ∫ p(xt | ut, xt-1) · bel(xt-1) dxt-1 # Update step (incorporate measurement) for each xt: bel(xt) = η · p(zt | xt) · bel̅(xt) return bel(xt)
Two lines. That's it. The entire Bayes filter is two operations applied at every time step:
Prediction (line 1):
For each possible new state xt, sum up the probability of arriving there from every possible previous state xt-1, weighted by how likely that previous state was. This "smears" the belief forward through the motion model. The belief gets wider (more uncertain).
Update (line 2):
Multiply the prediction by the likelihood of the observed sensor reading at each state. States consistent with the measurement get boosted; inconsistent states get suppressed. Then normalize (η). The belief gets sharper (less uncertain).
Let's unpack the prediction step. Written out fully:
This is a convolution of the old belief with the motion model. Think of it as taking each "particle" of probability in the old belief and pushing it forward through the motion model, accounting for the noise.
A concrete example: suppose the robot is at position 5 (with some uncertainty) and it commands "move right by 2." The motion model says the actual motion is Gaussian with mean 2 and standard deviation 0.5. Then:
• The probability mass at position 5 gets spread over a Gaussian centered at 7.
• The probability mass at position 4.5 (also possible) gets spread over a Gaussian centered at 6.5.
• All these shifted Gaussians are added up. The result is a new, wider distribution.
In the discrete case (histogram filter), the integral becomes a sum:
For each possible new state xt, loop over all possible previous states, multiply the transition probability by the old belief at that state, and sum. This is just matrix-vector multiplication: bel̅ = T · bel, where T is the transition matrix.
After prediction, we have bel̅(xt) — the belief after motion but before sensing. Now a measurement zt arrives. The update step is:
This is a pointwise multiplication followed by normalization. For each possible state xt, multiply the predicted belief by the likelihood that we'd observe zt if we were actually in state xt. Then normalize so the result integrates to 1.
Geometrically: the measurement model p(zt | xt) acts like a "window" or "mask" over the predicted belief. States where the measurement is likely get amplified. States where the measurement is unlikely get suppressed.
The update step can be dramatically informative. If the robot is uncertain about its position (broad bel̅) and then sees a distinctive landmark, the measurement model will have a sharp peak at the correct position. Multiplying the broad prior by the sharp likelihood produces a sharp posterior — the robot suddenly knows where it is.
But beware: the update step can also be misleading if the measurement model is wrong. If the robot misidentifies a landmark, the likelihood peaks in the wrong place, and the belief shifts to the wrong state. Robust sensor models and multi-hypothesis tracking (Chapter 7) help mitigate this.
This is the canonical example from the textbook. A robot is in a hallway with three doors. It doesn't know where it is. Its sensor can detect "door" or "no door" but with some error probability. Let's watch the Bayes filter in action.
The robot starts with uniform belief (no idea where it is). Doors at positions 2, 5, and 8 in a 10-cell hallway. Click Sense to observe "door" at current position, or Move Right to shift right by 1 cell. Watch the belief bars update.
Start by clicking Sense. The robot detects "door" and the belief shifts — the three door positions become more likely. But which door? The robot doesn't know yet; all three are plausible. Now click Move Right and then Sense again. If the second reading is also "door," positions consistent with two adjacent doors become dominant. After a few sense-move cycles, the belief converges to the correct position.
The Bayes filter makes one critical assumption: the Markov property. It states that the current state xt is a sufficient statistic for the future. In other words, if you know xt, then knowing xt-1, xt-2, ... gives you no additional predictive power.
This is what allows recursion. Without the Markov assumption, we'd need to condition on the entire history, and the filter would need to grow with every time step. With it, the belief at time t depends only on the belief at time t-1, plus the new action and observation.
In practice, the Markov assumption is always an approximation. Real robots have dynamics that depend on unobserved variables. But the assumption works surprisingly well because the belief itself acts as a buffer — by maintaining a full distribution rather than a point estimate, the system is robust to moderate violations.
| Markov holds | Markov violated |
|---|---|
| State includes all relevant variables | Unmodeled dynamics (wind, friction changes) |
| Sensor noise is independent across readings | Correlated sensor errors (bias drift) |
| Environment is static or included in state | Dynamic objects not modeled (people walking) |
This chapter established the mathematical backbone of probabilistic robotics: the Bayes filter.
| Concept | Key point |
|---|---|
| State xt | Hidden variable we want to estimate (e.g., robot pose) |
| Belief bel(xt) | Probability distribution over states, encoding all knowledge |
| Prediction | Convolve belief with motion model → belief gets wider |
| Update | Multiply by measurement likelihood → belief gets sharper |
| Markov assumption | Current state is sufficient for predicting the future |