The Complete Beginner's Path

Understand the Bayes Filter

The algorithm that lets a robot figure out where it is by combining movement and sensing — the parent of every probabilistic filter you'll ever meet.

Prerequisites: Basic probability + Curiosity. That's it.
9 Chapters · 8+ Simulations · 0 Assumed Knowledge

Chapter 0: Why Bayes?

You're a robot in a hallway. You can't see yourself from above. You have no GPS. All you have is a noisy sensor that can detect whether you're next to a door, and wheels that sometimes slip. Where are you?

You don't know — not for certain. But you can believe. You start with a rough guess spread across every possible position. Then you sense something: "I see a door." Some positions have doors, some don't, so your belief shifts. Then you move forward and sense again. With each sense-move cycle, your belief sharpens until you're nearly certain where you are.

The core idea: You can't eliminate uncertainty, but you can manage it. The Bayes filter maintains a probability distribution over all possible states, and refines it with every action and observation. It's how robots think about "where am I?"
Lost in the Hallway

The robot starts with no idea where it is (uniform belief). Click Sense or Move to watch the belief evolve.

Check: Why can't the robot just trust its wheel odometry to know its position?

Chapter 1: Bayes' Theorem — Updating What You Believe

Before we build a filter, we need one formula. But let's not start with the formula — let's start with counting.

Imagine 100 people. 20 are programmers. Of those 20 programmers, 15 drink coffee. Of the 80 non-programmers, 40 drink coffee. You meet someone drinking coffee. What's the probability they're a programmer?

Total coffee drinkers: 15 + 40 = 55. Programmers among them: 15. So: P(programmer | coffee) = 15/55 ≈ 0.27. That's Bayes' theorem in action — you updated your belief about someone being a programmer after observing they drink coffee.

P(A|B) = P(B|A) · P(A) / P(B)
Translation: posterior = likelihood × prior / evidence. The posterior is your updated belief. The prior is what you believed before. The likelihood is how well the evidence fits each hypothesis. The evidence normalizes everything to sum to 1.
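Both routes to the answer — direct counting and the formula above — give the same number, which a few lines of Python confirm (all figures come from the example):

```python
# Counting route: 100 people, 20 programmers (15 drink coffee),
# 80 non-programmers (40 drink coffee).
p_by_counting = 15 / (15 + 40)

# Formula route: posterior = likelihood × prior / evidence
prior = 20 / 100          # P(programmer)
likelihood = 15 / 20      # P(coffee | programmer)
evidence = 55 / 100       # P(coffee)
posterior = likelihood * prior / evidence

print(round(p_by_counting, 2))   # → 0.27
print(round(posterior, 2))       # → 0.27
```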
Interactive Venn Diagram

Adjust the overlap to see how P(A|B) changes. A = hypothesis, B = evidence.

P(A) (prior) = 0.30
P(B|A) (likelihood) = 0.75
P(B|¬A) = 0.50
P(A|B) = 0.39
Check: In Bayes' theorem, what does the "prior" represent?

Chapter 2: Belief — A Distribution Over States

The robot's world is a 1D hallway divided into discrete cells. At any moment, the robot's belief is a probability distribution — a number for each cell saying "how likely am I to be here?" All the numbers sum to 1.

When the robot knows nothing, the belief is uniform: equal probability everywhere. As it gathers evidence, some cells become more probable and others shrink. The belief is the robot's internal mental model of where it might be.

Belief bel(x) — a probability for every cell: [0.05, 0.12, 0.03, ...]
Constraint — Σᵢ bel(xᵢ) = 1: the probabilities must sum to 1
Visualization — a histogram with one bar per cell
Belief Histogram

Click cells to manually assign probability. The histogram always renormalizes to sum to 1.

Discrete vs continuous: In this lesson we use discrete cells (a histogram). The Kalman filter uses a continuous Gaussian. Particle filters use samples. But the concept is identical: belief = a distribution over states.
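The constraint is easy to maintain in code: keep raw weights and divide by their sum. A minimal sketch (the cell count and weights are illustrative):

```python
def normalize(weights):
    """Scale non-negative weights so they sum to 1 — a valid belief."""
    total = sum(weights)
    return [w / total for w in weights]

n_cells = 10
uniform = [1.0 / n_cells] * n_cells     # "I have no idea": equal everywhere
raw = [1.0, 4.0, 2.0, 1.0]              # arbitrary mass assigned to 4 cells
bel = normalize(raw)
print(bel)                              # → [0.125, 0.5, 0.25, 0.125]
```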
Check: What must always be true about a belief distribution?

Chapter 3: The Motion Model — P(x'|x,u)

When the robot's wheels turn, it tries to move forward. But wheels slip. The robot might overshoot, undershoot, or stay put. The motion model P(x'|x,u) describes: "if I'm at cell x and I command motion u, what's the probability I end up at cell x'?"

Mathematically, applying the motion model to a belief is a convolution: each cell "spreads" its probability to neighboring cells according to the motion kernel. The result is that the belief shifts in the direction of motion and blurs due to uncertainty. Information is lost. Entropy increases.

bel¯(x') = Σₓ P(x'|x, u) · bel(x)
Intuition: Motion always makes you less certain. If you were certain you were at cell 5 and you move right by 1, maybe now you're 80% at cell 6, 10% at cell 5, and 10% at cell 7. The belief smears out. Only sensing can sharpen it back.
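That intuition is just a three-element kernel. A sketch, assuming a move of +1 cell, an illustrative 0.1/0.8/0.1 undershoot/exact/overshoot split, and a hallway whose ends simply drop any mass that would leave it:

```python
def predict(bel, kernel):
    """Apply a commanded move of +1 cell with noise.

    kernel = [P(undershoot: stay put), P(exact: +1), P(overshoot: +2)].
    Mass that would leave the hallway is dropped.
    """
    n = len(bel)
    bel_bar = [0.0] * n
    for x, p in enumerate(bel):
        for k, pk in enumerate(kernel):   # k = 0, 1, 2 → move by k cells
            if 0 <= x + k < n:
                bel_bar[x + k] += p * pk
    return bel_bar

bel = [0.0] * 10
bel[5] = 1.0                              # certain: cell 5
bel_bar = predict(bel, [0.1, 0.8, 0.1])
print([round(b, 2) for b in bel_bar])
# → [0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.8, 0.1, 0.0, 0.0]
```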
Motion Convolution

Start with a spike belief, then apply motion. Watch how the belief blurs and shifts with each step.

Exact prob = 0.60
Overshoot = 0.15
Steps: 0
Check: What happens to the belief when the robot moves?

Chapter 4: The Sensor Model — P(z|x)

The robot has a door sensor. It returns "door" or "no door." But it's not perfect — sometimes it says "door" when there's none (false positive), and sometimes it misses a real door (false negative). The sensor model P(z|x) encodes: "if I'm truly at cell x, what's the probability of reading z?"

This is the likelihood function. For each cell, it tells us how well the sensor reading "fits" that position. Cells with doors get a high likelihood when the sensor says "door." Cells without doors get a low likelihood. This is the information that sharpens belief.

P(z = door | x) = 0.9 if cell x has a door,   0.1 otherwise
Likelihood vs probability: P(z|x) is NOT the probability of being at x. It's the probability of seeing z if you were at x. Confusing these is the #1 Bayes mistake. The likelihood tells us how to weight each hypothesis.
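The box above translates directly into a likelihood function. A sketch with a hypothetical door layout:

```python
DOORS = {2, 5, 8}              # cells that have doors (hypothetical layout)
HIT, FALSE_ALARM = 0.9, 0.1    # P("door" | door), P("door" | no door)

def sensor_model(z, x):
    """Likelihood P(z | x) for a binary door sensor."""
    has_door = x in DOORS
    if z == "door":
        return HIT if has_door else FALSE_ALARM
    return (1 - HIT) if has_door else (1 - FALSE_ALARM)

print(sensor_model("door", 5))   # → 0.9  (cell 5 has a door)
print(sensor_model("door", 3))   # → 0.1  (cell 3 does not)
```

Note that the likelihoods across cells need not sum to 1 — P(z|x) is a distribution over readings z, not over positions x.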
Sensor Likelihood

The hallway has doors at specific cells (orange). Toggle the sensor reading to see how the likelihood changes.

Hit = 0.9, False alarm = 0.1
Hit rate = 0.90
Check: If the sensor says "door" and cell 3 has a door, what is P(z="door"|x=3) with hit rate 0.9?

Chapter 5: The Predict Step

The predict step applies the motion model to the current belief. Every cell's probability mass gets redistributed according to how the robot might move. The result is a blurred, shifted version of the old belief.

This is mathematically a convolution: bel¯(x') = Σₓ P(x'|x,u) bel(x). In the discrete case, we slide a small kernel across the histogram and sum. The output is always smoother (less certain) than the input.

Before Predict
Sharp belief — peaked at a few cells
↓ apply motion model
After Predict
Blurred belief — spread across more cells
Watch the Predict Step

The teal histogram is before prediction. Click Predict to apply one motion step. Watch the histogram blur and shift right.

Motion amount = 1
Motion noise = 0.30
Entropy: low
Key insight: Prediction ALWAYS increases uncertainty. Every predict step adds entropy. If the robot just drives forward without sensing, the belief eventually becomes uniform — it has no idea where it is. Only the update step can fight this drift.
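The drift toward uniform is easy to demonstrate numerically. This sketch (cell count and kernel are illustrative; the hallway wraps around to keep the math clean) blurs a spike belief repeatedly and prints its entropy:

```python
import math

def predict(bel, kernel):
    """One blur-only predict step: circular convolution with a centered kernel."""
    n, half = len(bel), len(kernel) // 2
    bel_bar = [0.0] * n
    for x, p in enumerate(bel):
        for k, pk in enumerate(kernel):
            bel_bar[(x + k - half) % n] += p * pk   # wrap-around hallway
    return bel_bar

def entropy(bel):
    """Shannon entropy in bits; 0 = certain, log2(n) = no idea."""
    return -sum(p * math.log2(p) for p in bel if p > 0)

bel = [0.0] * 20
bel[10] = 1.0                                # perfectly certain
for step in range(5):
    print(f"step {step}: entropy = {entropy(bel):.2f} bits")
    bel = predict(bel, [0.15, 0.7, 0.15])    # motion noise only, no sensing
# entropy climbs toward log2(20) ≈ 4.32 bits, i.e. a uniform belief
```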
Check: After many predict steps without any updates, the belief becomes...

Chapter 6: The Update Step

The update step incorporates a sensor reading. It's pure Bayes' theorem applied to every cell: multiply the predicted belief by the sensor likelihood, then renormalize so the probabilities sum to 1.

bel(x) = η · P(z|x) · bel¯(x)

Here η is the normalizing constant (1 / Σᵢ P(z|xᵢ) bel¯(xᵢ)). The effect: cells that are consistent with the sensor reading get boosted; cells that are inconsistent get suppressed. The histogram sharpens. Uncertainty decreases.

The magic moment: Multiply and renormalize. That's the entire update. Cells where the sensor reading is likely get amplified. Cells where it's unlikely get crushed. The belief sharpens toward the truth. This is Bayesian inference in its purest form.
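"Multiply and renormalize" is literally two lines of code. A sketch with illustrative numbers: a 4-cell hallway with doors at cells 0 and 2, a uniform predicted belief, and the sensor reading "door":

```python
def update(bel_bar, likelihood):
    """Bayes update: multiply each cell by its likelihood, then renormalize."""
    posterior = [b * l for b, l in zip(bel_bar, likelihood)]
    eta = 1.0 / sum(posterior)           # η = 1 / Σ P(z|x)·bel¯(x)
    return [p * eta for p in posterior]

bel_bar = [0.25, 0.25, 0.25, 0.25]       # predicted belief: uniform
likelihood = [0.9, 0.1, 0.9, 0.1]        # P(z="door"|x); doors at cells 0, 2
bel = update(bel_bar, likelihood)
print([round(b, 2) for b in bel])        # → [0.45, 0.05, 0.45, 0.05]
```

The door cells jump from 0.25 to 0.45; the doorless cells are crushed to 0.05 — exactly the sharpening described above.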
Watch the Update Step

Blue is predicted belief. Yellow is sensor likelihood. Green is the updated belief after multiply + normalize.

Check: The update step works by...

Chapter 7: The Full Algorithm — Robot Localization

Now we put it all together. The Bayes filter is an infinite loop of Predict (move, blur) and Update (sense, sharpen). Below is the classic Probabilistic Robotics visualization: a 1D hallway with colored doors, a robot that moves, and a belief histogram that evolves in real-time.

Initialize
bel(x) = 1/N (uniform — "I have no idea")
Predict
bel¯(x') = Σₓ P(x'|x, u) bel(x)
Update
bel(x) = η P(z|x) bel¯(x)
↓ repeat

The Bayes Filter in Python

Python
def bayes_filter(bel, u, z, motion_model, sensor_model):
    """One cycle of the discrete Bayes filter."""
    n = len(bel)

    # ── Predict (convolution) ──
    bel_bar = [0.0] * n
    for x_prime in range(n):
        for x in range(n):
            bel_bar[x_prime] += motion_model(x_prime, x, u) * bel[x]

    # ── Update (Bayes rule) ──
    for x in range(n):
        bel_bar[x] *= sensor_model(z, x)

    # ── Normalize ──
    eta = 1.0 / sum(bel_bar)
    return [b * eta for b in bel_bar]
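To run the function you need concrete models. Here is a hypothetical setup — the door layout, noise values, and reading sequence are all invented for illustration, and bayes_filter is repeated so the snippet runs on its own:

```python
def bayes_filter(bel, u, z, motion_model, sensor_model):
    """One cycle of the discrete Bayes filter (same as above)."""
    n = len(bel)
    bel_bar = [0.0] * n
    for x_prime in range(n):
        for x in range(n):
            bel_bar[x_prime] += motion_model(x_prime, x, u) * bel[x]
    for x in range(n):
        bel_bar[x] *= sensor_model(z, x)
    eta = 1.0 / sum(bel_bar)
    return [b * eta for b in bel_bar]

DOORS = {0, 3}                           # hypothetical door layout

def motion_model(x_prime, x, u):
    """P(x'|x,u): 80% exact, 10% undershoot, 10% overshoot."""
    return {0: 0.8, -1: 0.1, 1: 0.1}.get(x_prime - x - u, 0.0)

def sensor_model(z, x):
    """P(z|x): hit rate 0.9, false-alarm rate 0.1."""
    if z == "door":
        return 0.9 if x in DOORS else 0.1
    return 0.1 if x in DOORS else 0.9

bel = [0.1] * 10                         # uniform over a 10-cell hallway
for z in ["door", "no door", "no door", "door"]:   # move +1, then sense z
    bel = bayes_filter(bel, 1, z, motion_model, sensor_model)
print(max(range(10), key=lambda i: bel[i]))        # most probable cell
```

After four sense-move cycles the belief is far from uniform: cells consistent with the reading sequence carry most of the mass.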
Hallway Localization — The Showcase

A robot moves through a hallway with colored doors. Watch the belief histogram converge to the true position. The green bar marks the robot's true cell.

Sensor accuracy = 0.85
Motion noise = 0.20
Step: 0
Experiment: Reset and watch the uniform belief sharpen over 10-20 steps. Then crank sensor accuracy down to 55% and see how much slower convergence is. Try high motion noise — the belief re-blurs after every move, fighting the update step.
Check: In the Bayes filter, which step reduces uncertainty?

Chapter 8: The Filter Family Tree

The Bayes filter is not one algorithm — it's the parent of an entire family. Every probabilistic filter is a special case of the Bayes filter, making different assumptions about the belief representation and the motion/sensor models.

Filter          | Belief               | Assumption                | Cost
Discrete Bayes  | Histogram            | Finite grid of cells      | O(n)
Kalman (KF)     | Gaussian (μ, Σ)      | Linear + Gaussian noise   | O(n³)
EKF             | Gaussian             | Locally linear (Jacobian) | O(n³)
UKF             | Gaussian (sigma pts) | Smooth nonlinearity       | O(n³)
Particle Filter | Weighted samples     | Any distribution          | O(N·n)
The Family Tree
The unifying idea: Every filter does the same two steps — predict (apply motion, grow uncertainty) and update (incorporate measurement, shrink uncertainty). They only differ in how they represent the belief and how they compute the math. Understand the Bayes filter and you understand them all.
Where from here? The Kalman filter lesson covers the Gaussian continuous case in depth. Each filter in the family tree trades off expressiveness against computational cost. The discrete Bayes filter is the simplest and most intuitive — and now you understand it completely.
"Probability theory is nothing but common sense reduced to calculation."
— Pierre-Simon Laplace

You now understand Bayesian state estimation. Every sensor reading is noisy, every action is uncertain — but belief persists and refines. That's the Bayes filter.

Check: What makes the Kalman filter a special case of the Bayes filter?