The framework that lets machines reason about cause and effect under uncertainty — from medical diagnosis to spam filtering to gene regulatory networks.
It's raining outside. The grass is wet. Are these facts independent? Obviously not — rain causes wet grass. But the sprinkler could also cause wet grass. And maybe the season affects both rain and sprinkler usage. Events in the real world form a tangled web of dependencies.
A Bayesian network represents this web compactly. Instead of specifying a probability for every possible combination of events (which grows exponentially), we only specify local dependencies — how each variable relates to its direct causes.
Click nodes to toggle their state (true/false). Watch how evidence propagates through the network.
A Bayesian network is a DAG: a Directed Acyclic Graph. Each node is a random variable. Each directed edge (arrow) means "this variable directly influences that one." The "acyclic" part means no loops — you can't follow arrows and end up where you started.
The graph encodes the factorization of the joint probability. For variables X1, ..., Xn, where Pa(Xi) denotes the parents of Xi:

P(X1, ..., Xn) = P(X1 | Pa(X1)) × P(X2 | Pa(X2)) × ... × P(Xn | Pa(Xn))
This factorization is why Bayes nets are efficient. Instead of one giant table with 2^n entries, we have n small tables, each conditioned on only a few parents.
Hover over nodes to see their parents, children, and Markov blanket highlighted.
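The savings are easy to see in code. Here is a minimal sketch of the factored joint for the classic rain/sprinkler network; the CPT numbers are illustrative, chosen to match the table later in this article:

```python
# Local CPTs instead of one joint table.
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: 0.3, False: 0.7}
# P(Wet=T | Rain, Sprinkler), keyed by (rain, sprinkler)
P_wet = {
    (False, False): 0.01,
    (False, True): 0.90,
    (True, False): 0.80,
    (True, True): 0.99,
}

def joint(rain, sprinkler, wet):
    """P(Rain, Sprinkler, Wet) = P(Rain) * P(Sprinkler) * P(Wet | Rain, Sprinkler)."""
    p_w = P_wet[(rain, sprinkler)] if wet else 1 - P_wet[(rain, sprinkler)]
    return P_rain[rain] * P_sprinkler[sprinkler] * p_w

# Three small factors (2 + 2 + 4 rows) replace a 2^3-entry joint table,
# yet they still determine every joint probability:
total = sum(joint(r, s, w) for r in (True, False)
            for s in (True, False) for w in (True, False))
print(round(total, 10))  # probabilities over all 8 worlds sum to 1
```

With n variables and at most k parents each, the storage cost drops from 2^n to roughly n · 2^k entries.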
Each node stores a CPT (Conditional Probability Table). For a root node (no parents), it's just a prior: P(Rain) = 0.2. For a node with parents, it specifies the probability for each combination of parent states: P(Wet Grass | Rain=T, Sprinkler=T) = 0.99.
The CPT is where domain knowledge lives. A doctor building a medical Bayes net fills in CPTs like P(Cough | Flu=yes, Smoker=yes) = 0.85 based on clinical experience or data.
Click nodes to see their CPT. Adjust probabilities with the sliders below.
| Rain | Sprinkler | P(Wet=T) |
|---|---|---|
| F | F | 0.01 |
| F | T | 0.90 |
| T | F | 0.80 |
| T | T | 0.99 |
The whole point of a Bayes net is to encode conditional independencies. Two variables X and Y are conditionally independent given Z (written X ⊥ Y | Z) if knowing Z makes X irrelevant for predicting Y.
Example: Does knowing the season help you predict if the grass is wet? Yes. But if you already know whether it rained AND whether the sprinkler was on, then season adds nothing. Grass is conditionally independent of Season given {Rain, Sprinkler}.
Select two variables and a conditioning set. The display shows whether they're conditionally independent in this network.
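You can verify a conditional independence numerically by brute force. The sketch below uses a four-node network (Season → Rain, Season → Sprinkler, {Rain, Sprinkler} → Wet Grass) with assumed, illustrative CPT values:

```python
from itertools import product

P_season = {True: 0.5, False: 0.5}        # True = rainy season (assumed)
P_rain = {True: 0.6, False: 0.1}          # P(Rain=T | Season)
P_sprinkler = {True: 0.1, False: 0.5}     # P(Sprinkler=T | Season)
P_wet = {(False, False): 0.01, (False, True): 0.90,
         (True, False): 0.80, (True, True): 0.99}   # P(Wet=T | Rain, Sprinkler)

def joint(season, rain, sprinkler, wet):
    p = P_season[season]
    p *= P_rain[season] if rain else 1 - P_rain[season]
    p *= P_sprinkler[season] if sprinkler else 1 - P_sprinkler[season]
    p *= P_wet[(rain, sprinkler)] if wet else 1 - P_wet[(rain, sprinkler)]
    return p

def p_wet_given(**evidence):
    """P(Wet=T | evidence) by summing the joint over consistent worlds."""
    num = den = 0.0
    for s, r, sp, w in product([True, False], repeat=4):
        world = {'season': s, 'rain': r, 'sprinkler': sp, 'wet': w}
        if any(world[k] != v for k, v in evidence.items()):
            continue
        den += joint(s, r, sp, w)
        if w:
            num += joint(s, r, sp, w)
    return num / den

# Season alone shifts the answer:
print(p_wet_given(season=True), p_wet_given(season=False))
# But given Rain and Sprinkler, Season adds nothing:
a = p_wet_given(season=True, rain=True, sprinkler=False)
b = p_wet_given(season=False, rain=True, sprinkler=False)
print(abs(a - b) < 1e-12)  # Wet ⊥ Season | {Rain, Sprinkler}
```

The second check succeeds for any CPT values, because the Wet CPT is conditioned only on its parents; the independence is baked into the factorization itself.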
How do you read off conditional independencies from the graph structure? The answer is d-separation. It's a purely graphical criterion: check if information can flow between two nodes along a path, given what you've observed.
There are three fundamental structures: the chain (A → B → C), the fork (A ← B → C), and the collider (A → B ← C). Each behaves differently when the middle node is observed vs. unobserved:
Click the middle node to toggle observing it. Watch how information flow (green = flows, red = blocked) changes for each structure.
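The three rules fit in a tiny lookup. This sketch classifies whether information can flow across the middle node of each canonical triple:

```python
def path_open(structure, b_observed):
    """Can information flow A–C through B?
    structure: 'chain' (A->B->C), 'fork' (A<-B->C), or 'collider' (A->B<-C)."""
    if structure in ('chain', 'fork'):
        return not b_observed   # observing B blocks the path
    # Collider: the path is blocked UNLESS B (or a descendant) is observed.
    return b_observed

for s in ('chain', 'fork', 'collider'):
    print(s,
          'open' if path_open(s, False) else 'blocked', '->',
          'open' if path_open(s, True) else 'blocked')
# chain open -> blocked
# fork open -> blocked
# collider blocked -> open
```

Full d-separation generalizes this check to every path between two nodes: the nodes are d-separated given Z exactly when every path is blocked by one of these three rules (with "descendant of a collider observed" counting as observed).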
Given evidence (some variables observed), we want to compute the posterior probability of a query variable. The brute-force way: sum over all combinations of unobserved variables. But that's exponential. Variable elimination does it smarter by summing out variables one at a time, exploiting the factored structure.
Query: P(Rain | Wet=true). Click "Next Step" to eliminate variables one at a time. Watch the factors combine and shrink.
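The elimination steps above can be sketched with factors as dictionaries. This toy implementation answers P(Rain | Wet=true) on the rain/sprinkler network, using the same illustrative numbers as earlier; it is a sketch, not an optimized inference engine:

```python
from itertools import product

# A factor is (variables, table) with table[assignment_tuple] = value.
def multiply(f, g):
    fv, ft = f; gv, gt = g
    vars_ = fv + tuple(v for v in gv if v not in fv)
    table = {}
    for asg in product([True, False], repeat=len(vars_)):
        env = dict(zip(vars_, asg))
        table[asg] = ft[tuple(env[v] for v in fv)] * gt[tuple(env[v] for v in gv)]
    return vars_, table

def sum_out(f, var):
    fv, ft = f
    keep = tuple(v for v in fv if v != var)
    table = {}
    for asg, p in ft.items():
        key = tuple(a for v, a in zip(fv, asg) if v != var)
        table[key] = table.get(key, 0.0) + p
    return keep, table

f_rain = (('R',), {(True,): 0.2, (False,): 0.8})
f_spr  = (('S',), {(True,): 0.3, (False,): 0.7})
# Evidence Wet=T is folded in: a factor over (R, S) holding P(Wet=T | R, S).
f_wet  = (('R', 'S'), {(False, False): 0.01, (False, True): 0.90,
                       (True, False): 0.80, (True, True): 0.99})

# Eliminate S: multiply only the factors mentioning S, then sum S out.
f = sum_out(multiply(f_spr, f_wet), 'S')
f = multiply(f_rain, f)
z = sum(f[1].values())
posterior = {asg[0]: p / z for asg, p in f[1].items()}
print(posterior)  # P(Rain | Wet=T)
```

The key move is that eliminating S touches only the factors containing S; the Rain prior stays untouched until the end. On large networks, a good elimination order keeps every intermediate factor small.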
For tree-structured networks (no undirected loops), there's an elegant alternative: belief propagation (the sum-product algorithm). Each node sends messages to its neighbors summarizing what it knows. After two passes (leaves to root, root to leaves), every node has the exact marginal posterior.
A message from node i to node j says: "Based on everything I've seen on my side of the tree, here's what I believe about your state." It's local computation with global results.
Watch messages flow through a tree. Click "Send Messages" to see each round of propagation. After two passes, all beliefs are exact.
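Here is the message-passing idea on the smallest possible tree, a three-node chain A → B → C with evidence C = true. The CPT numbers are assumed for illustration:

```python
P_A = {True: 0.6, False: 0.4}
P_B_given_A = {True: {True: 0.7, False: 0.3},
               False: {True: 0.2, False: 0.8}}   # P_B_given_A[a][b]
P_C_given_B = {True: {True: 0.9, False: 0.1},
               False: {True: 0.5, False: 0.5}}   # P_C_given_B[b][c]

# Message from A to B: sum out A, summarizing A's side of the tree.
msg_A = {b: sum(P_A[a] * P_B_given_A[a][b] for a in (True, False))
         for b in (True, False)}
# Message from C to B: C is observed true, so just that leaf's likelihood.
msg_C = {b: P_C_given_B[b][True] for b in (True, False)}

# Belief at B = product of incoming messages, normalized.
belief = {b: msg_A[b] * msg_C[b] for b in (True, False)}
z = sum(belief.values())
posterior_B = {b: p / z for b, p in belief.items()}
print(posterior_B)  # exact P(B | C=true)
```

Each message is a small table over the receiver's states, computed purely from local factors, which is exactly the "local computation with global results" property: after the two passes, every node's product of incoming messages is its exact marginal.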
Build your own Bayesian network! Click to add nodes, drag between nodes to add edges. Set CPTs and run inference queries. This is the full SLAM experience — but for probability.
So far we've assumed the graph structure is given. But what if you have data and need to discover the dependencies? This is structure learning — arguably the hardest problem in Bayesian networks.
There are two main approaches: score-based (search over possible graphs, score each one) and constraint-based (test conditional independencies in the data, build the graph from the results).
Generate data from a hidden network. Click "Learn Structure" to watch the PC algorithm discover the DAG by testing conditional independencies.
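To make the constraint-based idea concrete, here is a toy version of a single skeleton-phase step, not the full PC algorithm: generate data from a hidden A → B edge, then test whether A and B look independent by comparing estimates of P(B | A):

```python
import random

random.seed(0)
data = []
for _ in range(10000):
    a = random.random() < 0.5
    b = random.random() < (0.8 if a else 0.2)   # B genuinely depends on A
    data.append((a, b))

def p_b_given(a_val):
    """Empirical estimate of P(B=T | A=a_val)."""
    rows = [b for a, b in data if a == a_val]
    return sum(rows) / len(rows)

# A large gap means "dependent": keep the edge A - B in the skeleton.
gap = abs(p_b_given(True) - p_b_given(False))
print('dependent' if gap > 0.05 else 'independent')  # prints "dependent"
```

The real PC algorithm repeats tests like this for every pair of variables and growing conditioning sets, deletes edges between pairs found independent, and finally orients the surviving edges using collider patterns.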
Bayesian networks are everywhere. An HMM (Hidden Markov Model) is a special case — a chain-structured Bayes net where hidden states emit observations. A Kalman filter is a continuous Gaussian Bayes net with linear dynamics.
| Model | Structure | Variables | Key Algorithm |
|---|---|---|---|
| Bayes Net | General DAG | Discrete/continuous | Variable elimination |
| HMM | Chain | Discrete hidden | Forward-backward |
| Kalman Filter | Chain | Continuous Gaussian | Predict-update |
| MRF/CRF | Undirected | Any | Belief propagation |
| Causal Model | DAG + interventions | Any | do-calculus |
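The chain-structured special cases in the table admit especially simple inference. As one example, here is a sketch of the HMM forward algorithm (the "forward" half of forward-backward), with assumed transition and emission numbers:

```python
states = (0, 1)
prior = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]   # trans[i][j] = P(next=j | cur=i)
emit  = [[0.7, 0.3], [0.1, 0.9]]   # emit[s][o]  = P(obs=o | state=s)

def forward(obs):
    """Filtered posterior P(state_t | obs_1..t), exploiting the chain structure."""
    alpha = [prior[s] * emit[s][obs[0]] for s in states]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in states) * emit[j][o]
                 for j in states]
    z = sum(alpha)
    return [a / z for a in alpha]

print(forward([0, 0, 1]))  # filtered distribution after three observations
```

Because the graph is a chain, each step only needs the previous step's belief, so filtering T observations costs O(T) small updates instead of a sum over all state sequences.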
You now understand the language of probabilistic reasoning with graphs. From medical diagnosis to robot perception, Bayesian networks turn uncertain evidence into principled decisions.