Turning outcomes into numbers — the bridge from sample spaces to computation.
You flip three fair coins. You could describe the outcome as (H, T, H), or (T, T, T), or any of the eight possible sequences. But often you don't care about the exact sequence — you care about how many heads you got. That number — 0, 1, 2, or 3 — captures the thing you actually want to reason about.
A random variable is exactly this idea: a way to assign a number to each outcome of an experiment. It is the bridge between the messy world of sample spaces and the clean world of arithmetic.
Think of it like a variable in a programming language. In Python, x = len(results) maps a list of outcomes to a single integer. A random variable does the same thing for probability: it takes an outcome from the sample space and maps it to a number. The value hasn't been determined yet — it depends on which outcome occurs — but once the experiment runs, the variable takes on exactly one value.
Here is the three-coin example. Define Y = number of heads on three fair coin flips:
| Event | Outcomes | Y value | Probability |
|---|---|---|---|
| No heads | (T,T,T) | 0 | 1/8 |
| One head | (H,T,T), (T,H,T), (T,T,H) | 1 | 3/8 |
| Two heads | (H,H,T), (H,T,H), (T,H,H) | 2 | 3/8 |
| Three heads | (H,H,H) | 3 | 1/8 |
Notice something crucial: multiple outcomes can map to the same number. Three different sequences all give Y = 1. The random variable collapses the sample space into something more manageable. Instead of tracking 8 outcomes, we now track 4 possible values and their probabilities.
Why is this useful? Because once we have a mapping from outcomes to numbers, we can ask powerful questions: What is the average value? How spread out are the values? What is the probability the value exceeds some threshold? These questions would be awkward to phrase in terms of raw outcomes, but with numbers they become natural.
random.randint(1, 6), it is sampling from a random variable. Hash functions, load balancers, randomized algorithms, A/B tests, machine learning — they all reason about random variables. This chapter gives you the formal language for that reasoning.Formally, a random variable is a function from the sample space Ω to the real numbers. If the experiment produces outcome ω ∈ Ω, then the random variable X assigns the number X(ω) to it.
We use capital letters (X, Y, Z) for random variables and lowercase letters (x, y, z) for specific non-random values. This convention is universal in probability and statistics.
A random variable is not the same thing as an event. An event is a scenario (a subset of the sample space). A random variable is an object that takes on a value. But you can create events from random variables using relational operators:
| Expression | Meaning | Type |
|---|---|---|
| Y | The random variable itself | Random variable |
| Y = 2 | "Y takes on the value 2" | Event |
| Y < 3 | "Y is less than 3" | Event |
| Y ≥ 1 | "Y is at least 1" | Event |
| P(Y = 2) | Probability of the event Y = 2 | Number |
This is just like coding. In Python, y is an integer variable, y == 2 is a boolean expression, and prob(y == 2) returns a float. Same hierarchy: variable → event (boolean) → probability (number).
Worked example: Let Y = number of heads on 3 coin flips. What is P(Y = y) for a general value y?
Where C(3, y) = 3! / (y!(3 − y)!) is the binomial coefficient. Let's check: C(3, 0)/8 = 1/8, C(3, 1)/8 = 3/8, C(3, 2)/8 = 3/8, C(3, 3)/8 = 1/8. These match our table from Chapter 0.
Every random variable has a set of properties that fully describe its behavior:
| Property | Notation | What it tells you |
|---|---|---|
| Support (range) | {0, 1, 2, 3} | Which values the RV can take on |
| PMF / PDF | P(X = x) | How likely each value is |
| CDF | P(X ≤ x) | Cumulative probability up to x |
| Expectation | E[X] | Weighted average (center of mass) |
| Variance | Var(X) | How spread out the values are |
We will build up each of these properties over this chapter and the next. The most fundamental is the PMF — once you know it, you can derive everything else.
The most important thing to know about a random variable is: how likely is each value? For discrete random variables, this information is captured by the probability mass function (PMF).
The PMF is a function that takes a value and returns the probability that the random variable equals that value:
Think of "mass" literally. Imagine you have one kilogram of probability to distribute across all possible values. The PMF tells you how much mass sits at each point. A fair die puts 1/6 kg at each of {1, 2, 3, 4, 5, 6}. The sum of two dice puts the most mass at 7 and tapers off toward the edges.
Let's see both examples. For a single fair die, X = value shown:
For Y = sum of two dice, the PMF has a triangular shape because there are more ways to roll a 7 (six combinations) than a 2 (one combination):
| y | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| P(Y=y) | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36 |
Toggle between 1 die and 2 dice. Hover over bars to see exact probabilities. The PMF shows how probability mass is distributed across possible values.
You can also write a PMF as code. Here is the PMF for the sum of two dice:
python def pmf_sum_two_dice(y): # Returns P(Y = y) where Y = sum of two dice if y < 2 or y > 12: return 0 if y <= 7: return (y - 1) / 36 else: return (13 - y) / 36
Worked example — from data to PMF: You simulate rolling two dice 10,000 times and count: the value 7 appears 1,672 times, the value 2 appears 275 times. What are the approximate PMF values?
P(Y = 7) ≈ 1672 / 10000 = 0.1672. The true value is 6/36 = 0.1667. Very close.
P(Y = 2) ≈ 275 / 10000 = 0.0275. The true value is 1/36 = 0.0278. Again, close. With more samples, these empirical frequencies converge to the true PMF — this is the definition of probability at work.
The PMF answers "what is the probability of exactly this value?" But often you want to ask: "what is the probability the value is at most x?" This cumulative question is answered by the cumulative distribution function (CDF).
The CDF at any point x is just the sum of all PMF values at or below x. It accumulates probability from left to right — hence the name "cumulative."
Worked example: Let Y = number of heads on 3 coin flips. The PMF is P(0) = 1/8, P(1) = 3/8, P(2) = 3/8, P(3) = 1/8. The CDF is:
| y | P(Y = y) [PMF] | F(y) = P(Y ≤ y) [CDF] |
|---|---|---|
| 0 | 1/8 = 0.125 | 1/8 = 0.125 |
| 1 | 3/8 = 0.375 | 4/8 = 0.500 |
| 2 | 3/8 = 0.375 | 7/8 = 0.875 |
| 3 | 1/8 = 0.125 | 8/8 = 1.000 |
The CDF always starts near 0 (for very small x) and ends at 1 (for large x). It is non-decreasing — it can only go up or stay flat, never drop. For discrete random variables, the CDF is a staircase function: it jumps up at each value in the support, and is flat between them.
The CDF is especially useful for answering range queries:
| Question | Formula | Example (Y = heads on 3 flips) |
|---|---|---|
| P(Y ≤ 2) | F(2) | 7/8 = 0.875 |
| P(Y > 1) | 1 − F(1) | 1 − 4/8 = 0.500 |
| P(1 < Y ≤ 3) | F(3) − F(1) | 1 − 0.5 = 0.500 |
The CDF for Y = number of heads on n coin flips. Toggle the number of coins. The staircase jumps at each integer, and the height of each jump equals the PMF at that point.
Worked example: Using the CDF for Y = sum of two dice, what is P(4 ≤ Y ≤ 9)?
P(4 ≤ Y ≤ 9) = F(9) − F(3). We need F(9) = (1+2+3+4+5+6+5+4)/36 = 30/36. And F(3) = (1+2)/36 = 3/36. So P(4 ≤ Y ≤ 9) = 27/36 = 3/4 = 0.75. Three quarters of the time, two dice sum to something between 4 and 9.
Now that we have two ways to describe a distribution — the PMF and CDF — let's nail down their formal properties. These are the rules of the game, and every valid distribution must obey them.
Three properties of a valid PMF:
Three properties of a valid CDF:
Worked example — is this a valid PMF? Someone claims a random variable Z has P(Z = 1) = 0.3, P(Z = 2) = 0.5, P(Z = 3) = 0.1.
Check non-negativity: all values are ≥ 0. Check sum: 0.3 + 0.5 + 0.1 = 0.9 ≠ 1. This is not a valid PMF. The mass doesn't add up. There's 0.1 of probability unaccounted for — either there's a missing value or the probabilities are wrong.
Converting between PMF and CDF:
python def pmf_to_cdf(pmf_values, support): # pmf_values[i] = P(X = support[i]) cdf = [] running_sum = 0 for p in pmf_values: running_sum += p cdf.append(running_sum) return cdf def cdf_to_pmf(cdf_values): # PMF is the difference of consecutive CDF values pmf = [cdf_values[0]] for i in range(1, len(cdf_values)): pmf.append(cdf_values[i] - cdf_values[i - 1]) return pmf
If X is a random variable and g is a function, then g(X) is also a random variable. You take whatever value X produces and feed it through g. The new random variable inherits its randomness from X, but the distribution can look completely different.
Example: Let X = value of a fair die roll, so X ∈ {1, 2, 3, 4, 5, 6}. Define Y = X2. Then Y takes values in {1, 4, 9, 16, 25, 36}, each with probability 1/6.
To find the PMF of g(X), work through each value of X:
If g maps multiple x values to the same y, you sum their probabilities. This is where things get interesting.
Worked example: Let X = fair die roll and define Y = X mod 3. What is the PMF of Y?
| X | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Y = X mod 3 | 1 | 2 | 0 | 1 | 2 | 0 |
Y = 0 when X ∈ {3, 6}, so P(Y = 0) = 2/6 = 1/3.
Y = 1 when X ∈ {1, 4}, so P(Y = 1) = 2/6 = 1/3.
Y = 2 when X ∈ {2, 5}, so P(Y = 2) = 2/6 = 1/3.
The die roll mod 3 produces a uniform distribution over {0, 1, 2}. This is not obvious — it happens because exactly two of the six die values map to each remainder.
Worked example — non-uniform collapse: Let X be uniform on {1, 2, 3, 4} and define Y = |X − 2.5|. Then:
X = 1 → |1 − 2.5| = 1.5. X = 2 → |2 − 2.5| = 0.5. X = 3 → |3 − 2.5| = 0.5. X = 4 → |4 − 2.5| = 1.5.
P(Y = 0.5) = P(X = 2) + P(X = 3) = 1/4 + 1/4 = 1/2. P(Y = 1.5) = P(X = 1) + P(X = 4) = 1/2.
The absolute deviation collapses 4 values into 2, and both outcomes are equally likely.
An indicator random variable is the simplest possible random variable. It equals 1 if some event happens and 0 otherwise. That's it.
This looks trivially simple, but indicator variables are one of the most powerful tools in all of probability. They turn counting problems into addition problems and make complex expected value calculations clean and elegant.
Why are they useful? Because of one magical property:
Worked example — counting with indicators: Flip 10 fair coins. Let X = number of heads. We can write X as a sum of indicators:
where Ik = 1 if coin k is heads, 0 otherwise. Now we can compute the expected number of heads without knowing the full PMF of X:
We used linearity of expectation — the expectation of a sum equals the sum of expectations. This works regardless of dependence. We will derive this fully in the next chapter on expectation, but the indicator decomposition technique is so important it deserves its own introduction.
Worked example — birthday problem: In a room of n people, what is the expected number of pairs who share a birthday? There are C(n, 2) = n(n − 1)/2 pairs. For each pair, define an indicator Iij = 1 if persons i and j share a birthday.
E[Iij] = P(same birthday) = 1/365 (assuming 365 equally likely birthdays).
By linearity: E[total matching pairs] = C(n, 2) · (1/365) = n(n − 1) / 730.
For n = 30: E = 30 · 29 / 730 ≈ 1.19 matching pairs. This is remarkably easy compared to computing the full distribution of matching pairs!
python # Indicator variable simulation: birthday problem import random def birthday_matching_pairs(n=30, trials=100000): total = 0 for _ in range(trials): bdays = [random.randint(1, 365) for _ in range(n)] pairs = 0 for i in range(n): for j in range(i + 1, n): if bdays[i] == bdays[j]: pairs += 1 total += pairs return total / trials # Returns ~1.19, matching n(n-1)/730
This is the payoff. Below is an interactive simulation that lets you build your own random variable from scratch. Roll dice, flip coins, or define a custom distribution — then see the PMF and CDF update in real time as you accumulate samples.
You can also run the experiment thousands of times at once and watch the empirical PMF (the histogram of observed frequencies) converge toward the true PMF. This is probability's fundamental promise: with enough trials, observed frequencies converge to true probabilities.
Select a distribution. Click Sample to draw one value, or Run 1000 to simulate many at once. The warm bars show the empirical PMF (observed frequencies). The teal line shows the true PMF. The lower panel shows the CDF. Watch them converge.
Try this experiment: select "Sum of 2 Dice" and click Sample 1 repeatedly. After about 30 samples, guess which value has the highest bar. Then click Run 1000 and see if you were right. The triangular shape of the PMF emerges reliably with enough data.
Problem 1: Building a PMF from scratch.
A spinner has 5 equal sectors labeled 1 through 5. You spin it twice and let X = maximum of the two spins. Find the PMF of X.
Solution: There are 25 equally likely outcomes (5 × 5). For X = k, we need both spins ≤ k and at least one spin = k.
P(X ≤ k) = (k/5)2 because both spins must be ≤ k. So P(X = k) = P(X ≤ k) − P(X ≤ k − 1) = k2/25 − (k − 1)2/25 = (2k − 1)/25.
| k | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| P(X = k) | 1/25 | 3/25 | 5/25 | 7/25 | 9/25 |
Check: 1 + 3 + 5 + 7 + 9 = 25, so the sum is 25/25 = 1. Valid PMF.
Problem 2: CDF range query.
Using X from Problem 1, find P(2 ≤ X ≤ 4).
Solution: P(2 ≤ X ≤ 4) = F(4) − F(1) = (1 + 3 + 5 + 7)/25 − 1/25 = 16/25 − 1/25 = 15/25 = 3/5 = 0.6.
Problem 3: Functions of random variables.
Let X be a fair die roll. Define Y = (X mod 2). Find the PMF of Y (i.e., is X even or odd?).
Solution: X mod 2 = 1 when X ∈ {1, 3, 5} and X mod 2 = 0 when X ∈ {2, 4, 6}. So P(Y = 0) = 3/6 = 1/2 and P(Y = 1) = 3/6 = 1/2. Y is a fair coin flip. This makes X mod 2 an indicator for "X is odd."
Problem 4: Indicator decomposition.
You deal 5 cards from a standard 52-card deck. What is the expected number of aces?
Solution: Let Ik = 1 if card k (of the 5 dealt) is an ace. By symmetry, P(card k is an ace) = 4/52 = 1/13 for each card position. By linearity of expectation:
On average, about 0.385 of the 5 cards are aces. Notice we did not need the hypergeometric distribution — the indicator trick made it trivial.
Problem 5: Validating a PMF.
A classmate claims P(X = k) = k/10 for k ∈ {1, 2, 3, 4}. Is this a valid PMF?
Solution: Sum = 1/10 + 2/10 + 3/10 + 4/10 = 10/10 = 1. All values are non-negative. This is a valid PMF. The random variable is more likely to take on larger values — P(X = 4) = 0.4 is the mode.
Random variables are the central abstraction of probability. Everything before this chapter — sample spaces, events, conditional probability — was building toward this moment. Everything after this chapter builds on top of random variables.
What comes next: Chapter 6 introduces expectation — the most important single number you can extract from a distribution. You already saw a preview with indicator variables: E[IA] = P(A). The full treatment will give you linearity of expectation, the Law of the Unconscious Statistician (LOTUS), and the tools to compute averages for any discrete random variable.
| Concept | What it answers | Where it leads |
|---|---|---|
| Random variable | What number does the outcome map to? | Everything in probability |
| PMF | How likely is each value? | Named distributions (Binomial, Poisson, ...) |
| CDF | How likely is "at most x"? | Continuous distributions, quantiles, percentiles |
| Functions of RVs | What happens when you transform a variable? | LOTUS, derived distributions |
| Indicator variables | Did event A happen (0 or 1)? | Counting arguments, linearity of expectation |