Coin flips, enigma machines, shuffled decks, and bacterial evolution — probability meets the real world.
In the first three chapters we built a toolkit: counting rules, probability axioms, conditional probability, Bayes' theorem. These are abstract machines. This chapter is where we use them.
Each section ahead tackles a self-contained problem from a different domain — cryptography, social networks, card games, machine learning, biology. The math is the same toolkit you already know. What changes is the modeling step: translating a messy real-world situation into a clean probability question.
Here is a preview of what we will cover:
| Problem | Domain | Key Tool |
|---|---|---|
| Exactly k heads in n flips | Statistics, polling, medicine | Permutations of indistinct objects |
| Enigma configurations | Cryptography, WW2 history | Step rule, combinations |
| Running into a friend | Social networks | Complement rule |
| Unique card shuffles | Combinatorics | Permutations (52!) |
| Random graph density | Network science | Combinations, conditional counting |
| Gini impurity | Machine learning (decision trees) | Complement rule, weighted sums |
| Antibiotic resistance | Biology, evolution | LOTP, Bayes' theorem |
Notice how diverse the domains are, yet how uniform the math is. That is the power of probability — the same handful of rules govern coin flips and bacterial evolution alike.
You flip a coin n times. Each flip lands heads with probability p. What is the probability of getting exactly k heads? This sounds like a toy problem, but it is the foundation for polling, clinical trials, A/B testing, and much more. Anytime you have n independent trials with two outcomes, you are flipping coins.
Let us build intuition with n = 10 and p = 0.6.
All heads: Each flip is independent, so multiply p by itself n times. P(all heads) = pn = 0.610 ≈ 0.006. Very unlikely — even with a biased coin.
All tails: The probability of tails on one flip is (1 − p). So P(all tails) = (1 − p)n = 0.410 ≈ 0.0001. Even more unlikely, because tails is the less probable outcome.
First k heads, then n − k tails: For k = 4, this is the specific sequence HHHHTTTTTT. By independence: pk · (1 − p)n−k = 0.64 · 0.46 ≈ 0.000531.
Now the real question: exactly k heads in any order. The sequence HHHHTTTTTT is just one ordering. There are many others — HHTHHTTTT, THTHHHTTT, and so on. How many?
Each ordering is a permutation of k identical H's and (n − k) identical T's. By the permutations of indistinct objects rule from Chapter 1:
Every ordering has the same probability — pk · (1 − p)n−k — because it contains exactly k heads and n − k tails, and the flips are independent. Since orderings are mutually exclusive (you can only get one exact sequence), we sum:
Exactly k = 4 heads in n = 10 flips with p = 0.6:
About an 11% chance. The most likely outcome is k = 6 heads (since p = 0.6 and n = 10, the "expected" number of heads is np = 6).
What about P(more than 4 heads)? Since "exactly 5 heads" and "exactly 6 heads" are mutually exclusive events, we just sum:
During World War 2, the Nazis encrypted their military communications using a device called the Enigma machine. Each day they chose a new configuration, and if the Allies could discover it, they could read every enemy message. Alan Turing and his team at Bletchley Park built one of the world's first computers to search through configurations. But how many configurations were there?
The Enigma machine has three rotors, each set to one of 26 positions. By the step rule:
The machine also has a plugboard where wires swap pairs of letters. Each wire connects two distinct letters. Wires are indistinct (A↔B is the same as B↔A). Each letter can connect to at most one wire. How many ways can we place wires?
One wire: Choose 2 letters from 26. This is a combination: C(26, 2) = 325.
Two wires: Place the first wire in C(26, 2) ways. The second wire uses 2 of the remaining 24 letters: C(24, 2) ways. But since wires are indistinct, we have double-counted (wire A then wire B = wire B then wire A). Divide by 2!:
Three wires: Extend the pattern and divide by 3! to account for wire ordering:
k wires in general: The pattern emerges clearly. For each wire we place, two letters are consumed, and we divide by k! because wire order does not matter:
Arbitrary number of wires: The sets of configurations with exactly i wires are mutually exclusive from those with exactly j wires (you cannot simultaneously use both i and j wires). So the total number of plugboard configurations sums over all possible wire counts (0 through 13, since 26 letters can support at most 13 pairs):
The WW2 Enigma used exactly 10 wires (connecting 20 letters). The rotor set was chosen from 5 available rotors (adding C(5,3) = 60 ways). The total number of configurations:
That is 159 quintillion settings. No human could search through them — which is exactly why Turing needed to build a machine.
You live in an area with a large population — say Stanford with 17,000 students. You have about 150 friends. On a given day you see 100 random people. What are the chances you bump into at least one friend?
This is a complement problem. Instead of counting all the ways you could see a friend, count the probability you see none and subtract from 1.
Each of the 100 people you see is drawn uniformly from the 17,000 population. Treat each sighting as independent. The probability that any one person is not your friend is (17,000 − 150) / 17,000 ≈ 0.9912.
About a 59% chance of a serendipitous encounter! This feels surprisingly high — that is the power of repeated independent trials. Even though each individual sighting has only a 0.88% chance of being a friend, 100 trials add up quickly.
This pattern shows up constantly. What is the chance at least one server in a cluster fails? At least one user converts? At least one mutation occurs? Always use the complement.
Adjust population, friends, and sightings. Click Simulate to run 1,000 trials and see how often you bump into at least one friend.
Here is a surprising claim: if you shuffle a standard deck of cards seven times, you can be almost certain that the exact ordering has never existed before in the history of the universe. Let us prove this.
A standard deck has 52 unique cards. The number of distinct orderings is the number of permutations of 52 distinct objects:
For scale, there are roughly 1082 atoms in the observable universe. 52! is "only" 15 orders of magnitude smaller. It is an incomprehensibly large number.
We need an upper bound on n, the total number of card shuffles in human history. Overestimate aggressively: assume 7 billion people have been shuffling once per second since 52-card decks were invented around the 15th century (~500 years ≈ 1.6 × 1010 seconds).
Let U be the event that your shuffle is different from all n prior shuffles. Each prior shuffle was one of 52! orderings, so the probability that any single prior shuffle matches yours is 1/52!.
That is so close to 1 that no computer can distinguish it from certainty. Your shuffle is unique. It was unique before you did it, and it will remain unique forever.
Bayer and Diaconis (1992) proved that seven riffle shuffles suffice to produce a nearly uniform random ordering. Fewer than seven and the deck retains structure from its original order — a skilled magician could exploit this. More than seven offers diminishing returns. This sharp threshold at exactly seven is one of the most elegant results in applied combinatorics.
The methodology they used — analyzing random walks on the symmetric group S52 — paved the way for studying pseudorandom numbers produced by computers. Every time your computer generates a "random" number, the question of how random is governed by the same mathematics.
Consider a network (graph) with 10 nodes. We add 12 random undirected edges, where each pair of nodes is equally likely to receive an edge. How dense is this graph, and what is the probability a randomly chosen node has a specific degree?
An undirected edge connects two distinct nodes. No self-loops. The number of possible edge locations is:
We are choosing 12 locations from 45 possible ones. The number of unique graphs is:
Pick a node at random. It has 9 possible neighbors (every other node). We want P(degree = i) — the probability that exactly i of the 12 edges are incident to our node.
Think of it as a two-step counting problem for the event space:
About a 25% chance. This is actually the most likely degree for this particular graph configuration.
Imagine a bag of 7 shapes: 6 squares and 1 triangle. You draw two shapes with replacement. What is the probability the two shapes are different? This probability — the chance of drawing a mismatched pair — is called the Gini impurity of the set. It measures diversity.
Treat each shape as distinct. The sample space has 7 × 7 = 49 equally likely ordered pairs. Let A = "both squares" and B = "both triangles." These are mutually exclusive.
The set is mostly squares, so it is not very diverse. The Gini impurity is low.
For a set of n items where ni is the count of type i:
This is simply 1 minus the sum of squared proportions. Maximum impurity occurs when all types are equally represented. Minimum (G = 0) occurs when every item is the same type.
Now consider 7 shapes: 4 squares, 2 triangles, 1 star.
More diverse than before (0.571 vs 0.245), as expected — the items are more evenly spread across types.
Decision trees and Random Forests use Gini impurity to decide which feature to split on. At each node, the algorithm tries every possible split and picks the one that produces child nodes with the lowest Gini impurity. A node with G = 0 is "pure" — all items belong to one class. The goal is to reduce impurity at every split.
Consider a split that sends items left or right. The left child has nL items with Gini GL, the right child has nR items with Gini GR. The algorithm picks the split that minimizes the weighted Gini:
This is just a weighted average of diversity in each child. The best split creates children that are as homogeneous as possible.
Now let us bring the math to life. Below are two interactive simulations. The first lets you explore the Binomial coin-flip formula from Chapter 1. The second lets you experiment with Gini impurity from Chapter 6.
Adjust n (number of flips) and p (probability of heads), and see the full probability distribution P(exactly k heads) for every k from 0 to n. The bar chart updates instantly.
Drag the sliders to change n and p. The tallest bar is the most likely number of heads. Hover over bars to see exact probabilities.
Adjust the number of items in each category. Watch how the Gini impurity changes as the distribution becomes more or less uniform.
Set the count for each category. The bar shows the Gini impurity — 0 means perfectly pure, approaching 1 means maximum diversity.
Antibiotics are one of the great achievements of modern medicine. But bacteria evolve resistance, and probability helps us understand exactly how fast this happens.
You have 1 million bacteria in your gut. 10% carry a mutation that makes them slightly more resistant to antibiotics. You take a course of antibiotics:
Two questions. First: what fraction of bacteria survive? Second: of those survivors, what fraction carry the mutation?
Let E = "bacterium survives" and M = "bacterium has the mutation." By LOTP:
About 2.9% of bacteria survive. That means roughly 29,000 out of 1 million.
Now the critical question — of the survivors, what fraction are mutants?
| Group | Count | Survival Rate | Survivors |
|---|---|---|---|
| Mutant (M) | 100,000 | 20% | 20,000 |
| Non-mutant (MC) | 900,000 | 1% | 9,000 |
| Total | 1,000,000 | 29,000 |
Among 29,000 survivors: 20,000 / 29,000 ≈ 69% are mutants. Exactly what Bayes' theorem predicted.
Adjust mutation prevalence and survival rates. Watch how a single round of selection shifts the population composition. Click Apply Antibiotics to run the selection step.
Let us step back and see the bigger picture. Every application in this chapter used the same small set of tools from Chapters 1–3:
| Application | Tool Used | Where It Leads |
|---|---|---|
| Many coin flips | Permutations of indistinct objects | Binomial random variable (Ch 5+) |
| Enigma machine | Step rule, combinations | Cryptographic key spaces |
| Serendipity | Complement rule, independence | Birthday paradox, hash collisions |
| Random shuffles | Permutations (n!) | Pseudorandom number theory |
| Random graphs | Hypergeometric counting | Erdős–Rényi model, network science |
| Gini impurity | Complement rule, squared proportions | Decision trees, Random Forests |
| Bacteria evolution | LOTP + Bayes' theorem | Evolutionary biology, epidemiology |
Looking ahead: Chapter 5 introduces random variables — a formalism that lets us move from "P(exactly k heads)" to "a function X that maps outcomes to numbers." This abstraction will unify coin flips, node degrees, and Gini calculations under a single framework, leading to expected value, variance, and all of classical statistics.
The Binomial formula from coin flips becomes the Binomial distribution. The node-degree formula from random graphs becomes the Hypergeometric distribution. The Gini impurity becomes a measure of entropy. Each application here is a preview of a deeper theoretical object.