Ch 4: Applications of Core Probability

Chapter 0: Why Applications?

In the first three chapters we built a toolkit: counting rules, probability axioms, conditional probability, Bayes' theorem. These are abstract machines. This chapter is where we use them.

Each section ahead tackles a self-contained problem from a different domain — cryptography, social networks, card games, machine learning, biology. The math is the same toolkit you already know. What changes is the modeling step: translating a messy real-world situation into a clean probability question.

The core skill: Every application in this chapter follows the same recipe. (1) Identify the experiment and sample space. (2) Apply counting or conditional probability rules. (3) Interpret the number. Modeling is the hard part — once the problem is framed, the math is mechanical.

Here is a preview of what we will cover:

Problem	Domain	Key Tool
Exactly k heads in n flips	Statistics, polling, medicine	Permutations of indistinct objects
Enigma configurations	Cryptography, WW2 history	Step rule, combinations
Running into a friend	Social networks	Complement rule
Unique card shuffles	Combinatorics	Permutations (52!)
Random graph density	Network science	Combinations, conditional counting
Gini impurity	Machine learning (decision trees)	Complement rule, weighted sums
Antibiotic resistance	Biology, evolution	LOTP, Bayes' theorem

Notice how diverse the domains are, yet how uniform the math is. That is the power of probability — the same handful of rules govern coin flips and bacterial evolution alike.

Check: What is the hardest part of applying probability to a real-world problem?

Computing the final arithmetic Modeling: translating the situation into a well-defined sample space and events Memorizing the formulas

Chapter 1: Many Coin Flips

You flip a coin n times. Each flip lands heads with probability p. What is the probability of getting exactly k heads? This sounds like a toy problem, but it is the foundation for polling, clinical trials, A/B testing, and much more. Anytime you have n independent trials with two outcomes, you are flipping coins.

Warmups

Let us build intuition with n = 10 and p = 0.6.

All heads: Each flip is independent, so multiply p by itself n times. P(all heads) = pⁿ = 0.6¹⁰ ≈ 0.006. Very unlikely — even with a biased coin.

All tails: The probability of tails on one flip is (1 − p). So P(all tails) = (1 − p)ⁿ = 0.4¹⁰ ≈ 0.0001. Even more unlikely, because tails is the less probable outcome.

First k heads, then n − k tails: For k = 4, this is the specific sequence HHHHTTTTTT. By independence: p^k · (1 − p)^n−k = 0.6⁴ · 0.4⁶ ≈ 0.000531.

The General Formula

Now the real question: exactly k heads in any order. The sequence HHHHTTTTTT is just one ordering. There are many others — HHTHHTTTT, THTHHHTTT, and so on. How many?

Each ordering is a permutation of k identical H's and (n − k) identical T's. By the permutations of indistinct objects rule from Chapter 1:

Number of orderings = n! / (k! · (n − k)!) = C(n, k)

Every ordering has the same probability — p^k · (1 − p)^n−k — because it contains exactly k heads and n − k tails, and the flips are independent. Since orderings are mutually exclusive (you can only get one exact sequence), we sum:

P(exactly k heads) = C(n, k) · p^k · (1 − p)^n−k

This is the Binomial formula. We will formalize it as the Binomial random variable in a later chapter. For now, just appreciate its structure: C(n,k) counts the orderings, p^k(1−p)^n−k gives the probability of each one.

Worked Example

Exactly k = 4 heads in n = 10 flips with p = 0.6:

P = C(10, 4) · 0.6⁴ · 0.4⁶ = 210 · 0.1296 · 0.004096 = 210 · 0.000531 ≈ 0.111

About an 11% chance. The most likely outcome is k = 6 heads (since p = 0.6 and n = 10, the "expected" number of heads is np = 6).

More Than k Heads

What about P(more than 4 heads)? Since "exactly 5 heads" and "exactly 6 heads" are mutually exclusive events, we just sum:

P(more than k heads) = ∑_i=k+1ⁿ C(n, i) · pⁱ · (1 − p)ⁿ⁻ⁱ

Key insight: The formula P(exactly k heads) = C(n,k) · p^k(1−p)^n−k works for any independent binary trial — voters choosing a candidate, patients responding to a drug, users clicking an ad. The "coin" metaphor extends far beyond actual coins.

Check: You flip a fair coin (p = 0.5) three times. What is P(exactly 2 heads)?

1/4 3/8 1/2

Chapter 2: The Enigma Machine

During World War 2, the Nazis encrypted their military communications using a device called the Enigma machine. Each day they chose a new configuration, and if the Allies could discover it, they could read every enemy message. Alan Turing and his team at Bletchley Park built one of the world's first computers to search through configurations. But how many configurations were there?

The Rotors

The Enigma machine has three rotors, each set to one of 26 positions. By the step rule:

Rotor configurations = 26 × 26 × 26 = 26³ = 17,576

The Plugboard

The machine also has a plugboard where wires swap pairs of letters. Each wire connects two distinct letters. Wires are indistinct (A↔B is the same as B↔A). Each letter can connect to at most one wire. How many ways can we place wires?

One wire: Choose 2 letters from 26. This is a combination: C(26, 2) = 325.

Two wires: Place the first wire in C(26, 2) ways. The second wire uses 2 of the remaining 24 letters: C(24, 2) ways. But since wires are indistinct, we have double-counted (wire A then wire B = wire B then wire A). Divide by 2!:

C(26,2) · C(24,2) / 2! = 325 · 276 / 2 = 44,850

Three wires: Extend the pattern and divide by 3! to account for wire ordering:

C(26,2) · C(24,2) · C(22,2) / 3! = 325 · 276 · 231 / 6 = 3,453,450

k wires in general: The pattern emerges clearly. For each wire we place, two letters are consumed, and we divide by k! because wire order does not matter:

Total_k = ∏_i=1^k C(28 − 2i, 2) / k!

Arbitrary number of wires: The sets of configurations with exactly i wires are mutually exclusive from those with exactly j wires (you cannot simultaneously use both i and j wires). So the total number of plugboard configurations sums over all possible wire counts (0 through 13, since 26 letters can support at most 13 pairs):

Total = ∑_k=0¹³ Total_k = 532,985,208,200,576

Worked Example: The Full Machine

The WW2 Enigma used exactly 10 wires (connecting 20 letters). The rotor set was chosen from 5 available rotors (adding C(5,3) = 60 ways). The total number of configurations:

26³ × Total₁₀ × 60 ≈ 159 × 10¹⁸

That is 159 quintillion settings. No human could search through them — which is exactly why Turing needed to build a machine.

Why this matters: The Enigma problem is pure combinatorics — the step rule and the combination formula, applied carefully with attention to indistinguishability. The same reasoning appears in modern cryptography: security depends on the search space being astronomically large.

Check: Why do we divide by k! when counting k indistinct wires on the plugboard?

Because the step rule treats wires as ordered, but the k! orderings of the same set of wires are identical Because each wire can be reversed Because we need to account for the rotor positions

Chapter 3: Serendipity & the Birthday Paradox

You live in an area with a large population — say Stanford with 17,000 students. You have about 150 friends. On a given day you see 100 random people. What are the chances you bump into at least one friend?

This is a complement problem. Instead of counting all the ways you could see a friend, count the probability you see none and subtract from 1.

Setting Up the Model

Each of the 100 people you see is drawn uniformly from the 17,000 population. Treat each sighting as independent. The probability that any one person is not your friend is (17,000 − 150) / 17,000 ≈ 0.9912.

Worked Example

P(no friends in 100 people) = (16,850 / 17,000)¹⁰⁰ ≈ 0.9912¹⁰⁰ ≈ 0.414

P(at least one friend) = 1 − 0.414 ≈ 0.586

About a 59% chance of a serendipitous encounter! This feels surprisingly high — that is the power of repeated independent trials. Even though each individual sighting has only a 0.88% chance of being a friend, 100 trials add up quickly.

The complement trick: Whenever you need P(at least one X), compute 1 − P(none). This avoids messy inclusion-exclusion over many overlapping events. The same logic underpins the Birthday Paradox: with just 23 people in a room, there is a >50% chance that two share a birthday.

This pattern shows up constantly. What is the chance at least one server in a cluster fails? At least one user converts? At least one mutation occurs? Always use the complement.

Serendipity Simulator

Adjust population, friends, and sightings. Click Simulate to run 1,000 trials and see how often you bump into at least one friend.

Population17000

Friends150

Sightings100

Check: With 17,000 people, 150 friends, and 100 sightings, is P(at least one friend) closer to 20%, 60%, or 95%?

About 20% About 60% About 95%

Chapter 4: Random Shuffles

Here is a surprising claim: if you shuffle a standard deck of cards seven times, you can be almost certain that the exact ordering has never existed before in the history of the universe. Let us prove this.

How Many Orderings?

A standard deck has 52 unique cards. The number of distinct orderings is the number of permutations of 52 distinct objects:

52! ≈ 8.07 × 10⁶⁷

For scale, there are roughly 10⁸² atoms in the observable universe. 52! is "only" 15 orders of magnitude smaller. It is an incomprehensibly large number.

How Many Shuffles Have Ever Happened?

We need an upper bound on n, the total number of card shuffles in human history. Overestimate aggressively: assume 7 billion people have been shuffling once per second since 52-card decks were invented around the 15th century (~500 years ≈ 1.6 × 10¹⁰ seconds).

n ≈ 7 × 10⁹ × 1.6 × 10¹⁰ ≈ 10²⁰

Probability Your Shuffle is Unique

Let U be the event that your shuffle is different from all n prior shuffles. Each prior shuffle was one of 52! orderings, so the probability that any single prior shuffle matches yours is 1/52!.

P(U) = (1 − 1/52!)ⁿ ≈ 1 − n/52! ≈ 1 − 10²⁰ / (8 × 10⁶⁷) ≈ 1 − 1.2 × 10⁻⁴⁸

That is so close to 1 that no computer can distinguish it from certainty. Your shuffle is unique. It was unique before you did it, and it will remain unique forever.

The takeaway: Factorials grow so fast that even modest-looking numbers like 52! dwarf anything in the physical universe. This is why brute-force search over permutations is hopeless — and why we need clever algorithms (like Turing built for Enigma).

Bayer and Diaconis (1992) proved that seven riffle shuffles suffice to produce a nearly uniform random ordering. Fewer than seven and the deck retains structure from its original order — a skilled magician could exploit this. More than seven offers diminishing returns. This sharp threshold at exactly seven is one of the most elegant results in applied combinatorics.

The methodology they used — analyzing random walks on the symmetric group S₅₂ — paved the way for studying pseudorandom numbers produced by computers. Every time your computer generates a "random" number, the question of how random is governed by the same mathematics.

A sense of scale: If you dealt one unique ordering per second, exhausting all 52! orderings would take about 2.5 × 10⁶⁰ years. The universe is roughly 1.4 × 10¹⁰ years old. You would need to run for 10⁵⁰ universe lifetimes. This is why your shuffle is guaranteed to be unique.

Check: Approximately how many distinct orderings of a 52-card deck exist?

About 8 × 10⁶⁷ About 52⁵² About 10⁸²

Chapter 5: Counting Random Graphs

Consider a network (graph) with 10 nodes. We add 12 random undirected edges, where each pair of nodes is equally likely to receive an edge. How dense is this graph, and what is the probability a randomly chosen node has a specific degree?

Step 1: Count Edge Locations

An undirected edge connects two distinct nodes. No self-loops. The number of possible edge locations is:

C(10, 2) = 45

Step 2: Count Ways to Place 12 Edges

We are choosing 12 locations from 45 possible ones. The number of unique graphs is:

C(45, 12) = 45! / (12! · 33!) ≈ 1.7 × 10¹¹

Step 3: Node Degree Distribution

Pick a node at random. It has 9 possible neighbors (every other node). We want P(degree = i) — the probability that exactly i of the 12 edges are incident to our node.

Think of it as a two-step counting problem for the event space:

Choose i edges from the 9 possible locations connected to our node: C(9, i) ways.
Choose the remaining 12 − i edges from the 45 − 9 = 36 locations not incident to our node: C(36, 12 − i) ways.

P(degree = i) = C(9, i) · C(36, 12 − i) / C(45, 12)

Worked Example: P(degree = 3)

P(degree = 3) = C(9,3) · C(36,9) / C(45,12) = 84 · 94,143,280 / 17,417,133,617 ≈ 0.254

About a 25% chance. This is actually the most likely degree for this particular graph configuration.

Key insight: This is a hypergeometric setup. We have a finite pool of 45 edge slots, 9 of which are "special" (incident to our node). We draw 12 without replacement. The probability of getting exactly i special slots is the hypergeometric formula — identical in structure to the formula above.

Check: In a network with 10 nodes, how many possible locations are there for undirected edges (no self-loops)?

45 100 90

Chapter 6: Set Diversity & Gini Impurity

Imagine a bag of 7 shapes: 6 squares and 1 triangle. You draw two shapes with replacement. What is the probability the two shapes are different? This probability — the chance of drawing a mismatched pair — is called the Gini impurity of the set. It measures diversity.

Computing P(same)

Treat each shape as distinct. The sample space has 7 × 7 = 49 equally likely ordered pairs. Let A = "both squares" and B = "both triangles." These are mutually exclusive.

|A| = 6 × 6 = 36, |B| = 1 × 1 = 1

P(same) = (36 + 1) / 49 = 37/49 ≈ 0.755

Gini Impurity

Gini = P(different) = 1 − P(same) = 1 − 37/49 = 12/49 ≈ 0.245

The set is mostly squares, so it is not very diverse. The Gini impurity is low.

General Formula

For a set of n items where n_i is the count of type i:

G = 1 − ∑_i (n_i / n)²

This is simply 1 minus the sum of squared proportions. Maximum impurity occurs when all types are equally represented. Minimum (G = 0) occurs when every item is the same type.

Worked Example: A More Diverse Set

Now consider 7 shapes: 4 squares, 2 triangles, 1 star.

P(same) = (4² + 2² + 1²) / 7² = (16 + 4 + 1) / 49 = 21/49

Gini = 1 − 21/49 = 28/49 = 4/7 ≈ 0.571

More diverse than before (0.571 vs 0.245), as expected — the items are more evenly spread across types.

Gini in Decision Trees

Decision trees and Random Forests use Gini impurity to decide which feature to split on. At each node, the algorithm tries every possible split and picks the one that produces child nodes with the lowest Gini impurity. A node with G = 0 is "pure" — all items belong to one class. The goal is to reduce impurity at every split.

Consider a split that sends items left or right. The left child has n_L items with Gini G_L, the right child has n_R items with Gini G_R. The algorithm picks the split that minimizes the weighted Gini:

G_weighted = (n_L / n) · G_L + (n_R / n) · G_R

This is just a weighted average of diversity in each child. The best split creates children that are as homogeneous as possible.

The connection: Gini impurity is just the probability that two randomly drawn items disagree. A low Gini means the set is homogeneous. A high Gini means it is diverse. Decision trees seek to create the most homogeneous groups possible.

Check: A set contains 5 red balls and 5 blue balls (10 total). What is the Gini impurity?

0.5 0.25 1.0

Chapter 7: Showcase — Interactive Explorations

Now let us bring the math to life. Below are two interactive simulations. The first lets you explore the Binomial coin-flip formula from Chapter 1. The second lets you experiment with Gini impurity from Chapter 6.

Coin Flip Distribution

Adjust n (number of flips) and p (probability of heads), and see the full probability distribution P(exactly k heads) for every k from 0 to n. The bar chart updates instantly.

Binomial Distribution Explorer

Drag the sliders to change n and p. The tallest bar is the most likely number of heads. Hover over bars to see exact probabilities.

n (flips)10

p (heads)0.60

Gini Impurity Explorer

Adjust the number of items in each category. Watch how the Gini impurity changes as the distribution becomes more or less uniform.

Gini Impurity Calculator

Set the count for each category. The bar shows the Gini impurity — 0 means perfectly pure, approaching 1 means maximum diversity.

Category A6

Category B3

Category C1

Try it: In the Binomial explorer, set p = 0.5 and increase n from 5 to 50. Notice how the distribution becomes more bell-shaped and concentrated around n/2. In the Gini explorer, set all three categories equal and watch the impurity reach its maximum near 0.667.

Check: In the Binomial distribution with n = 20 and p = 0.5, the most likely value of k is:

10 20 5

Chapter 8: Bacteria Evolution

Antibiotics are one of the great achievements of modern medicine. But bacteria evolve resistance, and probability helps us understand exactly how fast this happens.

The Setup

You have 1 million bacteria in your gut. 10% carry a mutation that makes them slightly more resistant to antibiotics. You take a course of antibiotics:

Bacteria with the mutation survive with probability 20%.
Bacteria without the mutation survive with probability 1%.

Two questions. First: what fraction of bacteria survive? Second: of those survivors, what fraction carry the mutation?

P(survival) via Law of Total Probability

Let E = "bacterium survives" and M = "bacterium has the mutation." By LOTP:

P(E) = P(E|M) · P(M) + P(E|M^C) · P(M^C)

P(E) = 0.20 × 0.10 + 0.01 × 0.90 = 0.02 + 0.009 = 0.029

About 2.9% of bacteria survive. That means roughly 29,000 out of 1 million.

P(mutation | survived) via Bayes' Theorem

Now the critical question — of the survivors, what fraction are mutants?

P(M|E) = P(E|M) · P(M) / P(E) = (0.20 × 0.10) / 0.029 ≈ 0.69

The punchline: Before antibiotics, 10% of bacteria had the mutation. After antibiotics, 69% of survivors have it. One course of antibiotics turned a rare mutation into the majority trait. If these survivors reproduce, the next generation is overwhelmingly resistant. This is natural selection, quantified by Bayes' theorem.

Worked Numerical Breakdown

Group	Count	Survival Rate	Survivors
Mutant (M)	100,000	20%	20,000
Non-mutant (M^C)	900,000	1%	9,000
Total	1,000,000		29,000

Among 29,000 survivors: 20,000 / 29,000 ≈ 69% are mutants. Exactly what Bayes' theorem predicted.

Bacteria Evolution Simulator

Adjust mutation prevalence and survival rates. Watch how a single round of selection shifts the population composition. Click Apply Antibiotics to run the selection step.

Initial mutation %10%

Mutant survival %20%

Normal survival %1%

Check: Before antibiotics, 10% of bacteria are mutants. After one course, about what percentage of survivors are mutants?

About 20% About 69% About 90%

Chapter 9: Connections

Let us step back and see the bigger picture. Every application in this chapter used the same small set of tools from Chapters 1–3:

Application	Tool Used	Where It Leads
Many coin flips	Permutations of indistinct objects	Binomial random variable (Ch 5+)
Enigma machine	Step rule, combinations	Cryptographic key spaces
Serendipity	Complement rule, independence	Birthday paradox, hash collisions
Random shuffles	Permutations (n!)	Pseudorandom number theory
Random graphs	Hypergeometric counting	Erdős–Rényi model, network science
Gini impurity	Complement rule, squared proportions	Decision trees, Random Forests
Bacteria evolution	LOTP + Bayes' theorem	Evolutionary biology, epidemiology

The pattern: Model the situation (define sample space, events). Apply counting or conditional probability. Interpret. The math is a small, reusable toolkit. The art is in the modeling.

Looking ahead: Chapter 5 introduces random variables — a formalism that lets us move from "P(exactly k heads)" to "a function X that maps outcomes to numbers." This abstraction will unify coin flips, node degrees, and Gini calculations under a single framework, leading to expected value, variance, and all of classical statistics.

The Binomial formula from coin flips becomes the Binomial distribution. The node-degree formula from random graphs becomes the Hypergeometric distribution. The Gini impurity becomes a measure of entropy. Each application here is a preview of a deeper theoretical object.

Chapters 1–3

Counting + Probability + Conditional

↓ apply to real problems

Chapter 4 (this chapter)

Coins, Enigma, Graphs, Gini, Bacteria

↓ formalize with random variables

Chapter 5+

Distributions, expectation, variance

Check: Which application in this chapter is a direct preview of the Binomial random variable?

Many coin flips (exactly k heads in n trials) Enigma machine configurations Gini impurity