Ch 1: Counting & Combinatorics — Piech Probability CS

Chapter 0: Why Counting?

You probably thought you finished learning how to count somewhere around kindergarten. But the kind of counting that underpins probability is a different beast entirely. We are not counting apples. We are counting possibilities — the number of ways an experiment can unfold, the number of configurations a system can occupy, the number of outcomes that satisfy a constraint.

Why does this matter for computer science? Because every time you analyze an algorithm's behavior, hash data into buckets, generate cryptographic keys, or train a machine learning model, you are implicitly reasoning about how many things are possible. If you cannot count the possibilities, you cannot compute probabilities.

The core idea: Counting is the foundation of probability. Every probability is a ratio — "favorable outcomes divided by total outcomes." If you cannot count the numerator and denominator, you cannot do probability at all.

Here is a taste of why counting gets wild fast. Peter Norvig observed that the number of atoms in the observable universe is roughly 10⁸⁰. That sounds unimaginably large. But consider a tiny 12-pixel image where each pixel can be one of 17 million colors. By the time we finish this chapter, you will see that this produces roughly 10⁸⁶ distinct images — a million times more than the number of atoms in the universe. And that is just 12 pixels.

A Go board (19 × 19 points, each empty, black, or white) has about 3³⁶¹ ≈ 10¹⁷² configurations. That is the square of the number of atoms in the universe. No computer can store that many states. Understanding why numbers explode like this — and how to count precisely when they do — is what this chapter teaches.

Counting

Step Rule, Or Rule, Overcounting corrections

↓ build higher abstractions

Combinatorics

Permutations, Combinations, Bucketing, Stars & Bars

↓ enables

Probability

P(event) = favorable / total — both require counting

We will build everything from two primitive rules — the Step Rule (multiply when "and") and the Or Rule (add when "or"). Every formula in combinatorics — permutations, combinations, the multinomial coefficient — is just a clever application of these two ideas plus one correction technique: dividing out overcounting.

There is also a meta-strategy that shows up again and again: overcount then correct. Start by counting too many things (because it is easier), then figure out how many times each outcome was counted, and divide. Permutations with indistinct objects, combinations, and the multinomial coefficient are all derived this way. Once you internalize this pattern, most combinatorics problems become approachable.

Why CS students need this: Hash function analysis, birthday-problem arguments, randomized algorithm proofs, error-correcting codes, machine learning generalization bounds, crypto key spaces — they all start with "how many possibilities exist?" The answer is always counting.

How possibilities explode: n choices at each of k steps gives n^k total outcomes.

Check: A 12-pixel image with 17 million colors per pixel has roughly how many distinct images?

12 × 17 million = 204 million 17 million² ≈ 10¹⁴ 17 million¹² ≈ 10⁸⁶

Chapter 1: The Step Rule

Imagine you are building something in stages. First you make one choice, then another, then another. Each choice is independent of the others. How many total outcomes are possible?

Step Rule of Counting (Product Rule): If an experiment has two parts, where the first part can result in one of m outcomes and the second part can result in one of n outcomes regardless of the first part's outcome, then the total number of outcomes is m · n.

This generalizes immediately: if you have k steps with n₁, n₂, …, n_k choices respectively, the total is n₁ · n₂ · … · n_k.

Worked example — hash table collisions. A hash table has 100 buckets. Two strings are independently hashed. How many ways can they land?

Step 1: The first string lands in one of 100 buckets. Step 2: The second string (independently) lands in one of 100 buckets. Total: 100 × 100 = 10,000 ways.

Worked example — dice. Two 6-sided dice are rolled. The first die has 6 outcomes, the second has 6 outcomes regardless. Total: 6 × 6 = 36 outcomes.

The Step Rule also explains why problems explode. An image with n pixels, each with c color choices, is an n-step experiment. Total images: cⁿ. With c = 17 million and n = 12, that is (1.7 × 10⁷)¹² ≈ 10⁸⁶.

Worked example — unique Go board states. A Go board has 19 × 19 = 361 points. Each point can be empty, black, or white — 3 choices. We construct the board one point at a time, step by step. By the Step Rule: 3³⁶¹ ≈ 10¹⁷² total configurations. That is the square of the number of atoms in the universe. Even filtering down to legal positions yields about 10¹⁷⁰. No computer could ever enumerate them all — which is precisely why Go AI requires clever algorithms rather than brute force.

The logarithmic flip side: The Step Rule cuts both ways. If each step is binary, k binary steps produce 2^k outcomes. Want 10 million unique outcomes? You only need k = ⌈log₂(10,000,000)⌉ = 24 binary decisions. This is why binary search, decision trees, and hash functions are so efficient — each step doubles the number of distinguishable outcomes.

Counting with "or". Sometimes outcomes come from one source or another. If sets A and B are mutually exclusive (no overlap), then |A or B| = |A| + |B|. If they might overlap, we must correct for double-counting using inclusion-exclusion:

|A ∪ B| = |A| + |B| − |A ∩ B|

Worked example — bit strings. An 8-bit string must start with "01" or end with "10". Strings starting with "01": 2⁶ = 64. Strings ending with "10": 2⁶ = 64. Strings doing both: 2⁴ = 16. By inclusion-exclusion: 64 + 64 − 16 = 112.

Rule	When to use	Formula
Step Rule (AND)	Multi-step experiments, choices independent	m · n
Sum Rule (OR, disjoint)	Outcomes from A or B, no overlap	\|A\| + \|B\|
Inclusion-Exclusion	Outcomes from A or B, possible overlap	\|A\| + \|B\| − \|A ∩ B\|

Check: A Go board has 19×19 = 361 points, each with 3 states (black, white, empty). How many total configurations?

361 × 3 = 1,083 3³⁶¹ ≈ 10¹⁷² 361³ ≈ 47 million

Chapter 2: Factorials & Permutations

How many ways can you arrange a deck of cards? Line up 5 people for a photo? Order the letters of a word? These are all permutation problems — counting the number of ordered arrangements.

Permutation Rule: The number of ways to arrange n distinct objects in order is n! = n · (n−1) · (n−2) · … · 2 · 1.

Why? Use the Step Rule. For the first position, you have n choices. For the second, n−1 remain. For the third, n−2. And so on down to 1. Multiply them all: n!.

Worked example — BAYES. How many unique orderings of the letters B, A, Y, E, S? All 5 letters are distinct, so the answer is 5! = 120.

Worked example — smartphone passcode. A 4-digit passcode has 4 smudge marks over 4 different digits. How many possible codes? Since each digit is used exactly once and order matters: 4! = 24.

Permutations with Indistinct Objects

What if some items are identical? If you rearrange two identical 0s, the string looks the same. We have overcounted — and we need to divide out the redundancy.

Permutations with repeats = n! / (n₁! · n₂! · … · n_r!)

where n₁, n₂, …, n_r are the counts of each indistinct group, and n₁ + n₂ + … + n_r = n.

Worked example — bit strings. How many distinct bit strings from three 0s and two 1s? We have 5 positions. If all digits were unique: 5! = 120. But the three 0s are interchangeable (3! = 6 rearrangements) and the two 1s are interchangeable (2! = 2). So: 5! / (3! · 2!) = 120 / 12 = 10.

Worked example — MISSISSIPPI. 11 letters: 1 M, 4 I's, 4 S's, 2 P's. If all 11 letters were distinct we would have 11! = 39,916,800 arrangements. But swapping the four I's among themselves (4! = 24 ways) leaves the string unchanged. Same for the four S's (4!) and the two P's (2!). So we divide out all the redundant rearrangements:

11! / (1! · 4! · 4! · 2!) = 39,916,800 / (1 · 24 · 24 · 2) = 34,650

All permutations of distinct objects. Drag to rearrange and watch the count.

Overcounting principle: If every distinct outcome has been counted exactly k times in your total, divide by k. This single idea — "overcount then correct" — powers most combinatorics formulas.

Check: A phone passcode has 4 digits, but only 3 smudge marks (one digit repeated). How many possible codes?

24 36 (three cases of 4!/2! = 12 each) 12

Chapter 3: Combinations

Permutations care about order: ABC is different from CBA. But often order does not matter. Choosing 3 books from a shelf, selecting committee members, drawing cards from a deck — the order you pick them in is irrelevant. These are combinations.

Combination formula: The number of ways to choose r objects from n distinct objects (order does not matter, no replacement) is:
C(n, r) = n! / (r! · (n−r)!)

Where does this come from? Think of it in four steps:

Start with all n! permutations of the n objects.
Take only the first r positions. That gives n!/(n−r)! ordered selections.
But the order among the r chosen items is irrelevant — each group of r has been counted r! times.
Divide by r! to correct for the overcounting.

Result:

C(n, r) = n! / (r! · (n−r)!)

Worked example — Hunger Games. District 12 has 8,000 villagers. How many ways to choose 2 tributes?

C(8000, 2) = 8000! / (2! · 7998!) = (8000 × 7999) / 2 = 31,996,000

Worked example — books with a constraint. Choose 3 books from 6, but two specific books (8th and 9th edition) must not both be chosen. Total without constraint: C(6,3) = 20. Forbidden selections (both editions plus 1 other): C(4,1) = 4. Valid: 20 − 4 = 16.

Alternatively, enumerate by cases: (pick 8th ed + 2 from remaining 4) + (pick 9th ed + 2 from remaining 4) + (pick 3 from the other 4) = C(4,2) + C(4,2) + C(4,3) = 6 + 6 + 4 = 16. Same answer, different route.

Two strategies for constraints: You can either (1) count everything then subtract forbidden cases, or (2) split into mutually exclusive cases and add. Both always give the same answer. Pick whichever is simpler for the problem at hand.

Concept	Order matters?	Formula
Permutation (all n)	Yes	n!
Permutation (r of n)	Yes	n! / (n−r)!
Combination (r of n)	No	n! / (r!(n−r)!)

Check: Why is C(n,r) equal to n!/(r!(n-r)!) rather than just n!/(n-r)!?

Because we must account for the unselected items Because order among the r chosen items doesn't matter, so we divide out the r! redundant arrangements Because n! is always too large

Chapter 4: The Binomial Coefficient

The combination formula C(n, r) is so important that it gets its own notation and its own name: the binomial coefficient, written as:

&binom;(n, r) = C(n, r) = n! / (r!(n−r)!)

Read it as "n choose r." The notation with the stacked numbers inside parentheses is standard — you will see it everywhere in probability, statistics, and algorithm analysis. It is called "binomial" because these numbers appear as the coefficients when you expand (a + b)ⁿ — the Binomial Theorem:

(a + b)ⁿ = ∑_r=0ⁿ C(n,r) · a^n−r · b^r

Worked example — Pascal's Triangle. Each entry in Pascal's Triangle is a binomial coefficient. Row n, position r gives C(n, r). The fundamental recurrence is:

C(n, r) = C(n−1, r−1) + C(n−1, r)

Why? Consider the n-th object. Either it is in your selection (then choose r−1 more from the remaining n−1) or it is not (then choose all r from the remaining n−1). These two cases are mutually exclusive and exhaustive.

Pascal's Triangle: each cell is the sum of the two above it. Click a cell to highlight the recurrence.

Useful identities:
C(n, 0) = C(n, n) = 1 — there is one way to choose nothing or everything.
C(n, r) = C(n, n−r) — choosing who is in is the same as choosing who is out.
∑_r=0ⁿ C(n, r) = 2ⁿ — the total number of subsets of n items.

Worked example — committee. A club of 20 members forms a committee of 5. How many possible committees? C(20, 5) = 20! / (5! · 15!) = 15,504.

Worked example — subsets. How many subsets does a set of 10 elements have? Each element is either in or out (2 choices), so by the Step Rule: 2¹⁰ = 1,024. This matches ∑_r=0¹⁰ C(10, r) = 1,024.

This is a beautiful connection: the identity ∑ C(n,r) = 2ⁿ is just the Binomial Theorem with a = b = 1. It tells us that the total number of subsets of any size equals 2ⁿ. This fact shows up in power set arguments, in counting Boolean functions, and in information theory.

python
import itertools
# All C(5,3) = 10 ways to choose 3 from 5
choices = list(itertools.combinations([1,2,3,4,5], 3))
print(len(choices))  # 10
print(choices)
# [(1,2,3), (1,2,4), (1,2,5), (1,3,4),
#  (1,3,5), (1,4,5), (2,3,4), (2,3,5),
#  (2,4,5), (3,4,5)]

Check: Why does C(n,r) = C(n, n-r)?

Choosing r items to include is equivalent to choosing n-r items to exclude Because n! is symmetric It only holds when r = n/2

Chapter 5: Bucketing Distinct Objects

A different flavor of counting: instead of choosing or arranging, we are assigning. You have n distinct items and r containers. Each item goes into exactly one container. How many ways?

Bucketing distinct items: Place n distinguishable items into r containers. The number of ways is rⁿ.

Why rⁿ? Each of the n items independently chooses one of r containers. By the Step Rule: r · r · … · r (n times) = rⁿ.

Worked example — hash table. Place 10 different strings into 5 hash buckets. Each string independently picks a bucket: 5¹⁰ = 9,765,625 possible assignments.

Notice the difference from combinations. In combinations, we ask "which items are selected?" In bucketing, we ask "where does each item go?" The items are distinct and each one makes an independent choice.

Bucketing shows up constantly in CS. Hashing is bucketing. Assigning processes to CPUs is bucketing. Coloring graph vertices is bucketing. The formula rⁿ tells you the size of the search space, which is the first thing you need to know when analyzing an algorithm's complexity.

Bucketing vs. permutations: If r = n and each container holds exactly one item, bucketing becomes a permutation (n! arrangements). Bucketing is more general — multiple items can share a container, and containers can be empty.

Worked example — survey. 8 people each choose their favorite among 3 programming languages. Each person's choice is independent. Total outcomes: 3⁸ = 6,561.

Scenario	Items	Containers	Count
Hashing strings	10 strings	5 buckets	5¹⁰
Language survey	8 people	3 languages	3⁸
Coin flips	20 flips	2 outcomes	2²⁰

Check: 6 distinguishable dice are rolled. How many total outcomes?

6! = 720 6 × 6 = 36 6⁶ = 46,656

Chapter 6: Stars & Bars

Bucketing distinct objects is straightforward — each one picks a container independently. But what if the objects are indistinguishable? You have 10 identical dollars to distribute among 4 charities. How many ways?

Since the dollars are identical, all that matters is how many each charity gets, not which specific dollars. We cannot apply the Step Rule directly because the items have no identity. If we tried, we would massively overcount — giving dollar #1 to charity A and dollar #2 to charity B would be "different" from the reverse, even though both just mean "one dollar to A, one to B."

We need a completely different approach. Enter one of the most elegant constructions in combinatorics.

Stars and Bars (Divider Method): The number of ways to place n indistinguishable items into r containers is:
C(n + r − 1, n) = (n + r − 1)! / (n! · (r − 1)!)

The creative construction: represent the n items as stars (*) and the boundaries between r containers as bars (|). You need exactly r − 1 bars. Now count the distinct permutations of n stars and r−1 bars:

(n + r − 1)! / (n! · (r − 1)!)

Worked example — visual. Place 5 identical items into 3 bins. We arrange 5 stars and 2 bars:

**|*|** → (2, 1, 2)
****||* → (4, 0, 1)
||***** → (0, 0, 5)

Total: C(5+3−1, 5) = C(7, 5) = 7! / (5! · 2!) = 21.

Drag the bars to redistribute stars among bins. The counter shows all valid arrangements.

Worked example — startup investment. $10M to invest in 4 companies (in $1M increments).

C(10 + 4 − 1, 10) = C(13, 10) = 13! / (10! · 3!) = 286 ways

What if Company 1 must get at least $3M? Give them $3M first, then distribute the remaining $7M among 4 companies: C(7 + 4 − 1, 7) = C(10, 7) = 120.

What if you do not have to invest all $10M? Add a 5th "company" (yourself) to absorb unspent funds: C(10 + 5 − 1, 10) = C(14, 10) = 1,001.

The integer equation connection: Stars and Bars is equivalent to counting non-negative integer solutions to x₁ + x₂ + … + x_r = n. Each x_i ≥ 0 represents how many stars go in bin i. Adding lower-bound constraints (x_i ≥ k) just means pre-allocating k stars to bin i and solving for the remainder.

Check: How many ways to put 5 identical balls into 3 bins?

C(7,5) = 21 3⁵ = 243 5! / 3! = 20

Chapter 7: The Multinomial Coefficient

We have seen how to choose a single group of r items from n. But what if you need to split n distinct items into multiple groups of specified sizes? This is the multinomial coefficient.

Multinomial coefficient: The number of ways to divide n distinct objects into r groups of sizes n₁, n₂, …, n_r (where n₁ + n₂ + … + n_r = n) is:
n! / (n₁! · n₂! · … · n_r!)

This is the exact same formula as permutations with indistinct objects! The deep parallel: imagine labeling each object with its group number. Objects in group 1 get label "1", group 2 get label "2", etc. Counting distinct labelings is the same as counting permutations of n₁ copies of "1", n₂ copies of "2", and so on.

You can also derive it step-by-step using sequential combinations. First choose n₁ items from all n for group 1: C(n, n₁) ways. Then choose n₂ from the remaining n − n₁ for group 2: C(n − n₁, n₂) ways. Continue until all groups are filled. Multiply everything together and the factorials telescope beautifully into n! / (n₁! · n₂! · … · n_r!).

Worked example — server assignment. Camazon has 13 distinct servers to assign to 3 datacenters: A gets 6, B gets 4, C gets 3.

13! / (6! · 4! · 3!) = 60,060

We can verify this by thinking step-by-step with combinations. Choose 6 of 13 for datacenter A: C(13, 6). Then 4 of the remaining 7 for B: C(7, 4). Then the last 3 go to C: C(3, 3) = 1. By the Step Rule:

C(13,6) · C(7,4) · C(3,3) = 1716 · 35 · 1 = 60,060

Same answer. The multinomial coefficient and the sequential-combination approach are two views of the same counting process.

Unifying view: Almost every formula in this chapter is a special case of the multinomial coefficient.
• Permutation of n distinct objects: n! / (1!·1!·…·1!) = n!
• Combination C(n,r): n! / (r!·(n−r)!) — two groups, "chosen" and "not chosen"
• Permutations with repeats: same formula, groups are identical items

Formula	What it counts	Special case of
n!	Arrangements of n distinct items	Multinomial with all n_i=1
C(n,r)	Unordered selections of r from n	Multinomial with 2 groups
n!/(n₁!…n_r!)	Arrangements with repeats OR group assignment	The general form

Check: 12 students are split into 3 project teams of 4. How many ways?

12! / (4! · 4! · 4!) = 34,650 C(12,4) = 495 3¹²

Chapter 8: Counting Calculator

Now let's put all the formulas to work. This interactive calculator lets you explore every counting technique from this chapter. Pick a formula, set the parameters, and see the result with a step-by-step breakdown.

Instructions: Select a formula from the dropdown, adjust n and r with the sliders, and watch the calculation update in real time. The visualization shows the combinatorial structure.

Formula:

n = 5

r = 3

Check: Use the calculator. How many ways to distribute 7 identical cookies among 3 children?

3⁷ = 2,187 C(9, 7) = 36 7! = 5,040

Chapter 9: Connections

This chapter built the counting toolkit that everything in probability rests on. Let's see how the pieces connect and where they lead.

Step Rule & Or Rule

The two atomic counting operations: multiply for "and", add for "or"

↓

Permutations & Combinations

Built from the Step Rule + overcounting correction (divide by r!)

↓

Multinomial Coefficient

The general form that unifies permutations, combinations, and group assignment

↓

Stars & Bars

Handles indistinguishable objects via a clever encoding as permutations

Decision tree for counting problems

When facing a counting problem, ask these questions in order:

Are items distinct or identical? Distinct → bucketing or combinations. Identical → Stars & Bars.
Does order matter? Yes → permutations. No → combinations.
Are there repeated items? Yes → divide by the redundancy (permutations with indistinct objects).
Are there constraints? Use subtraction (total − forbidden) or split into mutually exclusive cases.

What comes next

In Chapter 2, we will define probability formally. Every probability calculation relies on the counting techniques from this chapter. The denominator (total outcomes) and numerator (favorable outcomes) both require counting. Without the tools you just learned, probability would be impossible to compute.

The connection is direct. "What is the probability of being dealt a flush in poker?" means: count the number of 5-card hands that are flushes (numerator, uses combinations) and divide by the total number of 5-card hands (denominator, also combinations). "What fraction of random 8-bit strings match our receiver criterion?" means: count valid strings (inclusion-exclusion) and divide by 2⁸ (Step Rule). Counting is probability.

The big picture: Counting → Probability → Random Variables → Expectation → the entire edifice of statistics and machine learning. This chapter is the bedrock. Every formula we derived — the Step Rule, permutations, combinations, Stars & Bars, the multinomial — will reappear throughout the course.

Formula	Expression	Use when
Step Rule	m · n	Independent sequential choices
Inclusion-Exclusion	\|A\|+\|B\|−\|A∩B\|	Overlapping "or" counts
Permutations	n!	Ordered arrangements, all distinct
Perm. with repeats	n!/(n₁!…n_r!)	Ordered arrangements, some identical
Combinations	C(n,r)	Unordered selection, all distinct
Bucketing (distinct)	rⁿ	Distinct items into labeled bins
Stars & Bars	C(n+r−1, n)	Identical items into labeled bins
Multinomial	n!/(n₁!…n_r!)	Splitting n distinct items into sized groups

Check: 10 identical candies, 4 children. Which formula?

Permutations: 10! Combinations: C(10, 4) Stars & Bars: C(13, 10)