The foundation beneath all of probability: learning to count the things that matter.
You probably thought you finished learning how to count somewhere around kindergarten. But the kind of counting that underpins probability is a different beast entirely. We are not counting apples. We are counting possibilities — the number of ways an experiment can unfold, the number of configurations a system can occupy, the number of outcomes that satisfy a constraint.
Why does this matter for computer science? Because every time you analyze an algorithm's behavior, hash data into buckets, generate cryptographic keys, or train a machine learning model, you are implicitly reasoning about how many things are possible. If you cannot count the possibilities, you cannot compute probabilities.
Here is a taste of why counting gets wild fast. Peter Norvig observed that the number of atoms in the observable universe is roughly 1080. That sounds unimaginably large. But consider a tiny 12-pixel image where each pixel can be one of 17 million colors. By the time we finish this chapter, you will see that this produces roughly 1086 distinct images — a million times more than the number of atoms in the universe. And that is just 12 pixels.
A Go board (19 × 19 points, each empty, black, or white) has about 3361 ≈ 10172 configurations. That is the square of the number of atoms in the universe. No computer can store that many states. Understanding why numbers explode like this — and how to count precisely when they do — is what this chapter teaches.
We will build everything from two primitive rules — the Step Rule (multiply when "and") and the Or Rule (add when "or"). Every formula in combinatorics — permutations, combinations, the multinomial coefficient — is just a clever application of these two ideas plus one correction technique: dividing out overcounting.
There is also a meta-strategy that shows up again and again: overcount then correct. Start by counting too many things (because it is easier), then figure out how many times each outcome was counted, and divide. Permutations with indistinct objects, combinations, and the multinomial coefficient are all derived this way. Once you internalize this pattern, most combinatorics problems become approachable.
How possibilities explode: n choices at each of k steps gives nk total outcomes.
Imagine you are building something in stages. First you make one choice, then another, then another. Each choice is independent of the others. How many total outcomes are possible?
This generalizes immediately: if you have k steps with n1, n2, …, nk choices respectively, the total is n1 · n2 · … · nk.
Worked example — hash table collisions. A hash table has 100 buckets. Two strings are independently hashed. How many ways can they land?
Step 1: The first string lands in one of 100 buckets. Step 2: The second string (independently) lands in one of 100 buckets. Total: 100 × 100 = 10,000 ways.
Worked example — dice. Two 6-sided dice are rolled. The first die has 6 outcomes, the second has 6 outcomes regardless. Total: 6 × 6 = 36 outcomes.
The Step Rule also explains why problems explode. An image with n pixels, each with c color choices, is an n-step experiment. Total images: cn. With c = 17 million and n = 12, that is (1.7 × 107)12 ≈ 1086.
Worked example — unique Go board states. A Go board has 19 × 19 = 361 points. Each point can be empty, black, or white — 3 choices. We construct the board one point at a time, step by step. By the Step Rule: 3361 ≈ 10172 total configurations. That is the square of the number of atoms in the universe. Even filtering down to legal positions yields about 10170. No computer could ever enumerate them all — which is precisely why Go AI requires clever algorithms rather than brute force.
Counting with "or". Sometimes outcomes come from one source or another. If sets A and B are mutually exclusive (no overlap), then |A or B| = |A| + |B|. If they might overlap, we must correct for double-counting using inclusion-exclusion:
Worked example — bit strings. An 8-bit string must start with "01" or end with "10". Strings starting with "01": 26 = 64. Strings ending with "10": 26 = 64. Strings doing both: 24 = 16. By inclusion-exclusion: 64 + 64 − 16 = 112.
| Rule | When to use | Formula |
|---|---|---|
| Step Rule (AND) | Multi-step experiments, choices independent | m · n |
| Sum Rule (OR, disjoint) | Outcomes from A or B, no overlap | |A| + |B| |
| Inclusion-Exclusion | Outcomes from A or B, possible overlap | |A| + |B| − |A ∩ B| |
How many ways can you arrange a deck of cards? Line up 5 people for a photo? Order the letters of a word? These are all permutation problems — counting the number of ordered arrangements.
Why? Use the Step Rule. For the first position, you have n choices. For the second, n−1 remain. For the third, n−2. And so on down to 1. Multiply them all: n!.
Worked example — BAYES. How many unique orderings of the letters B, A, Y, E, S? All 5 letters are distinct, so the answer is 5! = 120.
Worked example — smartphone passcode. A 4-digit passcode has 4 smudge marks over 4 different digits. How many possible codes? Since each digit is used exactly once and order matters: 4! = 24.
What if some items are identical? If you rearrange two identical 0s, the string looks the same. We have overcounted — and we need to divide out the redundancy.
where n1, n2, …, nr are the counts of each indistinct group, and n1 + n2 + … + nr = n.
Worked example — bit strings. How many distinct bit strings from three 0s and two 1s? We have 5 positions. If all digits were unique: 5! = 120. But the three 0s are interchangeable (3! = 6 rearrangements) and the two 1s are interchangeable (2! = 2). So: 5! / (3! · 2!) = 120 / 12 = 10.
Worked example — MISSISSIPPI. 11 letters: 1 M, 4 I's, 4 S's, 2 P's. If all 11 letters were distinct we would have 11! = 39,916,800 arrangements. But swapping the four I's among themselves (4! = 24 ways) leaves the string unchanged. Same for the four S's (4!) and the two P's (2!). So we divide out all the redundant rearrangements:
All permutations of distinct objects. Drag to rearrange and watch the count.
Permutations care about order: ABC is different from CBA. But often order does not matter. Choosing 3 books from a shelf, selecting committee members, drawing cards from a deck — the order you pick them in is irrelevant. These are combinations.
Where does this come from? Think of it in four steps:
Result:
Worked example — Hunger Games. District 12 has 8,000 villagers. How many ways to choose 2 tributes?
Worked example — books with a constraint. Choose 3 books from 6, but two specific books (8th and 9th edition) must not both be chosen. Total without constraint: C(6,3) = 20. Forbidden selections (both editions plus 1 other): C(4,1) = 4. Valid: 20 − 4 = 16.
Alternatively, enumerate by cases: (pick 8th ed + 2 from remaining 4) + (pick 9th ed + 2 from remaining 4) + (pick 3 from the other 4) = C(4,2) + C(4,2) + C(4,3) = 6 + 6 + 4 = 16. Same answer, different route.
| Concept | Order matters? | Formula |
|---|---|---|
| Permutation (all n) | Yes | n! |
| Permutation (r of n) | Yes | n! / (n−r)! |
| Combination (r of n) | No | n! / (r!(n−r)!) |
The combination formula C(n, r) is so important that it gets its own notation and its own name: the binomial coefficient, written as:
Read it as "n choose r." The notation with the stacked numbers inside parentheses is standard — you will see it everywhere in probability, statistics, and algorithm analysis. It is called "binomial" because these numbers appear as the coefficients when you expand (a + b)n — the Binomial Theorem:
Worked example — Pascal's Triangle. Each entry in Pascal's Triangle is a binomial coefficient. Row n, position r gives C(n, r). The fundamental recurrence is:
Why? Consider the n-th object. Either it is in your selection (then choose r−1 more from the remaining n−1) or it is not (then choose all r from the remaining n−1). These two cases are mutually exclusive and exhaustive.
Pascal's Triangle: each cell is the sum of the two above it. Click a cell to highlight the recurrence.
Worked example — committee. A club of 20 members forms a committee of 5. How many possible committees? C(20, 5) = 20! / (5! · 15!) = 15,504.
Worked example — subsets. How many subsets does a set of 10 elements have? Each element is either in or out (2 choices), so by the Step Rule: 210 = 1,024. This matches ∑r=010 C(10, r) = 1,024.
This is a beautiful connection: the identity ∑ C(n,r) = 2n is just the Binomial Theorem with a = b = 1. It tells us that the total number of subsets of any size equals 2n. This fact shows up in power set arguments, in counting Boolean functions, and in information theory.
python import itertools # All C(5,3) = 10 ways to choose 3 from 5 choices = list(itertools.combinations([1,2,3,4,5], 3)) print(len(choices)) # 10 print(choices) # [(1,2,3), (1,2,4), (1,2,5), (1,3,4), # (1,3,5), (1,4,5), (2,3,4), (2,3,5), # (2,4,5), (3,4,5)]
A different flavor of counting: instead of choosing or arranging, we are assigning. You have n distinct items and r containers. Each item goes into exactly one container. How many ways?
Why rn? Each of the n items independently chooses one of r containers. By the Step Rule: r · r · … · r (n times) = rn.
Worked example — hash table. Place 10 different strings into 5 hash buckets. Each string independently picks a bucket: 510 = 9,765,625 possible assignments.
Notice the difference from combinations. In combinations, we ask "which items are selected?" In bucketing, we ask "where does each item go?" The items are distinct and each one makes an independent choice.
Bucketing shows up constantly in CS. Hashing is bucketing. Assigning processes to CPUs is bucketing. Coloring graph vertices is bucketing. The formula rn tells you the size of the search space, which is the first thing you need to know when analyzing an algorithm's complexity.
Worked example — survey. 8 people each choose their favorite among 3 programming languages. Each person's choice is independent. Total outcomes: 38 = 6,561.
| Scenario | Items | Containers | Count |
|---|---|---|---|
| Hashing strings | 10 strings | 5 buckets | 510 |
| Language survey | 8 people | 3 languages | 38 |
| Coin flips | 20 flips | 2 outcomes | 220 |
Bucketing distinct objects is straightforward — each one picks a container independently. But what if the objects are indistinguishable? You have 10 identical dollars to distribute among 4 charities. How many ways?
Since the dollars are identical, all that matters is how many each charity gets, not which specific dollars. We cannot apply the Step Rule directly because the items have no identity. If we tried, we would massively overcount — giving dollar #1 to charity A and dollar #2 to charity B would be "different" from the reverse, even though both just mean "one dollar to A, one to B."
We need a completely different approach. Enter one of the most elegant constructions in combinatorics.
The creative construction: represent the n items as stars (*) and the boundaries between r containers as bars (|). You need exactly r − 1 bars. Now count the distinct permutations of n stars and r−1 bars:
Worked example — visual. Place 5 identical items into 3 bins. We arrange 5 stars and 2 bars:
Total: C(5+3−1, 5) = C(7, 5) = 7! / (5! · 2!) = 21.
Drag the bars to redistribute stars among bins. The counter shows all valid arrangements.
Worked example — startup investment. $10M to invest in 4 companies (in $1M increments).
What if Company 1 must get at least $3M? Give them $3M first, then distribute the remaining $7M among 4 companies: C(7 + 4 − 1, 7) = C(10, 7) = 120.
What if you do not have to invest all $10M? Add a 5th "company" (yourself) to absorb unspent funds: C(10 + 5 − 1, 10) = C(14, 10) = 1,001.
We have seen how to choose a single group of r items from n. But what if you need to split n distinct items into multiple groups of specified sizes? This is the multinomial coefficient.
This is the exact same formula as permutations with indistinct objects! The deep parallel: imagine labeling each object with its group number. Objects in group 1 get label "1", group 2 get label "2", etc. Counting distinct labelings is the same as counting permutations of n1 copies of "1", n2 copies of "2", and so on.
You can also derive it step-by-step using sequential combinations. First choose n1 items from all n for group 1: C(n, n1) ways. Then choose n2 from the remaining n − n1 for group 2: C(n − n1, n2) ways. Continue until all groups are filled. Multiply everything together and the factorials telescope beautifully into n! / (n1! · n2! · … · nr!).
Worked example — server assignment. Camazon has 13 distinct servers to assign to 3 datacenters: A gets 6, B gets 4, C gets 3.
We can verify this by thinking step-by-step with combinations. Choose 6 of 13 for datacenter A: C(13, 6). Then 4 of the remaining 7 for B: C(7, 4). Then the last 3 go to C: C(3, 3) = 1. By the Step Rule:
Same answer. The multinomial coefficient and the sequential-combination approach are two views of the same counting process.
| Formula | What it counts | Special case of |
|---|---|---|
| n! | Arrangements of n distinct items | Multinomial with all ni=1 |
| C(n,r) | Unordered selections of r from n | Multinomial with 2 groups |
| n!/(n1!…nr!) | Arrangements with repeats OR group assignment | The general form |
Now let's put all the formulas to work. This interactive calculator lets you explore every counting technique from this chapter. Pick a formula, set the parameters, and see the result with a step-by-step breakdown.
This chapter built the counting toolkit that everything in probability rests on. Let's see how the pieces connect and where they lead.
When facing a counting problem, ask these questions in order:
In Chapter 2, we will define probability formally. Every probability calculation relies on the counting techniques from this chapter. The denominator (total outcomes) and numerator (favorable outcomes) both require counting. Without the tools you just learned, probability would be impossible to compute.
The connection is direct. "What is the probability of being dealt a flush in poker?" means: count the number of 5-card hands that are flushes (numerator, uses combinations) and divide by the total number of 5-card hands (denominator, also combinations). "What fraction of random 8-bit strings match our receiver criterion?" means: count valid strings (inclusion-exclusion) and divide by 28 (Step Rule). Counting is probability.
| Formula | Expression | Use when |
|---|---|---|
| Step Rule | m · n | Independent sequential choices |
| Inclusion-Exclusion | |A|+|B|−|A∩B| | Overlapping "or" counts |
| Permutations | n! | Ordered arrangements, all distinct |
| Perm. with repeats | n!/(n1!…nr!) | Ordered arrangements, some identical |
| Combinations | C(n,r) | Unordered selection, all distinct |
| Bucketing (distinct) | rn | Distinct items into labeled bins |
| Stars & Bars | C(n+r−1, n) | Identical items into labeled bins |
| Multinomial | n!/(n1!…nr!) | Splitting n distinct items into sized groups |