What does it mean for an infinite sequence to be random? Martin-Löf found the answer: a sequence is random if it passes every effective statistical test.
Is the sequence 01010101010101... random? Clearly not — it has an obvious pattern. What about 01101001100101101001011001101001...? That's the Thue-Morse sequence, and it too has structure (each bit is computable).
What about a sequence generated by radioactive decay? We'd call that random. But what property distinguishes it from the patterned sequences?
Martin-Löf's insight (1966): a sequence is random if it belongs to no effectively null set. An effectively null set is one that can be covered by intervals whose total measure is computably small. The set of non-random sequences is the largest effectively null set — and it exists!
This is a profound resolution. By restricting "null set" to "effectively null set" (computable covers), we get a notion of randomness for individual sequences that makes "every random sequence has property P" literally true, not just idiomatic.
The space Ω = {0, 1}∞ is the set of all infinite binary sequences. We think of an element ω ∈ Ω as the outcome of infinitely many coin flips.
For any finite binary string x, the cylinder set (or interval) Ωx is the set of all infinite sequences that begin with x. For example, Ω01 is the set of all sequences starting with 01.
The uniform Bernoulli measure assigns μ(Ωx) = 2−|x|. Each bit is an independent fair coin flip. The interval Ω01 has measure 1/4 because the probability of starting with 01 is (1/2)(1/2) = 1/4.
A set A ⊆ Ω has measure zero (is a null set) if for every ε > 0, A can be covered by countably many intervals whose total measure is at most ε. The Strong Law of Large Numbers says: the set of sequences where the frequency of ones does not converge to p is a null set under Bernoulli(p).
Let's state the SLLN precisely. Fix p = 1/2 (fair coin). Define A1/2 as the set of all ω ∈ Ω where the frequency of ones converges to 1/2:
The SLLN says μ(A1/2) = 1. Equivalently, the complement — sequences where frequencies don't converge to 1/2 — is a null set.
The Shannon entropy H(u, v) = −u log u − v log v appears naturally here. It reaches its maximum of 1 at u = v = 1/2 (fair coin). For any ε > 0, H(1/2+ε, 1/2−ε) < 1, so the "bad" strings (wrong frequency) form an exponentially shrinking fraction.
People often say "every random sequence has frequency 1/2." But this is idiomatic — it means "the set of exceptions has measure zero." The question is: can we make this literal? Can we define individual random sequences so that every random sequence truly has frequency 1/2? Yes — that's Martin-Löf's contribution.
We'd like to define: "ω is random iff ω does not belong to any null set (other than the trivial ones)." But this doesn't work. Here's why.
Every individual sequence ω is a member of the singleton set {ω}, which has measure zero. So every sequence belongs to some null set. The union of all these singletons is all of Ω — the non-null set.
The resolution: restrict to effectively null sets — null sets whose covering can be computed. This restriction is not arbitrary; it reflects the fact that we can only test for non-randomness using computable procedures. A test that requires non-computable resources to construct doesn't count as a "real" pattern.
A set A ⊆ Ω is effectively null if there is a computable function χ(ε, i) that, for any rational ε > 0, produces a sequence of intervals covering A with total measure at most ε:
The key word is computable. The covering function χ must be algorithmic. You can't just assert a covering exists — you need to produce it.
The crucial theorem: there exists a largest effectively null set (also called a universal Martin-Löf test). This is unlike the classical case, where no largest null set exists. The restriction to computable coverings creates enough structure for a maximal element to exist.
Now we can state it.
Equivalently, ω is random iff ω is not in the largest effectively null set. The complement of the largest effectively null set is the set of all random sequences — and it has measure 1.
Why does the largest effectively null set exist? Because the union of all effectively null sets is itself effectively null. Here's the argument: enumerate all computable covering functions (there are countably many programs). For each program pi and each ε = 2−k, run pi(2−k−i, j) to get intervals that cover the ith null set with total measure at most 2−k−i. Take the union over all i. The total measure is at most ∑ 2−k−i = 2−k+1. Close enough to ε (differs by a factor of 2).
Martin-Löf random sequences satisfy every "effective" law of probability. Here are the key properties:
| Property | Statement for ML-random ω |
|---|---|
| Frequency | lim freq(1's in ω) = p (for Bernoulli(p) measure) |
| Normality | Every finite pattern appears with the "right" frequency |
| LIL | Satisfies the Law of the Iterated Logarithm |
| No computable pattern | No computable function predicts ω's bits better than chance |
| Incompressibility | Prefixes of ω cannot be compressed significantly |
The incompressibility connection is the most important. For the uniform measure, ω is ML-random iff for all n, the prefix ω0..ωn−1 has C(ω[0..n)) ≥ n − O(1). But this requires prefix complexity K (not plain complexity C) to state precisely. We'll see this in Chapters 4 and 5 (Levin-Schnorr theorem).
ML-random sequences are not computable. No algorithm can generate a random sequence bit by bit. This follows because any computable sequence is determined by a finite program, so its prefixes have complexity O(1) — far less than n.
However, random sequences do exist in abundance — the set of ML-random sequences has measure 1. They are just not constructible.
Instead of a binary "random or not," we can measure how non-random a sequence is. A randomness deficiency function d(ω) assigns a non-negative real (or +∞) to each sequence, where higher values indicate more non-randomness.
Think of the deficiency as a hypothesis test. Each effectively null set is a test for a specific pattern. The deficiency aggregates all tests: if ω fails many tests badly, its deficiency is high; if it passes all tests, its deficiency is small.
For finite strings, a natural deficiency is d(x) = n − C(x) for an n-bit string x. A string with d(x) = 0 is incompressible; d(x) = k means the string can be compressed by k bits. Large d means the string has patterns; small d means it looks random.
The simulation below generates random bits and tracks the running frequency of ones. For a fair coin, the frequency should converge to 0.5. The SLLN guarantees this for ML-random sequences.
Click Generate to flip coins. The plot shows running frequency of 1's. Random sequences converge to 0.5; patterned ones may not. Click Biased for a non-random (patterned) sequence.
| Concept | Meaning |
|---|---|
| Ω | Space of all infinite binary sequences |
| Ωx | Cylinder set: sequences starting with x. μ(Ωx) = 2−|x| |
| Null set | Coverable by intervals of arbitrarily small total measure |
| Effectively null | Null set with a computable covering function |
| ML-random | Not in any effectively null set |
| Universal test | Largest effectively null set (exists and is unique up to measure 0) |
| Deficiency | Quantifies how non-random a sequence is |