Shen, Uspensky & Vereshchagin — Chapter 3

Martin-Löf Randomness

What does it mean for an infinite sequence to be random? Martin-Löf found the answer: a sequence is random if it passes every effective statistical test.

Prerequisites: Chapters 1–2 (plain complexity, conditional complexity). Basic measure theory intuition helps.
10
Chapters
3
Simulations
10
Quizzes

Chapter 0: The Question

Is the sequence 01010101010101... random? Clearly not — it has an obvious pattern. What about 01101001100101101001011001101001...? That's the Thue-Morse sequence, and it too has structure (each bit is computable).

What about a sequence generated by radioactive decay? We'd call that random. But what property distinguishes it from the patterned sequences?

The challenge: Probability theory says "the set of non-random sequences has measure zero." But this is vacuous — every individual sequence has measure zero. We need a definition that applies to individual sequences, not just sets.

Martin-Löf's insight (1966): a sequence is random if it belongs to no effectively null set. An effectively null set is one that can be covered by intervals whose total measure is computably small. The set of non-random sequences is the largest effectively null set — and it exists!

This is a profound resolution. By restricting "null set" to "effectively null set" (computable covers), we get a notion of randomness for individual sequences that makes "every random sequence has property P" literally true, not just idiomatic.

Check: Why can't we define "random" as "the sequence has measure zero"?

Chapter 1: Measures on Ω

The space Ω = {0, 1} is the set of all infinite binary sequences. We think of an element ω ∈ Ω as the outcome of infinitely many coin flips.

For any finite binary string x, the cylinder set (or interval) Ωx is the set of all infinite sequences that begin with x. For example, Ω01 is the set of all sequences starting with 01.

The uniform Bernoulli measure assigns μ(Ωx) = 2−|x|. Each bit is an independent fair coin flip. The interval Ω01 has measure 1/4 because the probability of starting with 01 is (1/2)(1/2) = 1/4.

General Bernoulli measures: The Bernoulli(p) measure uses a biased coin: P(1) = p, P(0) = 1−p. The measure of Ωx is pk(1−p)n−k where n = |x| and k is the number of 1's in x. The uniform case is p = 1/2.

A set A ⊆ Ω has measure zero (is a null set) if for every ε > 0, A can be covered by countably many intervals whose total measure is at most ε. The Strong Law of Large Numbers says: the set of sequences where the frequency of ones does not converge to p is a null set under Bernoulli(p).

Check: Under the uniform Bernoulli measure, what is μ(Ω110)?

Chapter 2: The Strong Law of Large Numbers

Let's state the SLLN precisely. Fix p = 1/2 (fair coin). Define A1/2 as the set of all ω ∈ Ω where the frequency of ones converges to 1/2:

limn→∞0 + ω1 + ... + ωn−1) / n = 1/2

The SLLN says μ(A1/2) = 1. Equivalently, the complement — sequences where frequencies don't converge to 1/2 — is a null set.

Proof sketch (for p = 1/2): Fix ε > 0. Using Stirling's approximation, the number of n-bit strings with frequency of ones differing from 1/2 by more than ε is at most 2H(1/2+ε, 1/2−ε)·n, where H(u, v) = −u log u − v log v is the binary Shannon entropy. Since H(1/2+ε, 1/2−ε) < 1, this fraction decreases exponentially. The Borel-Cantelli lemma finishes the job.

The Shannon entropy H(u, v) = −u log u − v log v appears naturally here. It reaches its maximum of 1 at u = v = 1/2 (fair coin). For any ε > 0, H(1/2+ε, 1/2−ε) < 1, so the "bad" strings (wrong frequency) form an exponentially shrinking fraction.

People often say "every random sequence has frequency 1/2." But this is idiomatic — it means "the set of exceptions has measure zero." The question is: can we make this literal? Can we define individual random sequences so that every random sequence truly has frequency 1/2? Yes — that's Martin-Löf's contribution.

Check: The SLLN says the fraction of n-bit "bad" strings (frequency far from 1/2)...

Chapter 3: Null Sets — The Problem

We'd like to define: "ω is random iff ω does not belong to any null set (other than the trivial ones)." But this doesn't work. Here's why.

Every individual sequence ω is a member of the singleton set {ω}, which has measure zero. So every sequence belongs to some null set. The union of all these singletons is all of Ω — the non-null set.

The problem: There is no smallest measure-1 set (equivalently, no largest null set) in the classical sense. Every set of measure 1 can be made slightly smaller by removing a single point. So "belongs to no null set" is impossible — every element belongs to at least one null set (its own singleton).

The resolution: restrict to effectively null sets — null sets whose covering can be computed. This restriction is not arbitrary; it reflects the fact that we can only test for non-randomness using computable procedures. A test that requires non-computable resources to construct doesn't count as a "real" pattern.

Check: Why can't we define random = "belongs to no null set"?

Chapter 4: Effectively Null Sets

A set A ⊆ Ω is effectively null if there is a computable function χ(ε, i) that, for any rational ε > 0, produces a sequence of intervals covering A with total measure at most ε:

A ⊆ Ωχ(ε,0) ∪ Ωχ(ε,1) ∪ Ωχ(ε,2) ∪ ...
μ(Ωχ(ε,0)) + μ(Ωχ(ε,1)) + ... ≤ ε

The key word is computable. The covering function χ must be algorithmic. You can't just assert a covering exists — you need to produce it.

Example: The set of sequences where the frequency of ones doesn't converge to 1/2 is effectively null. The covering algorithm: given ε, for large enough N, list all strings of length ≥ N whose frequency of ones differs from 1/2 by more than some threshold. The total measure of the corresponding intervals converges to 0 exponentially fast.

The crucial theorem: there exists a largest effectively null set (also called a universal Martin-Löf test). This is unlike the classical case, where no largest null set exists. The restriction to computable coverings creates enough structure for a maximal element to exist.

Check: What makes a null set "effectively null"?

Chapter 5: The Martin-Löf Definition

Now we can state it.

Definition (Martin-Löf, 1966): An infinite binary sequence ω is Martin-Löf random (with respect to a computable measure μ) if ω does not belong to any effectively null set.

Equivalently, ω is random iff ω is not in the largest effectively null set. The complement of the largest effectively null set is the set of all random sequences — and it has measure 1.

Why does the largest effectively null set exist? Because the union of all effectively null sets is itself effectively null. Here's the argument: enumerate all computable covering functions (there are countably many programs). For each program pi and each ε = 2−k, run pi(2−k−i, j) to get intervals that cover the ith null set with total measure at most 2−k−i. Take the union over all i. The total measure is at most ∑ 2−k−i = 2−k+1. Close enough to ε (differs by a factor of 2).

Martin-Löf randomness makes "every random sequence has property P" literal: If the set of sequences without P is effectively null, then every ML-random sequence has P. The SLLN, the law of the iterated logarithm, normality — all of these hold literally for every ML-random sequence.
Check: An infinite sequence is Martin-Löf random iff...

Chapter 6: Properties of ML-Randomness

Martin-Löf random sequences satisfy every "effective" law of probability. Here are the key properties:

PropertyStatement for ML-random ω
Frequencylim freq(1's in ω) = p (for Bernoulli(p) measure)
NormalityEvery finite pattern appears with the "right" frequency
LILSatisfies the Law of the Iterated Logarithm
No computable patternNo computable function predicts ω's bits better than chance
IncompressibilityPrefixes of ω cannot be compressed significantly

The incompressibility connection is the most important. For the uniform measure, ω is ML-random iff for all n, the prefix ω0..ωn−1 has C(ω[0..n)) ≥ n − O(1). But this requires prefix complexity K (not plain complexity C) to state precisely. We'll see this in Chapters 4 and 5 (Levin-Schnorr theorem).

Connection to complexity: Intuitively, a sequence is random iff you can't compress its prefixes. "No compressibility" = "no patterns" = "passes all statistical tests." Martin-Löf randomness makes this precise.

ML-random sequences are not computable. No algorithm can generate a random sequence bit by bit. This follows because any computable sequence is determined by a finite program, so its prefixes have complexity O(1) — far less than n.

However, random sequences do exist in abundance — the set of ML-random sequences has measure 1. They are just not constructible.

Check: Can a computable sequence be ML-random?

Chapter 7: Randomness Deficiencies

Instead of a binary "random or not," we can measure how non-random a sequence is. A randomness deficiency function d(ω) assigns a non-negative real (or +∞) to each sequence, where higher values indicate more non-randomness.

Universal deficiency: There exists a largest (up to O(1)) lower semicomputable randomness deficiency function. A sequence ω is ML-random iff this universal deficiency is finite. The value of the deficiency tells you "how far from random" ω is.

Think of the deficiency as a hypothesis test. Each effectively null set is a test for a specific pattern. The deficiency aggregates all tests: if ω fails many tests badly, its deficiency is high; if it passes all tests, its deficiency is small.

For finite strings, a natural deficiency is d(x) = n − C(x) for an n-bit string x. A string with d(x) = 0 is incompressible; d(x) = k means the string can be compressed by k bits. Large d means the string has patterns; small d means it looks random.

Check: A randomness deficiency of 0 means...

Chapter 8: Frequency Convergence Simulator

The simulation below generates random bits and tracks the running frequency of ones. For a fair coin, the frequency should converge to 0.5. The SLLN guarantees this for ML-random sequences.

Frequency of Ones vs. Number of Flips

Click Generate to flip coins. The plot shows running frequency of 1's. Random sequences converge to 0.5; patterned ones may not. Click Biased for a non-random (patterned) sequence.

What to notice: Fair random sequences oscillate around 0.5, converging slowly. Biased sequences converge to a different value (0.7). The patterned sequence locks onto 0.5 immediately — but this is NOT randomness, because the convergence is "too perfect." A true random sequence approaches 0.5 with characteristic fluctuations described by the Law of the Iterated Logarithm.
Check: The sequence 010101010101... has frequency of ones equal to 0.5. Is it ML-random?

Chapter 9: Summary

ConceptMeaning
ΩSpace of all infinite binary sequences
ΩxCylinder set: sequences starting with x. μ(Ωx) = 2−|x|
Null setCoverable by intervals of arbitrarily small total measure
Effectively nullNull set with a computable covering function
ML-randomNot in any effectively null set
Universal testLargest effectively null set (exists and is unique up to measure 0)
DeficiencyQuantifies how non-random a sequence is
Looking ahead: The connection between randomness and incompressibility is the deepest theorem in this subject. To state it precisely, we need prefix complexity K(x) — a variant of C(x) with a self-delimiting constraint. Chapter 4 develops K(x) from a priori probability, and Chapter 5 proves the Levin-Schnorr theorem: ω is ML-random iff K(ω[0..n)) ≥ n − O(1) for all n.
Final check: The key innovation of Martin-Löf's definition is that it defines randomness for...