Shen, Uspensky & Vereshchagin — Chapter 5

Monotone Complexity

The Levin-Schnorr theorem, Chaitin's Ω, and effective Hausdorff dimension — the deepest connections between complexity and randomness.

Prerequisites: Chapters 3–4 (Martin-Löf randomness, prefix complexity, semimeasures).

Chapters

Simulations

Quizzes

Chapter 0: Motivation

Chapter 4 introduced prefix complexity K(x) for finite strings. But Martin-Löf randomness is defined for infinite sequences. To bridge the gap, we need a complexity notion that respects the structure of infinite sequences — specifically, that extending a string can only increase the complexity.

Monotone complexity KM(x) does this: if x is a prefix of y, then KM(x) ≤ KM(y). This monotonicity is natural for infinite sequences: knowing more bits shouldn't make things simpler.

The three complexities:
• C(x) — plain complexity. Simplest definition, but no probabilistic interpretation.
• K(x) — prefix complexity. Descriptions are prefix-free. Kraft inequality holds.
• KM(x) — monotone complexity. Monotone in the prefix order. Connects directly to semimeasures on the Cantor space.

All three agree up to O(log n) for strings of length n, but the fine distinctions matter for the theorems that characterize randomness precisely.

Check: Monotone complexity requires that if x is a prefix of y, then...

KM(x) ≤ KM(y) — extending a string can only increase complexity KM(x) = KM(y) — complexity is unchanged by extension KM(x) ≥ KM(y) — longer strings are simpler

Chapter 1: Semimeasures on the Binary Tree

A semimeasure on the binary tree is a function μ: Σ* → [0, ∞) satisfying:

μ(x) ≥ μ(x0) + μ(x1) for all x

This says that the "mass" at node x is at least the sum of the masses at its children. Some mass may be "lost" at each level — the inequality is not necessarily tight.

If μ(λ) ≤ 1 (where λ is the empty string / root), then μ induces a semimeasure on Ω via μ(Ω_x) = μ(x).

Measures vs. semimeasures on the tree: A measure has μ(x) = μ(x0) + μ(x1) for all x (no mass is lost). A semimeasure allows μ(x) > μ(x0) + μ(x1) (mass can leak at each node). The uniform measure sets μ(x) = 2^−|x|.

A maximal lower semicomputable semimeasure on the tree exists, analogous to the maximal semimeasure on N from Chapter 4. Its negative logarithm gives a priori complexity on the tree.

Check: A semimeasure on the tree allows μ(x) to be...

Greater than or equal to μ(x0) + μ(x1) (mass can leak) Exactly equal to μ(x0) + μ(x1) Less than μ(x0) + μ(x1)

Chapter 2: A Priori Complexity KA

Let M be the maximal lower semicomputable semimeasure on the binary tree. The a priori complexity is:

KA(x) = −log M(x) + O(1)

Since M is monotone on the tree (M(x) ≥ M(x0) + M(x1) ≥ M(x0)), we have KA(x) ≤ KA(x0) and KA(x) ≤ KA(x1). In words: KA is monotone — it can only increase as we extend the string.

KA vs K: KA(x) ≤ K(x) + O(1) always. For most strings, KA(x) ≈ K(x). But KA can be smaller because it benefits from the tree structure: the semimeasure on the tree can "spread" mass more efficiently than the semimeasure on individual strings.

Check: A priori complexity KA(x) is defined as...

−log of the maximal semicomputable semimeasure on the tree, evaluated at x The shortest description in any programming language The length of x

Chapter 3: Monotone Complexity KM

There's another way to define monotone complexity, using monotone machines. A monotone machine is a partial computable function D: Σ* → Σ* with a monotonicity constraint: if p is a prefix of q and both D(p) and D(q) are defined, then D(p) is a prefix of D(q).

Monotone complexity KM(x) is the length of the shortest program p such that D(p) starts with x (D(p) has x as a prefix).

KM(x) = min{ |p| : x ⊑ D(p) }

where ⊑ means "is a prefix of or equal to."

Monotone machines as continuous maps: The monotonicity constraint means the machine maps the Cantor space to itself in a continuous, order-preserving way. Extending the input extends the output. This is natural for "streaming" computation: the machine reads input bits and produces output bits, and more input always means more (or the same) output.

The relation: KA(x) = KM(x) + O(1). A priori complexity and monotone complexity are essentially the same thing, approached from different directions (probabilistic vs. computational).

Check: In a monotone machine, extending the input...

Can only extend the output (or leave it unchanged) Always produces a completely different output Halts the machine

Chapter 4: The Levin-Schnorr Theorem

This is the theorem that ties everything together. It gives a precise complexity characterization of Martin-Löf randomness.

Levin-Schnorr Theorem: An infinite binary sequence ω is Martin-Löf random (with respect to the uniform measure) if and only if there exists a constant c such that K(ω[0..n)) ≥ n − c for all n.

In words: ω is random iff its prefixes are incompressible. The prefixes of a random sequence cannot be compressed by more than a constant number of bits, no matter how long they get.

Both directions are non-trivial:

If ω is ML-random, its prefixes are incompressible: If some prefix ω[0..n) had K(ω[0..n)) < n − c, then the set of sequences with such a compressible prefix forms an effectively null set (by Kraft's inequality). So ML-random sequences avoid this set.
If prefixes are incompressible, ω is ML-random: Any effectively null set can be covered by intervals of small total measure. But if ω has incompressible prefixes, it can't be in any interval Ω_x where x has K(x) < |x| − c. The incompressibility acts as a "shield" against all effective null sets at once.

Why K, not C? The theorem fails for plain complexity C. With C, you can only prove C(ω[0..n)) ≥ n − O(log n), which allows a slowly growing gap. The Kraft inequality for K makes the O(1) bound possible.

Check: The Levin-Schnorr theorem says ML-randomness is equivalent to...

K(ω[0..n)) ≥ n − O(1) for all n (prefix incompressibility) C(ω[0..n)) = n for all n The frequency of 1's equals 1/2

Chapter 5: The Random Number Ω

Chaitin defined a remarkable real number:

Ω = ∑_{p ∈ dom(U)} 2^−|p|

Here U is the optimal prefix machine and the sum is over all programs p on which U halts. By Kraft's inequality, Ω ≤ 1, so Ω is a real number between 0 and 1.

Ω is ML-random. Its binary expansion is a Martin-Löf random sequence. This is the first explicit example of an ML-random sequence (even though we can't compute its digits).

Ω is computably enumerable from below (lower semicomputable): by running all programs in parallel and adding 2^−|p| whenever a program p halts, we get an increasing sequence of rationals converging to Ω. But Ω is not computable — knowing its first n bits would let us solve the halting problem for all programs of length ≤ n.

Ω encodes the halting problem: Knowing the first n bits of Ω lets you determine which programs of length ≤ n halt. Run all programs in parallel. Whenever one halts, add its contribution to your running sum. Stop when the sum matches the first n bits of Ω. At that point, all programs of length ≤ n that will ever halt have already halted.

So Ω is a single real number that concentrates the unsolvability of the halting problem into a sequence of digits. It is random, it is non-computable, and it is definable in a few lines.

Check: Why is Ω non-computable despite being definable?

Because knowing its bits solves the halting problem Because it is irrational Because Ω > 1

Chapter 6: Effective Hausdorff Dimension

Classical Hausdorff dimension measures the "fractal dimension" of sets. The effective version replaces arbitrary covers with computable covers, giving a dimension notion that connects to complexity.

Effective dimension of a sequence: The effective Hausdorff dimension of an infinite sequence ω is:
dim(ω) = lim inf_n→∞ K(ω[0..n)) / n

For ML-random sequences, dim(ω) = 1 (prefixes are incompressible, so K/n → 1). For computable sequences, dim(ω) = 0 (prefixes have complexity O(1)). Sequences in between have fractional dimension.

Example: Consider a sequence where each bit is independently 1 with probability p ≠ 1/2. A "typical" such sequence has K(ω[0..n)) ≈ H(p) · n, where H(p) = −p log p − (1−p) log(1−p) is the binary entropy. So its effective dimension is H(p), which is less than 1. The sequence is "partly random" — it has some structure (the bias) but also genuine randomness.

Effective Hausdorff dimension gives a continuous measure of randomness, interpolating between fully computable (dimension 0) and fully random (dimension 1).

Check: An ML-random sequence has effective Hausdorff dimension...

1 (prefixes are incompressible) 0 (it has no patterns) Between 0 and 1

Chapter 7: Randomness w.r.t. Other Measures

Martin-Löf randomness generalizes beyond the uniform measure. For any computable measure μ on Ω, we can define ML-randomness with respect to μ.

A sequence ω is ML-random with respect to μ iff ω does not belong to any μ-effectively null set. Equivalently, K(ω[0..n)) ≥ −log μ(Ω_ω[0..n)) − O(1).

Randomness for biased coins: For Bernoulli(p), a sequence is ML_p-random iff K(ω[0..n)) ≥ k log(1/p) + (n−k) log(1/(1−p)) − O(1), where k is the number of 1's in ω[0..n). The right-hand side is −log μ_p(Ω_ω[0..n)).

Different measures give different random sequences. A sequence that is random for Bernoulli(1/2) (fair coin) is not random for Bernoulli(1/3) (biased coin), and vice versa. Randomness is always relative to a measure.

Check: A sequence random for a biased coin (p = 0.7) will have frequency of 1's converging to...

0.7 0.5 1.0

Chapter 8: Ω Approximation Simulator

The simulation below approximates Ω by running "programs" (small random computations) and accumulating 2^−|p| for each halting program. The sum grows from below, approaching Ω but never reaching it.

Ω Lower Bound Convergence

Each step simulates checking one more program. Halting programs contribute 2^−|p| to the running sum. Watch the lower bound converge.

Ω ≥ 0

Check: The running sum approximating Ω is always...

A lower bound on Ω (approaches from below) An upper bound on Ω Exactly Ω after enough steps

Chapter 9: Summary

Concept	Key fact
Monotone complexity KM	KM(x) ≤ KM(xy) for all y — monotone in prefix order
KA = KM + O(1)	A priori complexity = monotone complexity
Levin-Schnorr	ω is ML-random iff K(ω[0..n)) ≥ n − O(1) for all n
Ω	Halting probability; ML-random; lower semicomputable; encodes halting problem
Effective dimension	dim(ω) = lim inf K(ω[0..n))/n

Looking ahead: Chapter 6 develops a general framework comparing all these complexity notions. Chapter 7 builds the bridge to Shannon entropy. And Chapters 8–9 apply these ideas to combinatorics and game theory.

Final check: The Levin-Schnorr theorem uses prefix complexity K (not plain C) because...

K satisfies Kraft's inequality, which is needed for the measure-theoretic argument K is computable while C is not K is always larger than C