Shen, Uspensky & Vereshchagin — Chapter 8

Some Applications

The incompressibility method in action: proofs about primes, automata, tape complexity, and forbidden substrings.

Prerequisites: Chapters 1–2 (plain complexity, counting argument). The proofs use only basic C(x).

Chapters

Simulations

Quizzes

Chapter 0: The Method

The incompressibility method is a powerful proof technique. The recipe:

Choose an object (string, number, graph) that is incompressible: C(x) ≥ |x| − O(1).
Assume for contradiction that the object has some property P that would allow a short description.
Construct the short description, contradicting incompressibility.

Why it works: Incompressible objects exist (counting argument). Any "structural" property gives a compression scheme. So incompressible objects must lack that structure. Since most objects are incompressible, most objects lack the structure — and we've proved something about "generic" objects.

The beauty of this method: it replaces complex combinatorial arguments with simple information-theoretic reasoning. Instead of counting elements directly, you argue about description lengths.

Check: The incompressibility method works because...

If a property implies short descriptions, incompressible objects can't have it All strings are incompressible C(x) is computable

Chapter 1: Infinitely Many Primes

Here is a Kolmogorov complexity proof that there are infinitely many primes.

Theorem: There are infinitely many prime numbers.

Proof: Suppose there are only k primes: p₁, ..., p_k. Every natural number n ≤ N can be written as n = p₁^a₁ ·...· p_k^a_k. Each exponent a_i is at most log_{p_i} N ≤ log N. So n is determined by the tuple (a₁, ..., a_k), which can be encoded in k · log log N + O(1) bits.

But for large N, there exist numbers n ≤ N with C(n) ≥ log N − O(1). For these numbers, k · log log N ≥ log N − O(1), which fails for large enough N (since log log N grows much slower than log N). Contradiction.

Comparison with Euclid: This proof is shorter and arguably more intuitive than Euclid's classic proof. Euclid constructs p₁·...·p_k + 1 and argues it has a new prime factor. The complexity proof just says: if there were few primes, every number would have a short description, but not all do.

Check: The key idea in the primes proof is that with k primes, every number n has a description of length...

O(k · log log N) — encoding the exponents in the prime factorization log N — the binary representation k — just the number of primes

Chapter 2: Moving Information Along the Tape

A Turing machine has a tape and a head that moves left or right. Suppose the input is written on the first n cells, and the machine must copy it to cells n+1 through 2n. How many steps does this take?

Theorem: Any Turing machine that copies n bits from the left half to the right half of the tape requires Ω(n²) steps in the worst case.

Proof sketch: Take an incompressible input x of length n. The machine must "move" all the information from left to right. The tape head is the bottleneck: it can carry at most O(log |Q|) bits per step (where |Q| is the number of states — a constant). Each time the head crosses the boundary between the two halves, it transfers at most O(1) bits. The head must cross the boundary Ω(n) times to transfer n bits. But between crossings, the head must traverse Ω(n) cells. Total: Ω(n) crossings × Ω(n) steps per crossing = Ω(n²).

The incompressibility of x is crucial: if x were all zeros, the machine could just write zeros on the right half without reading the left half at all.

Check: The tape head transfers information at rate...

O(1) bits per boundary crossing (limited by the finite state set) n bits per step log n bits per step

Chapter 3: Multi-Head Automata

A finite automaton with k heads reads a fixed input string. Each head moves independently. The automaton accepts or rejects based on the symbols under all k heads simultaneously.

How powerful are k heads vs. k+1 heads? Using incompressibility, one can show:

Theorem: For every k, there is a language recognized by a (k+1)-head automaton but not by any k-head automaton.

The proof constructs a language that requires tracking k+1 independent positions in the string. With only k heads, the automaton can't monitor all positions simultaneously. Incompressibility ensures that the positions are "independent" — knowing k of them gives no information about the (k+1)th.

This is a strict hierarchy: more heads give strictly more power. The incompressibility method provides an elegant proof that avoids the tedious pumping-lemma-style arguments usually needed for automata hierarchy theorems.

Check: The multi-head hierarchy theorem uses incompressibility to ensure...

The head positions encode independent information that can't be tracked by fewer heads The automaton has enough states The language is finite

Chapter 4: Laws of Large Numbers

The incompressibility method gives an alternative proof of the Strong Law of Large Numbers:

Theorem: For an incompressible binary string x of length n, the frequency of ones is close to 1/2. More precisely, if C(x|n) ≥ n − d, then the frequency of ones differs from 1/2 by at most O(√(d/n)).

Proof idea: If x has frequency far from 1/2, then x is one of a small set of "atypical" strings. Atypical strings can be described by (1) the deviation, and (2) their index within the atypical set. The number of atypical strings is exponentially smaller than 2ⁿ (by the binomial coefficient bound), so their index takes fewer than n bits. This contradicts incompressibility.

This proof is more informative than the standard probabilistic proof because it gives a quantitative bound: the "deficiency" d controls the deviation. Fully incompressible strings (d = O(1)) have frequencies very close to 1/2.

Check: An incompressible string has frequency of ones close to 1/2 because...

Strings with wrong frequency are few enough to have short descriptions, contradicting incompressibility All binary strings have exactly 50% ones The counting argument doesn't apply to frequencies

Chapter 5: Forbidden Substrings

Fix a set F of "forbidden" patterns. A string is F-free if it contains none of the patterns in F as a substring. How many F-free strings of length n are there?

Theorem: If F is a finite set of forbidden patterns, the number of F-free strings of length n grows as Θ(αⁿ) for some α depending on F. Moreover, α can be computed from the Rauzy graph of F — it is the spectral radius of the adjacency matrix of the graph of allowed transitions.

The Kolmogorov complexity approach: an F-free string can be described by specifying which transitions it uses. The number of F-free strings is related to the complexity of the "most complex" F-free string, which is related to the growth rate α.

This gives a clean proof that the growth rate is exponential (not, say, polynomial). The incompressible F-free strings are the "generic" ones, and their complexity determines α.

Example: If F = {11} (no two consecutive ones), the F-free strings of length n are counted by the Fibonacci sequence. The growth rate is α = (1 + √5)/2 ≈ 1.618. An incompressible F-free string of length n has complexity approximately n · log α ≈ 0.694n.

Check: The number of strings avoiding the pattern "11" grows as...

Θ(φⁿ) where φ is the golden ratio (Fibonacci growth) Θ(2ⁿ) (exponential in base 2) Θ(n²) (polynomial)

Chapter 6: A Combinatorial Inequality

Here is an example of using complexity to prove a pure combinatorial statement:

Theorem (LYM inequality): If A₀, A₁, ..., A_n are antichains in the subset lattice of {1, ..., n} (where A_k ⊆ C(n,k)), then ∑_k |A_k| / C(n,k) ≤ 1.

Proof by complexity: Each set S ∈ A_k is described by its index in A_k (taking log|A_k| bits) plus k (taking O(log n) bits). By the antichain property, these descriptions are "independent" for different k. The total must satisfy a counting constraint that yields the LYM inequality.

The complexity proof is more conceptual than the standard proof: instead of manipulating binomial coefficients, we argue about information content.

Check: The complexity approach to combinatorial inequalities works by...

Translating cardinality bounds into description length bounds Enumerating all possible strings Computing C(x) for each element

Chapter 7: Lipschitz Transformations

A function f: {0,1}ⁿ → {0,1}ⁿ is Lipschitz if changing one bit of the input changes at most one bit of the output. Bijective Lipschitz functions are "local" permutations — they rearrange bits without propagating changes far.

Theorem: No sequence of bijective Lipschitz transformations can produce all permutations of {0,1}ⁿ. In other words, Lipschitz transformations are not transitive: there exist strings x and y such that no sequence of Lipschitz bijections maps x to y.

The proof uses Kolmogorov complexity to show that Lipschitz bijections can only change C(x) by O(1) per application. So a string of complexity 0 (like 0ⁿ) can never be mapped to a string of complexity n by any bounded number of Lipschitz steps.

Check: Lipschitz bijections preserve Kolmogorov complexity up to...

O(1) per application (they are "local" and can't create much new information) Exactly (complexity is unchanged) O(n) per application

Chapter 8: Forbidden Substring Explorer

The simulation below lets you choose a forbidden pattern and see how the number of valid strings grows with length. Compare the growth rate to 2ⁿ (unrestricted) and observe how the forbidden pattern constrains the count.

Forbidden Substring Growth Rate

Select a forbidden pattern. The plot shows the number of valid strings of each length (log scale). The growth rate α determines the slope.

Check: Forbidding a longer pattern generally...

Leaves more valid strings (the growth rate α is closer to 2) Leaves fewer valid strings Has no effect on the count

Chapter 9: Summary

Application	Key idea
Primes	Few primes ⇒ short factorization descriptions ⇒ contradicts incompressibility
Tape copying	Head is bottleneck: O(1) bits per crossing ⇒ Ω(n²) steps
Multi-head	k+1 heads track more info than k heads
LLN	Wrong frequency ⇒ small set ⇒ short description
Forbidden substrings	Growth rate from Rauzy graph spectral radius
Lipschitz	Local bijections can't change complexity fast

Looking ahead: Chapter 9 approaches randomness from a completely different angle: the frequency and game-theoretic perspectives. Von Mises, martingales, and betting strategies provide alternative characterizations that connect beautifully to Martin-Löf's definition.

Final check: The incompressibility method proves properties of "generic" objects by showing...

Objects without the property can be described in fewer bits, so most objects must have it All objects are compressible C(x) is always equal to |x|