The incompressibility method in action: proofs about primes, automata, tape complexity, and forbidden substrings.
The incompressibility method is a powerful proof technique. The recipe:
The beauty of this method: it replaces complex combinatorial arguments with simple information-theoretic reasoning. Instead of counting elements directly, you argue about description lengths.
Here is a Kolmogorov complexity proof that there are infinitely many primes.
Proof: Suppose there are only k primes: p1, ..., pk. Every natural number n ≤ N can be written as n = p1a1 ·...· pkak. Each exponent ai is at most logpi N ≤ log N. So n is determined by the tuple (a1, ..., ak), which can be encoded in k · log log N + O(1) bits.
But for large N, there exist numbers n ≤ N with C(n) ≥ log N − O(1). For these numbers, k · log log N ≥ log N − O(1), which fails for large enough N (since log log N grows much slower than log N). Contradiction.
A Turing machine has a tape and a head that moves left or right. Suppose the input is written on the first n cells, and the machine must copy it to cells n+1 through 2n. How many steps does this take?
Proof sketch: Take an incompressible input x of length n. The machine must "move" all the information from left to right. The tape head is the bottleneck: it can carry at most O(log |Q|) bits per step (where |Q| is the number of states — a constant). Each time the head crosses the boundary between the two halves, it transfers at most O(1) bits. The head must cross the boundary Ω(n) times to transfer n bits. But between crossings, the head must traverse Ω(n) cells. Total: Ω(n) crossings × Ω(n) steps per crossing = Ω(n2).
The incompressibility of x is crucial: if x were all zeros, the machine could just write zeros on the right half without reading the left half at all.
A finite automaton with k heads reads a fixed input string. Each head moves independently. The automaton accepts or rejects based on the symbols under all k heads simultaneously.
How powerful are k heads vs. k+1 heads? Using incompressibility, one can show:
The proof constructs a language that requires tracking k+1 independent positions in the string. With only k heads, the automaton can't monitor all positions simultaneously. Incompressibility ensures that the positions are "independent" — knowing k of them gives no information about the (k+1)th.
This is a strict hierarchy: more heads give strictly more power. The incompressibility method provides an elegant proof that avoids the tedious pumping-lemma-style arguments usually needed for automata hierarchy theorems.
The incompressibility method gives an alternative proof of the Strong Law of Large Numbers:
Proof idea: If x has frequency far from 1/2, then x is one of a small set of "atypical" strings. Atypical strings can be described by (1) the deviation, and (2) their index within the atypical set. The number of atypical strings is exponentially smaller than 2n (by the binomial coefficient bound), so their index takes fewer than n bits. This contradicts incompressibility.
This proof is more informative than the standard probabilistic proof because it gives a quantitative bound: the "deficiency" d controls the deviation. Fully incompressible strings (d = O(1)) have frequencies very close to 1/2.
Fix a set F of "forbidden" patterns. A string is F-free if it contains none of the patterns in F as a substring. How many F-free strings of length n are there?
The Kolmogorov complexity approach: an F-free string can be described by specifying which transitions it uses. The number of F-free strings is related to the complexity of the "most complex" F-free string, which is related to the growth rate α.
This gives a clean proof that the growth rate is exponential (not, say, polynomial). The incompressible F-free strings are the "generic" ones, and their complexity determines α.
Here is an example of using complexity to prove a pure combinatorial statement:
Proof by complexity: Each set S ∈ Ak is described by its index in Ak (taking log|Ak| bits) plus k (taking O(log n) bits). By the antichain property, these descriptions are "independent" for different k. The total must satisfy a counting constraint that yields the LYM inequality.
The complexity proof is more conceptual than the standard proof: instead of manipulating binomial coefficients, we argue about information content.
A function f: {0,1}n → {0,1}n is Lipschitz if changing one bit of the input changes at most one bit of the output. Bijective Lipschitz functions are "local" permutations — they rearrange bits without propagating changes far.
The proof uses Kolmogorov complexity to show that Lipschitz bijections can only change C(x) by O(1) per application. So a string of complexity 0 (like 0n) can never be mapped to a string of complexity n by any bounded number of Lipschitz steps.
The simulation below lets you choose a forbidden pattern and see how the number of valid strings grows with length. Compare the growth rate to 2n (unrestricted) and observe how the forbidden pattern constrains the count.
Select a forbidden pattern. The plot shows the number of valid strings of each length (log scale). The growth rate α determines the slope.
| Application | Key idea |
|---|---|
| Primes | Few primes ⇒ short factorization descriptions ⇒ contradicts incompressibility |
| Tape copying | Head is bottleneck: O(1) bits per crossing ⇒ Ω(n2) steps |
| Multi-head | k+1 heads track more info than k heads |
| LLN | Wrong frequency ⇒ small set ⇒ short description |
| Forbidden substrings | Growth rate from Rauzy graph spectral radius |
| Lipschitz | Local bijections can't change complexity fast |