Ch 1: Plain Kolmogorov Complexity — Kolmogorov Complexity

Chapter 0: Recap

In the Introduction, we established three foundational ideas:

A description mode D is a computable function. D(y) = x means y is a description of x.
An optimal description mode D₀ exists — one that is at least as good as every other mode, up to an additive constant.
Kolmogorov complexity C(x) = min{|y| : D₀(y) = x} is defined up to O(1) and is not computable.

Chapter 1 of the book formalizes this further, establishing the precise definition (the "plain" version, to distinguish it from prefix and monotone versions that come later) and deriving its key properties.

Why "plain"? Later chapters will introduce prefix complexity K(x) (Chapter 4) and monotone complexity KM(x) (Chapter 5). These are refinements needed for deeper theorems about randomness. For now, plain complexity C(x) is our workhorse.

Check: The "plain" in "plain Kolmogorov complexity" distinguishes it from...

Time-bounded complexity Prefix and monotone complexity (introduced later) Shannon entropy

Chapter 1: The Formal Definition

Let Σ = {0, 1} be the binary alphabet and Σ* the set of all finite binary strings, including the empty string λ. A description mode (decompressor) is a partial computable function D: Σ* → Σ*.

"Partial" means D may not halt on some inputs — and that's fine. A description mode doesn't need to be defined everywhere. "Computable" means there's an algorithm that, given y in the domain of D, eventually halts and returns D(y).

C_D(x) = min{ l(y) : D(y) = x }

Here l(y) denotes the length of y. If no y maps to x under D, we set C_D(x) = +∞.

D₁ is not worse than D₂ if C_D₁(x) ≤ C_D₂(x) + O(1). D is optimal if it is not worse than every other description mode.

Construction of optimal D: Take a universal interpreter U(p, x). Define D(⟨p, y⟩) = U(p, y). The encoding ⟨p, y⟩ must let us recover p from the concatenation — a self-delimiting encoding of p suffices. One standard trick: encode p as p̅ = double each bit and add "01" as terminator. Then |p̅| = 2|p| + 2.

From now on, we fix an optimal D and write C(x) instead of C_D(x). Everything is "up to O(1)."

Check: What does C_D(x) = +∞ mean?

No binary string y satisfies D(y) = x x is infinitely long D takes infinitely long to compute

Chapter 2: The Counting Argument

This is the most frequently used tool in Kolmogorov complexity. It's beautifully simple.

Theorem 5 (Counting): The number of binary strings x with C(x) < n is strictly less than 2ⁿ.

Proof: A description of length less than n is a binary string of length 0, 1, ..., or n−1. The total number of such strings is 1 + 2 + 4 + ... + 2ⁿ⁻¹ = 2ⁿ − 1 < 2ⁿ. Each description maps to at most one string (D is a function), so at most 2ⁿ − 1 strings can have complexity less than n.

Note that this bound has no hidden constant — it holds for any optimal description mode.

Corollary: For any n, there exists a binary string of length n with C(x) ≥ n. In fact, at least 2ⁿ − 2ⁿ + 1 = 1 string of length n has this property. But actually, the fraction is much larger...

Among all 2ⁿ strings of length n:

At most 2ⁿ⁻¹ have C(x) < n − 1 (half the strings are incompressible by even 1 bit)
At most 2ⁿ⁻¹⁰ have C(x) < n − 10 (only 1 in 1024 compresses by 10+ bits)
At most 2^n−k have C(x) < n − k (fraction 2^−k compress by k+ bits)

The counting argument also gives us C(x) ≤ l(x) + O(1), since the identity is a valid description mode. Combining: for most n-bit strings, C(x) is close to n.

Check: How many binary strings of length 20 have C(x) < 15?

Fewer than 2¹⁵ = 32,768 (out of 2²⁰ = 1,048,576 total) Exactly half of them None of them

Chapter 3: Incompressible Strings

A string x is called incompressible if C(x) ≥ l(x). These are the strings that contain "maximal information" — no program can describe them more concisely than the string itself.

From the counting argument, we know incompressible strings of every length exist. In fact, most strings are incompressible (or nearly so). But which specific strings are incompressible?

Here's the catch: you can never prove a specific string is incompressible (in general). Remember, C(x) is not computable. Even worse, no consistent formal system F can prove "C(x) ≥ n" for n larger than C(F) + O(1) — the complexity of the formal system itself. This is essentially Gödel's incompleteness theorem, rephrased in terms of Kolmogorov complexity.

Chaitin's Incompleteness Theorem: For any consistent formal system F, there is a constant c_F such that F cannot prove "C(x) ≥ n" for any n > c_F. The constant c_F is roughly the complexity of F itself. Powerful axiom systems can prove more strings are complex, but there is always a ceiling.

The average complexity of strings of length n is n + O(1). Here's why: let a_k be the fraction of n-bit strings with complexity n − k. The average deficit from n is ∑ k · a_k. Since a_k ≤ 2^−k (counting argument), this sum is bounded by ∑ k/2^k, which converges. So the average complexity is n minus a bounded constant.

Check: Can you write a program that determines whether a given string is incompressible?

Yes, just try all shorter programs No, because C(x) is not computable Only for strings shorter than 100 bits

Chapter 4: Complexity of Numbers

Kolmogorov complexity is defined for binary strings, but we often want to talk about the complexity of other objects: natural numbers, graphs, tuples, sets. The non-growth theorem makes this easy.

A natural number n can be encoded as a binary string in several ways:

Encoding	Example (n=5)	Length
Binary representation	101	log n + O(1)
Lexicographic enumeration	110 (6th string)	log n + O(1)
Unary (n ones)	11111	n

Since any two of these encodings are related by computable functions, the complexities they induce differ by at most O(1). So we can speak of C(n) without specifying the encoding.

The key bound: C(n) ≤ log n + O(1). The binary representation of n has length log n + O(1), and C(x) ≤ l(x) + O(1).

Deleting a bit: Removing the last bit of a string changes its complexity by at most O(1). Why? The functions x ↦ x0, x ↦ x1, and x ↦ (x without last bit) are all computable. So C(x0), C(x1), and C(x with last bit removed) all lie within O(1) of each other.

But deleting an arbitrary bit (not the last one) can change complexity dramatically. Consider x = 0²ⁿ (2n zeros). Its complexity is at most C(n) + O(1) ≈ log n. Now flip one bit at some position. There are 2n possible results, and at least one must have complexity ≥ n (by counting). So flipping one bit can increase complexity from log n to n.

Incrementing a number by 1 changes C(n) by at most O(1). This gives a Lipschitz property: |C(m) − C(n)| ≤ c|m − n| for some constant c. In fact, the bound is tighter: |C(m) − C(n)| ≤ 2 log|m − n| + O(1) when m ≠ n.

Check: The Kolmogorov complexity of the number n is at most...

log n + O(1) n n²

Chapter 5: Upper Semicomputability

While C(x) is not computable, it is upper semicomputable (also called enumerable from above). This means there exists a computable function F(x, k) such that:

F(x, 0) ≥ F(x, 1) ≥ F(x, 2) ≥ ... → C(x)

The sequence F(x, k) decreases monotonically and converges to C(x). You get progressively better upper bounds, but you never know when you've reached the true value.

The algorithm is simple: run all possible programs in parallel. Whenever program y halts and outputs x, you have a new candidate description. Update your best bound: C(x) ≤ l(y). Over time, you discover shorter and shorter programs for x. In the limit, you find the shortest — but the algorithm never announces "I'm done."

Asymmetry of complexity: Low complexity is verifiable (exhibit a short program). High complexity is not verifiable in general (you'd need to check all short programs, but you can't know which ones halt). This is why C(x) is enumerable from above but not from below.

Formally, a function f: Σ* → N is upper semicomputable if the set {(x, n) : f(x) < n} is computably enumerable. Equivalently, if there is a decreasing computable sequence converging to f(x) for each x.

The set S_n = {x : C(x) < n} is also computably enumerable (you enumerate it by running all programs and outputting x whenever you find a description of length less than n). But the set {x : C(x) ≥ n} is not computably enumerable in general — this is another face of non-computability.

Check: Upper semicomputability of C(x) means...

We can compute C(x) exactly but it takes a long time We can produce a decreasing sequence of upper bounds converging to C(x) We can compute C(x) for most strings

Chapter 6: Enumerable Families of Sets

Theorem 7 is one of the most useful structural results in Kolmogorov complexity theory. To state it, we need the notion of an enumerable family of sets.

A set V of strings is computably enumerable (c.e.) if there is an algorithm that lists all elements of V (possibly with repetitions, possibly never halting if V is infinite). Think of a printer that spits out strings one by one — eventually every element of V appears.

A family of sets V_n (indexed by n) is enumerable if the set {(n, x) : x ∈ V_n} is c.e. This is stronger than each V_n being c.e. individually — we need a uniform enumeration algorithm that works for all n simultaneously.

Example: The family S_n = {x : C(x) < n} is enumerable. To enumerate pairs (n, x) in this set: run all programs in parallel. When program y halts with output x, output the pair (l(y) + 1, x), then (l(y) + 2, x), then (l(y) + 3, x), and so on. Each pair correctly certifies that C(x) < l(y) + 1, hence C(x) < n for all n > l(y).

The key subtlety: knowing that each V_n is c.e. does not make the family enumerable. Consider V_n = {0} if n is in some non-c.e. set S, and V_n = ∅ otherwise. Each V_n is finite (hence c.e.), but the family is not enumerable because enumerating it would solve the membership problem for S.

Check: An enumerable family of sets V_n requires...

Each V_n to be finite A uniform algorithm that enumerates all pairs (n, x) with x ∈ V_n Each V_n to have the same cardinality

Chapter 7: Theorem 7 — Cardinality and Complexity

This theorem says that "having small cardinality" and "having low complexity" are the same thing, for enumerable families.

Theorem 7:
(a) The family S_n = {x : C(x) < n} is enumerable, and |S_n| < 2ⁿ for all n.
(b) If V_n is an enumerable family with |V_n| < 2ⁿ for all n, then there exists c such that C(x) < n + c for all n and all x ∈ V_n.

Part (a) we already know. Part (b) is the powerful direction: any enumerable family of "small" sets is automatically a family of "simple" strings.

Proof of (b): Define a new description mode D'. For strings of length n, use them as descriptions of elements of V_n. Specifically, let x_k be the kth element of V_n as enumerated. Let y_k be the kth binary string of length n. Set D'(y_k) = x_k. Since |V_n| < 2ⁿ, every element of V_n gets a description of length n. Since D' is computable, there exists c with C(x) ≤ n + c for all x ∈ V_n.

Intuition: Small cardinality = low complexity because: if there are few objects in a set, you can assign each a short binary "name" (index). If the set is enumerable, the naming scheme is computable, so these names serve as descriptions.

The theorem generalizes: a function f(x) is upper semicomputable iff f(x) ≥ C(x) − O(1) whenever the level sets {x : f(x) < n} form an enumerable family with |{x : f(x) < n}| < 2ⁿ. In other words, C(x) is the smallest upper semicomputable function (up to O(1)) with the counting property.

Check: Theorem 7 says that for an enumerable family of sets with |V_n| < 2ⁿ...

Every element of V_n has C(x) < n + O(1) Every element of V_n is incompressible V_n is decidable

Chapter 8: Complexity Explorer

Let's build intuition about how Kolmogorov complexity behaves. The simulation below lets you experiment with strings and see estimates of their complexity.

Of course, we can't compute C(x) exactly. But we can compute rough upper bounds by trying simple compression strategies: run-length encoding, repetition detection, and pattern matching. The simulation uses these heuristics to give a (loose) upper bound.

Complexity Explorer

Enter a binary string or use the generators. The bar shows the ratio of estimated complexity to string length. Truly random strings hover near 100%; patterned strings compress significantly.

What to notice: All-zeros and periodic strings compress dramatically — their complexity grows only logarithmically with length. Random strings barely compress at all. Digits of √2 are interesting: they look random but have low complexity because a short program can compute them.

Check: The digits of π form a string that looks random. Is this string incompressible?

No, a short program can compute π to any precision, so its prefixes have low complexity Yes, because the digits pass all statistical randomness tests It depends on the base

Chapter 9: Summary

Result	Statement
Upper bound	C(x) ≤ l(x) + O(1)
Non-growth	C(A(x)) ≤ C(x) + O(1) for computable A
Counting	\|{x : C(x) < n}\| < 2ⁿ
Existence of incompressibles	For every n, some x of length n has C(x) ≥ n
Average complexity	Average C(x) over n-bit strings is n + O(1)
Non-computability	C(x) is not computable
Upper semicomputability	C(x) is enumerable from above
Complexity of numbers	C(n) ≤ log n + O(1)
Theorem 7	Small enumerable families = low complexity

Looking ahead: Chapter 2 extends these ideas to pairs of strings (how much information does (x, y) contain?) and conditional complexity (how much information does x contain given y?). The highlight is the Kolmogorov-Levin theorem: C(x, y) = C(x) + C(y|x) + O(log n) — the chain rule for Kolmogorov complexity.

Final check: What is the average Kolmogorov complexity of n-bit strings?

n + O(1) — most strings are nearly incompressible n/2 log n