Shen, Uspensky & Vereshchagin — Chapter 1

Plain Kolmogorov Complexity

The formal definition, the counting argument, incompressible strings, and the algorithmic properties that make C(x) a powerful proof tool.

Prerequisites: Introduction (description modes, optimal decompressor, non-computability).
10
Chapters
2
Simulations
10
Quizzes

Chapter 0: Recap

In the Introduction, we established three foundational ideas:

  1. A description mode D is a computable function. D(y) = x means y is a description of x.
  2. An optimal description mode D0 exists — one that is at least as good as every other mode, up to an additive constant.
  3. Kolmogorov complexity C(x) = min{|y| : D0(y) = x} is defined up to O(1) and is not computable.

Chapter 1 of the book formalizes this further, establishing the precise definition (the "plain" version, to distinguish it from prefix and monotone versions that come later) and deriving its key properties.

Why "plain"? Later chapters will introduce prefix complexity K(x) (Chapter 4) and monotone complexity KM(x) (Chapter 5). These are refinements needed for deeper theorems about randomness. For now, plain complexity C(x) is our workhorse.
Check: The "plain" in "plain Kolmogorov complexity" distinguishes it from...

Chapter 1: The Formal Definition

Let Σ = {0, 1} be the binary alphabet and Σ* the set of all finite binary strings, including the empty string λ. A description mode (decompressor) is a partial computable function D: Σ* → Σ*.

"Partial" means D may not halt on some inputs — and that's fine. A description mode doesn't need to be defined everywhere. "Computable" means there's an algorithm that, given y in the domain of D, eventually halts and returns D(y).

CD(x) = min{ l(y) : D(y) = x }

Here l(y) denotes the length of y. If no y maps to x under D, we set CD(x) = +∞.

D1 is not worse than D2 if CD1(x) ≤ CD2(x) + O(1). D is optimal if it is not worse than every other description mode.

Construction of optimal D: Take a universal interpreter U(p, x). Define D(⟨p, y⟩) = U(p, y). The encoding ⟨p, y⟩ must let us recover p from the concatenation — a self-delimiting encoding of p suffices. One standard trick: encode p as p̅ = double each bit and add "01" as terminator. Then |p̅| = 2|p| + 2.

From now on, we fix an optimal D and write C(x) instead of CD(x). Everything is "up to O(1)."

Check: What does CD(x) = +∞ mean?

Chapter 2: The Counting Argument

This is the most frequently used tool in Kolmogorov complexity. It's beautifully simple.

Theorem 5 (Counting): The number of binary strings x with C(x) < n is strictly less than 2n.

Proof: A description of length less than n is a binary string of length 0, 1, ..., or n−1. The total number of such strings is 1 + 2 + 4 + ... + 2n−1 = 2n − 1 < 2n. Each description maps to at most one string (D is a function), so at most 2n − 1 strings can have complexity less than n.

Note that this bound has no hidden constant — it holds for any optimal description mode.

Corollary: For any n, there exists a binary string of length n with C(x) ≥ n. In fact, at least 2n − 2n + 1 = 1 string of length n has this property. But actually, the fraction is much larger...

Among all 2n strings of length n:

The counting argument also gives us C(x) ≤ l(x) + O(1), since the identity is a valid description mode. Combining: for most n-bit strings, C(x) is close to n.

Check: How many binary strings of length 20 have C(x) < 15?

Chapter 3: Incompressible Strings

A string x is called incompressible if C(x) ≥ l(x). These are the strings that contain "maximal information" — no program can describe them more concisely than the string itself.

From the counting argument, we know incompressible strings of every length exist. In fact, most strings are incompressible (or nearly so). But which specific strings are incompressible?

Here's the catch: you can never prove a specific string is incompressible (in general). Remember, C(x) is not computable. Even worse, no consistent formal system F can prove "C(x) ≥ n" for n larger than C(F) + O(1) — the complexity of the formal system itself. This is essentially Gödel's incompleteness theorem, rephrased in terms of Kolmogorov complexity.

Chaitin's Incompleteness Theorem: For any consistent formal system F, there is a constant cF such that F cannot prove "C(x) ≥ n" for any n > cF. The constant cF is roughly the complexity of F itself. Powerful axiom systems can prove more strings are complex, but there is always a ceiling.

The average complexity of strings of length n is n + O(1). Here's why: let ak be the fraction of n-bit strings with complexity n − k. The average deficit from n is ∑ k · ak. Since ak ≤ 2−k (counting argument), this sum is bounded by ∑ k/2k, which converges. So the average complexity is n minus a bounded constant.

Check: Can you write a program that determines whether a given string is incompressible?

Chapter 4: Complexity of Numbers

Kolmogorov complexity is defined for binary strings, but we often want to talk about the complexity of other objects: natural numbers, graphs, tuples, sets. The non-growth theorem makes this easy.

A natural number n can be encoded as a binary string in several ways:

EncodingExample (n=5)Length
Binary representation101log n + O(1)
Lexicographic enumeration110 (6th string)log n + O(1)
Unary (n ones)11111n

Since any two of these encodings are related by computable functions, the complexities they induce differ by at most O(1). So we can speak of C(n) without specifying the encoding.

The key bound: C(n) ≤ log n + O(1). The binary representation of n has length log n + O(1), and C(x) ≤ l(x) + O(1).

Deleting a bit: Removing the last bit of a string changes its complexity by at most O(1). Why? The functions x ↦ x0, x ↦ x1, and x ↦ (x without last bit) are all computable. So C(x0), C(x1), and C(x with last bit removed) all lie within O(1) of each other.

But deleting an arbitrary bit (not the last one) can change complexity dramatically. Consider x = 02n (2n zeros). Its complexity is at most C(n) + O(1) ≈ log n. Now flip one bit at some position. There are 2n possible results, and at least one must have complexity ≥ n (by counting). So flipping one bit can increase complexity from log n to n.

Incrementing a number by 1 changes C(n) by at most O(1). This gives a Lipschitz property: |C(m) − C(n)| ≤ c|m − n| for some constant c. In fact, the bound is tighter: |C(m) − C(n)| ≤ 2 log|m − n| + O(1) when m ≠ n.

Check: The Kolmogorov complexity of the number n is at most...

Chapter 5: Upper Semicomputability

While C(x) is not computable, it is upper semicomputable (also called enumerable from above). This means there exists a computable function F(x, k) such that:

F(x, 0) ≥ F(x, 1) ≥ F(x, 2) ≥ ... → C(x)

The sequence F(x, k) decreases monotonically and converges to C(x). You get progressively better upper bounds, but you never know when you've reached the true value.

The algorithm is simple: run all possible programs in parallel. Whenever program y halts and outputs x, you have a new candidate description. Update your best bound: C(x) ≤ l(y). Over time, you discover shorter and shorter programs for x. In the limit, you find the shortest — but the algorithm never announces "I'm done."

Asymmetry of complexity: Low complexity is verifiable (exhibit a short program). High complexity is not verifiable in general (you'd need to check all short programs, but you can't know which ones halt). This is why C(x) is enumerable from above but not from below.

Formally, a function f: Σ* → N is upper semicomputable if the set {(x, n) : f(x) < n} is computably enumerable. Equivalently, if there is a decreasing computable sequence converging to f(x) for each x.

The set Sn = {x : C(x) < n} is also computably enumerable (you enumerate it by running all programs and outputting x whenever you find a description of length less than n). But the set {x : C(x) ≥ n} is not computably enumerable in general — this is another face of non-computability.

Check: Upper semicomputability of C(x) means...

Chapter 6: Enumerable Families of Sets

Theorem 7 is one of the most useful structural results in Kolmogorov complexity theory. To state it, we need the notion of an enumerable family of sets.

A set V of strings is computably enumerable (c.e.) if there is an algorithm that lists all elements of V (possibly with repetitions, possibly never halting if V is infinite). Think of a printer that spits out strings one by one — eventually every element of V appears.

A family of sets Vn (indexed by n) is enumerable if the set {(n, x) : x ∈ Vn} is c.e. This is stronger than each Vn being c.e. individually — we need a uniform enumeration algorithm that works for all n simultaneously.

Example: The family Sn = {x : C(x) < n} is enumerable. To enumerate pairs (n, x) in this set: run all programs in parallel. When program y halts with output x, output the pair (l(y) + 1, x), then (l(y) + 2, x), then (l(y) + 3, x), and so on. Each pair correctly certifies that C(x) < l(y) + 1, hence C(x) < n for all n > l(y).

The key subtlety: knowing that each Vn is c.e. does not make the family enumerable. Consider Vn = {0} if n is in some non-c.e. set S, and Vn = ∅ otherwise. Each Vn is finite (hence c.e.), but the family is not enumerable because enumerating it would solve the membership problem for S.

Check: An enumerable family of sets Vn requires...

Chapter 7: Theorem 7 — Cardinality and Complexity

This theorem says that "having small cardinality" and "having low complexity" are the same thing, for enumerable families.

Theorem 7:
(a) The family Sn = {x : C(x) < n} is enumerable, and |Sn| < 2n for all n.
(b) If Vn is an enumerable family with |Vn| < 2n for all n, then there exists c such that C(x) < n + c for all n and all x ∈ Vn.

Part (a) we already know. Part (b) is the powerful direction: any enumerable family of "small" sets is automatically a family of "simple" strings.

Proof of (b): Define a new description mode D'. For strings of length n, use them as descriptions of elements of Vn. Specifically, let xk be the kth element of Vn as enumerated. Let yk be the kth binary string of length n. Set D'(yk) = xk. Since |Vn| < 2n, every element of Vn gets a description of length n. Since D' is computable, there exists c with C(x) ≤ n + c for all x ∈ Vn.

Intuition: Small cardinality = low complexity because: if there are few objects in a set, you can assign each a short binary "name" (index). If the set is enumerable, the naming scheme is computable, so these names serve as descriptions.

The theorem generalizes: a function f(x) is upper semicomputable iff f(x) ≥ C(x) − O(1) whenever the level sets {x : f(x) < n} form an enumerable family with |{x : f(x) < n}| < 2n. In other words, C(x) is the smallest upper semicomputable function (up to O(1)) with the counting property.

Check: Theorem 7 says that for an enumerable family of sets with |Vn| < 2n...

Chapter 8: Complexity Explorer

Let's build intuition about how Kolmogorov complexity behaves. The simulation below lets you experiment with strings and see estimates of their complexity.

Of course, we can't compute C(x) exactly. But we can compute rough upper bounds by trying simple compression strategies: run-length encoding, repetition detection, and pattern matching. The simulation uses these heuristics to give a (loose) upper bound.

Complexity Explorer

Enter a binary string or use the generators. The bar shows the ratio of estimated complexity to string length. Truly random strings hover near 100%; patterned strings compress significantly.

What to notice: All-zeros and periodic strings compress dramatically — their complexity grows only logarithmically with length. Random strings barely compress at all. Digits of √2 are interesting: they look random but have low complexity because a short program can compute them.
Check: The digits of π form a string that looks random. Is this string incompressible?

Chapter 9: Summary

ResultStatement
Upper boundC(x) ≤ l(x) + O(1)
Non-growthC(A(x)) ≤ C(x) + O(1) for computable A
Counting|{x : C(x) < n}| < 2n
Existence of incompressiblesFor every n, some x of length n has C(x) ≥ n
Average complexityAverage C(x) over n-bit strings is n + O(1)
Non-computabilityC(x) is not computable
Upper semicomputabilityC(x) is enumerable from above
Complexity of numbersC(n) ≤ log n + O(1)
Theorem 7Small enumerable families = low complexity
Looking ahead: Chapter 2 extends these ideas to pairs of strings (how much information does (x, y) contain?) and conditional complexity (how much information does x contain given y?). The highlight is the Kolmogorov-Levin theorem: C(x, y) = C(x) + C(y|x) + O(log n) — the chain rule for Kolmogorov complexity.
Final check: What is the average Kolmogorov complexity of n-bit strings?