Prefix-free codes, semimeasures, and the refined complexity measure K(x) that connects to randomness and probability.
Plain complexity C(x) has a flaw: it doesn't behave well with probability. We'd like to say that "the probability that a random program outputs x" is related to 2−C(x). But this fails because plain descriptions can overlap — a program of length 3 that outputs "hello" and a program of length 5 that outputs "hello" aren't related in any nice probabilistic way.
The fix is to require descriptions to be prefix-free: no description is a prefix of another. This is exactly the constraint that makes Kraft's inequality work, and it connects complexity to probability.
With plain complexity, ∑x 2−C(x) can be infinite (many strings can share the same description length). With prefix complexity K(x), the sum ∑x 2−K(x) is always at most 1 — it defines a semimeasure on strings. This connection to probability is why prefix complexity is the "right" version for randomness.
A semimeasure on the natural numbers is a function m: N → [0, ∞) such that ∑n m(n) ≤ 1. It's like a probability distribution, except the total mass may be less than 1 (some probability "leaks out").
Where do semimeasures come from? Consider a randomized algorithm: a program that flips coins as it runs. If the program always halts, the probability of each output n defines a measure. If it might not halt, some probability is "lost" to non-termination — and we get a semimeasure.
A semimeasure m is lower semicomputable (or enumerable from below) if there is a computable sequence m0(n) ≤ m1(n) ≤ ... converging to m(n). We can approximate each m(n) from below but never know the exact value.
Just as there is an optimal description mode for complexity, there is a maximal lower semicomputable semimeasure. Call it M.
Construction: Enumerate all programs p1, p2, .... Let mi be the semimeasure generated by program pi (with random input). Define M(n) = ∑i 2−i · mi(n). This is a weighted average over all lower semicomputable semimeasures. For any specific mj, M(n) ≥ 2−j · mj(n).
The function M is called the a priori probability or Solomonoff's prior. M(n) is the probability that a universal randomized algorithm outputs n. Strings with high M(n) are those that "many programs" produce — simple strings. Strings with low M(n) are rare outputs — complex strings.
A prefix machine is a partial computable function D: Σ* → Σ* whose domain is prefix-free: if D(p) is defined and D(q) is defined, then neither p nor q is a prefix of the other.
Visually, think of the domain of D as a set of leaves in the infinite binary tree. Each leaf (valid input) maps to an output string. No input is an "ancestor" of another input in the tree.
The prefix complexity of x with respect to a prefix machine D is:
An optimal prefix machine exists by the same universality argument. Fix one and define K(x) = KD(x). Like C(x), K(x) is defined up to O(1) and is not computable.
There's an equivalent way to think about prefix machines: a machine that reads its input bit by bit from a one-way tape and decides on its own when to stop reading. The machine never "looks ahead" — it processes bits one at a time and halts after reading some prefix of the tape.
This model makes the prefix-free condition natural. If the machine halts after reading bits p, it never reads any extension of p. So the set of inputs on which the machine halts forms a prefix-free set automatically.
The two models (prefix-free domain vs. self-delimiting reader) are equivalent. Both give the same complexity K(x) up to O(1).
The connection between prefix complexity and a priori probability:
This theorem has two parts:
K(x) ≤ −log M(x) + O(1): Consider the optimal prefix machine D. Define m(x) = ∑ 2−|p| over all p with D(p) = x. By Kraft's inequality, m is a semimeasure. Its value m(x) = 2−K(x) (up to a constant factor). Since M dominates all semicomputable semimeasures, M(x) ≥ c · m(x) = c · 2−K(x), giving K(x) ≥ −log M(x) − O(1).
K(x) ≥ −log M(x) − O(1): Given the semimeasure M, construct a prefix machine whose description lengths approximate −log M(x). This uses the Kraft inequality in reverse: given a semimeasure (weights summing to at most 1), construct a prefix-free code with matching lengths.
Prefix complexity K(x) shares many properties with plain complexity C(x) but has important differences:
| Property | C(x) | K(x) |
|---|---|---|
| Upper bound | C(x) ≤ |x| + O(1) | K(x) ≤ |x| + 2 log|x| + O(1) |
| Non-growth | C(f(x)) ≤ C(x) + O(1) | K(f(x)) ≤ K(x) + O(1) |
| Counting | |{x: C(x) < n}| < 2n | |{x: K(x) < n}| < 2n |
| Kraft | ∑ 2−C(x) can be ∞ | ∑ 2−K(x) ≤ 1 |
| Relation | C(x) ≤ K(x) ≤ C(x) + 2 log C(x) + O(1) | |
The key difference: K(x) ≤ |x| + 2 log|x| + O(1), not |x| + O(1). The extra 2 log|x| is the cost of self-delimiting: the machine must know when the input ends, and encoding the length costs log|x| + O(1) bits.
The subadditivity property is cleaner for K: K(x, y) ≤ K(x) + K(y) + O(1). No logarithmic overhead! This is because prefix descriptions are self-delimiting — you can concatenate them without a separator.
For most applications, C(x) and K(x) give the same results up to logarithmic terms. But the distinction matters crucially for randomness.
The reason K works better: Kraft's inequality ∑ 2−K(x) ≤ 1 means that 2−K(x) defines a semimeasure. This lets us use K as a test for randomness against arbitrary computable measures. Plain complexity C doesn't have this property.
In the incompressibility method (proofs about combinatorics), C is usually sufficient because the logarithmic difference between C and K is absorbed into the O(log n) error terms.
The simulation below visualizes prefix-free codes as leaves of a binary tree. Each leaf represents a valid description. The prefix-free constraint means no leaf is an ancestor of another — once you reach a codeword, you stop.
Click nodes to toggle them as codewords (leaves). The tree enforces the prefix-free constraint: selecting a node blocks its descendants. The Kraft sum is displayed below.
| Concept | Definition/Result |
|---|---|
| Semimeasure | m: Σ* → [0,∞) with ∑ m(x) ≤ 1 |
| A priori probability M | Maximal lower semicomputable semimeasure |
| Prefix machine | Computable D with prefix-free domain |
| K(x) | Prefix complexity: shortest prefix-free description |
| Main theorem | K(x) = −log M(x) + O(1) |
| Kraft | ∑ 2−K(x) ≤ 1 |
| K vs C | C(x) ≤ K(x) ≤ C(x) + 2 log C(x) + O(1) |
| Subadditivity | K(x,y) ≤ K(x) + K(y) + O(1) |