Shen, Uspensky & Vereshchagin — Chapter 10

Inequalities for Entropy,
Complexity, and Size

Uniform sets, typization, and the deep correspondence between information inequalities and combinatorial bounds.

Prerequisites: Chapters 2, 7 (pairs, conditional complexity, Shannon entropy). Comfort with combinatorics helps.
10
Chapters
2
Simulations
10
Quizzes

Chapter 0: Overview

This chapter explores a remarkable three-way correspondence between:

  1. Shannon entropy inequalities for random variables
  2. Kolmogorov complexity inequalities for individual strings
  3. Combinatorial size inequalities for finite sets

Many inequalities hold (or fail) simultaneously in all three worlds. This is not a coincidence — it reflects a deep structural connection.

The grand picture: Consider a linear inequality like H(X,Y) ≤ H(X) + H(Y). This holds for Shannon entropy (always), for Kolmogorov complexity (up to O(log n)), and for set cardinalities (|A × B| ≤ |A| · |B|). All three are manifestations of the same underlying combinatorial fact.
Check: The three-way correspondence connects inequalities about...

Chapter 1: Three Worlds

Let's see the three worlds side by side for a concrete example: the subadditivity inequality.

WorldInequalityInterpretation
EntropyH(X,Y) ≤ H(X) + H(Y)Joint entropy ≤ sum of marginals
ComplexityC(x,y) ≤ C(x) + C(y) + O(log n)Pair complexity ≤ sum (up to log)
Cardinality|A| ≤ |proj1(A)| · |proj2(A)|Set of pairs ≤ product of projections

All three are essentially the same statement in different languages. The entropy version talks about random variables, the complexity version about strings, and the cardinality version about sets.

Transfer principle: Any linear inequality that is true for Shannon entropy is also true for Kolmogorov complexity (up to O(log n)). The converse also holds. This means that to prove (or disprove) a complexity inequality, we can work in whichever world is most convenient.

But some non-linear inequalities behave differently in the three worlds. The Ingleton inequality, for example, holds for linear random variables but fails in general. Understanding these differences is the frontier of the theory.

Check: The subadditivity H(X,Y) ≤ H(X) + H(Y) corresponds in set theory to...

Chapter 2: Uniform Sets

A finite set S is uniform (or combinatorially regular) if all its elements have approximately the same complexity, and the set itself has a short description.

Definition: A set S of binary strings is uniform if:
(1) S is decidable (membership can be tested algorithmically),
(2) All elements x ∈ S have C(x) ≈ log|S| (up to O(log n) terms),
(3) C(S) is small (the set itself has a short description).

Uniform sets are the Kolmogorov analogues of "typical sets" in Shannon theory. Just as typical messages have entropy ≈ H, elements of uniform sets have complexity ≈ log|S|.

Why do uniform sets matter? Because they provide the "witnesses" for complexity inequalities. To prove C(x) + C(y) ≥ C(x,y) + ..., we construct a uniform set that achieves the bound.

Check: Elements of a uniform set S all have Kolmogorov complexity approximately...

Chapter 3: Constructing Uniform Sets

Given a string x of complexity k, we want to find a uniform set S containing x with |S| ≈ 2k. The construction uses the set of all strings with approximately the same complexity:

Sk = { x : C(x) ≈ k }

By the counting argument, |Sk| ≈ 2k. Every x with C(x) = k belongs to Sk. The set Sk is "almost" uniform — its elements have similar complexity by construction.

The challenge: Sk is not decidable (since C(x) is not computable). To get a decidable uniform set, we need more work: we approximate Sk using time-bounded or resource-bounded versions of complexity, or we use the "typization trick" to extract a decidable core.

For the purposes of proving inequalities, the non-decidability is often not a problem because we work with the complexity version of the inequality directly, and the uniform set serves as a conceptual tool.

Check: The set Sk = {x : C(x) ≈ k} is not directly usable as a uniform set because...

Chapter 4: Uniform Sets and Orbits

A clean source of uniform sets: the orbit of an element under a group action. If a finite group G acts on a set X, the orbit of x is O(x) = {g · x : g ∈ G}.

Orbits are uniform: If the group G is "computably presented" (its elements and action are computable), then O(x) is a decidable set. If the action is transitive on O(x), all elements of O(x) have similar complexity. The orbit size |O(x)| = |G|/|Stab(x)|, where Stab(x) is the stabilizer of x.

Example: the symmetric group Sn acts on binary strings of length n by permuting bit positions. The orbit of x consists of all rearrangements of x's bits. If x has k ones, the orbit has C(n, k) elements. All elements of the orbit have the same complexity (up to O(log n)).

Check: The orbit of a binary string under bit permutations contains all strings with...

Chapter 5: Almost Uniform Sets

In practice, perfect uniformity is hard to achieve. Almost uniform sets relax the requirement: elements have complexity within O(log n) of log|S|, and a small fraction of "outliers" is allowed.

Almost uniform suffices: For proving information inequalities (which have O(log n) error terms anyway), almost uniform sets work just as well as perfectly uniform ones. The small errors get absorbed into the O(log n) terms.

The key technical lemma: given any finite set A and a string x ∈ A, we can find an almost uniform subset A' ⊆ A containing x with |A'| ≥ |A|/poly(n) and all elements having complexity within O(log n) of each other.

Check: Almost uniform sets are sufficient for proving information inequalities because...

Chapter 6: The Typization Trick

The typization trick is a technique for converting complexity statements into combinatorial ones. Given strings x, y with specified complexities C(x), C(y), C(x,y), we construct sets A, B, R such that:

The trick: Group strings by complexity level. All strings of complexity ≈ k form a "type class." Within each type class, elements are interchangeable for the purpose of the inequality. By fixing the type (the complexity profile), we reduce the problem to a purely combinatorial one about set sizes.

This is the central technique for transferring between the three worlds. Any complexity inequality can be proved by: (1) typize to get sets with prescribed sizes, (2) prove the corresponding combinatorial inequality, (3) translate back.

Check: Typization converts a complexity inequality into...

Chapter 7: Combinatorial Interpretations

Let's see some concrete examples of the correspondence:

Example 1 (Chain rule):
Entropy: H(X,Y) = H(X) + H(Y|X)
Complexity: C(x,y) = C(x) + C(y|x) + O(log n)
Combinatorial: A set of pairs with |A| ≤ pq can be split into P (projection size ≤ p) and Q (max section size ≤ q).
Example 2 (Submodularity):
Entropy: H(X,Y) + H(X,Z) ≥ H(X) + H(X,Y,Z)
Complexity: C(x,y) + C(x,z) ≥ C(x) + C(x,y,z) − O(log n)
Combinatorial: For a set A ⊆ X × Y × Z, |projXY(A)| · |projXZ(A)| ≥ |projX(A)| · |A|.

The submodularity inequality is the Shannon analogue of the Loomis-Whitney inequality in combinatorics. Each version implies the others (up to the appropriate error terms).

Check: The chain rule for complexity corresponds combinatorially to splitting a set of pairs into...

Chapter 8: Information Region Visualizer

For two random variables (or strings), the Shannon entropy region is determined by H(X), H(Y), H(X,Y). These three values must satisfy certain inequalities. The simulation below shows the feasible region.

Two-Variable Information Region

The Venn diagram shows the decomposition of H(X,Y) into H(X|Y), I(X;Y), and H(Y|X). Drag the sliders to change the entropies and see how the mutual information changes.

H(X)60
H(Y)40
I(X;Y)20
What to notice: Mutual information I(X;Y) is the overlap. H(X|Y) = H(X) − I(X;Y) is the "exclusive" information in X. H(X,Y) = H(X) + H(Y) − I(X;Y) is the total. When I = 0 (independent), there's no overlap. When I = min(H(X), H(Y)) (one determines the other), one circle is inside the other.
Check: If I(X;Y) = H(X), then...

Chapter 9: Summary

ConceptKey idea
Three-way correspondenceEntropy, complexity, and cardinality inequalities are equivalent (up to log terms)
Uniform setsSets where all elements have the same complexity ≈ log|S|
OrbitsGroup orbits provide natural uniform sets
TypizationGroup by complexity level, then use combinatorics
TransferShannon inequalities transfer to Kolmogorov and back
The big picture: This chapter reveals that information theory, algorithmic information theory, and combinatorics are three perspectives on the same underlying structure. The information inequalities — subadditivity, submodularity, the chain rule — are not specific to any one framework. They are universal laws of information, expressed in whichever language suits the problem at hand.

And with that, we reach the end of the textbook's main arc. From the simple idea of "shortest program" in the Introduction, through the formal theory of plain, prefix, and monotone complexity, through the characterization of randomness via Martin-Löf tests and martingales, to the deep connections between individual complexity and ensemble entropy — the theory of Kolmogorov complexity provides a unified foundation for understanding information, randomness, and computation.

Final check: The transfer principle between entropy and complexity says that any Shannon inequality...