Shen, Uspensky & Vereshchagin — Chapter 10

Inequalities for Entropy,
Complexity, and Size

Uniform sets, typization, and the deep correspondence between information inequalities and combinatorial bounds.

Prerequisites: Chapters 2, 7 (pairs, conditional complexity, Shannon entropy). Comfort with combinatorics helps.

Chapters

Simulations

Quizzes

Chapter 0: Overview

This chapter explores a remarkable three-way correspondence between:

Shannon entropy inequalities for random variables
Kolmogorov complexity inequalities for individual strings
Combinatorial size inequalities for finite sets

Many inequalities hold (or fail) simultaneously in all three worlds. This is not a coincidence — it reflects a deep structural connection.

The grand picture: Consider a linear inequality like H(X,Y) ≤ H(X) + H(Y). This holds for Shannon entropy (always), for Kolmogorov complexity (up to O(log n)), and for set cardinalities (|A × B| ≤ |A| · |B|). All three are manifestations of the same underlying combinatorial fact.

Check: The three-way correspondence connects inequalities about...

Shannon entropy, Kolmogorov complexity, and set cardinalities Primes, composites, and perfect numbers Time complexity, space complexity, and circuit size

Chapter 1: Three Worlds

Let's see the three worlds side by side for a concrete example: the subadditivity inequality.

World	Inequality	Interpretation
Entropy	H(X,Y) ≤ H(X) + H(Y)	Joint entropy ≤ sum of marginals
Complexity	C(x,y) ≤ C(x) + C(y) + O(log n)	Pair complexity ≤ sum (up to log)
Cardinality	\|A\| ≤ \|proj₁(A)\| · \|proj₂(A)\|	Set of pairs ≤ product of projections

All three are essentially the same statement in different languages. The entropy version talks about random variables, the complexity version about strings, and the cardinality version about sets.

Transfer principle: Any linear inequality that is true for Shannon entropy is also true for Kolmogorov complexity (up to O(log n)). The converse also holds. This means that to prove (or disprove) a complexity inequality, we can work in whichever world is most convenient.

But some non-linear inequalities behave differently in the three worlds. The Ingleton inequality, for example, holds for linear random variables but fails in general. Understanding these differences is the frontier of the theory.

Check: The subadditivity H(X,Y) ≤ H(X) + H(Y) corresponds in set theory to...

A set of pairs has cardinality at most the product of its projections The union is at most the sum of cardinalities Every set is finite

Chapter 2: Uniform Sets

A finite set S is uniform (or combinatorially regular) if all its elements have approximately the same complexity, and the set itself has a short description.

Definition: A set S of binary strings is uniform if:
(1) S is decidable (membership can be tested algorithmically),
(2) All elements x ∈ S have C(x) ≈ log|S| (up to O(log n) terms),
(3) C(S) is small (the set itself has a short description).

Uniform sets are the Kolmogorov analogues of "typical sets" in Shannon theory. Just as typical messages have entropy ≈ H, elements of uniform sets have complexity ≈ log|S|.

Why do uniform sets matter? Because they provide the "witnesses" for complexity inequalities. To prove C(x) + C(y) ≥ C(x,y) + ..., we construct a uniform set that achieves the bound.

Check: Elements of a uniform set S all have Kolmogorov complexity approximately...

log|S| (matching the information content of a random element) |S| (proportional to the set's size) 0 (they are all simple)

Chapter 3: Constructing Uniform Sets

Given a string x of complexity k, we want to find a uniform set S containing x with |S| ≈ 2^k. The construction uses the set of all strings with approximately the same complexity:

S_k = { x : C(x) ≈ k }

By the counting argument, |S_k| ≈ 2^k. Every x with C(x) = k belongs to S_k. The set S_k is "almost" uniform — its elements have similar complexity by construction.

The challenge: S_k is not decidable (since C(x) is not computable). To get a decidable uniform set, we need more work: we approximate S_k using time-bounded or resource-bounded versions of complexity, or we use the "typization trick" to extract a decidable core.

For the purposes of proving inequalities, the non-decidability is often not a problem because we work with the complexity version of the inequality directly, and the uniform set serves as a conceptual tool.

Check: The set S_k = {x : C(x) ≈ k} is not directly usable as a uniform set because...

C(x) is not computable, so S_k is not decidable It is too large Its elements don't have the same length

Chapter 4: Uniform Sets and Orbits

A clean source of uniform sets: the orbit of an element under a group action. If a finite group G acts on a set X, the orbit of x is O(x) = {g · x : g ∈ G}.

Orbits are uniform: If the group G is "computably presented" (its elements and action are computable), then O(x) is a decidable set. If the action is transitive on O(x), all elements of O(x) have similar complexity. The orbit size |O(x)| = |G|/|Stab(x)|, where Stab(x) is the stabilizer of x.

Example: the symmetric group S_n acts on binary strings of length n by permuting bit positions. The orbit of x consists of all rearrangements of x's bits. If x has k ones, the orbit has C(n, k) elements. All elements of the orbit have the same complexity (up to O(log n)).

Check: The orbit of a binary string under bit permutations contains all strings with...

The same number of ones (rearrangements of the same multiset) The same length The same complexity

Chapter 5: Almost Uniform Sets

In practice, perfect uniformity is hard to achieve. Almost uniform sets relax the requirement: elements have complexity within O(log n) of log|S|, and a small fraction of "outliers" is allowed.

Almost uniform suffices: For proving information inequalities (which have O(log n) error terms anyway), almost uniform sets work just as well as perfectly uniform ones. The small errors get absorbed into the O(log n) terms.

The key technical lemma: given any finite set A and a string x ∈ A, we can find an almost uniform subset A' ⊆ A containing x with |A'| ≥ |A|/poly(n) and all elements having complexity within O(log n) of each other.

Check: Almost uniform sets are sufficient for proving information inequalities because...

The inequalities already have O(log n) error, which absorbs the non-uniformity All sets are almost uniform The proof doesn't use uniform sets

Chapter 6: The Typization Trick

The typization trick is a technique for converting complexity statements into combinatorial ones. Given strings x, y with specified complexities C(x), C(y), C(x,y), we construct sets A, B, R such that:

|A| ≈ 2^C(x), |B| ≈ 2^C(y)
R ⊆ A × B with |R| ≈ 2^C(x,y)
(x, y) ∈ R
A, B, R are decidable (given O(log n) bits of advice)

The trick: Group strings by complexity level. All strings of complexity ≈ k form a "type class." Within each type class, elements are interchangeable for the purpose of the inequality. By fixing the type (the complexity profile), we reduce the problem to a purely combinatorial one about set sizes.

This is the central technique for transferring between the three worlds. Any complexity inequality can be proved by: (1) typize to get sets with prescribed sizes, (2) prove the corresponding combinatorial inequality, (3) translate back.

Check: Typization converts a complexity inequality into...

A combinatorial inequality about set cardinalities A probability inequality A computational complexity bound

Chapter 7: Combinatorial Interpretations

Let's see some concrete examples of the correspondence:

Example 1 (Chain rule):
Entropy: H(X,Y) = H(X) + H(Y|X)
Complexity: C(x,y) = C(x) + C(y|x) + O(log n)
Combinatorial: A set of pairs with |A| ≤ pq can be split into P (projection size ≤ p) and Q (max section size ≤ q).

Example 2 (Submodularity):
Entropy: H(X,Y) + H(X,Z) ≥ H(X) + H(X,Y,Z)
Complexity: C(x,y) + C(x,z) ≥ C(x) + C(x,y,z) − O(log n)
Combinatorial: For a set A ⊆ X × Y × Z, |proj_XY(A)| · |proj_XZ(A)| ≥ |proj_X(A)| · |A|.

The submodularity inequality is the Shannon analogue of the Loomis-Whitney inequality in combinatorics. Each version implies the others (up to the appropriate error terms).

Check: The chain rule for complexity corresponds combinatorially to splitting a set of pairs into...

A part with small projection and a part with small sections Two equal halves Rows and columns of a matrix

Chapter 8: Information Region Visualizer

For two random variables (or strings), the Shannon entropy region is determined by H(X), H(Y), H(X,Y). These three values must satisfy certain inequalities. The simulation below shows the feasible region.

Two-Variable Information Region

The Venn diagram shows the decomposition of H(X,Y) into H(X|Y), I(X;Y), and H(Y|X). Drag the sliders to change the entropies and see how the mutual information changes.

H(X)60

H(Y)40

I(X;Y)20

What to notice: Mutual information I(X;Y) is the overlap. H(X|Y) = H(X) − I(X;Y) is the "exclusive" information in X. H(X,Y) = H(X) + H(Y) − I(X;Y) is the total. When I = 0 (independent), there's no overlap. When I = min(H(X), H(Y)) (one determines the other), one circle is inside the other.

Check: If I(X;Y) = H(X), then...

X is completely determined by Y (H(X|Y) = 0) X and Y are independent X is constant

Chapter 9: Summary

Concept	Key idea
Three-way correspondence	Entropy, complexity, and cardinality inequalities are equivalent (up to log terms)
Uniform sets	Sets where all elements have the same complexity ≈ log\|S\|
Orbits	Group orbits provide natural uniform sets
Typization	Group by complexity level, then use combinatorics
Transfer	Shannon inequalities transfer to Kolmogorov and back

The big picture: This chapter reveals that information theory, algorithmic information theory, and combinatorics are three perspectives on the same underlying structure. The information inequalities — subadditivity, submodularity, the chain rule — are not specific to any one framework. They are universal laws of information, expressed in whichever language suits the problem at hand.

And with that, we reach the end of the textbook's main arc. From the simple idea of "shortest program" in the Introduction, through the formal theory of plain, prefix, and monotone complexity, through the characterization of randomness via Martin-Löf tests and martingales, to the deep connections between individual complexity and ensemble entropy — the theory of Kolmogorov complexity provides a unified foundation for understanding information, randomness, and computation.

Final check: The transfer principle between entropy and complexity says that any Shannon inequality...

Holds for Kolmogorov complexity too, with O(log n) additive error Fails for Kolmogorov complexity Holds exactly, with no error

Inequalities for Entropy,Complexity, and Size