Fundamental barriers to building powerful AI: prediction complexity, hard sequences, and Godel's shadow.
AIXI is the theoretical ideal, but it requires infinite computation. Can we build practical agents that approximate its power? Or are there fundamental barriers?
This chapter delivers sobering news. We study computable sequence predictors — the simplest possible version of the problem (no actions, just prediction). Even here, fundamental limitations emerge. Since sequence prediction lies at the heart of reinforcement learning, these limits apply to all computable AI agents.
We consider computable binary sequences: infinite sequences that can be generated by some program on a universal Turing machine. The Kolmogorov complexity K(ω) is the length of the shortest such program.
A computable predictor p takes the sequence so far and outputs its prediction for the next bit. We say p learns to predict ω if there exists a point after which p always predicts correctly: ∀n ≥ m : p(x1:n) = xn+1.
An important distinction: we give the predictor unlimited computation time and memory. The only constraint is that it must be a computable function. Even with these generous assumptions, fundamental limitations arise.
Good news first: every computable sequence can be predicted by at least one predictor, and that predictor need not be significantly more complex than the sequence.
The proof is constructive: take the program q that generates ω, build a predictor that runs q to generate x1:n+1 and returns xn+1. Simple.
Remarkably, even very simple predictors can learn arbitrarily complex sequences. A predictor that searches for repeating patterns (an "eventually periodic" predictor) can learn sequences of any Kolmogorov complexity, as long as the sequence eventually becomes periodic.
But the reverse is also true: for every predictor, there exists a computable sequence it cannot predict.
This means no single predictor can handle all computable sequences. Universal prediction is impossible for computable predictors — unlike for the incomputable Solomonoff predictor, which handles all of them.
Since predicting all computable sequences is impossible, what about predicting all simple ones — those with Kolmogorov complexity at most n?
Define Cn = {ω : K(ω) ≤ n} and Pn = the set of predictors that can learn to predict every sequence in Cn.
This is a deeply negative result. It rules out the upper-left corner of the "complexity vs. power" diagram: you cannot have a simple algorithm that is also powerful. The most general prediction algorithms must themselves be very complex.
Legg introduces a new complexity measure: the prediction complexity K̇(ω) of a sequence ω is the length of the shortest predictor that can learn to predict it:
This is different from Kolmogorov complexity K(ω), which measures the length of the shortest generator. A sequence might be hard to generate (high K) but easy to predict (low K̇), or vice versa.
From the results above: 0 ≤ K̇(ω) ≤ K(ω) + c. The prediction complexity never exceeds the Kolmogorov complexity by more than a constant. But it can be much lower: a repeating string has bounded K̇ but potentially unbounded K.
Do sequences with high prediction complexity actually exist? Yes. For every n, there exist computable sequences where K̇(ω) is close to K(ω), and K(ω) > n.
Furthermore, sequences with high K̇ are extremely expensive to compute. If a sequence can be generated quickly (in time polynomial in n), then K̇ is essentially 0 — a simple predictor that simulates all fast programs will learn it.
Only sequences that require exponential (or worse) computation time to generate can have high prediction complexity. These sequences exist, but they push the limits of what can be practically computed, even with unlimited time.
The most devastating result: even if powerful prediction algorithms exist, we may not be able to find them or prove they work.
In other words: beyond a certain complexity, you cannot mathematically prove that a predictor is powerful. This is a direct analogue of Godel's incompleteness theorem, applied to prediction algorithms.
The practical implications are profound. Even if we write a very powerful AI algorithm, we may not be able to prove it works. Mathematical analysis, our most powerful tool for understanding algorithms, has a hard ceiling. Beyond that ceiling, algorithm development must become mainly experimental.
These results carry over directly to reinforcement learning. At the heart of RL lies prediction: an agent must predict future rewards to choose good actions. The limits on predictors are therefore limits on RL agents.
Specifically: very powerful RL algorithms must be complex (no simple general learner), some environments require complex agents to learn (no universally easy problems), and beyond moderate power, we cannot prove convergence (the theory breaks down).
What can we do? Several paths forward:
These results constrain our ambitions but do not destroy them. Practical AI can and does make progress — just not through simple, provably optimal algorithms. The path forward is through clever, statistically principled heuristics. Chapter 6 provides one example.