Legg, Chapter 1

Nature and Measurement of Intelligence

What is intelligence, really? Before building superintelligent machines, we need a definition that works for humans, animals, and algorithms alike.

Prerequisites: None. This is the starting point.
9 chapters, 2 simulations, 9 quizzes

Chapter 0: The Question

What is intelligence? You use the word casually every day. Your friend aced a calculus test — "she's so intelligent." Your cat learned to hide when it hears the word "vet" — maybe that's intelligent too. But try to write down a precise definition and the concept dissolves into a tangle of competing ideas.

Is intelligence the ability to learn quickly? The total sum of knowledge? The ability to reason abstractly? To solve new problems? To communicate? To be creative? Intelligence involves a perplexing mixture of concepts, many of which are themselves difficult to define.

Why this matters: This thesis builds theoretical models of systems that are claimed to be extremely intelligent. Before we can evaluate those claims, we need to know what "intelligence" actually means. This chapter surveys the landscape of theories, definitions, and tests for humans, animals, and machines.

The debate is especially interesting for machines. A machine might have a physical form, sensors, actuators, and information processing abilities quite different from ours, and it might operate in environments totally unlike anything we experience. How do you compare the intelligence of a chess engine with a dolphin? A chatbot with a crow?

Our goal is ambitious: find a definition of intelligence that applies to any system — biological, mechanical, or mathematical. Not limited to any particular set of senses, environments, goals, or hardware. A definition based on principles so fundamental that it will still make sense when the technology of tomorrow looks nothing like today's.

The Problem: Intelligence is easy to recognise, hard to define
↓ survey existing approaches
Definitions: Human psychology, AI research, philosophy
↓ extract common threads
Working Definition: An agent's ability to achieve goals in a wide range of environments

By the end of this chapter, we will have a working definition that guides the rest of the thesis. Chapter 4 of the thesis formalises it into an actual equation.

Check: Why is defining intelligence harder for machines than for humans?

Chapter 1: Theories of Intelligence

A central question: should intelligence be viewed as one ability, or many? This debate has shaped psychology for over a century.

Multiple-factors theories break intelligence into components. Thurstone (1938) proposed seven "primary mental abilities": verbal comprehension, word fluency, number facility, spatial visualisation, associative memory, perceptual speed, and reasoning. Sternberg's "Triarchic Mind" splits it into analytical, creative, and practical intelligence. Guilford (1967) took this to an extreme: three fundamental dimensions (contents, operations, products) producing 120 categories, later expanded to 150.

Gardner's theory of multiple intelligences (1993) argues that the components are sufficiently separate that they are actually different intelligences: linguistic, logical-mathematical, musical, spatial, bodily-kinaesthetic, intra-personal, and inter-personal. The theory has captured the public imagination, but its lasting impact in professional circles remains debated.

At the other end is Spearman's g-factor: a single general mental ability that underlies and contributes to all cognitive abilities. The evidence? Performance levels in reasoning, association, linguistic tasks, spatial thinking, pattern identification, and more are all positively correlated. Spearman called this statistical correlation the g-factor ("g" for general intelligence).

Key insight: Because standard IQ tests measure a range of cognitive abilities, they estimate an individual's g-factor. Some consider the generality of intelligence to be primary, defining g as intelligence itself.
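The positive correlations behind the g-factor can be illustrated with a toy simulation. The sketch below assumes a deliberately simple model that is not taken from the source: each test score is a shared factor g times a hypothetical loading, plus an independent test-specific ability. The test names, the loading of 0.7, and the sample size are illustrative choices only.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_scores(n_people=2000, loading=0.7, seed=0):
    """Toy model (an assumption, not real data): score = loading * g + specific."""
    rng = random.Random(seed)
    tests = {"reasoning": [], "verbal": [], "spatial": []}
    for _ in range(n_people):
        g = rng.gauss(0, 1)  # general factor shared by all tests
        for name in tests:
            specific = rng.gauss(0, 1)  # ability specific to this one test
            tests[name].append(loading * g + specific)
    return tests


scores = simulate_scores()
names = list(scores)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        # Every pair of tests comes out positively correlated
        print(a, b, round(pearson(scores[a], scores[b]), 2))
```

With the loading set to 0 the shared factor disappears and the pairwise correlations fall to roughly zero, which is the sense in which g is a statement about correlation rather than about the absence of separate components.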

A useful refinement comes from Cattell (1987): distinguish between fluid intelligence (a flexible innate ability to deal with new problems) and crystallised intelligence (knowledge and abilities acquired over time). An adolescent may have similar fluid intelligence to an adult, but lower crystallised intelligence due to less life experience.

The g-factor is a statistical correlation, not a claim that intelligence has no components. A synthesis of both views: intelligence as a hierarchy, with g at the apex and increasing specialisation forming branches below. An individual with a high g-factor has strong cognitive abilities overall, but might also have especially well-developed musical sense. This hierarchical view is now quite popular.

Theory | View | Key Figure
Multiple Factors | Many separate abilities | Thurstone (1938)
Structure of Intellect | 120-150 categories | Guilford (1967)
Multiple Intelligences | 7+ distinct intelligences | Gardner (1993)
g-factor | One general ability | Spearman (1927)
Fluid/Crystallised | Two types of g | Cattell (1987)
Hierarchical | g at top, specialisations below | Carroll (1993)

Check: What does the g-factor represent?

Chapter 2: Definitions of Human Intelligence

There are almost as many definitions of intelligence as there are experts asked to define it. But despite the variety, certain themes keep recurring. Let's look at several influential definitions and extract what they share.

Binet and Simon (1905): "It seems to us that in intelligence there is a fundamental faculty — judgement, otherwise called good sense, practical sense, initiative, the faculty of adapting oneself to circumstances."

Adaptation to environments appears in nearly every definition. Dearborn says intelligence is "the capacity to learn or to profit by experience." Pinter calls it the "ability to adapt oneself adequately to relatively new situations in life." Wechsler (1958) defines it as "a global concept that involves an individual's ability to act purposefully, think rationally, and deal effectively with the environment."

The American Psychological Association's consensus definition (1996) combines many of these threads: intelligence involves the ability to "understand complex ideas, to adapt effectively to the environment, to learn from experience, to engage in various forms of reasoning, to overcome obstacles by taking thought."

Bringing these common features together, Legg arrives at a working definition in its most general form:

Working definition: Intelligence measures an agent's ability to achieve goals in a wide range of environments.

Let's unpack each piece:

Component | Why it matters
Agent | An entity that interacts with an external world: human, animal, robot, algorithm
Goals | Intelligence is about pursuing objectives, not just existing; there must be something to achieve
Ability to achieve | Performance-oriented: not about internal architecture, but measurable outcomes
Wide range of environments | The crucial qualifier: intelligence is not mastery of one task but flexibility across many

This definition deliberately does not specify which capacities the agent must have. It does not require reasoning, planning, language, or learning. It says nothing about specific hardware. Instead, it focuses on the effect: the ability to succeed across diverse environments. Intelligence could be the result of many different internal mechanisms, some of which humans may not possess.
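To make the "wide range of environments" qualifier concrete, here is a minimal sketch that treats agents as black boxes and scores them only on outcomes. Everything in it is an illustrative assumption: the environment names, the success rates, and especially the uniform average, which is a placeholder and not the thesis's formal measure.

```python
from statistics import mean
from typing import Callable, Sequence

# An agent is modelled as a black box mapping an environment name to a
# success rate in [0, 1]. All names and numbers here are hypothetical.
Agent = Callable[[str], float]

ENVIRONMENTS = ["chess", "maze", "foraging", "sequence prediction", "balancing"]


def chess_specialist(env: str) -> float:
    """Perfect at one task, helpless everywhere else."""
    return 1.0 if env == "chess" else 0.0


def modest_generalist(env: str) -> float:
    """Mediocre, but able to make progress in every environment."""
    return 0.4


def breadth_score(agent: Agent, environments: Sequence[str] = ENVIRONMENTS) -> float:
    """Average goal achievement across environments (uniform weights assumed)."""
    return mean(agent(env) for env in environments)


print("specialist:", breadth_score(chess_specialist))
print("generalist:", breadth_score(modest_generalist))
```

The scoring function never inspects how an agent works internally, which mirrors the definition's focus on effects rather than mechanisms. Under these made-up numbers the generalist scores higher than the specialist even though it is never the best at any single task.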

Some definitions emphasise efficiency. Kurzweil (2000) writes: "Intelligence is the ability to use optimally limited resources — including time — to achieve goals." Legg considers efficiency important in practice, but not part of the core definition. A rat with human-level learning ability would not be called more intelligent than a human just because its brain is smaller.

Check: In Legg's working definition, what is the crucial qualifier that distinguishes intelligence from mere competence?

Chapter 3: Machine Definitions

What about defining intelligence specifically for machines? The challenge is even harder. Machines exist in environments totally unlike ours, with different senses and action capabilities. Some definitions from AI researchers:

McCarthy (2004): "Intelligence is the computational part of the ability to achieve goals in the world. Varying kinds and degrees of intelligence occur in people, many animals and some machines."

Fogel (1995): "Any system that generates adaptive behaviour to meet goals in a range of environments can be said to be intelligent." Albus (1991): intelligence is "the ability of a system to act appropriately in an uncertain environment, where appropriate action is that which increases the probability of success."

These all align closely with our working definition. The key features keep appearing: agents, goals, environments, adaptation, and breadth of capability.

Some definitions add an efficiency requirement: Newell and Simon (1976) say intelligence involves appropriate behaviour "within some limits of speed and complexity." We choose not to include efficiency in our core definition, though it matters greatly in practice. If a computational breakthrough suddenly made vastly more powerful machines possible, it would be odd to say those machines are not intelligent just because they use more resources.

A critical distinction appears between two views of machine intelligence:

Perspective | Description
Native intelligence | Complexity inherent in the system's information content
Performance intelligence | Success in achieving goals in complicated environments

This mirrors Cattell's fluid vs. crystallised distinction. Our definition takes the performance perspective: we don't care whether an agent looks intelligent on the inside. If it performs well in a wide range of environments, that is all that matters. This is a deliberately black-box approach — a functionalist perspective focused on external behaviour.

Key insight: Intelligence is not the ability to deal with a fully known environment (that just requires following instructions). It is the ability to deal with novel situations that cannot be wholly anticipated. The emphasis on learning, adaptation, and experience implies the environment is not fully known.

Check: Why does Legg's definition take a "black-box" approach?

Chapter 4: Intelligence Testing

Having explored what intelligence is, we now turn to how it is measured. What makes a good intelligence test?

Repeatability. The test should consistently return about the same score for a given individual. Statistical variability is a problem in short tests; longer tests help but cost more.
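The link between test length and repeatability can be illustrated with a small simulation. The sketch below assumes a deliberately simple model that is not from the source: every item is answered correctly with the same fixed probability, independently of the other items. The function name and the parameter values are hypothetical.

```python
import random
from statistics import pstdev


def score_spread(n_items, true_ability=0.7, sittings=2000, seed=0):
    """Spread (standard deviation) of the fraction-correct score when the same
    simulated subject sits an n_items-long test many times."""
    rng = random.Random(seed)
    scores = []
    for _ in range(sittings):
        # Each item is answered correctly with probability true_ability
        correct = sum(rng.random() < true_ability for _ in range(n_items))
        scores.append(correct / n_items)
    return pstdev(scores)


for n in (10, 40, 160):
    print(n, "items -> spread", round(score_spread(n), 3))
```

Under this model the spread shrinks roughly in proportion to one over the square root of the number of items, so halving the variability requires about four times as many questions. That is one way to see the cost trade-off mentioned above.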

Low bias. Cultural bias is a fundamental challenge. Different cultures emphasise different cognitive abilities, making it difficult to compare scores. Language-based tests are particularly susceptible. The most culture-neutral tests, like Raven's Progressive Matrices, focus on abstract pattern recognition.

Validity. The test should actually measure what it claims to measure. One check: do test results predict other manifestations of intelligence, such as academic performance?

Predictive power. Standard IQ tests are among the most statistically stable and reliable psychological tests. They predict future academic performance and other cognitively demanding tasks well.

Static vs. dynamic tests: Standard tests are "static" — they measure knowledge and ability to solve one-off problems. They don't directly measure learning ability. A "dynamic test" would present problems, give feedback, and measure how quickly the individual adapts. Dynamic tests are theoretically powerful but practically difficult, requiring extensive tester-subject interaction.

An important tension: is intelligence the current state of knowledge, or the capacity to learn? A child who can learn quickly might have limited knowledge due to less education. If we define learning capacity as central, classifying that child as unintelligent based on current knowledge would be a mistake.

For machines, the situation is even more complex. Machine performance can vary by orders of magnitude, making relative IQ-style scoring problematic. For machine intelligence, an absolute measure is more meaningful than a relative one.

Check: Why are "dynamic tests" theoretically better for measuring intelligence?

Chapter 5: Human Intelligence Tests

The first modern intelligence test was developed by Alfred Binet in 1905. He believed intelligence was best studied by looking at relatively complex mental tasks, unlike earlier tests by Francis Galton that focused on reaction times and physical coordination.

Binet's test consisted of 30 short tasks: naming body parts, comparing lengths and weights, counting coins, remembering digits, defining words. Each category had problems of increasing difficulty. A child's score was normalised against peers of the same age.

Lewis Terman at Stanford adapted Binet's test for English speakers, creating the famous Stanford-Binet test, first published in 1916 and revised several times since. It became the basis for many other tests, including the Army Alpha and Army Beta tests used to classify recruits in World War I.

David Wechsler felt Binet's tests were too verbally focused. He created tests combining both verbal and nonverbal problems, with a profile showing performance across areas. The Wechsler Adult Intelligence Scale (WAIS-III) has verbal subtests covering knowledge, arithmetic, comprehension, vocabulary, and short-term memory, and nonverbal subtests covering picture completion, spatial perception, problem solving, symbol search, and object assembly.

Raven's Progressive Matrices: Perhaps the most culture-neutral intelligence test. Each problem shows a short sequence of basic shapes (a circle in a box, then a circle with a cross, then a circle with a triangle). The test subject selects the image that best continues the pattern. The key skill is recognising patterns and evaluating the complexity of explanations — essentially applying Occam's razor. This makes Raven's tests potentially useful for machine intelligence too.

The intelligence quotient (IQ) was introduced by Stern in 1912. It was originally computed as (mental age / biological age) × 100. Modern IQ scores are instead normalised to a Gaussian distribution with mean 100 and a standard deviation that depends on the test: 15 on the most widely used scales, though some scales use other values such as 16 or 24.
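The two ways of computing an IQ score can be written out as a short worked example. This is a minimal sketch: the function names, the sample ages, and the raw-score statistics are hypothetical, and the default scale standard deviation of 15 is just one common choice.

```python
def ratio_iq(mental_age, biological_age):
    """Original ratio IQ: (mental age / biological age) x 100."""
    return 100.0 * mental_age / biological_age


def deviation_iq(raw_score, population_mean, population_sd, scale_sd=15.0):
    """Modern deviation IQ: place the raw score on a scale with mean 100.

    The raw score is first converted to a z-score relative to the norming
    population, then rescaled to the chosen standard deviation.
    """
    z = (raw_score - population_mean) / population_sd
    return 100.0 + scale_sd * z


# An 8-year-old performing like a typical 10-year-old
print(ratio_iq(10, 8))          # 125.0

# A raw score 1.5 standard deviations above the population mean
print(deviation_iq(62, 50, 8))  # 122.5
```

The ratio form breaks down for adults, since mental age stops rising with biological age, which is one reason the normalised form is used today.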

Check: What makes Raven's Progressive Matrices potentially useful for machine intelligence testing?

Chapter 6: Animal Intelligence Tests

Testing animal intelligence pushes us beyond human-centric thinking. Animals have different perceptual and cognitive capacities, different senses, and we cannot simply explain what their task is. This mirrors the challenge of testing machine intelligence.

Difficult problems arise immediately. Rats learn some relationships much more easily through smell than through other senses. An IQ test for children might be validated by predicting academic performance — but for animals, what counts as success? If survival or number of offspring were the measure, bacteria would be the most intelligent life on earth!

For simpler animals, researchers focus on basic information processing: short and long-term memory, forming associations, generalising simple patterns, simple counting, basic communication. Only with relatively intelligent social animals (birds, apes) do more sophisticated properties become relevant: deception, imitation, self-recognition.

Key insight: We cannot tell an animal what its goal is. Instead, we use rewards (food) to guide behaviour. This is exactly the approach used in reinforcement learning, and it maps directly onto the agent-environment framework built in Chapter 2 of the thesis. The test subject is the agent; the experimental setup is the environment; the food reward is the reward signal.
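The agent, environment, and reward roles can be sketched as a simple interaction loop. The example below is only an illustration under stated assumptions: the two-lever box, its payout probabilities, and the epsilon-greedy learning rule are hypothetical choices made for brevity, not something described in the source.

```python
import random


class TwoLeverBox:
    """Hypothetical experimental setup: lever 1 delivers food more often than lever 0."""

    def __init__(self, payout=(0.2, 0.8), seed=0):
        self.payout = payout
        self.rng = random.Random(seed)

    def step(self, action):
        """Return reward 1 (food) or 0 for pulling the chosen lever."""
        return 1 if self.rng.random() < self.payout[action] else 0


class RewardLearner:
    """Epsilon-greedy agent: never told the goal, it only observes rewards."""

    def __init__(self, n_actions=2, epsilon=0.1, seed=1):
        self.rng = random.Random(seed)
        self.epsilon = epsilon
        self.counts = [0] * n_actions
        self.values = [0.0] * n_actions  # running mean reward per lever

    def act(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))  # explore at random
        return max(range(len(self.values)), key=self.values.__getitem__)  # exploit

    def learn(self, action, reward):
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


def run_trial(steps=2000):
    env, agent = TwoLeverBox(), RewardLearner()
    total = 0
    for _ in range(steps):
        action = agent.act()         # agent acts on the environment
        reward = env.step(action)    # environment responds with a reward
        agent.learn(action, reward)  # agent adapts using the reward alone
        total += reward
    return total / steps, agent.values


average_reward, estimates = run_trial()
print(round(average_reward, 2), [round(v, 2) for v in estimates])
```

Nothing in the loop tells the agent which lever is "correct". The preference for the better lever emerges from the reward signal alone, which is the property that makes this style of test usable with subjects that cannot be given instructions.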

Animal intelligence testing teaches us something crucial for machine intelligence: we need tests that work without assuming the test subject understands language, shares our sensory modalities, or has human-like motivations. The reward-based, environment-based approach is the most general.

Check: What fundamental problem in animal intelligence testing also applies to machine intelligence?

Chapter 7: Machine Intelligence Tests

How do we test the intelligence of a machine? This question is fundamental to AI, yet remarkably few researchers have addressed it seriously. Let's survey the major proposals.

The Turing Test (1950): If human judges cannot reliably distinguish between a computer and a human in teletyped conversation, the computer is intelligent. Simple and clever, but problematic. Block and Searle argue that a giant lookup table could theoretically pass the test without real intelligence. The test also measures humanness more than intelligence — you need to model human knowledge, quirks, and even typing errors.

Compression tests (Mahoney, 1999): Replace the Turing test with text compression. Predict missing words in text passages. If you can compress text to about 1 bit per character (humans achieve this; best algorithms get ~1.5 bits), you must have extensive world knowledge. The Hutter Prize awards cash for compressing a 100 MB Wikipedia extract, testing world knowledge directly.

Why compression = prediction = intelligence: If you can accurately predict what comes next in a sequence, you can compress it (by encoding only the surprises). If you can compress it, you understand its structure. Compression and prediction are mathematically equivalent. This deep connection between prediction, compression, and intelligence will recur throughout the thesis.
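The link between prediction and compression can be made concrete by counting ideal code lengths: a symbol that a predictor assigned probability p costs about -log2(p) bits to encode. The sketch below assumes a very simple adaptive character-frequency predictor with add-one smoothing. This is an illustrative stand-in, not the model used in any of the tests discussed here, and the function names are hypothetical.

```python
import math
from collections import Counter


def code_length_bits(text, alphabet):
    """Ideal code length of text under an adaptive character-frequency predictor.

    Each character costs -log2(p) bits, where p is the probability the model
    gave it before seeing it (add-one smoothing over the alphabet).
    """
    counts = Counter()
    total_bits = 0.0
    for seen, ch in enumerate(text):
        p = (counts[ch] + 1) / (seen + len(alphabet))
        total_bits += -math.log2(p)  # surprising symbols cost more bits
        counts[ch] += 1              # update the model after encoding
    return total_bits


def uniform_bits(text, alphabet):
    """Baseline: every symbol is treated as equally likely, so nothing is learned."""
    return len(text) * math.log2(len(alphabet))


skewed = "aaaaaaaaab" * 10  # 90% 'a', 10% 'b'
print("uniform :", uniform_bits(skewed, "ab"))
print("adaptive:", round(code_length_bits(skewed, "ab"), 1))
```

On this skewed text the adaptive predictor needs roughly half as many bits as the uniform baseline, because it quickly learns that 'a' is the likely next symbol. A better predictor would exploit the repeating pattern as well and shrink the code length further.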

The C-Test (Hernández-Orallo): Sequence prediction and abduction problems based on formal complexity measures. Each question has an unambiguous answer with significantly lower complexity than the alternatives. The test uses Levin's Kt complexity instead of Kolmogorov complexity so that it is computable. Scores correlate well with human IQ scores. Legg notes that, at the time of the thesis, this was the only formal definition of intelligence that had produced a usable test.

Smith's Test (2006): An agent faces problems generated by an algorithm, tries to solve them, receives a score, and can resubmit. Intelligence = cumulative score over time. Criticised for restricting problems to complexity class P and for limited agent-environment interaction.

Test | Approach | Key Limitation
Turing Test | Imitation game | Tests humanness, not intelligence
Compression | Text prediction | Unclear if it generalises to action
C-Test | Formal complexity | No environmental interaction
Psychometric AI | Standard IQ tests | Anthropocentric, gameable
Smith's Test | Algorithm-generated problems | Limited to complexity class P

Each proposal has strengths, but none fully captures our informal definition. What we need is a test based on environmental interaction (not just static problems), grounded in formal complexity theory (not human cultural norms), and applicable to any system. Chapter 4 of the thesis provides exactly that.

Check: Why is the Turing test a test of humanness rather than intelligence?

Chapter 8: Summary

We set out to answer a deceptively simple question: what is intelligence? Here's what we found.

Theories: Intelligence has components, but a general factor (g) underlies them all
Definitions: Common threads are adaptation, learning, and goal achievement across diverse environments
Working definition: Intelligence = an agent's ability to achieve goals in a wide range of environments
Testing: No existing test is fully satisfactory for machines; we need something based on formal principles

The key insight is that intelligence is not about any particular skill, but about generality. A chess engine with an Elo rating of 3000 is a marvel of engineering, but if it cannot do anything else it scores very low on intelligence in our sense. A much simpler agent that can learn to navigate diverse environments, even modestly, is more intelligent by this measure.

What's next: Chapter 2 of the thesis formalises the agent-environment interaction model and builds up to AIXI, a theoretical agent that is optimal in a very strong sense. Chapter 4 of the thesis turns our informal definition into a precise mathematical equation. The connection between compression, prediction, and intelligence that we glimpsed with the compression tests and the C-Test will become the cornerstone of the entire theory.

Check: According to Legg's definition, which agent is more intelligent: a chess engine that plays at superhuman level but can do nothing else, or a simple agent that can learn to navigate 100 different environments moderately well?