What is intelligence, really? Before building superintelligent machines, we need a definition that works for humans, animals, and algorithms alike.
What is intelligence? You use the word casually every day. Your friend aced a calculus test: "she's so intelligent." Your cat has learned to hide whenever it hears the word "vet"; maybe that's intelligent too. But try to write down a precise definition and the concept dissolves into a tangle of competing ideas.
Is intelligence the ability to learn quickly? The total sum of knowledge? The ability to reason abstractly? To solve new problems? To communicate? To be creative? Intelligence involves a perplexing mixture of concepts, many of which are themselves difficult to define.
The debate is especially interesting for machines. A machine might have a physical form, sensors, actuators, and information-processing abilities quite unlike those of any animal, and it may operate in environments totally unlike anything we experience. How do you compare the intelligence of a chess engine with that of a dolphin? A chatbot with a crow?
Our goal is ambitious: find a definition of intelligence that applies to any system — biological, mechanical, or mathematical. Not limited to any particular set of senses, environments, goals, or hardware. A definition based on principles so fundamental that it will still make sense when the technology of tomorrow looks nothing like today's.
By the end of this chapter, we will have a working definition that guides the rest of the thesis. In Chapter 4, we will formalise it into an actual equation.
A central question: should intelligence be viewed as one ability, or many? This debate has shaped psychology for over a century.
Multiple-factors theories break intelligence into components. Thurstone (1938) proposed seven "primary mental abilities": verbal comprehension, word fluency, number facility, spatial visualisation, associative memory, perceptual speed, and reasoning. Sternberg's "Triarchic Mind" splits it into analytical, creative, and practical intelligence. Guilford (1967) took this to an extreme: three fundamental dimensions (contents, operations, products) producing 120 categories, later expanded to 150.
Gardner's theory of multiple intelligences (1993) argues that the components are sufficiently separate to count as different intelligences: linguistic, logical-mathematical, musical, spatial, bodily-kinaesthetic, intra-personal, and inter-personal. The theory has captured the public imagination, but its lasting impact in professional circles remains debated.
At the other end is Spearman's g-factor: a single general mental ability that underlies and contributes to all cognitive abilities. The evidence? Performance levels in reasoning, association, linguistic tasks, spatial thinking, pattern identification, and more are all positively correlated. Spearman called this statistical correlation the g-factor ("g" for general intelligence).
A useful refinement comes from Cattell (1987): distinguish between fluid intelligence (a flexible innate ability to deal with new problems) and crystallised intelligence (knowledge and abilities acquired over time). An adolescent may have similar fluid intelligence to an adult, but lower crystallised intelligence due to less life experience.
The g-factor is a statistical correlation, not a claim that intelligence has no components. A synthesis of both views treats intelligence as a hierarchy, with g at the apex and increasingly specialised abilities branching out below it. An individual with a high g-factor has strong cognitive abilities overall, but might also have an especially well-developed musical sense. This hierarchical view, associated with Carroll (1993), is now widely held.
| Theory | View | Key Figure |
|---|---|---|
| Multiple Factors | Many separate abilities | Thurstone (1938) |
| Structure of Intellect | 120-150 categories | Guilford (1967) |
| Multiple Intelligences | 7+ distinct intelligences | Gardner (1993) |
| g-factor | One general ability | Spearman (1927) |
| Fluid/Crystallised | Two types of g | Cattell (1987) |
| Hierarchical | g at top, specialisations below | Carroll (1993) |
There are almost as many definitions of intelligence as there are experts asked to define it. But despite the variety, certain themes keep recurring. Let's look at ten influential definitions and extract what they share.
Adaptation to environments appears in nearly every definition. Dearborn says intelligence is "the capacity to learn or to profit by experience." Pinter calls it the "ability to adapt oneself adequately to relatively new situations in life." Wechsler (1958) defines it as "a global concept that involves an individual's ability to act purposefully, think rationally, and deal effectively with the environment."
The American Psychological Association's consensus definition (1996) combines many of these threads: intelligence involves the ability to "understand complex ideas, to adapt effectively to the environment, to learn from experience, to engage in various forms of reasoning, to overcome obstacles by taking thought."
Bringing these common features together, Legg arrives at a working definition in its most general form:
"Intelligence measures an agent's ability to achieve goals in a wide range of environments."
Let's unpack each piece:
| Component | Why it matters |
|---|---|
| Agent | An entity that interacts with an external world — human, animal, robot, algorithm |
| Goals | Intelligence is about pursuing objectives, not just existing — there must be something to achieve |
| Ability to achieve | Performance-oriented — not about internal architecture, but measurable outcomes |
| Wide range of environments | The crucial qualifier — intelligence is not mastery of one task but flexibility across many |
This definition deliberately does not specify which capacities the agent must have. It does not require reasoning, planning, language, or learning. It says nothing about specific hardware. Instead, it focuses on the effect: the ability to succeed across diverse environments. Intelligence could be the result of many different internal mechanisms, some of which humans may not possess.
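A toy illustration of "ability to achieve goals in a wide range of environments" follows. It is a sketch under loud assumptions: the environments (reward 1 only for a hidden target action), both agents, and the uniform average over environments are all hypothetical simplifications of ours, not the formal measure developed later. It only shows the black-box idea: we score agents purely by the reward they collect, never by how they work inside.

```python
from typing import Callable, List

# Hypothetical environment type: maps an action to a reward in [0, 1].
# Real environments would also emit observations and change state.
Environment = Callable[[int], float]


def make_target_env(target: int) -> Environment:
    """Hypothetical environment: reward 1 only when the agent picks `target`."""
    return lambda action: 1.0 if action == target else 0.0


class FixedAgent:
    """A specialist: always plays the same action, whatever happens."""

    def __init__(self, action: int) -> None:
        self.action = action

    def reset(self) -> None:
        pass

    def act(self) -> int:
        return self.action

    def observe(self, action: int, reward: float) -> None:
        pass


class ExploreThenExploitAgent:
    """A generalist: tries every action once, then repeats the best one seen."""

    def __init__(self, n_actions: int) -> None:
        self.n_actions = n_actions
        self.reset()

    def reset(self) -> None:
        self.step = 0
        self.totals = [0.0] * self.n_actions

    def act(self) -> int:
        if self.step < self.n_actions:
            return self.step
        return max(range(self.n_actions), key=lambda a: self.totals[a])

    def observe(self, action: int, reward: float) -> None:
        self.totals[action] += reward
        self.step += 1


def average_reward(agent, env: Environment, steps: int) -> float:
    """Run one episode and return the mean reward per step."""
    agent.reset()
    total = 0.0
    for _ in range(steps):
        action = agent.act()
        reward = env(action)
        agent.observe(action, reward)
        total += reward
    return total / steps


def breadth_score(agent, envs: List[Environment], steps: int = 50) -> float:
    """Uniform average of performance over many environments (our simplification)."""
    return sum(average_reward(agent, env, steps) for env in envs) / len(envs)


if __name__ == "__main__":
    envs = [make_target_env(t) for t in range(10)]
    print("specialist:", breadth_score(FixedAgent(3), envs))
    print("generalist:", breadth_score(ExploreThenExploitAgent(10), envs))
```

The specialist is perfect in one environment but averages 0.1 overall, while the learner averages about 0.82 because it adapts to each environment. Neither agent's internals enter the score.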
Some definitions emphasise efficiency. Kurzweil (2000): "Intelligence is the ability to use optimally limited resources – including time – to achieve goals." Legg considers efficiency important in practice, but not part of the core definition. Under an efficiency-based definition, a rat with human-level learning ability would count as more intelligent than a human simply because its brain is smaller, and that seems wrong.
What about defining intelligence specifically for machines? The challenge is even harder. Machines exist in environments totally unlike ours, with different senses and action capabilities. Some definitions from AI researchers:
Fogel (1995): "Any system that generates adaptive behaviour to meet goals in a range of environments can be said to be intelligent." Albus (1991): intelligence is "the ability of a system to act appropriately in an uncertain environment, where appropriate action is that which increases the probability of success."
Both align closely with our working definition. The key features keep appearing: agents, goals, environments, adaptation, and breadth of capability.
Some definitions add an efficiency requirement: Newell and Simon (1976) say intelligence involves appropriate behaviour "within some limits of speed and complexity." We choose not to include efficiency in our core definition, though it matters greatly in practice. If a computational breakthrough suddenly made vastly more powerful machines possible, it would be odd to say those machines are not intelligent just because they use more resources.
A further distinction separates two views of machine intelligence:
| Perspective | Description |
|---|---|
| Native intelligence | Complexity inherent in the system's information content |
| Performance intelligence | Success in achieving goals in complicated environments |
This mirrors Cattell's fluid vs. crystallised distinction. Our definition takes the performance perspective: we don't care whether an agent looks intelligent on the inside. If it performs well in a wide range of environments, that is all that matters. This is a deliberately black-box approach — a functionalist perspective focused on external behaviour.
Having explored what intelligence is, we now turn to how it is measured. What makes a good intelligence test?
Repeatability. The test should consistently return about the same score for a given individual. Statistical variability is a problem in short tests; longer tests help but cost more.
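Why short tests are less repeatable can be sketched with a simple model. Assuming each item is answered correctly, independently, with the same probability p (a binomial model and our own simplification), the spread of the proportion-correct score across repeated sittings is sqrt(p(1 - p)/n), so quadrupling the number of items only halves the noise. The sketch checks the formula against a simulation.

```python
import math
import random
import statistics


def standard_error(p: float, n_items: int) -> float:
    """Binomial standard error of the proportion-correct score."""
    return math.sqrt(p * (1.0 - p) / n_items)


def simulated_retest_spread(p: float, n_items: int, n_sittings: int = 2000, seed: int = 0) -> float:
    """Standard deviation of proportion-correct across independent repeat sittings."""
    rng = random.Random(seed)
    scores = [
        sum(rng.random() < p for _ in range(n_items)) / n_items
        for _ in range(n_sittings)
    ]
    return statistics.pstdev(scores)


if __name__ == "__main__":
    # Same individual (true success rate 0.7), tests of increasing length.
    for n in (10, 40, 160):
        print(n, round(standard_error(0.7, n), 3), round(simulated_retest_spread(0.7, n), 3))
```

This is the cost trade-off in the text: every halving of measurement noise requires a test four times as long.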
Low bias. Cultural bias is a fundamental challenge. Different cultures emphasise different cognitive abilities, making it difficult to compare scores. Language-based tests are particularly susceptible. The most culture-neutral tests, like Raven's Progressive Matrices, focus on abstract pattern recognition.
Validity. The test should actually measure what it claims to measure. One check: do test results predict other manifestations of intelligence, such as academic performance?
Predictive power. Standard IQ tests are among the most statistically stable and reliable psychological tests. They predict future academic performance and other cognitively demanding tasks well.
An important tension: is intelligence the current state of knowledge, or the capacity to learn? A child who can learn quickly might have limited knowledge due to less education. If we define learning capacity as central, classifying that child as unintelligent based on current knowledge would be a mistake.
For machines, the situation is even more complex. Machine performance can vary by orders of magnitude, making relative IQ-style scoring problematic. For machine intelligence, an absolute measure is more meaningful than a relative one.
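A small sketch of why relative scoring breaks down when performance spans orders of magnitude. The raw performance numbers for the two machine populations are invented for illustration, and `relative_score` is a hypothetical helper that standardises against a reference population and rescales to mean 100 and standard deviation 15, IQ-style.

```python
import statistics


def relative_score(raw: float, population: list, mean: float = 100.0, sd: float = 15.0) -> float:
    """IQ-style score: position of `raw` relative to a reference population."""
    mu = statistics.fmean(population)
    sigma = statistics.pstdev(population)
    return mean + sd * (raw - mu) / sigma


if __name__ == "__main__":
    # Hypothetical raw performance numbers for two populations of machines.
    narrow = [1, 2, 3, 4, 5]
    wide = [1, 10, 100, 1_000, 10_000]
    agent = 5  # the same agent, with the same absolute performance
    print("relative to narrow population:", round(relative_score(agent, narrow), 1))
    print("relative to wide population:  ", round(relative_score(agent, wide), 1))
    # The best machine in `wide` is 10,000x better than the worst,
    # yet its relative score is only about two standard deviations above the mean.
    print("best of wide population:", round(relative_score(10_000, wide), 1))
```

The same agent lands well above average in one population and below average in the other, and a ten-thousand-fold performance gap collapses to a couple of standard deviations. An absolute measure would simply report the raw performance.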
The first modern intelligence test was developed by Alfred Binet in 1905. He believed intelligence was best studied by looking at relatively complex mental tasks, unlike earlier tests by Francis Galton that focused on reaction times and physical coordination.
Binet's test consisted of 30 short tasks: naming body parts, comparing lengths and weights, counting coins, remembering digits, defining words. Each category had problems of increasing difficulty. A child's score was normalised against peers of the same age.
Lewis Terman at Stanford adapted Binet's test for English speakers, creating the famous Stanford-Binet test (1916). This became the basis for many other tests, including the Army Alpha and Army Beta tests used to classify recruits in World War I.
David Wechsler felt Binet's tests were too verbally focused. He created tests combining both verbal and nonverbal problems, with a profile showing performance across areas. The Wechsler Adult Intelligence Scale (WAIS-III) tests knowledge, arithmetic, comprehension, vocabulary, short-term memory (verbal), plus picture completion, spatial perception, problem solving, symbol search, and object assembly (nonverbal).
The intelligence quotient (IQ) was introduced by Stern in 1912. It was originally computed as (mental age / chronological age) × 100. Modern IQ scores are normalised to a Gaussian distribution with mean 100 and standard deviation 15 (US) or 25 (Europe).
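The two scoring schemes can be compared directly in a short sketch. The norm-group mean and standard deviation passed to `deviation_iq` are hypothetical inputs; in a real test they come from a large standardisation sample. `statistics.NormalDist` converts an IQ into the expected fraction of the population scoring below it.

```python
from statistics import NormalDist


def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Original ratio IQ: (mental age / chronological age) x 100."""
    return 100.0 * mental_age / chronological_age


def deviation_iq(raw: float, norm_mean: float, norm_sd: float, scale_sd: float = 15.0) -> float:
    """Modern deviation IQ: standardise against a norm group, rescale to mean 100."""
    z = (raw - norm_mean) / norm_sd
    return 100.0 + scale_sd * z


def iq_percentile(iq: float, scale_sd: float = 15.0) -> float:
    """Fraction of the norm population expected to score below `iq`."""
    return NormalDist(mu=100.0, sigma=scale_sd).cdf(iq)


if __name__ == "__main__":
    print(ratio_iq(10, 8))                          # 125.0: an 8-year-old performing like a typical 10-year-old
    print(deviation_iq(60, 50, 10))                 # 115.0: one norm-group SD above the mean, SD-15 scale
    print(deviation_iq(60, 50, 10, scale_sd=25.0))  # 125.0: same performance, SD-25 scale
    print(round(iq_percentile(115), 3))             # 0.841
```

Note that the same performance yields 115 on one scale and 125 on the other, so scores reported under different standard-deviation conventions are not directly comparable.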
Testing animal intelligence pushes us beyond human-centric thinking. Animals have different perceptual and cognitive capacities and different senses, and we cannot simply explain the task to them. This mirrors the challenge of testing machine intelligence.
Difficult problems arise immediately. Rats learn some relationships much more easily through smell than through other senses. An IQ test for children might be validated by predicting academic performance — but for animals, what counts as success? If survival or number of offspring were the measure, bacteria would be the most intelligent life on earth!
For simpler animals, researchers focus on basic information processing: short- and long-term memory, forming associations, generalising simple patterns, simple counting, and basic communication. Only with relatively intelligent social animals (birds, apes) do more sophisticated properties become relevant: deception, imitation, self-recognition.
Animal intelligence testing teaches us something crucial for machine intelligence: we need tests that work without assuming the test subject understands language, shares our sensory modalities, or has human-like motivations. An approach in which the subject interacts with an environment and receives rewards for success is the most general.
How do we test the intelligence of a machine? This question is fundamental to AI, yet remarkably few researchers have addressed it seriously. Let's survey the major proposals.
The Turing Test (1950): If human judges cannot reliably distinguish between a computer and a human in teletyped conversation, the computer is deemed intelligent. Simple and clever, but problematic. Block argues that a giant lookup table could in principle pass the test without any real intelligence, and Searle's Chinese Room argument raises a related objection. The test also measures humanness more than intelligence: a machine must model human knowledge, quirks, and even typing errors.
Compression tests (Mahoney, 1999): Replace the Turing test with text compression, a task closely tied to predicting the next characters or missing words of a passage. If you can compress text to about 1 bit per character (humans achieve this; the best algorithms get about 1.5 bits), you must have extensive world knowledge. The Hutter Prize awards cash for compressing a 100 MB Wikipedia extract, using compression as a proxy for world knowledge.
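The link between prediction and compression can be made concrete. A predictor that assigns probability p to the next character can, with arithmetic coding, encode it in about -log2(p) bits, so better prediction means fewer bits per character. The sketch below is illustrative only: it measures bits per character with Python's standard `lzma` compressor as a stand-in for a strong compressor (not what the Hutter Prize uses), and alongside it computes the ideal code length of a very simple adaptive order-0 predictor with Laplace smoothing, an assumption of ours chosen for brevity.

```python
import lzma
import math
import random
import string


def lzma_bits_per_character(text: str) -> float:
    """Compressed size in bits divided by character count (includes container overhead)."""
    compressed = lzma.compress(text.encode("utf-8"), preset=9)
    return 8 * len(compressed) / len(text)


def adaptive_model_bits_per_character(text: str, alphabet_size: int = 256) -> float:
    """Ideal code length per character of an adaptive order-0 predictor.

    Each character costs -log2(p) bits, where p is the Laplace-smoothed
    frequency estimate built from the characters seen so far.
    """
    counts = {}
    seen = 0
    bits = 0.0
    for ch in text:
        p = (counts.get(ch, 0) + 1) / (seen + alphabet_size)
        bits -= math.log2(p)
        counts[ch] = counts.get(ch, 0) + 1
        seen += 1
    return bits / len(text)


if __name__ == "__main__":
    rng = random.Random(0)
    predictable = "the cat sat on the mat. " * 200
    noisy = "".join(rng.choice(string.ascii_lowercase) for _ in range(len(predictable)))
    for name, text in (("predictable", predictable), ("noisy", noisy)):
        print(name,
              round(lzma_bits_per_character(text), 2),
              round(adaptive_model_bits_per_character(text), 2))
```

The repetitive string compresses to well under 1 bit per character, while random letters cannot drop much below log2 26 ≈ 4.7 bits per character however clever the compressor. Reaching human-like rates on real prose requires a predictor that knows a great deal about language and the world.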
The C-Test (Hernandez-Orallo): Sequence prediction and abduction problems based on formal complexity measures. Each question has an unambiguous answer with significantly lower complexity than alternatives. Uses Levin's Kt complexity instead of Kolmogorov complexity to make it computable. Correlates well with human IQ scores. This is currently the only formal definition of intelligence that has produced a usable test.
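For reference, the two complexity measures can be written as follows, where U is a fixed universal machine, ℓ(p) is the length of program p in bits, and t(p) is the number of steps U takes to output x when run on p.

```latex
\begin{aligned}
K(x)  &= \min_{p} \{\, \ell(p) : U(p) = x \,\} \\
Kt(x) &= \min_{p} \{\, \ell(p) + \log_2 t(p) : U(p) = x \,\}
\end{aligned}
```

Because of the halting problem, K cannot be computed. The logarithmic time penalty in Kt changes this: roughly speaking, one can run all programs in phases, giving each program a time budget that shrinks exponentially with its length (Levin's universal search), and the first phase in which some program outputs x determines Kt(x). Expensive, but computable, which is what makes a Kt-based test practical.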
Smith's Test (2006): An agent faces problems generated by an algorithm, tries to solve them, receives a score, and can resubmit. Intelligence = cumulative score over time. Criticised for restricting problems to complexity class P and for limited agent-environment interaction.
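The shape of this protocol (generate a problem, submit, receive a score, resubmit, accumulate) can be sketched as a toy. Everything concrete here is hypothetical and ours, not Smith's actual design: the problems are "find a hidden integer", the grader's score reveals how far off a submission was, and a solved problem contributes 1/attempts to the cumulative total. It illustrates why resubmission with feedback rewards agents that learn from scores.

```python
import random


def make_problem(rng: random.Random, size: int = 100):
    """Hypothetical generated problem: find a hidden integer in [0, size).

    The grader returns 0 for a correct submission and -|error| otherwise,
    so the score carries information an agent can use when resubmitting.
    """
    target = rng.randrange(size)

    def grade(submission: int) -> int:
        return -abs(submission - target)

    return grade


def feedback_agent(grade, max_attempts: int, rng: random.Random):
    """Uses the score of a first probe to deduce the answer. Returns attempts used, or None."""
    score = grade(0)
    if score == 0:
        return 1
    if max_attempts < 2:
        return None
    return 2 if grade(-score) == 0 else None


def random_agent(grade, max_attempts: int, rng: random.Random, size: int = 100):
    """Ignores feedback and guesses uniformly at random on every attempt."""
    for attempt in range(1, max_attempts + 1):
        if grade(rng.randrange(size)) == 0:
            return attempt
    return None


def cumulative_score(agent, n_problems: int = 200, max_attempts: int = 10, seed: int = 0) -> float:
    """Sum over problems of 1/attempts for each solved problem (hypothetical scoring rule)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_problems):
        grade = make_problem(rng)
        attempts = agent(grade, max_attempts, rng)
        if attempts is not None:
            total += 1.0 / attempts
    return total


if __name__ == "__main__":
    print("uses feedback:", cumulative_score(feedback_agent))
    print("ignores feedback:", cumulative_score(random_agent))
```

Because the problem generator here is trivially easy, this toy also hints at the criticism above: what the cumulative score measures depends heavily on which problems the generator can produce.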
| Test | Approach | Key Limitation |
|---|---|---|
| Turing Test | Imitation game | Tests humanness, not intelligence |
| Compression | Text prediction | Unclear if it generalises to action |
| C-Test | Formal complexity | No environmental interaction |
| Psychometric AI | Standard IQ tests | Anthropocentric, gameable |
| Smith's Test | Algorithm-generated problems | Limited to complexity class P |
Each proposal has strengths, but none fully captures our informal definition. What we need is a test based on environmental interaction (not just static problems), grounded in formal complexity theory (not human cultural norms), and applicable to any system. Chapter 4 will provide exactly that.
We set out to answer a deceptively simple question: what is intelligence? Here's what we found.
The key insight is that intelligence is not about any particular skill, but about generality. A chess engine with an Elo rating of 3000 is a marvel of engineering, but it has very little intelligence in our sense if it cannot do anything else. A much simpler agent that can learn to navigate diverse environments, even modestly, is more intelligent by this measure.