Kochenderfer & Wheeler, Chapter 9

Population Methods

Maintain a crowd of candidate solutions. Let competition, combination, and movement drive the whole population toward the optimum.

Prerequisites: Chapter 8 (Stochastic Methods).
10
Chapters
2
Simulations
10
Quizzes

Chapter 0: Why Populations?

So far, every method has tracked a single design point (or a single distribution). A single point can only be in one valley at a time. If that valley is not the global minimum, you are stuck.

The core idea: Population methods maintain many candidate solutions simultaneously, spread across the design space. Different individuals can explore different valleys. Information sharing between individuals — through combination, competition, or attraction — guides the whole population toward the best regions. This parallelism makes population methods naturally resistant to local minima.

The population at each iteration is called a generation. Methods differ in how they produce the next generation from the current one: genetic algorithms use selection and crossover, particle swarm uses velocity and attraction, differential evolution uses vector differences.

Population methods avoid local minima by:

Chapter 1: Population Iteration

All population methods follow the same high-level structure:

pseudocode
initialize population = {x(1), ..., x(m)}
for k = 1, 2, ..., k_max
  population ← step(population)  # method-specific update
return best individual

The initial population should be spread broadly over the design space. Common initialization strategies:

DistributionCharacterWhen to Use
UniformEven coverage within boundsKnown box constraints
GaussianConcentrated near a centerPrior knowledge of good region
CauchyHeavy tails, occasional far samplesNeed exploration beyond Gaussian
Think of it this way: The initial population is like sending scouts across a landscape. You want them spread out enough to find all the valleys, but not so sparse that no scout is near the actual minimum.
Why should the initial population be spread broadly?

Chapter 2: Genetic Algorithms

Inspired by biological evolution: fitter individuals are more likely to reproduce. In optimization, "fitness" means a lower objective function value. Each individual's design vector is called a chromosome, and each design variable is a gene.

Selection
Choose parents based on fitness. Fitter individuals have higher probability of being selected.
Crossover
Combine two parents' chromosomes to create a child. Mix genes from both.
Mutation
Randomly perturb some genes. Prevents the population from becoming too uniform.

Some variants include elitism: the best individuals survive unchanged to the next generation, guaranteeing that the best solution never gets worse.

Key insight: Genetic algorithms are powerful because crossover lets good "building blocks" from different individuals combine. If one individual has found a good value for x1 and another has found a good value for x2, crossover can combine them into a child with good values for both.
In a genetic algorithm, crossover serves to:

Chapter 3: Selection & Crossover

Selection methods decide which individuals become parents:

MethodMechanismCharacter
TruncationKeep the top k individualsStrong selection pressure
TournamentBest of k random individualsTunable pressure via k
Roulette wheelProbability proportional to fitnessEvery individual has a chance

Crossover methods combine parent chromosomes:

MethodHow It Works
Single-pointSplit at one random position; child = first half of A + second half of B
Two-pointSplit at two positions; child = A-B-A segments
UniformEach gene independently from A or B with probability p
Interpolationx = (1−λ)xA + λxB for continuous variables
Think of it this way: Selection is the filter — who gets to reproduce. Crossover is the mixer — how their traits combine. Together, they implement the evolutionary pressure that drives the population toward better solutions over generations.
Tournament selection with k=3 means:

Chapter 4: Mutation

Without mutation, the population can only contain traits that were present in the initial generation. Crossover recombines existing genes but cannot create new ones. Mutation introduces novelty by randomly perturbing genes.

A common approach: each gene has a small probability λ of being mutated, typically λ = 1/m for m genes (so on average one mutation per child). Mutations are usually drawn from a Gaussian distribution.

Key insight: Mutation rate is the exploration-exploitation knob for genetic algorithms. Too little mutation and the population converges prematurely to a suboptimal solution — the "gene pool" becomes too uniform. Too much mutation and the algorithm degenerates into random search, unable to exploit good solutions.

If you suspect the current best is not the global optimum, increase the mutation rate. If the population is making steady progress, keep mutation low and let selection and crossover do their work.

Without mutation, a genetic algorithm cannot:

Chapter 5: Differential Evolution

Differential evolution (DE) is elegant in its simplicity. For each individual x in the population:

Pick three
Choose three random distinct individuals a, b, c from the population.
Construct
Create an interim design z = a + w·(b − c), where w is a weight.
Crossover + Select
Mix z with x via uniform crossover. Keep the better of x and the candidate.

The perturbation w·(b − c) naturally scales with the population's spread: early on, individuals are far apart so perturbations are large (exploration). As the population concentrates, perturbations shrink (exploitation).

Why DE works: The vector difference (b − c) captures information about the landscape's scale and direction. The base point a allows individuals to "teleport" near good solutions found by others. Typically w ∈ [0, 2] and crossover probability p ≈ 0.5–1.0.
In differential evolution, perturbations naturally decrease over time because:

Chapter 6: Particle Swarm Optimization

Particle swarm optimization (PSO) gives each individual a velocity. Particles are accelerated toward two attractors: their own personal best position and the global best position found by any particle.

x(i) ← x(i) + v(i)
v(i) ← w·v(i) + c1r1(xbest(i) − x(i)) + c2r2(xbest − x(i))

where w is inertia, c1 is the cognitive coefficient (personal best pull), c2 is the social coefficient (global best pull), and r1, r2 are random vectors in [0,1].

Key insight: PSO introduces momentum into population methods. A particle moving in a good direction will continue moving that way, like a ball rolling downhill. The inertia w controls how much the particle remembers its previous velocity. Letting w decay over time shifts from exploration to exploitation.
In PSO, each particle is attracted toward:

Chapter 7: Other Bio-Inspired Methods

Several other population methods draw inspiration from nature:

MethodInspirationMechanism
Firefly algorithmFirefly light attractionIndividuals move toward brighter (fitter) individuals; attraction decays with distance
Cuckoo searchCuckoo brood parasitismLévy flights from nests; best nests survive; worst are abandoned

Hybrid methods combine population methods with local search. Two flavors:

ApproachWhat HappensEffect
LamarckianReplace each individual with its locally-optimized versionFast convergence, risk of premature clustering
BaldwinianUse locally-optimized value for fitness only; keep original positionBetter exploration, slower convergence
A note of caution: There has been criticism of the proliferation of nature-inspired algorithms (bat algorithm, gray wolf optimizer, etc.) that make metaphorical analogies without contributing fundamentally new algorithmic ideas. Focus on understanding the core mechanisms — selection pressure, perturbation, and information sharing — rather than the biological metaphors.
Lamarckian hybrid methods risk premature convergence because:

Chapter 8: Particle Swarm Simulator

Watch a swarm of particles converge on the minimum of a 2D function. Each particle is pulled toward its personal best (small dot) and the global best (white dot).

Particle Swarm Optimization

Click Step to advance the swarm. Particles (orange) are attracted to the global best (white).

Gen 0
The inertia parameter w in PSO controls:

Chapter 9: Summary

MethodUpdate MechanismKey Advantage
Genetic algorithmSelection + crossover + mutationCombines good building blocks
Differential evolutionVector differences + crossoverSelf-scaling perturbations
Particle swarmVelocity + attraction to bestsMomentum-based exploration
FireflyDistance-based attractionNatural clustering around optima
Cuckoo searchLévy flights + nest replacementHeavy-tailed exploration
Looking ahead: Chapter 10 introduces constraints — when designs must satisfy certain conditions beyond just minimizing f. We will learn Lagrange multipliers, penalty methods, and interior point methods to handle equality and inequality constraints.
Which population method automatically scales its perturbations as the population concentrates?