Kochenderfer & Wheeler, Chapter 9

Population Methods

Maintain a crowd of candidate solutions. Let competition, combination, and movement drive the whole population toward the optimum.

Prerequisites: Chapter 8 (Stochastic Methods).

Chapters

Simulations

Quizzes

Chapter 0: Why Populations?

So far, every method has tracked a single design point (or a single distribution). A single point can only be in one valley at a time. If that valley is not the global minimum, you are stuck.

The core idea: Population methods maintain many candidate solutions simultaneously, spread across the design space. Different individuals can explore different valleys. Information sharing between individuals — through combination, competition, or attraction — guides the whole population toward the best regions. This parallelism makes population methods naturally resistant to local minima.

The population at each iteration is called a generation. Methods differ in how they produce the next generation from the current one: genetic algorithms use selection and crossover, particle swarm uses velocity and attraction, differential evolution uses vector differences.

Population methods avoid local minima by:

Exploring multiple regions simultaneously with many individuals Computing second-order derivatives Using a single point with momentum

Chapter 1: Population Iteration

All population methods follow the same high-level structure:

pseudocode
initialize population = {x⁽¹⁾, ..., x^(m)}
for k = 1, 2, ..., k_max
  population ← step(population)  # method-specific update
return best individual

The initial population should be spread broadly over the design space. Common initialization strategies:

Distribution	Character	When to Use
Uniform	Even coverage within bounds	Known box constraints
Gaussian	Concentrated near a center	Prior knowledge of good region
Cauchy	Heavy tails, occasional far samples	Need exploration beyond Gaussian

Think of it this way: The initial population is like sending scouts across a landscape. You want them spread out enough to find all the valleys, but not so sparse that no scout is near the actual minimum.

Why should the initial population be spread broadly?

To increase the chance that some individuals start near the global minimum To make the algorithm slower but more deterministic To ensure all individuals converge to the same point

Chapter 2: Genetic Algorithms

Inspired by biological evolution: fitter individuals are more likely to reproduce. In optimization, "fitness" means a lower objective function value. Each individual's design vector is called a chromosome, and each design variable is a gene.

Selection

Choose parents based on fitness. Fitter individuals have higher probability of being selected.

↓

Crossover

Combine two parents' chromosomes to create a child. Mix genes from both.

↓

Mutation

Randomly perturb some genes. Prevents the population from becoming too uniform.

Some variants include elitism: the best individuals survive unchanged to the next generation, guaranteeing that the best solution never gets worse.

Key insight: Genetic algorithms are powerful because crossover lets good "building blocks" from different individuals combine. If one individual has found a good value for x₁ and another has found a good value for x₂, crossover can combine them into a child with good values for both.

In a genetic algorithm, crossover serves to:

Combine useful traits from two parent solutions into a child Randomly generate entirely new solutions Remove the worst individuals

Chapter 3: Selection & Crossover

Selection methods decide which individuals become parents:

Method	Mechanism	Character
Truncation	Keep the top k individuals	Strong selection pressure
Tournament	Best of k random individuals	Tunable pressure via k
Roulette wheel	Probability proportional to fitness	Every individual has a chance

Crossover methods combine parent chromosomes:

Method	How It Works
Single-point	Split at one random position; child = first half of A + second half of B
Two-point	Split at two positions; child = A-B-A segments
Uniform	Each gene independently from A or B with probability p
Interpolation	x = (1−λ)x_A + λx_B for continuous variables

Think of it this way: Selection is the filter — who gets to reproduce. Crossover is the mixer — how their traits combine. Together, they implement the evolutionary pressure that drives the population toward better solutions over generations.

Tournament selection with k=3 means:

Choose 3 random individuals and select the fittest as a parent Keep only the top 3 individuals in the population Run the algorithm for 3 generations

Chapter 4: Mutation

Without mutation, the population can only contain traits that were present in the initial generation. Crossover recombines existing genes but cannot create new ones. Mutation introduces novelty by randomly perturbing genes.

A common approach: each gene has a small probability λ of being mutated, typically λ = 1/m for m genes (so on average one mutation per child). Mutations are usually drawn from a Gaussian distribution.

Key insight: Mutation rate is the exploration-exploitation knob for genetic algorithms. Too little mutation and the population converges prematurely to a suboptimal solution — the "gene pool" becomes too uniform. Too much mutation and the algorithm degenerates into random search, unable to exploit good solutions.

If you suspect the current best is not the global optimum, increase the mutation rate. If the population is making steady progress, keep mutation low and let selection and crossover do their work.

Without mutation, a genetic algorithm cannot:

Introduce gene values not present in the initial population Select parents for reproduction Evaluate the objective function

Chapter 5: Differential Evolution

Differential evolution (DE) is elegant in its simplicity. For each individual x in the population:

Pick three

Choose three random distinct individuals a, b, c from the population.

↓

Construct

Create an interim design z = a + w·(b − c), where w is a weight.

↓

Crossover + Select

Mix z with x via uniform crossover. Keep the better of x and the candidate.

The perturbation w·(b − c) naturally scales with the population's spread: early on, individuals are far apart so perturbations are large (exploration). As the population concentrates, perturbations shrink (exploitation).

Why DE works: The vector difference (b − c) captures information about the landscape's scale and direction. The base point a allows individuals to "teleport" near good solutions found by others. Typically w ∈ [0, 2] and crossover probability p ≈ 0.5–1.0.

In differential evolution, perturbations naturally decrease over time because:

As the population concentrates, the difference (b−c) between individuals shrinks The weight w is reduced each iteration The mutation rate decreases

Chapter 6: Particle Swarm Optimization

Particle swarm optimization (PSO) gives each individual a velocity. Particles are accelerated toward two attractors: their own personal best position and the global best position found by any particle.

x⁽ⁱ⁾ ← x⁽ⁱ⁾ + v⁽ⁱ⁾

v⁽ⁱ⁾ ← w·v⁽ⁱ⁾ + c₁r₁(x_best⁽ⁱ⁾ − x⁽ⁱ⁾) + c₂r₂(x_best − x⁽ⁱ⁾)

where w is inertia, c₁ is the cognitive coefficient (personal best pull), c₂ is the social coefficient (global best pull), and r₁, r₂ are random vectors in [0,1].

Key insight: PSO introduces momentum into population methods. A particle moving in a good direction will continue moving that way, like a ball rolling downhill. The inertia w controls how much the particle remembers its previous velocity. Letting w decay over time shifts from exploration to exploitation.

In PSO, each particle is attracted toward:

Its personal best and the global best found by any particle Only the global best The centroid of the population

Chapter 7: Other Bio-Inspired Methods

Several other population methods draw inspiration from nature:

Method	Inspiration	Mechanism
Firefly algorithm	Firefly light attraction	Individuals move toward brighter (fitter) individuals; attraction decays with distance
Cuckoo search	Cuckoo brood parasitism	Lévy flights from nests; best nests survive; worst are abandoned

Hybrid methods combine population methods with local search. Two flavors:

Approach	What Happens	Effect
Lamarckian	Replace each individual with its locally-optimized version	Fast convergence, risk of premature clustering
Baldwinian	Use locally-optimized value for fitness only; keep original position	Better exploration, slower convergence

A note of caution: There has been criticism of the proliferation of nature-inspired algorithms (bat algorithm, gray wolf optimizer, etc.) that make metaphorical analogies without contributing fundamentally new algorithmic ideas. Focus on understanding the core mechanisms — selection pressure, perturbation, and information sharing — rather than the biological metaphors.

Lamarckian hybrid methods risk premature convergence because:

Local optimization pulls all individuals toward nearby local minima They remove mutation entirely They use gradient information

Chapter 8: Particle Swarm Simulator

Watch a swarm of particles converge on the minimum of a 2D function. Each particle is pulled toward its personal best (small dot) and the global best (white dot).

Particle Swarm Optimization

Click Step to advance the swarm. Particles (orange) are attracted to the global best (white).

Gen 0

The inertia parameter w in PSO controls:

How much a particle retains its previous velocity The number of particles The objective function value

Chapter 9: Summary

Method	Update Mechanism	Key Advantage
Genetic algorithm	Selection + crossover + mutation	Combines good building blocks
Differential evolution	Vector differences + crossover	Self-scaling perturbations
Particle swarm	Velocity + attraction to bests	Momentum-based exploration
Firefly	Distance-based attraction	Natural clustering around optima
Cuckoo search	Lévy flights + nest replacement	Heavy-tailed exploration

Looking ahead: Chapter 10 introduces constraints — when designs must satisfy certain conditions beyond just minimizing f. We will learn Lagrange multipliers, penalty methods, and interior point methods to handle equality and inequality constraints.

Which population method automatically scales its perturbations as the population concentrates?

Genetic algorithm Differential evolution Particle swarm