Maintain a crowd of candidate solutions. Let competition, combination, and movement drive the whole population toward the optimum.
So far, every method has tracked a single design point (or a single distribution). A single point can only be in one valley at a time. If that valley is not the global minimum, you are stuck.
The population at each iteration is called a generation. Methods differ in how they produce the next generation from the current one: genetic algorithms use selection and crossover, particle swarm uses velocity and attraction, differential evolution uses vector differences.
All population methods follow the same high-level structure:
pseudocode initialize population = {x(1), ..., x(m)} for k = 1, 2, ..., k_max population ← step(population) # method-specific update return best individual
The initial population should be spread broadly over the design space. Common initialization strategies:
| Distribution | Character | When to Use |
|---|---|---|
| Uniform | Even coverage within bounds | Known box constraints |
| Gaussian | Concentrated near a center | Prior knowledge of good region |
| Cauchy | Heavy tails, occasional far samples | Need exploration beyond Gaussian |
Inspired by biological evolution: fitter individuals are more likely to reproduce. In optimization, "fitness" means a lower objective function value. Each individual's design vector is called a chromosome, and each design variable is a gene.
Some variants include elitism: the best individuals survive unchanged to the next generation, guaranteeing that the best solution never gets worse.
Selection methods decide which individuals become parents:
| Method | Mechanism | Character |
|---|---|---|
| Truncation | Keep the top k individuals | Strong selection pressure |
| Tournament | Best of k random individuals | Tunable pressure via k |
| Roulette wheel | Probability proportional to fitness | Every individual has a chance |
Crossover methods combine parent chromosomes:
| Method | How It Works |
|---|---|
| Single-point | Split at one random position; child = first half of A + second half of B |
| Two-point | Split at two positions; child = A-B-A segments |
| Uniform | Each gene independently from A or B with probability p |
| Interpolation | x = (1−λ)xA + λxB for continuous variables |
Without mutation, the population can only contain traits that were present in the initial generation. Crossover recombines existing genes but cannot create new ones. Mutation introduces novelty by randomly perturbing genes.
A common approach: each gene has a small probability λ of being mutated, typically λ = 1/m for m genes (so on average one mutation per child). Mutations are usually drawn from a Gaussian distribution.
If you suspect the current best is not the global optimum, increase the mutation rate. If the population is making steady progress, keep mutation low and let selection and crossover do their work.
Differential evolution (DE) is elegant in its simplicity. For each individual x in the population:
The perturbation w·(b − c) naturally scales with the population's spread: early on, individuals are far apart so perturbations are large (exploration). As the population concentrates, perturbations shrink (exploitation).
Particle swarm optimization (PSO) gives each individual a velocity. Particles are accelerated toward two attractors: their own personal best position and the global best position found by any particle.
where w is inertia, c1 is the cognitive coefficient (personal best pull), c2 is the social coefficient (global best pull), and r1, r2 are random vectors in [0,1].
Several other population methods draw inspiration from nature:
| Method | Inspiration | Mechanism |
|---|---|---|
| Firefly algorithm | Firefly light attraction | Individuals move toward brighter (fitter) individuals; attraction decays with distance |
| Cuckoo search | Cuckoo brood parasitism | Lévy flights from nests; best nests survive; worst are abandoned |
Hybrid methods combine population methods with local search. Two flavors:
| Approach | What Happens | Effect |
|---|---|---|
| Lamarckian | Replace each individual with its locally-optimized version | Fast convergence, risk of premature clustering |
| Baldwinian | Use locally-optimized value for fitness only; keep original position | Better exploration, slower convergence |
Watch a swarm of particles converge on the minimum of a 2D function. Each particle is pulled toward its personal best (small dot) and the global best (white dot).
Click Step to advance the swarm. Particles (orange) are attracted to the global best (white).
| Method | Update Mechanism | Key Advantage |
|---|---|---|
| Genetic algorithm | Selection + crossover + mutation | Combines good building blocks |
| Differential evolution | Vector differences + crossover | Self-scaling perturbations |
| Particle swarm | Velocity + attraction to bests | Momentum-based exploration |
| Firefly | Distance-based attraction | Natural clustering around optima |
| Cuckoo search | Lévy flights + nest replacement | Heavy-tailed exploration |