Shiffman, Chapter 9

The Evolution of Code

Genetic algorithms, fitness functions, selection, crossover, and mutation — letting evolution discover solutions we never could design by hand.

Prerequisites: Arrays and loops. No prior biology knowledge needed.

Chapters

Simulations

Quizzes

Chapter 0: Why Evolution?

So far, we have been the intelligent designers. We chose the variables, the forces, the rules. But what if we could let a process found in nature — evolution — decide for us?

The infinite monkey theorem: a monkey hitting keys randomly on a typewriter will eventually type Shakespeare. But the probability of typing even "to be or not to be" is (1/27)³⁹ — a number so astronomically small that the monkey would need to type for longer than the age of the universe.

Genetic algorithms to the rescue. Instead of pure randomness, what if we could evaluate each attempt and use the best ones to produce the next generation? Warm, warmer, hot! Genetic algorithms use the power of evolution to search vast solution spaces efficiently.

We'll start by evolving a phrase (a problem with a known answer, to test our algorithm), then discuss how to apply the same technique to any problem.

Why is brute force (trying every random possibility) impractical for many problems?

The solution space is so vast it would take astronomically long to find the answer Computers are too slow Random numbers are not truly random

Chapter 1: Darwinian Principles

Three principles must be present for natural selection to work:

Principle	What it means	In our code
Heredity	Children inherit properties from parents	DNA is copied from parent to child
Variation	Diversity must exist in the population	Random initial DNA + mutation
Selection	Some organisms are more fit and more likely to reproduce	Fitness function + mating pool

Fitness does not mean "better." It means "better adapted to the environment." A faster gazelle survives lions. A monkey that typed "to bf or not to be" is more fit than one that typed "xjq#wmz". Fitness is context-dependent.

Which three Darwinian principles must be present for evolution to occur?

Heredity, variation, and selection Speed, strength, and size Input, processing, and output

Chapter 2: Creating a Population

Step 1: Create a population of N elements, each with randomly generated DNA.

Each member has a genotype (the raw data — an array of characters) and a phenotype (how that data is expressed — the string displayed on screen).

pseudocode
class DNA:
  genes = new array[targetLength]
  for i in genes.length:
    genes[i] = randomChar(32, 128)  # ASCII printable

# Create population
population = new DNA[200]

Genotype vs. phenotype: The genotype is the digital information. The phenotype is its expression. For typing monkeys, they're the same thing (a string). But for a vehicle, the genotype might be [0.5, 0.8, 0.3] and the phenotype could be maxspeed=2.5, maxforce=0.4, size=30.

What is the difference between genotype and phenotype?

Genotype is the raw data (DNA); phenotype is how that data is expressed (appearance/behavior) They are the same thing Genotype is for plants, phenotype is for animals

Chapter 3: Fitness

Step 2a: Evaluate each member's fitness. This is a numeric score that says "how good is this solution?"

For our typing monkey: fitness = correct characters / total characters

pseudocode
function calcFitness():
  score = 0
  for i in genes.length:
    if genes[i] == target[i]:
      score += 1
  fitness = score / target.length

The fitness function is everything. It's the part you must customize for every project. Evolving a racing vehicle? Fitness = speed to finish line. Evolving a soccer player? Fitness = goals scored. Evolving a creature? Fitness = time alive. If you can't define fitness numerically, you can't evolve.

Exponential fitness scales better. Linear fitness (80% vs 80.1%) barely distinguishes good from great. Try fitness = (correct/total)² or even 2^correct. This rewards perfection exponentially and speeds convergence.

Why might an exponential fitness function work better than a linear one?

It creates a larger gap between good and great solutions, increasing selection pressure It uses less memory It is always faster

Chapter 4: Selection

Step 2b: Build a mating pool. High-fitness members get more "tickets" in the pool. Selection is probabilistic — the wheel of fortune method.

pseudocode
matingPool = []
for member in population:
  n = member.fitness * 100
  for j in n:
    matingPool.add(member)

Now if you pick two random members from the mating pool, the probability of selecting any given member is proportional to its fitness.

Why not just pick the top 2? (The "elitist" method.) Because it kills variation. Even a low-scoring member might have one gene that's crucial. Element C might score 5% overall but have the only correct letter for position 17. We want C to have a small chance, not zero chance.

Why is probabilistic selection (wheel of fortune) preferred over picking only the top two?

It preserves genetic diversity; even low-scoring members may carry useful genes It's easier to code The top two always have the same DNA

Chapter 5: Crossover

Step 3a: Pick two parents from the mating pool. Combine their DNA to create a child.

The random midpoint method: pick a random index. Everything before it comes from parent A; everything after comes from parent B.

pseudocode
function crossover(partnerA, partnerB):
  child = new DNA()
  midpoint = random(0, genes.length)
  for i in genes.length:
    if i < midpoint:
      child.genes[i] = partnerA.genes[i]
    else:
      child.genes[i] = partnerB.genes[i]
  return child

Alternative: coin flipping. For each gene, flip a coin — heads from A, tails from B. This produces more variety but is essentially equivalent over many generations.

In the "random midpoint" crossover method, how is a child's DNA constructed?

Genes before a random midpoint come from parent A; genes after come from parent B All genes come from parent A Genes are generated randomly

Chapter 6: Mutation

Step 3b: After crossover, each gene has a small probability of being replaced with a random value. This is mutation.

pseudocode
mutationRate = 0.01   # 1%
for i in genes.length:
  if random(1) < mutationRate:
    genes[i] = randomChar(32, 128)

Mutation rate matters enormously. At 0%: if the initial population doesn't have all needed genes, evolution stalls forever. At 1%: optimal for most cases. At 10%+: too much randomness negates the benefits of selection. You're back to random typing monkeys.

Mutation Rate	Effect
0%	May never solve; depends on initial luck
1%	Sweet spot — new variation without destroying progress
5%	Works but slower convergence
10%+	Effectively random — evolution breaks down

Why is a 0% mutation rate problematic?

If the initial population lacks a needed gene, there's no way to introduce it later It makes the algorithm run too fast All children become identical to their parents

Chapter 7: Putting It Together

The full genetic algorithm loop:

Step 1: Initialize

Create N members with random DNA

↓

Step 2a: Fitness

Score every member

↓

Step 2b: Selection

Build mating pool (wheel of fortune)

↓

Step 3a: Crossover

Pick 2 parents, combine DNA

↓

Step 3b: Mutation

Randomly alter genes at low probability

↓

Step 4: Replace

New children become current population. Go to Step 2.

Three things you customize: (1) The variables — population size and mutation rate. (2) The fitness function — what does "good" mean for your problem? (3) Genotype/phenotype mapping — how does an array of numbers become a creature, image, or behavior?

What are the three key components you must customize for each GA application?

Population/mutation variables, fitness function, and genotype-to-phenotype mapping Colors, sizes, and positions Input, output, and display

Chapter 8: GA Phrase Simulation

Watch a genetic algorithm evolve a target phrase in real time. Each generation, the best phrases reproduce and the population converges on the answer.

Evolving Shakespeare

The algorithm starts with random phrases and evolves toward the target using selection, crossover, and mutation.

Target: Pop:

Watch the generations fly by. The first few generations look like gibberish. But within dozens of generations, recognizable fragments emerge. Soon entire words lock in. The fitness climbs exponentially as the phrase converges.

In the GA phrase evolution, why do correct characters "lock in" over generations?

Members with correct characters score higher fitness, get selected more often, and pass those characters to children The algorithm memorizes previous answers Characters are never mutated once correct

Chapter 9: Summary

Topic	Key Takeaway
Genetic algorithms	Evolutionary search: population → fitness → selection → reproduction → repeat
Three principles	Heredity, variation, selection
Fitness function	The numeric score you must design for each problem
Selection	Probabilistic (wheel of fortune), not elitist
Crossover	Random midpoint or coin flip to combine parent DNA
Mutation	Small random changes to maintain diversity (~1%)
Genotype/Phenotype	DNA is data; the creature/image/behavior is expression

The lesson: Genetic algorithms are not magic — they're guided randomness. The fitness function is the guide. Without a good fitness function, there's no evolution. With one, even the most complex solution spaces can be navigated by simple rules of selection and reproduction.

What we covered:

• Darwinian natural selection
• Population, fitness, mating pool
• Crossover and mutation
• Population size and mutation rate
• Genotype vs. phenotype

What comes next:

Chapter 10: Neural Networks. The perceptron, weighted inputs, supervised learning, and the beginnings of artificial intelligence. Our objects will learn from experience.

What is the single most important component to customize when applying a GA to a new problem?

The fitness function — without it, evolution has no direction The crossover method The display function