Shiffman, Chapter 9

The Evolution of Code

Genetic algorithms, fitness functions, selection, crossover, and mutation — letting evolution discover solutions we never could design by hand.

Prerequisites: Arrays and loops. No prior biology knowledge needed.
10
Chapters
2
Simulations
10
Quizzes

Chapter 0: Why Evolution?

So far, we have been the intelligent designers. We chose the variables, the forces, the rules. But what if we could let a process found in nature — evolution — decide for us?

The infinite monkey theorem: a monkey hitting keys randomly on a typewriter will eventually type Shakespeare. But the probability of typing even "to be or not to be" is (1/27)39 — a number so astronomically small that the monkey would need to type for longer than the age of the universe.

Genetic algorithms to the rescue. Instead of pure randomness, what if we could evaluate each attempt and use the best ones to produce the next generation? Warm, warmer, hot! Genetic algorithms use the power of evolution to search vast solution spaces efficiently.

We'll start by evolving a phrase (a problem with a known answer, to test our algorithm), then discuss how to apply the same technique to any problem.

Why is brute force (trying every random possibility) impractical for many problems?

Chapter 1: Darwinian Principles

Three principles must be present for natural selection to work:

PrincipleWhat it meansIn our code
HeredityChildren inherit properties from parentsDNA is copied from parent to child
VariationDiversity must exist in the populationRandom initial DNA + mutation
SelectionSome organisms are more fit and more likely to reproduceFitness function + mating pool
Fitness does not mean "better." It means "better adapted to the environment." A faster gazelle survives lions. A monkey that typed "to bf or not to be" is more fit than one that typed "xjq#wmz". Fitness is context-dependent.
Which three Darwinian principles must be present for evolution to occur?

Chapter 2: Creating a Population

Step 1: Create a population of N elements, each with randomly generated DNA.

Each member has a genotype (the raw data — an array of characters) and a phenotype (how that data is expressed — the string displayed on screen).

pseudocode
class DNA:
  genes = new array[targetLength]
  for i in genes.length:
    genes[i] = randomChar(32, 128)  # ASCII printable

# Create population
population = new DNA[200]
Genotype vs. phenotype: The genotype is the digital information. The phenotype is its expression. For typing monkeys, they're the same thing (a string). But for a vehicle, the genotype might be [0.5, 0.8, 0.3] and the phenotype could be maxspeed=2.5, maxforce=0.4, size=30.
What is the difference between genotype and phenotype?

Chapter 3: Fitness

Step 2a: Evaluate each member's fitness. This is a numeric score that says "how good is this solution?"

For our typing monkey: fitness = correct characters / total characters

pseudocode
function calcFitness():
  score = 0
  for i in genes.length:
    if genes[i] == target[i]:
      score += 1
  fitness = score / target.length
The fitness function is everything. It's the part you must customize for every project. Evolving a racing vehicle? Fitness = speed to finish line. Evolving a soccer player? Fitness = goals scored. Evolving a creature? Fitness = time alive. If you can't define fitness numerically, you can't evolve.
Exponential fitness scales better. Linear fitness (80% vs 80.1%) barely distinguishes good from great. Try fitness = (correct/total)2 or even 2correct. This rewards perfection exponentially and speeds convergence.
Why might an exponential fitness function work better than a linear one?

Chapter 4: Selection

Step 2b: Build a mating pool. High-fitness members get more "tickets" in the pool. Selection is probabilistic — the wheel of fortune method.

pseudocode
matingPool = []
for member in population:
  n = member.fitness * 100
  for j in n:
    matingPool.add(member)

Now if you pick two random members from the mating pool, the probability of selecting any given member is proportional to its fitness.

Why not just pick the top 2? (The "elitist" method.) Because it kills variation. Even a low-scoring member might have one gene that's crucial. Element C might score 5% overall but have the only correct letter for position 17. We want C to have a small chance, not zero chance.
Why is probabilistic selection (wheel of fortune) preferred over picking only the top two?

Chapter 5: Crossover

Step 3a: Pick two parents from the mating pool. Combine their DNA to create a child.

The random midpoint method: pick a random index. Everything before it comes from parent A; everything after comes from parent B.

pseudocode
function crossover(partnerA, partnerB):
  child = new DNA()
  midpoint = random(0, genes.length)
  for i in genes.length:
    if i < midpoint:
      child.genes[i] = partnerA.genes[i]
    else:
      child.genes[i] = partnerB.genes[i]
  return child
Alternative: coin flipping. For each gene, flip a coin — heads from A, tails from B. This produces more variety but is essentially equivalent over many generations.
In the "random midpoint" crossover method, how is a child's DNA constructed?

Chapter 6: Mutation

Step 3b: After crossover, each gene has a small probability of being replaced with a random value. This is mutation.

pseudocode
mutationRate = 0.01   # 1%
for i in genes.length:
  if random(1) < mutationRate:
    genes[i] = randomChar(32, 128)
Mutation rate matters enormously. At 0%: if the initial population doesn't have all needed genes, evolution stalls forever. At 1%: optimal for most cases. At 10%+: too much randomness negates the benefits of selection. You're back to random typing monkeys.
Mutation RateEffect
0%May never solve; depends on initial luck
1%Sweet spot — new variation without destroying progress
5%Works but slower convergence
10%+Effectively random — evolution breaks down
Why is a 0% mutation rate problematic?

Chapter 7: Putting It Together

The full genetic algorithm loop:

Step 1: Initialize
Create N members with random DNA
Step 2a: Fitness
Score every member
Step 2b: Selection
Build mating pool (wheel of fortune)
Step 3a: Crossover
Pick 2 parents, combine DNA
Step 3b: Mutation
Randomly alter genes at low probability
Step 4: Replace
New children become current population. Go to Step 2.
Three things you customize: (1) The variables — population size and mutation rate. (2) The fitness function — what does "good" mean for your problem? (3) Genotype/phenotype mapping — how does an array of numbers become a creature, image, or behavior?
What are the three key components you must customize for each GA application?

Chapter 8: GA Phrase Simulation

Watch a genetic algorithm evolve a target phrase in real time. Each generation, the best phrases reproduce and the population converges on the answer.

Evolving Shakespeare

The algorithm starts with random phrases and evolves toward the target using selection, crossover, and mutation.

Target: Pop:
Watch the generations fly by. The first few generations look like gibberish. But within dozens of generations, recognizable fragments emerge. Soon entire words lock in. The fitness climbs exponentially as the phrase converges.
In the GA phrase evolution, why do correct characters "lock in" over generations?

Chapter 9: Summary

TopicKey Takeaway
Genetic algorithmsEvolutionary search: population → fitness → selection → reproduction → repeat
Three principlesHeredity, variation, selection
Fitness functionThe numeric score you must design for each problem
SelectionProbabilistic (wheel of fortune), not elitist
CrossoverRandom midpoint or coin flip to combine parent DNA
MutationSmall random changes to maintain diversity (~1%)
Genotype/PhenotypeDNA is data; the creature/image/behavior is expression
The lesson: Genetic algorithms are not magic — they're guided randomness. The fitness function is the guide. Without a good fitness function, there's no evolution. With one, even the most complex solution spaces can be navigated by simple rules of selection and reproduction.

What we covered:

• Darwinian natural selection
• Population, fitness, mating pool
• Crossover and mutation
• Population size and mutation rate
• Genotype vs. phenotype

What comes next:

Chapter 10: Neural Networks. The perceptron, weighted inputs, supervised learning, and the beginnings of artificial intelligence. Our objects will learn from experience.

What is the single most important component to customize when applying a GA to a new problem?