Why heuristics fail, why proofs matter, and how to think about correctness before you write a single line of code.
You have a problem. It looks simple. You come up with a reasonable strategy that seems obvious. You implement it. It works on your test cases. You ship it. Then it fails spectacularly on someone else's input.
This chapter is about why that happens. The fundamental lesson of algorithm design is that reasonable-looking algorithms can easily be incorrect. An algorithm that works on the examples you tried is not the same as an algorithm that works on every possible input.
The difference between an algorithm and a heuristic is the difference between a guarantee and a hope. An algorithm always produces the correct result. A heuristic usually does a good job but provides no guarantee. Both have their place, but you need to know which one you're using.
Click "Nearest Neighbor" to see a greedy tour, then "Optimal" to see the best possible tour. Drag points to rearrange them. Notice how the heuristic can produce wildly different tour lengths.
Imagine a robot arm soldering contact points on a circuit board. It must visit every point and return to where it started. You want to minimize travel time. This is the Traveling Salesman Problem (TSP), and it comes up everywhere: delivery routes, circuit design, DNA sequencing.
The most natural idea is the nearest-neighbor heuristic: from wherever you are, walk to the closest unvisited point. It sounds perfect. Why waste time going far when there's something close? But this greedy logic has a fatal flaw.
A second idea: the closest-pair heuristic. Repeatedly connect the two closest endpoints that won't create a premature cycle. This avoids the zigzag problem above, but it too can fail. When points form two rows slightly closer together than they are spaced within each row, closest-pair connects across the gap instead of around the boundary, producing a tour over 20% longer than optimal.
Both heuristics can look at the same points and produce dramatically different (and wrong) tours. The only way to guarantee the shortest tour is to try all n! orderings. But 20! is already 2.4 quintillion. There must be a better way. Spoiler: for TSP, finding the provably optimal solution efficiently remains one of computer science's greatest open problems.
Points on a line. Watch nearest-neighbor zigzag while optimal just sweeps. Click "Step" to see each move.
You're a sought-after actor with offers for n movies. Each movie has a start day and end day. You can't film two movies at the same time. You want to star in as many movies as possible. How do you choose?
First instinct: Earliest Start First. Accept the movie that starts soonest. But what if that movie is "War and Peace" and it runs for six months, blocking everything else? One long movie kills your schedule.
Second instinct: Shortest Job First. Accept the shortest movie, since banging them out quickly maximizes throughput. But a short movie in the middle can block two jobs that don't overlap each other. You lose half your potential.
Watch three strategies compete. Green bars = selected jobs. The "Earliest Finish" strategy always finds the maximum set.
How do you prove an algorithm is wrong? You find a single input where it gives the wrong answer. That input is called a counterexample, and it instantly kills any algorithm's credibility.
Good counterexamples have two properties. First, verifiability: you can compute what the algorithm gives and show a better answer exists. Second, simplicity: boil away all unnecessary details until the failure is crystal clear.
Click to place points. The nearest-neighbor heuristic builds a tour (orange). Can you arrange points so the tour is clearly suboptimal? Press "Show Optimal" to compare.
A proof by induction works like a chain of dominoes. You show the first domino falls (the base case). Then you show that if any domino falls, the next one must fall too (the inductive step). Together, these guarantee every domino falls.
For algorithms, induction appears as recursion. Insertion sort can be proved correct by induction: a one-element array is sorted (base case). If we can sort n-1 elements correctly, then inserting the nth element into its proper position gives a sorted n-element array (inductive step).
The most common mistake in inductive proofs is using too weak an inductive hypothesis. Consider trying to prove that a tree on n nodes has n-1 edges. The hypothesis "every tree on n-1 nodes has n-2 edges" doesn't obviously help, because removing a node from a tree might disconnect it. You need the stronger hypothesis "every tree with fewer than n nodes has one fewer edge than node."
Watch insertion sort prove itself correct by induction. At each step k, the first k elements are sorted (the inductive invariant, shown in teal).
Analyzing algorithms means counting operations. And counting operations usually means evaluating summations. Two formulas cover most of what you'll ever need.
This arithmetic series tells you that 1+2+3+...+n grows like n2/2. It's why two nested loops over n elements give O(n2) work.
This geometric series is dominated by its largest term when a > 1. The total of 1 + 2 + 4 + 8 + ... + 2n = 2n+1 − 1 ≈ 2 · 2n. This explains why algorithms that halve the problem at each step (like binary search) run in O(log n) time.
Arithmetic series (left) vs. geometric series (right). Drag n to see how they grow.
Modeling is the art of translating a messy real-world problem into a clean mathematical structure that an algorithm can attack. It is arguably the single most important skill in algorithm design. The right model makes the problem easy. The wrong model makes it impossible.
Most problems can be modeled using one of these fundamental structures:
| Structure | Models | Example |
|---|---|---|
| Permutations | Arrangements, orderings, tours | TSP tour, scheduling sequence |
| Subsets | Selections, committees, clusters | Knapsack, vertex cover |
| Trees | Hierarchies, dominance, ancestry | Parse trees, search trees |
| Graphs | Relationships, networks | Road maps, social networks |
| Points | Locations, data records | Nearest neighbor, clustering |
| Polygons | Shapes, boundaries, regions | Convex hull, motion planning |
| Strings | Text, sequences, patterns | DNA matching, search engines |
Time to put it all together. This is the showcase: an interactive Traveling Salesman explorer. Place cities, try heuristics, watch them fail, then see how close they get to optimal on small instances.
Click to place up to 10 cities. Compare three strategies: Nearest Neighbor, Closest Pair, and Brute-Force Optimal (only for ≤10 cities). Drag cities to explore new configurations.
This chapter introduced the philosophy of algorithm design. Here is how the key ideas connect to the rest of the book:
| Concept | Where It Leads |
|---|---|
| Correctness proofs | Chapter 2: Algorithm Analysis formalizes efficiency |
| Greedy heuristics | Chapter 6: some greedy algorithms (Prim's, Dijkstra's) are provably optimal |
| Exhaustive search | Chapter 7: backtracking makes exhaustive search practical for small instances |
| Modeling | Chapters 5-6: graph models unlock BFS, DFS, shortest paths |
| TSP | Chapter 9: TSP is NP-complete, one of the hardest problems in CS |