Reductions, satisfiability, NP-completeness, P vs NP, and approximation algorithms. Why some problems are provably hard, and what to do about it.
You've been trying to find an efficient algorithm for a problem, and you keep failing. Is the problem actually hard, or are you just not clever enough?
NP-completeness theory gives you the answer. If you can prove your problem is NP-complete, you can stop looking for a polynomial-time algorithm -- because thousands of the smartest people in the world have collectively failed to find one for any NP-complete problem. Your failure is shared by the best minds in computer science.
The theory also reveals what makes a problem hard. Understanding the source of hardness helps you model the problem differently, exploit benign structural properties, or develop good approximation algorithms.
The key idea behind NP-completeness is reduction -- translating one problem into another. If problem A can be reduced to problem B, then B is at least as hard as A. A fast algorithm for B would give a fast algorithm for A.
Think of it as a schoolyard fight. Adam beats Bill, Bill beats Chuck. If Chuck turns out to be tough (Chuck Norris), then Adam and Bill must be at least as tough. If Adam turns out to be a pushover, then Bill and Chuck might also be pushovers.
Reductions must be efficient (polynomial-time translation) and truth-preserving (the answer to B matches the answer to A for every instance).
Not all reductions prove hardness. Some give us efficient algorithms by showing that a new problem is really a known easy problem in disguise.
Closest pair via sorting: Given n numbers, find the pair with smallest difference. Reduction: sort the numbers in O(n log n), then scan adjacent pairs in O(n). The closest pair must be neighbors after sorting.
LIS via edit distance: The longest increasing subsequence of S can be found by computing the edit distance between S and its sorted version T. The longest common subsequence (with no substitutions) gives the LIS.
LCM via GCD: The least common multiple of x and y equals xy / gcd(x, y). Euclid's algorithm computes GCD efficiently, giving us LCM for free.
| New Problem | Reduced To | Complexity |
|---|---|---|
| Closest pair | Sorting | O(n log n) |
| LIS | Edit distance | O(n2) |
| LCM | GCD (Euclid's) | O(log n) |
| Convex hull lower bound | Sorting reduces TO convex hull | Ω(n log n) |
Now for the real action. We show problems are hard by reducing known hard problems to them. The key chain starts with the Hamiltonian cycle:
Hamiltonian cycle → TSP: Does graph G have a tour visiting every vertex exactly once? Construct a complete weighted graph G': edges in G get weight 1, all others get weight 2. G has a Hamiltonian cycle if and only if G' has a TSP tour of weight n. So TSP is at least as hard as Hamiltonian cycle.
Vertex cover ↔ Independent set: A vertex cover S touches every edge. The complement V - S is an independent set (no edges between its vertices). So finding the minimum vertex cover is equivalent to finding the maximum independent set.
Independent set → Clique: Complement the graph (edges become non-edges, non-edges become edges). An independent set in G becomes a clique in the complement. So clique is as hard as independent set.
Satisfiability (SAT) is the mother of all NP-complete problems. Given Boolean variables v1, ..., vn and a set of clauses (each a disjunction of literals), is there a truth assignment satisfying every clause?
Example: C = {{v1, ~v2}, {~v1, v2}} over V = {v1, v2}. Setting v1 = v2 = true satisfies both clauses (v1 is true in clause 1, v2 is true in clause 2).
Example: C = {{v1, v2}, {v1, ~v2}, {~v1}}. The third clause forces v1 = false. Then the second clause forces v2 = false. But then the first clause is unsatisfied. No satisfying assignment exists.
SAT is accepted as hard based on overwhelming evidence: every top algorist in history has failed to find a fast algorithm, and many bizarre consequences would follow if one existed.
3-SAT restricts each clause to exactly 3 literals. Surprisingly, this restricted version is still NP-complete. We prove it by reducing general SAT to 3-SAT -- transforming any clause of any length into a set of 3-literal clauses that is satisfiable if and only if the original clause was.
The transformation handles each clause based on its length:
| Clause Length | Transformation |
|---|---|
| k = 1: {z1} | Create 2 new variables, 4 clauses ensuring z1 must be true |
| k = 2: {z1, z2} | Create 1 new variable, 2 clauses: {v, z1, z2}, {~v, z1, z2} |
| k = 3: {z1, z2, z3} | Copy unchanged |
| k > 3: {z1, ..., zk} | Create k-3 new variables, chain of k-2 clauses |
For the long-clause case (k > 3), the chain works like a zipper: if no original literal is true, the new variables cannot satisfy all subclauses (each forces the next). But if any original literal is true, the remaining new variables have enough freedom to satisfy everything.
From 3-SAT, we can reduce to vertex cover, integer programming, and many other problems. These reductions form a tree rooted at SAT:
The fundamental question: is verification really easier than discovery?
Consider the traveling salesman problem. Given a proposed tour, you can verify it's valid and under budget by simply adding up the edge weights -- that takes O(n) time. But finding the optimal tour seems to require trying all possibilities.
| Class | Definition | Examples |
|---|---|---|
| P | Problems solvable in polynomial time | Sorting, shortest path, MST, matching |
| NP | Problems whose solutions are verifiable in polynomial time | Everything in P, plus TSP, SAT, vertex cover, clique |
| NP-complete | The hardest problems in NP (every NP problem reduces to them) | SAT, 3-SAT, vertex cover, TSP decision, Hamiltonian cycle |
| NP-hard | At least as hard as NP-complete (may not be in NP) | Chess, halting problem |
P ⊆ NP is trivially true: if you can solve a problem fast, you can verify a solution fast. The question is whether NP ⊆ P -- whether there are problems in NP that are NOT in P.
If ANY single NP-complete problem is shown to be in P, then ALL NP-complete problems are in P (because they all reduce to each other), and P = NP. Our collective inability to find such an algorithm for thousands of problems is strong evidence that P ≠ NP.
Proving a problem NP-complete isn't the end. You still need to solve it. Three practical options remain:
Vertex cover 2-approximation: Repeatedly pick any edge, add BOTH endpoints to the cover, and delete all their edges. This simple greedy produces a cover at most twice optimal. Why? The selected edges form a matching (no shared vertices), so any cover must include at least one endpoint per matching edge -- meaning the optimal cover is at least half as large as ours.
Euclidean TSP 2-approximation: Find the MST (a lower bound on the optimal tour). Do a DFS traversal of the MST, visiting each edge twice. Take shortcuts to avoid revisiting vertices. The triangle inequality ensures shortcuts don't increase cost. Total: at most 2 × MST weight ≤ 2 × optimal tour.
Maximum acyclic subgraph: Take any vertex ordering. Edges pointing left-to-right form an acyclic subgraph. So do edges pointing right-to-left. The larger set has at least half the edges -- a 2-approximation, from an algorithm so simple it seems stupid.
Visualize the chain of NP-completeness reductions. Click on any problem to see what reduces to what. Every arrow represents a polynomial-time reduction proving that the target is at least as hard as the source.
Click a node to highlight its reduction chain from SAT. Warm = source problem, teal = target proved hard.
This chapter is the capstone of Part I. Everything you've learned comes together here:
| Earlier Concept | Connection to Chapter 9 |
|---|---|
| BFS/DFS (Ch 5) | Used within reductions and to solve 2-SAT in linear time |
| MST (Ch 6) | Lower bound for TSP approximation; greedy works for MST but not TSP |
| Backtracking (Ch 7) | Exact solution for NP-hard problems on small instances with good pruning |
| Simulated annealing (Ch 7) | Heuristic approach when approximation guarantees are insufficient |
| DP (Ch 8) | Pseudopolynomial algorithms for weakly NP-complete problems (partition, knapsack) |