Intractable Problems — Skiena, Chapter 9

Chapter 0: Why Prove Hardness?

You've been trying to find an efficient algorithm for a problem, and you keep failing. Is the problem actually hard, or are you just not clever enough?

NP-completeness theory gives you the answer. If you can prove your problem is NP-complete, you can stop looking for a polynomial-time algorithm -- because thousands of the smartest people in the world have collectively failed to find one for any NP-complete problem. Your failure is shared by the best minds in computer science.

Why this matters practically: Proving hardness is not defeatism. It tells you to stop wasting time on impossible exact solutions and start pursuing approximations, heuristics, or special-case algorithms. Two of Skiena's "war stories" describe happy results that came from discovering that a supposedly hard problem was actually easy.

The theory also reveals what makes a problem hard. Understanding the source of hardness helps you model the problem differently, exploit benign structural properties, or develop good approximation algorithms.

What is the practical value of proving a problem is NP-complete?

• It tells you to stop searching for a polynomial-time exact algorithm and instead focus on approximations or heuristics -- redirecting effort productively (correct)
• It means the problem is unsolvable
• It has no practical value

Chapter 1: Reductions

The key idea behind NP-completeness is reduction -- translating one problem into another. If problem A can be reduced to problem B, then B is at least as hard as A. A fast algorithm for B would give a fast algorithm for A.

Think of it as a schoolyard fight. Adam beats Bill, Bill beats Chuck. If Chuck turns out to be tough (Chuck Norris), then Adam and Bill must be at least as tough. If Adam turns out to be a pushover, then Bill and Chuck, who lose to him directly or indirectly, must be pushovers too.

Reduction from A to B

Translate any instance of A into an instance of B, preserving the answer.

↓

If B is easy...

Then A is also easy: solve A by translating to B and using B's fast algorithm.

↓

If A is hard...

Then B must also be hard: a fast algorithm for B would imply one for A, contradicting A's hardness.

Direction matters! To prove problem X is hard, you reduce a known hard problem TO X (not the other way). Reducing X to a hard problem only gives you a slow algorithm for X. The direction is perpetually confusing. Remember: reduce FROM the known hard problem TO the target.

Reductions must be efficient (polynomial-time translation) and truth-preserving (the answer to B matches the answer to A for every instance).

To prove that problem X is NP-hard, which direction must the reduction go?

• Reduce a known NP-complete problem TO X: this shows X is at least as hard as the known-hard problem (correct)
• Reduce X to a known NP-complete problem
• Either direction works

Chapter 2: Easy Reductions

Not all reductions prove hardness. Some give us efficient algorithms by showing that a new problem is really a known easy problem in disguise.

Closest pair via sorting: Given n numbers, find the pair with smallest difference. Reduction: sort the numbers in O(n log n), then scan adjacent pairs in O(n). The closest pair must be neighbors after sorting.
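A minimal Python sketch of this reduction (the name `closest_pair` is illustrative, and it assumes at least two numbers): sort once, then compare only neighbors.

```python
from typing import List, Tuple

def closest_pair(nums: List[float]) -> Tuple[float, float]:
    """Return the two values with the smallest difference."""
    if len(nums) < 2:
        raise ValueError("need at least two numbers")
    s = sorted(nums)  # O(n log n): the only expensive step
    # After sorting, the closest pair must be adjacent, so one O(n) scan suffices.
    i = min(range(len(s) - 1), key=lambda j: s[j + 1] - s[j])
    return s[i], s[i + 1]

print(closest_pair([20, 3, 14, 17, 9]))  # -> (14, 17)
```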

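LIS via edit distance: The longest increasing subsequence of S can be found by comparing S with its sorted version T. Set the substitution cost to infinity, so that only insertions and deletions (cost 1 each) are allowed; the edit distance then equals |S| + |T| - 2·|LCS(S, T)|, so the alignment it finds is a longest common subsequence of S and T. Every common subsequence follows T's order and is therefore increasing, so that LCS is the LIS (assuming distinct elements).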
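The sketch below, in Python with illustrative function names, computes the LCS of S and its sorted copy directly with the standard dynamic program, which is the quantity the insertion/deletion-only edit distance measures. Dropping duplicates from T is our own assumption; it keeps the result strictly increasing even when S has repeated values.

```python
from typing import List

def lcs(a: List[int], b: List[int]) -> List[int]:
    """Longest common subsequence via the standard O(len(a) * len(b)) DP."""
    n, m = len(a), len(b)
    # L[i][j] = length of an LCS of a[i:] and b[j:]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                L[i][j] = L[i + 1][j + 1] + 1
            else:
                L[i][j] = max(L[i + 1][j], L[i][j + 1])
    # Walk the table forward to recover one optimal subsequence.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i, j = i + 1, j + 1
        elif L[i + 1][j] >= L[i][j + 1]:
            i += 1
        else:
            j += 1
    return out

def lis_via_lcs(s: List[int]) -> List[int]:
    t = sorted(set(s))  # sorted copy of S with duplicates removed
    return lcs(s, t)

print(len(lis_via_lcs([3, 1, 4, 1, 5, 9, 2, 6])))  # -> 4
```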

LCM via GCD: The least common multiple of x and y equals xy / gcd(x, y). Euclid's algorithm computes GCD efficiently, giving us LCM for free.
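A short Python sketch of the reduction, with hand-written `gcd` and `lcm` for illustration (Python 3.9+ also ships `math.gcd` and `math.lcm`). Dividing by the GCD before multiplying keeps intermediate values small.

```python
def gcd(x: int, y: int) -> int:
    """Euclid's algorithm: gcd(x, y) = gcd(y, x mod y) until the remainder is 0."""
    while y:
        x, y = y, x % y
    return abs(x)

def lcm(x: int, y: int) -> int:
    """LCM reduced to GCD: lcm(x, y) = |x * y| / gcd(x, y)."""
    if x == 0 or y == 0:
        return 0
    return abs(x // gcd(x, y) * y)

print(lcm(24, 36))  # -> 72
```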

New Problem	Reduced To	Complexity
Closest pair	Sorting	O(n log n)
LIS	Edit distance	O(n²)
LCM	GCD (Euclid's)	O(log min(x, y)) arithmetic operations
Convex hull lower bound	Sorting reduces TO convex hull	Ω(n log n)

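Convex hull lower bound: We can reduce sorting to convex hull by mapping each number x to the point (x, x²) on a parabola. Every point lies on the hull, so a convex hull algorithm that returns the hull vertices in order (say counterclockwise, starting from the leftmost point) returns the numbers in sorted order. Since comparison-based sorting requires Ω(n log n) time, and the mapping and read-off take only O(n), convex hull must also require Ω(n log n).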
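To see the reduction run, here is a Python sketch that sorts distinct integers by calling a convex hull routine. The hull routine is gift wrapping (Jarvis march), our choice because it does not sort internally; it takes O(n²) here since every point is a hull vertex, so this demonstrates correctness of the reduction, not speed. Function names are illustrative.

```python
from typing import List, Tuple

Point = Tuple[int, int]

def cross(o: Point, a: Point, b: Point) -> int:
    """> 0 if b lies to the left of the directed line o -> a (counterclockwise turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull_ccw(points: List[Point]) -> List[Point]:
    """Gift wrapping: hull vertices counterclockwise from the leftmost point.
    Assumes at least two distinct points and no three collinear."""
    start = min(points)  # leftmost point: a linear scan, not a sort
    hull, p = [], start
    while True:
        hull.append(p)
        q = points[0] if points[0] != p else points[1]
        for r in points:
            # If r is right of p -> q, q is not the next CCW vertex; switch to r.
            if r != p and cross(p, q, r) < 0:
                q = r
        p = q
        if p == start:
            return hull

def sort_via_hull(nums: List[int]) -> List[int]:
    """Sort distinct integers by lifting them onto the parabola y = x^2."""
    if len(nums) < 2:
        return list(nums)
    if len(set(nums)) != len(nums):
        raise ValueError("this sketch assumes distinct values")
    hull = convex_hull_ccw([(x, x * x) for x in nums])
    return [x for x, _ in hull]  # CCW from leftmost = left to right along the parabola

print(sort_via_hull([5, -2, 3, 0, 1]))  # -> [-2, 0, 1, 3, 5]
```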

How does reducing sorting to convex hull prove an n log n lower bound for convex hull?

• If convex hull were faster than n log n, we could sort faster than n log n by mapping numbers to points and reading them off the hull -- contradicting the sorting lower bound (correct)
• Because convex hull is harder than sorting
• Because parabolas are convex

Chapter 3: Hardness Reductions

Now for the real action. We show problems are hard by reducing known hard problems to them. The key chain starts with the Hamiltonian cycle:

Hamiltonian cycle → TSP: Does graph G have a simple cycle visiting every vertex exactly once? Construct a complete weighted graph G' on the same n vertices: edges in G get weight 1, all others get weight 2. G has a Hamiltonian cycle if and only if G' has a TSP tour of total weight at most n (a tour uses n edges, so weight n means every tour edge comes from G). So TSP is at least as hard as Hamiltonian cycle.
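A Python sketch of the construction, paired with a brute-force TSP solver (usable only for tiny n) to check the "weight n" criterion. The reduction itself is just the O(n²) loop that builds the weight matrix; the function names are illustrative, and n ≥ 3 is assumed.

```python
from itertools import permutations
from typing import List, Set, Tuple

Edge = Tuple[int, int]

def ham_cycle_to_tsp(n: int, edges: Set[Edge]) -> List[List[int]]:
    """Build G' on vertices 0..n-1: weight 1 for edges of G, weight 2 otherwise."""
    present = {frozenset(e) for e in edges}
    return [[0 if i == j else (1 if frozenset((i, j)) in present else 2)
             for j in range(n)]
            for i in range(n)]

def optimal_tour_cost(w: List[List[int]]) -> int:
    """Brute-force TSP: fix vertex 0 and try every order of the rest (tiny n only)."""
    n = len(w)
    best = None
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        cost = sum(w[tour[i]][tour[(i + 1) % n]] for i in range(n))
        best = cost if best is None else min(best, cost)
    return best

def has_ham_cycle(n: int, edges: Set[Edge]) -> bool:
    # Every G' edge weighs at least 1, so a tour of weight n uses only G's edges.
    return optimal_tour_cost(ham_cycle_to_tsp(n, edges)) <= n

print(has_ham_cycle(4, {(0, 1), (1, 2), (2, 3), (3, 0)}))  # -> True
```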

Vertex cover ↔ Independent set: A vertex cover S touches every edge. The complement V - S is an independent set (no edges between its vertices). So finding the minimum vertex cover is equivalent to finding the maximum independent set.

Independent set → Clique: Complement the graph (edges become non-edges, non-edges become edges). An independent set in G becomes a clique in the complement. So clique is as hard as independent set.
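Both equivalences are easy to check exhaustively on a small graph. The Python sketch below (illustrative helper names, vertices as integers) tests every subset of a 5-vertex example: S is a vertex cover exactly when V - S is independent, and S is independent in G exactly when it is a clique in the complement.

```python
from itertools import combinations
from typing import Iterable, Set, Tuple

Edge = Tuple[int, int]

def is_vertex_cover(S: Set[int], edges: Iterable[Edge]) -> bool:
    return all(u in S or v in S for u, v in edges)

def is_independent_set(S: Set[int], edges: Iterable[Edge]) -> bool:
    return not any(u in S and v in S for u, v in edges)

def is_clique(S: Set[int], edges: Iterable[Edge]) -> bool:
    present = {frozenset(e) for e in edges}
    return all(frozenset(p) in present for p in combinations(S, 2))

def complement_edges(vertices: Set[int], edges: Iterable[Edge]) -> Set[Edge]:
    """Every pair of distinct vertices that is NOT an edge of G."""
    present = {frozenset(e) for e in edges}
    return {p for p in combinations(sorted(vertices), 2) if frozenset(p) not in present}

# Exhaustive check on a small example graph (a 5-cycle plus one chord).
V = {0, 1, 2, 3, 4}
E = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)}
E_bar = complement_edges(V, E)
for r in range(len(V) + 1):
    for S in map(set, combinations(sorted(V), r)):
        assert is_vertex_cover(S, E) == is_independent_set(V - S, E)
        assert is_independent_set(S, E) == is_clique(S, E_bar)
```

A consequence worth noting: the minimum vertex cover size plus the maximum independent set size always equals |V|.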

The chain of hardness: SAT → 3-SAT → Vertex Cover ↔ Independent Set → Clique. Each link in the chain transfers hardness from one problem to the next. All these problems are NP-complete: if any one of them has a polynomial-time algorithm, they ALL do.

If you find a polynomial-time algorithm for vertex cover, what happens?

• Every NP-complete problem (independent set, clique, TSP, SAT, etc.) becomes solvable in polynomial time -- because they are all reducible to each other. This would prove P = NP (correct)
• Only vertex cover becomes easy
• Nothing special -- it's just one problem

Chapter 4: Satisfiability

Satisfiability (SAT) is the mother of all NP-complete problems. Given Boolean variables v₁, ..., v_n and a set of clauses (each a disjunction of literals), is there a truth assignment satisfying every clause?

Example: C = {{v₁, ~v₂}, {~v₁, v₂}} over V = {v₁, v₂}. Setting v₁ = v₂ = true satisfies both clauses (v₁ is true in clause 1, v₂ is true in clause 2).

Example: C = {{v₁, v₂}, {v₁, ~v₂}, {~v₁}}. The third clause forces v₁ = false. Then the second clause forces v₂ = false. But then the first clause is unsatisfied. No satisfying assignment exists.
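Both examples can be checked mechanically. This Python sketch assumes a DIMACS-like encoding (our choice): +i stands for v_i and -i for ~v_i. It tries all 2ⁿ assignments, so it only serves tiny formulas; no polynomial-time algorithm for general SAT is known.

```python
from itertools import product
from typing import Dict, List, Optional

def brute_force_sat(clauses: List[List[int]], n: int) -> Optional[Dict[int, bool]]:
    """Try all 2^n assignments; return a satisfying one, or None if none exists."""
    for bits in product([True, False], repeat=n):
        assign = {i + 1: bits[i] for i in range(n)}
        # A clause is satisfied if any literal agrees with the assignment.
        if all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses):
            return assign
    return None

print(brute_force_sat([[1, -2], [-1, 2]], 2))       # -> {1: True, 2: True}
print(brute_force_sat([[1, 2], [1, -2], [-1]], 2))  # -> None
```

Checking a single proposed assignment is the inner `all(...)` test, which takes time linear in the formula size; only the search over assignments is exponential.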

Why SAT is special: Cook's theorem proves that EVERY problem in NP can be reduced to SAT. This makes SAT the hardest problem in NP -- if SAT has a polynomial-time algorithm, then every problem in NP does. The entire edifice of NP-completeness rests on the hardness of SAT.

SAT is accepted as hard based on overwhelming evidence: every top algorist in history has failed to find a fast algorithm, and many bizarre consequences would follow if one existed.

What makes SAT special among NP-complete problems?

• Cook's theorem proves that every problem in NP reduces to SAT, making it the universal "hardest" problem in NP and the root of the entire NP-completeness reduction tree (correct)
• It was the first problem ever studied
• It has the most variables

Chapter 5: 3-SAT and Reduction Chains

3-SAT restricts each clause to exactly 3 literals. Surprisingly, this restricted version is still NP-complete. We prove it by reducing general SAT to 3-SAT -- transforming each clause of any length into a set of 3-literal clauses, so that the resulting formula is satisfiable if and only if the original formula was.

The transformation handles each clause based on its length:

Clause Length	Transformation
k = 1: {z₁}	Create 2 new variables, 4 clauses ensuring z₁ must be true
k = 2: {z₁, z₂}	Create 1 new variable, 2 clauses: {v, z₁, z₂}, {~v, z₁, z₂}
k = 3: {z₁, z₂, z₃}	Copy unchanged
k > 3: {z₁, ..., z_k}	Create k-3 new variables, chain of k-2 clauses

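For the long-clause case (k > 3), the chain works like a zipper. Writing the new variables as y₁, ..., y_{k-3}, the clauses are {z₁, z₂, ~y₁}, {y₁, z₃, ~y₂}, ..., {y_{k-3}, z_{k-1}, z_k}. If no original literal is true, the first clause forces y₁ false, which forces y₂ false, and so on down the chain, leaving the last clause unsatisfied. But if some z_j is true, its clause is already satisfied; set every new variable to its left to false and every one to its right to true, and each remaining clause is satisfied by its ~y or y literal.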
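The transformation table translates into a short Python sketch. Literals use a DIMACS-like +i / -i encoding (an assumption), fresh variables are numbered after the original ones, and a brute-force checker confirms on tiny formulas that the original and the transformed formula are satisfiable together or not at all. All function names are illustrative.

```python
from itertools import product
from typing import List, Tuple

Clause = List[int]  # literal +i means v_i, -i means ~v_i (variables numbered from 1)

def clause_to_3sat(clause: Clause, next_var: int) -> Tuple[List[Clause], int]:
    """Rewrite one clause as 3-literal clauses; fresh variables start at next_var.
    Returns the new clauses and the next unused variable number."""
    k = len(clause)
    if k == 0:
        raise ValueError("empty clause")
    if k == 1:  # two fresh variables a, b; the four sign patterns force z
        z, a, b = clause[0], next_var, next_var + 1
        return [[a, b, z], [a, -b, z], [-a, b, z], [-a, -b, z]], next_var + 2
    if k == 2:  # one fresh variable v: {v, z1, z2} and {~v, z1, z2}
        v = next_var
        return [[v, *clause], [-v, *clause]], next_var + 1
    if k == 3:
        return [list(clause)], next_var
    # k > 3: chain of k-2 clauses linked by k-3 fresh variables y_1..y_{k-3}
    y = list(range(next_var, next_var + k - 3))
    out = [[clause[0], clause[1], -y[0]]]
    for j in range(1, k - 3):
        out.append([y[j - 1], clause[j + 1], -y[j]])
    out.append([y[-1], clause[-2], clause[-1]])
    return out, next_var + k - 3

def sat_to_3sat(clauses: List[Clause], n: int) -> Tuple[List[Clause], int]:
    """Transform a whole formula over v_1..v_n; returns (3-SAT clauses, new n)."""
    out, nxt = [], n + 1
    for c in clauses:
        new, nxt = clause_to_3sat(c, nxt)
        out.extend(new)
    return out, nxt - 1

def satisfiable(clauses: List[Clause], n: int) -> bool:
    """Brute force over all 2^n assignments (for checking tiny examples only)."""
    return any(all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses)
               for bits in product([False, True], repeat=n))

for formula, n in [([[1, 2, 3, 4, 5]], 5), ([[1, 2, 3, 4], [-1], [-2], [-3], [-4]], 4)]:
    f3, n3 = sat_to_3sat(formula, n)
    assert all(len(c) == 3 for c in f3)
    print(satisfiable(formula, n), satisfiable(f3, n3))  # -> True True, then False False
```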

The critical boundary: 1-SAT and 2-SAT are easy (polynomial time). 3-SAT is NP-complete. This is a sharp phase transition -- adding just one more literal per clause makes the problem jump from polynomial to (probably) exponential.
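The polynomial side of that boundary can be sketched in Python: each 2-clause (a or b) becomes two implications, ~a → b and ~b → a, and the formula is unsatisfiable exactly when some variable and its negation land in the same strongly connected component. This sketch uses Kosaraju's two-pass DFS (our choice of SCC algorithm) with plain recursion, which is fine for small formulas; a unit clause {a} can be written as (a, a). The name `solve_2sat` is illustrative.

```python
from typing import List, Optional, Tuple

def solve_2sat(n: int, clauses: List[Tuple[int, int]]) -> Optional[List[bool]]:
    """Variables are 1..n; literal +i means x_i, -i means ~x_i.
    Returns a satisfying assignment (index i-1 holds x_i) or None."""
    def node(lit: int) -> int:
        # x_i -> 2(i-1), ~x_i -> 2(i-1) + 1
        return 2 * (abs(lit) - 1) + (lit < 0)

    size = 2 * n
    graph = [[] for _ in range(size)]
    rgraph = [[] for _ in range(size)]
    for a, b in clauses:
        # (a or b) is equivalent to (~a -> b) and (~b -> a)
        for u, v in ((node(-a), node(b)), (node(-b), node(a))):
            graph[u].append(v)
            rgraph[v].append(u)

    # Kosaraju pass 1: DFS on the implication graph, recording finish order.
    order: List[int] = []
    seen = [False] * size
    def dfs1(u: int) -> None:
        seen[u] = True
        for v in graph[u]:
            if not seen[v]:
                dfs1(v)
        order.append(u)
    for u in range(size):
        if not seen[u]:
            dfs1(u)

    # Pass 2: DFS on the reversed graph in decreasing finish time.
    # Components get numbered in topological order of the implication graph.
    comp = [-1] * size
    def dfs2(u: int, c: int) -> None:
        comp[u] = c
        for v in rgraph[u]:
            if comp[v] == -1:
                dfs2(v, c)
    count = 0
    for u in reversed(order):
        if comp[u] == -1:
            dfs2(u, count)
            count += 1

    result = []
    for i in range(n):
        if comp[2 * i] == comp[2 * i + 1]:
            return None  # x_i and ~x_i imply each other: contradiction
        # Make x_i true iff its component comes later in topological order.
        result.append(comp[2 * i] > comp[2 * i + 1])
    return result

print(solve_2sat(2, [(1, 2), (1, -2), (-1, -1)]))  # -> None
```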

From 3-SAT, we can reduce to vertex cover, integer programming, and many other problems. These reductions form a tree rooted at SAT:

SAT

Cook's theorem: all of NP reduces here

↓

3-SAT

Restriction to 3 literals per clause

↓

Vertex Cover / Integer Programming

Graph and optimization problems

↓

Independent Set / Clique / HAM Cycle / TSP

Chains of reductions extend indefinitely

2-SAT is solvable in polynomial time, but 3-SAT is NP-complete. Why?

• 2-SAT can be solved via DFS on an implication graph in linear time; 3-SAT has enough structure to encode arbitrary NP computations, making it as hard as general SAT (correct)
• 3-SAT has more variables
• 3-SAT has longer clauses so it takes more time to read

Chapter 6: P vs NP

The fundamental question: is verification really easier than discovery?

Consider the traveling salesman problem. Given a proposed tour, you can verify it's valid and under budget by simply adding up the edge weights -- that takes O(n) time. But finding the optimal tour seems to require trying all possibilities.
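The verification step is cheap enough to write out. This Python sketch (illustrative name `verify_tour`, weights given as an n × n matrix) checks that the certificate is a permutation of the vertices and that its cost is within budget, all in O(n) time, whereas a brute-force search over orderings would examine (n-1)! candidates.

```python
from typing import List, Sequence

def verify_tour(w: List[List[float]], tour: Sequence[int], budget: float) -> bool:
    """Polynomial-time certificate check for the TSP decision problem."""
    n = len(w)
    # Valid tour: every vertex appears exactly once (O(n) with a set).
    if len(tour) != n or set(tour) != set(range(n)):
        return False
    # Cost: n edges, including the edge back to the start, O(n).
    cost = sum(w[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return cost <= budget

square = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]
print(verify_tour(square, [0, 1, 2, 3], 4))  # -> True
```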

Class	Definition	Examples
P	Problems solvable in polynomial time	Sorting, shortest path, MST, matching
NP	Problems whose solutions are verifiable in polynomial time	Everything in P, plus TSP (decision version), SAT, vertex cover, clique
NP-complete	Problems in NP to which every NP problem reduces (the hardest problems in NP)	SAT, 3-SAT, vertex cover, TSP decision, Hamiltonian cycle
NP-hard	At least as hard as every problem in NP (need not be in NP)	Generalized n × n chess, halting problem

The million-dollar question: Does P = NP? If yes, every problem whose solution can be quickly verified can also be quickly solved. If no, some problems are inherently harder to solve than to verify. Almost all experts believe P ≠ NP, but nobody can prove it. This is the most important open problem in computer science, with a $1,000,000 Clay Millennium Prize for a proof either way.

P ⊆ NP is trivially true: if you can solve a problem fast, you can verify a solution fast. The question is whether NP ⊆ P -- whether there are problems in NP that are NOT in P.

If ANY single NP-complete problem is shown to be in P, then ALL NP-complete problems are in P (because they all reduce to each other), and P = NP. Our collective inability to find such an algorithm for thousands of problems is strong evidence that P ≠ NP.

What does it mean for a problem to be "in NP"?

• That a proposed solution can be verified in polynomial time -- NOT that the problem is hard, but that solutions have a short "proof" that can be quickly checked (correct)
• That the problem cannot be solved
• That the problem requires exponential time

Chapter 7: Approximation Algorithms

Proving a problem NP-complete isn't the end. You still need to solve it. Three practical options remain:

Fast average case

Backtracking with good pruning often solves practical instances fast.

↓

Heuristics

Simulated annealing, genetic algorithms -- good solutions with no guarantee.

↓

Approximation algorithms

Guaranteed near-optimal solutions in polynomial time.

Vertex cover 2-approximation: Repeatedly pick any edge, add BOTH endpoints to the cover, and delete all their edges. This simple greedy produces a cover at most twice optimal. Why? The selected edges form a matching (no shared vertices), so any cover must include at least one endpoint per matching edge -- meaning the optimal cover is at least half as large as ours.
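A Python sketch of this rule, assuming edges are given as vertex pairs (the name `vertex_cover_2approx` is illustrative). Scanning the edges once and taking both endpoints of any edge that is still uncovered is the same as "pick an edge, add both endpoints, delete their edges", and the edges taken form a maximal matching.

```python
from typing import Iterable, Set, Tuple

def vertex_cover_2approx(edges: Iterable[Tuple[int, int]]) -> Set[int]:
    """Take both endpoints of every edge that is not yet covered."""
    cover: Set[int] = set()
    for u, v in edges:
        # An edge counts as "deleted" once either endpoint is already in the cover.
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover

print(vertex_cover_2approx([(0, i) for i in range(1, 6)]))  # star graph -> {0, 1}
```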

Euclidean TSP 2-approximation: Find the MST (a lower bound on the optimal tour). Do a DFS traversal of the MST, visiting each edge twice. Take shortcuts to avoid revisiting vertices. The triangle inequality ensures shortcuts don't increase cost. Total: at most 2 × MST weight ≤ 2 × optimal tour.
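A Python sketch of the MST-doubling heuristic for points in the plane (uses `math.dist`, Python 3.8+; function names are illustrative). We build the MST with an O(n²) version of Prim's algorithm, our choice for dense Euclidean inputs, and take a preorder DFS of it: listing vertices in preorder is exactly the doubled walk with repeat visits shortcut away.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def tsp_2approx(pts: List[Point]) -> List[int]:
    """MST via O(n^2) Prim, then a preorder DFS walk of the tree as the tour."""
    n = len(pts)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the tree to each vertex
    parent = [-1] * n
    best[0] = 0.0
    children: List[List[int]] = [[] for _ in range(n)]
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            children[parent[u]].append(u)
        for v in range(n):
            d = math.dist(pts[u], pts[v])
            if not in_tree[v] and d < best[v]:
                best[v], parent[v] = d, u
    # Preorder DFS: the doubled MST walk with already-visited vertices skipped.
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour

def tour_length(pts: List[Point], tour: Sequence[int]) -> float:
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

print(tsp_2approx([(0, 0), (0, 1), (1, 1), (1, 0)]))  # -> [0, 1, 2, 3]
```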

Maximum acyclic subgraph: Take any vertex ordering. Edges pointing left-to-right form an acyclic subgraph. So do edges pointing right-to-left. The larger set has at least half the edges -- a 2-approximation, from an algorithm so simple it seems stupid.
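The whole algorithm fits in a few lines. This Python sketch assumes arcs are (u, v) pairs and `order` is any ordering of the vertices (both names are illustrative); self-loops are dropped, since no acyclic subgraph can contain them.

```python
from typing import List, Sequence, Tuple

Arc = Tuple[int, int]

def acyclic_2approx(order: Sequence[int], arcs: List[Arc]) -> List[Arc]:
    """Return the larger of the left-to-right and right-to-left arc sets."""
    pos = {v: i for i, v in enumerate(order)}
    forward = [(u, v) for u, v in arcs if pos[u] < pos[v]]   # left-to-right: acyclic
    backward = [(u, v) for u, v in arcs if pos[u] > pos[v]]  # right-to-left: acyclic
    return forward if len(forward) >= len(backward) else backward

print(acyclic_2approx([0, 1, 2, 3], [(0, 1), (1, 2), (2, 0), (2, 3)]))
# -> [(0, 1), (1, 2), (2, 3)]
```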

Guarantees vs heuristics: An approximation algorithm says "my answer is at most 2x worse than optimal, guaranteed, on every input." A heuristic says "my answer is usually pretty good." You can run both and take the better result -- getting both a guarantee and a chance to do even better.

The vertex cover 2-approximation adds BOTH endpoints of selected edges. Why not just add one?

• Adding only one endpoint can give arbitrarily bad results: on a star graph, always picking the leaf gives a cover of size n-1 instead of the optimal 1. Adding both guarantees a 2-approximation (correct)
• Adding one endpoint is always better
• It doesn't matter -- both give the same result

Chapter 8: The Reduction Explorer

Every arrow in the NP-completeness reduction tree represents a polynomial-time reduction proving that the target is at least as hard as the source.

[Figure: NP-Completeness Reduction Tree (reduction chains from SAT)]

Every chain starts at SAT. Cook's theorem proves that every problem in NP reduces to SAT. From SAT, we reduce to 3-SAT, and from there the tree branches out to vertex cover, independent set, clique, Hamiltonian cycle, TSP, integer programming, and hundreds more. Any domino falling (a poly-time algorithm for any NP-complete problem) knocks them all down.

There are hundreds of known NP-complete problems. What would happen if someone found a polynomial-time algorithm for just ONE of them?

• EVERY NP-complete problem would become solvable in polynomial time (P = NP), because they all reduce to each other (correct)
• Only that one problem would become easy
• Nothing -- the other problems might still be hard

Chapter 9: Connections

This chapter is the capstone of Part I. Everything you've learned comes together here:

Earlier Concept	Connection to Chapter 9
BFS/DFS (Ch 5)	Used within reductions and to solve 2-SAT in linear time
MST (Ch 6)	Lower bound for TSP approximation; greedy works for MST but not TSP
Backtracking (Ch 7)	Exact solution for NP-hard problems on small instances with good pruning
Simulated annealing (Ch 7)	Heuristic approach when approximation guarantees are insufficient
DP (Ch 8)	Pseudopolynomial algorithms for weakly NP-complete problems (partition, knapsack)
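As an example of the pseudopolynomial DP mentioned in the table, here is a Python sketch for partition, assuming non-negative integers (the name `can_partition` is illustrative). It tracks which subset sums up to half the total are reachable. It runs in O(n·S) time for total S, which is polynomial in the numeric value of S but exponential in the number of bits needed to write S down, which is why partition remains NP-complete.

```python
from typing import List

def can_partition(nums: List[int]) -> bool:
    """Can nums be split into two parts with equal sums?"""
    total = sum(nums)
    if total % 2:
        return False
    target = total // 2
    reachable = [True] + [False] * target  # reachable[s]: some subset sums to s
    for x in nums:
        # Iterate downward so each number is used at most once.
        for s in range(target, x - 1, -1):
            if reachable[s - x]:
                reachable[s] = True
    return reachable[target]

print(can_partition([1, 5, 11, 5]), can_partition([1, 2, 3, 5]))  # -> True False
```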

Skiena's take-home lessons from Chapter 9:
• Reductions show that two problems are essentially identical. A fast algorithm (or lack thereof) for one implies the same for the other.
• NP-completeness is the theory that connects thousands of hard problems into one equivalence class. Solving any one solves them all.
• P vs NP asks whether verification is truly easier than discovery. Almost certainly yes (P ≠ NP), but nobody can prove it.
• Proving hardness is useful: it redirects your effort toward approximation algorithms and heuristics.
• Approximation algorithms give guaranteed bounds on solution quality in polynomial time. Run them alongside heuristics and take the better result.
• Designing novel graph algorithms is hard. Instead, design graphs that let you use classical algorithms to model your problem.

You discover that your optimization problem is NP-complete. What are your three practical options?

• (1) Backtracking with pruning for small instances, (2) heuristics like simulated annealing for good-enough solutions, (3) approximation algorithms for guaranteed near-optimal solutions (correct)
• Give up and choose a different problem
• Wait for quantum computers