When exact is impossible, good enough with a guarantee is gold. Vertex cover, TSP, set cover, randomized rounding, PTAS, and inapproximability.
You are a delivery driver. You have 15 stops to make today, and you want the shortest route that visits all of them and returns home. This is the Travelling Salesman Problem (TSP). In Chapter 34 you learned it is NP-hard: nobody knows a polynomial-time algorithm that finds the exact optimal tour, and most complexity theorists believe no such algorithm exists.
So you are stuck. Your boss does not care about P versus NP. He wants a route by 8 AM. What do you do?
You find a route that is pretty good. Not perfect, but provably close. Instead of examining all 15! = 1,307,674,368,000 possible orderings (that would take longer than the age of the universe on any computer), you run a fast algorithm that guarantees a tour at most 1.5 times longer than the optimal one. It finishes in milliseconds. Your boss is happy. Complexity theory is satisfied. Everyone wins.
This is the domain of approximation algorithms. When an optimization problem is NP-hard, we abandon the quest for the exact optimum and instead seek polynomial-time algorithms that come with a mathematical guarantee: the answer will be within a known factor of optimal.
Let C be the cost of the solution your algorithm produces, and let C* be the cost of the optimal solution. For a minimization problem, the approximation ratio ρ(n) satisfies:
For a maximization problem, we flip it:
In both cases, ρ(n) ≥ 1. A ratio of 1 means exact. A ratio of 2 means your answer is at most twice as bad as optimal. A ratio of ln(n) means it degrades logarithmically with input size. The closer to 1, the better.
An algorithm with ratio ρ(n) is called a ρ(n)-approximation algorithm. The holy grail is a polynomial-time approximation scheme (PTAS): for any ε > 0, a (1 + ε)-approximation in polynomial time. Some NP-hard problems have a PTAS. Many do not. Understanding which problems allow how-good an approximation is one of the deepest questions in theoretical computer science.
Let us work through a concrete example to make the ratio tangible. Suppose you are solving minimum vertex cover on a graph with 50 vertices. The optimal cover has 12 vertices (you do not know this — computing it is NP-hard). You run a 2-approximation and get a cover of 20 vertices. The actual ratio: 20/12 ≈ 1.67. The guaranteed worst-case ratio: 2. So the guarantee says at most 24 vertices, and you did better. This is typical — the guarantee is a ceiling, not a prediction.
The canvas below shows a TSP instance: cities scattered on a plane. The blue tour is the optimal solution (computed by brute force for this small instance). The orange tour is a 2-approximation computed by the MST-based algorithm you will learn in Chapter 2. Watch how close the approximate tour is to optimal — and how much faster it was to compute.
Blue = optimal tour (brute force). Orange = 2-approx via MST doubling. Click "New Cities" to regenerate. The status shows tour lengths and the actual ratio.
Notice two things. First, the approximate tour is always within 2x of the optimal length — often much closer. Second, the brute-force optimal computation takes visibly longer even for just 8 cities. At 20 cities, brute force would take hours. At 50, it would outlast the heat death of the universe. The approximation runs in milliseconds regardless.
The definition of approximation ratio differs slightly depending on whether you are minimizing or maximizing. Let us be precise.
For a minimization problem (e.g., TSP, vertex cover, set cover): your algorithm produces a solution of cost C, the optimum is C*. The ratio is C / C*. Since C ≥ C*, the ratio is ≥ 1. A 2-approximation means C ≤ 2 · C*. You are paying at most twice the optimal cost.
For a maximization problem (e.g., MAX-SAT, MAX-CUT, knapsack): your algorithm produces value C, the optimum is C*. The ratio is C* / C. Since C ≤ C*, the ratio is ≥ 1. A 2-approximation means C ≥ C* / 2 — you are getting at least half the optimal value. Some authors instead define the ratio as C / C* ∈ (0, 1] for maximization and call it the "performance guarantee." We will use the ≥ 1 convention throughout, as CLRS does.
In both cases, a ratio closer to 1 is better. A ratio of 1 means exact. A ratio of ∞ means no useful guarantee.
Every approximation proof has the same skeleton: (1) find a lower bound on OPT that you can compute efficiently, and (2) show your algorithm's cost is at most ρ times that lower bound. The art is finding the right lower bound.
Common lower bounds used in this chapter:
| Problem | Lower Bound on OPT | Why It Works |
|---|---|---|
| Vertex Cover | Size of maximal matching | OPT must cover each matching edge, using distinct vertices |
| TSP | MST cost | Removing one edge from any tour gives a spanning tree |
| Set Cover | LP relaxation value | Relaxing integrality can only decrease the optimum |
| Knapsack | Fractional knapsack value | Allowing fractional items can only increase the value |
Not all NP-hard problems are equally hard to approximate. Here is a preview of what we will learn:
| Problem | Best Approximation | Hardness |
|---|---|---|
| Vertex Cover | 2-approximation | No (2 − ε) unless UGC |
| TSP (with Δ-ineq) | 3/2-approximation (Christofides) | No PTAS unless P=NP |
| TSP (general) | No constant ratio | No ρ(n) for any computable ρ |
| Set Cover | (ln n + 1)-approximation | No (1 − ε)ln n unless P=NP |
| Knapsack | FPTAS: (1 + ε) for any ε | Fully poly-time approx. scheme exists |
| MAX-3SAT | 7/8-approximation | No (7/8 + ε) unless P=NP |
This table is the roadmap for the chapter. By the end, you will understand every entry: the algorithm, the proof, and why the hardness barrier exists.
Every approximation algorithm in this chapter rests on one of three techniques. Understanding which technique to reach for is half the battle.
Pillar 1: Combinatorial algorithms. Directly construct a solution using a simple rule (greedy, matching, etc.) and prove its quality by comparing to a combinatorial lower bound. Examples: vertex cover via maximal matching, set cover via greedy, TSP via MST. These are the simplest to implement and the first thing to try.
Pillar 2: LP relaxation and rounding. Formulate the problem as an integer linear program, relax to a linear program (which gives a lower/upper bound), solve, and round the fractional solution to integers. The LP value serves as the bound on OPT. Examples: weighted vertex cover, MAX-SAT, set cover (the LP gives a tighter analysis than the combinatorial proof).
Pillar 3: Scaling and DP. Scale the input to reduce the problem size, solve the scaled problem exactly with DP, and bound the error from scaling. This is how FPTAS works for knapsack and other problems with pseudo-polynomial exact algorithms.
In Chapter 0 we established the goal: polynomial-time algorithms with provable guarantees. Now let us see the simplest such algorithm — so simple it fits in five lines, yet its proof is surprisingly elegant.
Given an undirected graph G = (V, E), a vertex cover is a subset S ⊆ V such that every edge in E has at least one endpoint in S. The optimization problem: find the smallest vertex cover. This is NP-hard (it was one of Karp's original 21 NP-complete problems).
Think of it as a security problem. You have a building with hallways (edges) connecting rooms (vertices). You need to place security cameras in rooms such that every hallway is watched by at least one camera. You want to use as few cameras as possible.
Vertex cover is the canonical "easy to approximate, hard to solve exactly" problem. The 2-approximation is trivially simple (five lines of code). But finding the exact minimum cover is NP-hard. And improving the ratio below 2 is a 50-year open problem — one of the most famous in theoretical computer science.
Your first instinct might be: "Pick the vertex with the most edges, add it to the cover, remove its edges, repeat." This greedy-by-degree strategy is intuitive and often works well in practice. But it is NOT a 2-approximation. On certain graphs (specifically, graphs constructed from set-cover-like instances), greedy-by-degree can produce covers of size Ω(ln n) · OPT. It degrades like set cover, not like vertex cover.
The correct algorithm is shockingly different — and counterintuitively wasteful.
Here is the entire algorithm:
That is it. Pick an edge, add both endpoints, remove all now-covered edges, repeat. The set of picked edges forms a maximal matching — no two picked edges share an endpoint, and no remaining edge can extend the matching.
Let A be the set of edges we picked. No two edges in A share an endpoint (we removed all incident edges after picking), so |A| edges contribute 2|A| vertices to our cover C. Thus |C| = 2|A|.
Now consider the optimal cover C*. For each edge (u, v) in A, at least one of u or v must be in C* (otherwise that edge would be uncovered). Here is the crucial observation: since no two edges in A share an endpoint (the matching property), the vertices forced into C* by different matching edges are all distinct. Edge (u1, v1) forces at least one of u1, v1 into C*. Edge (u2, v2) forces at least one of u2, v2. Since {u1, v1} and {u2, v2} are disjoint (no shared endpoints), these are different vertices being added to C*. Therefore |C*| ≥ |A|.
Combining: |C| = 2|A| ≤ 2|C*|. The approximation ratio is 2.
Read that proof again slowly. It has only four sentences and three inequalities, yet it establishes a result that has withstood 50 years of attempts at improvement. The simplicity of the proof matches the simplicity of the algorithm — this is a hallmark of beautiful mathematics.
The proof structure is called the lower-bound method: find something computable (|A|) that (a) bounds your cost from above: |C| = 2|A|, and (b) bounds OPT from below: |C*| ≥ |A|. The ratio ρ = upper/lower = 2|A| / |A| = 2. Every approximation proof in this chapter follows this template.
python def approx_vertex_cover(adj): # adj: dict of {node: set of neighbors} cover = set() edges = set() for u in adj: for v in adj[u]: if (u, v) not in edges and (v, u) not in edges: edges.add((u, v)) remaining = set(edges) matching = [] while remaining: u, v = remaining.pop() # pick any edge cover.add(u) cover.add(v) # add BOTH endpoints matching.append((u, v)) # remove edges incident to u or v remaining = {(a, b) for (a, b) in remaining if a != u and a != v and b != u and b != v} return cover, matching # Example: triangle with pendant adj = {0: {1,2}, 1: {0,2,3}, 2: {0,1}, 3: {1}} cover, matching = approx_vertex_cover(adj) print("Cover:", cover) # e.g., {0, 1, 2, 3} or {0, 1} + {2, 3} print("Matching:", matching) # edges we picked print("|C| =", len(cover), "<= 2 * OPT")
The algorithm runs in O(V + E) time. For each edge picked, we scan incident edges to remove them. The total work is proportional to the sum of degrees of picked vertices, which is at most 2|E|.
Consider a graph with vertices {0, 1, 2, 3, 4, 5} and edges {(0,1), (0,2), (1,2), (1,3), (2,4), (3,4), (4,5)}. Let us trace APPROX-VERTEX-COVER step by step.
| Step | Edge Picked | Added to Cover | Edges Removed | Remaining Edges |
|---|---|---|---|---|
| 1 | (0, 1) | {0, 1} | (0,1), (0,2), (1,2), (1,3) | {(2,4), (3,4), (4,5)} |
| 2 | (2, 4) | {0, 1, 2, 4} | (2,4), (3,4), (4,5) | {} |
The algorithm picked 2 edges, producing a cover of size 4: {0, 1, 2, 4}. The matching has 2 edges, so OPT ≥ 2. In fact, OPT = 3 (the cover {1, 2, 4} works — verify: edge (0,1) is covered by 1, (0,2) by 2, (1,2) by both, (1,3) by 1, (2,4) by both, (3,4) by 4, (4,5) by 4). Our ratio on this instance: 4/3 ≈ 1.33, well within the guaranteed bound of 2.
Could we do worse? Yes. Consider a star graph: one central vertex connected to n leaves. The algorithm might pick any edge, adding both the center and a leaf (2 vertices). But OPT = 1 (just the center). As we add more leaves, the ratio approaches 2 but never exceeds it.
The canvas below shows the algorithm running step by step. Green nodes are in the cover. Red edges are the ones we picked (the maximal matching). Gray edges are already covered. Click "Step" to advance one edge pick at a time, or "Run" to animate.
Green = in cover. Red edge = just picked. Gray = already covered. Watch both endpoints get added.
We can also get a 2-approximation via LP relaxation. Formulate vertex cover as an ILP: minimize ∑ xv subject to xu + xv ≥ 1 for each edge, xv ∈ {0, 1}. Relax to xv ∈ [0, 1]. Solve the LP. Round: if xv ≥ 0.5, include v in the cover.
Why is this valid? For each edge (u, v), the LP says xu + xv ≥ 1. So at least one of them is ≥ 0.5, and at least one endpoint is in the rounded cover. The cost? Each xv that rounds to 1 had LP value ≥ 0.5, so the rounded cost ≤ 2 × LP* ≤ 2 × OPT. Same ratio, different technique.
The LP approach is more general: it extends naturally to weighted vertex cover (where vertices have costs and you minimize total cost), while the matching-based approach handles only the unweighted case cleanly.
LP relaxation for approximation connects to LP duality, one of the deepest ideas in optimization. The dual of the vertex cover LP is the maximum matching LP. Strong duality says LP* = dual LP*. This means the LP-based 2-approximation is really saying: "the optimal vertex cover is at least as large as the maximum fractional matching, and our rounded solution is at most twice the fractional matching." The duality gives us the lower bound for free.
Many advanced approximation algorithms use the primal-dual method: simultaneously build a feasible primal solution (the cover) and a feasible dual solution (the matching/packing), and show their costs are within a factor of ρ. This is the workhorse technique behind approximation algorithms for network design, facility location, and Steiner tree problems.
It is instructive to see that both approaches give ratio 2 via different lower bounds on OPT. The matching approach uses |A| (matching size) as the lower bound. The LP approach uses LP* (fractional optimum). In general, LP* ≥ |A| (the LP is a tighter relaxation), so LP rounding can sometimes produce a smaller cover than the matching approach on the same instance. But both guarantee ratio 2.
A remarkable structural property: the vertex cover LP always has an optimal solution where every variable is 0, 1/2, or 1. This is called half-integrality. It means the LP relaxation is not as "continuous" as you might expect — it naturally gravitates toward half-integer values.
Why does this matter? It means the rounding step (threshold at 0.5) rounds every variable by at most 0.5, giving a cleaner analysis. It also means that if the LP happens to have an integer optimum (all 0s and 1s), the rounding is exact and the approximation matches OPT perfectly. This happens surprisingly often on structured graphs like bipartite graphs (where vertex cover is solvable in polynomial time via maximum matching + Konig's theorem).
Half-integrality is special to vertex cover. Most LP relaxations do not have this property, which is why rounding other LPs requires more sophisticated techniques (randomized rounding, iterative rounding, etc.).
The Travelling Salesman Problem is perhaps the most famous NP-hard optimization problem. Given n cities and pairwise distances, find the shortest tour that visits every city exactly once and returns to the start. We cannot solve it exactly in polynomial time (unless P = NP). But under a natural assumption, we can approximate it well.
We say distances satisfy the triangle inequality if for all cities u, v, w:
This is just "the direct path is never longer than a detour." It holds for Euclidean distances, road networks, shortest-path distances in any graph, and most real-world settings. With this assumption, we get good approximations. Without it, as we will see in Chapter 7, TSP is inapproximable — no polynomial-time algorithm can achieve any constant ratio, or even any computable ratio.
Formally, a TSP instance satisfying the triangle inequality is called metric TSP. A special case is Euclidean TSP, where cities are points in Rd and distances are Euclidean (straight-line). Euclidean TSP admits even better approximations than general metric TSP — it has a PTAS (see Chapter 6).
The idea is beautifully simple. A Hamiltonian cycle (tour) visits every vertex. If you remove one edge from the tour, you get a spanning tree. Therefore the cost of the minimum spanning tree (MST) is a lower bound on the optimal tour cost: MST ≤ OPT.
Now run the algorithm:
The doubled MST has cost 2 · MST ≤ 2 · OPT. The shortcutting step can only decrease cost (by the triangle inequality, skipping a vertex via a direct edge is never longer). So the final tour costs at most 2 · OPT. We have a 2-approximation.
Christofides (1976) improved the ratio to 3/2 with a clever observation. Instead of doubling all MST edges, he added only enough edges to make the MST Eulerian. A graph has an Eulerian circuit if and only if every vertex has even degree. The MST has some odd-degree vertices. We need to add edges that make all degrees even.
The key step: find a minimum-weight perfect matching on the odd-degree vertices. Adding these matching edges to the MST creates a multigraph where every vertex has even degree. Then find an Euler tour, shortcut, done.
Why 3/2? The MST costs at most OPT. The matching costs at most OPT/2 (we can show this by considering the optimal tour restricted to odd-degree vertices — it forms a Hamiltonian cycle on them, and the cheaper of its two perfect matchings costs at most OPT/2). Total: OPT + OPT/2 = 3/2 · OPT.
Consider 5 cities at coordinates: A=(0,0), B=(1,0), C=(2,0), D=(1,1), E=(0,1). The complete distance matrix (Euclidean):
| A | B | C | D | E | |
|---|---|---|---|---|---|
| A | 0 | 1.00 | 2.00 | 1.41 | 1.00 |
| B | 1.00 | 0 | 1.00 | 1.00 | 1.41 |
| C | 2.00 | 1.00 | 0 | 1.41 | 2.24 |
| D | 1.41 | 1.00 | 1.41 | 0 | 1.00 |
| E | 1.00 | 1.41 | 2.24 | 1.00 | 0 |
Step 1: MST. Using Prim's from A: edges A-B (1.0), B-C (1.0), B-D (1.0), A-E (1.0). MST cost = 4.0.
Step 2: Double. Every MST edge appears twice. Total cost = 8.0. Every vertex now has even degree.
Step 3: Euler tour. One possible tour: A → B → C → B → D → B → A → E → A. (Visits every edge twice.)
Step 4: Shortcut. DFS preorder on MST (equivalent): A → B → C → D → E → A. Cost: 1.0 + 1.0 + 1.41 + 1.0 + 1.0 = 5.41.
Check: Tour cost 5.41 ≤ 2 × MST cost 4.0 = 8.0. And the optimal tour is A → B → C → D → E → A = 5.41 (this happens to be optimal for this configuration). Ratio: 5.41/5.41 = 1.0 on this instance. We got lucky — on this symmetric input, the approximation equals the optimum.
The shortcutting step — replacing a path A → B → C with a direct edge A → C — relies on the triangle inequality: d(A,C) ≤ d(A,B) + d(B,C). If this fails, the shortcut could be longer than the original path, and the 2× guarantee collapses.
Concrete counterexample: three cities A, B, C with d(A,B) = 1, d(B,C) = 1, d(A,C) = 1000. The triangle inequality is violated (1000 > 1 + 1). The MST is A-B-C, cost 2. The doubled MST tour A-B-C-B-A costs 4. But shortcutting to A-B-C-A costs 1 + 1 + 1000 = 1002. The optimal tour is A-B-C-B-A = 4. The shortcut made things catastrophically worse.
This is not just a toy example. In real networks, triangle inequality violations are common when distances represent costs (not physical distance). For instance, a direct flight from A to C might cost $1000 while connecting through B costs $50 + $50 = $100. The "shortcut" (direct flight) is far more expensive than the "detour" (connecting flight). Any TSP algorithm assuming triangle inequality would give garbage results on airline pricing data.
The canvas below shows the MST approach step by step. First the MST (green edges), then the doubled edges (dashed), then the Euler tour, then the final shortcut tour (orange).
Step through the MST-based 2-approximation for TSP. Green = MST. Orange = final Hamiltonian tour.
python import heapq from itertools import combinations def tsp_mst_approx(cities): """2-approx for metric TSP via MST doubling + shortcut.""" n = len(cities) # Prim's MST dist = lambda i,j: ((cities[i][0]-cities[j][0])**2+(cities[i][1]-cities[j][1])**2)**0.5 in_mst = [False]*n adj = [[] for _ in range(n)] heap = [(0, 0, -1)] # (cost, node, parent) mst_cost = 0 while heap: c, u, p = heapq.heappop(heap) if in_mst[u]: continue in_mst[u] = True; mst_cost += c if p >= 0: adj[u].append(p); adj[p].append(u) for v in range(n): if not in_mst[v]: heapq.heappush(heap, (dist(u,v), v, u)) # DFS preorder traversal = shortcut of doubled MST Euler tour visited = [False]*n tour = [] def dfs(u): visited[u] = True; tour.append(u) for v in adj[u]: if not visited[v]: dfs(v) dfs(0) tour.append(tour[0]) # return to start tour_cost = sum(dist(tour[i], tour[i+1]) for i in range(len(tour)-1)) return tour, tour_cost, mst_cost
Christofides improves the ratio from 2 to 3/2 by being smarter about which edges to add. Instead of doubling ALL MST edges (wasteful), add only enough edges to make the MST Eulerian. The key insight: a multigraph has an Eulerian circuit if and only if every vertex has even degree.
The MST has some vertices with odd degree. By the handshaking lemma, the number of odd-degree vertices is even. Find a minimum-weight perfect matching on these odd-degree vertices (this can be done in O(n3) time using Edmonds' algorithm). Add the matching edges to the MST. Now every vertex has even degree. Find an Euler tour, shortcut, done.
The cost analysis:
Why is the matching cost at most OPT/2? Consider the optimal tour restricted to the odd-degree vertices (in MST order). This restricted tour visits all odd-degree vertices and forms a Hamiltonian cycle on them. This cycle can be decomposed into two perfect matchings (alternate edges). The cheaper matching costs at most half the cycle cost, which is at most OPT/2 (since the cycle uses only edges of the optimal tour, and the triangle inequality means shortcuts do not increase cost).
| Algorithm | Ratio | Time | Key Idea |
|---|---|---|---|
| Nearest neighbor | O(log n) | O(n2) | Greedy, no guarantee better than log n |
| MST doubling | 2 | O(n2) | Double MST edges, Euler tour, shortcut |
| Christofides | 3/2 | O(n3) | MST + min matching on odd-degree vertices |
| Arora PTAS | 1 + ε | O(n logO(1/ε) n) | Recursive dissection (Euclidean only) |
In practice, Christofides combined with 2-opt local search (repeatedly swap two edges if it shortens the tour) produces tours within 1-5% of optimal on most real-world instances. The approximation algorithm provides the starting point; local search refines it.
After computing an approximate tour, a simple local search improves it: pick two edges, remove them (splitting the tour into two paths), and reconnect in the only other possible way. If the new tour is shorter, keep it. Repeat until no improving swap exists. This is 2-opt, and it typically improves a Christofides tour by 5-15% on real instances.
python def two_opt(tour, dist_fn): """Improve a tour using 2-opt swaps. Returns improved tour.""" n = len(tour) - 1 # tour[0] == tour[n] improved = True while improved: improved = False for i in range(1, n - 1): for j in range(i + 1, n): # Cost of removing edges (i-1,i) and (j,j+1) # vs adding edges (i-1,j) and (i,j+1) d1 = dist_fn(tour[i-1], tour[i]) + dist_fn(tour[j], tour[j+1]) d2 = dist_fn(tour[i-1], tour[j]) + dist_fn(tour[i], tour[j+1]) if d2 < d1: tour[i:j+1] = tour[i:j+1][::-1] # reverse segment improved = True return tour
2-opt has no approximation guarantee (it can get stuck in local optima), but starting from a Christofides tour gives it a strong head start. The combination of "provably good start + local refinement" is the industry standard for TSP and vehicle routing.
You run a radio station and want every town in your region to receive your signal. You have a menu of transmitter locations, each covering a different set of towns. Each transmitter costs money to build. You want to cover all towns at minimum cost. This is the Set Cover problem.
Given a universe U of n elements and a collection S = {S1, S2, ..., Sm} of subsets of U, find the smallest sub-collection C ⊆ S such that ∪S ∈ C S = U. (The weighted version assigns costs to sets and minimizes total cost.)
Set cover is NP-hard (reducible from vertex cover). But the greedy algorithm gives a surprisingly good approximation — and it is essentially the best possible.
Set cover is a "meta-problem" — dozens of practical problems are set cover in disguise:
| Real Problem | Universe (elements) | Sets |
|---|---|---|
| Cell tower placement | Houses to cover | Coverage area of each tower |
| Test suite selection | Code paths to test | Code paths covered by each test |
| Hiring a team | Skills needed | Skills of each candidate |
| Feature selection | Data points to classify | Points correctly classified by each feature |
| Server placement | Users to serve | Users reachable from each location |
If you can model your problem as set cover, you immediately get the O(ln n) greedy approximation with zero algorithmic creativity required. This is the power of reduction: one algorithm solves dozens of problems.
The connection to vertex cover is also worth noting. Minimum vertex cover is a special case of set cover where each set contains exactly the edges incident to a vertex. The greedy set cover algorithm applied to vertex cover gives an O(ln Δ) ratio (where Δ is the max degree). For dense graphs this can be worse than the simple 2-approximation, but for sparse graphs it can be better. Knowing both algorithms lets you pick the better one for your instance.
The greedy strategy is obvious: at each step, pick the set that covers the most uncovered elements. Repeat until everything is covered.
python def greedy_set_cover(universe, sets): """Greedy set cover: always pick the set covering the most uncovered elements.""" uncovered = set(universe) chosen = [] available = list(range(len(sets))) while uncovered: # Pick set with maximum intersection with uncovered best = max(available, key=lambda i: len(sets[i] & uncovered)) newly = sets[best] & uncovered if not newly: break # no progress possible chosen.append(best) uncovered -= newly available.remove(best) return chosen # Example U = {1,2,3,4,5,6,7,8,9,10} S = [{1,2,3,4}, {3,4,5,6}, {5,6,7,8}, {7,8,9,10}, {1,5,9}] result = greedy_set_cover(U, S) print("Sets chosen:", result) # e.g., [0, 2, 3] (covers all 10 elements with 3 sets)
The greedy algorithm uses at most H(n) · OPT sets, where H(n) = 1 + 1/2 + 1/3 + ... + 1/n is the n-th harmonic number. Since H(n) ≈ ln n + 0.577..., this is an O(ln n)-approximation.
Here is the proof idea, using the elegant charging argument. Assign a "charge" to each element: when element e is newly covered by the greedy pick, charge it 1/(number of newly covered elements in this step). The total charge across all elements equals the number of sets picked by greedy. Now show that the charge allocated to elements in any single optimal set Sj is at most H(|Sj|) ≤ H(n). Since OPT sets cover all elements, the total greedy cost is at most H(n) · OPT.
Let us trace the charge calculation for a single optimal set Sj of size k. At the time of the greedy algorithm's first pick, suppose Sj still has all k elements uncovered. The greedy algorithm chose the set covering the MOST uncovered elements — so the chosen set covers at least k elements (it covers at least as many as Sj would). The charge per newly-covered element is at most 1/k. After some Sj-elements get covered by greedy picks, Sj has fewer remaining uncovered elements. At each subsequent step, the greedy set covers at least as many uncovered elements as Sj's remaining count. The charges accumulate: at most 1/k + 1/(k-1) + ... + 1/1 = H(k) ≤ H(n).
Universe U = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}. Sets and their sizes:
| Set | Elements | Size |
|---|---|---|
| S1 | {1, 2, 3, 4, 5, 6} | 6 |
| S2 | {5, 6, 7, 8, 9, 10} | 6 |
| S3 | {1, 2, 3, 7, 8, 9} | 6 |
| S4 | {10, 11, 12} | 3 |
| S5 | {4, 5, 11} | 3 |
Step 1: All three 6-element sets cover the same number of uncovered elements. Greedy picks any — say S1. Covered: {1,2,3,4,5,6}. Uncovered: {7,8,9,10,11,12}.
Step 2: S2 covers {7,8,9,10} = 4 new elements. S3 covers {7,8,9} = 3. S4 covers {10,11,12} = 3. S5 covers {11} = 1. Greedy picks S2. Uncovered: {11,12}.
Step 3: S4 covers {11,12} = 2. S5 covers {11} = 1. Greedy picks S4. All covered.
Result: 3 sets {S1, S2, S4}. The optimal solution also uses 3 sets (e.g., {S3, S2, S5} does not cover 4 or 12, so try {S1, S2, S4} — yes, 3 is optimal). Greedy matched OPT here. It often does for small instances. The logarithmic gap shows up on adversarial constructions designed to exploit the greedy heuristic.
The bound guarantees at most H(12) · 3 ≈ 3.1 × 3 ≈ 9.3 sets in the worst case. So using only 3 is dramatically better than the theoretical ceiling. This is typical: the H(n) ratio is a worst-case bound for adversarial inputs, and real-world instances rarely approach it.
Here is the classic adversarial construction. Take n = 2k elements. Create one "big" set of size n. Then create sets of size n/2, n/4, n/8, ..., 1 that together also cover all elements. The optimal solution uses the one big set (OPT = 1). But greedy, if it picks the half-sized sets first, might use O(log n) sets. This is the worst case, and it achieves the H(n) bound.
Concrete instance (n = 8):
| Set | Elements | Size |
|---|---|---|
| SBIG | {1, 2, 3, 4, 5, 6, 7, 8} | 8 |
| SA | {1, 2, 3, 4} | 4 |
| SB | {5, 6, 7, 8} | 4 |
| SC | {1, 2, 5, 6} | 4 |
| SD | {3, 4, 7, 8} | 4 |
Greedy sees four sets of size 4 tied with the big set of size 8... wait, the big set has size 8 and would be picked first. So the adversary needs to be cleverer. The real construction has the big set cost 1 and many small sets also cost 1. In the weighted version, assign cost 1 to SBIG and cost 1 to a collection of log n overlapping sets of various sizes. Greedy picks the log n small sets (each covering the most uncovered elements at cost 1), paying log n total, while OPT picks SBIG for cost 1.
For weighted set cover, greedy does not pick the set covering the most elements. Instead, it picks the set with the best cost-effectiveness: the set Si minimizing cost(Si) / |Si ∩ uncovered|. This "bang for the buck" greedy still achieves the H(n) ratio. The proof follows the same charging argument but with costs instead of counts.
This is exactly the principle behind coupon collectors, ad placement, and A/B test selection in tech companies. When you have a budget and want maximum "coverage" (user reach, feature testing, code path coverage), the cost-effective greedy is your go-to algorithm. It is also the algorithm behind the classical "coupon collector's problem" bound: to collect n distinct coupons by buying random ones, you need about n · H(n) ≈ n ln n purchases on average. The H(n) factor appears everywhere.
python def greedy_weighted_set_cover(universe, sets, costs): """Greedy weighted set cover: pick most cost-effective set each round.""" uncovered = set(universe) chosen = [] total_cost = 0 available = list(range(len(sets))) while uncovered: # Cost per newly covered element best = min( available, key=lambda i: costs[i] / max(1, len(sets[i] & uncovered)) ) newly = sets[best] & uncovered if not newly: break chosen.append(best) total_cost += costs[best] uncovered -= newly available.remove(best) return chosen, total_cost
The canvas below shows elements as dots and sets as colored regions. At each step, the greedy algorithm picks the set covering the most gray (uncovered) elements, turning them to the set's color. Watch how coverage accelerates with each pick.
Gray dots = uncovered. Colored dots = covered. Each step picks the set covering the most gray dots.
So far our approximation algorithms have been combinatorial — direct greedy constructions. Now we introduce a radically different technique: solve a relaxed version of the problem (allowing fractional solutions), then round the fractions to integers. This is LP relaxation and randomized rounding.
Many NP-hard problems can be formulated as integer linear programs (ILPs): minimize a linear objective subject to linear constraints, where variables must be 0 or 1. The NP-hardness comes from the integrality constraint. If we drop it — allowing variables to be any real number in [0, 1] — we get a linear program (LP) that can be solved in polynomial time.
The LP solution is a lower bound on the optimal integer solution (for minimization). The question is: how do we convert the fractional LP solution into a valid integer solution without losing too much?
The vertex cover problem as an ILP:
Relax xv ∈ {0, 1} to xv ∈ [0, 1]. Solve the LP. Now round: set xv = 1 if the LP value is ≥ 0.5, else xv = 0.
This is a valid cover: for each edge (u, v), the LP constraint says xu + xv ≥ 1, so at least one of them is ≥ 0.5, so at least one gets rounded to 1. The cost? Each variable at most doubles (from 0.5 to 1), so the rounded cost ≤ 2 · LP* ≤ 2 · OPT. Another 2-approximation for vertex cover, from a completely different angle.
Consider a triangle graph: vertices {A, B, C}, edges {(A,B), (B,C), (A,C)}. The ILP formulation:
The LP optimum: xA = xB = xC = 0.5. Cost = 1.5. (Verify: each constraint 0.5 + 0.5 = 1 ≥ 1. No feasible solution has lower cost — if any variable is below 0.5, one of its two constraints forces the other endpoint above 0.5.)
Rounding at 0.5: all three round to 1. Rounded cover = {A, B, C}. Cost = 3.
Optimal integer solution: any 2 of the 3 vertices (say {A, B}). Cost = 2.
Ratio: 3/2 = 1.5 on this instance. Within the guaranteed bound of 2. The LP* = 1.5 was a better lower bound than the matching bound (1 edge ⇒ OPT ≥ 1), which is why LP rounding can sometimes give tighter analysis.
For maximization problems, randomized rounding shines. Consider MAX-SAT: given a boolean formula in CNF, find an assignment that satisfies as many clauses as possible.
LP relaxation: for each variable xi, let yi ∈ [0, 1] represent the "probability" of setting xi = true. For each clause Cj, let zj ∈ [0, 1] represent the probability of satisfying it. Maximize ∑ zj subject to constraints that link zj to the yi's in clause j.
After solving the LP, round randomly: set xi = true with probability yi (the LP value). Each clause Cj with k literals is satisfied with probability at least 1 − ∏(1 − yi) for positive literals and ∏ yi for negative literals. Using the AM-GM inequality and the LP constraint, this probability is at least (1 − (1 − zj/k)k). Since (1 − 1/k)k ≤ 1/e, the probability is at least (1 − 1/e) · zj ≈ 0.632 · zj. Summing over all clauses: the expected number of satisfied clauses is at least (1 − 1/e) times the LP optimum.
For clauses of size exactly k = 1, randomized rounding gives probability yi = zj (perfect). For k = 2, it gives at least (3/4) · zj. The overall guarantee of 3/4 comes from combining LP rounding with random assignment (see below).
This bound appears repeatedly in randomized rounding. Suppose a clause has k literals, each set to true independently with probability at least 1/k. The probability the clause is unsatisfied is at most (1 − 1/k)k. By calculus, (1 − 1/k)k is an increasing function of k that converges to 1/e ≈ 0.368 as k → ∞. So the probability of satisfaction is at least 1 − 1/e ≈ 0.632.
For k = 1 (unit clause): probability = 1/1 = 1.0. For k = 2: 1 − (1/2)2 = 3/4 = 0.75. For k = 3: 1 − (2/3)3 ≈ 0.704. For large k: approaches 0.632. The worst case is large clauses, where randomized rounding still gives a constant-factor guarantee.
A remarkable result: if you take the better of (a) randomized rounding using LP values and (b) a completely random assignment (each variable true with probability 1/2), the expected number of satisfied clauses is at least (3/4) · OPT. This is because LP rounding performs well on large clauses, and random assignment performs well on small clauses. Their maximum always achieves the 3/4 bound.
Let us verify with a concrete example. Consider the clause (x1 ∨ x2 ∨ x3). Random assignment satisfies it with probability 1 − (1/2)3 = 7/8. LP rounding (assuming each variable has LP value ≥ 1/3 from the clause constraint): probability ≥ 1 − (2/3)3 = 19/27 ≈ 0.704. Random assignment wins for this 3-literal clause.
Now consider a unit clause (x1). Random assignment satisfies it with probability 1/2. LP rounding (LP value = 1 from the constraint x1 ≥ 1): probability 1.0. LP rounding wins for unit clauses.
The combined algorithm takes the better of the two on each instance, achieving 3/4 overall. This 3/4 ratio for MAX-SAT is tight: Hastad proved that no polynomial-time algorithm can do better than 7/8 for MAX-3SAT (where every clause has exactly 3 literals), and the general MAX-SAT ratio of 3/4 is also tight.
Randomized rounding gives an expected approximation ratio. Can we make it deterministic? Yes, via the method of conditional expectations. Instead of flipping coins, set each variable greedily to maximize the conditional expectation of the objective. At each step, compute E[satisfied clauses | x1=T, ..., xi=v] for v ∈ {T, F} and pick the better one. This deterministic algorithm achieves the same guarantee as the randomized version, no luck required.
The ratio between the optimal integer solution and the optimal LP relaxation is called the integrality gap. For vertex cover, the integrality gap is exactly 2 (achieved by the complete graph Kn for odd n, where the LP sets every xv = 0.5 for a cost of n/2, but the optimal integer cover needs n − 1 vertices). This means LP-rounding cannot give a ratio better than 2 for vertex cover — the LP itself is too loose.
To beat ratio 2 for vertex cover (if possible at all), one would need a technique that goes beyond LP relaxation — perhaps semidefinite programming (SDP) or a cleverer combinatorial approach. No one has succeeded in 50+ years of trying.
The canvas below shows a small vertex cover instance. Left: the fractional LP solution (each vertex shows its LP value, circle radius proportional to xv). Right: the rounded integer solution. Vertices with LP value ≥ 0.5 become solid (in the cover). Notice how the LP "spreads" the cover across vertices, and rounding concentrates it.
Left: fractional LP solution (circle size = LP value). Right: rounded integer solution (solid = in cover). Edge constraints shown.
python def lp_vertex_cover_round(n, edges, weights=None): """LP relaxation + deterministic rounding for weighted vertex cover.""" if weights is None: weights = [1]*n # In practice, use scipy.optimize.linprog or PuLP # Here we simulate the LP solution for illustration # Real LP: minimize sum(w[v]*x[v]), s.t. x[u]+x[v]>=1 for (u,v), 0<=x[v]<=1 # Simple approximation: solve LP, round at 0.5 threshold lp_vals = solve_lp(n, edges, weights) # returns list of floats in [0,1] cover = [v for v in range(n) if lp_vals[v] >= 0.5] # Verify: every edge is covered for u, v in edges: assert u in cover or v in cover return cover, lp_vals
This is the payoff chapter. Below is a unified simulation where you can explore three NP-hard problems and their approximation algorithms side by side. For small instances, we compute the exact optimum via brute force so you can see the actual approximation ratio.
Choose a problem. Generate an instance. Run the approximation and see how it compares to optimal.
Play with the sliders. Notice how the approximation ratio stays within its theoretical bound on every instance, but is usually much better. Try increasing the problem size — at some point "Compute Optimal" becomes noticeably slow (it is exponential), while the approximation stays instant.
As you experiment, test these predictions from the theory:
Vertex Cover: The approximation ratio should always be ≤ 2. On most random instances, it will be closer to 1.3-1.5. The worst case (ratio approaching 2) occurs on graphs that look like a disjoint union of edges — a perfect matching with no other edges. On such graphs, the algorithm adds both endpoints of every edge (2n vertices) while the optimal solution needs only n vertices (one endpoint per edge). Try generating many graphs to find one where the ratio exceeds 1.5 — it requires specific structure.
TSP: The MST-based 2-approximation should always produce a tour ≤ 2× optimal. Christofides would give ≤ 1.5×, but is harder to implement (we use the simpler MST approach here). On random city placements, expect ratios around 1.1-1.3.
Set Cover: The greedy algorithm should use at most (ln n + 1) × OPT sets. For n = 10, that is about 3.3 × OPT. For n = 16, about 3.8 × OPT. On random instances, greedy typically matches or nearly matches OPT. The logarithmic gap only appears on adversarially constructed instances.
In the vertex cover visualization, the colors tell a story:
Orange nodes are in the approximate cover but NOT in the optimal cover. These are the "wasted" vertices — the price of using a simple algorithm. The 2-approximation can include up to twice as many vertices as needed, and the extras show up as orange.
Teal nodes are in the optimal cover but NOT in the approximate cover. These represent opportunities the approximation "missed" — vertices that a smarter algorithm would have included instead of some orange ones.
Yellow nodes are in BOTH covers. These are the essential vertices — both the optimal and approximate algorithms agree they must be included.
The ideal approximation would show all yellow (perfect overlap with optimal). In practice, you will see a mix, with the approximate cover always being at least as large as optimal and at most twice as large.
Try this experiment with TSP. Set cities to 7 and click "Compute Optimal" — it is instant. Set to 8 — still fast. Set to 9 — a slight pause. Set to 10 — noticeable delay. Each additional city multiplies the brute-force time by roughly n (since we iterate over permutations). The MST approximation? Instant at every size. This is the practical argument for approximation algorithms.
Here are the exact counts to make the exponential growth visceral. For n cities, brute-force TSP examines (n−1)! permutations (fixing the start city). The MST approximation does O(n2) work regardless:
| Cities (n) | Permutations | Brute Force Time | MST Approx Time |
|---|---|---|---|
| 7 | 720 | < 1ms | < 1ms |
| 8 | 5,040 | ~1ms | < 1ms |
| 9 | 40,320 | ~10ms | < 1ms |
| 10 | 362,880 | ~100ms | < 1ms |
| 15 | 87 billion | ~hours | < 1ms |
| 20 | 1.2 × 1017 | ~centuries | < 1ms |
The approximation algorithm's O(n2) time barely changes as n grows. The exact algorithm's factorial time explodes beyond any practical limit. At n = 20, the approximation gives a provably-within-2x answer in microseconds, while the exact answer would take longer than the age of the universe. This is the fundamental motivation for approximation algorithms.
Some NP-hard problems admit arbitrarily good approximations. You get to choose how close to optimal you want to be — at the cost of more computation time. This is the territory of polynomial-time approximation schemes.
A PTAS for a problem is a family of algorithms: for every fixed ε > 0, there is a (1 + ε)-approximation algorithm that runs in polynomial time in n. The catch: "polynomial in n" can mean O(n1/ε), which for small ε is a very large polynomial. The running time is polynomial in n but can be exponential in 1/ε.
Example: Euclidean TSP. Arora (1996) and Mitchell (1999) showed that Euclidean TSP has a PTAS: for any ε > 0, you can find a tour within (1 + ε) of optimal in O(n(log n)O(1/ε)) time. Want a tour within 1% of optimal? Set ε = 0.01 and wait. It is polynomial in n, but the constant is astronomical.
Another example: the subset sum problem. Given integers and a target T, find a subset with sum as close to T as possible without exceeding it. The DP runs in O(nT) which is pseudo-polynomial. The PTAS scales T by a factor of ε/n, reducing the DP table to O(n2/ε) entries. The result is within (1 − ε) of the optimal sum.
The critical distinction between PTAS and FPTAS is subtle but important. Consider the running time as a function of n and ε:
| Scheme | Time (example) | ε = 0.1 | ε = 0.01 | ε = 0.001 |
|---|---|---|---|---|
| PTAS | O(n1/ε) | O(n10) | O(n100) | O(n1000) |
| FPTAS | O(n2/ε) | O(10n2) | O(100n2) | O(1000n2) |
The PTAS time explodes exponentially as ε shrinks — n100 is useless even for n = 2. The FPTAS time grows only linearly with 1/ε, remaining practical for small ε. This is why FPTAS is a much stronger guarantee.
An FPTAS is a PTAS where the running time is polynomial in both n and 1/ε. This is the strongest type of approximation guarantee short of an exact polynomial-time algorithm (which would require P = NP for NP-hard problems). With an FPTAS, you get a smooth, efficient knob: turn ε down for better quality, up for faster runtime, and the cost of turning the knob is itself polynomial.
Not every problem with a PTAS has an FPTAS. In fact, if a problem is strongly NP-hard (NP-hard even when all numbers in the input are polynomially bounded), then it cannot have an FPTAS unless P = NP. This is because an FPTAS on a strongly NP-hard problem would give an exact algorithm in polynomial time (set ε = 1/(nW) where W is the max value). The 3-partition problem, bin packing, and multiprocessor scheduling are strongly NP-hard and therefore have no FPTAS.
The canonical example is the 0/1 Knapsack problem. Recall that the DP solution for knapsack runs in O(nW) time, which is pseudo-polynomial — polynomial in the numeric value of W, but exponential in the number of bits needed to represent W. If W = 264, the DP table has 264 entries, which is infeasible even on modern hardware.
The FPTAS cleverly reduces the DP table size by rounding the input values. The idea: we do not need exact values; we just need to get within (1 − ε) of optimal. By coarsening the values, we shrink the DP table from O(nW) to O(n2/ε) — polynomial in both n and 1/ε.
The FPTAS works by rounding the values:
python def knapsack_fptas(values, weights, capacity, epsilon): """FPTAS for 0/1 Knapsack: (1-epsilon)-approximation.""" n = len(values) v_max = max(values) K = epsilon * v_max / n # scaling factor if K == 0: K = 1 # Round values scaled = [int(v / K) for v in values] V = sum(scaled) # DP on scaled values: dp[j] = min weight to achieve value j INF = float('inf') dp = [INF] * (V + 1) dp[0] = 0 keep = [[False]*(V+1) for _ in range(n)] for i in range(n): for j in range(V, scaled[i]-1, -1): if dp[j - scaled[i]] + weights[i] < dp[j]: dp[j] = dp[j - scaled[i]] + weights[i] keep[i][j] = True # Find best achievable value within capacity best_j = max(j for j in range(V+1) if dp[j] <= capacity) # Backtrack to find items items = [] j = best_j for i in range(n-1, -1, -1): if keep[i][j]: items.append(i) j -= scaled[i] return items, sum(values[i] for i in items) # Running time: O(n^2 / epsilon) — polynomial in BOTH n and 1/epsilon
The canvas below lets you vary ε and see how it affects the knapsack FPTAS. Smaller ε means a better approximation but more computation. Watch the number of DP table entries grow as you decrease ε.
Drag the ε slider. The bar chart shows: actual value achieved (orange) vs optimal (teal). The counter shows DP table size.
The key insight is that rounding values by a factor K introduces at most K error per item. With at most n items in the solution, the total error is at most nK. We chose K = ε · vmax / n, so the total error is at most n · (ε · vmax / n) = ε · vmax ≤ ε · OPT (since OPT ≥ vmax for any valid solution containing the most valuable item, or else we can check single items separately).
Therefore the FPTAS value ≥ OPT − ε · OPT = (1 − ε) · OPT. The running time is O(n · V') where V' = ∑ ⌊vi / K⌋ ≤ n · (vmax / K) = n · (n / ε) = n2 / ε. This is polynomial in both n and 1/ε — the definition of FPTAS.
Suppose you have 100 items with values up to 1000, and you want a solution within 5% of optimal (ε = 0.05). The scaling factor K = 0.05 × 1000 / 100 = 0.5. Each rounded value is at most 1000/0.5 = 2000. The DP table has 100 × 200,000 = 20 million entries. That is large but polynomial and easily fits in memory. An exact DP with values up to 100,000 total would have 10 million entries — not much smaller. The FPTAS pays a moderate cost for its guaranteed approximation quality.
Now set ε = 0.001 (0.1% accuracy). K = 0.01. DP table: 100 × 10,000,000 = 1 billion entries. Still polynomial, but now impractical. This illustrates the FPTAS tradeoff: as you demand more accuracy, computation grows as 1/ε.
We have seen problems that can be approximated within a factor of 2, ln n, or (1 + ε). But some NP-hard problems resist approximation entirely. Understanding these barriers is as important as knowing the algorithms — it tells you when to stop trying.
If distances do not satisfy the triangle inequality, TSP cannot be approximated within any polynomial function ρ(n) in polynomial time, unless P = NP.
The proof is a beautiful reduction. Suppose you had a ρ(n)-approximation A for general TSP. We will use it to solve Hamiltonian Cycle (an NP-complete problem) in polynomial time. Given a graph G on n vertices, construct a complete weighted graph: edges in G get weight 1, non-edges get weight ρ(n) · n + 1. If G has a Hamiltonian cycle, the optimal TSP tour has cost n. If not, every tour must use a non-edge, costing at least ρ(n) · n + 1. So A can distinguish the two cases — solving the NP-complete problem.
The most profound result in the theory of approximation is the PCP Theorem (Arora, Lund, Motwani, Sudan, Szegedy, 1998). PCP stands for Probabilistically Checkable Proofs. The theorem states:
What does this mean? Every NP statement has a proof that can be verified by reading only O(1) random bits and O(log n) positions in the proof. A "verifier" flips a few coins to decide which positions to check, then accepts or rejects based on those few bits.
The connection to approximation: the PCP theorem implies that MAX-3SAT is NP-hard to approximate within some constant factor. From there, reductions give inapproximability results for dozens of other problems. The entire landscape of "what can and cannot be approximated" traces back to PCP.
| Problem | Best Poly-Time Approx | Hardness of Approximation |
|---|---|---|
| MAX-3SAT | 7/8 | No (7/8 + ε) unless P = NP |
| Clique | O(n / log2 n) | No n1−ε unless P = NP |
| Chromatic number | O(n / log3 n) | No n1−ε unless P = NP |
| Set Cover | (1 − o(1)) ln n | No (1 − ε) ln n unless P = NP |
| Vertex Cover | 2 | No (2 − ε) under UGC |
| TSP (general) | None | No ρ(n) for any computable ρ |
Notice the spectrum. MAX-3SAT and set cover have tight results: the best algorithm matches the hardness bound. Clique is nearly impossible to approximate — polynomial ratio is essentially the trivial algorithm. Vertex cover is conjectured to be tight at 2, but the proof depends on the Unique Games Conjecture (UGC), which remains open.
Maximum Clique is one of the hardest problems to approximate. Given a graph, find the largest complete subgraph. The best polynomial-time algorithm finds a clique of size O(n / log2 n) when the optimal clique has size n — essentially no better than the trivial algorithm that returns a single vertex.
Hastad (1996) proved that Clique cannot be approximated within n1−ε for any ε > 0, unless P = NP. For a graph with 1000 vertices, this means no polynomial-time algorithm can distinguish between "largest clique is 10" and "largest clique is 10000.99 ≈ 977" — unless P = NP. This is a stunningly strong inapproximability result, far beyond what most NP-hard problems exhibit.
Why does the PCP theorem connect to approximation? Here is the intuition. An NP-complete problem like SAT asks "is there an assignment satisfying ALL clauses?" An approximation problem like MAX-SAT asks "what fraction of clauses can we satisfy?" The PCP theorem says that these two questions are computationally equivalent up to a constant factor. If you could approximate MAX-SAT better than 7/8, you could distinguish "all satisfiable" from "at most 7/8 satisfiable" — which is the gap that the PCP characterization creates.
This gap amplification — turning a "yes/no" decision problem into a "high/low value" promise problem — is the mechanism that converts NP-hardness into inapproximability.
We proved the TSP result quickly in the text above. Let us now walk through it with actual numbers to make it visceral.
Suppose someone claims a 1000-approximation algorithm A for general TSP. Given a graph G on 10 vertices, we construct G' with:
If G has a Hamiltonian cycle, the optimal tour in G' costs 10. Algorithm A returns a tour of cost ≤ 1000 × 10 = 10,000.
If G has NO Hamiltonian cycle, every tour uses at least one non-edge. That non-edge alone costs 10,001. So A's tour costs ≥ 10,001 > 10,000.
We can distinguish the two cases by checking if A's tour costs ≤ 10,000 or > 10,000. We just solved Hamiltonian Cycle in polynomial time. Since HC is NP-complete, P = NP. The same argument works for ANY ratio ρ(n) — just set the non-edge weight to ρ(n) · n + 1.
The canvas below maps NP-hard problems by their approximability. Left = easy to approximate (FPTAS). Right = impossible. Click on any problem to see its details.
Problems arranged by how well they can be approximated. Click a problem for details. FPTAS (left) to inapproximable (right).
Putting it all together, we can rank NP-hard optimization problems by how well they can be approximated. This hierarchy is fundamental to understanding the landscape of computational complexity:
Every rung of this ladder represents a fundamental barrier. Moving a problem from one level to another (proving a better algorithm or a tighter hardness result) is typically a major research achievement. The PCP theorem was what established most of these barriers.
Here is a useful mnemonic for the hierarchy: FPTAS is a dream, PTAS is nice, APX is solid, Log-APX is acceptable, and inapproximable is a nightmare. When analyzing a new NP-hard problem, your first question should be: "Which rung of the ladder does it sit on?" This immediately tells you what techniques are worth trying.
Approximation algorithms come up in interviews under a specific pattern: "This problem is NP-hard. What do you do?" The interviewer is testing whether you (a) recognize the NP-hardness, (b) know what approximation ratios are achievable, and (c) can implement the key algorithms.
| Problem | Type | Best Ratio | Algorithm | Time |
|---|---|---|---|---|
| Vertex Cover | Min | 2 | Maximal matching | O(V+E) |
| TSP (Δ-ineq) | Min | 3/2 | Christofides | O(n3) |
| TSP (Euclidean) | Min | 1+ε | Arora PTAS | O(n logO(1/ε) n) |
| Set Cover | Min | ln n + 1 | Greedy | O(nm) |
| MAX-SAT | Max | 3/4 | Rand. rounding | O(LP time) |
| MAX-CUT | Max | 0.878 | SDP rounding | O(SDP time) |
| Knapsack | Max | 1+ε | FPTAS | O(n2/ε) |
| Bin Packing | Min | 1+ε | APTAS | poly(n, 1/ε) |
| Steiner Tree | Min | 1.39 | LP + iterative | O(LP time) |
| Scheduling (makespan) | Min | 4/3 | LPT | O(n log n) |
Pattern 1: "Minimize cost under coverage constraint"
If the problem looks like "cover everything at minimum cost," think Set Cover. Greedy gives O(ln n) ratio. If costs are unit, look for a 2-approximation via LP rounding or combinatorial methods.
Pattern 2: "Find shortest/cheapest tour"
First ask: does triangle inequality hold? If yes, MST doubling (ratio 2) or Christofides (ratio 3/2). If Euclidean, Arora's PTAS. If no metric assumption, explain that no approximation is possible.
Pattern 3: "Select subset maximizing value under constraints"
If it looks like Knapsack, remember the FPTAS. If it looks like MAX-SAT, think LP relaxation + randomized rounding. If it is MAX-CUT or constraint satisfaction, mention SDP rounding.
Pattern 4: "Graph selection problem"
Vertex cover, independent set, dominating set. Know that vertex cover has ratio 2, independent set is nearly inapproximable (no n1-ε), and dominating set is essentially set cover (ln n ratio).
Pattern 5: "Partition into groups optimally"
This is scheduling (partition jobs into machines) or bin packing (partition items into bins). For makespan scheduling, LPT gives 4/3. For bin packing, First Fit Decreasing gives 11/9 · OPT + 6/9. Both have PTAS/APTAS variants for tighter ratios.
Pattern 6: "This is hard but the instance is special"
Always check for special structure. Is the graph planar? Bounded treewidth? Is the metric Euclidean? Special structure often admits better approximations or even PTAS. For example, vertex cover on planar graphs has a PTAS (via Baker's technique), while general vertex cover is stuck at ratio 2.
Interviewer: "You need to place the minimum number of fire stations such that every house is within 5 miles of a station. How would you solve this?"
You: "This is the minimum dominating set problem on a distance graph — each node is a potential station location, and edges connect locations within 5 miles. This is NP-hard, so I would not try to find the exact optimum."
You (continued): "I would reduce it to set cover: each potential location has a 'coverage set' of houses within 5 miles. Then I would use the greedy set cover algorithm — at each step, place the station that covers the most uncovered houses. This gives an O(ln n) approximation, which means if the optimal solution needs 10 stations, I would use at most about 33."
You (continued): "If the geography is Euclidean (houses and stations on a 2D plane), I could exploit geometric structure for a better ratio. But for a road network with arbitrary distances, greedy set cover with O(ln n) is essentially the best we can do — it is provably tight."
The canvas below presents random instances. Implement and trace through the algorithms mentally, then check your answer.
A random graph or set cover instance is shown. Trace the approximation algorithm, predict the cover, then click "Show Answer".
When an interviewer says "this is NP-hard, what do you do?", walk through this hierarchy:
Trap 1: "Just use DP." If the problem has exponential state space (e.g., subset selection with n > 30), exact DP is infeasible. Know when to switch to approximation.
Trap 2: "Just use greedy." Greedy is only an approximation algorithm if you can prove a ratio. "Greedy works well in practice" is not a proof. Always state the ratio.
Trap 3: "This looks like vertex cover, so ratio 2." Be careful with reductions. Independent set is the complement of vertex cover, but it has a WILDLY different approximation ratio (near n1−ε inapproximability vs ratio 2). Complementation does not preserve approximation ratios.
Trap 4: Confusing PTAS existence with practical efficiency. A PTAS for Euclidean TSP exists, but for small ε it is impractical. In interviews, mention the PTAS for theoretical completeness but recommend Christofides for practical use.
Trap 5: Forgetting to state assumptions. The TSP 3/2-approximation requires the triangle inequality. Always state which variant you are solving. If the interviewer says "arbitrary distances," you must say "no constant approximation exists" — not "use Christofides."
Trap 6: Not knowing when exact is feasible. For small n (say n ≤ 20 for TSP, n ≤ 30 for vertex cover), exact algorithms via DP or branch-and-bound may run in seconds. Do not propose a 2-approximation when n = 10 and you could just solve it exactly. Know the crossover point.
python # 1. Vertex Cover 2-approx: O(V + E) # - Find maximal matching (greedy on edges) # - Return both endpoints of every matched edge # 2. TSP 2-approx (metric): O(n^2) for MST + O(n) for DFS # - Build MST (Prim's or Kruskal's) # - DFS preorder traversal of MST # - Return as Hamiltonian cycle # 3. Set Cover O(ln n)-approx: O(n * m) per step # - While uncovered elements exist: # - Pick set with most uncovered elements # - Mark its elements as covered # 4. Knapsack FPTAS: O(n^2 / epsilon) # - Scale values: v' = floor(v * n / (epsilon * v_max)) # - Run DP on scaled values # - Backtrack to find items
Approximation algorithms sit at the intersection of several major algorithmic themes. Here is how this chapter connects to the broader CLRS landscape and to real-world practice.
| Chapter | Connection |
|---|---|
| Ch 34: NP-Completeness | Approximation only matters because these problems are NP-hard. Ch 34 establishes the hardness; Ch 35 asks "now what?" |
| Ch 16: Greedy Algorithms | Set cover's greedy algorithm is a direct application of the greedy paradigm. The difference: in Ch 16, greedy is exact; here, greedy is approximate with a proven ratio. |
| Ch 23: MST | The MST is the lower bound for metric TSP. Prim's and Kruskal's algorithms are subroutines in the TSP approximation. |
| Ch 29: Linear Programming | LP relaxation is the foundation of randomized rounding and the vertex cover LP-based 2-approximation. Duality gives lower bounds on OPT. |
| Ch 15: Dynamic Programming | The knapsack FPTAS uses DP as a subroutine, with scaled values to control table size. DP + scaling = approximation scheme. |
| Approach | Guarantee | When to Use |
|---|---|---|
| Exact (brute force/DP) | Optimal | Small instances (n ≤ 20) |
| Approximation algorithms | Provable ratio | When you need guarantees |
| Heuristics (simulated annealing, genetic) | None | When speed matters, guarantees don't |
| Fixed-parameter tractable | Exact, parameterized | When a parameter (treewidth, etc.) is small |
| Randomized algorithms | Expected or w.h.p. | When randomness helps (MAX-CUT, SAT) |
Vehicle routing: TSP variants with time windows, capacity constraints. Christofides-based heuristics are the backbone of logistics optimization at UPS, FedEx, and Amazon. Google's OR-Tools routing solver uses MST-based initial solutions with local search improvement, processing millions of deliveries daily.
Network design: Set cover and Steiner tree approximations power cell tower placement, server farm location, and CDN design. When Akamai or Cloudflare decides where to place edge servers, the underlying optimization is a facility location problem (a close cousin of set cover) solved via LP-based approximations.
Scheduling: Makespan minimization on parallel machines uses LPT (Longest Processing Time first), a 4/3-approximation that runs in O(n log n). Every cloud scheduler (Kubernetes, Mesos, Borg) uses scheduling approximations to assign pods to nodes.
Compiler optimization: Register allocation is a graph coloring problem. Approximation and heuristic coloring algorithms decide which variables live in registers vs memory. LLVM uses a greedy graph coloring heuristic that has no formal approximation ratio but works well in practice.
Machine learning: Feature selection (pick k features to maximize prediction accuracy) is related to set cover. Clustering (k-means, k-median) has constant-factor approximation algorithms. The celebrated k-means++ initialization (Arthur and Vassilvitskii, 2007) is an O(log k)-approximation for k-means clustering, and it runs in O(nk) time. It is the default initialization in scikit-learn.
Bioinformatics: Genome assembly, sequence alignment, and phylogenetic tree construction all involve NP-hard optimization problems. The Steiner tree problem (connect a subset of "terminal" nodes in a graph at minimum cost) arises in network biology and has a 1.39-approximation via LP-based methods.
VLSI design: Circuit layout involves placing components and routing wires to minimize total wire length — this is a collection of NP-hard problems (graph partitioning, Steiner tree, TSP variants). VLSI CAD tools like Cadence and Synopsys use approximation algorithms as starting points for simulated annealing and other metaheuristics.
When faced with an NP-hard optimization problem, work through this decision tree:
Modern ML systems often solve NP-hard optimization problems using learned heuristics (e.g., neural network-guided TSP solvers). These can produce excellent solutions in practice but come with no worst-case guarantees. The approximation algorithm community's response: use the ML heuristic for speed, but verify using the approximation algorithm as a baseline. If the ML solution is worse than the 2-approximation, something is wrong with the model.
There is also an emerging field of learning-augmented algorithms: use ML predictions to improve worst-case algorithms. For example, a TSP solver might use a neural network to predict a good starting tour, then prove that if the prediction is ε-close to optimal, the algorithm achieves a (1 + O(ε))-approximation. If the prediction is garbage, the algorithm still guarantees ratio 2. Best of both worlds: good performance when ML works, guaranteed safety when it does not.
Several major questions remain open in approximation algorithms:
Vertex Cover: Is the ratio of 2 optimal for polynomial-time algorithms? The best inapproximability result (assuming UGC) says no (2 − ε)-approximation exists. But UGC itself is unproven. Without UGC, the best known hardness is about 1.36 (Dinur and Safra, 2005). The gap between 1.36 and 2 has been open for decades.
Metric TSP: Christofides' ratio of 3/2 has been essentially unimproved since 1976 (the 2020 improvement of 10−36 barely counts). Can we get below 1.4? 1.3? Nobody knows. The best lower bound is 123/122 ≈ 1.008 (from NP-hardness), leaving an enormous gap. The true answer might be anything between 1.008 and 1.5 — that is how little we know about the hardness of metric TSP.
Unique Games Conjecture: Resolving UGC would settle approximation ratios for dozens of problems simultaneously. It is the single most impactful open problem in the theory of approximation. Subhash Khot received the Nevanlinna Prize (now the Abacus Medal) in 2014 for formulating it.
Beyond worst case: Can we prove better ratios for "typical" or "random" inputs? The field of average-case approximation is nascent but promising. For instance, on random graphs, the greedy vertex cover algorithm empirically achieves ratio ≈ 1.1, far better than the worst-case 2. Making such average-case bounds rigorous is an active research area.
If this chapter sparked your interest, here are the natural next steps. For the theory: Vazirani's "Approximation Algorithms" textbook goes far deeper into LP rounding, semidefinite programming, and primal-dual methods. Williamson and Shmoys' "Design of Approximation Algorithms" is the modern reference. For the practice: solve approximation problems on competitive programming platforms (Codeforces tags: greedy, graphs, flow). For the frontier: read about the resolution of the Unique Games Conjecture and the new results on the hardness of MAX-CUT.
Within this lesson series, the most relevant connections are:
| Lesson | Why It Connects |
|---|---|
| Ch 16: Greedy Algorithms | Greedy is the most common approximation technique. Set cover greedy is Chapter 16 applied to NP-hard problems. |
| Ch 34: NP-Completeness | You need to know a problem is NP-hard before approximation makes sense. Reductions also prove inapproximability. |
Approximation algorithms represent a profound shift in thinking. Classical algorithm design asks: "What is the optimal solution?" Approximation asks: "How close can I get, how fast, and can I prove it?" This shift from optimality to bounded sub-optimality is not a defeat — it is a mature response to computational reality.
The P ≠ NP conjecture (if true) means that thousands of natural optimization problems cannot be solved exactly in polynomial time. Approximation algorithms say: "Fine. We cannot find the best answer. But we can find an answer that is within 50% of best, or within 10%, or within 0.1%, and we can prove it." That is not giving up. That is engineering.
The deepest lesson from this chapter is that the quality of an approximation is itself computable and provable. We do not just hope our heuristic works. We prove it does, on every input, against every adversary. The approximation ratio is not a suggestion — it is a theorem.
This mathematical certainty is what separates approximation algorithms from heuristics. A simulated annealing algorithm for TSP might produce excellent tours in practice, but it comes with no guarantee — on some pathological input, it might produce a tour 100x worse than optimal. Christofides will never be worse than 1.5x. When lives depend on the answer (medical scheduling, emergency routing, infrastructure design), that guarantee is worth its weight in gold.
As a final thought: the existence of approximation algorithms shows that the NP-hardness barrier is not a wall but a slope. Some problems (knapsack) can be climbed almost to the top with an FPTAS. Others (vertex cover, TSP) have a ledge at a constant factor where we are stuck. And a few (clique, general TSP) are sheer cliffs where no polynomial-time approach can gain any meaningful foothold. Mapping this slope — problem by problem, ratio by ratio — is one of the great achievements of 20th-century theoretical computer science, and the work continues today.
"The art of approximation is not settling for less — it is proving how little you have lost."