Introduction to Algorithms (CLRS) — Chapter 35

Approximation Algorithms

When exact is impossible, good enough with a guarantee is gold. Vertex cover, TSP, set cover, randomized rounding, PTAS, and inapproximability.

Prerequisites: NP-completeness (Ch 34 — know what NP-hard means) + Greedy algorithms (Ch 16) + LP basics (Ch 29 — just know what an LP is).
10
Chapters
9+
Simulations
5
Core Algorithms

Chapter 0: The Problem

You are a delivery driver. You have 15 stops to make today, and you want the shortest route that visits all of them and returns home. This is the Travelling Salesman Problem (TSP). In Chapter 34 you learned it is NP-hard: nobody knows a polynomial-time algorithm that finds the exact optimal tour, and most complexity theorists believe no such algorithm exists.

So you are stuck. Your boss does not care about P versus NP. He wants a route by 8 AM. What do you do?

You find a route that is pretty good. Not perfect, but provably close. Instead of examining all 15! = 1,307,674,368,000 possible orderings (that would take longer than the age of the universe on any computer), you run a fast algorithm that guarantees a tour at most 1.5 times longer than the optimal one. It finishes in milliseconds. Your boss is happy. Complexity theory is satisfied. Everyone wins.

This is the domain of approximation algorithms. When an optimization problem is NP-hard, we abandon the quest for the exact optimum and instead seek polynomial-time algorithms that come with a mathematical guarantee: the answer will be within a known factor of optimal.

Think of it this way. You are designing a bridge. An exact stress analysis might take weeks of supercomputer time. An approximate analysis that is guaranteed to overestimate the stress by at most 50% takes ten minutes. You build the bridge 50% stronger than the approximation suggests, and you are guaranteed safe. The extra 50% is the "cost of approximation" — and it is vastly cheaper than waiting weeks for exactness.

The Approximation Ratio

Let C be the cost of the solution your algorithm produces, and let C* be the cost of the optimal solution. For a minimization problem, the approximation ratio ρ(n) satisfies:

C / C* ≤ ρ(n)

For a maximization problem, we flip it:

C* / C ≤ ρ(n)

In both cases, ρ(n) ≥ 1. A ratio of 1 means exact. A ratio of 2 means your answer is at most twice as bad as optimal. A ratio of ln(n) means it degrades logarithmically with input size. The closer to 1, the better.

An algorithm with ratio ρ(n) is called a ρ(n)-approximation algorithm. The holy grail is a polynomial-time approximation scheme (PTAS): for any ε > 0, a (1 + ε)-approximation in polynomial time. Some NP-hard problems have a PTAS. Many do not. Understanding which problems allow how-good an approximation is one of the deepest questions in theoretical computer science.

Let us work through a concrete example to make the ratio tangible. Suppose you are solving minimum vertex cover on a graph with 50 vertices. The optimal cover has 12 vertices (you do not know this — computing it is NP-hard). You run a 2-approximation and get a cover of 20 vertices. The actual ratio: 20/12 ≈ 1.67. The guaranteed worst-case ratio: 2. So the guarantee says at most 24 vertices, and you did better. This is typical — the guarantee is a ceiling, not a prediction.

The core insight. Approximation algorithms are not about giving up. They are about being precise about how much you are giving up. An algorithm with a proven ratio of 2 is infinitely more valuable than a heuristic that "usually works well" — because the guarantee holds on EVERY input, including adversarial ones your test suite never imagined.

A Taste: Optimal vs Approximate

The canvas below shows a TSP instance: cities scattered on a plane. The blue tour is the optimal solution (computed by brute force for this small instance). The orange tour is a 2-approximation computed by the MST-based algorithm you will learn in Chapter 2. Watch how close the approximate tour is to optimal — and how much faster it was to compute.

TSP: Optimal Tour vs 2-Approximation

Blue = optimal tour (brute force). Orange = 2-approx via MST doubling. Click "New Cities" to regenerate. The status shows tour lengths and the actual ratio.

Click "New Cities" to generate a TSP instance.

Notice two things. First, the approximate tour is always within 2x of the optimal length — often much closer. Second, the brute-force optimal computation takes visibly longer even for just 8 cities. At 20 cities, brute force would take hours. At 50, it would outlast the heat death of the universe. The approximation runs in milliseconds regardless.

Key insight. The approximation ratio is a worst-case guarantee. On most real inputs, the approximate solution is much closer to optimal than the ratio would suggest. But having the guarantee is what lets you deploy with confidence.

Minimization vs Maximization

The definition of approximation ratio differs slightly depending on whether you are minimizing or maximizing. Let us be precise.

For a minimization problem (e.g., TSP, vertex cover, set cover): your algorithm produces a solution of cost C, the optimum is C*. The ratio is C / C*. Since C ≥ C*, the ratio is ≥ 1. A 2-approximation means C ≤ 2 · C*. You are paying at most twice the optimal cost.

For a maximization problem (e.g., MAX-SAT, MAX-CUT, knapsack): your algorithm produces value C, the optimum is C*. The ratio is C* / C. Since C ≤ C*, the ratio is ≥ 1. A 2-approximation means C ≥ C* / 2 — you are getting at least half the optimal value. Some authors instead define the ratio as C / C* ∈ (0, 1] for maximization and call it the "performance guarantee." We will use the ≥ 1 convention throughout, as CLRS does.

In both cases, a ratio closer to 1 is better. A ratio of 1 means exact. A ratio of ∞ means no useful guarantee.

What Makes a Good Lower Bound?

Every approximation proof has the same skeleton: (1) find a lower bound on OPT that you can compute efficiently, and (2) show your algorithm's cost is at most ρ times that lower bound. The art is finding the right lower bound.

Common lower bounds used in this chapter:

ProblemLower Bound on OPTWhy It Works
Vertex CoverSize of maximal matchingOPT must cover each matching edge, using distinct vertices
TSPMST costRemoving one edge from any tour gives a spanning tree
Set CoverLP relaxation valueRelaxing integrality can only decrease the optimum
KnapsackFractional knapsack valueAllowing fractional items can only increase the value
The lower-bound principle. You cannot compare your algorithm to OPT directly — you do not know OPT (computing it is NP-hard!). So you compare to a lower bound L ≤ OPT. If your cost C ≤ ρ · L, then C ≤ ρ · OPT. The lower bound is your proxy for the unknowable optimum.

The Landscape

Not all NP-hard problems are equally hard to approximate. Here is a preview of what we will learn:

ProblemBest ApproximationHardness
Vertex Cover2-approximationNo (2 − ε) unless UGC
TSP (with Δ-ineq)3/2-approximation (Christofides)No PTAS unless P=NP
TSP (general)No constant ratioNo ρ(n) for any computable ρ
Set Cover(ln n + 1)-approximationNo (1 − ε)ln n unless P=NP
KnapsackFPTAS: (1 + ε) for any εFully poly-time approx. scheme exists
MAX-3SAT7/8-approximationNo (7/8 + ε) unless P=NP

This table is the roadmap for the chapter. By the end, you will understand every entry: the algorithm, the proof, and why the hardness barrier exists.

The Three Pillars of Approximation

Every approximation algorithm in this chapter rests on one of three techniques. Understanding which technique to reach for is half the battle.

Pillar 1: Combinatorial algorithms. Directly construct a solution using a simple rule (greedy, matching, etc.) and prove its quality by comparing to a combinatorial lower bound. Examples: vertex cover via maximal matching, set cover via greedy, TSP via MST. These are the simplest to implement and the first thing to try.

Pillar 2: LP relaxation and rounding. Formulate the problem as an integer linear program, relax to a linear program (which gives a lower/upper bound), solve, and round the fractional solution to integers. The LP value serves as the bound on OPT. Examples: weighted vertex cover, MAX-SAT, set cover (the LP gives a tighter analysis than the combinatorial proof).

Pillar 3: Scaling and DP. Scale the input to reduce the problem size, solve the scaled problem exactly with DP, and bound the error from scaling. This is how FPTAS works for knapsack and other problems with pseudo-polynomial exact algorithms.

Choosing your weapon. If the problem has obvious greedy structure, start with Pillar 1. If you need a fractional lower bound or the problem has complex constraints, go to Pillar 2. If the problem already has a pseudo-polynomial DP, Pillar 3 gives you an FPTAS. Many problems can be attacked by multiple pillars — vertex cover has both a combinatorial 2-approximation and an LP-based 2-approximation.
Quick check: An algorithm for a minimization problem produces a solution of cost 150. The optimal cost is 100. What is the approximation ratio achieved on this instance?

Chapter 1: Vertex Cover 2-Approximation

In Chapter 0 we established the goal: polynomial-time algorithms with provable guarantees. Now let us see the simplest such algorithm — so simple it fits in five lines, yet its proof is surprisingly elegant.

The Vertex Cover Problem

Given an undirected graph G = (V, E), a vertex cover is a subset S ⊆ V such that every edge in E has at least one endpoint in S. The optimization problem: find the smallest vertex cover. This is NP-hard (it was one of Karp's original 21 NP-complete problems).

Think of it as a security problem. You have a building with hallways (edges) connecting rooms (vertices). You need to place security cameras in rooms such that every hallway is watched by at least one camera. You want to use as few cameras as possible.

Vertex cover is the canonical "easy to approximate, hard to solve exactly" problem. The 2-approximation is trivially simple (five lines of code). But finding the exact minimum cover is NP-hard. And improving the ratio below 2 is a 50-year open problem — one of the most famous in theoretical computer science.

Why Not Just Pick the Highest-Degree Vertex?

Your first instinct might be: "Pick the vertex with the most edges, add it to the cover, remove its edges, repeat." This greedy-by-degree strategy is intuitive and often works well in practice. But it is NOT a 2-approximation. On certain graphs (specifically, graphs constructed from set-cover-like instances), greedy-by-degree can produce covers of size Ω(ln n) · OPT. It degrades like set cover, not like vertex cover.

The correct algorithm is shockingly different — and counterintuitively wasteful.

The Algorithm: APPROX-VERTEX-COVER

Here is the entire algorithm:

Initialize
C = {} (empty cover), E' = copy of all edges
Pick Edge
Pick any edge (u, v) from E'
Add Both
Add BOTH u and v to cover C
Remove
Remove from E' all edges incident to u or v
↻ repeat while E' ≠ ∅

That is it. Pick an edge, add both endpoints, remove all now-covered edges, repeat. The set of picked edges forms a maximal matching — no two picked edges share an endpoint, and no remaining edge can extend the matching.

Why Is This a 2-Approximation?

Let A be the set of edges we picked. No two edges in A share an endpoint (we removed all incident edges after picking), so |A| edges contribute 2|A| vertices to our cover C. Thus |C| = 2|A|.

Now consider the optimal cover C*. For each edge (u, v) in A, at least one of u or v must be in C* (otherwise that edge would be uncovered). Here is the crucial observation: since no two edges in A share an endpoint (the matching property), the vertices forced into C* by different matching edges are all distinct. Edge (u1, v1) forces at least one of u1, v1 into C*. Edge (u2, v2) forces at least one of u2, v2. Since {u1, v1} and {u2, v2} are disjoint (no shared endpoints), these are different vertices being added to C*. Therefore |C*| ≥ |A|.

Combining: |C| = 2|A| ≤ 2|C*|. The approximation ratio is 2.

Read that proof again slowly. It has only four sentences and three inequalities, yet it establishes a result that has withstood 50 years of attempts at improvement. The simplicity of the proof matches the simplicity of the algorithm — this is a hallmark of beautiful mathematics.

The proof structure is called the lower-bound method: find something computable (|A|) that (a) bounds your cost from above: |C| = 2|A|, and (b) bounds OPT from below: |C*| ≥ |A|. The ratio ρ = upper/lower = 2|A| / |A| = 2. Every approximation proof in this chapter follows this template.

The lower-bound trick. The proof has two parts: (1) bound the algorithm's cost in terms of |A|, and (2) show OPT is at least |A|. This is the universal template for approximation proofs: find a lower bound on OPT, then show your algorithm's cost is at most ρ times that lower bound.

Implementation

python
def approx_vertex_cover(adj):
    # adj: dict of {node: set of neighbors}
    cover = set()
    edges = set()
    for u in adj:
        for v in adj[u]:
            if (u, v) not in edges and (v, u) not in edges:
                edges.add((u, v))
    remaining = set(edges)
    matching = []
    while remaining:
        u, v = remaining.pop()        # pick any edge
        cover.add(u)
        cover.add(v)                   # add BOTH endpoints
        matching.append((u, v))
        # remove edges incident to u or v
        remaining = {(a, b) for (a, b) in remaining
                     if a != u and a != v and b != u and b != v}
    return cover, matching

# Example: triangle with pendant
adj = {0: {1,2}, 1: {0,2,3}, 2: {0,1}, 3: {1}}
cover, matching = approx_vertex_cover(adj)
print("Cover:", cover)       # e.g., {0, 1, 2, 3} or {0, 1} + {2, 3}
print("Matching:", matching) # edges we picked
print("|C| =", len(cover), "<= 2 * OPT")

The algorithm runs in O(V + E) time. For each edge picked, we scan incident edges to remove them. The total work is proportional to the sum of degrees of picked vertices, which is at most 2|E|.

Worked Example: Hand Trace

Consider a graph with vertices {0, 1, 2, 3, 4, 5} and edges {(0,1), (0,2), (1,2), (1,3), (2,4), (3,4), (4,5)}. Let us trace APPROX-VERTEX-COVER step by step.

StepEdge PickedAdded to CoverEdges RemovedRemaining Edges
1(0, 1){0, 1}(0,1), (0,2), (1,2), (1,3){(2,4), (3,4), (4,5)}
2(2, 4){0, 1, 2, 4}(2,4), (3,4), (4,5){}

The algorithm picked 2 edges, producing a cover of size 4: {0, 1, 2, 4}. The matching has 2 edges, so OPT ≥ 2. In fact, OPT = 3 (the cover {1, 2, 4} works — verify: edge (0,1) is covered by 1, (0,2) by 2, (1,2) by both, (1,3) by 1, (2,4) by both, (3,4) by 4, (4,5) by 4). Our ratio on this instance: 4/3 ≈ 1.33, well within the guaranteed bound of 2.

Could we do worse? Yes. Consider a star graph: one central vertex connected to n leaves. The algorithm might pick any edge, adding both the center and a leaf (2 vertices). But OPT = 1 (just the center). As we add more leaves, the ratio approaches 2 but never exceeds it.

Tight example. The ratio of 2 is tight. For a perfect matching (n/2 disjoint edges), the algorithm adds all n vertices. But OPT is n/2 (one endpoint per edge suffices). The ratio is exactly 2. So the analysis is not loose — the algorithm really can be that bad, and there is no easy way to improve it.

Watch It Build

The canvas below shows the algorithm running step by step. Green nodes are in the cover. Red edges are the ones we picked (the maximal matching). Gray edges are already covered. Click "Step" to advance one edge pick at a time, or "Run" to animate.

Vertex Cover 2-Approximation — Step by Step

Green = in cover. Red edge = just picked. Gray = already covered. Watch both endpoints get added.

Click "New Graph" to generate a random graph, then "Step" or "Run".
Why both endpoints? Adding only one endpoint per edge (the "better" choice) would require solving the problem optimally — that is exactly what makes it NP-hard. Adding both is wasteful by at most a factor of 2, but it is computable in polynomial time. The magic: a factor of 2 is the best known polynomial-time guarantee for vertex cover.

Alternative Approach: LP Rounding

We can also get a 2-approximation via LP relaxation. Formulate vertex cover as an ILP: minimize ∑ xv subject to xu + xv ≥ 1 for each edge, xv ∈ {0, 1}. Relax to xv ∈ [0, 1]. Solve the LP. Round: if xv ≥ 0.5, include v in the cover.

Why is this valid? For each edge (u, v), the LP says xu + xv ≥ 1. So at least one of them is ≥ 0.5, and at least one endpoint is in the rounded cover. The cost? Each xv that rounds to 1 had LP value ≥ 0.5, so the rounded cost ≤ 2 × LP* ≤ 2 × OPT. Same ratio, different technique.

The LP approach is more general: it extends naturally to weighted vertex cover (where vertices have costs and you minimize total cost), while the matching-based approach handles only the unweighted case cleanly.

The Power of LP Duality

LP relaxation for approximation connects to LP duality, one of the deepest ideas in optimization. The dual of the vertex cover LP is the maximum matching LP. Strong duality says LP* = dual LP*. This means the LP-based 2-approximation is really saying: "the optimal vertex cover is at least as large as the maximum fractional matching, and our rounded solution is at most twice the fractional matching." The duality gives us the lower bound for free.

Many advanced approximation algorithms use the primal-dual method: simultaneously build a feasible primal solution (the cover) and a feasible dual solution (the matching/packing), and show their costs are within a factor of ρ. This is the workhorse technique behind approximation algorithms for network design, facility location, and Steiner tree problems.

Matching vs LP: Two Proofs, Same Ratio

It is instructive to see that both approaches give ratio 2 via different lower bounds on OPT. The matching approach uses |A| (matching size) as the lower bound. The LP approach uses LP* (fractional optimum). In general, LP* ≥ |A| (the LP is a tighter relaxation), so LP rounding can sometimes produce a smaller cover than the matching approach on the same instance. But both guarantee ratio 2.

Half-Integrality of Vertex Cover LP

A remarkable structural property: the vertex cover LP always has an optimal solution where every variable is 0, 1/2, or 1. This is called half-integrality. It means the LP relaxation is not as "continuous" as you might expect — it naturally gravitates toward half-integer values.

Why does this matter? It means the rounding step (threshold at 0.5) rounds every variable by at most 0.5, giving a cleaner analysis. It also means that if the LP happens to have an integer optimum (all 0s and 1s), the rounding is exact and the approximation matches OPT perfectly. This happens surprisingly often on structured graphs like bipartite graphs (where vertex cover is solvable in polynomial time via maximum matching + Konig's theorem).

Half-integrality is special to vertex cover. Most LP relaxations do not have this property, which is why rounding other LPs requires more sophisticated techniques (randomized rounding, iterative rounding, etc.).

Quick check: The APPROX-VERTEX-COVER algorithm picks 5 edges for the maximal matching. How large is the cover, and what can you say about the optimal cover?

Chapter 2: TSP Approximation

The Travelling Salesman Problem is perhaps the most famous NP-hard optimization problem. Given n cities and pairwise distances, find the shortest tour that visits every city exactly once and returns to the start. We cannot solve it exactly in polynomial time (unless P = NP). But under a natural assumption, we can approximate it well.

The Triangle Inequality Assumption

We say distances satisfy the triangle inequality if for all cities u, v, w:

d(u, w) ≤ d(u, v) + d(v, w)

This is just "the direct path is never longer than a detour." It holds for Euclidean distances, road networks, shortest-path distances in any graph, and most real-world settings. With this assumption, we get good approximations. Without it, as we will see in Chapter 7, TSP is inapproximable — no polynomial-time algorithm can achieve any constant ratio, or even any computable ratio.

Formally, a TSP instance satisfying the triangle inequality is called metric TSP. A special case is Euclidean TSP, where cities are points in Rd and distances are Euclidean (straight-line). Euclidean TSP admits even better approximations than general metric TSP — it has a PTAS (see Chapter 6).

Algorithm 1: MST-Based 2-Approximation

The idea is beautifully simple. A Hamiltonian cycle (tour) visits every vertex. If you remove one edge from the tour, you get a spanning tree. Therefore the cost of the minimum spanning tree (MST) is a lower bound on the optimal tour cost: MST ≤ OPT.

Now run the algorithm:

Step 1: MST
Compute the minimum spanning tree T of the complete graph
Step 2: Double
Double every edge in T to get a multigraph (every vertex now has even degree)
Step 3: Euler Tour
Find an Eulerian circuit in the multigraph (visits every edge once)
Step 4: Shortcut
Convert to Hamiltonian cycle by skipping already-visited vertices

The doubled MST has cost 2 · MST ≤ 2 · OPT. The shortcutting step can only decrease cost (by the triangle inequality, skipping a vertex via a direct edge is never longer). So the final tour costs at most 2 · OPT. We have a 2-approximation.

Algorithm 2: Christofides' 3/2-Approximation

Christofides (1976) improved the ratio to 3/2 with a clever observation. Instead of doubling all MST edges, he added only enough edges to make the MST Eulerian. A graph has an Eulerian circuit if and only if every vertex has even degree. The MST has some odd-degree vertices. We need to add edges that make all degrees even.

The key step: find a minimum-weight perfect matching on the odd-degree vertices. Adding these matching edges to the MST creates a multigraph where every vertex has even degree. Then find an Euler tour, shortcut, done.

Why 3/2? The MST costs at most OPT. The matching costs at most OPT/2 (we can show this by considering the optimal tour restricted to odd-degree vertices — it forms a Hamiltonian cycle on them, and the cheaper of its two perfect matchings costs at most OPT/2). Total: OPT + OPT/2 = 3/2 · OPT.

45 years and counting. Christofides' ratio of 3/2 stood as the best known approximation for metric TSP from 1976 until 2020, when Karlin, Klein, and Oveis Gharan improved it to 3/2 − 10−36. That is a vanishingly small improvement after 44 years. TSP is hard to approximate even to the constant we have.

Worked Example: MST-Based 2-Approximation

Consider 5 cities at coordinates: A=(0,0), B=(1,0), C=(2,0), D=(1,1), E=(0,1). The complete distance matrix (Euclidean):

ABCDE
A01.002.001.411.00
B1.0001.001.001.41
C2.001.0001.412.24
D1.411.001.4101.00
E1.001.412.241.000

Step 1: MST. Using Prim's from A: edges A-B (1.0), B-C (1.0), B-D (1.0), A-E (1.0). MST cost = 4.0.

Step 2: Double. Every MST edge appears twice. Total cost = 8.0. Every vertex now has even degree.

Step 3: Euler tour. One possible tour: A → B → C → B → D → B → A → E → A. (Visits every edge twice.)

Step 4: Shortcut. DFS preorder on MST (equivalent): A → B → C → D → E → A. Cost: 1.0 + 1.0 + 1.41 + 1.0 + 1.0 = 5.41.

Check: Tour cost 5.41 ≤ 2 × MST cost 4.0 = 8.0. And the optimal tour is A → B → C → D → E → A = 5.41 (this happens to be optimal for this configuration). Ratio: 5.41/5.41 = 1.0 on this instance. We got lucky — on this symmetric input, the approximation equals the optimum.

Why the Triangle Inequality Is Essential

The shortcutting step — replacing a path A → B → C with a direct edge A → C — relies on the triangle inequality: d(A,C) ≤ d(A,B) + d(B,C). If this fails, the shortcut could be longer than the original path, and the 2× guarantee collapses.

Concrete counterexample: three cities A, B, C with d(A,B) = 1, d(B,C) = 1, d(A,C) = 1000. The triangle inequality is violated (1000 > 1 + 1). The MST is A-B-C, cost 2. The doubled MST tour A-B-C-B-A costs 4. But shortcutting to A-B-C-A costs 1 + 1 + 1000 = 1002. The optimal tour is A-B-C-B-A = 4. The shortcut made things catastrophically worse.

This is not just a toy example. In real networks, triangle inequality violations are common when distances represent costs (not physical distance). For instance, a direct flight from A to C might cost $1000 while connecting through B costs $50 + $50 = $100. The "shortcut" (direct flight) is far more expensive than the "detour" (connecting flight). Any TSP algorithm assuming triangle inequality would give garbage results on airline pricing data.

Always check your assumptions. Before applying the MST-based TSP approximation, verify that your distances satisfy the triangle inequality. For Euclidean distances (straight-line on a map): always holds. For shortest-path distances in a graph: always holds. For arbitrary weights, costs, or latencies: often does NOT hold. In the non-metric case, your only options are heuristics without guarantees or exact solvers for small instances.

Watch the MST-to-Tour Conversion

The canvas below shows the MST approach step by step. First the MST (green edges), then the doubled edges (dashed), then the Euler tour, then the final shortcut tour (orange).

TSP: MST → Double → Euler Tour → Shortcut

Step through the MST-based 2-approximation for TSP. Green = MST. Orange = final Hamiltonian tour.

Click "New Cities" to generate a TSP instance, then step through.
python
import heapq
from itertools import combinations

def tsp_mst_approx(cities):
    """2-approx for metric TSP via MST doubling + shortcut."""
    n = len(cities)
    # Prim's MST
    dist = lambda i,j: ((cities[i][0]-cities[j][0])**2+(cities[i][1]-cities[j][1])**2)**0.5
    in_mst = [False]*n
    adj = [[] for _ in range(n)]
    heap = [(0, 0, -1)]  # (cost, node, parent)
    mst_cost = 0
    while heap:
        c, u, p = heapq.heappop(heap)
        if in_mst[u]: continue
        in_mst[u] = True; mst_cost += c
        if p >= 0:
            adj[u].append(p); adj[p].append(u)
        for v in range(n):
            if not in_mst[v]:
                heapq.heappush(heap, (dist(u,v), v, u))

    # DFS preorder traversal = shortcut of doubled MST Euler tour
    visited = [False]*n
    tour = []
    def dfs(u):
        visited[u] = True; tour.append(u)
        for v in adj[u]:
            if not visited[v]: dfs(v)
    dfs(0)
    tour.append(tour[0])  # return to start

    tour_cost = sum(dist(tour[i], tour[i+1]) for i in range(len(tour)-1))
    return tour, tour_cost, mst_cost
Shortcutting = DFS preorder. Instead of literally doubling edges and finding an Euler tour, we can just do a DFS preorder traversal of the MST. The preorder visits each vertex exactly once, skipping over previously visited ones — which is exactly what the shortcutting step does. Same result, simpler code.

Christofides' Algorithm: The Full Picture

Christofides improves the ratio from 2 to 3/2 by being smarter about which edges to add. Instead of doubling ALL MST edges (wasteful), add only enough edges to make the MST Eulerian. The key insight: a multigraph has an Eulerian circuit if and only if every vertex has even degree.

The MST has some vertices with odd degree. By the handshaking lemma, the number of odd-degree vertices is even. Find a minimum-weight perfect matching on these odd-degree vertices (this can be done in O(n3) time using Edmonds' algorithm). Add the matching edges to the MST. Now every vertex has even degree. Find an Euler tour, shortcut, done.

The cost analysis:

Cost(MST) ≤ OPT (removing one tour edge gives a spanning tree)
Cost(Matching) ≤ OPT/2 (see proof below)
Cost(Christofides tour) ≤ Cost(MST) + Cost(Matching) ≤ 3/2 · OPT

Why is the matching cost at most OPT/2? Consider the optimal tour restricted to the odd-degree vertices (in MST order). This restricted tour visits all odd-degree vertices and forms a Hamiltonian cycle on them. This cycle can be decomposed into two perfect matchings (alternate edges). The cheaper matching costs at most half the cycle cost, which is at most OPT/2 (since the cycle uses only edges of the optimal tour, and the triangle inequality means shortcuts do not increase cost).

Comparison of TSP Approaches

AlgorithmRatioTimeKey Idea
Nearest neighborO(log n)O(n2)Greedy, no guarantee better than log n
MST doubling2O(n2)Double MST edges, Euler tour, shortcut
Christofides3/2O(n3)MST + min matching on odd-degree vertices
Arora PTAS1 + εO(n logO(1/ε) n)Recursive dissection (Euclidean only)

In practice, Christofides combined with 2-opt local search (repeatedly swap two edges if it shortens the tour) produces tours within 1-5% of optimal on most real-world instances. The approximation algorithm provides the starting point; local search refines it.

The 2-Opt Improvement Heuristic

After computing an approximate tour, a simple local search improves it: pick two edges, remove them (splitting the tour into two paths), and reconnect in the only other possible way. If the new tour is shorter, keep it. Repeat until no improving swap exists. This is 2-opt, and it typically improves a Christofides tour by 5-15% on real instances.

python
def two_opt(tour, dist_fn):
    """Improve a tour using 2-opt swaps. Returns improved tour."""
    n = len(tour) - 1  # tour[0] == tour[n]
    improved = True
    while improved:
        improved = False
        for i in range(1, n - 1):
            for j in range(i + 1, n):
                # Cost of removing edges (i-1,i) and (j,j+1)
                # vs adding edges (i-1,j) and (i,j+1)
                d1 = dist_fn(tour[i-1], tour[i]) + dist_fn(tour[j], tour[j+1])
                d2 = dist_fn(tour[i-1], tour[j]) + dist_fn(tour[i], tour[j+1])
                if d2 < d1:
                    tour[i:j+1] = tour[i:j+1][::-1]  # reverse segment
                    improved = True
    return tour

2-opt has no approximation guarantee (it can get stuck in local optima), but starting from a Christofides tour gives it a strong head start. The combination of "provably good start + local refinement" is the industry standard for TSP and vehicle routing.

Quick check: Why does the MST-based TSP approximation fail without the triangle inequality?

Chapter 3: Set Cover

You run a radio station and want every town in your region to receive your signal. You have a menu of transmitter locations, each covering a different set of towns. Each transmitter costs money to build. You want to cover all towns at minimum cost. This is the Set Cover problem.

Formal Definition

Given a universe U of n elements and a collection S = {S1, S2, ..., Sm} of subsets of U, find the smallest sub-collection C ⊆ S such that ∪S ∈ C S = U. (The weighted version assigns costs to sets and minimizes total cost.)

Set cover is NP-hard (reducible from vertex cover). But the greedy algorithm gives a surprisingly good approximation — and it is essentially the best possible.

Why Set Cover Matters

Set cover is a "meta-problem" — dozens of practical problems are set cover in disguise:

Real ProblemUniverse (elements)Sets
Cell tower placementHouses to coverCoverage area of each tower
Test suite selectionCode paths to testCode paths covered by each test
Hiring a teamSkills neededSkills of each candidate
Feature selectionData points to classifyPoints correctly classified by each feature
Server placementUsers to serveUsers reachable from each location

If you can model your problem as set cover, you immediately get the O(ln n) greedy approximation with zero algorithmic creativity required. This is the power of reduction: one algorithm solves dozens of problems.

The connection to vertex cover is also worth noting. Minimum vertex cover is a special case of set cover where each set contains exactly the edges incident to a vertex. The greedy set cover algorithm applied to vertex cover gives an O(ln Δ) ratio (where Δ is the max degree). For dense graphs this can be worse than the simple 2-approximation, but for sparse graphs it can be better. Knowing both algorithms lets you pick the better one for your instance.

The Greedy Algorithm

The greedy strategy is obvious: at each step, pick the set that covers the most uncovered elements. Repeat until everything is covered.

python
def greedy_set_cover(universe, sets):
    """Greedy set cover: always pick the set covering the most uncovered elements."""
    uncovered = set(universe)
    chosen = []
    available = list(range(len(sets)))
    while uncovered:
        # Pick set with maximum intersection with uncovered
        best = max(available, key=lambda i: len(sets[i] & uncovered))
        newly = sets[best] & uncovered
        if not newly: break  # no progress possible
        chosen.append(best)
        uncovered -= newly
        available.remove(best)
    return chosen

# Example
U = {1,2,3,4,5,6,7,8,9,10}
S = [{1,2,3,4}, {3,4,5,6}, {5,6,7,8}, {7,8,9,10}, {1,5,9}]
result = greedy_set_cover(U, S)
print("Sets chosen:", result)  # e.g., [0, 2, 3] (covers all 10 elements with 3 sets)

Approximation Ratio: H(n) = ln n + 1

The greedy algorithm uses at most H(n) · OPT sets, where H(n) = 1 + 1/2 + 1/3 + ... + 1/n is the n-th harmonic number. Since H(n) ≈ ln n + 0.577..., this is an O(ln n)-approximation.

Here is the proof idea, using the elegant charging argument. Assign a "charge" to each element: when element e is newly covered by the greedy pick, charge it 1/(number of newly covered elements in this step). The total charge across all elements equals the number of sets picked by greedy. Now show that the charge allocated to elements in any single optimal set Sj is at most H(|Sj|) ≤ H(n). Since OPT sets cover all elements, the total greedy cost is at most H(n) · OPT.

Let us trace the charge calculation for a single optimal set Sj of size k. At the time of the greedy algorithm's first pick, suppose Sj still has all k elements uncovered. The greedy algorithm chose the set covering the MOST uncovered elements — so the chosen set covers at least k elements (it covers at least as many as Sj would). The charge per newly-covered element is at most 1/k. After some Sj-elements get covered by greedy picks, Sj has fewer remaining uncovered elements. At each subsequent step, the greedy set covers at least as many uncovered elements as Sj's remaining count. The charges accumulate: at most 1/k + 1/(k-1) + ... + 1/1 = H(k) ≤ H(n).

The harmonic series connection. H(n) = 1 + 1/2 + 1/3 + ... + 1/n ≈ ln n. This is why the ratio is logarithmic. And this is essentially tight: there exist instances where greedy uses Ω(ln n) · OPT sets. So the analysis is not loose — the algorithm really can perform this badly.

Worked Example: Greedy Set Cover

Universe U = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}. Sets and their sizes:

SetElementsSize
S1{1, 2, 3, 4, 5, 6}6
S2{5, 6, 7, 8, 9, 10}6
S3{1, 2, 3, 7, 8, 9}6
S4{10, 11, 12}3
S5{4, 5, 11}3

Step 1: All three 6-element sets cover the same number of uncovered elements. Greedy picks any — say S1. Covered: {1,2,3,4,5,6}. Uncovered: {7,8,9,10,11,12}.

Step 2: S2 covers {7,8,9,10} = 4 new elements. S3 covers {7,8,9} = 3. S4 covers {10,11,12} = 3. S5 covers {11} = 1. Greedy picks S2. Uncovered: {11,12}.

Step 3: S4 covers {11,12} = 2. S5 covers {11} = 1. Greedy picks S4. All covered.

Result: 3 sets {S1, S2, S4}. The optimal solution also uses 3 sets (e.g., {S3, S2, S5} does not cover 4 or 12, so try {S1, S2, S4} — yes, 3 is optimal). Greedy matched OPT here. It often does for small instances. The logarithmic gap shows up on adversarial constructions designed to exploit the greedy heuristic.

The bound guarantees at most H(12) · 3 ≈ 3.1 × 3 ≈ 9.3 sets in the worst case. So using only 3 is dramatically better than the theoretical ceiling. This is typical: the H(n) ratio is a worst-case bound for adversarial inputs, and real-world instances rarely approach it.

The Tight Example: Where Greedy Struggles

Here is the classic adversarial construction. Take n = 2k elements. Create one "big" set of size n. Then create sets of size n/2, n/4, n/8, ..., 1 that together also cover all elements. The optimal solution uses the one big set (OPT = 1). But greedy, if it picks the half-sized sets first, might use O(log n) sets. This is the worst case, and it achieves the H(n) bound.

Concrete instance (n = 8):

SetElementsSize
SBIG{1, 2, 3, 4, 5, 6, 7, 8}8
SA{1, 2, 3, 4}4
SB{5, 6, 7, 8}4
SC{1, 2, 5, 6}4
SD{3, 4, 7, 8}4

Greedy sees four sets of size 4 tied with the big set of size 8... wait, the big set has size 8 and would be picked first. So the adversary needs to be cleverer. The real construction has the big set cost 1 and many small sets also cost 1. In the weighted version, assign cost 1 to SBIG and cost 1 to a collection of log n overlapping sets of various sizes. Greedy picks the log n small sets (each covering the most uncovered elements at cost 1), paying log n total, while OPT picks SBIG for cost 1.

Weighted Set Cover: The Greedy Rule Changes

For weighted set cover, greedy does not pick the set covering the most elements. Instead, it picks the set with the best cost-effectiveness: the set Si minimizing cost(Si) / |Si ∩ uncovered|. This "bang for the buck" greedy still achieves the H(n) ratio. The proof follows the same charging argument but with costs instead of counts.

This is exactly the principle behind coupon collectors, ad placement, and A/B test selection in tech companies. When you have a budget and want maximum "coverage" (user reach, feature testing, code path coverage), the cost-effective greedy is your go-to algorithm. It is also the algorithm behind the classical "coupon collector's problem" bound: to collect n distinct coupons by buying random ones, you need about n · H(n) ≈ n ln n purchases on average. The H(n) factor appears everywhere.

Set cover in interviews. If an interviewer describes a problem where you need to "cover" everything using a minimum number of "resources," it is almost certainly set cover. The answer is always: "Greedy by most uncovered elements. O(ln n) ratio. This is tight — you provably cannot do better in polynomial time."
python
def greedy_weighted_set_cover(universe, sets, costs):
    """Greedy weighted set cover: pick most cost-effective set each round."""
    uncovered = set(universe)
    chosen = []
    total_cost = 0
    available = list(range(len(sets)))
    while uncovered:
        # Cost per newly covered element
        best = min(
            available,
            key=lambda i: costs[i] / max(1, len(sets[i] & uncovered))
        )
        newly = sets[best] & uncovered
        if not newly: break
        chosen.append(best)
        total_cost += costs[best]
        uncovered -= newly
        available.remove(best)
    return chosen, total_cost

Animated Greedy Set Cover

The canvas below shows elements as dots and sets as colored regions. At each step, the greedy algorithm picks the set covering the most gray (uncovered) elements, turning them to the set's color. Watch how coverage accelerates with each pick.

Greedy Set Cover — Step by Step

Gray dots = uncovered. Colored dots = covered. Each step picks the set covering the most gray dots.

Click "New Instance" to generate a set cover instance.
Why not better? Dinur and Steurer (2014) proved that no polynomial-time algorithm can achieve a (1 − ε)ln n ratio for set cover unless P = NP. The greedy algorithm is essentially optimal among polynomial-time algorithms. Logarithmic is both the best you can do and the best you need to do.
Quick check: You have 100 elements and the optimal set cover uses 5 sets. The greedy algorithm is guaranteed to use at most how many sets?

Chapter 4: Randomized Rounding

So far our approximation algorithms have been combinatorial — direct greedy constructions. Now we introduce a radically different technique: solve a relaxed version of the problem (allowing fractional solutions), then round the fractions to integers. This is LP relaxation and randomized rounding.

The Idea: Relax, Then Round

Many NP-hard problems can be formulated as integer linear programs (ILPs): minimize a linear objective subject to linear constraints, where variables must be 0 or 1. The NP-hardness comes from the integrality constraint. If we drop it — allowing variables to be any real number in [0, 1] — we get a linear program (LP) that can be solved in polynomial time.

The LP solution is a lower bound on the optimal integer solution (for minimization). The question is: how do we convert the fractional LP solution into a valid integer solution without losing too much?

Think of it this way. Imagine you are packing a truck. The optimal integer solution says "take item A or B, not both." The LP relaxation says "take 0.7 of A and 0.3 of B." That is not physically possible — you cannot take 70% of a couch. But the fractional solution tells you something: A is probably more important than B. Now you need a rounding strategy.

Example: Weighted Vertex Cover via LP

The vertex cover problem as an ILP:

minimize ∑v w(v) · xv
subject to: xu + xv ≥ 1 for each edge (u,v)
            xv ∈ {0, 1} for each v

Relax xv ∈ {0, 1} to xv ∈ [0, 1]. Solve the LP. Now round: set xv = 1 if the LP value is ≥ 0.5, else xv = 0.

This is a valid cover: for each edge (u, v), the LP constraint says xu + xv ≥ 1, so at least one of them is ≥ 0.5, so at least one gets rounded to 1. The cost? Each variable at most doubles (from 0.5 to 1), so the rounded cost ≤ 2 · LP* ≤ 2 · OPT. Another 2-approximation for vertex cover, from a completely different angle.

Worked Example: LP Rounding for Vertex Cover

Consider a triangle graph: vertices {A, B, C}, edges {(A,B), (B,C), (A,C)}. The ILP formulation:

minimize xA + xB + xC
xA + xB ≥ 1 (edge A-B)
xB + xC ≥ 1 (edge B-C)
xA + xC ≥ 1 (edge A-C)
xA, xB, xC ∈ [0, 1]

The LP optimum: xA = xB = xC = 0.5. Cost = 1.5. (Verify: each constraint 0.5 + 0.5 = 1 ≥ 1. No feasible solution has lower cost — if any variable is below 0.5, one of its two constraints forces the other endpoint above 0.5.)

Rounding at 0.5: all three round to 1. Rounded cover = {A, B, C}. Cost = 3.

Optimal integer solution: any 2 of the 3 vertices (say {A, B}). Cost = 2.

Ratio: 3/2 = 1.5 on this instance. Within the guaranteed bound of 2. The LP* = 1.5 was a better lower bound than the matching bound (1 edge ⇒ OPT ≥ 1), which is why LP rounding can sometimes give tighter analysis.

Randomized Rounding for MAX-SAT

For maximization problems, randomized rounding shines. Consider MAX-SAT: given a boolean formula in CNF, find an assignment that satisfies as many clauses as possible.

LP relaxation: for each variable xi, let yi ∈ [0, 1] represent the "probability" of setting xi = true. For each clause Cj, let zj ∈ [0, 1] represent the probability of satisfying it. Maximize ∑ zj subject to constraints that link zj to the yi's in clause j.

After solving the LP, round randomly: set xi = true with probability yi (the LP value). Each clause Cj with k literals is satisfied with probability at least 1 − ∏(1 − yi) for positive literals and ∏ yi for negative literals. Using the AM-GM inequality and the LP constraint, this probability is at least (1 − (1 − zj/k)k). Since (1 − 1/k)k ≤ 1/e, the probability is at least (1 − 1/e) · zj ≈ 0.632 · zj. Summing over all clauses: the expected number of satisfied clauses is at least (1 − 1/e) times the LP optimum.

For clauses of size exactly k = 1, randomized rounding gives probability yi = zj (perfect). For k = 2, it gives at least (3/4) · zj. The overall guarantee of 3/4 comes from combining LP rounding with random assignment (see below).

The power of randomness. Deterministic rounding (threshold at 0.5) works for vertex cover because the constraint structure is simple. For MAX-SAT, the constraints are more complex, and randomized rounding gives a better ratio. The expected value of the random solution is provably good, and derandomization techniques can make it deterministic.

Why 1 − (1 − 1/k)k ≥ 1 − 1/e?

This bound appears repeatedly in randomized rounding. Suppose a clause has k literals, each set to true independently with probability at least 1/k. The probability the clause is unsatisfied is at most (1 − 1/k)k. By calculus, (1 − 1/k)k is an increasing function of k that converges to 1/e ≈ 0.368 as k → ∞. So the probability of satisfaction is at least 1 − 1/e ≈ 0.632.

For k = 1 (unit clause): probability = 1/1 = 1.0. For k = 2: 1 − (1/2)2 = 3/4 = 0.75. For k = 3: 1 − (2/3)3 ≈ 0.704. For large k: approaches 0.632. The worst case is large clauses, where randomized rounding still gives a constant-factor guarantee.

Combining Randomized Rounding with Random Assignment

A remarkable result: if you take the better of (a) randomized rounding using LP values and (b) a completely random assignment (each variable true with probability 1/2), the expected number of satisfied clauses is at least (3/4) · OPT. This is because LP rounding performs well on large clauses, and random assignment performs well on small clauses. Their maximum always achieves the 3/4 bound.

Let us verify with a concrete example. Consider the clause (x1 ∨ x2 ∨ x3). Random assignment satisfies it with probability 1 − (1/2)3 = 7/8. LP rounding (assuming each variable has LP value ≥ 1/3 from the clause constraint): probability ≥ 1 − (2/3)3 = 19/27 ≈ 0.704. Random assignment wins for this 3-literal clause.

Now consider a unit clause (x1). Random assignment satisfies it with probability 1/2. LP rounding (LP value = 1 from the constraint x1 ≥ 1): probability 1.0. LP rounding wins for unit clauses.

The combined algorithm takes the better of the two on each instance, achieving 3/4 overall. This 3/4 ratio for MAX-SAT is tight: Hastad proved that no polynomial-time algorithm can do better than 7/8 for MAX-3SAT (where every clause has exactly 3 literals), and the general MAX-SAT ratio of 3/4 is also tight.

Derandomization: Method of Conditional Expectations

Randomized rounding gives an expected approximation ratio. Can we make it deterministic? Yes, via the method of conditional expectations. Instead of flipping coins, set each variable greedily to maximize the conditional expectation of the objective. At each step, compute E[satisfied clauses | x1=T, ..., xi=v] for v ∈ {T, F} and pick the better one. This deterministic algorithm achieves the same guarantee as the randomized version, no luck required.

The Integrality Gap

The ratio between the optimal integer solution and the optimal LP relaxation is called the integrality gap. For vertex cover, the integrality gap is exactly 2 (achieved by the complete graph Kn for odd n, where the LP sets every xv = 0.5 for a cost of n/2, but the optimal integer cover needs n − 1 vertices). This means LP-rounding cannot give a ratio better than 2 for vertex cover — the LP itself is too loose.

To beat ratio 2 for vertex cover (if possible at all), one would need a technique that goes beyond LP relaxation — perhaps semidefinite programming (SDP) or a cleverer combinatorial approach. No one has succeeded in 50+ years of trying.

LP Relaxation Visualized

The canvas below shows a small vertex cover instance. Left: the fractional LP solution (each vertex shows its LP value, circle radius proportional to xv). Right: the rounded integer solution. Vertices with LP value ≥ 0.5 become solid (in the cover). Notice how the LP "spreads" the cover across vertices, and rounding concentrates it.

LP Relaxation → Rounding for Vertex Cover

Left: fractional LP solution (circle size = LP value). Right: rounded integer solution (solid = in cover). Edge constraints shown.

Click "New Graph" then "Solve LP & Round".
python
def lp_vertex_cover_round(n, edges, weights=None):
    """LP relaxation + deterministic rounding for weighted vertex cover."""
    if weights is None: weights = [1]*n
    # In practice, use scipy.optimize.linprog or PuLP
    # Here we simulate the LP solution for illustration
    # Real LP: minimize sum(w[v]*x[v]), s.t. x[u]+x[v]>=1 for (u,v), 0<=x[v]<=1

    # Simple approximation: solve LP, round at 0.5 threshold
    lp_vals = solve_lp(n, edges, weights)  # returns list of floats in [0,1]
    cover = [v for v in range(n) if lp_vals[v] >= 0.5]

    # Verify: every edge is covered
    for u, v in edges:
        assert u in cover or v in cover
    return cover, lp_vals
Quick check: An LP relaxation for a minimization problem gives an optimal value of 42. The true integer optimum is unknown. What can you conclude?

Chapter 5: Approximation Showcase

This is the payoff chapter. Below is a unified simulation where you can explore three NP-hard problems and their approximation algorithms side by side. For small instances, we compute the exact optimum via brute force so you can see the actual approximation ratio.

Approximation Algorithm Showcase

Choose a problem. Generate an instance. Run the approximation and see how it compares to optimal.

Vertices 8
Select a problem and click "New" to generate an instance.

Play with the sliders. Notice how the approximation ratio stays within its theoretical bound on every instance, but is usually much better. Try increasing the problem size — at some point "Compute Optimal" becomes noticeably slow (it is exponential), while the approximation stays instant.

What to Look For

As you experiment, test these predictions from the theory:

Vertex Cover: The approximation ratio should always be ≤ 2. On most random instances, it will be closer to 1.3-1.5. The worst case (ratio approaching 2) occurs on graphs that look like a disjoint union of edges — a perfect matching with no other edges. On such graphs, the algorithm adds both endpoints of every edge (2n vertices) while the optimal solution needs only n vertices (one endpoint per edge). Try generating many graphs to find one where the ratio exceeds 1.5 — it requires specific structure.

TSP: The MST-based 2-approximation should always produce a tour ≤ 2× optimal. Christofides would give ≤ 1.5×, but is harder to implement (we use the simpler MST approach here). On random city placements, expect ratios around 1.1-1.3.

Set Cover: The greedy algorithm should use at most (ln n + 1) × OPT sets. For n = 10, that is about 3.3 × OPT. For n = 16, about 3.8 × OPT. On random instances, greedy typically matches or nearly matches OPT. The logarithmic gap only appears on adversarially constructed instances.

The exponential wall. At 10 vertices, brute-force vertex cover checks 1024 subsets. At 20, it checks over a million. At 30, over a billion. The 2-approximation? O(V + E) regardless. This is why approximation algorithms matter: they run in time you can actually afford.
Experiment idea. Set the vertex count to 12 for vertex cover. Click "New Graph" several times and run both the approximation and the optimal. Record the ratio each time. You will find that the average ratio is well below 2, but every single instance satisfies the bound. This is the power of worst-case guarantees: they hold on EVERY input, not just on average.

Understanding the Color Coding

In the vertex cover visualization, the colors tell a story:

Orange nodes are in the approximate cover but NOT in the optimal cover. These are the "wasted" vertices — the price of using a simple algorithm. The 2-approximation can include up to twice as many vertices as needed, and the extras show up as orange.

Teal nodes are in the optimal cover but NOT in the approximate cover. These represent opportunities the approximation "missed" — vertices that a smarter algorithm would have included instead of some orange ones.

Yellow nodes are in BOTH covers. These are the essential vertices — both the optimal and approximate algorithms agree they must be included.

The ideal approximation would show all yellow (perfect overlap with optimal). In practice, you will see a mix, with the approximate cover always being at least as large as optimal and at most twice as large.

Timing the Exponential Wall

Try this experiment with TSP. Set cities to 7 and click "Compute Optimal" — it is instant. Set to 8 — still fast. Set to 9 — a slight pause. Set to 10 — noticeable delay. Each additional city multiplies the brute-force time by roughly n (since we iterate over permutations). The MST approximation? Instant at every size. This is the practical argument for approximation algorithms.

Here are the exact counts to make the exponential growth visceral. For n cities, brute-force TSP examines (n−1)! permutations (fixing the start city). The MST approximation does O(n2) work regardless:

Cities (n)PermutationsBrute Force TimeMST Approx Time
7720< 1ms< 1ms
85,040~1ms< 1ms
940,320~10ms< 1ms
10362,880~100ms< 1ms
1587 billion~hours< 1ms
201.2 × 1017~centuries< 1ms

The approximation algorithm's O(n2) time barely changes as n grows. The exact algorithm's factorial time explodes beyond any practical limit. At n = 20, the approximation gives a provably-within-2x answer in microseconds, while the exact answer would take longer than the age of the universe. This is the fundamental motivation for approximation algorithms.

Chapter 6: PTAS and FPTAS

Some NP-hard problems admit arbitrarily good approximations. You get to choose how close to optimal you want to be — at the cost of more computation time. This is the territory of polynomial-time approximation schemes.

PTAS: Polynomial-Time Approximation Scheme

A PTAS for a problem is a family of algorithms: for every fixed ε > 0, there is a (1 + ε)-approximation algorithm that runs in polynomial time in n. The catch: "polynomial in n" can mean O(n1/ε), which for small ε is a very large polynomial. The running time is polynomial in n but can be exponential in 1/ε.

Example: Euclidean TSP. Arora (1996) and Mitchell (1999) showed that Euclidean TSP has a PTAS: for any ε > 0, you can find a tour within (1 + ε) of optimal in O(n(log n)O(1/ε)) time. Want a tour within 1% of optimal? Set ε = 0.01 and wait. It is polynomial in n, but the constant is astronomical.

Another example: the subset sum problem. Given integers and a target T, find a subset with sum as close to T as possible without exceeding it. The DP runs in O(nT) which is pseudo-polynomial. The PTAS scales T by a factor of ε/n, reducing the DP table to O(n2/ε) entries. The result is within (1 − ε) of the optimal sum.

The Running Time Distinction

The critical distinction between PTAS and FPTAS is subtle but important. Consider the running time as a function of n and ε:

SchemeTime (example)ε = 0.1ε = 0.01ε = 0.001
PTASO(n1/ε)O(n10)O(n100)O(n1000)
FPTASO(n2/ε)O(10n2)O(100n2)O(1000n2)

The PTAS time explodes exponentially as ε shrinks — n100 is useless even for n = 2. The FPTAS time grows only linearly with 1/ε, remaining practical for small ε. This is why FPTAS is a much stronger guarantee.

FPTAS: Fully Polynomial-Time Approximation Scheme

An FPTAS is a PTAS where the running time is polynomial in both n and 1/ε. This is the strongest type of approximation guarantee short of an exact polynomial-time algorithm (which would require P = NP for NP-hard problems). With an FPTAS, you get a smooth, efficient knob: turn ε down for better quality, up for faster runtime, and the cost of turning the knob is itself polynomial.

Not every problem with a PTAS has an FPTAS. In fact, if a problem is strongly NP-hard (NP-hard even when all numbers in the input are polynomially bounded), then it cannot have an FPTAS unless P = NP. This is because an FPTAS on a strongly NP-hard problem would give an exact algorithm in polynomial time (set ε = 1/(nW) where W is the max value). The 3-partition problem, bin packing, and multiprocessor scheduling are strongly NP-hard and therefore have no FPTAS.

The canonical example is the 0/1 Knapsack problem. Recall that the DP solution for knapsack runs in O(nW) time, which is pseudo-polynomial — polynomial in the numeric value of W, but exponential in the number of bits needed to represent W. If W = 264, the DP table has 264 entries, which is infeasible even on modern hardware.

The FPTAS cleverly reduces the DP table size by rounding the input values. The idea: we do not need exact values; we just need to get within (1 − ε) of optimal. By coarsening the values, we shrink the DP table from O(nW) to O(n2/ε) — polynomial in both n and 1/ε.

The FPTAS works by rounding the values:

Scale
Let K = ε · vmax / n. Round each value: v'i = ⌊vi / K⌋
Solve
Run DP on the rounded instance. Max rounded value is n/ε, so DP is O(n2/ε)
Return
The selected items form the approximate solution. Error from rounding ≤ ε · OPT
python
def knapsack_fptas(values, weights, capacity, epsilon):
    """FPTAS for 0/1 Knapsack: (1-epsilon)-approximation."""
    n = len(values)
    v_max = max(values)
    K = epsilon * v_max / n   # scaling factor
    if K == 0: K = 1

    # Round values
    scaled = [int(v / K) for v in values]
    V = sum(scaled)

    # DP on scaled values: dp[j] = min weight to achieve value j
    INF = float('inf')
    dp = [INF] * (V + 1)
    dp[0] = 0
    keep = [[False]*(V+1) for _ in range(n)]

    for i in range(n):
        for j in range(V, scaled[i]-1, -1):
            if dp[j - scaled[i]] + weights[i] < dp[j]:
                dp[j] = dp[j - scaled[i]] + weights[i]
                keep[i][j] = True

    # Find best achievable value within capacity
    best_j = max(j for j in range(V+1) if dp[j] <= capacity)

    # Backtrack to find items
    items = []
    j = best_j
    for i in range(n-1, -1, -1):
        if keep[i][j]:
            items.append(i)
            j -= scaled[i]

    return items, sum(values[i] for i in items)
    # Running time: O(n^2 / epsilon) — polynomial in BOTH n and 1/epsilon

PTAS vs FPTAS: The Quality-Speed Tradeoff

The canvas below lets you vary ε and see how it affects the knapsack FPTAS. Smaller ε means a better approximation but more computation. Watch the number of DP table entries grow as you decrease ε.

Knapsack FPTAS: ε vs Quality vs Speed

Drag the ε slider. The bar chart shows: actual value achieved (orange) vs optimal (teal). The counter shows DP table size.

ε 0.30
Adjust ε to see the quality-speed tradeoff.

Why the Knapsack FPTAS Works: The Rounding Error

The key insight is that rounding values by a factor K introduces at most K error per item. With at most n items in the solution, the total error is at most nK. We chose K = ε · vmax / n, so the total error is at most n · (ε · vmax / n) = ε · vmax ≤ ε · OPT (since OPT ≥ vmax for any valid solution containing the most valuable item, or else we can check single items separately).

Therefore the FPTAS value ≥ OPT − ε · OPT = (1 − ε) · OPT. The running time is O(n · V') where V' = ∑ ⌊vi / K⌋ ≤ n · (vmax / K) = n · (n / ε) = n2 / ε. This is polynomial in both n and 1/ε — the definition of FPTAS.

Concrete Numbers

Suppose you have 100 items with values up to 1000, and you want a solution within 5% of optimal (ε = 0.05). The scaling factor K = 0.05 × 1000 / 100 = 0.5. Each rounded value is at most 1000/0.5 = 2000. The DP table has 100 × 200,000 = 20 million entries. That is large but polynomial and easily fits in memory. An exact DP with values up to 100,000 total would have 10 million entries — not much smaller. The FPTAS pays a moderate cost for its guaranteed approximation quality.

Now set ε = 0.001 (0.1% accuracy). K = 0.01. DP table: 100 × 10,000,000 = 1 billion entries. Still polynomial, but now impractical. This illustrates the FPTAS tradeoff: as you demand more accuracy, computation grows as 1/ε.

PTAS hierarchy. FPTAS ⊂ PTAS ⊂ APX (constant-factor approximation). Not every NP-hard problem has an FPTAS. Not every one has a PTAS. Not every one is in APX. Vertex cover is in APX (ratio 2) but is unlikely to have a PTAS. Set cover is not in APX (its ratio grows with n). Knapsack is the gold standard: it has an FPTAS, the strongest guarantee short of exact polynomial time.
Quick check: What is the key difference between a PTAS and an FPTAS?

Chapter 7: Inapproximability

We have seen problems that can be approximated within a factor of 2, ln n, or (1 + ε). But some NP-hard problems resist approximation entirely. Understanding these barriers is as important as knowing the algorithms — it tells you when to stop trying.

TSP Without Triangle Inequality: No Hope

If distances do not satisfy the triangle inequality, TSP cannot be approximated within any polynomial function ρ(n) in polynomial time, unless P = NP.

The proof is a beautiful reduction. Suppose you had a ρ(n)-approximation A for general TSP. We will use it to solve Hamiltonian Cycle (an NP-complete problem) in polynomial time. Given a graph G on n vertices, construct a complete weighted graph: edges in G get weight 1, non-edges get weight ρ(n) · n + 1. If G has a Hamiltonian cycle, the optimal TSP tour has cost n. If not, every tour must use a non-edge, costing at least ρ(n) · n + 1. So A can distinguish the two cases — solving the NP-complete problem.

The gap amplification trick. We set the non-edge weight so high that no ρ(n)-approximation can bridge the gap between "has Hamiltonian cycle" (cost n) and "doesn't" (cost > ρ(n) · n). This proves that NO constant, NO polynomial — not even n1000 — can be the approximation ratio. General TSP is as hard as it gets.

The PCP Theorem

The most profound result in the theory of approximation is the PCP Theorem (Arora, Lund, Motwani, Sudan, Szegedy, 1998). PCP stands for Probabilistically Checkable Proofs. The theorem states:

NP = PCP(O(log n), O(1))

What does this mean? Every NP statement has a proof that can be verified by reading only O(1) random bits and O(log n) positions in the proof. A "verifier" flips a few coins to decide which positions to check, then accepts or rejects based on those few bits.

The connection to approximation: the PCP theorem implies that MAX-3SAT is NP-hard to approximate within some constant factor. From there, reductions give inapproximability results for dozens of other problems. The entire landscape of "what can and cannot be approximated" traces back to PCP.

Known Inapproximability Results

ProblemBest Poly-Time ApproxHardness of Approximation
MAX-3SAT7/8No (7/8 + ε) unless P = NP
CliqueO(n / log2 n)No n1−ε unless P = NP
Chromatic numberO(n / log3 n)No n1−ε unless P = NP
Set Cover(1 − o(1)) ln nNo (1 − ε) ln n unless P = NP
Vertex Cover2No (2 − ε) under UGC
TSP (general)NoneNo ρ(n) for any computable ρ

Notice the spectrum. MAX-3SAT and set cover have tight results: the best algorithm matches the hardness bound. Clique is nearly impossible to approximate — polynomial ratio is essentially the trivial algorithm. Vertex cover is conjectured to be tight at 2, but the proof depends on the Unique Games Conjecture (UGC), which remains open.

The Clique Hardness: A Striking Result

Maximum Clique is one of the hardest problems to approximate. Given a graph, find the largest complete subgraph. The best polynomial-time algorithm finds a clique of size O(n / log2 n) when the optimal clique has size n — essentially no better than the trivial algorithm that returns a single vertex.

Hastad (1996) proved that Clique cannot be approximated within n1−ε for any ε > 0, unless P = NP. For a graph with 1000 vertices, this means no polynomial-time algorithm can distinguish between "largest clique is 10" and "largest clique is 10000.99 ≈ 977" — unless P = NP. This is a stunningly strong inapproximability result, far beyond what most NP-hard problems exhibit.

The Connection Graph: Approximation ↔ Proof Checking

Why does the PCP theorem connect to approximation? Here is the intuition. An NP-complete problem like SAT asks "is there an assignment satisfying ALL clauses?" An approximation problem like MAX-SAT asks "what fraction of clauses can we satisfy?" The PCP theorem says that these two questions are computationally equivalent up to a constant factor. If you could approximate MAX-SAT better than 7/8, you could distinguish "all satisfiable" from "at most 7/8 satisfiable" — which is the gap that the PCP characterization creates.

This gap amplification — turning a "yes/no" decision problem into a "high/low value" promise problem — is the mechanism that converts NP-hardness into inapproximability.

A Concrete Inapproximability Proof: TSP Revisited

We proved the TSP result quickly in the text above. Let us now walk through it with actual numbers to make it visceral.

Suppose someone claims a 1000-approximation algorithm A for general TSP. Given a graph G on 10 vertices, we construct G' with:

w(u, v) = 1 if (u, v) is an edge in G
w(u, v) = 10001 if (u, v) is NOT an edge in G

If G has a Hamiltonian cycle, the optimal tour in G' costs 10. Algorithm A returns a tour of cost ≤ 1000 × 10 = 10,000.

If G has NO Hamiltonian cycle, every tour uses at least one non-edge. That non-edge alone costs 10,001. So A's tour costs ≥ 10,001 > 10,000.

We can distinguish the two cases by checking if A's tour costs ≤ 10,000 or > 10,000. We just solved Hamiltonian Cycle in polynomial time. Since HC is NP-complete, P = NP. The same argument works for ANY ratio ρ(n) — just set the non-edge weight to ρ(n) · n + 1.

Visualizing the Approximation Landscape

The canvas below maps NP-hard problems by their approximability. Left = easy to approximate (FPTAS). Right = impossible. Click on any problem to see its details.

The Approximation Landscape

Problems arranged by how well they can be approximated. Click a problem for details. FPTAS (left) to inapproximable (right).

Click on a problem node to see its approximation details.
The Unique Games Conjecture. Proposed by Subhash Khot in 2002, the UGC implies tight inapproximability for vertex cover (ratio 2), MAX-CUT (ratio ≈ 0.878), and many constraint satisfaction problems. It remains neither proven nor disproven. If true, it would complete the picture for a huge class of problems. It is one of the biggest open questions in theoretical CS.

The Hierarchy of Hardness

Putting it all together, we can rank NP-hard optimization problems by how well they can be approximated. This hierarchy is fundamental to understanding the landscape of computational complexity:

FPTAS
Knapsack, subset sum. (1+ε)-approx in poly(n, 1/ε). The best possible.
↓ harder
PTAS
Euclidean TSP, planar problems. (1+ε)-approx, but time may be exponential in 1/ε.
↓ harder
APX (constant ratio)
Vertex cover (2), metric TSP (3/2), MAX-3SAT (7/8). Fixed constant, no tuning.
↓ harder
Log-APX
Set cover (ln n). Ratio grows with input size, but slowly.
↓ harder
Poly-APX
Clique, chromatic number. Ratio is a polynomial function of n.
↓ harder
Inapproximable
General TSP. No polynomial ratio exists unless P = NP.

Every rung of this ladder represents a fundamental barrier. Moving a problem from one level to another (proving a better algorithm or a tighter hardness result) is typically a major research achievement. The PCP theorem was what established most of these barriers.

Here is a useful mnemonic for the hierarchy: FPTAS is a dream, PTAS is nice, APX is solid, Log-APX is acceptable, and inapproximable is a nightmare. When analyzing a new NP-hard problem, your first question should be: "Which rung of the ladder does it sit on?" This immediately tells you what techniques are worth trying.

The approximation class zoo. Just as complexity theory has P, NP, coNP, PSPACE, etc., approximation theory has its own zoo: APX (constant-factor approximable), PTAS (schemes exist), FPTAS (fully polynomial schemes), NPO (NP optimization), and their completeness notions (APX-complete, etc.). Vertex cover is APX-complete: every APX problem reduces to it in an approximation-preserving way. This means improving vertex cover below ratio 2 would improve all APX problems — a breakthrough with enormous consequences.
Quick check: Why can general TSP (without triangle inequality) not be approximated within ANY polynomial factor?

Chapter 8: Interview Arsenal

Approximation algorithms come up in interviews under a specific pattern: "This problem is NP-hard. What do you do?" The interviewer is testing whether you (a) recognize the NP-hardness, (b) know what approximation ratios are achievable, and (c) can implement the key algorithms.

Cheat Sheet: Problems and Their Best Ratios

ProblemTypeBest RatioAlgorithmTime
Vertex CoverMin2Maximal matchingO(V+E)
TSP (Δ-ineq)Min3/2ChristofidesO(n3)
TSP (Euclidean)Min1+εArora PTASO(n logO(1/ε) n)
Set CoverMinln n + 1GreedyO(nm)
MAX-SATMax3/4Rand. roundingO(LP time)
MAX-CUTMax0.878SDP roundingO(SDP time)
KnapsackMax1+εFPTASO(n2/ε)
Bin PackingMin1+εAPTASpoly(n, 1/ε)
Steiner TreeMin1.39LP + iterativeO(LP time)
Scheduling (makespan)Min4/3LPTO(n log n)

Design Question Patterns

Pattern 1: "Minimize cost under coverage constraint"

If the problem looks like "cover everything at minimum cost," think Set Cover. Greedy gives O(ln n) ratio. If costs are unit, look for a 2-approximation via LP rounding or combinatorial methods.

Pattern 2: "Find shortest/cheapest tour"

First ask: does triangle inequality hold? If yes, MST doubling (ratio 2) or Christofides (ratio 3/2). If Euclidean, Arora's PTAS. If no metric assumption, explain that no approximation is possible.

Pattern 3: "Select subset maximizing value under constraints"

If it looks like Knapsack, remember the FPTAS. If it looks like MAX-SAT, think LP relaxation + randomized rounding. If it is MAX-CUT or constraint satisfaction, mention SDP rounding.

Pattern 4: "Graph selection problem"

Vertex cover, independent set, dominating set. Know that vertex cover has ratio 2, independent set is nearly inapproximable (no n1-ε), and dominating set is essentially set cover (ln n ratio).

Pattern 5: "Partition into groups optimally"

This is scheduling (partition jobs into machines) or bin packing (partition items into bins). For makespan scheduling, LPT gives 4/3. For bin packing, First Fit Decreasing gives 11/9 · OPT + 6/9. Both have PTAS/APTAS variants for tighter ratios.

Pattern 6: "This is hard but the instance is special"

Always check for special structure. Is the graph planar? Bounded treewidth? Is the metric Euclidean? Special structure often admits better approximations or even PTAS. For example, vertex cover on planar graphs has a PTAS (via Baker's technique), while general vertex cover is stuck at ratio 2.

Example Interview Dialogue

Interviewer: "You need to place the minimum number of fire stations such that every house is within 5 miles of a station. How would you solve this?"

You: "This is the minimum dominating set problem on a distance graph — each node is a potential station location, and edges connect locations within 5 miles. This is NP-hard, so I would not try to find the exact optimum."

You (continued): "I would reduce it to set cover: each potential location has a 'coverage set' of houses within 5 miles. Then I would use the greedy set cover algorithm — at each step, place the station that covers the most uncovered houses. This gives an O(ln n) approximation, which means if the optimal solution needs 10 stations, I would use at most about 33."

You (continued): "If the geography is Euclidean (houses and stations on a 2D plane), I could exploit geometric structure for a better ratio. But for a road network with arbitrary distances, greedy set cover with O(ln n) is essentially the best we can do — it is provably tight."

Coding Drill: Vertex Cover + Set Cover

The canvas below presents random instances. Implement and trace through the algorithms mentally, then check your answer.

Coding Drill: Trace the Algorithm

A random graph or set cover instance is shown. Trace the approximation algorithm, predict the cover, then click "Show Answer".

Click "Vertex Cover Drill" or "Set Cover Drill" to get a problem.

The "What Do You Do?" Framework

When an interviewer says "this is NP-hard, what do you do?", walk through this hierarchy:

Step 1
Acknowledge NP-hardness. "Exact solution requires exponential time."
Step 2
Ask about approximation requirements. "Is a factor of 2 acceptable? Do we need (1+ε)?"
Step 3
Propose the algorithm. State the ratio and time complexity.
Step 4
Sketch the proof. "The key insight is that OPT must [lower bound argument]."
Step 5
Mention alternatives. "If better ratios needed, consider LP rounding / PTAS / heuristics."
Interview meta-tip. Approximation algorithm questions are not about memorizing ratios. They test three things: (1) can you recognize when a problem is NP-hard? (2) can you find a lower bound on OPT? (3) can you design a simple algorithm whose cost you can bound relative to that lower bound? Master these three skills and you can handle any approximation question, even on problems you have never seen.

Common Interview Traps

Trap 1: "Just use DP." If the problem has exponential state space (e.g., subset selection with n > 30), exact DP is infeasible. Know when to switch to approximation.

Trap 2: "Just use greedy." Greedy is only an approximation algorithm if you can prove a ratio. "Greedy works well in practice" is not a proof. Always state the ratio.

Trap 3: "This looks like vertex cover, so ratio 2." Be careful with reductions. Independent set is the complement of vertex cover, but it has a WILDLY different approximation ratio (near n1−ε inapproximability vs ratio 2). Complementation does not preserve approximation ratios.

Trap 4: Confusing PTAS existence with practical efficiency. A PTAS for Euclidean TSP exists, but for small ε it is impractical. In interviews, mention the PTAS for theoretical completeness but recommend Christofides for practical use.

Trap 5: Forgetting to state assumptions. The TSP 3/2-approximation requires the triangle inequality. Always state which variant you are solving. If the interviewer says "arbitrary distances," you must say "no constant approximation exists" — not "use Christofides."

Trap 6: Not knowing when exact is feasible. For small n (say n ≤ 20 for TSP, n ≤ 30 for vertex cover), exact algorithms via DP or branch-and-bound may run in seconds. Do not propose a 2-approximation when n = 10 and you could just solve it exactly. Know the crossover point.

Quick Implementation Checklist

python
# 1. Vertex Cover 2-approx: O(V + E)
#    - Find maximal matching (greedy on edges)
#    - Return both endpoints of every matched edge

# 2. TSP 2-approx (metric): O(n^2) for MST + O(n) for DFS
#    - Build MST (Prim's or Kruskal's)
#    - DFS preorder traversal of MST
#    - Return as Hamiltonian cycle

# 3. Set Cover O(ln n)-approx: O(n * m) per step
#    - While uncovered elements exist:
#      - Pick set with most uncovered elements
#      - Mark its elements as covered

# 4. Knapsack FPTAS: O(n^2 / epsilon)
#    - Scale values: v' = floor(v * n / (epsilon * v_max))
#    - Run DP on scaled values
#    - Backtrack to find items
Quick check: A problem asks you to find the minimum dominating set in a graph (every vertex is either in the set or adjacent to a vertex in the set). This is NP-hard. What is the best approach?

Chapter 9: Connections

Approximation algorithms sit at the intersection of several major algorithmic themes. Here is how this chapter connects to the broader CLRS landscape and to real-world practice.

Direct Dependencies

ChapterConnection
Ch 34: NP-CompletenessApproximation only matters because these problems are NP-hard. Ch 34 establishes the hardness; Ch 35 asks "now what?"
Ch 16: Greedy AlgorithmsSet cover's greedy algorithm is a direct application of the greedy paradigm. The difference: in Ch 16, greedy is exact; here, greedy is approximate with a proven ratio.
Ch 23: MSTThe MST is the lower bound for metric TSP. Prim's and Kruskal's algorithms are subroutines in the TSP approximation.
Ch 29: Linear ProgrammingLP relaxation is the foundation of randomized rounding and the vertex cover LP-based 2-approximation. Duality gives lower bounds on OPT.
Ch 15: Dynamic ProgrammingThe knapsack FPTAS uses DP as a subroutine, with scaled values to control table size. DP + scaling = approximation scheme.

The Bigger Picture

ApproachGuaranteeWhen to Use
Exact (brute force/DP)OptimalSmall instances (n ≤ 20)
Approximation algorithmsProvable ratioWhen you need guarantees
Heuristics (simulated annealing, genetic)NoneWhen speed matters, guarantees don't
Fixed-parameter tractableExact, parameterizedWhen a parameter (treewidth, etc.) is small
Randomized algorithmsExpected or w.h.p.When randomness helps (MAX-CUT, SAT)

Real-World Applications

Vehicle routing: TSP variants with time windows, capacity constraints. Christofides-based heuristics are the backbone of logistics optimization at UPS, FedEx, and Amazon. Google's OR-Tools routing solver uses MST-based initial solutions with local search improvement, processing millions of deliveries daily.

Network design: Set cover and Steiner tree approximations power cell tower placement, server farm location, and CDN design. When Akamai or Cloudflare decides where to place edge servers, the underlying optimization is a facility location problem (a close cousin of set cover) solved via LP-based approximations.

Scheduling: Makespan minimization on parallel machines uses LPT (Longest Processing Time first), a 4/3-approximation that runs in O(n log n). Every cloud scheduler (Kubernetes, Mesos, Borg) uses scheduling approximations to assign pods to nodes.

Compiler optimization: Register allocation is a graph coloring problem. Approximation and heuristic coloring algorithms decide which variables live in registers vs memory. LLVM uses a greedy graph coloring heuristic that has no formal approximation ratio but works well in practice.

Machine learning: Feature selection (pick k features to maximize prediction accuracy) is related to set cover. Clustering (k-means, k-median) has constant-factor approximation algorithms. The celebrated k-means++ initialization (Arthur and Vassilvitskii, 2007) is an O(log k)-approximation for k-means clustering, and it runs in O(nk) time. It is the default initialization in scikit-learn.

Bioinformatics: Genome assembly, sequence alignment, and phylogenetic tree construction all involve NP-hard optimization problems. The Steiner tree problem (connect a subset of "terminal" nodes in a graph at minimum cost) arises in network biology and has a 1.39-approximation via LP-based methods.

VLSI design: Circuit layout involves placing components and routing wires to minimize total wire length — this is a collection of NP-hard problems (graph partitioning, Steiner tree, TSP variants). VLSI CAD tools like Cadence and Synopsys use approximation algorithms as starting points for simulated annealing and other metaheuristics.

The pragmatic lesson. In practice, most people combine approximation algorithms with local search. Use the approximation algorithm to get a provably good starting point, then improve it with local moves (2-opt for TSP, random restarts for SAT). The approximation provides the safety net; local search provides the polish. This hybrid approach dominates industry practice.

The Approximation Toolkit Summary

When faced with an NP-hard optimization problem, work through this decision tree:

Is there structure?
Metric? Planar? Bounded treewidth? Special structure enables better ratios.
Try greedy
Natural greedy often gives O(1) or O(log n) ratio. Prove it via the lower-bound method.
Try LP relaxation
Solve the LP, round deterministically or randomly. LP gives lower bound for free.
Try PTAS techniques
Scaling + DP (knapsack), divide-and-conquer + DP (geometric problems), local search.
Check hardness
Is there a matching inapproximability result? If so, you have reached the limit.

Where to Go Next

Approximation in the Age of Machine Learning

Modern ML systems often solve NP-hard optimization problems using learned heuristics (e.g., neural network-guided TSP solvers). These can produce excellent solutions in practice but come with no worst-case guarantees. The approximation algorithm community's response: use the ML heuristic for speed, but verify using the approximation algorithm as a baseline. If the ML solution is worse than the 2-approximation, something is wrong with the model.

There is also an emerging field of learning-augmented algorithms: use ML predictions to improve worst-case algorithms. For example, a TSP solver might use a neural network to predict a good starting tour, then prove that if the prediction is ε-close to optimal, the algorithm achieves a (1 + O(ε))-approximation. If the prediction is garbage, the algorithm still guarantees ratio 2. Best of both worlds: good performance when ML works, guaranteed safety when it does not.

Open Problems

Several major questions remain open in approximation algorithms:

Vertex Cover: Is the ratio of 2 optimal for polynomial-time algorithms? The best inapproximability result (assuming UGC) says no (2 − ε)-approximation exists. But UGC itself is unproven. Without UGC, the best known hardness is about 1.36 (Dinur and Safra, 2005). The gap between 1.36 and 2 has been open for decades.

Metric TSP: Christofides' ratio of 3/2 has been essentially unimproved since 1976 (the 2020 improvement of 10−36 barely counts). Can we get below 1.4? 1.3? Nobody knows. The best lower bound is 123/122 ≈ 1.008 (from NP-hardness), leaving an enormous gap. The true answer might be anything between 1.008 and 1.5 — that is how little we know about the hardness of metric TSP.

Unique Games Conjecture: Resolving UGC would settle approximation ratios for dozens of problems simultaneously. It is the single most impactful open problem in the theory of approximation. Subhash Khot received the Nevanlinna Prize (now the Abacus Medal) in 2014 for formulating it.

Beyond worst case: Can we prove better ratios for "typical" or "random" inputs? The field of average-case approximation is nascent but promising. For instance, on random graphs, the greedy vertex cover algorithm empirically achieves ratio ≈ 1.1, far better than the worst-case 2. Making such average-case bounds rigorous is an active research area.

Where to Go Next

If this chapter sparked your interest, here are the natural next steps. For the theory: Vazirani's "Approximation Algorithms" textbook goes far deeper into LP rounding, semidefinite programming, and primal-dual methods. Williamson and Shmoys' "Design of Approximation Algorithms" is the modern reference. For the practice: solve approximation problems on competitive programming platforms (Codeforces tags: greedy, graphs, flow). For the frontier: read about the resolution of the Unique Games Conjecture and the new results on the hardness of MAX-CUT.

Within this lesson series, the most relevant connections are:

LessonWhy It Connects
Ch 16: Greedy AlgorithmsGreedy is the most common approximation technique. Set cover greedy is Chapter 16 applied to NP-hard problems.
Ch 34: NP-CompletenessYou need to know a problem is NP-hard before approximation makes sense. Reductions also prove inapproximability.

The Philosophical Takeaway

Approximation algorithms represent a profound shift in thinking. Classical algorithm design asks: "What is the optimal solution?" Approximation asks: "How close can I get, how fast, and can I prove it?" This shift from optimality to bounded sub-optimality is not a defeat — it is a mature response to computational reality.

The P ≠ NP conjecture (if true) means that thousands of natural optimization problems cannot be solved exactly in polynomial time. Approximation algorithms say: "Fine. We cannot find the best answer. But we can find an answer that is within 50% of best, or within 10%, or within 0.1%, and we can prove it." That is not giving up. That is engineering.

The deepest lesson from this chapter is that the quality of an approximation is itself computable and provable. We do not just hope our heuristic works. We prove it does, on every input, against every adversary. The approximation ratio is not a suggestion — it is a theorem.

This mathematical certainty is what separates approximation algorithms from heuristics. A simulated annealing algorithm for TSP might produce excellent tours in practice, but it comes with no guarantee — on some pathological input, it might produce a tour 100x worse than optimal. Christofides will never be worse than 1.5x. When lives depend on the answer (medical scheduling, emergency routing, infrastructure design), that guarantee is worth its weight in gold.

As a final thought: the existence of approximation algorithms shows that the NP-hardness barrier is not a wall but a slope. Some problems (knapsack) can be climbed almost to the top with an FPTAS. Others (vertex cover, TSP) have a ledge at a constant factor where we are stuck. And a few (clique, general TSP) are sheer cliffs where no polynomial-time approach can gain any meaningful foothold. Mapping this slope — problem by problem, ratio by ratio — is one of the great achievements of 20th-century theoretical computer science, and the work continues today.

"The art of approximation is not settling for less — it is proving how little you have lost."