Greedy Algorithms — CLRS Chapter 16

Chapter 0: The Problem

You manage a single lecture hall. Ten professors want to use it today, each for a different time slot. Professor A needs it from 1 PM to 4 PM. Professor B needs 2 PM to 5 PM. Professor C needs 3 PM to 4 PM. Some of these overlap. Your job: schedule as many lectures as possible without any two overlapping.

This is the activity selection problem. You have n activities, each with a start time s_i and finish time f_i. Two activities are compatible if they do not overlap: one finishes before the other starts. You want the largest set of mutually compatible activities.

Let us formalize. Activity i occupies the half-open interval [s_i, f_i). Two activities i and j are compatible if s_i ≥ f_j or s_j ≥ f_i — one must finish before the other starts. The problem: find a maximum-cardinality subset S of mutually compatible activities.

Why Naive Strategies Fail

How would you solve it? Your first instinct might be: "Pick the shortest activity first." That seems reasonable — shorter activities leave more room for others. But it fails. Consider three activities: [0, 5], [4, 7], [6, 11]. The shortest is [4, 7] with duration 3, but picking it blocks both of the others (it overlaps with [0, 5] at time 4 and overlaps with [6, 11] at time 6-7). Picking [0, 5] and [6, 11] gives you two activities instead of one.

Maybe pick the activity that starts earliest? That fails too. One activity [0, 100] starts earliest but blocks everything. A single greedy-by-start-time pick monopolizes the entire day.

Maybe pick the activity that conflicts with the fewest others? This is cleverer, but it also fails on constructed examples. Consider activities arranged in a diamond pattern: one central activity overlapping few others but blocking the most productive arrangement of peripheral activities. The "minimum conflicts" heuristic is better than the first two, but still not optimal.

The right answer is beautifully simple: always pick the activity that finishes earliest. This is the greedy choice, and it always works. By picking the earliest finisher, you free up the timeline as soon as possible for remaining activities. You are being maximally conservative about how much future time you consume.

Think of it this way. Imagine you are standing at time t=0, looking rightward along the timeline. You want to pack as many activities as possible. Which activity consumes the least of the future? The one that ends soonest. Pick it, advance your clock to its finish time, and repeat. This greedy rule — "leave the most room" — is the heart of the algorithm.

The greedy principle. At each step, make the locally optimal choice. Never reconsider. Never backtrack. For the right class of problems, this locally optimal strategy leads to a globally optimal solution. That is the miracle of greedy algorithms — and this chapter is about understanding exactly when and why that miracle holds.

Why Greedy Matters

In Chapter 15 (Dynamic Programming), you learned to solve optimization problems by breaking them into overlapping sub-problems and building solutions bottom-up. DP is powerful but often expensive: it fills a table of size O(n²) or O(nW) or worse, considers all possible choices at each step, and uses significant memory.

Greedy algorithms are the fast lane. When applicable, they replace DP's exhaustive search with a simple rule: sort the input, scan once, make irrevocable choices. The result is algorithms that are simpler to code, faster to run, and easier to reason about. The catch: greedy only works for problems with a special structural property. Recognizing that property — and proving it holds — is the central skill of this chapter.

Think of it as a hierarchy of algorithmic sophistication:

Approach	Time	When It Works
Brute force	O(2ⁿ) or worse	Always (but too slow for n > 20)
Dynamic programming	O(n²) to O(nW)	Optimal substructure + overlapping sub-problems
Greedy	O(n log n) typical	Optimal substructure + greedy-choice property

Greedy is not a weaker tool than DP — it is a stronger requirement on the problem structure. When the requirement is met, you get a dramatic speedup for free. Think of greedy as the "express lane" — it is faster, but not every problem is eligible.

In this lesson we will study three canonical greedy problems in depth (activity selection, Huffman coding, fractional knapsack), prove their correctness using the exchange argument, see exactly where and why greedy fails on related problems (0/1 knapsack, general coin change), and build a practical toolkit for recognizing greedy problems in interviews and competitions.

Watch the Greedy Strategy in Action

The canvas below shows activities as colored bars on a timeline. The greedy algorithm sorts by finish time, then scans left to right, picking each activity that does not conflict with the last one selected. Click "Run Greedy" to watch it work. Click "New Activities" to generate a fresh random instance.

Activity Selection — Greedy by Earliest Finish

Activities are sorted by finish time. Green = selected. Red = rejected (overlaps with a selected activity). Click "Run Greedy" to step through.

Click "Run Greedy" to animate, or "Step" to go one activity at a time.

Notice the pattern. The greedy algorithm never backtracks. It makes one pass through the sorted activities, and each decision is final. It runs in O(n log n) time — the sort dominates. The scan is O(n). Compare this to a brute-force approach that would check all 2ⁿ subsets. For 20 activities, brute force examines over one million subsets. For 50 activities, it is over a quadrillion. Greedy solves it in microseconds.

Let us trace through a concrete example by hand. Consider these activities sorted by finish time:

Activity	Start	Finish	Greedy Decision
A	1	4	SELECT (first activity, always take it). last_finish = 4
B	3	5	SKIP (start 3 < last_finish 4)
C	0	6	SKIP (start 0 < last_finish 4)
D	5	7	SELECT (start 5 ≥ last_finish 4). last_finish = 7
E	3	9	SKIP (start 3 < last_finish 7)
F	5	9	SKIP (start 5 < last_finish 7)
G	6	10	SKIP (start 6 < last_finish 7)
H	8	11	SELECT (start 8 ≥ last_finish 7). last_finish = 11
I	8	12	SKIP (start 8 < last_finish 11)
J	2	14	SKIP (start 2 < last_finish 11)
K	12	16	SELECT (start 12 ≥ last_finish 11). last_finish = 16

Result: {A, D, H, K} — four activities, no overlaps. This is indeed optimal for this input. Try finding five compatible activities; you will not succeed. Here is a proof by exhaustion for the skeptical: any five activities must include at least two from the dense middle region (B, C, E, F, G), but all pairs from that region overlap with each other or with A/D. The maximum is four.

The algorithm structure is worth appreciating. The entire decision process is a single left-to-right scan. Each activity gets exactly one look. The invariant maintained through the scan is simple: last_finish tracks when the most recently selected activity ends. An incoming activity is compatible if and only if it starts at or after last_finish. This invariant is trivially maintained — when we select an activity, we update last_finish to its finish time. When we skip, last_finish stays the same.

But why does this work? Why is "pick the earliest finisher" always optimal? We will prove this rigorously in Chapter 1, using a technique called the exchange argument.

Quick check: You have activities [1,3], [2,5], [4,7], [6,9], [8,10]. The greedy algorithm (sort by finish time, pick non-conflicting) selects which set?

[1,3], [2,5], [8,10] — pick every other one [1,3], [4,7], [8,10] — three activities with no overlaps [1,3], [6,9] — only two non-conflicting activities exist

Chapter 1: The Greedy Choice Property

In Chapter 0 we saw that the "earliest finish time" greedy strategy works for activity selection. But we handwaved the "why." Now we make it airtight. The key requirement for any greedy algorithm to produce an optimal solution is the greedy-choice property.

What Is the Greedy-Choice Property?

A problem has the greedy-choice property if there exists an optimal solution that includes the greedy choice. In other words: when we make our locally optimal decision, we are not painting ourselves into a corner. There is always a globally optimal solution that agrees with our first greedy pick.

This is different from dynamic programming. DP makes no commitment until it has explored all sub-problems. Greedy commits immediately and never looks back. For this boldness to pay off, we need to know that our first move is never a mistake.

More precisely: a problem exhibits the greedy-choice property if, for every instance of the problem, there exists at least one optimal solution that includes the locally optimal (greedy) choice. We do not need EVERY optimal solution to include it — just one. That is enough to justify making the greedy choice and then recursing on the sub-problem.

The Exchange Argument

How do you prove the greedy-choice property holds? The standard technique is the exchange argument (also called a "swap argument" or "replacement argument"). The idea is deceptively simple: take any optimal solution, and show you can transform it into one that includes the greedy choice without making it any worse.

Here is the exchange argument for activity selection, stated with full rigor:

1. Assume an optimal solution O exists

O is any maximum-size set of compatible activities. We make no assumptions about what is in O.

↓

2. Let g be the greedy choice

g is the activity with the earliest finish time among all activities. The greedy algorithm picks g first.

↓

3. Case A: g ∈ O

If the optimal solution already contains g, we are done. The greedy choice is in an optimal solution.

↓

4. Case B: g ∉ O — exchange!

O must contain some first activity a₁ (the one with the earliest finish in O). Since g finishes no later than a₁ (g has the earliest finish of ALL activities), we can swap a₁ out and g in. Call this new set O'. O' has the same size as O (still optimal) and g does not conflict with any remaining activity in O' because g finishes no later than a₁ did.

↓

5. Conclusion

In both cases, there exists an optimal solution containing g. Therefore the greedy choice is safe. Repeat the argument on the remaining sub-problem (activities starting after g finishes).

The exchange argument template. For any greedy proof: (1) Assume an optimal solution O. (2) If it already includes the greedy choice, done. (3) Otherwise, show you can swap something in O for the greedy choice without making the solution worse. This "swap without damage" is the heart of every greedy correctness proof. Memorize this template — it appears in Huffman codes, minimum spanning trees, Dijkstra, and dozens of interview problems.

The Formal Proof

Theorem: Let S = {a₁, a₂, ..., a_n} be a set of activities sorted by finish time (f₁ ≤ f₂ ≤ ... ≤ f_n). Then there exists an optimal solution that includes a₁ (the activity with the earliest finish time).

Proof: Let O be an optimal solution. Let a_k be the activity in O with the earliest finish time. If a_k = a₁, we are done. Otherwise, consider the set O' = (O \ {a_k}) ∪ {a₁} — we remove a_k and add a₁.

We claim O' is also a valid optimal solution. First, |O'| = |O| (same cardinality — we swapped one element for another). Second, O' has no conflicts. Why? Since a₁ has the earliest finish time of all activities, f₁ ≤ f_k. Activity a_k was compatible with all other activities in O. Every other activity a_j in O has s_j ≥ f_k ≥ f₁. So a₁ is also compatible with every a_j in O. Therefore O' is a valid solution with the same size as O that includes a₁. QED.

Corollary: The greedy algorithm (repeatedly select the earliest-finishing compatible activity) produces an optimal solution. Proof by induction: the first greedy choice is safe (the theorem above). After making it, the remaining sub-problem has the same structure, and the same argument applies to the next greedy choice. By induction on the number of activities selected, the entire greedy solution is optimal.

Visualizing the Exchange

The canvas below shows two timelines. The top is an arbitrary optimal solution O. The bottom shows O' after swapping in the greedy choice. Watch how the swap cannot create any new conflicts.

Exchange Argument — Visual Proof

Top: optimal solution O with first activity a₁. Bottom: O' after swapping a₁ for g (earliest-finish). The swap is always safe because g finishes no later than a₁.

Click "Show Exchange" to see the swap.

Optimal Substructure

The greedy-choice property is necessary but not sufficient. We also need optimal substructure: after making the greedy choice, the remaining problem must itself have an optimal solution that, combined with the greedy choice, gives an optimal solution to the whole problem.

For activity selection: after picking the earliest-finishing activity g, we are left with the sub-problem of all activities that start after g finishes. An optimal solution to the original problem consists of g plus an optimal solution to this sub-problem. This is easy to prove by contradiction: if the remaining activities were not solved optimally, we could improve the overall solution by improving the sub-solution.

Greedy works ⇔ Greedy-choice property + Optimal substructure

Both conditions must hold. Many problems have optimal substructure (like the 0/1 knapsack) but lack the greedy-choice property. For those, you need DP. We will see concrete examples in Chapter 3.

Why the Exchange Argument Works

The exchange argument is so central to greedy algorithms that it is worth understanding at a meta-level. The argument has a specific logical structure:

Goal: Show that the greedy choice g is in some optimal solution.

Setup: Let O be any optimal solution. We consider two cases.

Case 1: g ∈ O. Done — we already have an optimal solution containing g.

Case 2: g ∉ O. This is the interesting case. We need to:

Identify an element x ∈ O that we can swap for g.
Show that the swap does not violate any constraints (the new solution is still feasible).
Show that the swap does not worsen the objective (the new solution is at least as good as O).

The result is a new optimal solution O' = (O \ {x}) ∪ {g} that contains g. Since O was arbitrary, this proves that no matter which optimal solution we start with, we can always "exchange in" the greedy choice.

Different problems use different exchange arguments. For activity selection, we swap the first activity. For Huffman coding, we swap the positions of the two least-frequent characters. For MST, we swap an edge on a cycle. The structure is always the same: swap, show feasibility, show no-worse.

Common exchange argument patterns. (1) Swap-first: Replace the first element of the optimal solution with the greedy choice (activity selection). (2) Swap-deep: Move two elements to the deepest level of a tree structure (Huffman). (3) Swap-on-cycle: Add the greedy edge, creating a cycle, then remove the heaviest edge on the cycle (MST). The underlying logic is identical in all three cases.

Practice: Construct Your Own Exchange Argument

Here is an exercise. Consider the problem: "Given n ropes with lengths L₁, ..., L_n, connect them into one rope. The cost of connecting two ropes of lengths a and b is a + b. Minimize total cost."

The greedy strategy: always connect the two shortest ropes first (same as Huffman!). Can you construct the exchange argument? Hint: assume an optimal solution connects some non-shortest pair first. Show you can swap this connection with the shortest-pair connection and reduce the total cost. The argument is almost identical to the Huffman optimality proof — the two shortest "ropes" should be merged first because they contribute their length to every subsequent merge step, so minimizing their individual contribution minimizes the total.

This problem (LeetCode #1167, "Minimum Cost to Connect Sticks") is literally Huffman coding in disguise. Recognizing structural equivalences like this is what turns algorithm knowledge from memorized patterns into genuine problem-solving power.

Chapter 2: Activity Selection Deep Dive

We have established that the greedy algorithm is correct. Now let us understand it as an algorithm engineer: exact implementation, complexity analysis, and the road not taken (DP).

The Greedy Algorithm — Step by Step

1. Sort activities by finish time

Given n activities with (s_i, f_i), sort by f_i ascending. Cost: O(n log n).

↓

2. Pick the first activity

The activity with the smallest f_i is always in an optimal solution (exchange argument). Add it to the selected set S. Set last_finish = f₁.

↓

3. Scan left to right

For each remaining activity i (in sorted order): if s_i ≥ last_finish, add i to S and update last_finish = f_i. Otherwise skip i.

↓

4. Return S

S is a maximum-size set of mutually compatible activities. Total time: O(n log n) for the sort + O(n) for the scan = O(n log n).

python
def activity_selection_greedy(activities):
    # activities: list of (start, finish) tuples
    # Sort by finish time
    activities.sort(key=lambda x: x[1])

    selected = [activities[0]]
    last_finish = activities[0][1]

    for i in range(1, len(activities)):
        start, finish = activities[i]
        if start >= last_finish:  # compatible
            selected.append(activities[i])
            last_finish = finish

    return selected

# Example
acts = [(1,4), (3,5), (0,6), (5,7), (3,9), (5,9),
        (6,10), (8,11), (8,12), (2,14), (12,16)]
result = activity_selection_greedy(acts)
print(result)  # [(1,4), (5,7), (8,11), (12,16)] — 4 activities

The DP Approach (And Why It Is Overkill)

Activity selection can be solved with dynamic programming. Define sub-problem S_ij as the set of activities that start after activity i finishes and finish before activity j starts. Let c[i, j] = size of the maximum compatible subset of S_ij. The recurrence is:

c[i, j] = max_{a_k ∈ S_ij} { c[i, k] + c[k, j] + 1 }

This recurrence considers every possible "next activity" a_k in S_ij. For each candidate, we split into left sub-problem S_ik (activities between i and k) and right sub-problem S_kj (activities between k and j), solve both optimally, and take the best overall split. The number of sub-problems is O(n²) (one for each (i, j) pair). Each sub-problem considers up to O(n) candidates for a_k. Total: O(n³) in the worst case, or O(n²) with optimized implementation. Either way, it works — but it does far more work than necessary.

The DP approach treats the problem as if we do not know which activity to pick first. It considers ALL possibilities and chooses the best. Greedy exploits the insight that we DO know which to pick first: the one that finishes earliest. This collapses n choices at the first step into exactly one, and the problem becomes a single chain of decisions rather than a branching tree of sub-problems.

The greedy algorithm exploits the fact that we never need to consider all possible next activities. The greedy choice property tells us exactly which one to pick: the one that finishes earliest. No table, no backtracking, no quadratic time. Just O(n log n).

DP vs Greedy — Side by Side

The canvas below runs the same set of activities through both algorithms. On the left, DP fills a table. On the right, greedy makes a single pass. Both find the same optimal solution, but greedy gets there faster.

DP Table vs Greedy Scan — Same Activities

Left: DP fills an n×n table (O(n²)). Right: greedy scans once (O(n)). Both produce the optimal solution.

Click "Run Both" to compare DP and Greedy on the same input.

Property	DP Approach	Greedy Approach
Time complexity	O(n²)	O(n log n)
Space complexity	O(n²) for the table	O(n) for the sorted array + O(k) for the result
Correctness	Always optimal (considers all sub-problems)	Always optimal (proven by exchange argument)
Implementation	Moderate (need to define sub-problems carefully)	Simple (sort + one pass)
When to use	When greedy-choice property does not hold	When greedy-choice property holds

The lesson. When a problem has the greedy-choice property, DP still works but is needlessly expensive. Greedy is DP that happens to never need the full table because the optimal first choice is always clear. This is why recognizing the greedy-choice property is so valuable: it simplifies both the algorithm and the proof.

Recursive vs Iterative Implementation

CLRS presents activity selection in both recursive and iterative forms. The recursive version mirrors the proof structure directly:

python
def recursive_activity_select(activities, k, n):
    # activities: sorted by finish time
    # k: index of last selected activity
    # n: total number of activities
    m = k + 1
    # Find the first activity that starts after activity k finishes
    while m < n and activities[m][0] < activities[k][1]:
        m += 1
    if m < n:
        return [activities[m]] + recursive_activity_select(activities, m, n)
    else:
        return []

# Usage: prepend a dummy activity (0, 0) to bootstrap
acts = [(0, 0), (1,4), (3,5), (0,6), (5,7), (8,11), (12,16)]
result = recursive_activity_select(acts, 0, len(acts))
print(result)  # [(1,4), (5,7), (8,11), (12,16)]

The recursive version makes the inductive structure explicit: select the first compatible activity (the greedy choice), then recurse on the remaining sub-problem. But it is tail-recursive — the recursive call is the last operation — so it converts trivially to the iterative version we saw earlier. In practice, always use the iterative version. It avoids stack depth issues and is equally clear.

Weighted Activity Selection

A harder variant: each activity also has a weight (profit), and you want to maximize total weight rather than count. Now greedy fails — you might skip a high-weight activity to fit two low-weight ones. This problem requires DP:

dp[i] = max( dp[i-1], w_i + dp[p(i)] )

where p(i) is the latest activity that finishes before activity i starts (found by binary search). This runs in O(n log n). The weighted variant is a classic interview problem (LeetCode #1235 "Maximum Profit in Job Scheduling").

python
import bisect

def weighted_activity_selection(activities):
    # activities: list of (start, finish, weight)
    activities.sort(key=lambda x: x[1])  # sort by finish
    n = len(activities)
    finishes = [a[1] for a in activities]
    dp = [0] * (n + 1)

    for i in range(1, n + 1):
        # p(i): latest job ending <= start of job i
        s_i = activities[i - 1][0]
        p = bisect.bisect_right(finishes, s_i, 0, i - 1)
        dp[i] = max(dp[i - 1], activities[i - 1][2] + dp[p])

    return dp[n]

Unweighted: greedy. Weighted: DP. Adding weights to activity selection destroys the greedy-choice property. The "earliest finish" activity might have weight 1, while a later-finishing activity has weight 1000. Greedy cannot make this tradeoff; it needs the full DP table. This is a perfect example of how a small change to a problem can shift it from greedy to DP territory.

Variants of Activity Selection

Activity selection has several important variants, each requiring a different technique:

Variant	Goal	Algorithm	Complexity
Maximum count (original)	Most activities, unweighted	Greedy by earliest finish	O(n log n)
Maximum weight	Most total weight	DP with binary search	O(n log n)
Minimum rooms	Schedule ALL activities, minimize rooms	Greedy with heap (interval partitioning)	O(n log n)
Maximum overlap	Find time with most simultaneous activities	Sweep line	O(n log n)
Minimum removals	Remove fewest to eliminate all overlaps	n - (greedy max count)	O(n log n)

The key insight: the original activity selection problem (max count, unweighted) is one of the few variants where greedy is optimal. Most other variants require DP, heap-based approaches, or sweep lines. Knowing which variant you face is essential for choosing the right algorithm.

In an interview, when you hear "schedule activities/jobs/meetings," immediately ask: "Are we maximizing count, maximizing total value/weight, minimizing rooms, or something else?" The answer determines your algorithm:

Max count, unweighted?

Greedy by earliest finish. One line of sorting + one loop.

↓ No

Max total weight?

DP with binary search. Sort by finish, dp[i] = max(dp[i-1], w_i + dp[p(i)]).

↓ No

Schedule all, min rooms?

Greedy with heap. Sort by start, reuse earliest-ending room.

↓ No

Something else?

Probably sweep line, segment tree, or specialized DP. Ask clarifying questions.

Complexity check: The greedy activity selection algorithm runs in O(n log n). Where does the log n come from, and what is the cost of the scan phase alone?

The O(n log n) comes from sorting the activities by finish time. The scan phase is O(n) — a single pass through the sorted array. The O(n log n) comes from the binary search at each step. The scan is also O(n log n). The O(n log n) comes from building a balanced BST. The scan is O(log n).

Chapter 3: Greedy vs Dynamic Programming

Both greedy and DP exploit optimal substructure: the optimal solution to the whole problem contains optimal solutions to sub-problems. The difference lies in how they make choices.

Property	Greedy	Dynamic Programming
Choice strategy	Make the best-looking choice NOW, never reconsider	Consider all choices, pick the best after seeing all sub-problem solutions
Sub-problems	One sub-problem remains after each choice (no branching)	Many overlapping sub-problems, solved bottom-up or top-down with memoization
Required properties	Greedy-choice property + optimal substructure	Optimal substructure + overlapping sub-problems
Typical complexity	O(n log n) or O(n)	O(n²) or O(nW) etc.

The Knapsack Litmus Test

The most famous example of "greedy works here but fails there" is the knapsack problem. There are two variants:

Fractional knapsack: You have a knapsack with capacity W. There are n items, each with weight w_i and value v_i. You can take fractions of items. Goal: maximize total value.

0/1 knapsack: Same setup, but you must take an item entirely or not at all. No fractions.

For fractional knapsack, the greedy strategy is simple: compute the value density v_i/w_i for each item, sort by density descending, and greedily take as much of the densest item as possible, then the next, until the knapsack is full. This is optimal.

Fractional knapsack greedy: sort items by v_i/w_i descending, take greedily

For 0/1 knapsack, the same greedy strategy fails. Here is the classic counterexample:

Item	Weight	Value	Density
A	10	60	6.0
B	20	100	5.0
C	30	120	4.0

Knapsack capacity: W = 50. Greedy by density picks A (weight 10, value 60), then B (weight 20, value 100). Total weight 30, value 160. Remaining capacity 20 -- item C weighs 30, does not fit. Final: value 160.

But the optimal 0/1 solution is B + C: weight 50, value 220. Greedy missed it because it committed to A too early, and in the 0/1 world you cannot take a fraction of C to fill the gap.

For fractional knapsack, greedy would take all of A (60), all of B (100), and 20/30 of C (80) = total value 240. That is optimal. The ability to take fractions means the greedy choice never wastes capacity.

Let us trace through the fractional knapsack algorithm step by step:

Step	Item	Density	Weight Available	Take	Value Gained	Remaining Capacity
1	A	6.0	50	All 10 kg	60	40
2	B	5.0	40	All 20 kg	100	20
3	C	4.0	20	20/30 = 66.7%	80	0

Total value: 60 + 100 + 80 = 240. The knapsack is perfectly full (50 kg). No capacity is wasted.

Why does the exchange argument work here? Suppose some optimal solution does not take the highest-density item first. Then it must allocate some capacity to a lower-density item instead. Swapping that capacity to the higher-density item increases the value (higher density = more value per kilogram). So the original solution was not optimal — contradiction. The key: "swapping capacity" is possible because items are divisible. In 0/1 knapsack, you cannot transfer a fraction of one item's allocation to another.

Why greedy fails for 0/1 knapsack. The greedy-choice property does not hold. Taking the highest-density item first does NOT guarantee an optimal solution exists that includes it. The exchange argument breaks down: you cannot swap items freely because you cannot split them. This is the boundary between greedy and DP. When items are indivisible, you need DP to consider all combinations.

The DP Solution for 0/1 Knapsack

Since greedy fails for 0/1 knapsack, we need DP. Define dp[i][w] = maximum value achievable using items 1..i with capacity w. The recurrence is:

dp[i][w] = max( dp[i-1][w], dp[i-1][w - w_i] + v_i )

The first term (dp[i-1][w]) corresponds to NOT taking item i. The second term corresponds to taking item i (if it fits: w_i ≤ w). We check both options and take the better one. This is what makes DP fundamentally different from greedy: DP considers both "take" and "don't take" at every step, using a table to avoid recomputation.

python
def knapsack_01(items, W):
    # items: list of (weight, value)
    n = len(items)
    dp = [[0] * (W + 1) for _ in range(n + 1)]

    for i in range(1, n + 1):
        w_i, v_i = items[i - 1]
        for w in range(W + 1):
            dp[i][w] = dp[i - 1][w]  # don't take item i
            if w_i <= w:
                dp[i][w] = max(dp[i][w], dp[i - 1][w - w_i] + v_i)

    return dp[n][W]

# Example: greedy gets 160, DP gets 220
print(knapsack_01([(10,60), (20,100), (30,120)], 50))  # 220

Time complexity: O(nW). Space: O(nW) for the full table (optimizable to O(W) with a rolling array). This is pseudo-polynomial — polynomial in the numeric value of W, not in the number of bits needed to represent W. For small W, this is fast. For W in the billions, you need approximation algorithms.

Fractional vs 0/1 Knapsack — Interactive

The canvas below shows both variants side by side. On the left, fractional knapsack with greedy filling. On the right, 0/1 knapsack showing why greedy fails. Adjust the knapsack capacity with the slider.

Fractional vs 0/1 Knapsack

Left: fractional (greedy works). Right: 0/1 (greedy fails). Green = taken, orange = partial, gray = skipped.

Capacity W 50

The Tree Analogy

Think of solving an optimization problem as navigating a tree of decisions. At each node, you choose which branch to take. The leaves are complete solutions, and you want the leaf with the best value.

Brute force explores the entire tree — every possible combination of choices. Exponential time.

Dynamic programming prunes the tree by recognizing that many branches lead to the same sub-problem. It solves each sub-problem once, stores the result, and reuses it. The tree collapses from exponential to polynomial size.

Greedy does not explore the tree at all. At each node, it takes one branch — the locally optimal one — and never looks at the others. It walks a single path from root to leaf. If the greedy-choice property holds, this one path happens to reach an optimal leaf. If not, it might reach a suboptimal leaf and never know.

Greedy is a bet. You bet that the locally optimal branch is globally optimal. The exchange argument is your proof that the bet always pays off. Without that proof, greedy is just a heuristic — fast but potentially wrong.

The Decision Checklist

When faced with an optimization problem, ask these questions in order:

1. Does the problem have optimal substructure?

Can you express the optimal solution in terms of optimal solutions to smaller sub-problems? If NO, the problem may need a different approach entirely (e.g., network flow, LP relaxation).

↓

2. Does it have the greedy-choice property?

Can you prove (by exchange argument) that a locally optimal choice is always part of a globally optimal solution? If YES → greedy. If NO → continue.

↓

3. Does it have overlapping sub-problems?

Are the same sub-problems solved multiple times in a naive recursion? If YES → DP (memoization or tabulation). If NO → divide-and-conquer suffices.

python
def fractional_knapsack(items, W):
    # items: list of (weight, value) tuples
    # Sort by value density descending
    items.sort(key=lambda x: x[1] / x[0], reverse=True)

    total_value = 0
    remaining = W

    for weight, value in items:
        if remaining <= 0:
            break
        take = min(weight, remaining)  # take as much as fits
        total_value += value * (take / weight)
        remaining -= take

    return total_value

# Example: items = [(10, 60), (20, 100), (30, 120)], W = 50
print(fractional_knapsack([(10,60), (20,100), (30,120)], 50))
# 240.0 — take all A, all B, 2/3 of C

Greedy vs DP: Items are (weight, value): A=(5, 10), B=(4, 40), C=(6, 30), D=(3, 50). Knapsack capacity W = 10. What does the greedy-by-density algorithm select for the 0/1 variant, and is it optimal?

Greedy picks D (50/3=16.7) then B (40/4=10.0), weight=7, value=90. This IS optimal. Greedy picks D then A, weight=8, value=60. Not optimal. Greedy picks D (density 16.7) then B (density 10.0), weight=7, value=90. This IS optimal for this particular input, but that is lucky — the greedy strategy is not guaranteed to be optimal for 0/1 knapsack in general.

Chapter 4: Huffman Codes

You need to compress a file. The file contains characters, each appearing with some frequency. A naive encoding uses a fixed number of bits per character (like ASCII's 8 bits). But if some characters appear much more often than others, this wastes bits. Frequent characters should get short codes; rare characters should get longer ones.

This is exactly what Huffman coding achieves. It is the optimal prefix-free variable-length code for a given character frequency distribution. And it is built by a greedy algorithm.

Prefix-Free Codes

A prefix-free code (sometimes called a "prefix code") is a code where no codeword is a prefix of another codeword. Why does this matter? Because it makes decoding unambiguous. When you read a bitstream, you can identify each codeword immediately without needing a separator or lookahead.

Example of a prefix-free code: a=0, b=10, c=110, d=111. When you read "010110111", you can decode it left to right without ambiguity: 0|10|110|111 = "abcd".

Example of a NON-prefix-free code: a=0, b=01, c=010. Reading "010" is ambiguous: is it "ab" (0, 10)? Wait, 10 is not a code. Is it "a, b, ..." starting with 0, 1, 0? This is a mess. Prefix-free codes avoid this entirely.

Prefix-free codes and binary trees. Every prefix-free code corresponds to a binary tree. Each character is a leaf. Left edges are 0, right edges are 1. The codeword is the path from root to leaf. Because characters are only at leaves, no codeword can be a prefix of another. This is the key insight behind Huffman coding: building the optimal code IS building the optimal tree.

The Huffman Algorithm

David Huffman discovered this algorithm as a term paper assignment in 1952. His professor, Robert Fano, had been working on the same problem with Claude Shannon. Huffman's algorithm beat their approach (Shannon-Fano coding) and became one of the most influential algorithms in computer science.

1. Create a leaf node for each character

Each leaf stores the character and its frequency. Put all leaves in a min-priority queue keyed by frequency.

↓

2. Greedy merge: extract two lowest-frequency nodes

Remove the two nodes with the smallest frequencies from the queue. Call them x and y.

↓

3. Create a new internal node

Make a new node z with frequency = freq(x) + freq(y). Set x as z's left child and y as z's right child.

↓

4. Insert the new node back into the queue

Push z into the priority queue. The queue now has one fewer element.

↻ Repeat steps 2-4 until only one node remains

↓

5. The last node is the root of the Huffman tree

Traverse the tree: left edge = 0, right edge = 1. The path from root to each leaf gives that character's codeword.

Worked Example: "abracadabra"

The string "abracadabra" has 11 characters. Let us count frequencies:

Character	Frequency	Probability
a	5	5/11 = 0.45
b	2	2/11 = 0.18
r	2	2/11 = 0.18
c	1	1/11 = 0.09
d	1	1/11 = 0.09

Now run Huffman's algorithm:

Step 1: Priority queue: c(1), d(1), b(2), r(2), a(5).

Step 2: Extract c(1) and d(1). Merge into [cd](2). Queue: b(2), r(2), [cd](2), a(5).

Step 3: Extract b(2) and r(2). Merge into [br](4). Queue: [cd](2), a(5), [br](4). Wait -- let us re-sort: [cd](2), [br](4), a(5).

Step 4: Extract [cd](2) and [br](4). Merge into [cdbr](6). Queue: a(5), [cdbr](6).

Step 5: Extract a(5) and [cdbr](6). Merge into root(11). Done!

Reading the codes from the tree (left = 0, right = 1, read the path from root to leaf):

Character	Frequency	Huffman Code	Code Length	Bits Used
a	5	0	1	5 × 1 = 5
b	2	110	3	2 × 3 = 6
r	2	111	3	2 × 3 = 6
c	1	100	3	1 × 3 = 3
d	1	101	3	1 × 3 = 3

Total bits: 5 + 6 + 6 + 3 + 3 = 23 bits. Fixed-length encoding would need ⌈log₂(5)⌉ = 3 bits per character × 11 = 33 bits. Huffman saves 30%.

Notice the elegant tradeoff: the most frequent character (a, appearing 5 times) gets a 1-bit code. The rarest characters (c and d, appearing once each) get 3-bit codes. Frequent characters are close to the root; rare characters are deep in the tree. This is exactly what minimizes the total bit count.

Why Huffman is greedy. At each step, we make the locally optimal choice: merge the two least-frequent nodes. We never reconsider this choice. The greedy-choice property holds because the two least-frequent characters must be siblings at the deepest level of any optimal tree (we prove this in Chapter 5). By merging them first, we are placing them at the bottom of the tree — exactly where they belong.

The encoded string "abracadabra" becomes: 0 110 111 0 100 0 101 0 110 111 0 = 23 bits. To decode, read bit by bit from the root: 0 → leaf a. Reset to root. 1 → go right, 1 → go right, 0 → leaf b. Reset. And so on. The prefix-free property ensures this always works without ambiguity.

Complexity Analysis

Building a Huffman tree processes n characters (where n is the alphabet size, not the text length). The algorithm performs n-1 merge operations. Each merge requires two extract-min operations and one insert, each taking O(log n) on a binary min-heap. Total: O(n log n).

If the frequencies are pre-sorted, a clever implementation using two queues can build the tree in O(n) time. Queue 1 holds the original leaves (sorted). Queue 2 holds merged nodes (automatically sorted because merge order is monotonic). At each step, pick the two smallest from the fronts of both queues. This is the van Leeuwen optimization, rarely needed in practice but elegant in theory.

Operation	With Binary Heap	With Two Queues (pre-sorted)
Building the tree	O(n log n)	O(n)
Encoding the text	O(m) where m = text length	O(m)
Decoding	O(m) — walk tree per bit	O(m)
Total (build + encode)	O(n log n + m)	O(n + m)

Interactive Huffman Tree Builder

Enter character frequencies below and watch the Huffman tree build bottom-up. The algorithm merges the two smallest nodes at each step.

Huffman Tree Builder

Watch the tree grow from leaves to root. Each merge step is shown. The resulting code table appears below the tree.

Enter text and click "Build Tree" to watch the Huffman algorithm.

python
import heapq
from collections import Counter

class HuffNode:
    def __init__(self, char, freq, left=None, right=None):
        self.char = char
        self.freq = freq
        self.left = left
        self.right = right
    def __lt__(self, other):
        return self.freq < other.freq

def huffman_tree(text):
    freq = Counter(text)
    heap = [HuffNode(ch, f) for ch, f in freq.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        left = heapq.heappop(heap)
        right = heapq.heappop(heap)
        merged = HuffNode(None, left.freq + right.freq, left, right)
        heapq.heappush(heap, merged)

    return heap[0]  # root of Huffman tree

def huffman_codes(root, prefix="", codes=None):
    if codes is None: codes = {}
    if root.char is not None:
        codes[root.char] = prefix or "0"  # single-char edge case
        return codes
    huffman_codes(root.left, prefix + "0", codes)
    huffman_codes(root.right, prefix + "1", codes)
    return codes

root = huffman_tree("abracadabra")
codes = huffman_codes(root)
for ch, code in sorted(codes.items()):
    print(f"{ch}: {code}")
# a: 0, b: 110, c: 100, d: 101, r: 111 (or similar — tree shape may vary)

Edge Cases and Practical Considerations

Single-character alphabet: If the text contains only one unique character (e.g., "aaaa"), the Huffman tree has a single leaf. The convention is to assign it the code "0" (1 bit per character). Some implementations handle this by creating a dummy second character.

Equal frequencies: When multiple characters have the same frequency, the algorithm's behavior depends on tie-breaking. Different tie-breaking strategies produce different trees (different code assignments) but ALL have the same optimal cost B(T). Huffman coding is not unique in its tree structure, but it IS unique in its cost.

Empty input: Edge case. No characters to encode. Return empty code table. Your implementation should handle this gracefully.

Two-character alphabet: Characters get codes "0" and "1" — one bit each. Huffman reduces to the most basic possible encoding. Still optimal.

Large alphabets: With n = 256 characters (byte-level encoding), the Huffman tree has up to 255 internal nodes and 256 leaves. Building it takes O(256 log 256) = O(256 · 8) = ~2048 operations. Negligible. The encoding/decoding of the actual data (potentially gigabytes) dominates.

Transmitting the tree. When you compress a file with static Huffman, you must store the code table (or the tree) at the beginning of the compressed file so the decoder can reconstruct it. The overhead is typically 256-512 bytes for a byte-level alphabet. For very small files (under ~1KB), this overhead can negate the compression gains. This is why most practical compressors (gzip, zstd) use a combination of predetermined tables and custom tables, choosing based on the data size.

Huffman construction: Given characters with frequencies A:3, B:3, C:1, D:1, E:1. The first merge step combines which two nodes?

A and B (they have equal and highest frequencies) Any two of C, D, E (they have the lowest frequencies, 1 each — the algorithm picks the two smallest) C and A (smallest and largest)

Chapter 5: Huffman Deep Dive

We have built Huffman trees. Now let us understand why Huffman coding is optimal, how it relates to information theory, and what its limits are.

Cost of a Code

The cost of a prefix-free code is the expected number of bits per character. If character c_i has frequency f_i and depth d_i in the tree (depth = codeword length), the cost is:

B(T) = ∑_i f_i · d_i

Huffman coding minimizes B(T) over all possible prefix-free codes. No other prefix-free code can achieve a lower expected bits per character for the given frequency distribution.

Proof of Optimality (Sketch)

The proof uses two key lemmas:

Lemma 1: In an optimal tree, the two characters with the lowest frequencies are siblings at the maximum depth. Proof by exchange: if they were not at maximum depth, some higher-frequency character would be deeper, and swapping would reduce the cost (lower frequency × greater depth versus higher frequency × lesser depth).

Lemma 2 (Optimal Substructure): Let T be an optimal tree for alphabet C with |C| ≥ 2. Let x and y be the two least-frequent characters, siblings in T at maximum depth. Let C' = C \ {x, y} ∪ {z} where z has frequency f(z) = f(x) + f(y). Let T' be the tree for C' obtained by replacing the subtree containing x and y with the single leaf z. Then T' is an optimal tree for C'.

Proof of Lemma 2: B(T) = B(T') + f(x) + f(y). This is because x and y each contribute their frequency times one extra level of depth compared to z. If T' were not optimal for C', there would exist a better tree T'' for C'. Expanding z in T'' back into x and y would give a tree for C with cost B(T'') + f(x) + f(y) < B(T') + f(x) + f(y) = B(T), contradicting T's optimality.

Together: Huffman's greedy choice (merge the two least-frequent) satisfies the greedy-choice property (Lemma 1) and optimal substructure (Lemma 2). Therefore the algorithm is correct. The proof structure is identical to activity selection: exchange argument for the first choice, induction on the sub-problem size.

The deep connection. Both activity selection and Huffman coding follow the same meta-proof: (1) prove the greedy choice is safe (exchange/swap argument), (2) prove the sub-problem has the same structure (optimal substructure), (3) conclude by induction. This is the universal pattern for proving greedy correctness. If you can execute these three steps, you have a rigorous proof.

Connection to information theory. Shannon's source coding theorem says the optimal average code length is bounded below by the entropy H = -∑ p_i log₂(p_i). Huffman coding achieves H ≤ B(T) < H + 1 bit per symbol. So Huffman is always within 1 bit of the theoretical optimum. For long blocks or arithmetic coding, you can get arbitrarily close to H, but for single-character codes, Huffman is the best you can do.

Huffman vs Fixed-Length — Bit Comparison

The canvas below shows the same text encoded with a fixed-length code and a Huffman code. The height of each bar represents the number of bits used. Frequent characters get shorter Huffman codes (shorter bars), rare characters get longer ones.

Huffman vs Fixed-Length Encoding

Each bar = one character occurrence. Height = bits used for that occurrence. Orange = Huffman, gray = fixed-length. Total bits shown below.

Enter text and click "Compare" to see bit savings.

Entropy Calculation

For "abracadabra" with probabilities a=5/11, b=2/11, r=2/11, c=1/11, d=1/11:

H = -(5/11)log₂(5/11) - 2(2/11)log₂(2/11) - 2(1/11)log₂(1/11)
= -(0.455)(−1.138) - 2(0.182)(−2.459) - 2(0.091)(−3.459)
= 0.518 + 0.895 + 0.629
= 2.04 bits per character

Huffman average: 23 bits / 11 characters = 2.09 bits per character. That is within 0.05 bits of the entropy — nearly perfect for a per-character code.

Fixed-length average: 33 bits / 11 characters = 3.0 bits per character. Huffman saves 30% over fixed-length and comes within a rounding error of the theoretical minimum.

The Formal Bounds

Shannon's source coding theorem gives us a tight sandwich:

H(X) ≤ L_Huffman < H(X) + 1

where H(X) = -∑ p_i log₂(p_i) is the entropy and L_Huffman is the expected Huffman code length. The lower bound says no prefix-free code can beat entropy. The upper bound says Huffman is never more than 1 bit above entropy per symbol.

When does the gap approach 1 bit? When one character has very high probability. If p₁ = 0.95 and there is one other character with p₂ = 0.05, the entropy is about 0.286 bits/symbol. But Huffman must use at least 1 bit for the frequent character (since the two codewords must be "0" and "1"). So L = 0.95(1) + 0.05(1) = 1.0 bits/symbol. The gap is 0.714 bits — close to the theoretical maximum of 1.

When does the gap vanish? When all probabilities are negative powers of 2. If you have 4 characters each with probability 1/4, entropy is exactly 2.0 bits and Huffman produces a balanced tree with exactly 2 bits per character. The gap is zero.

Huffman in Practice

Huffman coding is everywhere:

Format/Standard	How Huffman Is Used
DEFLATE (zip, gzip, PNG)	Huffman codes the output of LZ77 compression. Two Huffman trees: one for literal/length symbols, one for distance symbols.
JPEG	Huffman codes the quantized DCT coefficients. Two tables: one for DC coefficients, one for AC.
MP3	Huffman codes the quantized frequency-domain samples. 32 predefined Huffman tables selected per granule.
HTTP/2 (HPACK)	Huffman codes HTTP header strings. Uses a fixed Huffman table derived from web traffic statistics.

In modern compressors like zstd and brotli, Huffman has been partly replaced by Asymmetric Numeral Systems (ANS), which achieves closer-to-entropy coding with similar speed. But Huffman remains the standard for simplicity-critical applications.

When Huffman is suboptimal. Huffman assigns an integer number of bits to each character. If a character has probability 0.9, its optimal code length is -log₂(0.9) = 0.15 bits, but Huffman must use at least 1 bit. For very skewed distributions, arithmetic coding or Asymmetric Numeral Systems (ANS) can approach the entropy limit more closely by coding sequences of symbols jointly rather than one at a time. But for practical applications with reasonable distributions, Huffman is simple, fast, and nearly optimal.

Adaptive vs Static Huffman

The algorithm we described is static Huffman: you scan the entire input to compute frequencies, build the tree, then encode. This requires two passes over the data and you must transmit the tree (or frequency table) alongside the compressed data.

Adaptive (dynamic) Huffman (Vitter's algorithm) builds the tree on-the-fly, updating frequencies and restructuring the tree as each symbol is processed. The decoder mirrors this process, so no separate tree transmission is needed. Adaptive Huffman is more complex to implement but useful for streaming data where you cannot make two passes.

Decoding Walkthrough

Let us decode a Huffman-encoded bitstream by hand. Using our "abracadabra" tree (a=0, c=100, d=101, b=110, r=111), decode the bitstream: 0 110 111 0 100 0 101 0 110 111 0.

Start at the root. Read one bit at a time:

Bits Read	Tree Walk	Output
0	Root → left → LEAF 'a'	a
1	Root → right	(continue)
1	→ right	(continue)
0	→ left → LEAF 'b'	b
1	Root → right	(continue)
1	→ right	(continue)
1	→ right → LEAF 'r'	r
0	Root → left → LEAF 'a'	a
1	Root → right	(continue)
0	→ left	(continue)
0	→ left → LEAF 'c'	c

And so on. The prefix-free property guarantees that we always know exactly when a codeword ends: it ends when we reach a leaf. No lookahead, no backtracking. The decoder runs in O(m) time where m is the number of bits in the encoded stream.

python
def huffman_decode(root, bitstream):
    # bitstream: string of '0' and '1'
    result = []
    node = root
    for bit in bitstream:
        if bit == '0':
            node = node.left
        else:
            node = node.right
        if node.char is not None:  # leaf
            result.append(node.char)
            node = root  # reset to root
    return ''.join(result)

Extended Huffman: Beyond Characters

Huffman coding is not limited to characters. You can Huffman-encode any discrete symbol set. In JPEG, the "symbols" are run-length-encoded DCT coefficients. In DEFLATE, the symbols are literal bytes, match lengths, and match distances. The algorithm is the same — only the alphabet changes.

For very long texts, you can extend Huffman to blocks of characters. Instead of encoding single characters, encode pairs (bigrams), triples (trigrams), or variable-length blocks. This approaches the entropy limit more closely because it captures inter-character dependencies. In practice, modern compressors achieve near-entropy coding through other means (LZ77 + Huffman in DEFLATE, ANS in zstd).

Theory check: A text has 4 characters each with probability exactly 1/4. What is the entropy, and what does Huffman coding achieve?

Entropy = 2 bits/char. Huffman achieves exactly 2 bits/char (same as fixed-length with 2 bits), because when all frequencies are equal, Huffman produces a balanced tree identical to fixed-length encoding. Entropy = 4 bits/char. Huffman achieves 3 bits/char. Entropy = 1 bit/char. Huffman achieves 1 bit/char.

Chapter 6: Greedy Showcase

This is the big payoff. Four classic greedy problems in one interactive playground. Select a problem, configure the input, and watch the greedy algorithm work. For coin change, see exactly when and why greedy fails.

Greedy Algorithm Playground

Select a problem below, adjust parameters, and watch the greedy strategy execute step by step.

Activities: 10

Activity Selection mode. Click "Run" to watch the greedy algorithm.

The coin change problem is particularly instructive. With US coin denominations [25, 10, 5, 1], greedy always works: to make change for 30 cents, take one quarter (25) and one nickel (5). But with denominations [1, 3, 4], greedy fails for amount 6: greedy picks 4+1+1 = 3 coins, but optimal is 3+3 = 2 coins. The canvas shows this failure vividly.

Why greedy fails for general coin change. The greedy-choice property requires that picking the largest coin first always leads to an optimal solution. For US coins this works because the denominations were designed that way (each denomination roughly doubles the next). For arbitrary denominations, picking the largest coin can force you into a suboptimal path. This is exactly when you need DP: dp[i] = min_{c ∈ coins}(dp[i - c]) + 1.

Try different amounts with the "Bad Coins [1,3,4]" button. Amount 6 is the classic failure case. Amount 12 is another: greedy gives 4+4+4 = 3 coins, optimal is also 4+4+4 = 3 coins (greedy happens to match). Amount 10: greedy gives 4+4+1+1 = 4 coins, but optimal is 3+3+4 = 3 coins. The unpredictability is the point — without the greedy-choice property, you cannot trust the greedy answer.

The Coin Change DP

Since greedy fails for arbitrary denominations, here is the DP solution. Define dp[i] as the minimum number of coins to make amount i. The recurrence is beautifully simple:

dp[0] = 0, dp[i] = min_{c ∈ coins, c ≤ i}( dp[i - c] + 1 )

For each amount i, we try every coin denomination c. If we use coin c, we need dp[i - c] additional coins for the remaining amount, plus 1 for the coin we just used. We take the minimum over all valid coins.

python
def coin_change_dp(coins, amount):
    dp = [float('inf')] * (amount + 1)
    dp[0] = 0
    for i in range(1, amount + 1):
        for c in coins:
            if c <= i and dp[i - c] + 1 < dp[i]:
                dp[i] = dp[i - c] + 1
    return dp[amount] if dp[amount] != float('inf') else -1

# coins = [1,3,4], amount = 6
# dp: [0, 1, 2, 1, 1, 2, 2]
# dp[6] = 2 (coins 3+3), greedy would give 4+1+1 = 3
print(coin_change_dp([1,3,4], 6))  # 2

Let us trace through dp for coins [1, 3, 4] and amount 6:

i	Try c=1	Try c=3	Try c=4	dp[i]	Coins used
0	—	—	—	0	—
1	dp[0]+1=1	—	—	1	1
2	dp[1]+1=2	—	—	2	1+1
3	dp[2]+1=3	dp[0]+1=1	—	1	3
4	dp[3]+1=2	dp[1]+1=2	dp[0]+1=1	1	4
5	dp[4]+1=2	dp[2]+1=3	dp[1]+1=2	2	4+1
6	dp[5]+1=3	dp[3]+1=2	dp[2]+1=3	2	3+3

DP finds dp[6] = 2 (using coins 3+3). Greedy would have picked 4 first, then needed 1+1 for the remaining 2, giving 3 coins. The DP table considers all possibilities at each step — it never commits prematurely.

Putting It All Together

This showcase demonstrates the core lesson of CLRS Chapter 16. Greedy algorithms are:

Property	What It Means
Simple to implement	Sort + single pass. Rarely more than 15 lines of code.
Fast	O(n log n) in most cases, dominated by sorting.
Fragile	Only work when the greedy-choice property holds. Small changes to the problem (fractional → 0/1, US coins → arbitrary coins) can break them completely.
Provable	When they work, you can prove optimality via the exchange argument. This is what separates greedy algorithms from greedy heuristics.

Chapter 7: More Greedy Applications

Greedy algorithms appear across computer science. Here are the major applications beyond what we have covered, each using the same pattern: make a locally optimal choice, prove it is globally safe.

Job Scheduling with Deadlines

You have n jobs, each with a deadline d_i and a penalty w_i if the job is late. Each job takes unit time. Schedule jobs to minimize total penalty.

The greedy strategy: schedule jobs in order of decreasing penalty (highest penalty first). For each job, place it in the latest available slot before its deadline. If no slot is available, the job is late.

python
def job_scheduling(jobs):
    # jobs: list of (id, deadline, penalty)
    # Sort by penalty descending
    jobs.sort(key=lambda x: x[2], reverse=True)
    n = len(jobs)
    max_deadline = max(d for _, d, _ in jobs)
    slots = [None] * (max_deadline + 1)  # 1-indexed
    scheduled = []
    late = []

    for job_id, deadline, penalty in jobs:
        # Try to place in latest slot <= deadline
        placed = False
        for t in range(min(deadline, max_deadline), 0, -1):
            if slots[t] is None:
                slots[t] = job_id
                scheduled.append(job_id)
                placed = True
                break
        if not placed:
            late.append((job_id, penalty))

    total_penalty = sum(p for _, p in late)
    return scheduled, late, total_penalty

Why does this greedy strategy work? The exchange argument: suppose the optimal schedule does not include the highest-penalty job j in its on-time set. If j can be swapped into the schedule (displacing a lower-penalty job), the total penalty decreases. By processing jobs in penalty-descending order and greedily assigning each to the latest available slot, we ensure the highest-value jobs get first priority for on-time slots.

With a naive implementation, assigning each job takes O(n) to find a free slot, giving O(n²) total. Using a Union-Find data structure (same one used in Kruskal's MST), we can find the latest available slot in nearly O(1) amortized time, giving O(n log n) overall (dominated by the initial sort).

Minimum Spanning Trees (Preview of CLRS Ch. 23)

Given a weighted, connected graph G = (V, E, w), find a spanning tree T (a connected, acyclic subgraph containing all vertices) that minimizes the total edge weight ∑_{e ∈ T} w(e). Two classic algorithms, both greedy:

Kruskal's algorithm: Sort all edges by weight. Process edges from lightest to heaviest. Add each edge to the tree if it does not create a cycle (checked using Union-Find). Stop when you have |V|-1 edges. The greedy choice: always pick the cheapest safe edge globally. Time: O(E log E) for sorting + nearly O(E) for union-find operations with path compression and union by rank.

Prim's algorithm: Start from any vertex. Maintain a priority queue of edges crossing the "cut" between tree vertices and non-tree vertices. Repeatedly extract the cheapest such edge and add the non-tree endpoint to the tree. Time: O(E log V) with a binary heap, or O(E + V log V) with a Fibonacci heap.

Both work because the MST problem has the greedy-choice property, established by the cut property: for any cut of the graph (any partition of vertices into two non-empty sets), the minimum-weight edge crossing the cut is in some MST. This is the MST exchange argument: if the lightest crossing edge were not in some MST T, adding it creates a cycle. That cycle must contain another edge crossing the same cut. Removing that other edge (which is heavier) gives a spanning tree with lower total weight — contradicting T's optimality.

python
# Kruskal's MST with Union-Find
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py: return False
        if self.rank[px] < self.rank[py]: px, py = py, px
        self.parent[py] = px
        if self.rank[px] == self.rank[py]: self.rank[px] += 1
        return True

def kruskal(n, edges):
    # edges: list of (weight, u, v)
    edges.sort()  # sort by weight
    uf = UnionFind(n)
    mst = []
    for w, u, v in edges:
        if uf.union(u, v):
            mst.append((w, u, v))
            if len(mst) == n - 1: break
    return mst

Dijkstra's Shortest Paths (Preview of CLRS Ch. 24)

Find shortest paths from a source vertex to all other vertices in a graph with non-negative edge weights. Dijkstra's algorithm is greedy: at each step, extract the vertex with the smallest tentative distance and finalize it. The greedy choice is safe because edge weights are non-negative -- no future path can improve on the shortest tentative distance.

python
import heapq

def dijkstra(graph, source):
    # graph: dict of {node: [(neighbor, weight), ...]}
    dist = {node: float('inf') for node in graph}
    dist[source] = 0
    pq = [(0, source)]  # (distance, node)
    visited = set()

    while pq:
        d, u = heapq.heappop(pq)
        if u in visited: continue
        visited.add(u)
        for v, w in graph[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))

    return dist

Negative edges break Dijkstra. If edge weights can be negative, the greedy-choice property fails. A vertex finalized early might later be reachable by a cheaper path through a negative edge. Dijkstra's exchange argument depends on the invariant: "if u is the unvisited vertex with smallest tentative distance, no future path to u can be shorter." With negative edges, a longer path through more vertices could discover a negative edge that makes it shorter. This is why Bellman-Ford (a DP-style algorithm that relaxes all edges V-1 times) is needed for graphs with negative weights.

Set Cover (Greedy Approximation)

Given a universe U of elements and a collection of subsets S₁, S₂, ..., S_n whose union is U, find the minimum number of subsets that cover all elements in U. This problem is NP-hard — no polynomial-time exact algorithm is known unless P = NP.

But the greedy strategy gives a remarkably good approximation. The greedy set cover algorithm:

1. Initialize uncovered = U

All elements are uncovered.

↓

2. While uncovered is non-empty

Pick the subset S_i that covers the most uncovered elements. Add S_i to the solution. Remove its elements from uncovered.

This greedy algorithm uses at most H(|U|) = ln(|U|) + 1 times as many sets as the optimal solution, where H(n) is the n-th harmonic number. This is tight: there are instances where greedy uses exactly H(n) times optimal. Moreover, under standard complexity assumptions, no polynomial algorithm can achieve a better ratio than (1 - ε) ln(|U|). So greedy is essentially the best polynomial algorithm for set cover.

python
def greedy_set_cover(universe, subsets):
    # universe: set of elements
    # subsets: list of sets
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick subset covering the most uncovered elements
        best = max(subsets, key=lambda s: len(s & uncovered))
        chosen.append(best)
        uncovered -= best
    return chosen

This is a greedy approximation algorithm: it does not guarantee optimality, but it guarantees a bounded suboptimality. Many NP-hard problems admit greedy approximations with provable bounds.

Greedy Approximation Algorithms

When a problem is NP-hard, we cannot hope for a polynomial-time exact algorithm (assuming P ≠ NP). But greedy can often provide approximation guarantees: a provable bound on how far from optimal the greedy solution can be.

Problem	Greedy Strategy	Approximation Ratio	Is This Tight?
Set cover	Cover most uncovered elements	H(n) ≈ ln(n) + 1	Yes — cannot do better in poly time
Vertex cover	Pick endpoints of maximal matching	2 (not greedy per se, but simple)	Tight for matching-based approach
TSP (metric)	Nearest neighbor	O(log n) in theory, ~1.25 in practice	Christofides gives 1.5
Bin packing	First Fit Decreasing	11/9 · OPT + 6/9	Nearly tight
Max-cut	Random or greedy flip	1/2	SDP-based: 0.878 (Goemans-Williamson)

The key insight: even when greedy is not optimal, it is often the simplest algorithm with a non-trivial approximation guarantee. This makes greedy the go-to starting point for NP-hard optimization problems. You can always improve later with more sophisticated techniques (LP relaxation, semidefinite programming, local search), but greedy gives you a baseline with minimal effort.

In interviews for systems or ML roles, you will rarely need to implement these approximation algorithms. But knowing that greedy set cover exists and has an O(ln n) guarantee is the kind of algorithmic literacy that separates strong candidates from average ones in system design discussions about, say, feature selection or resource allocation.

The relationship between exact and approximate greedy is important to understand:

Exact greedy: The exchange argument proves the greedy solution is OPTIMAL. Examples: activity selection, Huffman, fractional knapsack, MST, Dijkstra. You get the best possible answer.

Approximate greedy: The analysis proves a BOUND on suboptimality. Examples: set cover (O(ln n)), vertex cover (2x), bin packing (11/9). You get a provably good answer, but not necessarily the best.

Both are greedy in strategy (make local choices, never backtrack). The difference is in what you can prove about the outcome.

Problem	Greedy Strategy	Complexity	Optimal?
Activity selection	Earliest finish time	O(n log n)	Yes
Huffman coding	Merge two lowest frequencies	O(n log n)	Yes
Fractional knapsack	Highest value density first	O(n log n)	Yes
Job scheduling	Highest penalty first	O(n²) naive, O(n log n) with union-find	Yes
Kruskal's MST	Lightest non-cycle edge	O(E log E)	Yes
Prim's MST	Cheapest edge to non-tree vertex	O(E log V)	Yes
Dijkstra's paths	Smallest tentative distance	O(E log V)	Yes (non-negative weights)
Coin change (US coins)	Largest denomination first	O(n)	Yes (US coins only)
Set cover	Cover most uncovered elements	O(\|U\| · n)	No — O(ln n) approx

Matroids: The Unifying Theory (CLRS 16.4)

CLRS Chapter 16.4 introduces matroids — the abstract mathematical structure that unifies ALL greedy algorithms that produce optimal solutions. A matroid is a pair M = (S, I) where S is a finite set and I is a family of "independent" subsets satisfying three axioms:

1. Hereditary

If B ∈ I and A ⊆ B, then A ∈ I. (Every subset of an independent set is independent.)

↓

2. Non-empty

∅ ∈ I. (The empty set is always independent.)

↓

3. Exchange

If A, B ∈ I and |A| < |B|, then ∃ x ∈ B\A such that A ∪ {x} ∈ I. (A smaller independent set can always be enlarged by adding an element from a larger one.)

The fundamental theorem: a greedy algorithm on a weighted matroid always produces an optimal solution. The algorithm: sort elements by weight (descending for maximization), and greedily add each element if it keeps the set independent.

Examples of matroids in disguise:

Problem	Ground Set S	Independent Sets I	Weight
MST (Kruskal)	Edges of the graph	Acyclic subsets (forests)	Negative edge weight (minimize)
Job scheduling	Jobs	Sets of jobs schedulable before deadlines	Penalties
Maximum weight forest	Edges	Acyclic subsets	Edge weight

Not every greedy-solvable problem is a matroid (Huffman coding is not, for instance), but matroids provide a powerful lens for understanding when and why greedy works. If you can show your problem's feasible sets form a matroid, optimality of the greedy approach follows immediately from the theorem.

The matroid framework also explains why greedy FAILS on certain problems. The 0/1 knapsack feasible sets ({items with total weight ≤ W}) do NOT form a matroid. Specifically, the exchange axiom fails: if A = {item1} with weight 10 and B = {item2, item3} each with weight 5, and W = 10, then |A| < |B| but we cannot add either element of B to A without exceeding W. The exchange axiom is violated, so this is not a matroid, and greedy is not guaranteed optimal.

The practical takeaway: you do not need to verify matroid axioms in an interview. But understanding that matroids are the theoretical foundation of greedy optimality deepens your intuition for when greedy will and will not work. Problems with "capacity constraints" that cannot be decomposed (like 0/1 knapsack) usually fail the matroid test. Problems where "adding one more element is always possible if the set is small enough" (like forests in a graph) usually pass it.

Interactive: Job Scheduling

Job Scheduling with Deadlines

Jobs sorted by penalty (highest first). Green slots = scheduled on time. Red = late (penalty incurred). Click "New Jobs" to regenerate.

Click "Schedule" to run the greedy job scheduler.

Pattern recognition: Why does Dijkstra's algorithm require non-negative edge weights for correctness?

Because the priority queue does not support negative keys Because once Dijkstra finalizes a vertex's distance, it never revisits it. With negative edges, a finalized vertex could later be reached more cheaply through a negative-weight edge, violating the greedy-choice property. Because negative edges create infinite loops

Chapter 8: Interview Arsenal

Greedy problems are among the most common in coding interviews. They are fast to implement once you recognize the pattern, but recognizing the pattern is the hard part. This chapter is your cheat sheet.

The "Is It Greedy?" Checklist

1. Can I sort the input?

Most greedy solutions start with sorting by some criterion (finish time, density, deadline, etc.). If sorting reveals a natural ordering, greedy is likely.

↓

2. Can I make a local choice that never needs to be undone?

If at each step you can commit to a choice without ever needing to backtrack, that is the hallmark of greedy. If you might need to undo choices, you likely need DP or backtracking.

↓

3. Can I prove the exchange argument?

Take any optimal solution. If you can swap in your greedy choice without making it worse, the greedy-choice property holds. If you cannot, try DP.

↓

4. Does the problem shrink after each choice?

After making the greedy choice, is the remaining problem a smaller instance of the same type? This is optimal substructure. If yes and steps 1-3 check out, code the greedy solution.

The Three Greedy Families

Most greedy interview problems fall into three families. Learn the family, and you can solve any member:

Family 1: Interval scheduling. You have intervals. You want to select, merge, partition, or cover them optimally. The universal technique: sort by one endpoint (usually the end), then scan.

Family 2: Greedy on sorted arrays. Sort the input by some criterion, then make one pass making irrevocable decisions. Jump Game, Gas Station, Candy, Task Scheduler.

Family 3: Priority-queue greedy. Maintain a heap and repeatedly extract the optimal element. Huffman coding, Dijkstra, Prim's MST, merge K sorted lists, reorganize string (LC #767).

Recognizing which family a problem belongs to is 80% of the solution. The remaining 20% is choosing the right sorting criterion or heap key.

LeetCode Pattern: Interval Problems

A huge family of greedy problems involves intervals. The core technique is always the same: sort by one endpoint, then scan.

Problem	Sort By	Greedy Rule	LeetCode
Max non-overlapping intervals	End time	Pick if start ≥ last end	#435
Min intervals to remove	End time	Count overlaps = n - max non-overlapping	#435
Merge intervals	Start time	Extend end if overlap	#56
Min meeting rooms (interval partitioning)	Start time + min-heap of end times	Reuse room if earliest end ≤ current start	#253
Insert interval	Already sorted	Binary search + merge	#57
Min arrows to burst balloons	End coordinate	Same as activity selection	#452

Coding Drill: Meeting Rooms II (Interval Partitioning)

This is the "dual" of activity selection. Instead of maximizing activities in one room, minimize the number of rooms needed for all activities. Greedy solution: sort by start time, use a min-heap to track when each room becomes free.

python
# LeetCode 253: Meeting Rooms II
# Minimum number of conference rooms required
import heapq

def minMeetingRooms(intervals):
    if not intervals: return 0
    intervals.sort(key=lambda x: x[0])  # sort by start
    rooms = []  # min-heap of end times
    for start, end in intervals:
        if rooms and rooms[0] <= start:
            heapq.heapreplace(rooms, end)  # reuse room
        else:
            heapq.heappush(rooms, end)  # need new room
    return len(rooms)

The greedy choice: for each meeting, check if the earliest-freeing room is available. If yes, reuse it (greedy — do not allocate a new room when an existing one works). If no, allocate a new room. The heap tracks end times, so `rooms[0]` is always the room that frees up soonest.

Why is this optimal? The number of rooms needed equals the maximum number of simultaneously occurring meetings. This is a lower bound — no algorithm can do better. Our greedy approach achieves this lower bound because it reuses rooms as aggressively as possible (always checking the earliest-freeing room first).

An alternative approach uses a sweep line: create events for each start (+1) and end (-1), sort them, scan left to right tracking the running count. The peak count is the answer. Both approaches are O(n log n).

The sweep line approach is particularly elegant because it makes the lower bound obvious. At any point in time, the number of concurrent meetings is a lower bound on rooms needed (you cannot schedule two concurrent meetings in one room). The peak concurrency is therefore a lower bound. The greedy approach achieves this bound, so it is optimal.

python
# Sweep-line alternative
def minMeetingRoomsSweep(intervals):
    events = []
    for s, e in intervals:
        events.append((s, 1))   # meeting starts
        events.append((e, -1))  # meeting ends
    events.sort()  # ties: ends before starts (process -1 before +1)
    peak = curr = 0
    for _, delta in events:
        curr += delta
        peak = max(peak, curr)
    return peak

LeetCode Pattern: Greedy on Arrays

Problem	Greedy Insight	LeetCode
Jump Game	Track farthest reachable index	#55
Jump Game II	BFS-style: current reach, next reach	#45
Gas Station	If total gas ≥ total cost, start from where tank would be most empty	#134
Candy	Two passes: left-to-right and right-to-left	#135
Task Scheduler	Most frequent task determines idle slots	#621
Partition Labels	Track last occurrence of each char, extend partition	#763

Coding Drill: Activity Selection

python
# LeetCode 435: Non-overlapping Intervals
# Given intervals, return min number to remove for no overlaps
def eraseOverlapIntervals(intervals):
    intervals.sort(key=lambda x: x[1])  # sort by end
    count = 0
    last_end = float('-inf')
    for start, end in intervals:
        if start >= last_end:
            last_end = end   # keep this interval
        else:
            count += 1       # remove this interval
    return count

Coding Drill: Jump Game

python
# LeetCode 55: Jump Game
# Can you reach the last index?
def canJump(nums):
    farthest = 0
    for i in range(len(nums)):
        if i > farthest:
            return False  # can't reach here
        farthest = max(farthest, i + nums[i])
    return True

Coding Drill: Jump Game II (Minimum Jumps)

python
# LeetCode 45: Jump Game II
# Minimum number of jumps to reach the last index
def jump(nums):
    jumps = 0
    current_end = 0   # farthest we can reach with current jumps
    farthest = 0      # farthest we can reach with one more jump
    for i in range(len(nums) - 1):
        farthest = max(farthest, i + nums[i])
        if i == current_end:  # must jump now
            jumps += 1
            current_end = farthest
            if current_end >= len(nums) - 1:
                break
    return jumps

This is a BFS-style greedy. Think of each jump as a "level" in BFS. At each level, you explore all positions reachable from the current level and compute the farthest position reachable at the next level. When you exhaust the current level (i == current_end), you increment jumps and extend to the next level (current_end = farthest). This guarantees minimum jumps because BFS always finds shortest paths in unweighted graphs.

Coding Drill: Gas Station

python
# LeetCode 134: Gas Station
# Find starting station for circular route, or -1
def canCompleteCircuit(gas, cost):
    if sum(gas) < sum(cost):
        return -1  # impossible
    tank = 0
    start = 0
    for i in range(len(gas)):
        tank += gas[i] - cost[i]
        if tank < 0:
            start = i + 1  # can't start at or before i
            tank = 0
    return start

The greedy insight has two parts. First: if total gas ≥ total cost, a valid starting station exists (provable by considering the cumulative surplus around the circuit — there must be a point where the running total is at its minimum, and starting just after that point keeps the tank non-negative throughout). Second: if starting from station s causes the tank to go negative at station k, then no station between s and k can be a valid start either. Why? The prefix sum from s to any intermediate station j (where s < j ≤ k) is non-negative (we had not yet gone negative at j). Starting from j gives a prefix sum at k equal to (prefix from j to k) = (prefix from s to k) - (prefix from s to j) ≤ (negative) - (non-negative) = still negative. So we skip all of them and try station k+1 next. One pass, O(n), no backtracking.

This is a subtle but powerful greedy argument. The "skip ahead" step is the key — it discards multiple candidate starting stations at once based on local information. This is characteristic of many greedy array problems: the greedy algorithm processes each element at most once, but each step may eliminate multiple future candidates.

Coding Drill: Fractional Knapsack

python
# Fractional knapsack — greedy is optimal
def fractionalKnapsack(items, capacity):
    # items: [(weight, value), ...]
    items.sort(key=lambda x: x[1]/x[0], reverse=True)
    total = 0
    for w, v in items:
        if capacity <= 0: break
        take = min(w, capacity)
        total += v * (take / w)
        capacity -= take
    return total

Coding Drill: Partition Labels

python
# LeetCode 763: Partition Labels
# Partition string so each letter appears in at most one part
def partitionLabels(s):
    last = {ch: i for i, ch in enumerate(s)}
    partitions = []
    start = end = 0
    for i, ch in enumerate(s):
        end = max(end, last[ch])  # extend partition
        if i == end:  # reached the end of current partition
            partitions.append(end - start + 1)
            start = i + 1
    return partitions

# "ababcbacadefegdehijhklij" -> [9, 7, 8]

The greedy insight: track the last occurrence of each character. As you scan, the current partition must extend to at least the last occurrence of every character you have seen. When your index reaches the current partition boundary, cut.

Coding Drill: Task Scheduler

python
# LeetCode 621: Task Scheduler
# Minimum intervals to complete all tasks with cooldown n
from collections import Counter

def leastInterval(tasks, n):
    freq = Counter(tasks)
    max_freq = max(freq.values())
    # Count how many tasks have the maximum frequency
    max_count = sum(1 for v in freq.values() if v == max_freq)
    # The most frequent task creates (max_freq - 1) gaps of size n
    # Each gap can be filled with other tasks
    result = (max_freq - 1) * (n + 1) + max_count
    return max(result, len(tasks))  # at least len(tasks)

This is greedy because the most frequent task determines the structure. You build a frame: (max_freq - 1) blocks of size (n + 1) plus one final block. The other tasks fill in the idle slots greedily.

Coding Drill: Reorganize String

python
# LeetCode 767: Reorganize String
# Rearrange so no two adjacent characters are the same
import heapq
from collections import Counter

def reorganizeString(s):
    freq = Counter(s)
    # Check if possible: most frequent char can't exceed ceil(n/2)
    max_freq = max(freq.values())
    if max_freq > (len(s) + 1) // 2:
        return ""

    # Max-heap (negate for min-heap)
    heap = [(-cnt, ch) for ch, cnt in freq.items()]
    heapq.heapify(heap)
    result = []
    prev_cnt, prev_ch = 0, ''

    while heap:
        cnt, ch = heapq.heappop(heap)
        # Re-add the previous character if it still has count
        if prev_cnt < 0:
            heapq.heappush(heap, (prev_cnt, prev_ch))
        result.append(ch)
        prev_cnt, prev_ch = cnt + 1, ch  # decrement (negated)

    return ''.join(result)

This is a priority-queue greedy algorithm: always place the most frequent remaining character, but never place the same character twice in a row. The "previous character" is held out of the heap for one round, then re-added. This ensures no adjacent duplicates while using the most frequent characters as fast as possible.

Coding Drill: Jump Game III — BFS (Not Greedy)

Not all "Jump Game" variants are greedy. LeetCode #1306 (Jump Game III) asks: "Can you reach index 0 starting from any given index, where you can jump +arr[i] or -arr[i]?" This is BFS/DFS, not greedy — you may need to explore multiple paths. Recognizing when a problem is NOT greedy is as important as recognizing when it is.

python
# LeetCode 1306: Jump Game III — BFS, NOT greedy
from collections import deque

def canReach(arr, start):
    n = len(arr)
    visited = set()
    q = deque([start])
    while q:
        i = q.popleft()
        if arr[i] == 0: return True
        if i in visited: continue
        visited.add(i)
        for ni in [i + arr[i], i - arr[i]]:
            if 0 <= ni < n and ni not in visited:
                q.append(ni)
    return False

The difference: Jump Game I/II have a monotonic structure (you always move rightward, farthest-reachable is non-decreasing). This monotonicity enables greedy. Jump Game III allows bidirectional movement, breaking monotonicity and requiring full graph search.

The Greedy Interview Cheat Sheet

Print this mental checklist. Run through it for every optimization problem you encounter:

Step	Question	If YES	If NO
1	Can I sort the input by some criterion?	Promising for greedy. Try step 2.	Maybe DP or graph search.
2	Can I make one local choice and never undo it?	Strong greedy signal. Try step 3.	Need DP or backtracking.
3	Can I construct a quick exchange argument?	CODE IT. Greedy is almost certainly correct.	Try a small counterexample.
4	Does my counterexample break greedy?	Pivot to DP. State why greedy fails.	Greedy probably works. Code it, prove later.

In a 45-minute interview, spending 3 minutes on this checklist saves you from 20 minutes down a wrong path. The most common mistake is jumping to DP when greedy suffices (wasting time on unnecessary complexity) or jumping to greedy when DP is needed (getting wrong answers).

The Five Interview Dimensions for Greedy

Dimension	What They Test	Example Question	Strong Answer Pattern
Concept	First-principles understanding	"When does greedy produce optimal solutions?"	State greedy-choice property + optimal substructure. Give exchange argument for a concrete example.
Design	Choosing the right approach	"Design a system to schedule N meeting rooms"	Recognize as interval partitioning. Sort by start, use min-heap of end times. O(n log n).
Code	Clean implementation	"Implement Huffman coding"	Use heapq, define comparable node class, handle edge cases (single character, empty input).
Debug	Finding why greedy fails	"Your greedy coin change gives wrong answer for denominations [1,5,8]"	Construct counterexample (amount 10: greedy gives 8+1+1=3, optimal is 5+5=2). Explain missing greedy-choice property. Pivot to DP.
Frontier	Advanced awareness	"How is Huffman used in modern compression?"	DEFLATE, JPEG, then discuss ANS as the modern successor. Mention that LLM tokenizers use BPE (a greedy algorithm).

The interview meta-strategy. When you see an optimization problem in an interview: (1) Check if sorting reveals a greedy structure. (2) Try the greedy approach. (3) Verify with a small counterexample — pick 3-4 items and check by hand. If the counterexample breaks greedy, pivot to DP. If greedy works, state the exchange argument (even informally). Interviewers love seeing this structured thinking. It shows algorithmic maturity beyond "I memorized the solution."

Common Mistakes in Greedy Interviews

Mistake 1: No proof sketch. You code a greedy solution but cannot explain why it works. The interviewer asks "how do you know this is optimal?" and you say "it seems right." Always prepare a 2-sentence exchange argument: "Assume an optimal solution without our greedy choice. We can swap X for our choice without increasing cost because..."

Mistake 2: Wrong sorting criterion. You recognize a greedy problem but sort by the wrong attribute. Activity selection by duration, knapsack by value instead of density. When in doubt, try 2-3 small examples to validate your criterion before coding.

Mistake 3: Greedy when DP is needed. You force a greedy approach on a DP problem (like 0/1 knapsack or edit distance). If you cannot construct an exchange argument in 30 seconds, it is probably not greedy. Check for the telltale sign: can you make an irrevocable local choice? If the optimal decision at step i depends on future steps, greedy will not work.

Interactive: Jump Game Simulator

Jump Game — Greedy Reach Tracker

Each cell shows the max jump length. Green = reachable. The moving bar shows the farthest reachable index. Can you reach the end?

Click "Run" to watch the greedy reach tracker.

Pattern check: You need to burst all balloons arranged on a number line. Each balloon covers an interval [x_start, x_end]. You can shoot vertical arrows from any x-coordinate; an arrow bursts all balloons whose interval contains that x. What is the minimum number of arrows? Which greedy strategy works?

Sort balloons by end coordinate. Shoot at each balloon's end. Skip balloons already burst. This is activity selection in disguise — each arrow "selects" a group of overlapping balloons, and sorting by end maximizes how many each arrow hits. Sort by start coordinate and shoot at the midpoint of each group. Sort by balloon width and shoot the widest first.

Chapter 9: Connections

Greedy algorithms do not exist in isolation. They connect deeply to dynamic programming, graph algorithms, and information theory. Here is how this chapter fits into the broader CLRS landscape and beyond.

This lesson covered the full depth of CLRS Chapter 16. We started with the activity selection problem, proved greedy correctness via the exchange argument, explored Huffman coding as a completely different application of the same greedy paradigm, drew sharp boundaries between greedy and DP using the knapsack litmus test, and built a practical interview toolkit. Along the way, we touched on matroids, approximation algorithms, and the deep connections between greedy and graph algorithms in later CLRS chapters.

Where Greedy Appears in CLRS

Chapter	Topic	Greedy Role
Ch 15	Dynamic Programming	DP is the "safe fallback" when greedy-choice property fails. Activity selection appears in both chapters to illustrate the relationship.
Ch 16	Greedy Algorithms (this lesson)	Core theory: greedy-choice property, optimal substructure, exchange arguments.
Ch 23	Minimum Spanning Trees	Both Kruskal's and Prim's are greedy. The cut property is the MST exchange argument.
Ch 24	Single-Source Shortest Paths	Dijkstra's algorithm is greedy (works for non-negative weights). Bellman-Ford is DP (handles negative weights).
Ch 25	All-Pairs Shortest Paths	Floyd-Warshall is DP. Johnson's algorithm uses Dijkstra (greedy) after reweighting.

The Greedy-DP Spectrum

Think of algorithms on a spectrum from most constrained to most flexible:

Greedy

Commit immediately. One sub-problem per step. O(n log n) typical. Requires greedy-choice property.

↓

Dynamic Programming

Defer commitment. Multiple overlapping sub-problems. O(n²) or O(nW) typical. Requires optimal substructure + overlapping sub-problems.

↓

Backtracking / Branch-and-Bound

Try and undo. Exponential worst case. No structural requirements -- works on anything but slowly.

When designing an algorithm, start at the top. If greedy works, use it -- it is the simplest and fastest. If not, try DP. Only resort to backtracking if the problem has no useful substructure.

Greedy in the Real World

Application	Algorithm	Why Greedy Works
Data compression (zip, gzip)	Huffman coding	Prefix-free codes have the greedy-choice property
Network routing (OSPF)	Dijkstra	Non-negative link weights guarantee greedy correctness
Network design (fiber optic)	Kruskal/Prim MST	Cut property ensures cheapest spanning tree
CPU scheduling	Shortest Job First	Minimizes average wait time (exchange argument)
Bandwidth allocation	Fractional knapsack	Resources are divisible
Compiler register allocation	Graph coloring heuristic	Greedy approximation (not always optimal)

Greedy Beyond CLRS

Several important greedy algorithms appear outside the standard algorithms curriculum but are worth knowing:

BPE (Byte Pair Encoding). The tokenizer behind GPT, LLaMA, and most modern LLMs is a greedy algorithm. It repeatedly merges the most frequent pair of adjacent tokens in the training corpus. This is structurally similar to Huffman — both build a hierarchy by merging pairs. But BPE merges the most frequent pair (top-down optimization for compression), while Huffman merges the least frequent pair (bottom-up tree construction for optimal prefix codes). The result is a vocabulary of subword tokens optimized for the training distribution. When you type "unbelievable" into ChatGPT, BPE might tokenize it as ["un", "believ", "able"] — common subwords get their own tokens, just as frequent characters get short Huffman codes.

Greedy feature selection. Forward selection in machine learning adds features one at a time, always picking the feature that most improves model accuracy on a validation set. This is greedy and not guaranteed optimal — the greedy-choice property does not hold because features can have synergistic effects (two features useless alone but powerful together). But it is fast (O(n · k) model fits for n features and k selected) and usually produces good results. Backward elimination (start with all features, greedily remove the least useful) is the dual approach.

A* search. A* is a "best-first" search that greedily expands the node with the lowest f(n) = g(n) + h(n), where g(n) is the cost so far and h(n) is the heuristic estimate of remaining cost. With an admissible heuristic (h never overestimates), A* is optimal. With a consistent heuristic (h satisfies the triangle inequality), A* never re-expands a node. This is another example of "greedy with the right criterion = optimal." Dijkstra is the special case where h(n) = 0 — no heuristic, just greedily expand the nearest unexplored node.

Limitations and Honest Assessment

Greedy algorithms are beautiful when they work. But they have real limitations:

When greedy fails. (1) 0/1 knapsack — items are indivisible, greedy overfits to density. (2) Coin change with arbitrary denominations — largest-first can overshoot. (3) Traveling salesman — nearest-neighbor greedy produces tours 20-25% longer than optimal on average. (4) Graph coloring — greedy coloring depends on vertex order and can use far more colors than optimal (χ(G) colors vs O(n) in the worst case). (5) General scheduling with dependencies — greedy heuristics give approximations, not optimality. (6) Edit distance / string alignment — matching characters greedily misses the global structure; DP is required. For all of these, DP, ILP, or specialized algorithms are needed.

The fundamental insight of this chapter: greedy is the simplest paradigm that achieves optimality. It does not always apply. When it does, it gives you elegant, fast, simple code with provable guarantees. The skill is recognizing when the exchange argument holds — and having the discipline to reach for DP when it does not.

The takeaway: greedy is a tool, not a religion. Use it when the exchange argument holds. Reach for DP when it does not. Know the difference, and you will solve optimization problems faster than anyone in the room.

Key Takeaways for the Exam / Interview

#	Concept	One-Sentence Summary
1	Greedy-choice property	There exists an optimal solution containing the locally optimal (greedy) choice.
2	Exchange argument	Take any optimal solution; show you can swap in the greedy choice without making it worse.
3	Optimal substructure	After the greedy choice, the remaining sub-problem has the same structure.
4	Activity selection	Sort by finish time, scan once. O(n log n). Exchange argument: earliest finisher can replace any first choice.
5	Huffman coding	Merge two lowest-frequency nodes greedily. Optimal prefix-free code. O(n log n).
6	Fractional knapsack	Sort by value density, take greedily. Works because items are divisible.
7	0/1 knapsack	Greedy fails. Need DP: O(nW) pseudo-polynomial.
8	Matroids	The abstract structure guaranteeing greedy optimality. Greedy on weighted matroid = optimal.
9	Greedy vs DP	Both need optimal substructure. Greedy additionally needs greedy-choice property. DP needs overlapping sub-problems.

"An algorithm must be seen to be believed." — Donald Knuth. You have now seen nine greedy algorithms, their proofs, their failures, and their implementations. The exchange argument is your universal tool for verifying greedy correctness. When you can construct one, code confidently. When you cannot, pivot to DP without guilt.

Resource	What You Will Learn
CLRS Ch. 23 (MST)	The cut property — the most elegant exchange argument in all of algorithms. Kruskal's and Prim's algorithms with full proofs.
CLRS Ch. 24 (Shortest Paths)	Dijkstra's algorithm: greedy on graphs. Bellman-Ford: DP on graphs. When to use which.
CLRS Ch. 15 (DP) revisited	After understanding greedy, re-read Chapter 15. The contrast between "make one choice" and "consider all choices" will be crystal clear.
Kleinberg & Tardos Ch. 4	More greedy algorithms with exchange argument proofs. Excellent exposition of interval scheduling variants.
Dasgupta, Papadimitriou & Vazirani Ch. 5	Huffman coding, set cover, and the approximation ratio analysis.
LeetCode Greedy tag	300+ problems. Start with the "Easy" ones, work up. Focus on intervals, arrays, and strings.

The Three Things to Remember

If you remember only three things from this entire lesson, make them these:

1. Greedy-choice property

The locally optimal choice is always part of some globally optimal solution. Without this, greedy is just a heuristic. With this, greedy is provably optimal.

↓

2. The exchange argument

Take any optimal solution. Swap in the greedy choice. Show it is still feasible and still optimal. This is your proof technique for every greedy algorithm you will ever encounter.

↓

3. Greedy ≠ DP

Both need optimal substructure. Greedy needs the greedy-choice property (commit early). DP needs overlapping sub-problems (consider all options). Greedy is faster when applicable. DP is more general.

Final synthesis: A problem has optimal substructure but NOT the greedy-choice property. It also has overlapping sub-problems. What algorithmic paradigm should you use?

Dynamic programming — it handles optimal substructure with overlapping sub-problems and does not require the greedy-choice property Greedy with a different sorting criterion Divide and conquer — because it always works on sub-problems