Activity selection, Huffman codes, fractional knapsack — when locally optimal IS globally optimal.
You manage a single lecture hall. Ten professors want to use it today, each for a different time slot. Professor A needs it from 1 PM to 4 PM. Professor B needs 2 PM to 5 PM. Professor C needs 3 PM to 4 PM. Some of these overlap. Your job: schedule as many lectures as possible without any two overlapping.
This is the activity selection problem. You have n activities, each with a start time si and finish time fi. Two activities are compatible if they do not overlap: one finishes before the other starts. You want the largest set of mutually compatible activities.
Let us formalize. Activity i occupies the half-open interval [si, fi). Two activities i and j are compatible if si ≥ fj or sj ≥ fi — one must finish before the other starts. The problem: find a maximum-cardinality subset S of mutually compatible activities.
How would you solve it? Your first instinct might be: "Pick the shortest activity first." That seems reasonable — shorter activities leave more room for others. But it fails. Consider three activities: [0, 5], [4, 7], [6, 11]. The shortest is [4, 7] with duration 3, but picking it blocks both of the others (it overlaps with [0, 5] at time 4 and overlaps with [6, 11] at time 6-7). Picking [0, 5] and [6, 11] gives you two activities instead of one.
Maybe pick the activity that starts earliest? That fails too. One activity [0, 100] starts earliest but blocks everything. A single greedy-by-start-time pick monopolizes the entire day.
Maybe pick the activity that conflicts with the fewest others? This is cleverer, but it also fails on constructed examples. Consider activities arranged in a diamond pattern: one central activity overlapping few others but blocking the most productive arrangement of peripheral activities. The "minimum conflicts" heuristic is better than the first two, but still not optimal.
The right answer is beautifully simple: always pick the activity that finishes earliest. This is the greedy choice, and it always works. By picking the earliest finisher, you free up the timeline as soon as possible for remaining activities. You are being maximally conservative about how much future time you consume.
In Chapter 15 (Dynamic Programming), you learned to solve optimization problems by breaking them into overlapping sub-problems and building solutions bottom-up. DP is powerful but often expensive: it fills a table of size O(n2) or O(nW) or worse, considers all possible choices at each step, and uses significant memory.
Greedy algorithms are the fast lane. When applicable, they replace DP's exhaustive search with a simple rule: sort the input, scan once, make irrevocable choices. The result is algorithms that are simpler to code, faster to run, and easier to reason about. The catch: greedy only works for problems with a special structural property. Recognizing that property — and proving it holds — is the central skill of this chapter.
Think of it as a hierarchy of algorithmic sophistication:
| Approach | Time | When It Works |
|---|---|---|
| Brute force | O(2n) or worse | Always (but too slow for n > 20) |
| Dynamic programming | O(n2) to O(nW) | Optimal substructure + overlapping sub-problems |
| Greedy | O(n log n) typical | Optimal substructure + greedy-choice property |
Greedy is not a weaker tool than DP — it is a stronger requirement on the problem structure. When the requirement is met, you get a dramatic speedup for free. Think of greedy as the "express lane" — it is faster, but not every problem is eligible.
In this lesson we will study three canonical greedy problems in depth (activity selection, Huffman coding, fractional knapsack), prove their correctness using the exchange argument, see exactly where and why greedy fails on related problems (0/1 knapsack, general coin change), and build a practical toolkit for recognizing greedy problems in interviews and competitions.
The canvas below shows activities as colored bars on a timeline. The greedy algorithm sorts by finish time, then scans left to right, picking each activity that does not conflict with the last one selected. Click "Run Greedy" to watch it work. Click "New Activities" to generate a fresh random instance.
Activities are sorted by finish time. Green = selected. Red = rejected (overlaps with a selected activity). Click "Run Greedy" to step through.
Notice the pattern. The greedy algorithm never backtracks. It makes one pass through the sorted activities, and each decision is final. It runs in O(n log n) time — the sort dominates. The scan is O(n). Compare this to a brute-force approach that would check all 2n subsets. For 20 activities, brute force examines over one million subsets. For 50 activities, it is over a quadrillion. Greedy solves it in microseconds.
Let us trace through a concrete example by hand. Consider these activities sorted by finish time:
| Activity | Start | Finish | Greedy Decision |
|---|---|---|---|
| A | 1 | 4 | SELECT (first activity, always take it). last_finish = 4 |
| B | 3 | 5 | SKIP (start 3 < last_finish 4) |
| C | 0 | 6 | SKIP (start 0 < last_finish 4) |
| D | 5 | 7 | SELECT (start 5 ≥ last_finish 4). last_finish = 7 |
| E | 3 | 9 | SKIP (start 3 < last_finish 7) |
| F | 5 | 9 | SKIP (start 5 < last_finish 7) |
| G | 6 | 10 | SKIP (start 6 < last_finish 7) |
| H | 8 | 11 | SELECT (start 8 ≥ last_finish 7). last_finish = 11 |
| I | 8 | 12 | SKIP (start 8 < last_finish 11) |
| J | 2 | 14 | SKIP (start 2 < last_finish 11) |
| K | 12 | 16 | SELECT (start 12 ≥ last_finish 11). last_finish = 16 |
Result: {A, D, H, K} — four activities, no overlaps. This is indeed optimal for this input. Try finding five compatible activities; you will not succeed. Here is a proof by exhaustion for the skeptical: any five activities must include at least two from the dense middle region (B, C, E, F, G), but all pairs from that region overlap with each other or with A/D. The maximum is four.
The algorithm structure is worth appreciating. The entire decision process is a single left-to-right scan. Each activity gets exactly one look. The invariant maintained through the scan is simple: last_finish tracks when the most recently selected activity ends. An incoming activity is compatible if and only if it starts at or after last_finish. This invariant is trivially maintained — when we select an activity, we update last_finish to its finish time. When we skip, last_finish stays the same.
But why does this work? Why is "pick the earliest finisher" always optimal? We will prove this rigorously in Chapter 1, using a technique called the exchange argument.
In Chapter 0 we saw that the "earliest finish time" greedy strategy works for activity selection. But we handwaved the "why." Now we make it airtight. The key requirement for any greedy algorithm to produce an optimal solution is the greedy-choice property.
A problem has the greedy-choice property if there exists an optimal solution that includes the greedy choice. In other words: when we make our locally optimal decision, we are not painting ourselves into a corner. There is always a globally optimal solution that agrees with our first greedy pick.
This is different from dynamic programming. DP makes no commitment until it has explored all sub-problems. Greedy commits immediately and never looks back. For this boldness to pay off, we need to know that our first move is never a mistake.
More precisely: a problem exhibits the greedy-choice property if, for every instance of the problem, there exists at least one optimal solution that includes the locally optimal (greedy) choice. We do not need EVERY optimal solution to include it — just one. That is enough to justify making the greedy choice and then recursing on the sub-problem.
How do you prove the greedy-choice property holds? The standard technique is the exchange argument (also called a "swap argument" or "replacement argument"). The idea is deceptively simple: take any optimal solution, and show you can transform it into one that includes the greedy choice without making it any worse.
Here is the exchange argument for activity selection, stated with full rigor:
Theorem: Let S = {a1, a2, ..., an} be a set of activities sorted by finish time (f1 ≤ f2 ≤ ... ≤ fn). Then there exists an optimal solution that includes a1 (the activity with the earliest finish time).
Proof: Let O be an optimal solution. Let ak be the activity in O with the earliest finish time. If ak = a1, we are done. Otherwise, consider the set O' = (O \ {ak}) ∪ {a1} — we remove ak and add a1.
We claim O' is also a valid optimal solution. First, |O'| = |O| (same cardinality — we swapped one element for another). Second, O' has no conflicts. Why? Since a1 has the earliest finish time of all activities, f1 ≤ fk. Activity ak was compatible with all other activities in O. Every other activity aj in O has sj ≥ fk ≥ f1. So a1 is also compatible with every aj in O. Therefore O' is a valid solution with the same size as O that includes a1. QED.
Corollary: The greedy algorithm (repeatedly select the earliest-finishing compatible activity) produces an optimal solution. Proof by induction: the first greedy choice is safe (the theorem above). After making it, the remaining sub-problem has the same structure, and the same argument applies to the next greedy choice. By induction on the number of activities selected, the entire greedy solution is optimal.
The canvas below shows two timelines. The top is an arbitrary optimal solution O. The bottom shows O' after swapping in the greedy choice. Watch how the swap cannot create any new conflicts.
Top: optimal solution O with first activity a1. Bottom: O' after swapping a1 for g (earliest-finish). The swap is always safe because g finishes no later than a1.
The greedy-choice property is necessary but not sufficient. We also need optimal substructure: after making the greedy choice, the remaining problem must itself have an optimal solution that, combined with the greedy choice, gives an optimal solution to the whole problem.
For activity selection: after picking the earliest-finishing activity g, we are left with the sub-problem of all activities that start after g finishes. An optimal solution to the original problem consists of g plus an optimal solution to this sub-problem. This is easy to prove by contradiction: if the remaining activities were not solved optimally, we could improve the overall solution by improving the sub-solution.
Both conditions must hold. Many problems have optimal substructure (like the 0/1 knapsack) but lack the greedy-choice property. For those, you need DP. We will see concrete examples in Chapter 3.
The exchange argument is so central to greedy algorithms that it is worth understanding at a meta-level. The argument has a specific logical structure:
Goal: Show that the greedy choice g is in some optimal solution.
Setup: Let O be any optimal solution. We consider two cases.
Case 1: g ∈ O. Done — we already have an optimal solution containing g.
Case 2: g ∉ O. This is the interesting case. We need to:
The result is a new optimal solution O' = (O \ {x}) ∪ {g} that contains g. Since O was arbitrary, this proves that no matter which optimal solution we start with, we can always "exchange in" the greedy choice.
Different problems use different exchange arguments. For activity selection, we swap the first activity. For Huffman coding, we swap the positions of the two least-frequent characters. For MST, we swap an edge on a cycle. The structure is always the same: swap, show feasibility, show no-worse.
Here is an exercise. Consider the problem: "Given n ropes with lengths L1, ..., Ln, connect them into one rope. The cost of connecting two ropes of lengths a and b is a + b. Minimize total cost."
The greedy strategy: always connect the two shortest ropes first (same as Huffman!). Can you construct the exchange argument? Hint: assume an optimal solution connects some non-shortest pair first. Show you can swap this connection with the shortest-pair connection and reduce the total cost. The argument is almost identical to the Huffman optimality proof — the two shortest "ropes" should be merged first because they contribute their length to every subsequent merge step, so minimizing their individual contribution minimizes the total.
This problem (LeetCode #1167, "Minimum Cost to Connect Sticks") is literally Huffman coding in disguise. Recognizing structural equivalences like this is what turns algorithm knowledge from memorized patterns into genuine problem-solving power.
Other Huffman-equivalent problems:
We have established that the greedy algorithm is correct. Now let us understand it as an algorithm engineer: exact implementation, complexity analysis, and the road not taken (DP).
python def activity_selection_greedy(activities): # activities: list of (start, finish) tuples # Sort by finish time activities.sort(key=lambda x: x[1]) selected = [activities[0]] last_finish = activities[0][1] for i in range(1, len(activities)): start, finish = activities[i] if start >= last_finish: # compatible selected.append(activities[i]) last_finish = finish return selected # Example acts = [(1,4), (3,5), (0,6), (5,7), (3,9), (5,9), (6,10), (8,11), (8,12), (2,14), (12,16)] result = activity_selection_greedy(acts) print(result) # [(1,4), (5,7), (8,11), (12,16)] — 4 activities
Activity selection can be solved with dynamic programming. Define sub-problem Sij as the set of activities that start after activity i finishes and finish before activity j starts. Let c[i, j] = size of the maximum compatible subset of Sij. The recurrence is:
This recurrence considers every possible "next activity" ak in Sij. For each candidate, we split into left sub-problem Sik (activities between i and k) and right sub-problem Skj (activities between k and j), solve both optimally, and take the best overall split. The number of sub-problems is O(n2) (one for each (i, j) pair). Each sub-problem considers up to O(n) candidates for ak. Total: O(n3) in the worst case, or O(n2) with optimized implementation. Either way, it works — but it does far more work than necessary.
The DP approach treats the problem as if we do not know which activity to pick first. It considers ALL possibilities and chooses the best. Greedy exploits the insight that we DO know which to pick first: the one that finishes earliest. This collapses n choices at the first step into exactly one, and the problem becomes a single chain of decisions rather than a branching tree of sub-problems.
The greedy algorithm exploits the fact that we never need to consider all possible next activities. The greedy choice property tells us exactly which one to pick: the one that finishes earliest. No table, no backtracking, no quadratic time. Just O(n log n).
The canvas below runs the same set of activities through both algorithms. On the left, DP fills a table. On the right, greedy makes a single pass. Both find the same optimal solution, but greedy gets there faster.
Left: DP fills an n×n table (O(n²)). Right: greedy scans once (O(n)). Both produce the optimal solution.
| Property | DP Approach | Greedy Approach |
|---|---|---|
| Time complexity | O(n2) | O(n log n) |
| Space complexity | O(n2) for the table | O(n) for the sorted array + O(k) for the result |
| Correctness | Always optimal (considers all sub-problems) | Always optimal (proven by exchange argument) |
| Implementation | Moderate (need to define sub-problems carefully) | Simple (sort + one pass) |
| When to use | When greedy-choice property does not hold | When greedy-choice property holds |
CLRS presents activity selection in both recursive and iterative forms. The recursive version mirrors the proof structure directly:
python def recursive_activity_select(activities, k, n): # activities: sorted by finish time # k: index of last selected activity # n: total number of activities m = k + 1 # Find the first activity that starts after activity k finishes while m < n and activities[m][0] < activities[k][1]: m += 1 if m < n: return [activities[m]] + recursive_activity_select(activities, m, n) else: return [] # Usage: prepend a dummy activity (0, 0) to bootstrap acts = [(0, 0), (1,4), (3,5), (0,6), (5,7), (8,11), (12,16)] result = recursive_activity_select(acts, 0, len(acts)) print(result) # [(1,4), (5,7), (8,11), (12,16)]
The recursive version makes the inductive structure explicit: select the first compatible activity (the greedy choice), then recurse on the remaining sub-problem. But it is tail-recursive — the recursive call is the last operation — so it converts trivially to the iterative version we saw earlier. In practice, always use the iterative version. It avoids stack depth issues and is equally clear.
A harder variant: each activity also has a weight (profit), and you want to maximize total weight rather than count. Now greedy fails — you might skip a high-weight activity to fit two low-weight ones. This problem requires DP:
where p(i) is the latest activity that finishes before activity i starts (found by binary search). This runs in O(n log n). The weighted variant is a classic interview problem (LeetCode #1235 "Maximum Profit in Job Scheduling").
python import bisect def weighted_activity_selection(activities): # activities: list of (start, finish, weight) activities.sort(key=lambda x: x[1]) # sort by finish n = len(activities) finishes = [a[1] for a in activities] dp = [0] * (n + 1) for i in range(1, n + 1): # p(i): latest job ending <= start of job i s_i = activities[i - 1][0] p = bisect.bisect_right(finishes, s_i, 0, i - 1) dp[i] = max(dp[i - 1], activities[i - 1][2] + dp[p]) return dp[n]
Activity selection has several important variants, each requiring a different technique:
| Variant | Goal | Algorithm | Complexity |
|---|---|---|---|
| Maximum count (original) | Most activities, unweighted | Greedy by earliest finish | O(n log n) |
| Maximum weight | Most total weight | DP with binary search | O(n log n) |
| Minimum rooms | Schedule ALL activities, minimize rooms | Greedy with heap (interval partitioning) | O(n log n) |
| Maximum overlap | Find time with most simultaneous activities | Sweep line | O(n log n) |
| Minimum removals | Remove fewest to eliminate all overlaps | n - (greedy max count) | O(n log n) |
The key insight: the original activity selection problem (max count, unweighted) is one of the few variants where greedy is optimal. Most other variants require DP, heap-based approaches, or sweep lines. Knowing which variant you face is essential for choosing the right algorithm.
In an interview, when you hear "schedule activities/jobs/meetings," immediately ask: "Are we maximizing count, maximizing total value/weight, minimizing rooms, or something else?" The answer determines your algorithm:
Both greedy and DP exploit optimal substructure: the optimal solution to the whole problem contains optimal solutions to sub-problems. The difference lies in how they make choices.
| Property | Greedy | Dynamic Programming |
|---|---|---|
| Choice strategy | Make the best-looking choice NOW, never reconsider | Consider all choices, pick the best after seeing all sub-problem solutions |
| Sub-problems | One sub-problem remains after each choice (no branching) | Many overlapping sub-problems, solved bottom-up or top-down with memoization |
| Required properties | Greedy-choice property + optimal substructure | Optimal substructure + overlapping sub-problems |
| Typical complexity | O(n log n) or O(n) | O(n2) or O(nW) etc. |
The most famous example of "greedy works here but fails there" is the knapsack problem. There are two variants:
Fractional knapsack: You have a knapsack with capacity W. There are n items, each with weight wi and value vi. You can take fractions of items. Goal: maximize total value.
0/1 knapsack: Same setup, but you must take an item entirely or not at all. No fractions.
For fractional knapsack, the greedy strategy is simple: compute the value density vi/wi for each item, sort by density descending, and greedily take as much of the densest item as possible, then the next, until the knapsack is full. This is optimal.
For 0/1 knapsack, the same greedy strategy fails. Here is the classic counterexample:
| Item | Weight | Value | Density |
|---|---|---|---|
| A | 10 | 60 | 6.0 |
| B | 20 | 100 | 5.0 |
| C | 30 | 120 | 4.0 |
Knapsack capacity: W = 50. Greedy by density picks A (weight 10, value 60), then B (weight 20, value 100). Total weight 30, value 160. Remaining capacity 20 -- item C weighs 30, does not fit. Final: value 160.
But the optimal 0/1 solution is B + C: weight 50, value 220. Greedy missed it because it committed to A too early, and in the 0/1 world you cannot take a fraction of C to fill the gap.
For fractional knapsack, greedy would take all of A (60), all of B (100), and 20/30 of C (80) = total value 240. That is optimal. The ability to take fractions means the greedy choice never wastes capacity.
Let us trace through the fractional knapsack algorithm step by step:
| Step | Item | Density | Weight Available | Take | Value Gained | Remaining Capacity |
|---|---|---|---|---|---|---|
| 1 | A | 6.0 | 50 | All 10 kg | 60 | 40 |
| 2 | B | 5.0 | 40 | All 20 kg | 100 | 20 |
| 3 | C | 4.0 | 20 | 20/30 = 66.7% | 80 | 0 |
Total value: 60 + 100 + 80 = 240. The knapsack is perfectly full (50 kg). No capacity is wasted.
Why does the exchange argument work here? Suppose some optimal solution does not take the highest-density item first. Then it must allocate some capacity to a lower-density item instead. Swapping that capacity to the higher-density item increases the value (higher density = more value per kilogram). So the original solution was not optimal — contradiction. The key: "swapping capacity" is possible because items are divisible. In 0/1 knapsack, you cannot transfer a fraction of one item's allocation to another.
Since greedy fails for 0/1 knapsack, we need DP. Define dp[i][w] = maximum value achievable using items 1..i with capacity w. The recurrence is:
The first term (dp[i-1][w]) corresponds to NOT taking item i. The second term corresponds to taking item i (if it fits: wi ≤ w). We check both options and take the better one. This is what makes DP fundamentally different from greedy: DP considers both "take" and "don't take" at every step, using a table to avoid recomputation.
python def knapsack_01(items, W): # items: list of (weight, value) n = len(items) dp = [[0] * (W + 1) for _ in range(n + 1)] for i in range(1, n + 1): w_i, v_i = items[i - 1] for w in range(W + 1): dp[i][w] = dp[i - 1][w] # don't take item i if w_i <= w: dp[i][w] = max(dp[i][w], dp[i - 1][w - w_i] + v_i) return dp[n][W] # Example: greedy gets 160, DP gets 220 print(knapsack_01([(10,60), (20,100), (30,120)], 50)) # 220
Time complexity: O(nW). Space: O(nW) for the full table (optimizable to O(W) with a rolling array). This is pseudo-polynomial — polynomial in the numeric value of W, not in the number of bits needed to represent W. For small W, this is fast. For W in the billions, you need approximation algorithms.
The canvas below shows both variants side by side. On the left, fractional knapsack with greedy filling. On the right, 0/1 knapsack showing why greedy fails. Adjust the knapsack capacity with the slider.
Left: fractional (greedy works). Right: 0/1 (greedy fails). Green = taken, orange = partial, gray = skipped.
Think of solving an optimization problem as navigating a tree of decisions. At each node, you choose which branch to take. The leaves are complete solutions, and you want the leaf with the best value.
Brute force explores the entire tree — every possible combination of choices. Exponential time.
Dynamic programming prunes the tree by recognizing that many branches lead to the same sub-problem. It solves each sub-problem once, stores the result, and reuses it. The tree collapses from exponential to polynomial size.
Greedy does not explore the tree at all. At each node, it takes one branch — the locally optimal one — and never looks at the others. It walks a single path from root to leaf. If the greedy-choice property holds, this one path happens to reach an optimal leaf. If not, it might reach a suboptimal leaf and never know.
When faced with an optimization problem, ask these questions in order:
python def fractional_knapsack(items, W): # items: list of (weight, value) tuples # Sort by value density descending items.sort(key=lambda x: x[1] / x[0], reverse=True) total_value = 0 remaining = W for weight, value in items: if remaining <= 0: break take = min(weight, remaining) # take as much as fits total_value += value * (take / weight) remaining -= take return total_value # Example: items = [(10, 60), (20, 100), (30, 120)], W = 50 print(fractional_knapsack([(10,60), (20,100), (30,120)], 50)) # 240.0 — take all A, all B, 2/3 of C
You need to compress a file. The file contains characters, each appearing with some frequency. A naive encoding uses a fixed number of bits per character (like ASCII's 8 bits). But if some characters appear much more often than others, this wastes bits. Frequent characters should get short codes; rare characters should get longer ones.
This is exactly what Huffman coding achieves. It is the optimal prefix-free variable-length code for a given character frequency distribution. And it is built by a greedy algorithm.
A prefix-free code (sometimes called a "prefix code") is a code where no codeword is a prefix of another codeword. Why does this matter? Because it makes decoding unambiguous. When you read a bitstream, you can identify each codeword immediately without needing a separator or lookahead.
Example of a prefix-free code: a=0, b=10, c=110, d=111. When you read "010110111", you can decode it left to right without ambiguity: 0|10|110|111 = "abcd".
Example of a NON-prefix-free code: a=0, b=01, c=010. Reading "010" is ambiguous: is it "ab" (0, 10)? Wait, 10 is not a code. Is it "a, b, ..." starting with 0, 1, 0? This is a mess. Prefix-free codes avoid this entirely.
David Huffman discovered this algorithm as a term paper assignment in 1952. His professor, Robert Fano, had been working on the same problem with Claude Shannon. Huffman's algorithm beat their approach (Shannon-Fano coding) and became one of the most influential algorithms in computer science.
The string "abracadabra" has 11 characters. Let us count frequencies:
| Character | Frequency | Probability |
|---|---|---|
| a | 5 | 5/11 = 0.45 |
| b | 2 | 2/11 = 0.18 |
| r | 2 | 2/11 = 0.18 |
| c | 1 | 1/11 = 0.09 |
| d | 1 | 1/11 = 0.09 |
Now run Huffman's algorithm:
Step 1: Priority queue: c(1), d(1), b(2), r(2), a(5).
Step 2: Extract c(1) and d(1). Merge into [cd](2). Queue: b(2), r(2), [cd](2), a(5).
Step 3: Extract b(2) and r(2). Merge into [br](4). Queue: [cd](2), a(5), [br](4). Wait -- let us re-sort: [cd](2), [br](4), a(5).
Step 4: Extract [cd](2) and [br](4). Merge into [cdbr](6). Queue: a(5), [cdbr](6).
Step 5: Extract a(5) and [cdbr](6). Merge into root(11). Done!
Reading the codes from the tree (left = 0, right = 1, read the path from root to leaf):
| Character | Frequency | Huffman Code | Code Length | Bits Used |
|---|---|---|---|---|
| a | 5 | 0 | 1 | 5 × 1 = 5 |
| b | 2 | 110 | 3 | 2 × 3 = 6 |
| r | 2 | 111 | 3 | 2 × 3 = 6 |
| c | 1 | 100 | 3 | 1 × 3 = 3 |
| d | 1 | 101 | 3 | 1 × 3 = 3 |
Total bits: 5 + 6 + 6 + 3 + 3 = 23 bits. Fixed-length encoding would need ⌈log2(5)⌉ = 3 bits per character × 11 = 33 bits. Huffman saves 30%.
Notice the elegant tradeoff: the most frequent character (a, appearing 5 times) gets a 1-bit code. The rarest characters (c and d, appearing once each) get 3-bit codes. Frequent characters are close to the root; rare characters are deep in the tree. This is exactly what minimizes the total bit count.
The encoded string "abracadabra" becomes: 0 110 111 0 100 0 101 0 110 111 0 = 23 bits. To decode, read bit by bit from the root: 0 → leaf a. Reset to root. 1 → go right, 1 → go right, 0 → leaf b. Reset. And so on. The prefix-free property ensures this always works without ambiguity.
Building a Huffman tree processes n characters (where n is the alphabet size, not the text length). The algorithm performs n-1 merge operations. Each merge requires two extract-min operations and one insert, each taking O(log n) on a binary min-heap. Total: O(n log n).
If the frequencies are pre-sorted, a clever implementation using two queues can build the tree in O(n) time. Queue 1 holds the original leaves (sorted). Queue 2 holds merged nodes (automatically sorted because merge order is monotonic). At each step, pick the two smallest from the fronts of both queues. This is the van Leeuwen optimization, rarely needed in practice but elegant in theory.
| Operation | With Binary Heap | With Two Queues (pre-sorted) |
|---|---|---|
| Building the tree | O(n log n) | O(n) |
| Encoding the text | O(m) where m = text length | O(m) |
| Decoding | O(m) — walk tree per bit | O(m) |
| Total (build + encode) | O(n log n + m) | O(n + m) |
Enter character frequencies below and watch the Huffman tree build bottom-up. The algorithm merges the two smallest nodes at each step.
Watch the tree grow from leaves to root. Each merge step is shown. The resulting code table appears below the tree.
python import heapq from collections import Counter class HuffNode: def __init__(self, char, freq, left=None, right=None): self.char = char self.freq = freq self.left = left self.right = right def __lt__(self, other): return self.freq < other.freq def huffman_tree(text): freq = Counter(text) heap = [HuffNode(ch, f) for ch, f in freq.items()] heapq.heapify(heap) while len(heap) > 1: left = heapq.heappop(heap) right = heapq.heappop(heap) merged = HuffNode(None, left.freq + right.freq, left, right) heapq.heappush(heap, merged) return heap[0] # root of Huffman tree def huffman_codes(root, prefix="", codes=None): if codes is None: codes = {} if root.char is not None: codes[root.char] = prefix or "0" # single-char edge case return codes huffman_codes(root.left, prefix + "0", codes) huffman_codes(root.right, prefix + "1", codes) return codes root = huffman_tree("abracadabra") codes = huffman_codes(root) for ch, code in sorted(codes.items()): print(f"{ch}: {code}") # a: 0, b: 110, c: 100, d: 101, r: 111 (or similar — tree shape may vary)
Single-character alphabet: If the text contains only one unique character (e.g., "aaaa"), the Huffman tree has a single leaf. The convention is to assign it the code "0" (1 bit per character). Some implementations handle this by creating a dummy second character.
Equal frequencies: When multiple characters have the same frequency, the algorithm's behavior depends on tie-breaking. Different tie-breaking strategies produce different trees (different code assignments) but ALL have the same optimal cost B(T). Huffman coding is not unique in its tree structure, but it IS unique in its cost.
Empty input: Edge case. No characters to encode. Return empty code table. Your implementation should handle this gracefully.
Two-character alphabet: Characters get codes "0" and "1" — one bit each. Huffman reduces to the most basic possible encoding. Still optimal.
Large alphabets: With n = 256 characters (byte-level encoding), the Huffman tree has up to 255 internal nodes and 256 leaves. Building it takes O(256 log 256) = O(256 · 8) = ~2048 operations. Negligible. The encoding/decoding of the actual data (potentially gigabytes) dominates.
We have built Huffman trees. Now let us understand why Huffman coding is optimal, how it relates to information theory, and what its limits are.
The cost of a prefix-free code is the expected number of bits per character. If character ci has frequency fi and depth di in the tree (depth = codeword length), the cost is:
Huffman coding minimizes B(T) over all possible prefix-free codes. No other prefix-free code can achieve a lower expected bits per character for the given frequency distribution.
The proof uses two key lemmas:
Lemma 1: In an optimal tree, the two characters with the lowest frequencies are siblings at the maximum depth. Proof by exchange: if they were not at maximum depth, some higher-frequency character would be deeper, and swapping would reduce the cost (lower frequency × greater depth versus higher frequency × lesser depth).
Lemma 2 (Optimal Substructure): Let T be an optimal tree for alphabet C with |C| ≥ 2. Let x and y be the two least-frequent characters, siblings in T at maximum depth. Let C' = C \ {x, y} ∪ {z} where z has frequency f(z) = f(x) + f(y). Let T' be the tree for C' obtained by replacing the subtree containing x and y with the single leaf z. Then T' is an optimal tree for C'.
Proof of Lemma 2: B(T) = B(T') + f(x) + f(y). This is because x and y each contribute their frequency times one extra level of depth compared to z. If T' were not optimal for C', there would exist a better tree T'' for C'. Expanding z in T'' back into x and y would give a tree for C with cost B(T'') + f(x) + f(y) < B(T') + f(x) + f(y) = B(T), contradicting T's optimality.
Together: Huffman's greedy choice (merge the two least-frequent) satisfies the greedy-choice property (Lemma 1) and optimal substructure (Lemma 2). Therefore the algorithm is correct. The proof structure is identical to activity selection: exchange argument for the first choice, induction on the sub-problem size.
The canvas below shows the same text encoded with a fixed-length code and a Huffman code. The height of each bar represents the number of bits used. Frequent characters get shorter Huffman codes (shorter bars), rare characters get longer ones.
Each bar = one character occurrence. Height = bits used for that occurrence. Orange = Huffman, gray = fixed-length. Total bits shown below.
For "abracadabra" with probabilities a=5/11, b=2/11, r=2/11, c=1/11, d=1/11:
Huffman average: 23 bits / 11 characters = 2.09 bits per character. That is within 0.05 bits of the entropy — nearly perfect for a per-character code.
Fixed-length average: 33 bits / 11 characters = 3.0 bits per character. Huffman saves 30% over fixed-length and comes within a rounding error of the theoretical minimum.
Shannon's source coding theorem gives us a tight sandwich:
where H(X) = -∑ pi log2(pi) is the entropy and LHuffman is the expected Huffman code length. The lower bound says no prefix-free code can beat entropy. The upper bound says Huffman is never more than 1 bit above entropy per symbol.
When does the gap approach 1 bit? When one character has very high probability. If p1 = 0.95 and there is one other character with p2 = 0.05, the entropy is about 0.286 bits/symbol. But Huffman must use at least 1 bit for the frequent character (since the two codewords must be "0" and "1"). So L = 0.95(1) + 0.05(1) = 1.0 bits/symbol. The gap is 0.714 bits — close to the theoretical maximum of 1.
When does the gap vanish? When all probabilities are negative powers of 2. If you have 4 characters each with probability 1/4, entropy is exactly 2.0 bits and Huffman produces a balanced tree with exactly 2 bits per character. The gap is zero.
Huffman coding is everywhere:
| Format/Standard | How Huffman Is Used |
|---|---|
| DEFLATE (zip, gzip, PNG) | Huffman codes the output of LZ77 compression. Two Huffman trees: one for literal/length symbols, one for distance symbols. |
| JPEG | Huffman codes the quantized DCT coefficients. Two tables: one for DC coefficients, one for AC. |
| MP3 | Huffman codes the quantized frequency-domain samples. 32 predefined Huffman tables selected per granule. |
| HTTP/2 (HPACK) | Huffman codes HTTP header strings. Uses a fixed Huffman table derived from web traffic statistics. |
In modern compressors like zstd and brotli, Huffman has been partly replaced by Asymmetric Numeral Systems (ANS), which achieves closer-to-entropy coding with similar speed. But Huffman remains the standard for simplicity-critical applications.
The algorithm we described is static Huffman: you scan the entire input to compute frequencies, build the tree, then encode. This requires two passes over the data and you must transmit the tree (or frequency table) alongside the compressed data.
Adaptive (dynamic) Huffman (Vitter's algorithm) builds the tree on-the-fly, updating frequencies and restructuring the tree as each symbol is processed. The decoder mirrors this process, so no separate tree transmission is needed. Adaptive Huffman is more complex to implement but useful for streaming data where you cannot make two passes.
Let us decode a Huffman-encoded bitstream by hand. Using our "abracadabra" tree (a=0, c=100, d=101, b=110, r=111), decode the bitstream: 0 110 111 0 100 0 101 0 110 111 0.
Start at the root. Read one bit at a time:
| Bits Read | Tree Walk | Output |
|---|---|---|
| 0 | Root → left → LEAF 'a' | a |
| 1 | Root → right | (continue) |
| 1 | → right | (continue) |
| 0 | → left → LEAF 'b' | b |
| 1 | Root → right | (continue) |
| 1 | → right | (continue) |
| 1 | → right → LEAF 'r' | r |
| 0 | Root → left → LEAF 'a' | a |
| 1 | Root → right | (continue) |
| 0 | → left | (continue) |
| 0 | → left → LEAF 'c' | c |
And so on. The prefix-free property guarantees that we always know exactly when a codeword ends: it ends when we reach a leaf. No lookahead, no backtracking. The decoder runs in O(m) time where m is the number of bits in the encoded stream.
python def huffman_decode(root, bitstream): # bitstream: string of '0' and '1' result = [] node = root for bit in bitstream: if bit == '0': node = node.left else: node = node.right if node.char is not None: # leaf result.append(node.char) node = root # reset to root return ''.join(result)
Huffman coding is not limited to characters. You can Huffman-encode any discrete symbol set. In JPEG, the "symbols" are run-length-encoded DCT coefficients. In DEFLATE, the symbols are literal bytes, match lengths, and match distances. The algorithm is the same — only the alphabet changes.
For very long texts, you can extend Huffman to blocks of characters. Instead of encoding single characters, encode pairs (bigrams), triples (trigrams), or variable-length blocks. This approaches the entropy limit more closely because it captures inter-character dependencies. In practice, modern compressors achieve near-entropy coding through other means (LZ77 + Huffman in DEFLATE, ANS in zstd).
This is the big payoff. Four classic greedy problems in one interactive playground. Select a problem, configure the input, and watch the greedy algorithm work. For coin change, see exactly when and why greedy fails.
Select a problem below, adjust parameters, and watch the greedy strategy execute step by step.
The coin change problem is particularly instructive. With US coin denominations [25, 10, 5, 1], greedy always works: to make change for 30 cents, take one quarter (25) and one nickel (5). But with denominations [1, 3, 4], greedy fails for amount 6: greedy picks 4+1+1 = 3 coins, but optimal is 3+3 = 2 coins. The canvas shows this failure vividly.
Try different amounts with the "Bad Coins [1,3,4]" button. Amount 6 is the classic failure case. Amount 12 is another: greedy gives 4+4+4 = 3 coins, optimal is also 4+4+4 = 3 coins (greedy happens to match). Amount 10: greedy gives 4+4+1+1 = 4 coins, but optimal is 3+3+4 = 3 coins. The unpredictability is the point — without the greedy-choice property, you cannot trust the greedy answer.
Since greedy fails for arbitrary denominations, here is the DP solution. Define dp[i] as the minimum number of coins to make amount i. The recurrence is beautifully simple:
For each amount i, we try every coin denomination c. If we use coin c, we need dp[i - c] additional coins for the remaining amount, plus 1 for the coin we just used. We take the minimum over all valid coins.
python def coin_change_dp(coins, amount): dp = [float('inf')] * (amount + 1) dp[0] = 0 for i in range(1, amount + 1): for c in coins: if c <= i and dp[i - c] + 1 < dp[i]: dp[i] = dp[i - c] + 1 return dp[amount] if dp[amount] != float('inf') else -1 # coins = [1,3,4], amount = 6 # dp: [0, 1, 2, 1, 1, 2, 2] # dp[6] = 2 (coins 3+3), greedy would give 4+1+1 = 3 print(coin_change_dp([1,3,4], 6)) # 2
Let us trace through dp for coins [1, 3, 4] and amount 6:
| i | Try c=1 | Try c=3 | Try c=4 | dp[i] | Coins used |
|---|---|---|---|---|---|
| 0 | — | — | — | 0 | — |
| 1 | dp[0]+1=1 | — | — | 1 | 1 |
| 2 | dp[1]+1=2 | — | — | 2 | 1+1 |
| 3 | dp[2]+1=3 | dp[0]+1=1 | — | 1 | 3 |
| 4 | dp[3]+1=2 | dp[1]+1=2 | dp[0]+1=1 | 1 | 4 |
| 5 | dp[4]+1=2 | dp[2]+1=3 | dp[1]+1=2 | 2 | 4+1 |
| 6 | dp[5]+1=3 | dp[3]+1=2 | dp[2]+1=3 | 2 | 3+3 |
DP finds dp[6] = 2 (using coins 3+3). Greedy would have picked 4 first, then needed 1+1 for the remaining 2, giving 3 coins. The DP table considers all possibilities at each step — it never commits prematurely.
This showcase demonstrates the core lesson of CLRS Chapter 16. Greedy algorithms are:
| Property | What It Means |
|---|---|
| Simple to implement | Sort + single pass. Rarely more than 15 lines of code. |
| Fast | O(n log n) in most cases, dominated by sorting. |
| Fragile | Only work when the greedy-choice property holds. Small changes to the problem (fractional → 0/1, US coins → arbitrary coins) can break them completely. |
| Provable | When they work, you can prove optimality via the exchange argument. This is what separates greedy algorithms from greedy heuristics. |
Greedy algorithms appear across computer science. Here are the major applications beyond what we have covered, each using the same pattern: make a locally optimal choice, prove it is globally safe.
You have n jobs, each with a deadline di and a penalty wi if the job is late. Each job takes unit time. Schedule jobs to minimize total penalty.
The greedy strategy: schedule jobs in order of decreasing penalty (highest penalty first). For each job, place it in the latest available slot before its deadline. If no slot is available, the job is late.
python def job_scheduling(jobs): # jobs: list of (id, deadline, penalty) # Sort by penalty descending jobs.sort(key=lambda x: x[2], reverse=True) n = len(jobs) max_deadline = max(d for _, d, _ in jobs) slots = [None] * (max_deadline + 1) # 1-indexed scheduled = [] late = [] for job_id, deadline, penalty in jobs: # Try to place in latest slot <= deadline placed = False for t in range(min(deadline, max_deadline), 0, -1): if slots[t] is None: slots[t] = job_id scheduled.append(job_id) placed = True break if not placed: late.append((job_id, penalty)) total_penalty = sum(p for _, p in late) return scheduled, late, total_penalty
Why does this greedy strategy work? The exchange argument: suppose the optimal schedule does not include the highest-penalty job j in its on-time set. If j can be swapped into the schedule (displacing a lower-penalty job), the total penalty decreases. By processing jobs in penalty-descending order and greedily assigning each to the latest available slot, we ensure the highest-value jobs get first priority for on-time slots.
With a naive implementation, assigning each job takes O(n) to find a free slot, giving O(n2) total. Using a Union-Find data structure (same one used in Kruskal's MST), we can find the latest available slot in nearly O(1) amortized time, giving O(n log n) overall (dominated by the initial sort).
Given a weighted, connected graph G = (V, E, w), find a spanning tree T (a connected, acyclic subgraph containing all vertices) that minimizes the total edge weight ∑e ∈ T w(e). Two classic algorithms, both greedy:
Kruskal's algorithm: Sort all edges by weight. Process edges from lightest to heaviest. Add each edge to the tree if it does not create a cycle (checked using Union-Find). Stop when you have |V|-1 edges. The greedy choice: always pick the cheapest safe edge globally. Time: O(E log E) for sorting + nearly O(E) for union-find operations with path compression and union by rank.
Prim's algorithm: Start from any vertex. Maintain a priority queue of edges crossing the "cut" between tree vertices and non-tree vertices. Repeatedly extract the cheapest such edge and add the non-tree endpoint to the tree. Time: O(E log V) with a binary heap, or O(E + V log V) with a Fibonacci heap.
Both work because the MST problem has the greedy-choice property, established by the cut property: for any cut of the graph (any partition of vertices into two non-empty sets), the minimum-weight edge crossing the cut is in some MST. This is the MST exchange argument: if the lightest crossing edge were not in some MST T, adding it creates a cycle. That cycle must contain another edge crossing the same cut. Removing that other edge (which is heavier) gives a spanning tree with lower total weight — contradicting T's optimality.
python # Kruskal's MST with Union-Find class UnionFind: def __init__(self, n): self.parent = list(range(n)) self.rank = [0] * n def find(self, x): if self.parent[x] != x: self.parent[x] = self.find(self.parent[x]) # path compression return self.parent[x] def union(self, x, y): px, py = self.find(x), self.find(y) if px == py: return False if self.rank[px] < self.rank[py]: px, py = py, px self.parent[py] = px if self.rank[px] == self.rank[py]: self.rank[px] += 1 return True def kruskal(n, edges): # edges: list of (weight, u, v) edges.sort() # sort by weight uf = UnionFind(n) mst = [] for w, u, v in edges: if uf.union(u, v): mst.append((w, u, v)) if len(mst) == n - 1: break return mst
Find shortest paths from a source vertex to all other vertices in a graph with non-negative edge weights. Dijkstra's algorithm is greedy: at each step, extract the vertex with the smallest tentative distance and finalize it. The greedy choice is safe because edge weights are non-negative -- no future path can improve on the shortest tentative distance.
python import heapq def dijkstra(graph, source): # graph: dict of {node: [(neighbor, weight), ...]} dist = {node: float('inf') for node in graph} dist[source] = 0 pq = [(0, source)] # (distance, node) visited = set() while pq: d, u = heapq.heappop(pq) if u in visited: continue visited.add(u) for v, w in graph[u]: if d + w < dist[v]: dist[v] = d + w heapq.heappush(pq, (dist[v], v)) return dist
Given a universe U of elements and a collection of subsets S1, S2, ..., Sn whose union is U, find the minimum number of subsets that cover all elements in U. This problem is NP-hard — no polynomial-time exact algorithm is known unless P = NP.
But the greedy strategy gives a remarkably good approximation. The greedy set cover algorithm:
This greedy algorithm uses at most H(|U|) = ln(|U|) + 1 times as many sets as the optimal solution, where H(n) is the n-th harmonic number. This is tight: there are instances where greedy uses exactly H(n) times optimal. Moreover, under standard complexity assumptions, no polynomial algorithm can achieve a better ratio than (1 - ε) ln(|U|). So greedy is essentially the best polynomial algorithm for set cover.
python def greedy_set_cover(universe, subsets): # universe: set of elements # subsets: list of sets uncovered = set(universe) chosen = [] while uncovered: # Pick subset covering the most uncovered elements best = max(subsets, key=lambda s: len(s & uncovered)) chosen.append(best) uncovered -= best return chosen
This is a greedy approximation algorithm: it does not guarantee optimality, but it guarantees a bounded suboptimality. Many NP-hard problems admit greedy approximations with provable bounds.
When a problem is NP-hard, we cannot hope for a polynomial-time exact algorithm (assuming P ≠ NP). But greedy can often provide approximation guarantees: a provable bound on how far from optimal the greedy solution can be.
| Problem | Greedy Strategy | Approximation Ratio | Is This Tight? |
|---|---|---|---|
| Set cover | Cover most uncovered elements | H(n) ≈ ln(n) + 1 | Yes — cannot do better in poly time |
| Vertex cover | Pick endpoints of maximal matching | 2 (not greedy per se, but simple) | Tight for matching-based approach |
| TSP (metric) | Nearest neighbor | O(log n) in theory, ~1.25 in practice | Christofides gives 1.5 |
| Bin packing | First Fit Decreasing | 11/9 · OPT + 6/9 | Nearly tight |
| Max-cut | Random or greedy flip | 1/2 | SDP-based: 0.878 (Goemans-Williamson) |
The key insight: even when greedy is not optimal, it is often the simplest algorithm with a non-trivial approximation guarantee. This makes greedy the go-to starting point for NP-hard optimization problems. You can always improve later with more sophisticated techniques (LP relaxation, semidefinite programming, local search), but greedy gives you a baseline with minimal effort.
In interviews for systems or ML roles, you will rarely need to implement these approximation algorithms. But knowing that greedy set cover exists and has an O(ln n) guarantee is the kind of algorithmic literacy that separates strong candidates from average ones in system design discussions about, say, feature selection or resource allocation.
The relationship between exact and approximate greedy is important to understand:
Exact greedy: The exchange argument proves the greedy solution is OPTIMAL. Examples: activity selection, Huffman, fractional knapsack, MST, Dijkstra. You get the best possible answer.
Approximate greedy: The analysis proves a BOUND on suboptimality. Examples: set cover (O(ln n)), vertex cover (2x), bin packing (11/9). You get a provably good answer, but not necessarily the best.
Both are greedy in strategy (make local choices, never backtrack). The difference is in what you can prove about the outcome.
| Problem | Greedy Strategy | Complexity | Optimal? |
|---|---|---|---|
| Activity selection | Earliest finish time | O(n log n) | Yes |
| Huffman coding | Merge two lowest frequencies | O(n log n) | Yes |
| Fractional knapsack | Highest value density first | O(n log n) | Yes |
| Job scheduling | Highest penalty first | O(n2) naive, O(n log n) with union-find | Yes |
| Kruskal's MST | Lightest non-cycle edge | O(E log E) | Yes |
| Prim's MST | Cheapest edge to non-tree vertex | O(E log V) | Yes |
| Dijkstra's paths | Smallest tentative distance | O(E log V) | Yes (non-negative weights) |
| Coin change (US coins) | Largest denomination first | O(n) | Yes (US coins only) |
| Set cover | Cover most uncovered elements | O(|U| · n) | No — O(ln n) approx |
CLRS Chapter 16.4 introduces matroids — the abstract mathematical structure that unifies ALL greedy algorithms that produce optimal solutions. A matroid is a pair M = (S, I) where S is a finite set and I is a family of "independent" subsets satisfying three axioms:
The fundamental theorem: a greedy algorithm on a weighted matroid always produces an optimal solution. The algorithm: sort elements by weight (descending for maximization), and greedily add each element if it keeps the set independent.
Examples of matroids in disguise:
| Problem | Ground Set S | Independent Sets I | Weight |
|---|---|---|---|
| MST (Kruskal) | Edges of the graph | Acyclic subsets (forests) | Negative edge weight (minimize) |
| Job scheduling | Jobs | Sets of jobs schedulable before deadlines | Penalties |
| Maximum weight forest | Edges | Acyclic subsets | Edge weight |
Not every greedy-solvable problem is a matroid (Huffman coding is not, for instance), but matroids provide a powerful lens for understanding when and why greedy works. If you can show your problem's feasible sets form a matroid, optimality of the greedy approach follows immediately from the theorem.
The matroid framework also explains why greedy FAILS on certain problems. The 0/1 knapsack feasible sets ({items with total weight ≤ W}) do NOT form a matroid. Specifically, the exchange axiom fails: if A = {item1} with weight 10 and B = {item2, item3} each with weight 5, and W = 10, then |A| < |B| but we cannot add either element of B to A without exceeding W. The exchange axiom is violated, so this is not a matroid, and greedy is not guaranteed optimal.
The practical takeaway: you do not need to verify matroid axioms in an interview. But understanding that matroids are the theoretical foundation of greedy optimality deepens your intuition for when greedy will and will not work. Problems with "capacity constraints" that cannot be decomposed (like 0/1 knapsack) usually fail the matroid test. Problems where "adding one more element is always possible if the set is small enough" (like forests in a graph) usually pass it.
Jobs sorted by penalty (highest first). Green slots = scheduled on time. Red = late (penalty incurred). Click "New Jobs" to regenerate.
Greedy problems are among the most common in coding interviews. They are fast to implement once you recognize the pattern, but recognizing the pattern is the hard part. This chapter is your cheat sheet.
Most greedy interview problems fall into three families. Learn the family, and you can solve any member:
Family 1: Interval scheduling. You have intervals. You want to select, merge, partition, or cover them optimally. The universal technique: sort by one endpoint (usually the end), then scan.
Family 2: Greedy on sorted arrays. Sort the input by some criterion, then make one pass making irrevocable decisions. Jump Game, Gas Station, Candy, Task Scheduler.
Family 3: Priority-queue greedy. Maintain a heap and repeatedly extract the optimal element. Huffman coding, Dijkstra, Prim's MST, merge K sorted lists, reorganize string (LC #767).
Recognizing which family a problem belongs to is 80% of the solution. The remaining 20% is choosing the right sorting criterion or heap key.
A huge family of greedy problems involves intervals. The core technique is always the same: sort by one endpoint, then scan.
| Problem | Sort By | Greedy Rule | LeetCode |
|---|---|---|---|
| Max non-overlapping intervals | End time | Pick if start ≥ last end | #435 |
| Min intervals to remove | End time | Count overlaps = n - max non-overlapping | #435 |
| Merge intervals | Start time | Extend end if overlap | #56 |
| Min meeting rooms (interval partitioning) | Start time + min-heap of end times | Reuse room if earliest end ≤ current start | #253 |
| Insert interval | Already sorted | Binary search + merge | #57 |
| Min arrows to burst balloons | End coordinate | Same as activity selection | #452 |
This is the "dual" of activity selection. Instead of maximizing activities in one room, minimize the number of rooms needed for all activities. Greedy solution: sort by start time, use a min-heap to track when each room becomes free.
python # LeetCode 253: Meeting Rooms II # Minimum number of conference rooms required import heapq def minMeetingRooms(intervals): if not intervals: return 0 intervals.sort(key=lambda x: x[0]) # sort by start rooms = [] # min-heap of end times for start, end in intervals: if rooms and rooms[0] <= start: heapq.heapreplace(rooms, end) # reuse room else: heapq.heappush(rooms, end) # need new room return len(rooms)
The greedy choice: for each meeting, check if the earliest-freeing room is available. If yes, reuse it (greedy — do not allocate a new room when an existing one works). If no, allocate a new room. The heap tracks end times, so `rooms[0]` is always the room that frees up soonest.
Why is this optimal? The number of rooms needed equals the maximum number of simultaneously occurring meetings. This is a lower bound — no algorithm can do better. Our greedy approach achieves this lower bound because it reuses rooms as aggressively as possible (always checking the earliest-freeing room first).
An alternative approach uses a sweep line: create events for each start (+1) and end (-1), sort them, scan left to right tracking the running count. The peak count is the answer. Both approaches are O(n log n).
The sweep line approach is particularly elegant because it makes the lower bound obvious. At any point in time, the number of concurrent meetings is a lower bound on rooms needed (you cannot schedule two concurrent meetings in one room). The peak concurrency is therefore a lower bound. The greedy approach achieves this bound, so it is optimal.
python # Sweep-line alternative def minMeetingRoomsSweep(intervals): events = [] for s, e in intervals: events.append((s, 1)) # meeting starts events.append((e, -1)) # meeting ends events.sort() # ties: ends before starts (process -1 before +1) peak = curr = 0 for _, delta in events: curr += delta peak = max(peak, curr) return peak
| Problem | Greedy Insight | LeetCode |
|---|---|---|
| Jump Game | Track farthest reachable index | #55 |
| Jump Game II | BFS-style: current reach, next reach | #45 |
| Gas Station | If total gas ≥ total cost, start from where tank would be most empty | #134 |
| Candy | Two passes: left-to-right and right-to-left | #135 |
| Task Scheduler | Most frequent task determines idle slots | #621 |
| Partition Labels | Track last occurrence of each char, extend partition | #763 |
python # LeetCode 435: Non-overlapping Intervals # Given intervals, return min number to remove for no overlaps def eraseOverlapIntervals(intervals): intervals.sort(key=lambda x: x[1]) # sort by end count = 0 last_end = float('-inf') for start, end in intervals: if start >= last_end: last_end = end # keep this interval else: count += 1 # remove this interval return count
python # LeetCode 55: Jump Game # Can you reach the last index? def canJump(nums): farthest = 0 for i in range(len(nums)): if i > farthest: return False # can't reach here farthest = max(farthest, i + nums[i]) return True
python # LeetCode 45: Jump Game II # Minimum number of jumps to reach the last index def jump(nums): jumps = 0 current_end = 0 # farthest we can reach with current jumps farthest = 0 # farthest we can reach with one more jump for i in range(len(nums) - 1): farthest = max(farthest, i + nums[i]) if i == current_end: # must jump now jumps += 1 current_end = farthest if current_end >= len(nums) - 1: break return jumps
This is a BFS-style greedy. Think of each jump as a "level" in BFS. At each level, you explore all positions reachable from the current level and compute the farthest position reachable at the next level. When you exhaust the current level (i == current_end), you increment jumps and extend to the next level (current_end = farthest). This guarantees minimum jumps because BFS always finds shortest paths in unweighted graphs.
python # LeetCode 134: Gas Station # Find starting station for circular route, or -1 def canCompleteCircuit(gas, cost): if sum(gas) < sum(cost): return -1 # impossible tank = 0 start = 0 for i in range(len(gas)): tank += gas[i] - cost[i] if tank < 0: start = i + 1 # can't start at or before i tank = 0 return start
The greedy insight has two parts. First: if total gas ≥ total cost, a valid starting station exists (provable by considering the cumulative surplus around the circuit — there must be a point where the running total is at its minimum, and starting just after that point keeps the tank non-negative throughout). Second: if starting from station s causes the tank to go negative at station k, then no station between s and k can be a valid start either. Why? The prefix sum from s to any intermediate station j (where s < j ≤ k) is non-negative (we had not yet gone negative at j). Starting from j gives a prefix sum at k equal to (prefix from j to k) = (prefix from s to k) - (prefix from s to j) ≤ (negative) - (non-negative) = still negative. So we skip all of them and try station k+1 next. One pass, O(n), no backtracking.
This is a subtle but powerful greedy argument. The "skip ahead" step is the key — it discards multiple candidate starting stations at once based on local information. This is characteristic of many greedy array problems: the greedy algorithm processes each element at most once, but each step may eliminate multiple future candidates.
python # Fractional knapsack — greedy is optimal def fractionalKnapsack(items, capacity): # items: [(weight, value), ...] items.sort(key=lambda x: x[1]/x[0], reverse=True) total = 0 for w, v in items: if capacity <= 0: break take = min(w, capacity) total += v * (take / w) capacity -= take return total
python # LeetCode 763: Partition Labels # Partition string so each letter appears in at most one part def partitionLabels(s): last = {ch: i for i, ch in enumerate(s)} partitions = [] start = end = 0 for i, ch in enumerate(s): end = max(end, last[ch]) # extend partition if i == end: # reached the end of current partition partitions.append(end - start + 1) start = i + 1 return partitions # "ababcbacadefegdehijhklij" -> [9, 7, 8]
The greedy insight: track the last occurrence of each character. As you scan, the current partition must extend to at least the last occurrence of every character you have seen. When your index reaches the current partition boundary, cut.
python # LeetCode 621: Task Scheduler # Minimum intervals to complete all tasks with cooldown n from collections import Counter def leastInterval(tasks, n): freq = Counter(tasks) max_freq = max(freq.values()) # Count how many tasks have the maximum frequency max_count = sum(1 for v in freq.values() if v == max_freq) # The most frequent task creates (max_freq - 1) gaps of size n # Each gap can be filled with other tasks result = (max_freq - 1) * (n + 1) + max_count return max(result, len(tasks)) # at least len(tasks)
This is greedy because the most frequent task determines the structure. You build a frame: (max_freq - 1) blocks of size (n + 1) plus one final block. The other tasks fill in the idle slots greedily.
python # LeetCode 767: Reorganize String # Rearrange so no two adjacent characters are the same import heapq from collections import Counter def reorganizeString(s): freq = Counter(s) # Check if possible: most frequent char can't exceed ceil(n/2) max_freq = max(freq.values()) if max_freq > (len(s) + 1) // 2: return "" # Max-heap (negate for min-heap) heap = [(-cnt, ch) for ch, cnt in freq.items()] heapq.heapify(heap) result = [] prev_cnt, prev_ch = 0, '' while heap: cnt, ch = heapq.heappop(heap) # Re-add the previous character if it still has count if prev_cnt < 0: heapq.heappush(heap, (prev_cnt, prev_ch)) result.append(ch) prev_cnt, prev_ch = cnt + 1, ch # decrement (negated) return ''.join(result)
This is a priority-queue greedy algorithm: always place the most frequent remaining character, but never place the same character twice in a row. The "previous character" is held out of the heap for one round, then re-added. This ensures no adjacent duplicates while using the most frequent characters as fast as possible.
Not all "Jump Game" variants are greedy. LeetCode #1306 (Jump Game III) asks: "Can you reach index 0 starting from any given index, where you can jump +arr[i] or -arr[i]?" This is BFS/DFS, not greedy — you may need to explore multiple paths. Recognizing when a problem is NOT greedy is as important as recognizing when it is.
python # LeetCode 1306: Jump Game III — BFS, NOT greedy from collections import deque def canReach(arr, start): n = len(arr) visited = set() q = deque([start]) while q: i = q.popleft() if arr[i] == 0: return True if i in visited: continue visited.add(i) for ni in [i + arr[i], i - arr[i]]: if 0 <= ni < n and ni not in visited: q.append(ni) return False
The difference: Jump Game I/II have a monotonic structure (you always move rightward, farthest-reachable is non-decreasing). This monotonicity enables greedy. Jump Game III allows bidirectional movement, breaking monotonicity and requiring full graph search.
Print this mental checklist. Run through it for every optimization problem you encounter:
| Step | Question | If YES | If NO |
|---|---|---|---|
| 1 | Can I sort the input by some criterion? | Promising for greedy. Try step 2. | Maybe DP or graph search. |
| 2 | Can I make one local choice and never undo it? | Strong greedy signal. Try step 3. | Need DP or backtracking. |
| 3 | Can I construct a quick exchange argument? | CODE IT. Greedy is almost certainly correct. | Try a small counterexample. |
| 4 | Does my counterexample break greedy? | Pivot to DP. State why greedy fails. | Greedy probably works. Code it, prove later. |
In a 45-minute interview, spending 3 minutes on this checklist saves you from 20 minutes down a wrong path. The most common mistake is jumping to DP when greedy suffices (wasting time on unnecessary complexity) or jumping to greedy when DP is needed (getting wrong answers).
| Dimension | What They Test | Example Question | Strong Answer Pattern |
|---|---|---|---|
| Concept | First-principles understanding | "When does greedy produce optimal solutions?" | State greedy-choice property + optimal substructure. Give exchange argument for a concrete example. |
| Design | Choosing the right approach | "Design a system to schedule N meeting rooms" | Recognize as interval partitioning. Sort by start, use min-heap of end times. O(n log n). |
| Code | Clean implementation | "Implement Huffman coding" | Use heapq, define comparable node class, handle edge cases (single character, empty input). |
| Debug | Finding why greedy fails | "Your greedy coin change gives wrong answer for denominations [1,5,8]" | Construct counterexample (amount 10: greedy gives 8+1+1=3, optimal is 5+5=2). Explain missing greedy-choice property. Pivot to DP. |
| Frontier | Advanced awareness | "How is Huffman used in modern compression?" | DEFLATE, JPEG, then discuss ANS as the modern successor. Mention that LLM tokenizers use BPE (a greedy algorithm). |
Mistake 1: No proof sketch. You code a greedy solution but cannot explain why it works. The interviewer asks "how do you know this is optimal?" and you say "it seems right." Always prepare a 2-sentence exchange argument: "Assume an optimal solution without our greedy choice. We can swap X for our choice without increasing cost because..."
Mistake 2: Wrong sorting criterion. You recognize a greedy problem but sort by the wrong attribute. Activity selection by duration, knapsack by value instead of density. When in doubt, try 2-3 small examples to validate your criterion before coding.
Mistake 3: Greedy when DP is needed. You force a greedy approach on a DP problem (like 0/1 knapsack or edit distance). If you cannot construct an exchange argument in 30 seconds, it is probably not greedy. Check for the telltale sign: can you make an irrevocable local choice? If the optimal decision at step i depends on future steps, greedy will not work.
Each cell shows the max jump length. Green = reachable. The moving bar shows the farthest reachable index. Can you reach the end?
Greedy algorithms do not exist in isolation. They connect deeply to dynamic programming, graph algorithms, and information theory. Here is how this chapter fits into the broader CLRS landscape and beyond.
This lesson covered the full depth of CLRS Chapter 16. We started with the activity selection problem, proved greedy correctness via the exchange argument, explored Huffman coding as a completely different application of the same greedy paradigm, drew sharp boundaries between greedy and DP using the knapsack litmus test, and built a practical interview toolkit. Along the way, we touched on matroids, approximation algorithms, and the deep connections between greedy and graph algorithms in later CLRS chapters.
| Chapter | Topic | Greedy Role |
|---|---|---|
| Ch 15 | Dynamic Programming | DP is the "safe fallback" when greedy-choice property fails. Activity selection appears in both chapters to illustrate the relationship. |
| Ch 16 | Greedy Algorithms (this lesson) | Core theory: greedy-choice property, optimal substructure, exchange arguments. |
| Ch 23 | Minimum Spanning Trees | Both Kruskal's and Prim's are greedy. The cut property is the MST exchange argument. |
| Ch 24 | Single-Source Shortest Paths | Dijkstra's algorithm is greedy (works for non-negative weights). Bellman-Ford is DP (handles negative weights). |
| Ch 25 | All-Pairs Shortest Paths | Floyd-Warshall is DP. Johnson's algorithm uses Dijkstra (greedy) after reweighting. |
Think of algorithms on a spectrum from most constrained to most flexible:
When designing an algorithm, start at the top. If greedy works, use it -- it is the simplest and fastest. If not, try DP. Only resort to backtracking if the problem has no useful substructure.
| Application | Algorithm | Why Greedy Works |
|---|---|---|
| Data compression (zip, gzip) | Huffman coding | Prefix-free codes have the greedy-choice property |
| Network routing (OSPF) | Dijkstra | Non-negative link weights guarantee greedy correctness |
| Network design (fiber optic) | Kruskal/Prim MST | Cut property ensures cheapest spanning tree |
| CPU scheduling | Shortest Job First | Minimizes average wait time (exchange argument) |
| Bandwidth allocation | Fractional knapsack | Resources are divisible |
| Compiler register allocation | Graph coloring heuristic | Greedy approximation (not always optimal) |
Several important greedy algorithms appear outside the standard algorithms curriculum but are worth knowing:
BPE (Byte Pair Encoding). The tokenizer behind GPT, LLaMA, and most modern LLMs is a greedy algorithm. It repeatedly merges the most frequent pair of adjacent tokens in the training corpus. This is structurally similar to Huffman — both build a hierarchy by merging pairs. But BPE merges the most frequent pair (top-down optimization for compression), while Huffman merges the least frequent pair (bottom-up tree construction for optimal prefix codes). The result is a vocabulary of subword tokens optimized for the training distribution. When you type "unbelievable" into ChatGPT, BPE might tokenize it as ["un", "believ", "able"] — common subwords get their own tokens, just as frequent characters get short Huffman codes.
Greedy feature selection. Forward selection in machine learning adds features one at a time, always picking the feature that most improves model accuracy on a validation set. This is greedy and not guaranteed optimal — the greedy-choice property does not hold because features can have synergistic effects (two features useless alone but powerful together). But it is fast (O(n · k) model fits for n features and k selected) and usually produces good results. Backward elimination (start with all features, greedily remove the least useful) is the dual approach.
A* search. A* is a "best-first" search that greedily expands the node with the lowest f(n) = g(n) + h(n), where g(n) is the cost so far and h(n) is the heuristic estimate of remaining cost. With an admissible heuristic (h never overestimates), A* is optimal. With a consistent heuristic (h satisfies the triangle inequality), A* never re-expands a node. This is another example of "greedy with the right criterion = optimal." Dijkstra is the special case where h(n) = 0 — no heuristic, just greedily expand the nearest unexplored node.
Greedy algorithms are beautiful when they work. But they have real limitations:
The fundamental insight of this chapter: greedy is the simplest paradigm that achieves optimality. It does not always apply. When it does, it gives you elegant, fast, simple code with provable guarantees. The skill is recognizing when the exchange argument holds — and having the discipline to reach for DP when it does not.
The takeaway: greedy is a tool, not a religion. Use it when the exchange argument holds. Reach for DP when it does not. Know the difference, and you will solve optimization problems faster than anyone in the room.
| # | Concept | One-Sentence Summary |
|---|---|---|
| 1 | Greedy-choice property | There exists an optimal solution containing the locally optimal (greedy) choice. |
| 2 | Exchange argument | Take any optimal solution; show you can swap in the greedy choice without making it worse. |
| 3 | Optimal substructure | After the greedy choice, the remaining sub-problem has the same structure. |
| 4 | Activity selection | Sort by finish time, scan once. O(n log n). Exchange argument: earliest finisher can replace any first choice. |
| 5 | Huffman coding | Merge two lowest-frequency nodes greedily. Optimal prefix-free code. O(n log n). |
| 6 | Fractional knapsack | Sort by value density, take greedily. Works because items are divisible. |
| 7 | 0/1 knapsack | Greedy fails. Need DP: O(nW) pseudo-polynomial. |
| 8 | Matroids | The abstract structure guaranteeing greedy optimality. Greedy on weighted matroid = optimal. |
| 9 | Greedy vs DP | Both need optimal substructure. Greedy additionally needs greedy-choice property. DP needs overlapping sub-problems. |
If this chapter sparked your interest, here are the most impactful next steps:
| Resource | What You Will Learn |
|---|---|
| CLRS Ch. 23 (MST) | The cut property — the most elegant exchange argument in all of algorithms. Kruskal's and Prim's algorithms with full proofs. |
| CLRS Ch. 24 (Shortest Paths) | Dijkstra's algorithm: greedy on graphs. Bellman-Ford: DP on graphs. When to use which. |
| CLRS Ch. 15 (DP) revisited | After understanding greedy, re-read Chapter 15. The contrast between "make one choice" and "consider all choices" will be crystal clear. |
| Kleinberg & Tardos Ch. 4 | More greedy algorithms with exchange argument proofs. Excellent exposition of interval scheduling variants. |
| Dasgupta, Papadimitriou & Vazirani Ch. 5 | Huffman coding, set cover, and the approximation ratio analysis. |
| LeetCode Greedy tag | 300+ problems. Start with the "Easy" ones, work up. Focus on intervals, arrays, and strings. |
If you remember only three things from this entire lesson, make them these: