CLRS Algorithms Workbook — Engineermaxxing

Chapter 0: Asymptotic Analysis

You have two algorithms for the same problem. Algorithm A runs in 3n² + 5n + 100 operations. Algorithm B runs in 0.01 · 2ⁿ operations. For n=10, A does 450 operations and B does about 10. B looks faster. But for n=50, A does 7,850 operations while B does over 10 trillion. Asymptotic analysis strips away the constants and lower-order terms so you can see which algorithm survives at scale.

Big-O (upper bound): f(n) = O(g(n)) means ∃ c > 0, n₀ ≥ 0 such that f(n) ≤ c · g(n) for all n ≥ n₀
Big-Ω (lower bound): f(n) = Ω(g(n)) means ∃ c > 0, n₀ ≥ 0 such that f(n) ≥ c · g(n) for all n ≥ n₀
Big-Θ (tight bound): f(n) = Θ(g(n)) means f(n) = O(g(n)) AND f(n) = Ω(g(n))

The intuition. O(g(n)) means "grows no faster than g(n)." Ω(g(n)) means "grows at least as fast as g(n)." Θ(g(n)) means "grows at exactly the same rate as g(n)." When someone says an algorithm is "O(n log n)," they mean the runtime grows at most as fast as n log n. If they say Θ(n log n), they mean it grows at exactly that rate — not faster, not slower.

Exercise 0.1: Is 3n² + 5n = O(n²)? Trace

To prove 3n² + 5n = O(n²), you need constants c and n₀ such that 3n² + 5n ≤ c · n² for all n ≥ n₀. Which of these works?

c = 3, n₀ = 1 — but 3n² + 5n > 3n² for all n ≥ 1, so this fails c = 4, n₀ = 5 — for n ≥ 5: 5n ≤ n², so 3n² + 5n ≤ 3n² + n² = 4n² c = 8, n₀ = 0 — 3(0)² + 5(0) = 0 ≤ 8(0)² = 0, checks out

Show reasoning

For n ≥ 5, we have 5n ≤ n² (since 5 ≤ n). Therefore 3n² + 5n ≤ 3n² + n² = 4n². So c = 4, n₀ = 5 is a valid witness. Option (a) fails because c = 3 is too small to absorb the 5n term. Option (c) uses n₀ = 0, but we need the inequality to hold for all n ≥ n₀, not just at n₀ = 0.

Exercise 0.2: Rank by Growth Rate Design

Arrange these functions from slowest growth (left) to fastest growth (right).

→

log n n n log n n² 2ⁿ n!

Show ordering

log n < n < n log n < n² < 2ⁿ < n!

This is the fundamental hierarchy. Logarithmic < linear < linearithmic < quadratic < exponential < factorial. Every algorithm designer needs this ordering burned into memory.

Exercise 0.3: Which Is Faster for n = 1000? Derive

Algorithm A: 100n log₂ n operations. Algorithm B: 2n² operations. For n = 1000, which does fewer operations? Enter the operation count for algorithm A (in thousands).

thousand operations (A)

Show derivation

A: 100 × 1000 × log₂(1000) = 100,000 × 9.97 ≈ 997,000

B: 2 × 1000² = 2,000,000

A ≈ 997K operations, B = 2000K operations. A is about 2× faster despite the 100× constant factor, because n log n grows slower than n². The crossover point is around n ≈ 54: for n < 54, B is faster (smaller constant), and for n > 54, A wins.

Exercise 0.4: Tight Bound Trace

What is the tightest asymptotic bound for f(n) = 5n³ + 2n² + 7n + 13?

O(n³) — correct but not the tightest statement O(n⁴) — technically true but unnecessarily loose Θ(n³) — it is both O(n³) and Ω(n³) Ω(n³) — correct but incomplete

Show reasoning

The dominant term is 5n³. Upper bound: 5n³ + 2n² + 7n + 13 ≤ 5n³ + 2n³ + 7n³ + 13n³ = 27n³ for n ≥ 1, so it is O(n³). Lower bound: 5n³ + 2n² + 7n + 13 ≥ 5n³ for all n ≥ 0, so it is Ω(n³). Together: Θ(n³).

Exercise 0.5: Implement isBigO() Build

Write a function that empirically tests whether f(n) = O(g(n)) by checking if f(n)/g(n) stays bounded as n grows. Evaluate f and g at n = 100, 1000, 10000, 100000. If the ratio f(n)/g(n) is decreasing or bounded (max ratio < 1000), return true.

Return a boolean. The ratio should stay bounded, not grow without limit.

Show solution

javascript
function isBigO(f, g) {
  const ns = [100, 1000, 10000, 100000];
  let maxRatio = 0;
  for (const n of ns) {
    const ratio = f(n) / g(n);
    if (ratio > maxRatio) maxRatio = ratio;
  }
  return maxRatio < 1000;
}

Exercise 0.6: Max Operations in 1 Second Derive

A computer does 10⁹ operations per second. What is the largest n you can handle in 1 second with an O(n²) algorithm?

max n (approximate)

Show derivation

n² ≤ 10⁹

n ≤ √(10⁹) = 10^4.5 ≈ 31,623

For O(n log n): n ≈ 4 × 10⁷. For O(n³): n ≈ 1000. For O(2ⁿ): n ≈ 30. This is why complexity class matters: the difference between n² and n³ is the difference between handling 31K elements vs 1K elements in a second.

Chapter 1: Recurrences

Divide-and-conquer algorithms split problems into subproblems, solve each recursively, then combine the results. The runtime of such algorithms follows a recurrence relation. The Master Theorem gives you the answer instantly for a wide class of recurrences — no guessing, no induction proofs needed.

Master Theorem: For T(n) = aT(n/b) + Θ(n^d) where a ≥ 1, b > 1, d ≥ 0:

Case 1: If d < log_b a → T(n) = Θ(n^{log_b a}) // leaves dominate
Case 2: If d = log_b a → T(n) = Θ(n^d log n) // balanced
Case 3: If d > log_b a → T(n) = Θ(n^d) // root dominates

The intuition. At each level, you do n^d work and create a subproblems of size n/b. The number of leaves in the recursion tree is a^{log_b n} = n^{log_b a}. The Master Theorem compares the work at the root (n^d) to the number of leaves (n^{log_b a}). Whichever grows faster dominates.

Exercise 1.1: Merge Sort Recurrence Trace

T(n) = 2T(n/2) + Θ(n). Here a = 2, b = 2, d = 1. What is log_b a, and which case applies?

log₂ 2 = 1, d = 1 → Case 1 (d < log_ba) → Θ(n) log₂ 2 = 1, d = 1 → Case 2 (d = log_ba) → Θ(n log n) log₂ 2 = 1, d = 1 → Case 3 (d > log_ba) → Θ(n)

Show reasoning

a = 2, b = 2, d = 1. log₂ 2 = 1 = d. This is Case 2, so T(n) = Θ(n¹ log n) = Θ(n log n). The tree has log₂ n levels, each doing Θ(n) total merge work, for a total of Θ(n log n). This is the classic merge sort runtime.

Exercise 1.2: T(n) = 4T(n/2) + n² Derive

Apply the Master Theorem: a = 4, b = 2, d = 2. Compute log_b a and determine T(n).

log₂ 4 = ?

Show derivation

a = 4, b = 2, d = 2

log₂ 4 = 2 = d

Since d = log_b a, this is Case 2: T(n) = Θ(n² log n).

At each level, the total work quadruples (4 subproblems) but each subproblem is half the size (squared input = quarter the work). These effects exactly balance, giving log n levels of Θ(n²) work each.

Exercise 1.3: T(n) = 9T(n/3) + n Trace

a = 9, b = 3, d = 1. log₃ 9 = 2 > 1 = d. Which case and what is T(n)?

Case 1: d < log_b a → T(n) = Θ(n^{log₃ 9}) = Θ(n²) Case 2: d = log_b a → T(n) = Θ(n log n) Case 3: d > log_b a → T(n) = Θ(n)

Show reasoning

log₃ 9 = 2 > 1 = d, so Case 1 applies. The leaves dominate. There are 9^{log₃ n} = n² leaves, each doing Θ(1) work. The combine work at each level (n^d = n) is overwhelmed by the explosive branching. T(n) = Θ(n²).

Exercise 1.4: Merge Sort Levels for n = 16 Derive

Merge sort splits n = 16 elements in half at each level. How many levels of recursion are there, and what is the total merge work across all levels?

total merge comparisons (approx)

Show derivation

Levels = log₂(16) = 4

Level 0: merge 1 pair of size 16 → 16 comparisons

Level 1: merge 2 pairs of size 8 → 16 comparisons

Level 2: merge 4 pairs of size 4 → 16 comparisons

Level 3: merge 8 pairs of size 2 → 16 comparisons

Total = 4 × 16 = 64 ≈ n log₂ n = 16 × 4 = 64

Each level does exactly n comparisons total (each element participates in one merge). With log₂ n levels, the total is n log₂ n. This is the cleanest proof of merge sort's Θ(n log n) complexity.

Exercise 1.5: Binary Search Recurrence Derive

Binary search: T(n) = T(n/2) + Θ(1). What are a, b, d? Apply the Master Theorem.

log₂ a = ?

Show derivation

a = 1, b = 2, d = 0

log₂ 1 = 0 = d

Case 2: T(n) = Θ(n⁰ log n) = Θ(log n). One subproblem, half the size, constant work per level. The tree has log₂ n levels, each with Θ(1) work.

Exercise 1.6: Strassen's Matrix Multiply Trace

Strassen: T(n) = 7T(n/2) + Θ(n²). a = 7, b = 2, d = 2. log₂ 7 ≈ 2.81. Which case?

Case 1: d = 2 < log₂ 7 ≈ 2.81 → T(n) = Θ(n^2.81) Case 2: d ≈ log₂ 7 → T(n) = Θ(n² log n) Case 3: d > log₂ 7 → T(n) = Θ(n²)

Show reasoning

d = 2 < log₂ 7 ≈ 2.807, so Case 1 applies. T(n) = Θ(n^{log₂ 7}) ≈ Θ(n^2.81). Strassen reduces the exponent from 3 (naive) to 2.81 by doing 7 recursive multiplications instead of 8. The savings become significant for large matrices.

Chapter 2: Sorting

Sorting is the most studied problem in computer science. Comparison-based sorts cannot do better than Θ(n log n), but non-comparison sorts like counting sort can achieve Θ(n + k) by exploiting the structure of the input. These exercises test whether you can trace each algorithm by hand and understand why each step happens.

Exercise 2.1: Quicksort Partition Derive

Trace Lomuto partition on [3, 8, 2, 5, 1, 4, 7, 6] with pivot = last element (6). The partition places all elements ≤ pivot before it and all elements > pivot after. After partitioning, what is the index (0-based) of the pivot?

pivot final index

Show trace

Lomuto partition: i starts at lo - 1 = -1, j scans from lo to hi - 1. If A[j] ≤ pivot, increment i and swap A[i] with A[j].

Initial: [3, 8, 2, 5, 1, 4, 7, 6] pivot = A[7] = 6, i = -1
j=0: A[0]=3 ≤ 6 → i=0, swap(0,0) → [3, 8, 2, 5, 1, 4, 7, 6]
j=1: A[1]=8 > 6 → skip
j=2: A[2]=2 ≤ 6 → i=1, swap(1,2) → [3, 2, 8, 5, 1, 4, 7, 6]
j=3: A[3]=5 ≤ 6 → i=2, swap(2,3) → [3, 2, 5, 8, 1, 4, 7, 6]
j=4: A[4]=1 ≤ 6 → i=3, swap(3,4) → [3, 2, 5, 1, 8, 4, 7, 6]
j=5: A[5]=4 ≤ 6 → i=4, swap(4,5) → [3, 2, 5, 1, 4, 8, 7, 6]
j=6: A[6]=7 > 6 → skip
Final: swap A[i+1]=A[5] with A[7] (pivot) → [3, 2, 5, 1, 4, 6, 7, 8]
Pivot index = i + 1 = 5

Five elements [3,2,5,1,4] are all ≤ 6, placed at indices 0-4. Pivot 6 goes to index 5. Elements [7,8] are at indices 6-7. The partition made 7 comparisons (one per element except the pivot).

Exercise 2.2: Quicksort Comparisons Derive

In the partition above (n = 8 elements), the pivot is compared to every other element exactly once. How many comparisons does the partition step perform?

comparisons

Show derivation

Comparisons = n - 1 = 8 - 1 = 7

Lomuto partition compares each element (except the pivot itself) to the pivot exactly once. For n elements, that is n - 1 comparisons per partition call. In the best case (balanced splits), quicksort does Θ(n log n) total. In the worst case (already sorted + last-element pivot), the splits are maximally unbalanced: n-1, n-2, ..., 1 comparisons = n(n-1)/2 = Θ(n²).

Exercise 2.3: Counting Sort Derive

Run counting sort on A = [4, 2, 2, 8, 3, 3, 1]. Values range from 0 to 8, so the count array has 9 slots. After building the count array (before prefix sums), what is count[3]?

count[3]

Show trace

A = [4, 2, 2, 8, 3, 3, 1]

count[0]=0, count[1]=1, count[2]=2, count[3]=2, count[4]=1, count[5]=0, count[6]=0, count[7]=0, count[8]=1

count[3] = 2 because the value 3 appears twice in A. After computing prefix sums, count[i] tells the position where element i should go in the output. Counting sort runs in Θ(n + k) where k is the range of values. It is NOT comparison-based, so it bypasses the Ω(n log n) lower bound.

Exercise 2.4: Implement partition() Build

Implement Lomuto partition. Use the last element as pivot. Return the final index of the pivot.

Modify arr in-place and return the pivot index.

Show solution

javascript
function partition(arr, lo, hi) {
  const pivot = arr[hi];
  let i = lo - 1;
  for (let j = lo; j < hi; j++) {
    if (arr[j] <= pivot) {
      i++;
      [arr[i], arr[j]] = [arr[j], arr[i]];
    }
  }
  [arr[i + 1], arr[hi]] = [arr[hi], arr[i + 1]];
  return i + 1;
}

Exercise 2.5: Find the Bug in Merge Sort Debug

This merge function produces wrong results on some inputs. Click the buggy line.

function merge(arr, lo, mid, hi) {
  let left = arr.slice(lo, mid + 1);
  let right = arr.slice(mid + 1, hi + 1);
  let i = 0, j = 0, k = lo;
  while (i < left.length && j < right.length) {
    if (left[i] < right[j]) {
      arr[k++] = left[i++];
    } else {
      arr[k++] = right[j++];
    }
  }
  while (i < left.length) arr[k++] = left[i++];
}

Show explanation

Line 13 is the bug — there is no loop to copy remaining elements from the right subarray. When all left elements are placed but right elements remain, those right elements are lost. The fix: add while (j < right.length) arr[k++] = right[j++]; before the closing brace.

Exercise 2.6: Comparison-Based Lower Bound Trace

Any comparison-based sort must make at least ⌈log₂(n!)⌉ comparisons in the worst case. For n = 8, what is this lower bound?

15 comparisons 16 comparisons — since log₂(8!) = log₂(40320) ≈ 15.3, ceiling gives 16 24 comparisons

Show derivation

8! = 40,320

log₂(40,320) ≈ 15.30

⌈15.30⌉ = 16

The decision tree for sorting n elements has n! leaves (one per permutation). A binary tree with n! leaves has height at least ⌈log₂(n!)⌉. By Stirling's approximation: log₂(n!) ≈ n log₂ n - n log₂ e ≈ Θ(n log n). This proves the Ω(n log n) lower bound for comparison sorts.

Chapter 3: Hash Tables

Hash tables give O(1) average-case lookup, insert, and delete. But that "average" hides important details: collision resolution, load factor, and the gap between theory and practice. These exercises test whether you can trace hash operations by hand and predict performance under load.

Exercise 3.1: Division Method Hashing Derive

Hash function h(k) = k mod 13. Insert keys in order: [5, 28, 19, 15, 20, 33, 12, 17] into a table of size 13 (indices 0-12) with chaining. How many keys hash to slot 7?

keys in slot 7

Show trace

h(5) = 5 mod 13 = 5

h(28) = 28 mod 13 = 2

h(19) = 19 mod 13 = 6

h(15) = 15 mod 13 = 2 // collision with 28

h(20) = 20 mod 13 = 7

h(33) = 33 mod 13 = 7 // collision with 20

h(12) = 12 mod 13 = 12

h(17) = 17 mod 13 = 4

Slot 7 contains {20, 33} — 2 keys. Slot 2 also has 2 keys {28, 15}. With chaining, collisions just extend the linked list at that slot. The load factor α = 8/13 ≈ 0.62.

Exercise 3.2: Linear Probing Derive

Same keys [5, 28, 19, 15, 20, 33, 12, 17], h(k) = k mod 13, but now use open addressing with linear probing (if slot is full, try next slot). After inserting all 8 keys, which slot does key 33 occupy?

slot index (0-12)

Show trace

Insert 5 → slot 5 (empty) ✓

Insert 28 → slot 2 (empty) ✓

Insert 19 → slot 6 (empty) ✓

Insert 15 → slot 2 (full) → 3 (empty) ✓

Insert 20 → slot 7 (empty) ✓

Insert 33 → slot 7 (full) → 8 (empty) ✓

Insert 12 → slot 12 (empty) ✓

Insert 17 → slot 4 (empty) ✓

Key 33 hashes to 7, which is occupied by 20. Linear probing tries 8, which is empty. So 33 sits at slot 8. Notice the cluster forming at slots 2-3 and 5-8. This clustering is the Achilles' heel of linear probing — clusters grow and merge, degrading performance.

Exercise 3.3: Expected Probes Derive

For open addressing with uniform hashing, the expected number of probes for an unsuccessful search is 1/(1 - α), where α is the load factor. For α = 0.7, how many probes are expected?

expected probes

Show derivation

Expected probes = 1 / (1 - α) = 1 / (1 - 0.7) = 1 / 0.3 ≈ 3.33

At 70% load, you expect about 3.3 probes per lookup. At 90% load: 1/0.1 = 10 probes. At 95%: 20 probes. This is why practical hash tables resize when α exceeds 0.7-0.8. The formula assumes uniform hashing (each slot equally likely), which linear probing does NOT satisfy — linear probing clusters, making real performance worse than this formula predicts.

Exercise 3.4: Implement hashInsert() Build

Implement insert for open addressing with linear probing. The table is an array of size m; empty slots are null.

Return the index where the key ends up.

Show solution

javascript
function hashInsert(table, key) {
  const m = table.length;
  let idx = key % m;
  for (let i = 0; i < m; i++) {
    const probe = (idx + i) % m;
    if (table[probe] === null) {
      table[probe] = key;
      return probe;
    }
  }
  return -1;
}

Exercise 3.5: Load Factor and Resizing Trace

A hash table has 16 slots and 12 elements (α = 0.75). You insert one more, triggering a resize to 32 slots. What is the new load factor?

0.375 — the load factor halves 13/32 ≈ 0.406 — 13 elements in 32 slots 0.75 — load factor is preserved during resize

Show reasoning

After inserting, there are 13 elements. The table resizes to 32 slots. New α = 13/32 = 0.40625. The resize roughly halves the load factor (from ~0.81 to ~0.41), which restores fast O(1) operations. The amortized cost of all the resizes over a sequence of n insertions is O(n) total, or O(1) per insertion.

Chapter 5: Dynamic Programming

DP transforms exponential brute force into polynomial time by storing solutions to subproblems. These exercises require you to fill actual DP tables by hand — not just recognize the pattern, but compute every cell.

Exercise 5.1: LCS Table — Fill It Derive

Compute the LCS (Longest Common Subsequence) of X = "ABCBDAB" and Y = "BDCAB". Fill the DP table. What is the length of the LCS?

Recurrence:
c[i][j] = 0 if i = 0 or j = 0
c[i][j] = c[i-1][j-1] + 1 if X[i] = Y[j]
c[i][j] = max(c[i-1][j], c[i][j-1]) if X[i] ≠ Y[j]

LCS length

Show table

      "" B D C A B
"" 0 0 0 0 0 0
A    0 0 0 0 1 1
B    0 1 1 1 1 2
C    0 1 1 2 2 2
B    0 1 1 2 2 3
D    0 1 2 2 2 3
A    0 1 2 2 3 3
B    0 1 2 2 3 4

LCS length = c[7][5] = 4. The LCS itself is "BCAB" (trace back from c[7][5] following the arrows: diagonal when characters match, up/left otherwise). Time: Θ(mn) where m = 7, n = 5, so 35 cells. Space: Θ(mn) for the full table, but Θ(min(m,n)) if you only need the length (keep two rows).

Exercise 5.2: Rod Cutting Derive

Prices: p = [1, 5, 8, 9, 10, 17, 17, 20] (indices 1-8). What is the maximum revenue for a rod of length 4?

r[j] = max_{1 ≤ i ≤ j} (p[i] + r[j - i]), r[0] = 0

max revenue ($)

Show table

r[0] = 0

r[1] = p[1] + r[0] = 1 + 0 = 1

r[2] = max(p[1]+r[1], p[2]+r[0]) = max(1+1, 5+0) = max(2, 5) = 5

r[3] = max(p[1]+r[2], p[2]+r[1], p[3]+r[0]) = max(1+5, 5+1, 8+0) = max(6, 6, 8) = 8

r[4] = max(p[1]+r[3], p[2]+r[2], p[3]+r[1], p[4]+r[0]) = max(1+8, 5+5, 8+1, 9+0) = max(9, 10, 9, 9) = 10

Maximum revenue for rod length 4 is $10, achieved by cutting into two pieces of length 2 (each worth $5). Note that p[4] = $9 is less than the optimal $10, so leaving the rod uncut is suboptimal.

Exercise 5.3: Implement lcs() Build

Write a function that computes the length of the Longest Common Subsequence of two strings.

Use bottom-up DP with a 2D table.

Show solution

javascript
function lcs(a, b) {
  const m = a.length, n = b.length;
  const dp = Array.from({length: m + 1}, () => Array(n + 1).fill(0));
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      if (a[i - 1] === b[j - 1]) {
        dp[i][j] = dp[i - 1][j - 1] + 1;
      } else {
        dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
      }
    }
  }
  return dp[m][n];
}

Exercise 5.4: 0/1 Knapsack Derive

Items: weights = [2, 3, 4, 5], values = [3, 4, 5, 6]. Knapsack capacity W = 5. What is the maximum value you can carry?

max value

Show table

K[i][w] = max value using items 1..i with capacity w

w=0 1 2 3 4 5
i=0 0 0 0 0 0 0
i=1 0 0 3 3 3 3 // item 1: w=2, v=3
i=2 0 0 3 4 4 7 // item 2: w=3, v=4 → K[2][5]=max(K[1][5], 4+K[1][2])=max(3,7)=7
i=3 0 0 3 4 5 7 // item 3: w=4, v=5
i=4 0 0 3 4 5 7 // item 4: w=5, v=6 → K[4][5]=max(K[3][5], 6+K[3][0])=max(7,6)=7

Maximum value = 7, achieved by taking items 1 and 2 (weights 2 + 3 = 5, values 3 + 4 = 7). Item 4 alone gives value 6 for weight 5, which is worse. The 0/1 knapsack has Θ(nW) time, which is pseudo-polynomial — it depends on the value of W, not its bit length.

Exercise 5.5: DP vs Greedy Trace

A greedy approach to the knapsack above would pick the item with the best value/weight ratio first. Item 1: 3/2 = 1.5, Item 2: 4/3 = 1.33, Item 3: 5/4 = 1.25, Item 4: 6/5 = 1.2. Greedy picks item 1 (w=2, v=3), then item 2 (w=3, v=4), total = 7. Does greedy always find the optimal 0/1 knapsack solution?

Yes — greedy by value/weight ratio always works for knapsack No — greedy works for fractional knapsack but fails for 0/1. Counterexample: items [(w=3,v=4), (w=2,v=3), (w=2,v=3)] with W=4. Greedy picks item 1 (ratio 1.33), value=4. Optimal: items 2+3, value=6 It depends on the input size

Exercise 5.6: Find the Bug in LCS Debug

This LCS function returns wrong values. Click the buggy line.

function lcs(a, b) {
  const m = a.length, n = b.length;
  const dp = Array.from({length: m+1}, () => Array(n+1).fill(0));
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      if (a[i-1] === b[j-1]) {
        dp[i][j] = dp[i-1][j-1] + 1;
      } else {
        dp[i][j] = dp[i-1][j] + dp[i][j-1];
      }
    }
  }
  return dp[m][n];
}

Show explanation

Line 9 is the bug. When characters do not match, the recurrence should be Math.max(dp[i-1][j], dp[i][j-1]), not dp[i-1][j] + dp[i][j-1]. Adding the two values instead of taking the max double-counts shared subproblems and produces inflated results.

Chapter 6: Greedy Algorithms

Greedy algorithms make the locally optimal choice at each step and never reconsider. When they work, they are simpler and faster than DP. The trick is proving they actually produce a globally optimal solution. These exercises test both the mechanics and the reasoning.

Exercise 6.1: Activity Selection Derive

Activities with (start, finish) times: a1(1,4), a2(3,5), a3(0,6), a4(5,7), a5(3,9), a6(5,9), a7(6,10), a8(8,11), a9(8,12), a10(2,14), a11(12,16). Sort by finish time and greedily select non-overlapping activities. How many can you select?

activities selected

Show trace

Sorted by finish time: a1(1,4), a2(3,5), a3(0,6), a4(5,7), a5(3,9), a6(5,9), a7(6,10), a8(8,11), a9(8,12), a10(2,14), a11(12,16).

Select a1(1,4) ✓ last_finish = 4
a2(3,5): start 3 < 4 → skip
a3(0,6): start 0 < 4 → skip
Select a4(5,7) ✓ last_finish = 7
a5(3,9): start 3 < 7 → skip
a6(5,9): start 5 < 7 → skip
a7(6,10): start 6 < 7 → skip
Select a8(8,11) ✓ last_finish = 11
a9(8,12): start 8 < 11 → skip
a10(2,14): start 2 < 11 → skip
Select a11(12,16) ✓ last_finish = 16

4 activities selected: {a1, a4, a8, a11}. The greedy choice of earliest-finish-time is provably optimal via an exchange argument: swapping any selected activity for a later-finishing one can only reduce (never increase) the number of subsequent compatible activities.

Exercise 6.2: Huffman Coding — Build Tree Derive

Character frequencies: a:5, b:9, c:12, d:13, e:16, f:45. Build the Huffman tree by repeatedly merging the two lowest-frequency nodes. What is the total number of bits to encode a message where each character appears exactly as many times as its frequency?

total bits

Show tree construction

Step 1: Merge a(5) + b(9) = 14 → [c:12, d:13, ab:14, e:16, f:45]
Step 2: Merge c(12) + d(13) = 25 → [ab:14, e:16, cd:25, f:45]
Step 3: Merge ab(14) + e(16) = 30 → [cd:25, abe:30, f:45]
Step 4: Merge cd(25) + abe(30) = 56 → [f:45, cdabe:56]
Step 5: Merge f(45) + cdabe(56) = 100 → root

Code lengths: f=1, c=3, d=3, a=4, b=4, e=3.

Total bits = 45×1 + 12×3 + 13×3 + 5×4 + 9×4 + 16×3 = 45 + 36 + 39 + 20 + 36 + 48 = 224

Fixed-length codes would need ⌈log₂ 6⌉ = 3 bits per character, totaling 100 × 3 = 300 bits. Huffman saves 76 bits (25%) by giving shorter codes to more frequent characters.

Exercise 6.3: Huffman Code Length Trace

In the Huffman tree above, what is the code length (depth in the tree) for character 'f' with frequency 45?

1 bit — 'f' is the most frequent character, so it gets the shortest code 2 bits — typical for a 6-character alphabet 3 bits — same as fixed-length coding

Exercise 6.4: Arrange Greedy Algorithm Steps Design

Put these steps of the activity selection algorithm in the correct order.

→

Sort by finish time Select first activity Scan: skip overlapping, select compatible Return selected set

Exercise 6.5: Greedy Correctness Trace

Why does sorting by START time (instead of finish time) NOT yield an optimal solution for activity selection?

It always gives the same answer as sorting by finish time A long activity starting early can block many short activities. E.g., activities (0,100), (1,2), (3,4), (5,6): earliest-start picks only (0,100), missing 3 activities Sorting by start time is O(n²) instead of O(n log n)

Chapter 7: Graph Algorithms

BFS and DFS are the two fundamental graph traversals. BFS explores level by level (finding shortest paths in unweighted graphs). DFS explores as deep as possible (enabling cycle detection, topological sort, and SCC decomposition). These exercises require you to trace both algorithms on concrete graphs.

Graph for exercises 7.1-7.3. Directed graph with vertices {A, B, C, D, E, F} and edges: A→B, A→C, B→D, B→E, C→E, D→F, E→F.

Exercise 7.1: BFS Distances Derive

Run BFS from source A on the graph above. What is the shortest distance (number of edges) from A to F?

edges

Show BFS trace

Queue: [A] dist: A=0
Dequeue A, enqueue B(1), C(1) Queue: [B, C]
Dequeue B, enqueue D(2), E(2) Queue: [C, D, E]
Dequeue C, E already discovered Queue: [D, E]
Dequeue D, enqueue F(3) Queue: [E, F]
Dequeue E, F already discovered Queue: [F]
Dequeue F Queue: []

Distances: A=0, B=1, C=1, D=2, E=2, F=3

Shortest path A→F has 3 edges. Two paths achieve this: A→B→D→F and A→B→E→F. BFS guarantees shortest paths in unweighted graphs because it discovers all distance-k vertices before any distance-(k+1) vertex.

Exercise 7.2: DFS Edge Classification Trace

Run DFS from A (visiting neighbors in alphabetical order). The edge C→E leads to E, which was already discovered during B's exploration. What type of edge is C→E?

Tree edge — E is discovered for the first time via C Back edge — E is an ancestor of C in the DFS tree Forward or cross edge — E was already fully processed (black) when C examines it, and E is not an ancestor of C

Show DFS trace

DFS(A): discover A, visit B
  DFS(B): discover B, visit D
    DFS(D): discover D, visit F
      DFS(F): discover F, finish F
    finish D
  visit E
    DFS(E): discover E, visit F (already finished) → forward edge
    finish E
  finish B
visit C
  DFS(C): discover C, visit E (already finished) → cross edge
  finish C
finish A

C→E is a cross edge: E is in a different branch of the DFS tree (B's subtree) and is already finished when C examines it. Back edges point to ancestors (still gray/in-progress). A directed graph has a cycle if and only if DFS finds a back edge.

Exercise 7.3: Topological Sort Derive

The graph above is a DAG (no cycles). A topological sort orders vertices so that for every edge u→v, u appears before v. Using DFS finish times (reverse order), list a valid topological ordering. How many valid topological orderings exist for this graph?

valid orderings

Show orderings

One valid topological order (by reverse DFS finish time): A, C, B, E, D, F. The 4 valid orderings are:

1. A, B, C, D, E, F
2. A, B, C, E, D, F
3. A, B, D, C, E, F
4. A, C, B, D, E, F

A must come first (no incoming edges). F must come last (only incoming edges). B and C can be swapped when both follow A. D and E can be reordered under constraints. The total of 4 can be verified by enumeration. Topological sort runs in Θ(V + E).

Exercise 7.4: Implement bfs() Build

Write BFS that returns the shortest distance from a source to every reachable vertex. The graph is given as an adjacency list (object mapping vertex to array of neighbors).

Use a queue (array with shift/push is fine for this size).

Show solution

javascript
function bfs(graph, source) {
  const dist = {};
  dist[source] = 0;
  const queue = [source];
  while (queue.length > 0) {
    const u = queue.shift();
    for (const v of (graph[u] || [])) {
      if (dist[v] === undefined) {
        dist[v] = dist[u] + 1;
        queue.push(v);
      }
    }
  }
  return dist;
}

Exercise 7.5: Cycle Detection Trace

You add the edge F→A to the graph. DFS from A now discovers that A is still gray (in-progress) when F examines it. What does this mean?

The graph has a cycle (A→B→D→F→A). A back edge to a gray vertex proves a cycle exists. The graph is no longer a DAG. F→A is a cross edge — no cycle The graph is still a DAG because F comes after A alphabetically

Chapter 8: Shortest Paths & MST

Weighted graphs add a new dimension: finding the cheapest path, not just the shortest by hops. Dijkstra handles non-negative weights. Bellman-Ford handles negative weights and detects negative cycles. For minimum spanning trees, Kruskal's sorts edges and uses union-find, while Prim's grows a tree from a source.

Graph for exercises 8.1-8.2. Vertices: {A, B, C, D, E}. Edges with weights: A→B(10), A→C(3), B→C(1), B→D(2), C→B(4), C→D(8), C→E(2), D→E(7), E→D(9).

Exercise 8.1: Dijkstra Trace Derive

Run Dijkstra's algorithm from source A on the graph above. What is the shortest distance from A to D?

shortest distance A→D

Show Dijkstra trace

Init: dist = {A:0, B:∞, C:∞, D:∞, E:∞}

Extract A (d=0):
  Relax A→B: 0+10=10 < ∞ → dist[B]=10
  Relax A→C: 0+3=3 < ∞ → dist[C]=3

Extract C (d=3):
  Relax C→B: 3+4=7 < 10 → dist[B]=7
  Relax C→D: 3+8=11 < ∞ → dist[D]=11
  Relax C→E: 3+2=5 < ∞ → dist[E]=5

Extract E (d=5):
  Relax E→D: 5+9=14 > 11 → no update

Extract B (d=7):
  Relax B→C: 7+1=8 > 3 → no update
  Relax B→D: 7+2=9 < 11 → dist[D]=9

Extract D (d=9): done

Final: A=0, C=3, E=5, B=7, D=9

Shortest path to D: A→C→B→D with cost 3+4+2 = 9. The direct A→B→D costs 10+2 = 12. Going through C first is cheaper despite the detour. Dijkstra with a binary heap runs in O((V+E) log V).

Exercise 8.2: Bellman-Ford Negative Cycle Trace

You add edge D→C with weight -15 to the graph. After running V-1 = 4 rounds of Bellman-Ford, you perform one more relaxation pass. If any distance decreases, there is a negative cycle. Does this graph have a negative cycle?

Yes — C→B→D→C has total weight 4+2+(-15) = -9. Going around this cycle reduces distances indefinitely. No — negative edges do not imply negative cycles Cannot determine without running the algorithm fully

Show reasoning

The cycle C→B→D→C has total weight 4 + 2 + (-15) = -9 < 0. This is a negative-weight cycle. Any path that includes this cycle can be made arbitrarily cheap by going around it more times. Bellman-Ford detects this: after V-1 relaxation passes, a (V)th pass still finds improvements, which is the signal. Dijkstra CANNOT handle negative edges and would give wrong answers on this graph.

Exercise 8.3: Kruskal's MST Derive

Undirected graph: vertices {A,B,C,D,E}, edges with weights: AB(4), AC(2), BC(1), BD(5), CD(8), CE(10), DE(2). Sort edges by weight and apply Kruskal's algorithm. What is the total weight of the MST?

MST total weight

Show Kruskal trace

Sorted edges: BC(1), AC(2), DE(2), AB(4), BD(5), CD(8), CE(10)

BC(1): B,C in different sets → add ✓ MST = {BC}
AC(2): A,C in different sets → add ✓ MST = {BC, AC}
DE(2): D,E in different sets → add ✓ MST = {BC, AC, DE}
AB(4): A,B in same set (via A-C-B) → skip — would create cycle
BD(5): B,D in different sets → add ✓ MST = {BC, AC, DE, BD}

4 edges connect 5 vertices → MST complete. Total = 1 + 2 + 2 + 4 = 9.

MST total weight = 1 + 2 + 2 + 4 = 9. Kruskal's greedily adds the cheapest edge that does not create a cycle. Union-find enables cycle detection in nearly O(1) amortized time. The total runtime is O(E log E) dominated by sorting.

Exercise 8.4: Prim's vs Kruskal's Trace

Which statement about Prim's and Kruskal's algorithms is correct?

They always produce the same MST Prim's is always faster Both produce valid MSTs (same total weight) but may select different edges when ties exist. Prim's with binary heap is O((V+E) log V); Kruskal's is O(E log E). Prim's is better for dense graphs, Kruskal's for sparse.

Exercise 8.5: Dijkstra — Find the Bug Debug

This Dijkstra implementation sometimes gives wrong shortest distances. Click the buggy line.

function dijkstra(graph, source) {
  let dist = {}, visited = new Set();
  for (let v in graph) dist[v] = Infinity;
  dist[source] = 0;
  while (visited.size < Object.keys(graph).length) {
    let u = null, minD = Infinity;
    for (let v in dist) {
      if (!visited.has(v) && dist[v] < minD) { u = v; minD = dist[v]; }
    }
    if (u === null) break;
    visited.add(u);
    for (let [v, w] of graph[u]) {
      if (dist[u] + w < dist[v]) dist[v] = dist[u] + w;
    }
  }
  return dist;
}

Show explanation

Line 13 is the bug. The relaxation updates dist[v] even when v has already been visited. While this code looks like it might work (it does for non-negative weights), the correct Dijkstra skips already-visited vertices in the neighbor loop: if (!visited.has(v) && dist[u] + w < dist[v]). Without this check, previously finalized vertices can have their distances incorrectly re-updated if the graph has certain edge patterns, and with negative weights this can loop infinitely.

Exercise 8.6: Dijkstra Runtime Derive

A graph has V = 1000 vertices and E = 5000 edges. What is the runtime of Dijkstra with a binary heap, expressed as a number of operations? Use the formula (V + E) log V.

approximate operations (thousands)

Show derivation

(V + E) × log₂ V = (1000 + 5000) × log₂(1000) = 6000 × 9.97 ≈ 59,800

≈ 60K operations. Compare to the naive O(V²) implementation: 1,000,000 operations. The binary heap saves ~17× on this sparse graph. For dense graphs (E ≈ V²), the naive version is better because the heap overhead is not worth it.

Topic	Lesson
Dynamic Programming (CLRS Ch 15)	Dynamic Programming — From Absolute Zero
Graph Algorithms (CLRS Ch 22)	Graph Algorithms — From Absolute Zero
Shortest Paths (CLRS Ch 24)	Shortest Paths — From Absolute Zero
Transformer Math Workbook	Transformer Math Workbook

CLRS Algorithms

Chapter 0: Asymptotic Analysis

Chapter 1: Recurrences

Chapter 2: Sorting

Chapter 3: Hash Tables

Chapter 4: Binary Search Trees

Chapter 5: Dynamic Programming

Chapter 6: Greedy Algorithms

Chapter 7: Graph Algorithms

Chapter 8: Shortest Paths & MST

Chapter 9: Capstone

Related Lessons