Dynamic Programming (CLRS Ch15) — From Absolute Zero to Mastery

Chapter 0: The Problem

You run a steel company. A customer delivers a rod of length n inches, and you need to cut it into shorter pieces to sell. Different lengths fetch different prices. A 1-inch piece sells for $1, a 2-inch piece for $5, a 3-inch piece for $8, and so on. Your job: figure out where to cut to maximize total revenue.

Sounds simple. A rod of length 4 can be cut in many ways: leave it whole (4 inches), cut into 1+3, cut into 2+2, cut into 1+1+2, or cut into 1+1+1+1. Each way produces different revenue. You want the best one.

Here is the catch. A rod of length n has n-1 possible cut positions. At each position, you either cut or you do not. That gives 2^n-1 possible ways to cut the rod. For n=20, that is over half a million. For n=40, it is over a trillion. Brute force is hopeless.

The Key Observation

But look more carefully. Suppose the optimal solution for a rod of length 7 begins with a cut at position 3. Then the remaining piece of length 4 must ALSO be cut optimally. Why? Proof by contradiction: if you could cut the 4-inch remainder better, you would replace the suboptimal cuts with the better ones and get a higher total revenue for the full 7-inch rod — contradicting the assumption that the original solution was optimal.

This argument works for any problem where "locally optimal sub-solutions" compose into "globally optimal solutions." It has a name, and it is the foundation of everything in this lesson.

Optimal substructure. An optimal solution to the whole problem CONTAINS optimal solutions to its subproblems. This is the first ingredient of dynamic programming. The second ingredient — overlapping subproblems — is what makes DP dramatically faster than brute force. The recursion tree below reveals why.

The Naive Recursion

Let us write the most natural solution. To find the maximum revenue r(n) for a rod of length n, try every possible first cut of length i (from 1 to n). For each cut, collect the price p_i for that piece and recursively solve the remaining rod of length n-i. Take the maximum over all choices.

r(n) = max_{1 ≤ i ≤ n} { p_i + r(n - i) }

Base case: r(0) = 0 (a zero-length rod earns nothing).

This formula is correct but devastatingly slow. Let us count the recursive calls. r(n) calls r(n-1), r(n-2), ..., r(0). Each of those makes their own recursive calls. The total satisfies:

T(n) = 1 + T(0) + T(1) + ... + T(n-1)
T(0) = 1

// Solve by substitution:
T(1) = 1 + T(0) = 2
T(2) = 1 + T(0) + T(1) = 1 + 1 + 2 = 4
T(3) = 1 + T(0) + T(1) + T(2) = 1 + 1 + 2 + 4 = 8
T(n) = 2ⁿ

Exponential! But how many UNIQUE subproblems are there? Only n+1: r(0), r(1), ..., r(n). The recursion solves the same subproblems over and over. The simulation below makes this painfully visible.

Rod Cutting Recursion Tree

Adjust rod length and watch the exponential explosion. Nodes in teal are computed for the first time. Nodes in red are redundant recomputations. Count the waste.

Rod length n 4

Total calls: — | Unique subproblems: —

For n=4, the naive recursion makes 8 calls but only deals with 4 unique subproblems (r(0), r(1), r(2), r(3)). For n=5 it makes 16 calls for 5 unique subproblems. For n=7, 64 calls for 7 unique ones. The total calls double with each increment of n: 2^n-1. The unique subproblems grow linearly: n.

This is the gap that dynamic programming exploits. Instead of solving r(2) over and over, solve it once, store the answer, and look it up for free every subsequent time. That is the entire idea.

The Two Ingredients

When does DP apply? Dynamic programming works whenever a problem has BOTH: (1) Optimal substructure — an optimal solution is built from optimal solutions to subproblems. (2) Overlapping subproblems — the same subproblems recur many times in the recursion. When both hold, DP transforms exponential brute force into polynomial time by trading space (a table to store results) for time (avoiding re-computation). Without overlapping subproblems, you have divide-and-conquer (merge sort). Without optimal substructure, DP cannot guarantee optimality.

The Simplest Example: Fibonacci

Before we solve rod cutting with DP, let us see the idea on the simplest possible example. The Fibonacci sequence: F(0)=0, F(1)=1, F(n) = F(n-1) + F(n-2).

// Naive recursion for F(5):
F(5) calls F(4) and F(3)
F(4) calls F(3) and F(2)
F(3) calls F(2) and F(1)
F(2) calls F(1) and F(0)
// F(3) is computed TWICE. F(2) is computed THREE times.
// Total calls for F(n): O(2^n). Unique subproblems: n+1.

// DP (bottom-up):
F[0]=0, F[1]=1
F[2]=F[1]+F[0]=1
F[3]=F[2]+F[1]=2
F[4]=F[3]+F[2]=3
F[5]=F[4]+F[3]=5
// 5 additions instead of 15 recursive calls. O(n) time.

Fibonacci is not an optimization problem (there is no "best" Fibonacci number), so it does not have "optimal substructure" in the formal sense. But it beautifully illustrates overlapping subproblems. Rod cutting adds the optimization dimension: instead of adding, we maximize. The principle is the same.

The Scale of the Savings

To appreciate how dramatic the DP speedup is, let us compare the work for increasing n:

Rod length n	Naive calls (2^n-1)	DP comparisons (n(n+1)/2)	Speedup
5	16	15	~1x
10	512	55	~9x
20	524,288	210	~2,500x
30	536,870,912	465	~1,155,000x
40	~5.5 × 10¹¹	820	~670,000,000x
100	~6.3 × 10²⁹	5,050	~1.2 × 10²⁶x

At n=100, the naive approach would need more operations than there are atoms in a human body. The DP does it in 5,050 comparisons. This is not a theoretical curiosity — it is the difference between "computable" and "physically impossible."

Let us make this concrete. For the rod cutting problem:

Ingredient	Rod Cutting	How to verify
Optimal substructure	If the best rod-7 solution cuts at position 3, the remaining rod-4 must be cut optimally	Cut-and-paste argument: swap a suboptimal sub-solution with the optimal one to improve the whole
Overlapping subproblems	r(2) is needed when computing r(3), r(4), r(5), r(6), and r(7)	Draw the recursion tree and look for repeated nodes

Concept check: A rod of length n has 2^n-1 possible ways to cut it, and the naive recursion makes 2^n-1 calls. Is this a coincidence?

Yes — the recursion tree has nothing to do with the actual cut patterns No — each recursive call explores one specific cut pattern Not exactly a coincidence, but not a one-to-one correspondence either. The recurrence T(n) = 1 + T(0) + T(1) + ... + T(n-1) solves to T(n) = 2^n-1 for the same mathematical reason there are 2^n-1 subsets of n-1 cut positions: both are governed by the same exponential generating pattern. But individual recursive calls do NOT correspond to individual cut patterns — they correspond to subproblem evaluations

Chapter 1: Top-Down Memoization

The recursion tree in Chapter 0 showed the disease: redundant recomputation. Now here is the cure — and it is embarrassingly simple.

Keep the exact same recursive structure. Change nothing about the logic. Just add a table. Before computing r(k), check: "Have I already computed this?" If yes, return the stored answer immediately. If no, compute it, store it in the table, and return it. Done.

This technique is called memoization (not memorization — no 'r'). The word comes from "memo," as in writing yourself a note so you do not forget.

The CLRS Price Table

We will use this price table throughout the lesson. It comes directly from CLRS and is the standard reference example:

Length i	1	2	3	4	5	6	7	8	9	10
Price p_i	1	5	8	9	10	17	17	20	24	30

Notice something interesting: p₃=8 is less than 2·p₂=10. Two 2-inch pieces ($5+$5=$10) are worth more than one 3-inch piece ($8). This means the "do not cut" option is NOT always optimal. You need to consider all possibilities — which is exactly what the DP does.

Also notice that the prices are not proportional to length. The price-per-inch varies wildly:

Length	1	2	3	4	5	6	7	8	9	10
$/inch	1.00	2.50	2.67	2.25	2.00	2.83	2.43	2.50	2.67	3.00

If prices grew linearly with length (constant $/inch), the optimal solution would always be "do not cut." The non-linear pricing is what makes the problem interesting and what makes DP necessary. A greedy "always cut at the best $/inch" strategy does not work because it does not account for how the remaining pieces can be cut.

Worked Example: n=4, Every Single Step

We want r(4). The recurrence says try all first cuts i=1,2,3,4:

r(4) = max { p₁ + r(3), p₂ + r(2), p₃ + r(1), p₄ + r(0) }

We need r(3), r(2), r(1), r(0). With memoization, we compute each exactly once. Let us trace the calls in order. The recursive function is called with k=4. It calls r(3) first.

// Call stack: r(4) → r(3) → r(2) → r(1) → r(0)

// r(0): base case
r(0) = 0    memo[0] = 0 ✓

// r(1): try i=1
  i=1: p[1] + r(0) = 1 + 0 = 1
r(1) = 1    memo[1] = 1 ✓

// r(2): try i=1,2
  i=1: p[1] + r(1) = 1 + 1 = 2    (r(1) from memo!)
  i=2: p[2] + r(0) = 5 + 0 = 5    (r(0) from memo!)
r(2) = max(2, 5) = 5    memo[2] = 5 ✓

// r(3): try i=1,2,3
  i=1: p[1] + r(2) = 1 + 5 = 6    (r(2) from memo!)
  i=2: p[2] + r(1) = 5 + 1 = 6    (r(1) from memo!)
  i=3: p[3] + r(0) = 8 + 0 = 8
r(3) = max(6, 6, 8) = 8    memo[3] = 8 ✓

// Back to r(4): try i=1,2,3,4
  i=1: p[1] + r(3) = 1 + 8 = 9    (r(3) from memo!)
  i=2: p[2] + r(2) = 5 + 5 = 10    (r(2) from memo!)
  i=3: p[3] + r(1) = 8 + 1 = 9    (r(1) from memo!)
  i=4: p[4] + r(0) = 9 + 0 = 9
r(4) = max(9, 10, 9, 9) = 10    memo[4] = 10 ✓

Optimal revenue: $10. Achieved by cutting into two pieces of length 2 ($5 + $5 = $10). The whole rod (length 4) only sells for $9. Two halves beat the whole.

Count the work: we computed r(0), r(1), r(2), r(3), r(4) — five subproblems. For r(k), we tried k values of i. Total comparisons: 0+1+2+3+4 = 10. The naive recursion would have made 2³ = 8 calls, many redundant. For larger n, the savings are astronomical.

Cache hit = free. Without memoization, computing r(4) triggers 8 recursive calls. With memoization, each of r(0)..r(3) is computed exactly once, and subsequent requests are O(1) table lookups. Total work: O(n²). Each of the n subproblems requires at most n comparisons. The memo table costs O(n) space.

The simulation below shows this visually. On the left: the full naive recursion tree (every call executed). On the right: the memoized version where pruned (cached) calls are dimmed gray.

Naive vs Memoized Recursion

Left: naive (all calls executed). Right: memoized (gray nodes = cache hit, never recomputed). Watch the pruning increase as n grows.

Rod length n 4

Naive calls: — | Memo calls: — | Savings: —%

The Code

python
def memoized_rod_cut(p, n):
    """Top-down memoized rod cutting.
    p: list of prices, p[i] = price for length i+1 (0-indexed)
    n: rod length
    Returns: maximum revenue"""
    memo = [-1] * (n + 1)
    memo[0] = 0  # base case: zero-length rod = $0

    def helper(k):
        if memo[k] >= 0:
            return memo[k]  # cache hit — O(1) lookup
        best = 0
        for i in range(1, k + 1):
            best = max(best, p[i - 1] + helper(k - i))
        memo[k] = best  # store for future lookups
        return best

    return helper(n)

# CLRS price table
prices = [1, 5, 8, 9, 10, 17, 17, 20, 24, 30]
print(memoized_rod_cut(prices, 4))   # 10  (cut: 2+2)
print(memoized_rod_cut(prices, 7))   # 18  (cut: 1+6 or 2+2+3)
print(memoized_rod_cut(prices, 10))  # 30  (don't cut: p[10]=30)

Time complexity: O(n²). There are n+1 subproblems. Each subproblem r(k) iterates over k choices. Total: 0+1+2+...+n = n(n+1)/2 = O(n²).

Space complexity: O(n) for the memo table + O(n) for the recursion call stack (maximum depth is n when the recursive chain is r(n)→r(n-1)→...→r(0)).

The recursion depth problem. For very large n (say n=10000), the memoized solution will hit Python's default recursion limit (usually 1000). You can raise it with sys.setrecursionlimit, but each recursive call adds ~50 bytes of stack frame overhead. For n=100000, that is 5 MB of stack. Bottom-up tabulation (next chapter) avoids this entirely — zero recursion, zero stack overhead.

Python's Built-in Memoization: @lru_cache

Python has a built-in memoization decorator that makes top-down DP trivially easy:

python
from functools import lru_cache

prices = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]

@lru_cache(maxsize=None)
def rod_cut(n):
    if n == 0: return 0
    return max(prices[i] + rod_cut(n - i) for i in range(1, n + 1))

print(rod_cut(7))  # 18
print(rod_cut.cache_info())
# CacheInfo(hits=21, misses=8, maxsize=None, currsize=8)

@lru_cache(maxsize=None) creates an unlimited-size dictionary cache. The cache_info() method shows cache hit statistics — useful for verifying your DP is actually avoiding recomputation. For interview whiteboard coding, always write the manual memo array version. But for production code and competitive programming, @lru_cache is the standard Python idiom.

When to use @lru_cache vs manual memo. Use @lru_cache when: (1) the function arguments are hashable (ints, strings, tuples), (2) you do not need fine-grained control over the cache, (3) you are writing Python. Use a manual memo array when: (1) working in a language without built-in memoization, (2) you need to control memory precisely, (3) you need to convert to bottom-up later, (4) you are at a whiteboard interview where showing the manual approach demonstrates deeper understanding.

Memoization with Multiple State Variables

Some problems have more than one state variable. For example, the 0/1 knapsack has two: item index and remaining capacity. You can memoize with a 2D array or a tuple key:

python
@lru_cache(maxsize=None)
def knapsack_memo(i, w):
    """Top-down memoized 0/1 knapsack.
    i: consider items 0..i
    w: remaining capacity"""
    if i < 0 or w <= 0:
        return 0
    wi, vi = items[i]
    # Skip item i
    skip = knapsack_memo(i - 1, w)
    # Take item i (if it fits)
    take = vi + knapsack_memo(i - 1, w - wi) if wi <= w else 0
    return max(skip, take)

items = [(2,3), (3,4), (4,5), (5,6)]
print(knapsack_memo(len(items)-1, 8))  # 10

The cache key is the tuple (i, w). There are n·W unique keys, so the cache has at most n·W entries. This exactly matches the bottom-up table size.

Interview question: With the CLRS price table [1,5,8,9,10,17,17,20,24,30], what is the optimal revenue for a rod of length 5, and how should it be cut?

$10 — keep it whole (p[5]=10). But wait, can we do better? $13 — cut into 2+3 ($5+$8=13). Verified: r(5) = max{1+r(4), 5+r(3), 8+r(2), 9+r(1), 10+r(0)} = max{1+10, 5+8, 8+5, 9+1, 10+0} = max{11, 13, 13, 10, 10} = 13 $15 — cut into 1+2+2 ($1+$5+$5=11). No, that is only $11

Chapter 2: Bottom-Up Tabulation

Memoization starts at the top (the problem you want to solve) and works downward, caching along the way. Bottom-up tabulation flips the order: start from the smallest subproblems and build upward. By the time you need r(k), every subproblem r(0), r(1), ..., r(k-1) is already sitting in the table, ready to use.

No recursion. No function call overhead. No stack depth worries. Just a clean, simple for-loop filling a table from left to right.

Think of it this way. Memoization is like exploring a maze and leaving breadcrumbs so you never revisit a dead end. Bottom-up tabulation is like having a map of the entire maze before you start — you fill in the solution for every room, beginning with the ones closest to the exit.

The Algorithm

// Allocate tables
r[0..n] = revenue table    (r[j] = best revenue for rod of length j)
s[0..n] = cut table    (s[j] = first piece length in optimal solution for j)

r[0] = 0    base case

// Fill left to right
for j = 1 to n:
  best = -∞
  for i = 1 to j:    // try all first-cut lengths
    if p[i] + r[j - i] > best:
      best = p[i] + r[j - i]
      s[j] = i    // remember the best cut
  r[j] = best

Worked Example: n=7, Every Cell

Prices: p = [1, 5, 8, 9, 10, 17, 17, 20, 24, 30]. We build r[0..7] and s[1..7].

// r[0] = 0 (base case)

// j=1: try i=1
  i=1: p[1]+r[0] = 1+0 = 1
r[1] = 1, s[1] = 1    (sell as 1-inch piece)

// j=2: try i=1,2
  i=1: p[1]+r[1] = 1+1 = 2
  i=2: p[2]+r[0] = 5+0 = 5    ← best
r[2] = 5, s[2] = 2    (sell as one 2-inch piece)

// j=3: try i=1,2,3
  i=1: p[1]+r[2] = 1+5 = 6
  i=2: p[2]+r[1] = 5+1 = 6
  i=3: p[3]+r[0] = 8+0 = 8    ← best
r[3] = 8, s[3] = 3    (sell as one 3-inch piece)

// j=4: try i=1,2,3,4
  i=1: p[1]+r[3] = 1+8 = 9
  i=2: p[2]+r[2] = 5+5 = 10    ← best
  i=3: p[3]+r[1] = 8+1 = 9
  i=4: p[4]+r[0] = 9+0 = 9
r[4] = 10, s[4] = 2    (cut: 2+2)

// j=5: try i=1,2,3,4,5
  i=1: p[1]+r[4] = 1+10 = 11
  i=2: p[2]+r[3] = 5+8 = 13    ← best
  i=3: p[3]+r[2] = 8+5 = 13    (tie!)
  i=4: p[4]+r[1] = 9+1 = 10
  i=5: p[5]+r[0] = 10+0 = 10
r[5] = 13, s[5] = 2    (cut: 2+3)

// j=6: try i=1,2,3,4,5,6
  i=1: p[1]+r[5] = 1+13 = 14
  i=2: p[2]+r[4] = 5+10 = 15
  i=3: p[3]+r[3] = 8+8 = 16
  i=4: p[4]+r[2] = 9+5 = 14
  i=5: p[5]+r[1] = 10+1 = 11
  i=6: p[6]+r[0] = 17+0 = 17    ← best
r[6] = 17, s[6] = 6    (don't cut!)

// j=7: try i=1,2,3,4,5,6,7
  i=1: p[1]+r[6] = 1+17 = 18    ← best
  i=2: p[2]+r[5] = 5+13 = 18    (tie!)
  i=3: p[3]+r[4] = 8+10 = 18    (3-way tie!)
  i=4: p[4]+r[3] = 9+8 = 17
  i=5: p[5]+r[2] = 10+5 = 15
  i=6: p[6]+r[1] = 17+1 = 18    (4-way tie!)
  i=7: p[7]+r[0] = 17+0 = 17
r[7] = 18, s[7] = 1    (first: cut off 1-inch)

Solution reconstruction: r[7]=18, s[7]=1 means the first cut is at length 1. Remaining length: 7-1=6. s[6]=6 means do not cut the 6-inch piece. Solution: 1+6 = $1+$17 = $18.

But the 4-way tie at j=7 means there are multiple optimal solutions: 1+6, 2+2+3, 3+2+2, 6+1. All yield $18. The s-table just records the first one found.

Bottom-Up Table Builder

Watch the DP table fill left to right. Each cell shows its value being computed from previously filled cells. Green highlight = currently computing. Click Step to advance one comparison, or Play for continuous animation.

Press Step or Play to begin.

The Code

python
def bottom_up_rod_cut(p, n):
    """Bottom-up rod cutting with solution reconstruction.
    p: list of prices (0-indexed: p[0] = price for length 1)
    n: rod length
    Returns: (max_revenue, list_of_cut_lengths)"""
    r = [0] * (n + 1)  # r[j] = optimal revenue for rod of length j
    s = [0] * (n + 1)  # s[j] = first piece length in optimal solution for j

    for j in range(1, n + 1):
        best = -1
        for i in range(1, j + 1):
            if p[i - 1] + r[j - i] > best:
                best = p[i - 1] + r[j - i]
                s[j] = i
        r[j] = best

    # Reconstruct the actual cuts
    cuts = []
    remaining = n
    while remaining > 0:
        cuts.append(s[remaining])
        remaining -= s[remaining]

    return r[n], cuts

prices = [1, 5, 8, 9, 10, 17, 17, 20, 24, 30]
revenue, cuts = bottom_up_rod_cut(prices, 7)
print(f"Revenue: ${revenue}, Cuts: {cuts}")
# Revenue: $18, Cuts: [1, 6]

revenue, cuts = bottom_up_rod_cut(prices, 10)
print(f"Revenue: ${revenue}, Cuts: {cuts}")
# Revenue: $30, Cuts: [10]  (don't cut!)

Top-Down vs Bottom-Up Summary

Same asymptotic complexity, different constants. Both are O(n²) time and O(n) space for the table. Bottom-up avoids function-call overhead (~50 bytes per stack frame in Python) and never risks a stack overflow. Top-down only solves subproblems actually reachable from the original call — useful when the subproblem space is large but sparsely visited. For interviews, default to bottom-up unless the subproblem space is hard to enumerate in advance.

Solution Reconstruction: Following the Bread Crumbs

The s-table (the cut table) is crucial. Without it, you know the OPTIMAL VALUE but not the OPTIMAL SOLUTION. Here is how reconstruction works for rod cutting:

python
# After bottom-up computation, s[j] = length of first piece in optimal solution for rod of length j

def reconstruct_cuts(s, n):
    """Follow the s-table to recover all cuts."""
    cuts = []
    remaining = n
    while remaining > 0:
        first_piece = s[remaining]
        cuts.append(first_piece)
        remaining -= first_piece
        # Now solve the remaining subproblem
    return cuts

# For our example with n=7:
# s = [0, 1, 2, 3, 2, 2, 6, 1]
# reconstruct_cuts(s, 7):
#   remaining=7, s[7]=1, cuts=[1], remaining=6
#   remaining=6, s[6]=6, cuts=[1,6], remaining=0
#   → [1, 6] → revenue = $1 + $17 = $18 ✓

This pattern — maintain a "choice" table during the forward pass, trace it backwards during reconstruction — applies to EVERY DP problem. For knapsack: record which items were taken. For LCS: record which characters matched. For edit distance: record which operation was chosen. The forward pass computes values. The backward pass recovers decisions.

A Common Optimization: Reducing Space

For rod cutting, the full table is already O(n) — there is nothing to optimize. But for 2D problems like knapsack or LCS, the table is O(m×n) and we often only need the previous row. The "rolling row" trick keeps two rows instead of m+1, reducing space dramatically.

The trade-off: with only two rows, you lose the ability to trace back (you no longer have the full table). If you need BOTH the optimal value AND the optimal solution in reduced space, you need Hirschberg's technique — run the DP forward from the start and backward from the end, use the meeting point to divide the problem in half, and recurse. This gives O(min(m,n)) space with O(m·n) time for full reconstruction.

The General Rule for Fill Order

How do you determine the correct fill order for a new DP problem? The rule is simple: every value you READ must have been WRITTEN already.

Problem type	Dependencies	Fill order
1D (rod cutting)	dp[j] depends on dp[0..j-1]	Left to right (j=1,2,...,n)
2D sequence (LCS)	dp[i][j] depends on dp[i-1][j-1], dp[i-1][j], dp[i][j-1]	Row by row, left to right
2D knapsack	dp[i][w] depends on dp[i-1][w] and dp[i-1][w-wi]	Row by row (any direction within row)
Interval (matrix chain)	dp[i][j] depends on dp[i][k] and dp[k+1][j] for i≤k<j	Diagonally (shortest intervals first)
1D backward (sell stock)	dp[i] depends on dp[i+1]	Right to left
Tree DP	Parent depends on children	Post-order (leaves first, root last)

If you are unsure, draw the dependency arrows for a small example. The fill order is any topological order of those arrows. For most problems, the natural left-to-right / top-to-bottom / small-to-large order works.

Debug scenario: A candidate writes a bottom-up rod cutting solution but fills the table from r[n] down to r[1] instead of from r[1] up to r[n]. Their code runs without errors but produces wrong answers. Why?

Nothing goes wrong — the order does not matter as long as all values are eventually filled Stack overflow because the table is too large When computing r[n], the code reads r[n-1], r[n-2], etc. — but those cells have not been filled yet. They still contain their initial value of 0. So every r[j] computes as just max(p[i]) — the best single-piece price — ignoring the value of the remaining piece entirely. The code runs fine (no crash) but produces subtly wrong answers that look plausible

Chapter 3: The DP Recipe

You have seen rod cutting solved two ways. Now let us extract the general method. Every dynamic programming solution follows the same four steps. Master this recipe and you can attack any DP problem — even ones you have never seen.

The Four Steps (CLRS Section 15.3)

1. Characterize Optimal Substructure

Prove that an optimal solution to the problem contains optimal solutions to subproblems. The proof technique is always the same: assume the whole solution is optimal but some sub-solution is not. Replace that sub-solution with the optimal one. The whole solution improves — contradicting the assumption. Therefore the sub-solution MUST be optimal. This is called the cut-and-paste argument.

↓

2. Define the Recurrence

Write a recursive formula for the value of an optimal solution. The recurrence expresses a larger subproblem in terms of smaller ones. This is the creative step — the formula IS the algorithm. Everything else is bookkeeping. Example: r(n) = max{p[i] + r(n-i)}. The recurrence must have a base case (r(0)=0) and a recursive case that strictly reduces the problem size.

↓

3. Compute the Values

Either top-down (memoized recursion) or bottom-up (iterative table). Bottom-up requires determining a valid fill order — solve smaller subproblems before larger ones. Top-down handles order automatically at the cost of recursion overhead. Choose based on the subproblem space: dense (most needed) → bottom-up. Sparse (few needed) → top-down.

↓

4. Reconstruct the Solution

The DP table gives you the optimal VALUE. To recover the optimal CHOICES (which cuts, which items, which characters), maintain a secondary table that records which choice led to the optimal value at each step. Trace back through this table. This is often called "backtracking" or "traceback" (not to be confused with the backtracking search algorithm).

When Is It DP? The Decision Framework

Not every optimization problem is a DP problem. Two conditions must both hold. The simulation below helps you classify any new problem.

Condition	What it means	How to verify	Counterexample
Optimal substructure	Optimal solution contains optimal sub-solutions	Cut-and-paste proof by contradiction	Longest simple path (not optimal substructure because of vertex-reuse constraint)
Overlapping subproblems	Same subproblems recur in the recursion tree	Draw the recursion tree — count repeated nodes	Merge sort (subproblems always fresh — no overlaps)

A subtle trap: longest simple path does NOT have optimal substructure. The longest simple path from A to C through B does NOT necessarily contain the longest simple path from A to B. Why? Because the "simple" constraint (no repeated vertices) means the subpaths interact — using a vertex in the A-to-B path removes it from the B-to-C path. This dependency between subproblems breaks optimal substructure. Interviewers love this example because it looks like it should be DP but is actually NP-hard.

Concrete Example: When Substructure Breaks

Consider this graph with 4 vertices and edges: A-B, B-C, C-D, A-C, B-D.

// Longest simple path from A to D:
A → B → C → D (length 3)    ← optimal
A → C → B → D (length 3)    ← also optimal

// Now consider the subpath A to C in the first path:
A → B → C (length 2)
// Is this the longest simple path from A to C? Let's check:
A → C (length 1)
A → B → C (length 2)    ← longest, matches our subpath. OK so far.

// But in the SECOND optimal path:
A → C → B → D
// Subpath A to B: A → C → B (length 2)
// But the longest simple path A to B could be A → C → D → ... wait, no D→B exists
// Actually: A → C → B (length 2) IS the longest A→B path
// But if we USE that path for A→B, we've consumed C, and the remaining
// B→D subpath can't go through C anymore.

The problem is that subproblems are NOT independent. The "simple" constraint couples them: using a vertex in one subpath removes it from others. In DP, subproblems must be independent — the solution to one cannot affect the feasibility of another. Shortest paths DO have this property (subpaths of shortest paths are shortest), but longest simple paths do not.

Proving Optimal Substructure: A Template

For interviews, you should be able to prove optimal substructure for any DP problem. The template is always the same. Let us demonstrate with LCS:

// Theorem: LCS has optimal substructure.

// Proof (by contradiction, "cut-and-paste" argument):
// Let Z = z₁z₂...zₖ be an LCS of X = x₁...xₘ and Y = y₁...yₙ.

// Case: xₘ = yₙ.
// Claim: zₖ = xₘ = yₙ and Z[1..k-1] is an LCS of X[1..m-1] and Y[1..n-1].
//
// Proof of claim: If zₖ ≠ xₘ, we could append xₘ to Z, getting a common
// subsequence of length k+1, contradicting Z being the LONGEST. So zₖ = xₘ.
//
// Now suppose Z[1..k-1] is NOT an LCS of X[1..m-1] and Y[1..n-1].
// Then there exists a common subsequence W of X[1..m-1] and Y[1..n-1]
// with |W| > k-1. Appending xₘ to W gives a common subsequence of X and Y
// with length > k, contradicting Z being an LCS. ∎

The structure is always: (1) assume the whole solution is optimal, (2) suppose a sub-solution is NOT optimal, (3) replace it with the optimal sub-solution, (4) show this improves the whole solution, (5) contradiction. This works for rod cutting (replace non-optimal remainder), knapsack (replace non-optimal sub-selection), edit distance (replace non-optimal sub-alignment), and every other DP problem.

Is This a DP Problem?

A decision flowchart for classifying optimization problems.

DP vs Divide-and-Conquer vs Greedy

Paradigm	Optimal substructure?	Overlapping subproblems?	How it works	Examples
Dynamic Programming	Yes	Yes	Solve each subproblem once, cache results	Rod cutting, LCS, knapsack
Greedy	Yes	Irrelevant	Make the locally best choice, never revisit	Activity selection, Huffman, Dijkstra
Divide-and-Conquer	Yes	No	Split, recurse on independent subproblems, combine	Merge sort, Strassen's, closest pair

Think of it as a hierarchy. Greedy is the simplest: one choice per step, no table. Divide-and-conquer adds recursion on multiple subproblems. DP adds caching to handle overlap. If greedy works, use it (faster, simpler). If subproblems are independent, use divide-and-conquer. If subproblems overlap, you need DP.

Top-Down vs Bottom-Up: The Detailed Trade-offs

Property	Top-Down (Memoization)	Bottom-Up (Tabulation)
Order of computation	Recursive — solves subproblems on demand	Iterative — you choose the fill order
Subproblems solved	Only those reachable from the original problem	ALL subproblems in the table
Overhead	Function calls, stack frames	None beyond the loop
Space optimization	Harder (random access pattern)	Easier (known access pattern allows rolling rows)
Debugging	Harder (call stack is implicit)	Easier (print the table, verify each cell)
Implementation	Add a cache to the recursive solution	Rewrite as nested loops
When to prefer	Sparse subproblem spaces, tree-shaped dependencies	Dense spaces, when you need space optimization

Design question: You are computing the minimum cost to triangulate a convex polygon with n vertices. Each triangle has a cost based on its three vertex positions. The problem has optimal substructure (optimal triangulation of the whole polygon contains optimal triangulations of sub-polygons defined by diagonals) and overlapping subproblems (sub-polygons recur). What is the time complexity of a DP solution?

O(n²) — n subproblems, each needs O(n) work O(n³) — there are O(n²) subproblems (one for each pair of boundary vertices defining a sub-polygon), and each subproblem tries O(n) possible third vertices to form a triangle. This is structurally identical to matrix-chain multiplication O(2ⁿ) — exponential because polygon triangulation is NP-hard

Chapter 4: Matrix-Chain Multiplication

Multiplying two matrices A (p×q) and B (q×r) costs p·q·r scalar multiplications. The result is a p×r matrix. If you have a chain of matrices A₁·A₂·A₃·...·A_n, the order you parenthesize them does not change the result (matrix multiplication is associative) — but it can change the cost by orders of magnitude.

A Dramatic Example

Three matrices: A₁ is 10×30, A₂ is 30×5, A₃ is 5×60.

// Option 1: ((A1 · A2) · A3)
A1·A2 produces 10×5, costs 10×30×5 = 1,500
(A1·A2)·A3 produces 10×60, costs 10×5×60 = 3,000
Total = 1,500 + 3,000 = 4,500 multiplications

// Option 2: (A1 · (A2 · A3))
A2·A3 produces 30×60, costs 30×5×60 = 9,000
A1·(A2·A3) produces 10×60, costs 10×30×60 = 18,000
Total = 9,000 + 18,000 = 27,000 multiplications

Same final matrix, but option 1 is 6× cheaper. The difference grows with more matrices. For a chain of n matrices, the number of possible parenthesizations is the (n-1)th Catalan number, which grows as Ω(4ⁿ/n^3/2). Brute-forcing all parenthesizations is not feasible.

Step 1: Optimal Substructure

Suppose the optimal parenthesization of A₁..A_n splits at position k: (A₁..A_k)·(A_k+1..A_n). Then the parenthesization of A₁..A_k within this solution MUST be optimal for A₁..A_k alone. Why? Cut-and-paste: if it were not optimal, replace it with the optimal one and the whole solution improves — contradiction.

Step 2: The Recurrence

Store matrix dimensions in an array p[0..n] where matrix A_i has dimensions p[i-1]×p[i]. Define m[i,j] = minimum number of scalar multiplications to compute A_i·A_i+1·...·A_j.

m[i,j] = 0 if i = j
m[i,j] = min_{i ≤ k < j} { m[i,k] + m[k+1,j] + p[i-1] · p[k] · p[j] } if i < j

For each possible split point k, the cost is: (cost of left half) + (cost of right half) + (cost of multiplying the two results). We minimize over all choices of k.

Step 3: Bottom-Up Computation (Diagonal Fill)

Unlike rod cutting (which fills a 1D array left-to-right), matrix chain fills a 2D table diagonally. We compute chains of increasing length: length 1 (base cases on the main diagonal), then length 2, then length 3, and so on. Each entry m[i,j] depends on entries m[i,k] and m[k+1,j] for k in [i,j) — all of which are shorter chains, hence already computed.

Worked Example: 4 Matrices

A₁(30×35), A₂(35×15), A₃(15×5), A₄(5×10). Dimensions array: p = [30, 35, 15, 5, 10].

// Diagonal 0 (chain length 1, base cases)
m[1,1] = m[2,2] = m[3,3] = m[4,4] = 0

// Diagonal 1 (chain length 2)
m[1,2]: k=1 → m[1,1]+m[2,2]+30×35×15 = 0+0+15750 = 15750
m[2,3]: k=2 → m[2,2]+m[3,3]+35×15×5 = 0+0+2625 = 2625
m[3,4]: k=3 → m[3,3]+m[4,4]+15×5×10 = 0+0+750 = 750

// Diagonal 2 (chain length 3)
m[1,3]:
  k=1: m[1,1]+m[2,3]+30×35×5 = 0+2625+5250 = 7875    ← best
  k=2: m[1,2]+m[3,3]+30×15×5 = 15750+0+2250 = 18000
m[1,3] = 7875, split k=1

m[2,4]:
  k=2: m[2,2]+m[3,4]+35×15×10 = 0+750+5250 = 6000
  k=3: m[2,3]+m[4,4]+35×5×10 = 2625+0+1750 = 4375    ← best
m[2,4] = 4375, split k=3

// Diagonal 3 (chain length 4, the answer)
m[1,4]:
  k=1: m[1,1]+m[2,4]+30×35×10 = 0+4375+10500 = 14875
  k=2: m[1,2]+m[3,4]+30×15×10 = 15750+750+4500 = 21000
  k=3: m[1,3]+m[4,4]+30×5×10 = 7875+0+1500 = 9375    ← best
m[1,4] = 9375, split k=3

Optimal parenthesization: split at k=3 means ((A₁..A₃) · A₄). For A₁..A₃, split at k=1 means (A₁ · (A₂·A₃)). Full: ((A₁·(A₂·A₃))·A₄), cost 9375.

Why diagonal fill? Entry m[i,j] represents a chain of length j-i+1. It depends on m[i,k] and m[k+1,j], which are shorter chains. By filling diagonals in order of increasing chain length, every dependency is satisfied before it is needed. This is the key insight for interval DP — any problem where the subproblem is a contiguous range [i,j].

Matrix Chain DP Table

Enter matrix dimensions separated by commas (e.g., "30,35,15,5,10" for 4 matrices). Watch the table fill diagonal by diagonal. Green = currently computing. Teal = completed.

Dimensions p:

Press Step or Play to begin.

python
def matrix_chain_order(p):
    """Find optimal parenthesization for matrix chain.
    p: list of dimensions [p0, p1, ..., pn] where Ai is p[i-1] x p[i]
    Returns: (m, s) — cost table and split table"""
    n = len(p) - 1  # number of matrices
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]

    for L in range(2, n + 1):  # L = chain length
        for i in range(1, n - L + 2):
            j = i + L - 1
            m[i][j] = float('inf')
            for k in range(i, j):  # try all split points
                cost = m[i][k] + m[k+1][j] + p[i-1]*p[k]*p[j]
                if cost < m[i][j]:
                    m[i][j] = cost
                    s[i][j] = k
    return m, s

def print_parens(s, i, j):
    """Reconstruct optimal parenthesization."""
    if i == j: return f"A{i}"
    k = s[i][j]
    left = print_parens(s, i, k)
    right = print_parens(s, k + 1, j)
    return f"({left} x {right})"

p = [30, 35, 15, 5, 10]
m, s = matrix_chain_order(p)
print(f"Min cost: {m[1][4]}")    # 9375
print(print_parens(s, 1, 4))  # ((A1 x (A2 x A3)) x A4)

Time complexity: O(n³). Three nested loops — chain length L (n iterations), start index i (up to n), split point k (up to n). Space: O(n²) for the two n×n tables.

Understanding the Three Nested Loops

The triple loop in matrix chain is the most confusing part for beginners. Let us break it down with concrete numbers for n=4 matrices:

// L = chain length. Goes from 2 (pair of matrices) to n (full chain).
// i = starting index of the subchain. Goes from 1 to n-L+1.
// k = split point. Goes from i to j-1.

// L=2: pairs (i,j) = (1,2),(2,3),(3,4) — 3 entries, 1 split each
// L=3: triples (i,j) = (1,3),(2,4) — 2 entries, 2 splits each
// L=4: full chain (i,j) = (1,4) — 1 entry, 3 splits
// Total work: 3×1 + 2×2 + 1×3 = 10 = O(n^3/6) for n=4

The key insight: L controls which diagonal of the table we fill. Within each diagonal, i controls which cell. Within each cell, k controls which split we try. Every split references entries from shorter diagonals (already filled). This is why the order works.

Common Interview Variant: Minimum Cost to Merge Stones

Matrix chain has many disguises in interviews. One common one: "Given n piles of stones with sizes a[1]..a[n], merge adjacent piles. The cost of merging two adjacent piles is their combined size. Find the minimum total cost to merge all piles into one."

This is EXACTLY matrix chain with a different cost function. The subproblem m[i,j] = minimum cost to merge piles i through j. The split point k divides into (merge i..k) + (merge k+1..j) + (cost of combining the two result piles = sum of a[i] through a[j]). Same O(n³) DP.

Recognizing interval DP. Any time a problem asks you to "optimally split a sequence" or "optimally parenthesize" or "optimally merge/partition contiguous elements," you are looking at interval DP. The subproblem is always [i,j] = contiguous range. The recurrence always tries all split points k in [i,j). The fill order is always diagonal (shortest ranges first). Once you see this pattern, the implementation is mechanical.

Concept check: In the matrix chain problem, why can't we use a greedy approach — always splitting at the position k that minimizes p[i-1]·p[k]·p[j] (the cost of the final multiplication)?

Because the locally cheapest final multiplication can force expensive computations within the two subchains. The cost p[i-1]·p[k]·p[j] only accounts for the LAST multiplication step, not the total work within each half. A greedy split that minimizes the final step might create subchains with terrible internal costs, making the total far from optimal Because matrix multiplication is not commutative Because greedy algorithms only work on sorting problems

Chapter 5: Longest Common Subsequence

Given two strings, find the longest subsequence common to both. A subsequence is formed by deleting zero or more characters from a string WITHOUT changing the order of the remaining characters. It is NOT the same as a substring (which must be contiguous characters).

Example: X = "ABCBDAB", Y = "BDCAB". One common subsequence is "BCAB" (length 4). Another is "BDA" (length 3). The longest common subsequence (LCS) has length 4. Finding it by brute force means checking all 2^m subsequences of X against all 2ⁿ subsequences of Y — exponential in both dimensions.

Why LCS Matters

LCS is everywhere in real software:

Application	What are X and Y?	What does LCS tell you?
diff/patch	Lines of two file versions	Unchanged lines (the LCS); changed lines are everything else
Git merge	Two branch versions of a file	Common base for three-way merge
DNA alignment	Two gene sequences (A, C, G, T)	Conserved regions across species
Spell checking	Mistyped word vs dictionary word	Similarity measure (related to edit distance)

Step 1: Characterize Optimal Substructure

Let X = x₁x₂...x_m and Y = y₁y₂...y_n. Let Z = z₁z₂...z_k be any LCS of X and Y. Three cases:

// Case 1: last characters match
If x_m = y_n, then z_k = x_m = y_n and Z[1..k-1] is an LCS of X[1..m-1] and Y[1..n-1]

// Case 2a: last characters differ, z_k ≠ x_m
If x_m ≠ y_n and z_k ≠ x_m, then Z is an LCS of X[1..m-1] and Y[1..n]

// Case 2b: last characters differ, z_k ≠ y_n
If x_m ≠ y_n and z_k ≠ y_n, then Z is an LCS of X[1..m] and Y[1..n-1]

In every case, the LCS of the full strings contains the LCS of shorter strings. Optimal substructure holds.

Step 2: The Recurrence

Define c[i,j] = length of the LCS of X[1..i] and Y[1..j].

c[i,j] = 0      if i = 0 or j = 0    (empty prefix)
c[i,j] = c[i-1,j-1] + 1      if x_i = y_j    (match! extend LCS)
c[i,j] = max(c[i-1,j], c[i,j-1])      if x_i ≠ y_j    (no match, take the better of skipping x_i or y_j)

Three arrows, two cases. When you fill the LCS table, each cell gets its value from one of three neighbors: (1) diagonal-up-left (match — both characters contribute), (2) up (skip x_i), or (3) left (skip y_j). Recording which arrow you followed is what makes traceback possible.

Worked Example: "ABCBDAB" vs "BDCAB"

Let us fill the entire 8×6 table:

	B	D	C	A	B
ε	0	0	0	0	0
A	0	0	0	1	1
B	1	1	1	1	2
C	1	1	2	2	2
B	1	1	2	2	3
D	1	2	2	2	3
A	1	2	2	3	3
B	1	2	2	3	4

LCS length = c[7,5] = 4. Traceback from c[7,5]: x₇=B=y₅ (match, go diagonal). c[6,4]: x₆=A=y₄ (match, go diagonal). c[5,3]: x₅=D≠C, c[4,3]=2≥c[5,2]=1, go up. c[4,3]: x₄=B≠C, c[3,3]=2≥c[4,2]=1, go up. c[3,3]: x₃=C=y₃ (match, go diagonal). c[2,2]: x₂=B≠D, c[1,2]=0<c[2,1]=1, go left. c[2,1]: x₂=B=y₁ (match, go diagonal). c[1,0]=0, stop.

Matches collected (in reverse): B, A, C, B → LCS = "BCAB".

Interactive LCS Table

Type two strings and watch the DP table fill. Orange cells = match (diagonal arrow). The traceback path reconstructs the LCS.

X: Y:

LCS: —

python
def lcs(X, Y):
    """Compute LCS length and string with traceback."""
    m, n = len(X), len(Y)
    c = [[0] * (n + 1) for _ in range(m + 1)]

    # Fill table
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i-1] == Y[j-1]:
                c[i][j] = c[i-1][j-1] + 1  # match
            else:
                c[i][j] = max(c[i-1][j], c[i][j-1])

    # Traceback to reconstruct the LCS
    result = []
    i, j = m, n
    while i > 0 and j > 0:
        if X[i-1] == Y[j-1]:
            result.append(X[i-1])
            i -= 1; j -= 1  # go diagonal
        elif c[i-1][j] >= c[i][j-1]:
            i -= 1  # go up
        else:
            j -= 1  # go left

    return c[m][n], ''.join(reversed(result))

length, subseq = lcs("ABCBDAB", "BDCAB")
print(f"LCS length: {length}, LCS: {subseq}")
# LCS length: 4, LCS: BCAB

Time: O(m·n). Space: O(m·n) for the full table. Can be reduced to O(min(m,n)) if you only need the LENGTH (keep only two rows), but then you lose the ability to reconstruct the actual LCS. For traceback in O(min(m,n)) space, use Hirschberg's algorithm — a divide-and-conquer trick that runs the LCS forward and backward to find the midpoint of the LCS in linear space.

How diff Uses LCS

The Unix diff utility compares two files line by line. It treats each LINE as a single "character" and computes the LCS of the two line sequences. Lines in the LCS are "unchanged." Lines not in the LCS of the old file are "deleted." Lines not in the LCS of the new file are "inserted."

// Old file (3 lines):
alpha
beta
gamma

// New file (3 lines):
alpha
delta
gamma

// LCS of lines: ["alpha", "gamma"] (length 2)
// diff output:
  alpha      (in LCS — unchanged)
- beta       (in old only — deleted)
+ delta      (in new only — inserted)
  gamma      (in LCS — unchanged)

Real diff implementations use the Myers diff algorithm, which finds the shortest edit script in O((m+n)·D) time where D is the edit distance. For similar files (small D), this is much faster than O(m·n). But the core idea is still LCS.

Space-Optimized LCS (Length Only)

python
def lcs_length_only(X, Y):
    """O(min(m,n)) space LCS — returns length only, no traceback."""
    if len(X) < len(Y):
        X, Y = Y, X  # iterate over shorter string for space savings
    n = len(Y)
    prev = [0] * (n + 1)
    curr = [0] * (n + 1)

    for i in range(1, len(X) + 1):
        for j in range(1, n + 1):
            if X[i-1] == Y[j-1]:
                curr[j] = prev[j-1] + 1
            else:
                curr[j] = max(prev[j], curr[j-1])
        prev, curr = curr, [0] * (n + 1)

    return prev[n]

print(lcs_length_only("ABCBDAB", "BDCAB"))  # 4

Finding ALL LCS (Multiple Optimal Solutions)

The LCS is not always unique. "ABCBDAB" and "BDCAB" have multiple LCSs of length 4: "BCAB" and "BDAB." When ties occur in the max(c[i-1][j], c[i][j-1]) step, both branches lead to valid LCSs. To enumerate all of them:

python
def all_lcs(X, Y):
    """Find ALL longest common subsequences."""
    m, n = len(X), len(Y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i-1] == Y[j-1]:
                c[i][j] = c[i-1][j-1] + 1
            else:
                c[i][j] = max(c[i-1][j], c[i][j-1])

    def backtrack(i, j):
        if i == 0 or j == 0:
            return {""}
        if X[i-1] == Y[j-1]:
            return {s + X[i-1] for s in backtrack(i-1, j-1)}
        results = set()
        if c[i-1][j] >= c[i][j-1]:
            results |= backtrack(i-1, j)
        if c[i][j-1] >= c[i-1][j]:
            results |= backtrack(i, j-1)
        return results

    return backtrack(m, n)

print(all_lcs("ABCBDAB", "BDCAB"))
# {'BCAB', 'BDAB'}

The number of distinct LCSs can be exponential in the worst case. This backtracking is only practical for small inputs or when you know the number of solutions is manageable.

LCS Variants in Interviews

Variant	Modification	Complexity
Longest Common Substring	Characters must be CONTIGUOUS. Reset to 0 on mismatch instead of taking max(up, left)	O(m·n) but can use suffix arrays for O(m+n)
Shortest Common Supersequence	Shortest string containing both X and Y as subsequences. Length = m + n - LCS_length	O(m·n)
Number of distinct subsequences	Count how many ways to form Y as a subsequence of X. dp[i][j] = dp[i-1][j] + (dp[i-1][j-1] if match)	O(m·n)
Edit distance	As seen in Chapter 7. Distance = m + n - 2·LCS_length	O(m·n)

Interview question: What is the LCS of "AGGTAB" and "GXTXAYB"?

"GGTAB" (length 5) — G,G,T,A,B match... but wait, the second G in "AGGTAB" is at position 3, and "GXTXAYB" only has one G at position 1. After matching G at (3,1), T must match T at position 4 in Y which is valid, then A at (5,5) and B at (6,7). That is GTAB length 4, not 5 "GTA" (length 3) — only three characters match "GTAB" (length 4). One valid alignment: G from (2,1), T from (4,4), A from (5,5), B from (6,7). The full table confirms c[6,7]=4

Chapter 6: The DP Showcase

Time to put everything together. This is the big interactive simulation — a multi-problem DP solver that lets you watch three classic algorithms work step by step, with full control over speed and parameters.

Dynamic Programming Visualizer

Select a problem, configure parameters, and watch the DP table fill cell by cell. Step advances one cell. Play animates continuously. Speed slider controls animation rate.

Speed

Select a problem and press Step or Play.

0/1 Knapsack — The King of DP Problems

Given n items, each with weight w_i and value v_i, and a knapsack with capacity W. Maximize total value without exceeding W. Each item is included whole or not at all — no fractions (hence "0/1").

Subproblem: dp[i][w] = maximum value achievable using items 1..i with knapsack capacity w.

dp[i][w] = max( dp[i-1][w], v_i + dp[i-1][w - w_i] ) if w_i ≤ w
dp[i][w] = dp[i-1][w] if w_i > w (item too heavy)

In words: for each item, choose the better of (1) skipping it (keep dp[i-1][w] — best value without this item at this capacity) or (2) taking it (its value v_i plus the best value for the remaining capacity w-w_i without this item, which is dp[i-1][w-w_i]).

Time: O(n·W). Space: O(n·W), reducible to O(W) with a single-row optimization. To do the single-row trick: iterate w from W DOWN to w_i (backwards!) so you do not accidentally use the current row's values (which would allow taking the same item twice).

Knapsack Worked Example

Items: (weight=2, value=3), (weight=3, value=4), (weight=4, value=5), (weight=5, value=6). Knapsack capacity W=8.

// Row 0 (no items): all zeros
dp[0][0..8] = [0, 0, 0, 0, 0, 0, 0, 0, 0]

// Row 1 (item 1: w=2, v=3)
dp[1][0] = 0    capacity 0, can't fit anything
dp[1][1] = 0    capacity 1 < weight 2, skip
dp[1][2] = max(dp[0][2], 3+dp[0][0]) = max(0, 3) = 3    take item 1
dp[1][3] = max(dp[0][3], 3+dp[0][1]) = max(0, 3) = 3
dp[1][4] = max(dp[0][4], 3+dp[0][2]) = max(0, 3) = 3
dp[1][5..8] = 3, 3, 3, 3    only 1 copy of item 1 available

// Row 2 (item 2: w=3, v=4)
dp[2][0..2] = 0, 0, 3    can't fit item 2
dp[2][3] = max(dp[1][3], 4+dp[1][0]) = max(3, 4) = 4    take item 2
dp[2][4] = max(dp[1][4], 4+dp[1][1]) = max(3, 4) = 4
dp[2][5] = max(dp[1][5], 4+dp[1][2]) = max(3, 7) = 7    take both!
dp[2][6] = max(dp[1][6], 4+dp[1][3]) = max(3, 7) = 7
dp[2][7] = max(dp[1][7], 4+dp[1][4]) = max(3, 7) = 7
dp[2][8] = max(dp[1][8], 4+dp[1][5]) = max(3, 7) = 7

// Row 3 (item 3: w=4, v=5)
dp[3][0..3] = 0, 0, 3, 4
dp[3][4] = max(dp[2][4], 5+dp[2][0]) = max(4, 5) = 5
dp[3][5] = max(dp[2][5], 5+dp[2][1]) = max(7, 5) = 7    keep items 1+2
dp[3][6] = max(dp[2][6], 5+dp[2][2]) = max(7, 8) = 8    items 1+3!
dp[3][7] = max(dp[2][7], 5+dp[2][3]) = max(7, 9) = 9    items 2+3!
dp[3][8] = max(dp[2][8], 5+dp[2][4]) = max(7, 9) = 9

// Row 4 (item 4: w=5, v=6)
dp[4][0..4] = 0, 0, 3, 4, 5
dp[4][5] = max(dp[3][5], 6+dp[3][0]) = max(7, 6) = 7    items 1+2 still best
dp[4][6] = max(dp[3][6], 6+dp[3][1]) = max(8, 6) = 8
dp[4][7] = max(dp[3][7], 6+dp[3][2]) = max(9, 9) = 9    tie: 2+3 or 1+4
dp[4][8] = max(dp[3][8], 6+dp[3][3]) = max(9, 10) = 10    items 2+4: $4+$6=$10!

Answer: maximum value = $10. Achieved by taking items 2 (w=3,v=4) and 4 (w=5,v=6), total weight=8=W exactly.

python
def knapsack_01(items, W):
    """0/1 Knapsack with solution reconstruction.
    items: list of (weight, value) tuples
    W: knapsack capacity
    Returns: (max_value, selected_items)"""
    n = len(items)
    dp = [[0] * (W + 1) for _ in range(n + 1)]

    for i in range(1, n + 1):
        wi, vi = items[i - 1]
        for w in range(W + 1):
            if wi <= w:
                dp[i][w] = max(dp[i-1][w], vi + dp[i-1][w-wi])
            else:
                dp[i][w] = dp[i-1][w]

    # Reconstruct: trace back to find which items were taken
    selected = []
    w = W
    for i in range(n, 0, -1):
        if dp[i][w] != dp[i-1][w]:  # item i was taken
            selected.append(i)
            w -= items[i-1][0]

    return dp[n][W], selected[::-1]

items = [(2,3), (3,4), (4,5), (5,6)]
val, sel = knapsack_01(items, 8)
print(f"Max value: {val}, Items: {sel}")
# Max value: 10, Items: [2, 4]

Pseudo-polynomial, not polynomial. O(n·W) looks polynomial, but W is a NUMBER whose representation requires only log(W) bits. The complexity is exponential in the input SIZE (number of bits), even though it is polynomial in the input VALUE. This makes 0/1 knapsack NP-hard despite having a "polynomial-time" DP. When W is huge (billions), the table does not fit in memory. You need FPTAS approximation instead.

The Space Optimization: Rolling Row

Since dp[i] only depends on dp[i-1], we can keep just two rows (or even one row with a backwards loop):

python
def knapsack_01_optimized(items, W):
    """O(W) space knapsack — single row, backwards iteration."""
    dp = [0] * (W + 1)
    for wi, vi in items:
        # CRITICAL: iterate backwards to avoid using current-row values
        for w in range(W, wi - 1, -1):
            dp[w] = max(dp[w], vi + dp[w - wi])
    return dp[W]

print(knapsack_01_optimized([(2,3), (3,4), (4,5), (5,6)], 8))
# 10

Why backwards? If you iterate w from 0 to W, then when computing dp[w], the value dp[w-wi] might already reflect the current item (because we updated it earlier in this loop pass). That would let us "take" the same item multiple times — converting 0/1 knapsack into unbounded knapsack. Iterating backwards ensures dp[w-wi] still holds the PREVIOUS item's value.

Coin Change — Minimum Coins

Given coin denominations d₁, d₂, ..., d_k and a target amount A, find the minimum number of coins needed. Unlimited supply of each denomination.

dp[0] = 0 (zero coins for amount zero)
dp[a] = min_{j: d_j ≤ a} { dp[a - d_j] + 1 } (try each coin, take the best)

For each amount a from 1 to A, try subtracting each coin denomination. The one that leads to the fewest remaining coins wins. If no coin fits, dp[a] = ∞ (impossible).

Time: O(A·k) where k is the number of denominations. Space: O(A).

Coin Change Worked Example

Coins: [1, 3, 4]. Target amount: 6.

dp[0] = 0    base case
dp[1] = min(dp[1-1]+1) = min(dp[0]+1) = 1    one coin: {1}
dp[2] = min(dp[2-1]+1) = min(dp[1]+1) = 2    two coins: {1,1}
dp[3] = min(dp[3-1]+1, dp[3-3]+1) = min(dp[2]+1, dp[0]+1) = min(3, 1) = 1
       one coin: {3}
dp[4] = min(dp[4-1]+1, dp[4-3]+1, dp[4-4]+1) = min(dp[3]+1, dp[1]+1, dp[0]+1)
       = min(2, 2, 1) = 1    one coin: {4}
dp[5] = min(dp[4]+1, dp[2]+1, dp[1]+1) = min(2, 3, 2) = 2    {1,4} or {4,1}
dp[6] = min(dp[5]+1, dp[3]+1, dp[2]+1) = min(3, 2, 3) = 2    {3,3}

Minimum coins for amount 6 = 2 (using two 3-cent coins). Notice the greedy approach (always use the largest coin first) would give {4,1,1} = 3 coins — greedy fails! This is why coin change needs DP.

python
def coin_change(coins, amount):
    """Minimum number of coins to make amount."""
    dp = [float('inf')] * (amount + 1)
    dp[0] = 0
    parent = [-1] * (amount + 1)  # for reconstruction

    for a in range(1, amount + 1):
        for c in coins:
            if c <= a and dp[a - c] + 1 < dp[a]:
                dp[a] = dp[a - c] + 1
                parent[a] = c  # record which coin was used

    # Reconstruct the coins used
    if dp[amount] == float('inf'):
        return -1, []
    used = []
    a = amount
    while a > 0:
        used.append(parent[a])
        a -= parent[a]
    return dp[amount], used

count, coins_used = coin_change([1, 3, 4], 6)
print(f"Min coins: {count}, Used: {coins_used}")
# Min coins: 2, Used: [3, 3]

Greedy fails for coin change (in general). With US coins {1, 5, 10, 25}, greedy works perfectly. But with {1, 3, 4}, greedy gives 3 coins for amount 6 while DP finds 2. Greedy only works when the coin system has the "greedy property" (technically, when the coin system forms a matroid). Standard US/Euro coins have this property by design, but arbitrary denominations do not.

Longest Increasing Subsequence (LIS)

Given an array of numbers, find the longest strictly increasing subsequence. For [10, 9, 2, 5, 3, 7, 101, 18], the LIS is [2, 3, 7, 101] or [2, 5, 7, 101] — both have length 4.

dp[i] = 1 + max_{j < i, a[j] < a[i]} { dp[j] } (extend the best ending before i)
dp[i] = 1 (if no j qualifies, start a new subsequence)

For each element i, find the longest increasing subsequence ENDING at i by checking all previous elements j where a[j] < a[i]. The overall LIS is max(dp[0], dp[1], ..., dp[n-1]).

LIS Worked Example

Array: [10, 9, 2, 5, 3, 7, 101, 18].

dp[0] = 1    arr[0]=10, no previous element
dp[1] = 1    arr[1]=9, no j where arr[j]<9 (10 is not < 9)
dp[2] = 1    arr[2]=2, no j where arr[j]<2
dp[3] = 2    arr[3]=5, arr[2]=2<5, so dp[3]=dp[2]+1=2. LIS ending here: [2,5]
dp[4] = 2    arr[4]=3, arr[2]=2<3, so dp[4]=dp[2]+1=2. LIS ending here: [2,3]
dp[5] = 3    arr[5]=7, check: arr[2]=2<7(dp=1), arr[3]=5<7(dp=2), arr[4]=3<7(dp=2)
         best: dp[3]+1=3 or dp[4]+1=3. LIS: [2,5,7] or [2,3,7]
dp[6] = 4    arr[6]=101, every previous is <101. Best: dp[5]+1=4. LIS: [2,5,7,101]
dp[7] = 4    arr[7]=18, check all j: arr[5]=7<18(dp=3), best: dp[5]+1=4. LIS: [2,5,7,18]

// Answer: max(dp) = 4

python
def lis_dp(arr):
    """O(n^2) LIS with reconstruction."""
    n = len(arr)
    dp = [1] * n
    parent = [-1] * n

    for i in range(1, n):
        for j in range(i):
            if arr[j] < arr[i] and dp[j] + 1 > dp[i]:
                dp[i] = dp[j] + 1
                parent[i] = j

    # Find the index with max dp value
    best_idx = dp.index(max(dp))
    # Trace back
    result = []
    idx = best_idx
    while idx != -1:
        result.append(arr[idx])
        idx = parent[idx]
    return max(dp), result[::-1]

length, seq = lis_dp([10, 9, 2, 5, 3, 7, 101, 18])
print(f"LIS length: {length}, LIS: {seq}")
# LIS length: 4, LIS: [2, 5, 7, 101]

Time: O(n²) for the DP approach. Can be improved to O(n log n) with patience sorting — maintain a "tails" array where tails[k] is the smallest tail element of any increasing subsequence of length k+1. For each new element, binary search for its position in the tails array. This is the preferred interview solution for large inputs.

python
import bisect

def lis_binary(arr):
    """O(n log n) LIS using patience sorting."""
    tails = []
    for x in arr:
        pos = bisect.bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)  # extend the longest sequence
        else:
            tails[pos] = x   # improve an existing sequence
    return len(tails)

print(lis_binary([10, 9, 2, 5, 3, 7, 101, 18]))  # 4

The take-or-skip pattern. All three problems above share a common structure: at each step, you make a binary or multi-way choice (take/skip an item, use/skip a coin, extend/start a subsequence). This "take or skip" pattern appears in dozens of DP problems. When you see it in an interview, you are almost certainly looking at a DP problem.

How to Approach DP Problems in Interviews

Here is a step-by-step template that works for virtually every DP interview question:

1. Start with brute force

Write the naive recursive solution first. This accomplishes two things: it proves you understand the problem, and it reveals the subproblem structure. Do NOT jump to DP immediately — interviewers want to see your thought process.

↓

2. Identify repeated subproblems

Ask: "Is this recursive function called with the same arguments multiple times?" If yes (and it almost always is), those are your overlapping subproblems. The arguments to the recursive function define the "state" — the key into your DP table.

↓

3. Add memoization

Add a cache (dictionary or array). Before computing, check the cache. After computing, store in the cache. This is the easiest step — you literally just add 3 lines to the brute force. Tell the interviewer: "This converts O(exponential) to O(number of unique states × work per state)."

↓

4. Convert to bottom-up (if time permits)

Determine the fill order (usually smallest subproblems first). Replace recursion with nested loops. Mention space optimization (rolling row). This shows you understand both approaches.

↓

5. State time and space complexity

Always: number of subproblems × time per subproblem. For knapsack: O(n) items × O(W) capacities × O(1) per cell = O(nW). Mention if it is polynomial or pseudo-polynomial.

The interview anti-pattern: jumping straight to the DP table. Many candidates try to write the bottom-up DP immediately without first showing the recursive structure. This is risky because: (1) if you get the recurrence wrong, you have no fallback. (2) the interviewer cannot follow your logic. (3) you skip the chance to earn "thought process" points. Always start with recursion, then optimize. The interviewer WANTS to see the progression from brute force to DP.

Subset Sum: A Building Block

Given a set of positive integers, is there a subset that sums to a target T? This is the boolean version of knapsack and appears constantly as a sub-problem in harder DP problems.

dp[i][s] = True if a subset of items 1..i sums to s
dp[i][s] = dp[i-1][s] (skip item i) OR dp[i-1][s - a_i] (include item i)

python
def subset_sum(nums, target):
    """Can a subset of nums sum to target?"""
    dp = [False] * (target + 1)
    dp[0] = True
    for num in nums:
        # Iterate backwards (0/1 knapsack pattern)
        for s in range(target, num - 1, -1):
            dp[s] = dp[s] or dp[s - num]
    return dp[target]

print(subset_sum([3, 34, 4, 12, 5, 2], 9))   # True (4+5 or 3+4+2)
print(subset_sum([3, 34, 4, 12, 5, 2], 30))  # False

Subset sum is used as a subroutine in "Partition Equal Subset Sum" (can you split an array into two subsets with equal sum? — just check if a subset sums to total/2) and "Target Sum" (assign + or - to each number to hit a target).

Climbing Stairs with Variable Step Sizes

The classic climbing stairs (1 or 2 steps) is just Fibonacci. But with variable step sizes [1, 3, 5], it becomes coin change without the "minimize" — just count the ways:

python
def climb_stairs(n, steps):
    """Number of ways to climb n stairs with given step sizes."""
    dp = [0] * (n + 1)
    dp[0] = 1  # one way to stay at ground: do nothing
    for i in range(1, n + 1):
        for s in steps:
            if i >= s:
                dp[i] += dp[i - s]
    return dp[n]

print(climb_stairs(4, [1, 2]))      # 5 (Fibonacci!)
print(climb_stairs(10, [1, 3, 5]))  # 47

Debugging DP in Interviews

If your DP gives wrong answers during an interview, here is the 30-second diagnostic checklist:

Check	Common bug	Fix
Base cases	dp[0] = 0 when it should be 1, or vice versa	Trace the smallest input by hand
Loop bounds	Off-by-one: `range(n)` vs `range(n+1)`	Verify first and last iterations match the recurrence
Fill direction	Reading dp[j] before it is computed	Print the table for a small input and check each cell
State definition	Ambiguous "what does dp[i] mean?"	Write the EXACT English meaning of dp[i] as a comment
Reconstruction	Using dp table for reconstruction instead of separate choice table	Keep a separate table recording WHICH choice led to each optimal value

Chapter 7: Edit Distance

How many single-character operations does it take to transform one string into another? This question has a name — the edit distance (also called Levenshtein distance) — and it is one of the most practical DP problems in all of computer science. Your spell checker uses it. Your diff tool uses it. DNA alignment tools use a weighted version of it. Even fuzzy search engines like Elasticsearch use it under the hood.

The three allowed operations, each costing 1:

Operation	What it does	Example
Insert	Add a character at any position	"cat" → "cart"
Delete	Remove a character from any position	"cart" → "cat"
Replace	Change one character to another	"cat" → "cut"

A Worked Transformation

Transform "kitten" into "sitting":

kitten → sitten    (replace 'k' with 's')
sitten → sittin    (replace 'e' with 'i')
sittin → sitting    (insert 'g' at end)

Edit distance = 3 operations

But how do we know 3 is optimal? Maybe there is a way to do it in 2 operations? The DP table proves no shorter sequence of operations exists.

The Recurrence

Define d[i,j] = edit distance between X[1..i] and Y[1..j].

d[i,0] = i    (delete all i characters of X)
d[0,j] = j    (insert all j characters of Y)

d[i,j] = min {
  d[i-1,j] + 1      (delete x_i from X)
  d[i,j-1] + 1      (insert y_j into X)
  d[i-1,j-1] + [x_i ≠ y_j]      (replace if different, free if same)
}

The [x_i ≠ y_j] notation means: cost 0 if the characters match (no operation needed), cost 1 if they differ (replace). A match is essentially a "free diagonal step."

Full Table: "kitten" → "sitting"

	ε	s	i	t	t	i	n	g
ε	0	1	2	3	4	5	6	7
k	1	1	2	3	4	5	6	7
i	2	2	1	2	3	4	5	6
t	3	3	2	1	2	3	4	5
t	4	4	3	2	1	2	3	4
e	5	5	4	3	2	2	3	4
n	6	6	5	4	3	3	2	3

d[6,7] = 3. Three operations needed. Let us verify this is optimal by considering alternatives:

// Can we do it in 2 operations?
// "kitten" has 6 chars, "sitting" has 7 chars.
// Length difference = 1, so at least 1 insert (or delete) is needed.
// After accounting for the length change, the strings differ in at least
// 2 positions (k≠s, e≠i), so at least 2 more operations are needed.
// Total minimum: at least 3. Our DP found exactly 3. Optimal confirmed.

Let us trace back through the table to find the exact operations:

d[6,7]=3: came from d[5,6]+1=2+1=3 (insert 'g')
d[5,6]=2: d[4,5]+[e≠i]=1+1=2 (replace 'e'→'i')
d[4,5]=1: d[3,4]+[t=t]=1+0=1 (match 't'='t', free)
d[3,4]=1: d[2,3]+[t=t]=1+0=1 (match 't'='t', free)
d[2,3]=1: d[1,2]+[i=i]=1+0=1 (match 'i'='i', free)
d[1,2]=1 came from: d[0,1]+[k≠s]=1+0? No. d[0,1]=1, d[0,0]+1=1.
Actually d[1,1]=1 via d[0,0]+[k≠s]=0+1=1 (replace 'k'→'s')
d[1,2]=2: d[1,1]+1=2 or d[0,2]+1=3 or d[0,1]+[k≠i]=2. Min=2...
// Let me re-trace more carefully from d[6,7]=3:
d[6,7]=3 ← d[5,6]+1 (insert g) since d[5,6]=2 and 2+1=3 is min
d[5,6]=2 ← d[4,5]+1 (replace e→i) since d[4,5]=1 and 1+1=2
d[4,5]=1 ← d[3,4]+0 (match t=t) since d[3,4]=1 and 1+0=1
d[3,4]=1 ← d[2,3]+0 (match t=t) since d[2,3]=1 and 1+0=1
d[2,3]=1 ← d[1,2]+0 (match i=i)? No, x2=i, y3=t, not a match.
// d[2,3]: x2='i', y3='t'. i≠t so cost=1. d[1,2]+1=3, d[2,2]+1=2, d[1,2]+1=3. Min=d[2,2]+1=1+1=2? But d[2,3]=2, not 1.
// Actually: d[2,3]=2. x2='i', y3='t'. d[1,3]+1=3, d[2,2]+1=2, d[1,2]+1=3. Min=2 via delete. Let me just note the operations: replace k→s, replace e→i, insert g. 3 ops, confirmed.

The key operations are: replace 'k'→'s', match i,t,t, replace 'e'→'i', match n, insert 'g'. Total non-free operations: 3.

Interactive Edit Distance

Type two strings and watch the edit distance table fill. Color coding: green = match (free), orange = replace, blue = insert/delete. The optimal alignment path is highlighted.

From: To:

Edit distance: —

Applications in the Real World

Application	What are the "strings"?	Cost model
Spell checking	Typed word vs dictionary entries	Uniform costs; suggest all words within distance ≤ 2
DNA alignment	Gene sequences (A,C,G,T)	Weighted: transitions (A↔G) cheaper than transversions (A↔C)
diff/patch	Lines of code as "characters"	Lines are units; LCS gives unchanged lines, rest are edits
Fuzzy search	User query vs database records	Threshold: return records within distance k
OCR correction	Recognized text vs dictionary	Weighted by visual similarity: 'O'↔'0' is cheaper than 'O'↔'Z'
Plagiarism detection	Document sentences	Sentence-level alignment; low distance = suspicious

python
def edit_distance(X, Y):
    """Compute Levenshtein distance between two strings."""
    m, n = len(X), len(Y)
    d = [[0] * (n + 1) for _ in range(m + 1)]

    # Base cases: transforming to/from empty string
    for i in range(m + 1): d[i][0] = i
    for j in range(n + 1): d[0][j] = j

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if X[i-1] == Y[j-1] else 1
            d[i][j] = min(
                d[i-1][j] + 1,      # delete x_i
                d[i][j-1] + 1,      # insert y_j
                d[i-1][j-1] + cost  # replace (or free match)
            )

    return d[m][n]

print(edit_distance("kitten", "sitting"))   # 3
print(edit_distance("sunday", "saturday"))  # 3
print(edit_distance("horse", "ros"))       # 3

Time: O(m·n). Space: O(m·n), reducible to O(min(m,n)) since each row only depends on the previous row.

Edit distance and LCS are two sides of the same coin. If the edit distance between X and Y is d, and the LCS has length L, then d = m + n - 2L (where m and n are the string lengths). Every matched character in the LCS corresponds to a "free" diagonal step. Every non-matched character requires an insert or delete. This means solving LCS also solves edit distance, and vice versa.

Variants and Weighted Edit Distance

The standard Levenshtein distance treats all operations as cost 1. But in practice, costs are often weighted:

Variant	Modification	Use case
Damerau-Levenshtein	Add "transpose" operation (swap two adjacent characters, cost 1)	Spell checking (typos often transpose: "teh" → "the")
Weighted edit distance	Different costs for different operations/characters	DNA alignment (transition mutations A↔G cost less than transversions A↔C)
Hamming distance	Only substitutions allowed (strings must be same length)	Error-correcting codes, binary string comparison
Jaro-Winkler	Different formula emphasizing matching characters and their order	Record linkage, deduplication (names, addresses)

The DP approach handles all weighted variants — just change the costs in the min() computation. The structure of the table and traceback remain identical.

Edit Distance: The Full Traceback

Knowing the distance is useful, but often you need the ACTUAL sequence of operations. Here is the complete code with operation reconstruction:

python
def edit_distance_with_ops(X, Y):
    """Edit distance with full operation traceback."""
    m, n = len(X), len(Y)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    ops = [[''] * (n + 1) for _ in range(m + 1)]

    for i in range(m + 1):
        d[i][0] = i
        ops[i][0] = 'D'  # delete
    for j in range(n + 1):
        d[0][j] = j
        ops[0][j] = 'I'  # insert
    ops[0][0] = ''

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i-1] == Y[j-1]:
                d[i][j] = d[i-1][j-1]
                ops[i][j] = 'M'  # match (free)
            else:
                choices = [
                    (d[i-1][j] + 1, 'D'),   # delete x_i
                    (d[i][j-1] + 1, 'I'),   # insert y_j
                    (d[i-1][j-1] + 1, 'R')  # replace
                ]
                d[i][j], ops[i][j] = min(choices)

    # Trace back to find operations
    result = []
    i, j = m, n
    while i > 0 or j > 0:
        op = ops[i][j]
        if op == 'M':
            result.append(f"match '{X[i-1]}'")
            i -= 1; j -= 1
        elif op == 'R':
            result.append(f"replace '{X[i-1]}' with '{Y[j-1]}'")
            i -= 1; j -= 1
        elif op == 'D':
            result.append(f"delete '{X[i-1]}'")
            i -= 1
        elif op == 'I':
            result.append(f"insert '{Y[j-1]}'")
            j -= 1

    return d[m][n], result[::-1]

dist, operations = edit_distance_with_ops("kitten", "sitting")
print(f"Distance: {dist}")
for op in operations:
    print(f"  {op}")
# Distance: 3
#   replace 'k' with 's'
#   match 'i'
#   match 't'
#   match 't'
#   replace 'e' with 'i'
#   match 'n'
#   insert 'g'

Second Worked Example: "horse" → "ros"

This is a popular LeetCode problem (LC #72). Let us trace it completely:

	ε	r	o	s
ε	0	1	2	3
h	1	1	2	3
o	2	2	1	2
r	3	2	2	2
s	4	3	3	2
e	5	4	4	3

d[5,3] = 3. Operations: replace 'h'→'r', match 'o', delete 'r', match 's', delete 'e'. But wait, a simpler reading: delete 'h', match 'o'... let us trace properly from d[5,3]=3. d[5,3] came from d[4,2]+1=3+... Actually the min of d[4,3]+1=3, d[5,2]+1=5, d[4,2]+[e≠s]=3+1=4. So d[5,3]=3 via delete. d[4,3]=2 via d[3,2]+[s=s]=2+0=2, match. d[3,2]=2 via d[2,2]+1=2 (delete r), or d[2,1]+1=3, or d[2,1]+[r≠o]=... d[3,2] came from d[2,1]+[r≠o]=2+1=3, d[3,1]+1=3, d[2,2]+1=2. Via delete. d[2,2]=1 via d[1,1]+[o=o? no, o≠r]=1+1=2, or d[1,2]+1=3, or d[2,1]+1=3. Wait, d[2,2]: x₂='o', y₂='o', match! d[1,1]+0=1. So d[2,2]=1 via match. d[1,1]=1 via d[0,0]+[h≠r]=0+1=1, replace. Operations: replace h→r, match o, delete r, match s, delete e. 3 operations confirmed.

Space Optimization for Edit Distance

Since each row d[i] only depends on d[i-1], we can use two rows instead of m+1 rows, reducing space from O(m·n) to O(n). This is critical when comparing long sequences (e.g., DNA with millions of base pairs):

python
def edit_distance_space_opt(X, Y):
    """O(min(m,n)) space edit distance."""
    if len(X) < len(Y):
        X, Y = Y, X  # ensure Y is shorter (iterate over shorter dimension)
    m, n = len(X), len(Y)
    prev = list(range(n + 1))  # previous row
    curr = [0] * (n + 1)

    for i in range(1, m + 1):
        curr[0] = i
        for j in range(1, n + 1):
            cost = 0 if X[i-1] == Y[j-1] else 1
            curr[j] = min(prev[j] + 1, curr[j-1] + 1, prev[j-1] + cost)
        prev, curr = curr, prev  # swap rows

    return prev[n]

print(edit_distance_space_opt("kitten", "sitting"))  # 3

Bonus: Wildcard Matching (LeetCode #44)

A closely related problem: given a string s and a pattern p with '?' (matches any single character) and '*' (matches any sequence including empty), determine if s matches p. This is string DP disguised as pattern matching.

dp[i][j] = True if s[1..i] matches p[1..j]

If p[j] = '*': dp[i][j] = dp[i][j-1] (* matches empty) OR dp[i-1][j] (* matches s[i] and more)
If p[j] = '?' or p[j] = s[i]: dp[i][j] = dp[i-1][j-1] (single char match)
Otherwise: dp[i][j] = False (mismatch)

python
def wildcard_match(s, p):
    """Wildcard pattern matching: ? = any char, * = any sequence."""
    m, n = len(s), len(p)
    dp = [[False] * (n + 1) for _ in range(m + 1)]
    dp[0][0] = True
    # Handle leading *s in pattern
    for j in range(1, n + 1):
        if p[j-1] == '*': dp[0][j] = dp[0][j-1]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if p[j-1] == '*':
                dp[i][j] = dp[i][j-1] or dp[i-1][j]
            elif p[j-1] == '?' or p[j-1] == s[i-1]:
                dp[i][j] = dp[i-1][j-1]

    return dp[m][n]

print(wildcard_match("adceb", "*a*b"))     # True
print(wildcard_match("acdcb", "a*c?b"))    # False

The '*' case is the tricky one. dp[i][j-1] means '*' matches empty (skip it). dp[i-1][j] means '*' matches s[i] and potentially more characters (the '*' stays active). This is elegant because it naturally handles '*' matching any number of characters through the chain of dp[i-1][j] ← dp[i-2][j] ← ... lookups.

Interview question: What is the edit distance between "sunday" and "saturday"?

3 — the operations are: (1) insert 'a' after 's' to get "saunday", (2) insert 't' after 'a' to get "satunday", (3) replace 'n' with 'r' to get "saturday". Three operations, and the table confirms d[6,8]=3 4 — four operations are needed 2 — only two changes from sunday to saturday

Chapter 8: DP on Trees and DAGs

So far, every DP problem has used a 1D array or a 2D table. But dynamic programming is far more general than that. It works on ANY structure whose dependency graph is a DAG (directed acyclic graph). Trees are the most natural example: the value at a node depends only on the values of its children.

Optimal Binary Search Tree (CLRS 15.5)

You are building a binary search tree for a fixed set of keys k₁ < k₂ < ... < k_n. Each key has a known search probability p_i (how often it is searched for). You want the BST that minimizes the expected search cost: the sum of (depth + 1) × p_i for each key.

If all keys are equally likely, a balanced BST is optimal. But when probabilities are skewed, you want frequently-accessed keys near the root, even at the cost of pushing rare keys deeper. This is exactly how Huffman coding works for prefix codes, but for BSTs the constraint is different: the search-tree property must hold.

The subproblem is identical in structure to matrix-chain multiplication. Define e[i,j] = minimum expected search cost for a BST containing keys k_i..k_j. Try each key k_r (i ≤ r ≤ j) as the root.

Why Not Just Use a Balanced BST?

Consider keys {A, B, C} with search probabilities p_A=0.7, p_B=0.2, p_C=0.1. A balanced BST puts B at the root (depth 1), A left (depth 2), C right (depth 2). Expected cost = 0.7×2 + 0.2×1 + 0.1×2 = 1.8.

But putting A at the root (depth 1), B right (depth 2), C right-right (depth 3): cost = 0.7×1 + 0.2×2 + 0.1×3 = 1.4. Much better! The most frequently accessed key should be near the root, even at the cost of making the tree unbalanced.

Finding the optimal arrangement is the optimal BST problem:

e[i,j] = min_{i ≤ r ≤ j} { e[i,r-1] + e[r+1,j] + w(i,j) }

where w(i,j) = p_i + p_i+1 + ... + p_j is the sum of frequencies in the range. The w(i,j) term arises because every node in the subtree drops one level when placed under root r, adding its probability to the cost. Time: O(n³), Space: O(n²).

Tree DP: Maximum Weight Independent Set

This is the classic "company party" problem. A company has a tree-shaped hierarchy (each employee has exactly one direct boss, except the CEO at the root). Each employee has a "fun rating." You want to invite a subset of employees to a party that maximizes the total fun, subject to one rule: no employee can attend if their direct boss also attends.

This is an independent set problem on a tree (NP-hard on general graphs, but polynomial on trees thanks to DP!).

For each node v, define two values:

// If we INCLUDE v in the party, v's children CANNOT attend
include[v] = fun[v] + ∑_{c ∈ children(v)} exclude[c]

// If we EXCLUDE v, each child can be included OR excluded (take the better)
exclude[v] = ∑_{c ∈ children(v)} max(include[c], exclude[c])

Process bottom-up via post-order traversal. Leaves first, then their parents, up to the root. For leaf nodes: include[leaf] = fun[leaf], exclude[leaf] = 0. The answer for the whole company is max(include[root], exclude[root]).

python
class TreeNode:
    def __init__(self, val, children=None):
        self.val = val
        self.children = children or []

def max_independent_set(root):
    """Maximum weight independent set on a tree.
    Returns: (max_value, set_of_selected_nodes)"""

    def dp(node):
        """Returns (include_value, exclude_value)"""
        if not node.children:
            return node.val, 0  # leaf: include=val, exclude=0

        incl = node.val
        excl = 0
        for child in node.children:
            c_incl, c_excl = dp(child)
            incl += c_excl  # if we include node, must exclude children
            excl += max(c_incl, c_excl)  # if we exclude node, children are free
        return incl, excl

    incl, excl = dp(root)
    return max(incl, excl)

# Build the example tree from the diagram
tree = TreeNode(8, [
    TreeNode(3, [TreeNode(7), TreeNode(2, [TreeNode(5)])]),
    TreeNode(4, [TreeNode(6), TreeNode(1)])
])
print(max_independent_set(tree))  # 27

The time complexity of this DFS is O(n) — each node is visited exactly once. The space is O(n) for the recursion stack (O(h) where h is tree height, worst case O(n) for a skewed tree).

This same pattern — define include[v] and exclude[v] for each node, compute via post-order DFS — solves many tree DP problems:

Problem	include[v] =	exclude[v] =
Max independent set	val[v] + sum(exclude[child])	sum(max(incl, excl) for child)
House Robber III (LeetCode)	Same as above (binary tree version)	Same as above
Min vertex cover	1 + sum(min(incl, excl) for child)	sum(include[child]) — all children MUST be covered
Min dominating set	More complex: 3 states (dominated-included, dominated-excluded, not-dominated)	Requires careful case analysis

Post-order = natural bottom-up for trees. In a flat 1D array, "bottom-up" means left-to-right. In a 2D table, it means row-by-row or diagonal-by-diagonal. In a tree, "bottom-up" means post-order: process all children before the parent. This guarantees every dependency is resolved before it is needed — the universal DP invariant.

Tree DP: Maximum Weight Independent Set

A tree where each node has a "fun value." Watch the post-order traversal compute include/exclude values. Orange nodes = selected for the party. I=include value, E=exclude value.

Press Step or Play to begin post-order traversal.

Longest Path in a DAG

The longest path problem is NP-hard on general graphs (it subsumes the Hamiltonian path problem). But on a DAG, it is solvable in O(V+E) time using DP over a topological sort.

dist[v] = max_{(u,v) ∈ E} { dist[u] + weight(u,v) }

Process vertices in topological order. For each vertex v, look at all incoming edges and take the maximum. This is fundamentally DP: the "table" is indexed by vertices, the "fill order" is topological order, and each vertex depends only on its predecessors.

Every DP IS a shortest/longest path in a DAG. Think of each subproblem as a node. Draw a directed edge from subproblem A to subproblem B if B directly depends on A (B reads A's value). The dependencies form a DAG (if there were a cycle, you would have a circular definition — the recurrence would be unsolvable). Bottom-up DP computation is simply a topological-order traversal of this subproblem DAG. This is not an analogy — it is a formal equivalence. If you can draw the subproblem DAG, you can do the DP.

The Subproblem DAGs for Our Problems

Problem	Nodes (subproblems)	Edges (dependencies)	DAG shape
Rod cutting	r(0), r(1), ..., r(n)	r(j) depends on r(0) through r(j-1)	Linear chain with extra back-edges — O(n²) edges total
LCS	c[i][j] for 0≤i≤m, 0≤j≤n	Each cell depends on up to 3 neighbors	Grid DAG with diagonal, up, and left edges
Matrix chain	m[i][j] for 1≤i≤j≤n	m[i][j] depends on O(n) entries on same row/column	Upper triangle of a grid, filled diagonally
Fibonacci	F(0), F(1), ..., F(n)	F(k) depends on F(k-1) and F(k-2)	Linear chain — simplest possible DAG
Tree party	incl[v], excl[v] for each node v	Parent depends on children	The tree itself is the DAG

This DAG perspective explains WHY the DP works: we are simply computing values in topological order on the dependency graph. It also explains WHY circular dependencies are forbidden: a cycle in the DAG would mean a subproblem depends on itself, which is undefined.

DP on DAGs: Concrete Example

Consider a DAG with vertices A, B, C, D, E and weighted edges A→B(3), A→C(6), B→D(4), B→E(1), C→D(2), C→E(7), D→E(5). The longest path from A to E:

// Topological order: A, B, C, D, E (or A, C, B, D, E)

dist[A] = 0    (source)
dist[B] = dist[A] + 3 = 3    (only edge A→B)
dist[C] = dist[A] + 6 = 6    (only edge A→C)
dist[D] = max(dist[B]+4, dist[C]+2) = max(7, 8) = 8    (edges B→D, C→D)
dist[E] = max(dist[B]+1, dist[C]+7, dist[D]+5) = max(4, 13, 13) = 13

// Longest path A→E has length 13, via A→C→E (6+7=13) or A→C→D→E (6+2+5=13)

This is exactly the same as any other DP: define state (vertex), write recurrence (max over incoming edges), fill in topological order. The only difference is the "table" is indexed by graph vertices instead of integers or pairs.

Worked Example: Tree DP by Hand

Consider this tree (node values in parentheses):

        (8)
       /    \
     (3)     (4)
    / \    / \
(7) (2) (6) (1)
         |
        (5)

Post-order traversal visits: 7, 5, 2, 3, 6, 1, 4, 8.

// Leaves first (no children)
Node 7: incl=7, excl=0
Node 5: incl=5, excl=0

// Node 2 (child: 5)
Node 2: incl = 2 + excl[5] = 2 + 0 = 2
         excl = max(incl[5],excl[5]) = max(5,0) = 5

// Node 3 (children: 7, 2)
Node 3: incl = 3 + excl[7] + excl[2] = 3 + 0 + 5 = 8
         excl = max(7,0) + max(2,5) = 7 + 5 = 12

// Node 6, Node 1 (leaves)
Node 6: incl=6, excl=0
Node 1: incl=1, excl=0

// Node 4 (children: 6, 1)
Node 4: incl = 4 + excl[6] + excl[1] = 4 + 0 + 0 = 4
         excl = max(6,0) + max(1,0) = 6 + 1 = 7

// Root node 8 (children: 3, 4)
Node 8: incl = 8 + excl[3] + excl[4] = 8 + 12 + 7 = 27
         excl = max(8,12) + max(4,7) = 12 + 7 = 19

// Answer: max(27, 19) = 27
// Include root (8), must exclude 3 and 4
// Exclude 3 → take 7(incl) + max(2,5)=5(excl of 2, which means take 5)
// Exclude 4 → take 6(incl) + 1(incl) = 7
// Selected: 8, 7, 5, 6, 1 → 8+7+5+6+1 = 27 ✓

Time complexity of tree DP: O(n) where n is the number of nodes. Each node is visited exactly once in post-order, and the work at each node is proportional to its number of children. The sum of children counts over all nodes is n-1 (a tree with n nodes has n-1 edges). So total work is O(n).

Design question: You have a tree with 1,000,000 nodes. Each node has a value. You want the maximum weight independent set (no two adjacent nodes selected). Can you solve this, and what are the time and space requirements?

No — maximum weight independent set is NP-hard Yes — O(n) time, O(n) space. On a TREE (not a general graph), the include/exclude DP gives a linear-time exact solution. Visit each node once in post-order, storing two values per node. Total: 2n values stored, each computed in O(degree) time, summing to O(n) overall. The key insight is that tree structure removes the NP-hard complexity — the "independent" constraint only involves parent-child pairs, which have a clean local structure Yes — O(n log n) time because we need to sort the tree by value first

Chapter 9: Interview Arsenal

This is your reference sheet. Every classic DP problem, categorized by structure, with recurrence, complexity, and interview framing. Print this. Memorize it. Use it.

The Six DP Categories

Category	Problems	Table shape	Key pattern
1D Linear	Rod cutting, Coin change, LIS, House robber, Climbing stairs, Fibonacci, Decode ways, Word break	dp[i]	Each state depends on a fixed number of previous states
2D Grid/Sequence	LCS, Edit distance, 0/1 Knapsack, Unique paths, Minimum path sum, Regex matching, Interleaving strings	dp[i][j]	Two indices (two sequences, or items × capacity)
Interval	Matrix chain, Optimal BST, Palindrome partitioning, Burst balloons, Strange printer	dp[i][j] diagonal fill	Subproblem = contiguous range, try all split points
Tree/DAG	Tree diameter, Max independent set, House robber III, Binary tree cameras	Values per node	Post-order traversal, include/exclude at each node
State machine	Buy/sell stock (I through IV), String transformations	dp[i][state]	At each step, transition between K finite states
Bitmask	TSP, Task assignment, Hamiltonian path, Set partition	dp[mask][i]	Track visited subset as bitmask, O(2ⁿ·n)

Complete Cheat Sheet

Problem	Recurrence	Time	Space	Key trick
Rod cutting	r[n]=max{p[i]+r[n-i]}	O(n²)	O(n)	1D, try all first-cut lengths
Coin change	dp[a]=min{dp[a-d]+1}	O(Ak)	O(A)	1D, unbounded supply
0/1 Knapsack	max(skip, take)	O(nW)	O(nW)→O(W)	Backwards iteration for 1-row
LCS	match:diag+1, else:max(up,left)	O(mn)	O(mn)→O(min)	Hirschberg for linear space
Edit distance	min(del, ins, rep)	O(mn)	O(mn)→O(min)	d = m+n-2L (relates to LCS)
Matrix chain	min over split k	O(n³)	O(n²)	Diagonal fill order
LIS	dp[i]=1+max{dp[j]:a[j]<a[i]}	O(n²)→O(n log n)	O(n)	Patience sort + binary search
Optimal BST	min over root r + freq sum	O(n³)→O(n²)	O(n²)	Knuth's optimization
House robber	max(dp[i-1], dp[i-2]+v[i])	O(n)	O(1)	Only need prev two values
Climbing stairs	dp[n]=dp[n-1]+dp[n-2]	O(n)	O(1)	Literally Fibonacci

The Four Interview Patterns

Pattern 1: Take or Skip. At each element, you choose to include it or not. This creates a binary branching structure. Knapsack, house robber, subset sum, coin change (multi-way), LIS. The recurrence always has two (or more) branches. Interview tell: "maximize/minimize selection under constraint."

Pattern 2: Interval DP. The subproblem is a contiguous range [i,j]. You try all split points k in [i,j) and take the best. Matrix chain, optimal BST, burst balloons, palindrome partitioning. Fill order: diagonally (shortest intervals first). Interview tell: "parenthesize," "partition into groups," "process a range."

Pattern 3: State Machine. At each step, you are in one of K states and can transition to other states. Best time to buy/sell stock: states are {holding, not_holding}. At each day, you can transition from not_holding to holding (buy) or holding to not_holding (sell). dp[i][state] = best profit at day i in each state. Interview tell: "at most k transactions," "cooldown period," "fee per action."

Pattern 4: String DP. Two strings of length m and n produce an m×n DP table. Base cases fill the borders (one or both strings empty). Fill row by row. LCS, edit distance, regex matching, wildcard matching, distinct subsequences. Interview tell: any problem comparing, aligning, or transforming two sequences.

Debug Checklist: "My DP Gives Wrong Answers"

1. Wrong Base Cases

Are dp[0], dp[0][0] initialized correctly? The most common bug: dp[0]=0 when it should be 1, or dp[0][j]=j when it should be 0. Off-by-one here cascades through every subsequent cell.

↓

2. Wrong Fill Order

Bottom-up: are you computing dp[i] from dp[i-1] (correct) or dp[i+1] (wrong, not yet computed)? For 2D: are you filling row-by-row when you should be filling diagonally?

↓

3. Off-by-One Loop Bounds

Does i range from 0 to n or 1 to n? Does the inner loop include or exclude the boundary? This is the #1 DP implementation bug. Always verify bounds against a tiny hand-traced example.

↓

4. Print the Small Table

For n≤5, print the full DP table. Manually verify each cell against your hand calculation. If any cell differs, you have found the bug.

↓

5. Wrong Reconstruction

Value correct but solution wrong? Check the secondary table (the one recording choices). Common bug: storing the choice BEFORE updating the best value.

System Design: DP in Production

"Design an autocomplete with fuzzy matching."

Use edit distance with a BK-tree (ball tree for metric spaces). Index all dictionary words. Given a query, traverse the tree, pruning branches where the triangle inequality guarantees distance exceeds threshold. Reduces O(n) distance computations to O(n^0.6) average. For real-time autocomplete, precompute all words within distance 1 and 2 of common prefixes.

"Design a recommendation engine using sequence alignment."

Model user behavior as a sequence of actions (viewed product A, searched for B, purchased C). Compute an LCS-variant similarity between users: users with similar action subsequences get similar recommendations. Use the O(min(m,n)) space optimization since you only need the similarity score, not the full alignment.

Buy and Sell Stock: State Machine DP

This family of problems is a favorite at FAANG interviews. The simplest version: given an array of stock prices, find the maximum profit from at most one buy-sell transaction.

// At each day, you are in one of two states:
// not_holding: you have no stock. You can BUY (transition to holding).
// holding: you own stock. You can SELL (transition to not_holding).

not_holding[i] = max(not_holding[i-1], holding[i-1] + price[i])
// do nothing, or sell
holding[i] = max(holding[i-1], -price[i])
// do nothing, or buy (at most 1 transaction: start from 0)

For "at most k transactions," add a transaction counter dimension: dp[i][k][state]. For "with cooldown," add a "just_sold" state that prevents buying the next day. The state machine pattern handles all variants.

python
def max_profit(prices):
    """Best Time to Buy and Sell Stock (single transaction)."""
    min_price = float('inf')
    max_profit = 0
    for price in prices:
        min_price = min(min_price, price)
        max_profit = max(max_profit, price - min_price)
    return max_profit

def max_profit_k(prices, k):
    """Best Time to Buy and Sell Stock with at most k transactions."""
    n = len(prices)
    if k >= n // 2:  # unlimited transactions
        return sum(max(0, prices[i+1]-prices[i]) for i in range(n-1))
    dp = [[0] * n for _ in range(k + 1)]
    for t in range(1, k + 1):
        max_prev = -prices[0]
        for j in range(1, n):
            dp[t][j] = max(dp[t][j-1], prices[j] + max_prev)
            max_prev = max(max_prev, dp[t-1][j] - prices[j])
    return dp[k][n-1]

print(max_profit([7,1,5,3,6,4]))  # 5 (buy at 1, sell at 6)
print(max_profit_k([3,2,6,5,0,3], 2))  # 7 (buy 2 sell 6, buy 0 sell 3)

House Robber: The Simplest DP

A row of houses, each with a value. Rob houses to maximize total value, but you cannot rob two adjacent houses (alarm will trigger). This is the simplest interesting DP problem and a great warm-up:

dp[i] = max(dp[i-1], dp[i-2] + value[i])

Skip house i (take dp[i-1]) or rob house i (take dp[i-2] + value[i], since we skip i-1). Only need the last two values, so O(1) space.

python
def rob(nums):
    """House Robber — O(n) time, O(1) space."""
    prev2, prev1 = 0, 0
    for x in nums:
        prev2, prev1 = prev1, max(prev1, prev2 + x)
    return prev1

print(rob([2,7,9,3,1]))  # 12 (rob houses 1,3: 7+9=... wait, 2+9+1=12)

Word Break: A Practical DP Problem

Given a string s and a dictionary of words, determine if s can be segmented into space-separated dictionary words. "leetcode" with dictionary ["leet", "code"] returns True.

dp[i] = True if s[0..i-1] can be segmented
dp[i] = True if any dp[j] is True and s[j..i-1] is in the dictionary, for 0 ≤ j < i
dp[0] = True (empty string is always valid)

python
def word_break(s, wordDict):
    """Can string s be segmented into dictionary words?"""
    words = set(wordDict)
    n = len(s)
    dp = [False] * (n + 1)
    dp[0] = True  # empty string is valid

    for i in range(1, n + 1):
        for j in range(i):
            if dp[j] and s[j:i] in words:
                dp[i] = True
                break  # found one valid segmentation, enough

    return dp[n]

print(word_break("leetcode", ["leet", "code"]))     # True
print(word_break("applepenapple", ["apple", "pen"])) # True
print(word_break("catsandog", ["cats","dog","sand","and","cat"]))  # False

Time: O(n²) if word lookups are O(1) (hash set). Space: O(n). This is a 1D DP where dp[i] depends on all dp[j] for j < i — similar to LIS but with a string condition instead of a numerical comparison.

Unique Paths: 2D Grid DP

A robot on an m×n grid starts at top-left and can only move right or down. How many unique paths to the bottom-right? This is the simplest 2D DP and a great warm-up problem.

dp[i][j] = dp[i-1][j] + dp[i][j-1]
dp[0][j] = 1 for all j (only one way along top row)
dp[i][0] = 1 for all i (only one way along left column)

python
def unique_paths(m, n):
    dp = [[1] * n for _ in range(m)]
    for i in range(1, m):
        for j in range(1, n):
            dp[i][j] = dp[i-1][j] + dp[i][j-1]
    return dp[m-1][n-1]

print(unique_paths(3, 7))  # 28
print(unique_paths(3, 3))  # 6

Fun fact: the answer is the binomial coefficient C(m+n-2, m-1), computable in O(min(m,n)) without any DP. But the DP approach generalizes to grids with obstacles (set dp[i][j]=0 for obstacle cells), which the closed-form cannot handle.

Minimum Path Sum: Adding Weights to the Grid

Same grid, but each cell has a cost. Find the path from top-left to bottom-right that minimizes the total cost. Same recurrence, just replace addition with min:

python
def min_path_sum(grid):
    """Minimum cost path from top-left to bottom-right."""
    m, n = len(grid), len(grid[0])
    dp = [[0] * n for _ in range(m)]
    dp[0][0] = grid[0][0]
    for j in range(1, n): dp[0][j] = dp[0][j-1] + grid[0][j]
    for i in range(1, m): dp[i][0] = dp[i-1][0] + grid[i][0]
    for i in range(1, m):
        for j in range(1, n):
            dp[i][j] = grid[i][j] + min(dp[i-1][j], dp[i][j-1])
    return dp[m-1][n-1]

print(min_path_sum([[1,3,1],[1,5,1],[4,2,1]]))  # 7 (1→3→1→1→1)

This is one of the cleanest interview problems because the 2D DP is so intuitive. Common follow-up: "What if you can move up, down, left, and right?" Then it is no longer DP (cycles are possible) — you need Dijkstra's algorithm.

Recognizing DP in Disguise

Many interview problems do not say "use DP." You have to recognize the pattern. Here are the tells:

Tell	Why it suggests DP	Example problems
"Minimum/maximum cost to..."	Optimization = DP or greedy; if greedy fails, DP	Minimum path sum, minimum coins
"Number of ways to..."	Counting problems are DP (base case: 1 way to do nothing)	Climbing stairs, unique paths, decode ways
"Is it possible to..."	Often boolean DP: dp[i] = True if possible	Word break, partition equal subset sum
"Given two strings..."	Almost always 2D string DP	LCS, edit distance, regex matching
"Can you break this into subproblems?"	Interviewer is literally telling you to use DP	Any problem where they hint at subproblems

Staff-level question: You have a DP solution for 0/1 knapsack with n=1000 items and W=10⁶. The O(n·W) table has 10⁹ entries — too much memory. What do you do?

Use a hash map instead of an array — only store non-zero entries Switch to greedy (sort by value/weight ratio) — always gives the optimal answer Use the rolling-row optimization: since dp[i] depends only on dp[i-1], keep just two rows, reducing space from O(nW) to O(W). For reconstruction, use Hirschberg-style divide-and-conquer. If W is truly too large for even one row, use an FPTAS: scale all values by epsilon/n, reducing the effective table size while guaranteeing a (1-epsilon) approximation. Greedy gives no optimality guarantee for 0/1 knapsack

Chapter 10: Connections

Dynamic programming is not an isolated technique. It sits at the center of a web of algorithmic ideas. Understanding where DP ends and other paradigms begin makes you a fundamentally better problem solver.

DP vs Everything Else — The Complete Map

Paradigm	Optimal substructure?	Overlapping subproblems?	When to use	CLRS chapter
Dynamic Programming	Yes	Yes	Optimization with repeated sub-work	Ch 15 (this lesson)
Greedy	Yes	Irrelevant	When local optimality ⇒ global optimality	Ch 16
Divide-and-Conquer	Yes	No	Independent subproblems (merge sort, FFT)	Ch 4
Backtracking	Not required	Maybe	Constraint satisfaction, N-queens, Sudoku	—
Branch-and-Bound	Yes	Yes	When DP table is too large; use bounds to prune	—
Linear Programming	Continuous relaxation	N/A	Optimization over continuous variables with linear constraints	Ch 29

DP in Machine Learning

If you are studying ML (which, given this site, you probably are), DP is everywhere:

ML Algorithm	DP Connection	Subproblem Structure
Viterbi (HMM decoding)	Longest path in a trellis DAG	Best path ending at state s at time t
CTC decoding	DP on a modified trellis	Best alignment score at position t with label l
Beam search	Pruned DP on a sequence DAG	Top-k partial sequences at each time step
Value iteration (RL)	IS bottom-up DP on the MDP state space	V(s) = max_a{R(s,a) + gamma * V(s')}
Smith-Waterman (bio)	Local alignment = edit distance variant	Best local alignment ending at (i,j)
CKY parsing (NLP)	Interval DP on parse trees	Best parse for words i..j under grammar rule

The Bellman connection. Richard Bellman invented dynamic programming in the 1950s while working at RAND Corporation. The Bellman equation V(s) = max_a{R(s,a) + γ∑_s'P(s'|s,a)V(s')} IS a DP recurrence. Value iteration fills a table (the value function). Policy iteration reconstructs the solution (the optimal policy). Everything in reinforcement learning — from tabular Q-learning to deep RL — traces back to this one DP recurrence. Bellman reportedly chose the name "dynamic programming" because it sounded impressive enough to avoid budget cuts. The "programming" refers to planning/scheduling (as in "linear programming"), not computer programming.

Where to Go Next

CLRS Chapter 16: Greedy Algorithms. When DP is overkill. Activity selection, Huffman coding, and fractional knapsack are all solvable with a single greedy pass because the "greedy choice property" holds — you never need to revisit a decision. The contrast with 0/1 knapsack (which requires DP because greedy fails) is the most important dichotomy in algorithm design.

CLRS Chapter 4: Divide-and-Conquer. Merge sort and Strassen's matrix multiplication have optimal substructure but NO overlapping subproblems. The subproblems at each recursion level are always fresh partitions of the input. Memoization would not help because no subproblem is ever repeated. This is why divide-and-conquer leads to recurrence relations (solved by the master theorem), while DP leads to tables.

Graph algorithms. Shortest-path algorithms (Dijkstra, Bellman-Ford, Floyd-Warshall) are all DP in disguise.

Floyd-Warshall: DP on Graphs

Floyd-Warshall computes all-pairs shortest paths in O(V³) time. The DP formulation:

d^(k)[i][j] = min( d^(k-1)[i][j], d^(k-1)[i][k] + d^(k-1)[k][j] )

The subproblem d^(k)[i][j] = shortest path from i to j using only vertices {1, 2, ..., k} as intermediates. At each step, we decide: does the shortest path from i to j go through vertex k (use it as a waypoint) or not?

This is structurally identical to 0/1 knapsack! Instead of "take or skip item k," it is "route through vertex k or not." The three nested loops (k, i, j) fill a 3D table, optimized to 2D because d^(k) only depends on d^(k-1). Note the k loop MUST be the outermost loop — this is a common interview trick question.

Why k must be outermost: d[i][j] at stage k needs d[i][k] and d[k][j] from stage k-1. If k were an inner loop, you would mix stages, using some values from the current k and others from a previous k. The outermost position ensures all d values are consistent within each stage.

python
def floyd_warshall(graph):
    """All-pairs shortest paths. graph[i][j] = edge weight (inf if no edge)."""
    n = len(graph)
    d = [row[:] for row in graph]  # copy
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

Bellman-Ford: DP on Edge Count

Bellman-Ford computes single-source shortest paths, even with negative edge weights. The DP interpretation: d^(k)[v] = shortest path from source s to v using at most k edges.

d^(k)[v] = min( d^(k-1)[v], min_(u,v)∈E { d^(k-1)[u] + w(u,v) } )

Run for k=1 to V-1 (since a shortest path in a graph with V vertices uses at most V-1 edges). This is bottom-up DP where the "subproblem size" is the number of edges allowed.

The DP Family Tree

Here is how all the problems in this lesson connect to each other and to the broader algorithm landscape:

1D Linear DP

Rod cutting, Fibonacci, coin change, climbing stairs, house robber. Table: dp[i]. Fill: left to right. Simplest category. Master these first.

↓ add a second dimension

2D Sequence DP

LCS, edit distance, wildcard matching, 0/1 knapsack. Table: dp[i][j]. Fill: row by row. The workhorse category — most interview DP is here.

↓ subproblem = contiguous range

Interval DP

Matrix chain, optimal BST, burst balloons, stone merge. Table: dp[i][j] (upper triangle). Fill: diagonally. The hardest to recognize but mechanical once you see the pattern.

↓ dependencies form a tree

Tree/DAG DP

Party problem, house robber III, tree diameter. Table: values per node. Fill: post-order. Elegant and efficient — always linear time.

↓ add finite-state transitions

State Machine DP

Buy/sell stock variants, string transformations. Table: dp[i][state]. Fill: left to right across states. The "advanced" category.

↓ exponential but exact

Bitmask DP

TSP, task assignment. Table: dp[mask][i]. Fill: by popcount(mask). O(2ⁿ · n) — only feasible for n ≤ 20-25. The boundary of tractability.

DP in Competitive Programming

In competitive programming, DP problems are categorized by difficulty:

Difficulty	What you need	Example
Easy (Div 2 A-B)	Recognize Fibonacci/climbing stairs pattern	Number of ways to reach step n
Medium (Div 2 C-D)	2D table, correct recurrence, space optimization	Knapsack, LCS, edit distance
Hard (Div 2 E, Div 1 C)	Bitmask DP, Digit DP, DP with data structures	TSP, count numbers with certain digit properties
Expert (Div 1 D-E)	DP optimization (Knuth, D&C, convex hull trick)	O(n³) → O(n²) via Knuth's optimization

For coding interviews, medium is the ceiling. You will almost never see bitmask DP or DP with advanced optimizations. Master the basics: 1D, 2D, and interval DP.

A Note on Practice

DP is one of those topics where reading about it is necessary but not sufficient. You need to solve problems. Here is a recommended progression:

Week	Problems	Goal
Week 1	Fibonacci, climbing stairs, house robber, maximum subarray	Build comfort with 1D DP. Identify the "state" and "transition"
Week 2	Coin change, 0/1 knapsack, target sum, partition equal subset sum	Master the "take or skip" pattern and the backwards iteration trick
Week 3	LCS, edit distance, wildcard matching, distinct subsequences	Master 2D string DP. Practice traceback reconstruction
Week 4	Longest increasing subsequence, Russian doll envelopes, word break	Mix 1D and 2D. Practice the O(n log n) LIS with binary search
Week 5	Buy/sell stock II-IV, matrix chain, burst balloons	State machine and interval DP. The hardest common patterns
Week 6	House robber III, binary tree cameras, unique BST count	Tree DP. Write the DFS-based solutions from scratch

After 6 weeks of daily practice (1-2 problems per day), you will recognize DP patterns instantly. The key is not memorizing solutions but understanding WHY each recurrence has the form it does. If you can derive the recurrence from the problem statement, you can solve any DP problem.

Closing Thoughts

CLRS Chapter 15 is arguably the most interview-relevant chapter in the entire book. Every concept here — optimal substructure, overlapping subproblems, the 4-step recipe, memoization vs tabulation, solution reconstruction — appears in real interviews at every major tech company. The specific problems (rod cutting, matrix chain, LCS, optimal BST) are less important than the TECHNIQUE. Once you internalize the technique, you can solve problems you have never seen before.

As Bellman himself wrote in his autobiography: "I decided to use the word 'programming' to describe my work. I wanted to get across the idea that this was dynamic, this was multistage, this was time-varying... It's impossible to use the word 'dynamic' in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It's impossible."

The name stuck. And 70 years later, dynamic programming remains one of the most powerful tools in every programmer's arsenal — from coding interviews to cutting-edge AI research. Master it, and you will see optimization problems differently forever.

The Limits of DP

DP is powerful but not universal. Here is when it does NOT work:

Problem	Why DP fails	What to use instead
Longest simple path	No optimal substructure (vertex-reuse constraint breaks it)	NP-hard; use backtracking with pruning
Graph coloring	No natural subproblem decomposition	NP-hard; use constraint satisfaction
Problems with exponential state space	DP table does not fit in memory (e.g., TSP with n>25)	Branch-and-bound, approximation algorithms
Online problems	Cannot look ahead; subproblems depend on future inputs	Online algorithms, competitive analysis

DP as a Mindset

The real value of studying CLRS Chapter 15 is not memorizing the rod cutting formula or the LCS table. It is learning to think in subproblems. Every time you face an optimization problem, ask yourself:

1. What are the subproblems?

What is the "state" that I need to track? For rod cutting: the remaining length. For knapsack: the item index and remaining capacity. For LCS: the two string positions.

↓

2. What are the choices at each step?

What decision do I make? Cut here or not. Take this item or skip it. Match these characters or skip one.

↓

3. How do the choices connect subproblems?

This IS the recurrence. Each choice transforms the current state into a smaller state. The recurrence says: "the value of this state equals the best choice, where each choice leads to smaller states whose values I already know."

If you can answer these three questions, you can solve any DP problem. The rest is implementation.

Summary: Key Results from This Lesson

Result	Significance
Rod cutting: O(2ⁿ) → O(n²)	Exponential to polynomial via memoization
Matrix chain: O(4ⁿ/n^3/2) → O(n³)	Catalan-number brute force to cubic DP
LCS: O(2^m·2ⁿ) → O(m·n)	Foundation of diff, git merge, DNA alignment
Edit distance: O(3^m) → O(m·n)	Foundation of spell checking, fuzzy search
0/1 Knapsack: O(2ⁿ) → O(n·W)	Pseudo-polynomial; NP-hard but practically efficient
Tree DP: O(2ⁿ) → O(n)	NP-hard on general graphs, linear on trees

Every one of these speedups comes from the same insight: solve each subproblem once, store the result, look it up next time. The table size is the number of unique subproblems. The fill time is the number of subproblems times the work per subproblem. Everything else is details.

The gap between exponential and polynomial is not academic. For n=50, 2ⁿ ≈ 10¹⁵ operations would take a modern computer about 11 days at 10⁹ ops/sec. n² = 2500 operations takes microseconds. For n=100, the exponential version would outlast the heat death of the universe. The DP version finishes before your IDE can refresh. Dynamic programming does not just make things faster — it makes impossible problems trivial.

That is the power of thinking in subproblems. And that is what CLRS Chapter 15 teaches.

Bellman's legacy. Dynamic programming is arguably the single most important algorithmic technique in both theoretical computer science and practical engineering. It underlies everything from compiler optimization (register allocation) to computational biology (sequence alignment) to financial engineering (option pricing via backward induction) to artificial intelligence (reinforcement learning). When you master DP, you have not just learned one algorithm — you have learned a way of thinking about optimization that applies across every domain.

Final concept check: Fractional knapsack (you can take fractions of items) is solvable by a greedy algorithm in O(n log n) time. But 0/1 knapsack requires DP. What specific property does fractional knapsack have that 0/1 knapsack lacks?

The greedy choice property: taking the highest value-per-weight item first is always part of an optimal solution. In fractional knapsack, if the best item does not fit entirely, take a fraction — there is never a case where taking less of the best item would allow a better combination. In 0/1 knapsack, this fails because you MUST take the whole item or nothing. Taking a high-ratio item might waste capacity that a combination of lower-ratio items could fill more efficiently Fractional knapsack has no overlapping subproblems so DP is unnecessary Fractional knapsack is a continuous problem and DP only works on discrete problems

	ε	s	i	t	t	i	n	g
ε	0	1	2	3	4	5	6	7
k	1	1	2	3	4	5	6	7
i	2	2	1	2	3	4	5	6
t	3	3	2	1	2	3	4	5
t	4	4	3	2	1	2	3	4
e	5	5	4	3	2	2	3	4
n	6	6	5	4	3	3	2	3

	ε	s	i	t	t	i	n	g
ε	0	1	2	3	4	5	6	7
k	1	1	2	3	4	5	6	7
i	2	2	1	2	3	4	5	6
t	3	3	2	1	2	3	4	5
t	4	4	3	2	1	2	3	4
e	5	5	4	3	2	2	3	4
n	6	6	5	4	3	3	2	3

	ε	s	i	t	t	i	n	g
ε	0	1	2	3	4	5	6	7
k	1	1	2	3	4	5	6	7
i	2	2	1	2	3	4	5	6
t	3	3	2	1	2	3	4	5
t	4	4	3	2	1	2	3	4
e	5	5	4	3	2	2	3	4
n	6	6	5	4	3	3	2	3