Rod cutting, LCS, edit distance, knapsack — the algorithm design paradigm that dominates coding interviews.
You run a steel company. A customer delivers a rod of length n inches, and you need to cut it into shorter pieces to sell. Different lengths fetch different prices. A 1-inch piece sells for $1, a 2-inch piece for $5, a 3-inch piece for $8, and so on. Your job: figure out where to cut to maximize total revenue.
Sounds simple. A rod of length 4 can be cut in many ways: leave it whole (4 inches), cut into 1+3, cut into 2+2, cut into 1+1+2, or cut into 1+1+1+1. Each way produces different revenue. You want the best one.
Here is the catch. A rod of length n has n-1 possible cut positions. At each position, you either cut or you do not. That gives 2n-1 possible ways to cut the rod. For n=20, that is over half a million. For n=40, it is over a trillion. Brute force is hopeless.
But look more carefully. Suppose the optimal solution for a rod of length 7 begins with a cut at position 3. Then the remaining piece of length 4 must ALSO be cut optimally. Why? Proof by contradiction: if you could cut the 4-inch remainder better, you would replace the suboptimal cuts with the better ones and get a higher total revenue for the full 7-inch rod — contradicting the assumption that the original solution was optimal.
This argument works for any problem where "locally optimal sub-solutions" compose into "globally optimal solutions." It has a name, and it is the foundation of everything in this lesson.
Let us write the most natural solution. To find the maximum revenue r(n) for a rod of length n, try every possible first cut of length i (from 1 to n). For each cut, collect the price pi for that piece and recursively solve the remaining rod of length n-i. Take the maximum over all choices.
Base case: r(0) = 0 (a zero-length rod earns nothing).
This formula is correct but devastatingly slow. Let us count the recursive calls. r(n) calls r(n-1), r(n-2), ..., r(0). Each of those makes their own recursive calls. The total satisfies:
Exponential! But how many UNIQUE subproblems are there? Only n+1: r(0), r(1), ..., r(n). The recursion solves the same subproblems over and over. The simulation below makes this painfully visible.
Adjust rod length and watch the exponential explosion. Nodes in teal are computed for the first time. Nodes in red are redundant recomputations. Count the waste.
For n=4, the naive recursion makes 8 calls but only deals with 4 unique subproblems (r(0), r(1), r(2), r(3)). For n=5 it makes 16 calls for 5 unique subproblems. For n=7, 64 calls for 7 unique ones. The total calls double with each increment of n: 2n-1. The unique subproblems grow linearly: n.
This is the gap that dynamic programming exploits. Instead of solving r(2) over and over, solve it once, store the answer, and look it up for free every subsequent time. That is the entire idea.
Before we solve rod cutting with DP, let us see the idea on the simplest possible example. The Fibonacci sequence: F(0)=0, F(1)=1, F(n) = F(n-1) + F(n-2).
Fibonacci is not an optimization problem (there is no "best" Fibonacci number), so it does not have "optimal substructure" in the formal sense. But it beautifully illustrates overlapping subproblems. Rod cutting adds the optimization dimension: instead of adding, we maximize. The principle is the same.
To appreciate how dramatic the DP speedup is, let us compare the work for increasing n:
| Rod length n | Naive calls (2n-1) | DP comparisons (n(n+1)/2) | Speedup |
|---|---|---|---|
| 5 | 16 | 15 | ~1x |
| 10 | 512 | 55 | ~9x |
| 20 | 524,288 | 210 | ~2,500x |
| 30 | 536,870,912 | 465 | ~1,155,000x |
| 40 | ~5.5 × 1011 | 820 | ~670,000,000x |
| 100 | ~6.3 × 1029 | 5,050 | ~1.2 × 1026x |
At n=100, the naive approach would need more operations than there are atoms in a human body. The DP does it in 5,050 comparisons. This is not a theoretical curiosity — it is the difference between "computable" and "physically impossible."
Let us make this concrete. For the rod cutting problem:
| Ingredient | Rod Cutting | How to verify |
|---|---|---|
| Optimal substructure | If the best rod-7 solution cuts at position 3, the remaining rod-4 must be cut optimally | Cut-and-paste argument: swap a suboptimal sub-solution with the optimal one to improve the whole |
| Overlapping subproblems | r(2) is needed when computing r(3), r(4), r(5), r(6), and r(7) | Draw the recursion tree and look for repeated nodes |
The recursion tree in Chapter 0 showed the disease: redundant recomputation. Now here is the cure — and it is embarrassingly simple.
Keep the exact same recursive structure. Change nothing about the logic. Just add a table. Before computing r(k), check: "Have I already computed this?" If yes, return the stored answer immediately. If no, compute it, store it in the table, and return it. Done.
This technique is called memoization (not memorization — no 'r'). The word comes from "memo," as in writing yourself a note so you do not forget.
We will use this price table throughout the lesson. It comes directly from CLRS and is the standard reference example:
| Length i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Price pi | 1 | 5 | 8 | 9 | 10 | 17 | 17 | 20 | 24 | 30 |
Notice something interesting: p3=8 is less than 2·p2=10. Two 2-inch pieces ($5+$5=$10) are worth more than one 3-inch piece ($8). This means the "do not cut" option is NOT always optimal. You need to consider all possibilities — which is exactly what the DP does.
Also notice that the prices are not proportional to length. The price-per-inch varies wildly:
| Length | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| $/inch | 1.00 | 2.50 | 2.67 | 2.25 | 2.00 | 2.83 | 2.43 | 2.50 | 2.67 | 3.00 |
If prices grew linearly with length (constant $/inch), the optimal solution would always be "do not cut." The non-linear pricing is what makes the problem interesting and what makes DP necessary. A greedy "always cut at the best $/inch" strategy does not work because it does not account for how the remaining pieces can be cut.
We want r(4). The recurrence says try all first cuts i=1,2,3,4:
We need r(3), r(2), r(1), r(0). With memoization, we compute each exactly once. Let us trace the calls in order. The recursive function is called with k=4. It calls r(3) first.
Optimal revenue: $10. Achieved by cutting into two pieces of length 2 ($5 + $5 = $10). The whole rod (length 4) only sells for $9. Two halves beat the whole.
Count the work: we computed r(0), r(1), r(2), r(3), r(4) — five subproblems. For r(k), we tried k values of i. Total comparisons: 0+1+2+3+4 = 10. The naive recursion would have made 23 = 8 calls, many redundant. For larger n, the savings are astronomical.
The simulation below shows this visually. On the left: the full naive recursion tree (every call executed). On the right: the memoized version where pruned (cached) calls are dimmed gray.
Left: naive (all calls executed). Right: memoized (gray nodes = cache hit, never recomputed). Watch the pruning increase as n grows.
python def memoized_rod_cut(p, n): """Top-down memoized rod cutting. p: list of prices, p[i] = price for length i+1 (0-indexed) n: rod length Returns: maximum revenue""" memo = [-1] * (n + 1) memo[0] = 0 # base case: zero-length rod = $0 def helper(k): if memo[k] >= 0: return memo[k] # cache hit — O(1) lookup best = 0 for i in range(1, k + 1): best = max(best, p[i - 1] + helper(k - i)) memo[k] = best # store for future lookups return best return helper(n) # CLRS price table prices = [1, 5, 8, 9, 10, 17, 17, 20, 24, 30] print(memoized_rod_cut(prices, 4)) # 10 (cut: 2+2) print(memoized_rod_cut(prices, 7)) # 18 (cut: 1+6 or 2+2+3) print(memoized_rod_cut(prices, 10)) # 30 (don't cut: p[10]=30)
Time complexity: O(n2). There are n+1 subproblems. Each subproblem r(k) iterates over k choices. Total: 0+1+2+...+n = n(n+1)/2 = O(n2).
Space complexity: O(n) for the memo table + O(n) for the recursion call stack (maximum depth is n when the recursive chain is r(n)→r(n-1)→...→r(0)).
sys.setrecursionlimit, but each recursive call adds ~50 bytes of stack frame overhead. For n=100000, that is 5 MB of stack. Bottom-up tabulation (next chapter) avoids this entirely — zero recursion, zero stack overhead.Python has a built-in memoization decorator that makes top-down DP trivially easy:
python from functools import lru_cache prices = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30] @lru_cache(maxsize=None) def rod_cut(n): if n == 0: return 0 return max(prices[i] + rod_cut(n - i) for i in range(1, n + 1)) print(rod_cut(7)) # 18 print(rod_cut.cache_info()) # CacheInfo(hits=21, misses=8, maxsize=None, currsize=8)
@lru_cache(maxsize=None) creates an unlimited-size dictionary cache. The cache_info() method shows cache hit statistics — useful for verifying your DP is actually avoiding recomputation. For interview whiteboard coding, always write the manual memo array version. But for production code and competitive programming, @lru_cache is the standard Python idiom.
Some problems have more than one state variable. For example, the 0/1 knapsack has two: item index and remaining capacity. You can memoize with a 2D array or a tuple key:
python @lru_cache(maxsize=None) def knapsack_memo(i, w): """Top-down memoized 0/1 knapsack. i: consider items 0..i w: remaining capacity""" if i < 0 or w <= 0: return 0 wi, vi = items[i] # Skip item i skip = knapsack_memo(i - 1, w) # Take item i (if it fits) take = vi + knapsack_memo(i - 1, w - wi) if wi <= w else 0 return max(skip, take) items = [(2,3), (3,4), (4,5), (5,6)] print(knapsack_memo(len(items)-1, 8)) # 10
The cache key is the tuple (i, w). There are n·W unique keys, so the cache has at most n·W entries. This exactly matches the bottom-up table size.
Memoization starts at the top (the problem you want to solve) and works downward, caching along the way. Bottom-up tabulation flips the order: start from the smallest subproblems and build upward. By the time you need r(k), every subproblem r(0), r(1), ..., r(k-1) is already sitting in the table, ready to use.
No recursion. No function call overhead. No stack depth worries. Just a clean, simple for-loop filling a table from left to right.
Think of it this way. Memoization is like exploring a maze and leaving breadcrumbs so you never revisit a dead end. Bottom-up tabulation is like having a map of the entire maze before you start — you fill in the solution for every room, beginning with the ones closest to the exit.
Prices: p = [1, 5, 8, 9, 10, 17, 17, 20, 24, 30]. We build r[0..7] and s[1..7].
Solution reconstruction: r[7]=18, s[7]=1 means the first cut is at length 1. Remaining length: 7-1=6. s[6]=6 means do not cut the 6-inch piece. Solution: 1+6 = $1+$17 = $18.
But the 4-way tie at j=7 means there are multiple optimal solutions: 1+6, 2+2+3, 3+2+2, 6+1. All yield $18. The s-table just records the first one found.
Watch the DP table fill left to right. Each cell shows its value being computed from previously filled cells. Green highlight = currently computing. Click Step to advance one comparison, or Play for continuous animation.
python def bottom_up_rod_cut(p, n): """Bottom-up rod cutting with solution reconstruction. p: list of prices (0-indexed: p[0] = price for length 1) n: rod length Returns: (max_revenue, list_of_cut_lengths)""" r = [0] * (n + 1) # r[j] = optimal revenue for rod of length j s = [0] * (n + 1) # s[j] = first piece length in optimal solution for j for j in range(1, n + 1): best = -1 for i in range(1, j + 1): if p[i - 1] + r[j - i] > best: best = p[i - 1] + r[j - i] s[j] = i r[j] = best # Reconstruct the actual cuts cuts = [] remaining = n while remaining > 0: cuts.append(s[remaining]) remaining -= s[remaining] return r[n], cuts prices = [1, 5, 8, 9, 10, 17, 17, 20, 24, 30] revenue, cuts = bottom_up_rod_cut(prices, 7) print(f"Revenue: ${revenue}, Cuts: {cuts}") # Revenue: $18, Cuts: [1, 6] revenue, cuts = bottom_up_rod_cut(prices, 10) print(f"Revenue: ${revenue}, Cuts: {cuts}") # Revenue: $30, Cuts: [10] (don't cut!)
The s-table (the cut table) is crucial. Without it, you know the OPTIMAL VALUE but not the OPTIMAL SOLUTION. Here is how reconstruction works for rod cutting:
python # After bottom-up computation, s[j] = length of first piece in optimal solution for rod of length j def reconstruct_cuts(s, n): """Follow the s-table to recover all cuts.""" cuts = [] remaining = n while remaining > 0: first_piece = s[remaining] cuts.append(first_piece) remaining -= first_piece # Now solve the remaining subproblem return cuts # For our example with n=7: # s = [0, 1, 2, 3, 2, 2, 6, 1] # reconstruct_cuts(s, 7): # remaining=7, s[7]=1, cuts=[1], remaining=6 # remaining=6, s[6]=6, cuts=[1,6], remaining=0 # → [1, 6] → revenue = $1 + $17 = $18 ✓
This pattern — maintain a "choice" table during the forward pass, trace it backwards during reconstruction — applies to EVERY DP problem. For knapsack: record which items were taken. For LCS: record which characters matched. For edit distance: record which operation was chosen. The forward pass computes values. The backward pass recovers decisions.
For rod cutting, the full table is already O(n) — there is nothing to optimize. But for 2D problems like knapsack or LCS, the table is O(m×n) and we often only need the previous row. The "rolling row" trick keeps two rows instead of m+1, reducing space dramatically.
The trade-off: with only two rows, you lose the ability to trace back (you no longer have the full table). If you need BOTH the optimal value AND the optimal solution in reduced space, you need Hirschberg's technique — run the DP forward from the start and backward from the end, use the meeting point to divide the problem in half, and recurse. This gives O(min(m,n)) space with O(m·n) time for full reconstruction.
How do you determine the correct fill order for a new DP problem? The rule is simple: every value you READ must have been WRITTEN already.
| Problem type | Dependencies | Fill order |
|---|---|---|
| 1D (rod cutting) | dp[j] depends on dp[0..j-1] | Left to right (j=1,2,...,n) |
| 2D sequence (LCS) | dp[i][j] depends on dp[i-1][j-1], dp[i-1][j], dp[i][j-1] | Row by row, left to right |
| 2D knapsack | dp[i][w] depends on dp[i-1][w] and dp[i-1][w-wi] | Row by row (any direction within row) |
| Interval (matrix chain) | dp[i][j] depends on dp[i][k] and dp[k+1][j] for i≤k<j | Diagonally (shortest intervals first) |
| 1D backward (sell stock) | dp[i] depends on dp[i+1] | Right to left |
| Tree DP | Parent depends on children | Post-order (leaves first, root last) |
If you are unsure, draw the dependency arrows for a small example. The fill order is any topological order of those arrows. For most problems, the natural left-to-right / top-to-bottom / small-to-large order works.
You have seen rod cutting solved two ways. Now let us extract the general method. Every dynamic programming solution follows the same four steps. Master this recipe and you can attack any DP problem — even ones you have never seen.
Not every optimization problem is a DP problem. Two conditions must both hold. The simulation below helps you classify any new problem.
| Condition | What it means | How to verify | Counterexample |
|---|---|---|---|
| Optimal substructure | Optimal solution contains optimal sub-solutions | Cut-and-paste proof by contradiction | Longest simple path (not optimal substructure because of vertex-reuse constraint) |
| Overlapping subproblems | Same subproblems recur in the recursion tree | Draw the recursion tree — count repeated nodes | Merge sort (subproblems always fresh — no overlaps) |
Consider this graph with 4 vertices and edges: A-B, B-C, C-D, A-C, B-D.
The problem is that subproblems are NOT independent. The "simple" constraint couples them: using a vertex in one subpath removes it from others. In DP, subproblems must be independent — the solution to one cannot affect the feasibility of another. Shortest paths DO have this property (subpaths of shortest paths are shortest), but longest simple paths do not.
For interviews, you should be able to prove optimal substructure for any DP problem. The template is always the same. Let us demonstrate with LCS:
The structure is always: (1) assume the whole solution is optimal, (2) suppose a sub-solution is NOT optimal, (3) replace it with the optimal sub-solution, (4) show this improves the whole solution, (5) contradiction. This works for rod cutting (replace non-optimal remainder), knapsack (replace non-optimal sub-selection), edit distance (replace non-optimal sub-alignment), and every other DP problem.
A decision flowchart for classifying optimization problems.
| Paradigm | Optimal substructure? | Overlapping subproblems? | How it works | Examples |
|---|---|---|---|---|
| Dynamic Programming | Yes | Yes | Solve each subproblem once, cache results | Rod cutting, LCS, knapsack |
| Greedy | Yes | Irrelevant | Make the locally best choice, never revisit | Activity selection, Huffman, Dijkstra |
| Divide-and-Conquer | Yes | No | Split, recurse on independent subproblems, combine | Merge sort, Strassen's, closest pair |
Think of it as a hierarchy. Greedy is the simplest: one choice per step, no table. Divide-and-conquer adds recursion on multiple subproblems. DP adds caching to handle overlap. If greedy works, use it (faster, simpler). If subproblems are independent, use divide-and-conquer. If subproblems overlap, you need DP.
| Property | Top-Down (Memoization) | Bottom-Up (Tabulation) |
|---|---|---|
| Order of computation | Recursive — solves subproblems on demand | Iterative — you choose the fill order |
| Subproblems solved | Only those reachable from the original problem | ALL subproblems in the table |
| Overhead | Function calls, stack frames | None beyond the loop |
| Space optimization | Harder (random access pattern) | Easier (known access pattern allows rolling rows) |
| Debugging | Harder (call stack is implicit) | Easier (print the table, verify each cell) |
| Implementation | Add a cache to the recursive solution | Rewrite as nested loops |
| When to prefer | Sparse subproblem spaces, tree-shaped dependencies | Dense spaces, when you need space optimization |
Multiplying two matrices A (p×q) and B (q×r) costs p·q·r scalar multiplications. The result is a p×r matrix. If you have a chain of matrices A1·A2·A3·...·An, the order you parenthesize them does not change the result (matrix multiplication is associative) — but it can change the cost by orders of magnitude.
Three matrices: A1 is 10×30, A2 is 30×5, A3 is 5×60.
Same final matrix, but option 1 is 6× cheaper. The difference grows with more matrices. For a chain of n matrices, the number of possible parenthesizations is the (n-1)th Catalan number, which grows as Ω(4n/n3/2). Brute-forcing all parenthesizations is not feasible.
Suppose the optimal parenthesization of A1..An splits at position k: (A1..Ak)·(Ak+1..An). Then the parenthesization of A1..Ak within this solution MUST be optimal for A1..Ak alone. Why? Cut-and-paste: if it were not optimal, replace it with the optimal one and the whole solution improves — contradiction.
Store matrix dimensions in an array p[0..n] where matrix Ai has dimensions p[i-1]×p[i]. Define m[i,j] = minimum number of scalar multiplications to compute Ai·Ai+1·...·Aj.
For each possible split point k, the cost is: (cost of left half) + (cost of right half) + (cost of multiplying the two results). We minimize over all choices of k.
Unlike rod cutting (which fills a 1D array left-to-right), matrix chain fills a 2D table diagonally. We compute chains of increasing length: length 1 (base cases on the main diagonal), then length 2, then length 3, and so on. Each entry m[i,j] depends on entries m[i,k] and m[k+1,j] for k in [i,j) — all of which are shorter chains, hence already computed.
A1(30×35), A2(35×15), A3(15×5), A4(5×10). Dimensions array: p = [30, 35, 15, 5, 10].
Optimal parenthesization: split at k=3 means ((A1..A3) · A4). For A1..A3, split at k=1 means (A1 · (A2·A3)). Full: ((A1·(A2·A3))·A4), cost 9375.
Enter matrix dimensions separated by commas (e.g., "30,35,15,5,10" for 4 matrices). Watch the table fill diagonal by diagonal. Green = currently computing. Teal = completed.
python def matrix_chain_order(p): """Find optimal parenthesization for matrix chain. p: list of dimensions [p0, p1, ..., pn] where Ai is p[i-1] x p[i] Returns: (m, s) — cost table and split table""" n = len(p) - 1 # number of matrices m = [[0] * (n + 1) for _ in range(n + 1)] s = [[0] * (n + 1) for _ in range(n + 1)] for L in range(2, n + 1): # L = chain length for i in range(1, n - L + 2): j = i + L - 1 m[i][j] = float('inf') for k in range(i, j): # try all split points cost = m[i][k] + m[k+1][j] + p[i-1]*p[k]*p[j] if cost < m[i][j]: m[i][j] = cost s[i][j] = k return m, s def print_parens(s, i, j): """Reconstruct optimal parenthesization.""" if i == j: return f"A{i}" k = s[i][j] left = print_parens(s, i, k) right = print_parens(s, k + 1, j) return f"({left} x {right})" p = [30, 35, 15, 5, 10] m, s = matrix_chain_order(p) print(f"Min cost: {m[1][4]}") # 9375 print(print_parens(s, 1, 4)) # ((A1 x (A2 x A3)) x A4)
Time complexity: O(n3). Three nested loops — chain length L (n iterations), start index i (up to n), split point k (up to n). Space: O(n2) for the two n×n tables.
The triple loop in matrix chain is the most confusing part for beginners. Let us break it down with concrete numbers for n=4 matrices:
The key insight: L controls which diagonal of the table we fill. Within each diagonal, i controls which cell. Within each cell, k controls which split we try. Every split references entries from shorter diagonals (already filled). This is why the order works.
Matrix chain has many disguises in interviews. One common one: "Given n piles of stones with sizes a[1]..a[n], merge adjacent piles. The cost of merging two adjacent piles is their combined size. Find the minimum total cost to merge all piles into one."
This is EXACTLY matrix chain with a different cost function. The subproblem m[i,j] = minimum cost to merge piles i through j. The split point k divides into (merge i..k) + (merge k+1..j) + (cost of combining the two result piles = sum of a[i] through a[j]). Same O(n3) DP.
Given two strings, find the longest subsequence common to both. A subsequence is formed by deleting zero or more characters from a string WITHOUT changing the order of the remaining characters. It is NOT the same as a substring (which must be contiguous characters).
Example: X = "ABCBDAB", Y = "BDCAB". One common subsequence is "BCAB" (length 4). Another is "BDA" (length 3). The longest common subsequence (LCS) has length 4. Finding it by brute force means checking all 2m subsequences of X against all 2n subsequences of Y — exponential in both dimensions.
LCS is everywhere in real software:
| Application | What are X and Y? | What does LCS tell you? |
|---|---|---|
| diff/patch | Lines of two file versions | Unchanged lines (the LCS); changed lines are everything else |
| Git merge | Two branch versions of a file | Common base for three-way merge |
| DNA alignment | Two gene sequences (A, C, G, T) | Conserved regions across species |
| Spell checking | Mistyped word vs dictionary word | Similarity measure (related to edit distance) |
Let X = x1x2...xm and Y = y1y2...yn. Let Z = z1z2...zk be any LCS of X and Y. Three cases:
In every case, the LCS of the full strings contains the LCS of shorter strings. Optimal substructure holds.
Define c[i,j] = length of the LCS of X[1..i] and Y[1..j].
Let us fill the entire 8×6 table:
| ε | B | D | C | A | B | |
|---|---|---|---|---|---|---|
| ε | 0 | 0 | 0 | 0 | 0 | 0 |
| A | 0 | 0 | 0 | 0 | 1 | 1 |
| B | 0 | 1 | 1 | 1 | 1 | 2 |
| C | 0 | 1 | 1 | 2 | 2 | 2 |
| B | 0 | 1 | 1 | 2 | 2 | 3 |
| D | 0 | 1 | 2 | 2 | 2 | 3 |
| A | 0 | 1 | 2 | 2 | 3 | 3 |
| B | 0 | 1 | 2 | 2 | 3 | 4 |
LCS length = c[7,5] = 4. Traceback from c[7,5]: x7=B=y5 (match, go diagonal). c[6,4]: x6=A=y4 (match, go diagonal). c[5,3]: x5=D≠C, c[4,3]=2≥c[5,2]=1, go up. c[4,3]: x4=B≠C, c[3,3]=2≥c[4,2]=1, go up. c[3,3]: x3=C=y3 (match, go diagonal). c[2,2]: x2=B≠D, c[1,2]=0<c[2,1]=1, go left. c[2,1]: x2=B=y1 (match, go diagonal). c[1,0]=0, stop.
Matches collected (in reverse): B, A, C, B → LCS = "BCAB".
Type two strings and watch the DP table fill. Orange cells = match (diagonal arrow). The traceback path reconstructs the LCS.
python def lcs(X, Y): """Compute LCS length and string with traceback.""" m, n = len(X), len(Y) c = [[0] * (n + 1) for _ in range(m + 1)] # Fill table for i in range(1, m + 1): for j in range(1, n + 1): if X[i-1] == Y[j-1]: c[i][j] = c[i-1][j-1] + 1 # match else: c[i][j] = max(c[i-1][j], c[i][j-1]) # Traceback to reconstruct the LCS result = [] i, j = m, n while i > 0 and j > 0: if X[i-1] == Y[j-1]: result.append(X[i-1]) i -= 1; j -= 1 # go diagonal elif c[i-1][j] >= c[i][j-1]: i -= 1 # go up else: j -= 1 # go left return c[m][n], ''.join(reversed(result)) length, subseq = lcs("ABCBDAB", "BDCAB") print(f"LCS length: {length}, LCS: {subseq}") # LCS length: 4, LCS: BCAB
Time: O(m·n). Space: O(m·n) for the full table. Can be reduced to O(min(m,n)) if you only need the LENGTH (keep only two rows), but then you lose the ability to reconstruct the actual LCS. For traceback in O(min(m,n)) space, use Hirschberg's algorithm — a divide-and-conquer trick that runs the LCS forward and backward to find the midpoint of the LCS in linear space.
The Unix diff utility compares two files line by line. It treats each LINE as a single "character" and computes the LCS of the two line sequences. Lines in the LCS are "unchanged." Lines not in the LCS of the old file are "deleted." Lines not in the LCS of the new file are "inserted."
Real diff implementations use the Myers diff algorithm, which finds the shortest edit script in O((m+n)·D) time where D is the edit distance. For similar files (small D), this is much faster than O(m·n). But the core idea is still LCS.
python def lcs_length_only(X, Y): """O(min(m,n)) space LCS — returns length only, no traceback.""" if len(X) < len(Y): X, Y = Y, X # iterate over shorter string for space savings n = len(Y) prev = [0] * (n + 1) curr = [0] * (n + 1) for i in range(1, len(X) + 1): for j in range(1, n + 1): if X[i-1] == Y[j-1]: curr[j] = prev[j-1] + 1 else: curr[j] = max(prev[j], curr[j-1]) prev, curr = curr, [0] * (n + 1) return prev[n] print(lcs_length_only("ABCBDAB", "BDCAB")) # 4
The LCS is not always unique. "ABCBDAB" and "BDCAB" have multiple LCSs of length 4: "BCAB" and "BDAB." When ties occur in the max(c[i-1][j], c[i][j-1]) step, both branches lead to valid LCSs. To enumerate all of them:
python def all_lcs(X, Y): """Find ALL longest common subsequences.""" m, n = len(X), len(Y) c = [[0] * (n + 1) for _ in range(m + 1)] for i in range(1, m + 1): for j in range(1, n + 1): if X[i-1] == Y[j-1]: c[i][j] = c[i-1][j-1] + 1 else: c[i][j] = max(c[i-1][j], c[i][j-1]) def backtrack(i, j): if i == 0 or j == 0: return {""} if X[i-1] == Y[j-1]: return {s + X[i-1] for s in backtrack(i-1, j-1)} results = set() if c[i-1][j] >= c[i][j-1]: results |= backtrack(i-1, j) if c[i][j-1] >= c[i-1][j]: results |= backtrack(i, j-1) return results return backtrack(m, n) print(all_lcs("ABCBDAB", "BDCAB")) # {'BCAB', 'BDAB'}
The number of distinct LCSs can be exponential in the worst case. This backtracking is only practical for small inputs or when you know the number of solutions is manageable.
| Variant | Modification | Complexity |
|---|---|---|
| Longest Common Substring | Characters must be CONTIGUOUS. Reset to 0 on mismatch instead of taking max(up, left) | O(m·n) but can use suffix arrays for O(m+n) |
| Shortest Common Supersequence | Shortest string containing both X and Y as subsequences. Length = m + n - LCS_length | O(m·n) |
| Number of distinct subsequences | Count how many ways to form Y as a subsequence of X. dp[i][j] = dp[i-1][j] + (dp[i-1][j-1] if match) | O(m·n) |
| Edit distance | As seen in Chapter 7. Distance = m + n - 2·LCS_length | O(m·n) |
Time to put everything together. This is the big interactive simulation — a multi-problem DP solver that lets you watch three classic algorithms work step by step, with full control over speed and parameters.
Select a problem, configure parameters, and watch the DP table fill cell by cell. Step advances one cell. Play animates continuously. Speed slider controls animation rate.
Given n items, each with weight wi and value vi, and a knapsack with capacity W. Maximize total value without exceeding W. Each item is included whole or not at all — no fractions (hence "0/1").
Subproblem: dp[i][w] = maximum value achievable using items 1..i with knapsack capacity w.
In words: for each item, choose the better of (1) skipping it (keep dp[i-1][w] — best value without this item at this capacity) or (2) taking it (its value vi plus the best value for the remaining capacity w-wi without this item, which is dp[i-1][w-wi]).
Time: O(n·W). Space: O(n·W), reducible to O(W) with a single-row optimization. To do the single-row trick: iterate w from W DOWN to wi (backwards!) so you do not accidentally use the current row's values (which would allow taking the same item twice).
Items: (weight=2, value=3), (weight=3, value=4), (weight=4, value=5), (weight=5, value=6). Knapsack capacity W=8.
Answer: maximum value = $10. Achieved by taking items 2 (w=3,v=4) and 4 (w=5,v=6), total weight=8=W exactly.
python def knapsack_01(items, W): """0/1 Knapsack with solution reconstruction. items: list of (weight, value) tuples W: knapsack capacity Returns: (max_value, selected_items)""" n = len(items) dp = [[0] * (W + 1) for _ in range(n + 1)] for i in range(1, n + 1): wi, vi = items[i - 1] for w in range(W + 1): if wi <= w: dp[i][w] = max(dp[i-1][w], vi + dp[i-1][w-wi]) else: dp[i][w] = dp[i-1][w] # Reconstruct: trace back to find which items were taken selected = [] w = W for i in range(n, 0, -1): if dp[i][w] != dp[i-1][w]: # item i was taken selected.append(i) w -= items[i-1][0] return dp[n][W], selected[::-1] items = [(2,3), (3,4), (4,5), (5,6)] val, sel = knapsack_01(items, 8) print(f"Max value: {val}, Items: {sel}") # Max value: 10, Items: [2, 4]
Since dp[i] only depends on dp[i-1], we can keep just two rows (or even one row with a backwards loop):
python def knapsack_01_optimized(items, W): """O(W) space knapsack — single row, backwards iteration.""" dp = [0] * (W + 1) for wi, vi in items: # CRITICAL: iterate backwards to avoid using current-row values for w in range(W, wi - 1, -1): dp[w] = max(dp[w], vi + dp[w - wi]) return dp[W] print(knapsack_01_optimized([(2,3), (3,4), (4,5), (5,6)], 8)) # 10
Given coin denominations d1, d2, ..., dk and a target amount A, find the minimum number of coins needed. Unlimited supply of each denomination.
For each amount a from 1 to A, try subtracting each coin denomination. The one that leads to the fewest remaining coins wins. If no coin fits, dp[a] = ∞ (impossible).
Time: O(A·k) where k is the number of denominations. Space: O(A).
Coins: [1, 3, 4]. Target amount: 6.
Minimum coins for amount 6 = 2 (using two 3-cent coins). Notice the greedy approach (always use the largest coin first) would give {4,1,1} = 3 coins — greedy fails! This is why coin change needs DP.
python def coin_change(coins, amount): """Minimum number of coins to make amount.""" dp = [float('inf')] * (amount + 1) dp[0] = 0 parent = [-1] * (amount + 1) # for reconstruction for a in range(1, amount + 1): for c in coins: if c <= a and dp[a - c] + 1 < dp[a]: dp[a] = dp[a - c] + 1 parent[a] = c # record which coin was used # Reconstruct the coins used if dp[amount] == float('inf'): return -1, [] used = [] a = amount while a > 0: used.append(parent[a]) a -= parent[a] return dp[amount], used count, coins_used = coin_change([1, 3, 4], 6) print(f"Min coins: {count}, Used: {coins_used}") # Min coins: 2, Used: [3, 3]
Given an array of numbers, find the longest strictly increasing subsequence. For [10, 9, 2, 5, 3, 7, 101, 18], the LIS is [2, 3, 7, 101] or [2, 5, 7, 101] — both have length 4.
For each element i, find the longest increasing subsequence ENDING at i by checking all previous elements j where a[j] < a[i]. The overall LIS is max(dp[0], dp[1], ..., dp[n-1]).
Array: [10, 9, 2, 5, 3, 7, 101, 18].
python def lis_dp(arr): """O(n^2) LIS with reconstruction.""" n = len(arr) dp = [1] * n parent = [-1] * n for i in range(1, n): for j in range(i): if arr[j] < arr[i] and dp[j] + 1 > dp[i]: dp[i] = dp[j] + 1 parent[i] = j # Find the index with max dp value best_idx = dp.index(max(dp)) # Trace back result = [] idx = best_idx while idx != -1: result.append(arr[idx]) idx = parent[idx] return max(dp), result[::-1] length, seq = lis_dp([10, 9, 2, 5, 3, 7, 101, 18]) print(f"LIS length: {length}, LIS: {seq}") # LIS length: 4, LIS: [2, 5, 7, 101]
Time: O(n2) for the DP approach. Can be improved to O(n log n) with patience sorting — maintain a "tails" array where tails[k] is the smallest tail element of any increasing subsequence of length k+1. For each new element, binary search for its position in the tails array. This is the preferred interview solution for large inputs.
python import bisect def lis_binary(arr): """O(n log n) LIS using patience sorting.""" tails = [] for x in arr: pos = bisect.bisect_left(tails, x) if pos == len(tails): tails.append(x) # extend the longest sequence else: tails[pos] = x # improve an existing sequence return len(tails) print(lis_binary([10, 9, 2, 5, 3, 7, 101, 18])) # 4
Here is a step-by-step template that works for virtually every DP interview question:
Given a set of positive integers, is there a subset that sums to a target T? This is the boolean version of knapsack and appears constantly as a sub-problem in harder DP problems.
python def subset_sum(nums, target): """Can a subset of nums sum to target?""" dp = [False] * (target + 1) dp[0] = True for num in nums: # Iterate backwards (0/1 knapsack pattern) for s in range(target, num - 1, -1): dp[s] = dp[s] or dp[s - num] return dp[target] print(subset_sum([3, 34, 4, 12, 5, 2], 9)) # True (4+5 or 3+4+2) print(subset_sum([3, 34, 4, 12, 5, 2], 30)) # False
Subset sum is used as a subroutine in "Partition Equal Subset Sum" (can you split an array into two subsets with equal sum? — just check if a subset sums to total/2) and "Target Sum" (assign + or - to each number to hit a target).
The classic climbing stairs (1 or 2 steps) is just Fibonacci. But with variable step sizes [1, 3, 5], it becomes coin change without the "minimize" — just count the ways:
python def climb_stairs(n, steps): """Number of ways to climb n stairs with given step sizes.""" dp = [0] * (n + 1) dp[0] = 1 # one way to stay at ground: do nothing for i in range(1, n + 1): for s in steps: if i >= s: dp[i] += dp[i - s] return dp[n] print(climb_stairs(4, [1, 2])) # 5 (Fibonacci!) print(climb_stairs(10, [1, 3, 5])) # 47
If your DP gives wrong answers during an interview, here is the 30-second diagnostic checklist:
| Check | Common bug | Fix |
|---|---|---|
| Base cases | dp[0] = 0 when it should be 1, or vice versa | Trace the smallest input by hand |
| Loop bounds | Off-by-one: range(n) vs range(n+1) | Verify first and last iterations match the recurrence |
| Fill direction | Reading dp[j] before it is computed | Print the table for a small input and check each cell |
| State definition | Ambiguous "what does dp[i] mean?" | Write the EXACT English meaning of dp[i] as a comment |
| Reconstruction | Using dp table for reconstruction instead of separate choice table | Keep a separate table recording WHICH choice led to each optimal value |
How many single-character operations does it take to transform one string into another? This question has a name — the edit distance (also called Levenshtein distance) — and it is one of the most practical DP problems in all of computer science. Your spell checker uses it. Your diff tool uses it. DNA alignment tools use a weighted version of it. Even fuzzy search engines like Elasticsearch use it under the hood.
The three allowed operations, each costing 1:
| Operation | What it does | Example |
|---|---|---|
| Insert | Add a character at any position | "cat" → "cart" |
| Delete | Remove a character from any position | "cart" → "cat" |
| Replace | Change one character to another | "cat" → "cut" |
Transform "kitten" into "sitting":
But how do we know 3 is optimal? Maybe there is a way to do it in 2 operations? The DP table proves no shorter sequence of operations exists.
Define d[i,j] = edit distance between X[1..i] and Y[1..j].
The [xi ≠ yj] notation means: cost 0 if the characters match (no operation needed), cost 1 if they differ (replace). A match is essentially a "free diagonal step."
| ε | s | i | t | t | i | n | g | |
|---|---|---|---|---|---|---|---|---|
| ε | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| k | 1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| i | 2 | 2 | 1 | 2 | 3 | 4 | 5 | 6 |
| t | 3 | 3 | 2 | 1 | 2 | 3 | 4 | 5 |
| t | 4 | 4 | 3 | 2 | 1 | 2 | 3 | 4 |
| e | 5 | 5 | 4 | 3 | 2 | 2 | 3 | 4 |
| n | 6 | 6 | 5 | 4 | 3 | 3 | 2 | 3 |
d[6,7] = 3. Three operations needed. Let us verify this is optimal by considering alternatives:
Let us trace back through the table to find the exact operations:
The key operations are: replace 'k'→'s', match i,t,t, replace 'e'→'i', match n, insert 'g'. Total non-free operations: 3.
Type two strings and watch the edit distance table fill. Color coding: green = match (free), orange = replace, blue = insert/delete. The optimal alignment path is highlighted.
| Application | What are the "strings"? | Cost model |
|---|---|---|
| Spell checking | Typed word vs dictionary entries | Uniform costs; suggest all words within distance ≤ 2 |
| DNA alignment | Gene sequences (A,C,G,T) | Weighted: transitions (A↔G) cheaper than transversions (A↔C) |
| diff/patch | Lines of code as "characters" | Lines are units; LCS gives unchanged lines, rest are edits |
| Fuzzy search | User query vs database records | Threshold: return records within distance k |
| OCR correction | Recognized text vs dictionary | Weighted by visual similarity: 'O'↔'0' is cheaper than 'O'↔'Z' |
| Plagiarism detection | Document sentences | Sentence-level alignment; low distance = suspicious |
python def edit_distance(X, Y): """Compute Levenshtein distance between two strings.""" m, n = len(X), len(Y) d = [[0] * (n + 1) for _ in range(m + 1)] # Base cases: transforming to/from empty string for i in range(m + 1): d[i][0] = i for j in range(n + 1): d[0][j] = j for i in range(1, m + 1): for j in range(1, n + 1): cost = 0 if X[i-1] == Y[j-1] else 1 d[i][j] = min( d[i-1][j] + 1, # delete x_i d[i][j-1] + 1, # insert y_j d[i-1][j-1] + cost # replace (or free match) ) return d[m][n] print(edit_distance("kitten", "sitting")) # 3 print(edit_distance("sunday", "saturday")) # 3 print(edit_distance("horse", "ros")) # 3
Time: O(m·n). Space: O(m·n), reducible to O(min(m,n)) since each row only depends on the previous row.
The standard Levenshtein distance treats all operations as cost 1. But in practice, costs are often weighted:
| Variant | Modification | Use case |
|---|---|---|
| Damerau-Levenshtein | Add "transpose" operation (swap two adjacent characters, cost 1) | Spell checking (typos often transpose: "teh" → "the") |
| Weighted edit distance | Different costs for different operations/characters | DNA alignment (transition mutations A↔G cost less than transversions A↔C) |
| Hamming distance | Only substitutions allowed (strings must be same length) | Error-correcting codes, binary string comparison |
| Jaro-Winkler | Different formula emphasizing matching characters and their order | Record linkage, deduplication (names, addresses) |
The DP approach handles all weighted variants — just change the costs in the min() computation. The structure of the table and traceback remain identical.
Knowing the distance is useful, but often you need the ACTUAL sequence of operations. Here is the complete code with operation reconstruction:
python def edit_distance_with_ops(X, Y): """Edit distance with full operation traceback.""" m, n = len(X), len(Y) d = [[0] * (n + 1) for _ in range(m + 1)] ops = [[''] * (n + 1) for _ in range(m + 1)] for i in range(m + 1): d[i][0] = i ops[i][0] = 'D' # delete for j in range(n + 1): d[0][j] = j ops[0][j] = 'I' # insert ops[0][0] = '' for i in range(1, m + 1): for j in range(1, n + 1): if X[i-1] == Y[j-1]: d[i][j] = d[i-1][j-1] ops[i][j] = 'M' # match (free) else: choices = [ (d[i-1][j] + 1, 'D'), # delete x_i (d[i][j-1] + 1, 'I'), # insert y_j (d[i-1][j-1] + 1, 'R') # replace ] d[i][j], ops[i][j] = min(choices) # Trace back to find operations result = [] i, j = m, n while i > 0 or j > 0: op = ops[i][j] if op == 'M': result.append(f"match '{X[i-1]}'") i -= 1; j -= 1 elif op == 'R': result.append(f"replace '{X[i-1]}' with '{Y[j-1]}'") i -= 1; j -= 1 elif op == 'D': result.append(f"delete '{X[i-1]}'") i -= 1 elif op == 'I': result.append(f"insert '{Y[j-1]}'") j -= 1 return d[m][n], result[::-1] dist, operations = edit_distance_with_ops("kitten", "sitting") print(f"Distance: {dist}") for op in operations: print(f" {op}") # Distance: 3 # replace 'k' with 's' # match 'i' # match 't' # match 't' # replace 'e' with 'i' # match 'n' # insert 'g'
This is a popular LeetCode problem (LC #72). Let us trace it completely:
| ε | r | o | s | |
|---|---|---|---|---|
| ε | 0 | 1 | 2 | 3 |
| h | 1 | 1 | 2 | 3 |
| o | 2 | 2 | 1 | 2 |
| r | 3 | 2 | 2 | 2 |
| s | 4 | 3 | 3 | 2 |
| e | 5 | 4 | 4 | 3 |
d[5,3] = 3. Operations: replace 'h'→'r', match 'o', delete 'r', match 's', delete 'e'. But wait, a simpler reading: delete 'h', match 'o'... let us trace properly from d[5,3]=3. d[5,3] came from d[4,2]+1=3+... Actually the min of d[4,3]+1=3, d[5,2]+1=5, d[4,2]+[e≠s]=3+1=4. So d[5,3]=3 via delete. d[4,3]=2 via d[3,2]+[s=s]=2+0=2, match. d[3,2]=2 via d[2,2]+1=2 (delete r), or d[2,1]+1=3, or d[2,1]+[r≠o]=... d[3,2] came from d[2,1]+[r≠o]=2+1=3, d[3,1]+1=3, d[2,2]+1=2. Via delete. d[2,2]=1 via d[1,1]+[o=o? no, o≠r]=1+1=2, or d[1,2]+1=3, or d[2,1]+1=3. Wait, d[2,2]: x2='o', y2='o', match! d[1,1]+0=1. So d[2,2]=1 via match. d[1,1]=1 via d[0,0]+[h≠r]=0+1=1, replace. Operations: replace h→r, match o, delete r, match s, delete e. 3 operations confirmed.
Since each row d[i] only depends on d[i-1], we can use two rows instead of m+1 rows, reducing space from O(m·n) to O(n). This is critical when comparing long sequences (e.g., DNA with millions of base pairs):
python def edit_distance_space_opt(X, Y): """O(min(m,n)) space edit distance.""" if len(X) < len(Y): X, Y = Y, X # ensure Y is shorter (iterate over shorter dimension) m, n = len(X), len(Y) prev = list(range(n + 1)) # previous row curr = [0] * (n + 1) for i in range(1, m + 1): curr[0] = i for j in range(1, n + 1): cost = 0 if X[i-1] == Y[j-1] else 1 curr[j] = min(prev[j] + 1, curr[j-1] + 1, prev[j-1] + cost) prev, curr = curr, prev # swap rows return prev[n] print(edit_distance_space_opt("kitten", "sitting")) # 3
A closely related problem: given a string s and a pattern p with '?' (matches any single character) and '*' (matches any sequence including empty), determine if s matches p. This is string DP disguised as pattern matching.
python def wildcard_match(s, p): """Wildcard pattern matching: ? = any char, * = any sequence.""" m, n = len(s), len(p) dp = [[False] * (n + 1) for _ in range(m + 1)] dp[0][0] = True # Handle leading *s in pattern for j in range(1, n + 1): if p[j-1] == '*': dp[0][j] = dp[0][j-1] for i in range(1, m + 1): for j in range(1, n + 1): if p[j-1] == '*': dp[i][j] = dp[i][j-1] or dp[i-1][j] elif p[j-1] == '?' or p[j-1] == s[i-1]: dp[i][j] = dp[i-1][j-1] return dp[m][n] print(wildcard_match("adceb", "*a*b")) # True print(wildcard_match("acdcb", "a*c?b")) # False
The '*' case is the tricky one. dp[i][j-1] means '*' matches empty (skip it). dp[i-1][j] means '*' matches s[i] and potentially more characters (the '*' stays active). This is elegant because it naturally handles '*' matching any number of characters through the chain of dp[i-1][j] ← dp[i-2][j] ← ... lookups.
So far, every DP problem has used a 1D array or a 2D table. But dynamic programming is far more general than that. It works on ANY structure whose dependency graph is a DAG (directed acyclic graph). Trees are the most natural example: the value at a node depends only on the values of its children.
You are building a binary search tree for a fixed set of keys k1 < k2 < ... < kn. Each key has a known search probability pi (how often it is searched for). You want the BST that minimizes the expected search cost: the sum of (depth + 1) × pi for each key.
If all keys are equally likely, a balanced BST is optimal. But when probabilities are skewed, you want frequently-accessed keys near the root, even at the cost of pushing rare keys deeper. This is exactly how Huffman coding works for prefix codes, but for BSTs the constraint is different: the search-tree property must hold.
The subproblem is identical in structure to matrix-chain multiplication. Define e[i,j] = minimum expected search cost for a BST containing keys ki..kj. Try each key kr (i ≤ r ≤ j) as the root.
Consider keys {A, B, C} with search probabilities pA=0.7, pB=0.2, pC=0.1. A balanced BST puts B at the root (depth 1), A left (depth 2), C right (depth 2). Expected cost = 0.7×2 + 0.2×1 + 0.1×2 = 1.8.
But putting A at the root (depth 1), B right (depth 2), C right-right (depth 3): cost = 0.7×1 + 0.2×2 + 0.1×3 = 1.4. Much better! The most frequently accessed key should be near the root, even at the cost of making the tree unbalanced.
Finding the optimal arrangement is the optimal BST problem:
where w(i,j) = pi + pi+1 + ... + pj is the sum of frequencies in the range. The w(i,j) term arises because every node in the subtree drops one level when placed under root r, adding its probability to the cost. Time: O(n3), Space: O(n2).
This is the classic "company party" problem. A company has a tree-shaped hierarchy (each employee has exactly one direct boss, except the CEO at the root). Each employee has a "fun rating." You want to invite a subset of employees to a party that maximizes the total fun, subject to one rule: no employee can attend if their direct boss also attends.
This is an independent set problem on a tree (NP-hard on general graphs, but polynomial on trees thanks to DP!).
For each node v, define two values:
Process bottom-up via post-order traversal. Leaves first, then their parents, up to the root. For leaf nodes: include[leaf] = fun[leaf], exclude[leaf] = 0. The answer for the whole company is max(include[root], exclude[root]).
python class TreeNode: def __init__(self, val, children=None): self.val = val self.children = children or [] def max_independent_set(root): """Maximum weight independent set on a tree. Returns: (max_value, set_of_selected_nodes)""" def dp(node): """Returns (include_value, exclude_value)""" if not node.children: return node.val, 0 # leaf: include=val, exclude=0 incl = node.val excl = 0 for child in node.children: c_incl, c_excl = dp(child) incl += c_excl # if we include node, must exclude children excl += max(c_incl, c_excl) # if we exclude node, children are free return incl, excl incl, excl = dp(root) return max(incl, excl) # Build the example tree from the diagram tree = TreeNode(8, [ TreeNode(3, [TreeNode(7), TreeNode(2, [TreeNode(5)])]), TreeNode(4, [TreeNode(6), TreeNode(1)]) ]) print(max_independent_set(tree)) # 27
The time complexity of this DFS is O(n) — each node is visited exactly once. The space is O(n) for the recursion stack (O(h) where h is tree height, worst case O(n) for a skewed tree).
This same pattern — define include[v] and exclude[v] for each node, compute via post-order DFS — solves many tree DP problems:
| Problem | include[v] = | exclude[v] = |
|---|---|---|
| Max independent set | val[v] + sum(exclude[child]) | sum(max(incl, excl) for child) |
| House Robber III (LeetCode) | Same as above (binary tree version) | Same as above |
| Min vertex cover | 1 + sum(min(incl, excl) for child) | sum(include[child]) — all children MUST be covered |
| Min dominating set | More complex: 3 states (dominated-included, dominated-excluded, not-dominated) | Requires careful case analysis |
A tree where each node has a "fun value." Watch the post-order traversal compute include/exclude values. Orange nodes = selected for the party. I=include value, E=exclude value.
The longest path problem is NP-hard on general graphs (it subsumes the Hamiltonian path problem). But on a DAG, it is solvable in O(V+E) time using DP over a topological sort.
Process vertices in topological order. For each vertex v, look at all incoming edges and take the maximum. This is fundamentally DP: the "table" is indexed by vertices, the "fill order" is topological order, and each vertex depends only on its predecessors.
| Problem | Nodes (subproblems) | Edges (dependencies) | DAG shape |
|---|---|---|---|
| Rod cutting | r(0), r(1), ..., r(n) | r(j) depends on r(0) through r(j-1) | Linear chain with extra back-edges — O(n2) edges total |
| LCS | c[i][j] for 0≤i≤m, 0≤j≤n | Each cell depends on up to 3 neighbors | Grid DAG with diagonal, up, and left edges |
| Matrix chain | m[i][j] for 1≤i≤j≤n | m[i][j] depends on O(n) entries on same row/column | Upper triangle of a grid, filled diagonally |
| Fibonacci | F(0), F(1), ..., F(n) | F(k) depends on F(k-1) and F(k-2) | Linear chain — simplest possible DAG |
| Tree party | incl[v], excl[v] for each node v | Parent depends on children | The tree itself is the DAG |
This DAG perspective explains WHY the DP works: we are simply computing values in topological order on the dependency graph. It also explains WHY circular dependencies are forbidden: a cycle in the DAG would mean a subproblem depends on itself, which is undefined.
Consider a DAG with vertices A, B, C, D, E and weighted edges A→B(3), A→C(6), B→D(4), B→E(1), C→D(2), C→E(7), D→E(5). The longest path from A to E:
This is exactly the same as any other DP: define state (vertex), write recurrence (max over incoming edges), fill in topological order. The only difference is the "table" is indexed by graph vertices instead of integers or pairs.
Consider this tree (node values in parentheses):
Post-order traversal visits: 7, 5, 2, 3, 6, 1, 4, 8.
Time complexity of tree DP: O(n) where n is the number of nodes. Each node is visited exactly once in post-order, and the work at each node is proportional to its number of children. The sum of children counts over all nodes is n-1 (a tree with n nodes has n-1 edges). So total work is O(n).
This is your reference sheet. Every classic DP problem, categorized by structure, with recurrence, complexity, and interview framing. Print this. Memorize it. Use it.
| Category | Problems | Table shape | Key pattern |
|---|---|---|---|
| 1D Linear | Rod cutting, Coin change, LIS, House robber, Climbing stairs, Fibonacci, Decode ways, Word break | dp[i] | Each state depends on a fixed number of previous states |
| 2D Grid/Sequence | LCS, Edit distance, 0/1 Knapsack, Unique paths, Minimum path sum, Regex matching, Interleaving strings | dp[i][j] | Two indices (two sequences, or items × capacity) |
| Interval | Matrix chain, Optimal BST, Palindrome partitioning, Burst balloons, Strange printer | dp[i][j] diagonal fill | Subproblem = contiguous range, try all split points |
| Tree/DAG | Tree diameter, Max independent set, House robber III, Binary tree cameras | Values per node | Post-order traversal, include/exclude at each node |
| State machine | Buy/sell stock (I through IV), String transformations | dp[i][state] | At each step, transition between K finite states |
| Bitmask | TSP, Task assignment, Hamiltonian path, Set partition | dp[mask][i] | Track visited subset as bitmask, O(2n·n) |
| Problem | Recurrence | Time | Space | Key trick |
|---|---|---|---|---|
| Rod cutting | r[n]=max{p[i]+r[n-i]} | O(n2) | O(n) | 1D, try all first-cut lengths |
| Coin change | dp[a]=min{dp[a-d]+1} | O(Ak) | O(A) | 1D, unbounded supply |
| 0/1 Knapsack | max(skip, take) | O(nW) | O(nW)→O(W) | Backwards iteration for 1-row |
| LCS | match:diag+1, else:max(up,left) | O(mn) | O(mn)→O(min) | Hirschberg for linear space |
| Edit distance | min(del, ins, rep) | O(mn) | O(mn)→O(min) | d = m+n-2L (relates to LCS) |
| Matrix chain | min over split k | O(n3) | O(n2) | Diagonal fill order |
| LIS | dp[i]=1+max{dp[j]:a[j]<a[i]} | O(n2)→O(n log n) | O(n) | Patience sort + binary search |
| Optimal BST | min over root r + freq sum | O(n3)→O(n2) | O(n2) | Knuth's optimization |
| House robber | max(dp[i-1], dp[i-2]+v[i]) | O(n) | O(1) | Only need prev two values |
| Climbing stairs | dp[n]=dp[n-1]+dp[n-2] | O(n) | O(1) | Literally Fibonacci |
"Design an autocomplete with fuzzy matching."
Use edit distance with a BK-tree (ball tree for metric spaces). Index all dictionary words. Given a query, traverse the tree, pruning branches where the triangle inequality guarantees distance exceeds threshold. Reduces O(n) distance computations to O(n0.6) average. For real-time autocomplete, precompute all words within distance 1 and 2 of common prefixes.
"Design a recommendation engine using sequence alignment."
Model user behavior as a sequence of actions (viewed product A, searched for B, purchased C). Compute an LCS-variant similarity between users: users with similar action subsequences get similar recommendations. Use the O(min(m,n)) space optimization since you only need the similarity score, not the full alignment.
This family of problems is a favorite at FAANG interviews. The simplest version: given an array of stock prices, find the maximum profit from at most one buy-sell transaction.
For "at most k transactions," add a transaction counter dimension: dp[i][k][state]. For "with cooldown," add a "just_sold" state that prevents buying the next day. The state machine pattern handles all variants.
python def max_profit(prices): """Best Time to Buy and Sell Stock (single transaction).""" min_price = float('inf') max_profit = 0 for price in prices: min_price = min(min_price, price) max_profit = max(max_profit, price - min_price) return max_profit def max_profit_k(prices, k): """Best Time to Buy and Sell Stock with at most k transactions.""" n = len(prices) if k >= n // 2: # unlimited transactions return sum(max(0, prices[i+1]-prices[i]) for i in range(n-1)) dp = [[0] * n for _ in range(k + 1)] for t in range(1, k + 1): max_prev = -prices[0] for j in range(1, n): dp[t][j] = max(dp[t][j-1], prices[j] + max_prev) max_prev = max(max_prev, dp[t-1][j] - prices[j]) return dp[k][n-1] print(max_profit([7,1,5,3,6,4])) # 5 (buy at 1, sell at 6) print(max_profit_k([3,2,6,5,0,3], 2)) # 7 (buy 2 sell 6, buy 0 sell 3)
A row of houses, each with a value. Rob houses to maximize total value, but you cannot rob two adjacent houses (alarm will trigger). This is the simplest interesting DP problem and a great warm-up:
Skip house i (take dp[i-1]) or rob house i (take dp[i-2] + value[i], since we skip i-1). Only need the last two values, so O(1) space.
python def rob(nums): """House Robber — O(n) time, O(1) space.""" prev2, prev1 = 0, 0 for x in nums: prev2, prev1 = prev1, max(prev1, prev2 + x) return prev1 print(rob([2,7,9,3,1])) # 12 (rob houses 1,3: 7+9=... wait, 2+9+1=12)
Given a string s and a dictionary of words, determine if s can be segmented into space-separated dictionary words. "leetcode" with dictionary ["leet", "code"] returns True.
python def word_break(s, wordDict): """Can string s be segmented into dictionary words?""" words = set(wordDict) n = len(s) dp = [False] * (n + 1) dp[0] = True # empty string is valid for i in range(1, n + 1): for j in range(i): if dp[j] and s[j:i] in words: dp[i] = True break # found one valid segmentation, enough return dp[n] print(word_break("leetcode", ["leet", "code"])) # True print(word_break("applepenapple", ["apple", "pen"])) # True print(word_break("catsandog", ["cats","dog","sand","and","cat"])) # False
Time: O(n2) if word lookups are O(1) (hash set). Space: O(n). This is a 1D DP where dp[i] depends on all dp[j] for j < i — similar to LIS but with a string condition instead of a numerical comparison.
A robot on an m×n grid starts at top-left and can only move right or down. How many unique paths to the bottom-right? This is the simplest 2D DP and a great warm-up problem.
python def unique_paths(m, n): dp = [[1] * n for _ in range(m)] for i in range(1, m): for j in range(1, n): dp[i][j] = dp[i-1][j] + dp[i][j-1] return dp[m-1][n-1] print(unique_paths(3, 7)) # 28 print(unique_paths(3, 3)) # 6
Fun fact: the answer is the binomial coefficient C(m+n-2, m-1), computable in O(min(m,n)) without any DP. But the DP approach generalizes to grids with obstacles (set dp[i][j]=0 for obstacle cells), which the closed-form cannot handle.
Same grid, but each cell has a cost. Find the path from top-left to bottom-right that minimizes the total cost. Same recurrence, just replace addition with min:
python def min_path_sum(grid): """Minimum cost path from top-left to bottom-right.""" m, n = len(grid), len(grid[0]) dp = [[0] * n for _ in range(m)] dp[0][0] = grid[0][0] for j in range(1, n): dp[0][j] = dp[0][j-1] + grid[0][j] for i in range(1, m): dp[i][0] = dp[i-1][0] + grid[i][0] for i in range(1, m): for j in range(1, n): dp[i][j] = grid[i][j] + min(dp[i-1][j], dp[i][j-1]) return dp[m-1][n-1] print(min_path_sum([[1,3,1],[1,5,1],[4,2,1]])) # 7 (1→3→1→1→1)
This is one of the cleanest interview problems because the 2D DP is so intuitive. Common follow-up: "What if you can move up, down, left, and right?" Then it is no longer DP (cycles are possible) — you need Dijkstra's algorithm.
Many interview problems do not say "use DP." You have to recognize the pattern. Here are the tells:
| Tell | Why it suggests DP | Example problems |
|---|---|---|
| "Minimum/maximum cost to..." | Optimization = DP or greedy; if greedy fails, DP | Minimum path sum, minimum coins |
| "Number of ways to..." | Counting problems are DP (base case: 1 way to do nothing) | Climbing stairs, unique paths, decode ways |
| "Is it possible to..." | Often boolean DP: dp[i] = True if possible | Word break, partition equal subset sum |
| "Given two strings..." | Almost always 2D string DP | LCS, edit distance, regex matching |
| "Can you break this into subproblems?" | Interviewer is literally telling you to use DP | Any problem where they hint at subproblems |
Dynamic programming is not an isolated technique. It sits at the center of a web of algorithmic ideas. Understanding where DP ends and other paradigms begin makes you a fundamentally better problem solver.
| Paradigm | Optimal substructure? | Overlapping subproblems? | When to use | CLRS chapter |
|---|---|---|---|---|
| Dynamic Programming | Yes | Yes | Optimization with repeated sub-work | Ch 15 (this lesson) |
| Greedy | Yes | Irrelevant | When local optimality ⇒ global optimality | Ch 16 |
| Divide-and-Conquer | Yes | No | Independent subproblems (merge sort, FFT) | Ch 4 |
| Backtracking | Not required | Maybe | Constraint satisfaction, N-queens, Sudoku | — |
| Branch-and-Bound | Yes | Yes | When DP table is too large; use bounds to prune | — |
| Linear Programming | Continuous relaxation | N/A | Optimization over continuous variables with linear constraints | Ch 29 |
If you are studying ML (which, given this site, you probably are), DP is everywhere:
| ML Algorithm | DP Connection | Subproblem Structure |
|---|---|---|
| Viterbi (HMM decoding) | Longest path in a trellis DAG | Best path ending at state s at time t |
| CTC decoding | DP on a modified trellis | Best alignment score at position t with label l |
| Beam search | Pruned DP on a sequence DAG | Top-k partial sequences at each time step |
| Value iteration (RL) | IS bottom-up DP on the MDP state space | V(s) = max_a{R(s,a) + gamma * V(s')} |
| Smith-Waterman (bio) | Local alignment = edit distance variant | Best local alignment ending at (i,j) |
| CKY parsing (NLP) | Interval DP on parse trees | Best parse for words i..j under grammar rule |
CLRS Chapter 16: Greedy Algorithms. When DP is overkill. Activity selection, Huffman coding, and fractional knapsack are all solvable with a single greedy pass because the "greedy choice property" holds — you never need to revisit a decision. The contrast with 0/1 knapsack (which requires DP because greedy fails) is the most important dichotomy in algorithm design.
CLRS Chapter 4: Divide-and-Conquer. Merge sort and Strassen's matrix multiplication have optimal substructure but NO overlapping subproblems. The subproblems at each recursion level are always fresh partitions of the input. Memoization would not help because no subproblem is ever repeated. This is why divide-and-conquer leads to recurrence relations (solved by the master theorem), while DP leads to tables.
Graph algorithms. Shortest-path algorithms (Dijkstra, Bellman-Ford, Floyd-Warshall) are all DP in disguise.
Floyd-Warshall computes all-pairs shortest paths in O(V3) time. The DP formulation:
The subproblem d(k)[i][j] = shortest path from i to j using only vertices {1, 2, ..., k} as intermediates. At each step, we decide: does the shortest path from i to j go through vertex k (use it as a waypoint) or not?
This is structurally identical to 0/1 knapsack! Instead of "take or skip item k," it is "route through vertex k or not." The three nested loops (k, i, j) fill a 3D table, optimized to 2D because d(k) only depends on d(k-1). Note the k loop MUST be the outermost loop — this is a common interview trick question.
Why k must be outermost: d[i][j] at stage k needs d[i][k] and d[k][j] from stage k-1. If k were an inner loop, you would mix stages, using some values from the current k and others from a previous k. The outermost position ensures all d values are consistent within each stage.
python def floyd_warshall(graph): """All-pairs shortest paths. graph[i][j] = edge weight (inf if no edge).""" n = len(graph) d = [row[:] for row in graph] # copy for k in range(n): for i in range(n): for j in range(n): if d[i][k] + d[k][j] < d[i][j]: d[i][j] = d[i][k] + d[k][j] return d
Bellman-Ford computes single-source shortest paths, even with negative edge weights. The DP interpretation: d(k)[v] = shortest path from source s to v using at most k edges.
Run for k=1 to V-1 (since a shortest path in a graph with V vertices uses at most V-1 edges). This is bottom-up DP where the "subproblem size" is the number of edges allowed.
Here is how all the problems in this lesson connect to each other and to the broader algorithm landscape:
In competitive programming, DP problems are categorized by difficulty:
| Difficulty | What you need | Example |
|---|---|---|
| Easy (Div 2 A-B) | Recognize Fibonacci/climbing stairs pattern | Number of ways to reach step n |
| Medium (Div 2 C-D) | 2D table, correct recurrence, space optimization | Knapsack, LCS, edit distance |
| Hard (Div 2 E, Div 1 C) | Bitmask DP, Digit DP, DP with data structures | TSP, count numbers with certain digit properties |
| Expert (Div 1 D-E) | DP optimization (Knuth, D&C, convex hull trick) | O(n3) → O(n2) via Knuth's optimization |
For coding interviews, medium is the ceiling. You will almost never see bitmask DP or DP with advanced optimizations. Master the basics: 1D, 2D, and interval DP.
DP is one of those topics where reading about it is necessary but not sufficient. You need to solve problems. Here is a recommended progression:
| Week | Problems | Goal |
|---|---|---|
| Week 1 | Fibonacci, climbing stairs, house robber, maximum subarray | Build comfort with 1D DP. Identify the "state" and "transition" |
| Week 2 | Coin change, 0/1 knapsack, target sum, partition equal subset sum | Master the "take or skip" pattern and the backwards iteration trick |
| Week 3 | LCS, edit distance, wildcard matching, distinct subsequences | Master 2D string DP. Practice traceback reconstruction |
| Week 4 | Longest increasing subsequence, Russian doll envelopes, word break | Mix 1D and 2D. Practice the O(n log n) LIS with binary search |
| Week 5 | Buy/sell stock II-IV, matrix chain, burst balloons | State machine and interval DP. The hardest common patterns |
| Week 6 | House robber III, binary tree cameras, unique BST count | Tree DP. Write the DFS-based solutions from scratch |
After 6 weeks of daily practice (1-2 problems per day), you will recognize DP patterns instantly. The key is not memorizing solutions but understanding WHY each recurrence has the form it does. If you can derive the recurrence from the problem statement, you can solve any DP problem.
CLRS Chapter 15 is arguably the most interview-relevant chapter in the entire book. Every concept here — optimal substructure, overlapping subproblems, the 4-step recipe, memoization vs tabulation, solution reconstruction — appears in real interviews at every major tech company. The specific problems (rod cutting, matrix chain, LCS, optimal BST) are less important than the TECHNIQUE. Once you internalize the technique, you can solve problems you have never seen before.
As Bellman himself wrote in his autobiography: "I decided to use the word 'programming' to describe my work. I wanted to get across the idea that this was dynamic, this was multistage, this was time-varying... It's impossible to use the word 'dynamic' in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It's impossible."
The name stuck. And 70 years later, dynamic programming remains one of the most powerful tools in every programmer's arsenal — from coding interviews to cutting-edge AI research. Master it, and you will see optimization problems differently forever.
DP is powerful but not universal. Here is when it does NOT work:
| Problem | Why DP fails | What to use instead |
|---|---|---|
| Longest simple path | No optimal substructure (vertex-reuse constraint breaks it) | NP-hard; use backtracking with pruning |
| Graph coloring | No natural subproblem decomposition | NP-hard; use constraint satisfaction |
| Problems with exponential state space | DP table does not fit in memory (e.g., TSP with n>25) | Branch-and-bound, approximation algorithms |
| Online problems | Cannot look ahead; subproblems depend on future inputs | Online algorithms, competitive analysis |
The real value of studying CLRS Chapter 15 is not memorizing the rod cutting formula or the LCS table. It is learning to think in subproblems. Every time you face an optimization problem, ask yourself:
If you can answer these three questions, you can solve any DP problem. The rest is implementation.
| Result | Significance |
|---|---|
| Rod cutting: O(2n) → O(n2) | Exponential to polynomial via memoization |
| Matrix chain: O(4n/n3/2) → O(n3) | Catalan-number brute force to cubic DP |
| LCS: O(2m·2n) → O(m·n) | Foundation of diff, git merge, DNA alignment |
| Edit distance: O(3m) → O(m·n) | Foundation of spell checking, fuzzy search |
| 0/1 Knapsack: O(2n) → O(n·W) | Pseudo-polynomial; NP-hard but practically efficient |
| Tree DP: O(2n) → O(n) | NP-hard on general graphs, linear on trees |
Every one of these speedups comes from the same insight: solve each subproblem once, store the result, look it up next time. The table size is the number of unique subproblems. The fill time is the number of subproblems times the work per subproblem. Everything else is details.
The gap between exponential and polynomial is not academic. For n=50, 2n ≈ 1015 operations would take a modern computer about 11 days at 109 ops/sec. n2 = 2500 operations takes microseconds. For n=100, the exponential version would outlast the heat death of the universe. The DP version finishes before your IDE can refresh. Dynamic programming does not just make things faster — it makes impossible problems trivial.
That is the power of thinking in subproblems. And that is what CLRS Chapter 15 teaches.