Skiena, Chapter 8

Dynamic Programming

Fibonacci, edit distance, longest increasing subsequence, and the partition problem. How caching subproblem results transforms exponential algorithms into polynomial ones.

Prerequisites: Chapters 1-3 (recursion, arrays). That's it.
10
Chapters
5+
Simulations
10
Quizzes

Chapter 0: The Trick

The most challenging algorithmic problems involve optimization -- finding a solution that maximizes or minimizes some function. Greedy algorithms are fast but often wrong. Exhaustive search is correct but exponentially slow.

Dynamic programming combines the best of both worlds. It systematically searches all possibilities (guaranteeing correctness) while storing results to avoid recomputing (guaranteeing efficiency).

The trick in one sentence: If your recursive algorithm computes the same subproblems over and over, store the answers in a table. Look them up instead of recomputing them. That's it. That's dynamic programming.

Dynamic programming is a technique for efficiently implementing a recursive algorithm by storing partial results. The recipe:

Step 1: Recurrence
Write a correct recursive algorithm. Don't worry about speed yet.
Step 2: Overlapping subproblems?
Does the recursion recompute the same things? If yes, DP helps.
Step 3: Cache or table
Store results in a table indexed by subproblem parameters. Look up before computing.
Step 4: Evaluation order
Fill the table bottom-up so dependencies are always ready when needed.

DP is best suited for optimization problems on objects with an inherent left-to-right order: strings, sequences, rooted trees, polygons.

When does dynamic programming help?

Chapter 1: Fibonacci -- The Classic Example

The Fibonacci numbers are defined by the recurrence F(n) = F(n-1) + F(n-2), with F(0) = 0 and F(1) = 1. The series goes 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...

The naive recursive implementation is elegant but disastrously slow:

python
def fib_r(n):
    if n <= 1: return n
    return fib_r(n - 1) + fib_r(n - 2)

This takes exponential time. Why? The recursion tree branches at every call. F(4) is computed on both sides of F(6). F(2) is computed five times just for F(6). Since F(n+1)/F(n) ≈ 1.618 (the golden ratio), computing F(n) requires at least 1.6n calls.

The recursion tree reveals the waste. To compute F(6) = 8, the naive algorithm makes 25 function calls. To compute F(45), it takes over 7 minutes. You could compute it faster by hand. The tree has exponential nodes because it recomputes the same values over and over.
Fibonacci Recursion Tree

See how many times each F(k) is computed. Warm = first computation, dimmed = redundant recomputation.

Without caching: 25 calls for F(6). Toggle to see the difference.

The picture with caching is dramatic. The recursion tree collapses from 25 nodes to just 11 -- each F(k) is computed exactly once and then looked up. This is the essence of dynamic programming: pay once, look up forever.

Why does the naive recursive Fibonacci take exponential time?

Chapter 2: Caching vs Dynamic Programming

There are two ways to fix the Fibonacci disaster:

Memoization (top-down caching): Keep the recursion but check a cache before computing. If the answer is already stored, return it. Otherwise, compute, store, and return.

python
cache = {}
def fib_c(n):
    if n in cache: return cache[n]
    if n <= 1: cache[n] = n
    else: cache[n] = fib_c(n-1) + fib_c(n-2)
    return cache[n]

True DP (bottom-up): Eliminate recursion entirely. Fill a table from small values to large, in an order that guarantees dependencies are ready.

python
def fib_dp(n):
    f = [0] * (n + 1)
    f[1] = 1
    for i in range(2, n + 1):
        f[i] = f[i-1] + f[i-2]
    return f[n]

Both run in O(n) time. But true DP can be further optimized: since F(n) depends only on the last two values, we can use O(1) space instead of O(n).

ApproachTimeSpaceStyle
Naive recursionO(1.6n)O(n) stackElegant but fatal
MemoizationO(n)O(n)Easy to add to existing recursion
Bottom-up DPO(n)O(n) or O(1)Most efficient, requires understanding evaluation order
Skiena's take-home lesson: Explicit caching of recursive call results provides most of the benefits of dynamic programming, including usually the same running time. If you prefer extra programming to more subtle thinking, memoization is fine. But true DP gives you the chance to optimize space.
What is the difference between memoization and bottom-up dynamic programming?

Chapter 3: Binomial Coefficients

The binomial coefficient C(n, k) counts the ways to choose k items from n. It is defined by the recurrence from Pascal's triangle:

C(n, k) = C(n-1, k-1) + C(n-1, k)

with base cases C(n, 0) = C(n, n) = 1. Why does this work? Consider whether the nth element is in the chosen subset. If yes, pick k-1 more from n-1 items. If no, pick all k from n-1 items. No overlap, all cases covered.

Computing C(n, k) via the factorial formula n!/(k!(n-k)!) is dangerous -- the intermediate factorials easily overflow, even when the final answer fits in an integer. The DP approach fills Pascal's triangle directly:

python
def binomial(n, k):
    C = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        C[i][0] = 1
        if i <= k: C[i][i] = 1
    for i in range(2, n + 1):
        for j in range(1, min(i, k + 1)):
            C[i][j] = C[i-1][j-1] + C[i-1][j]
    return C[n][k]
Evaluation order matters. Each cell depends on two cells in the previous row. We fill row by row, left to right. By the time we need C[i-1][j-1] and C[i-1][j], they are already computed. Getting the evaluation order right is the entire art of dynamic programming.
Why is computing C(n, k) via n!/(k!(n-k)!) numerically dangerous?

Chapter 4: Edit Distance -- The Problem

How different are two strings? The edit distance between strings P and T is the minimum number of character operations (insert, delete, substitute) to transform P into T.

OperationExample
Substitution"shot" → "spot" (replace h with p)
Insertion"ago" → "agog" (insert g)
Deletion"hour" → "our" (delete h)

Think about the last characters of each string. What could have happened?

Match/Substitute
Align P[i] with T[j]. Cost 0 if they match, 1 if they don't. Then solve for P[1..i-1] and T[1..j-1].
Insert
T[j] has no match in P. We "insert" it. Cost 1. Then solve for P[1..i] and T[1..j-1].
Delete
P[i] has no match in T. We "delete" it. Cost 1. Then solve for P[1..i-1] and T[1..j].
D[i, j] = min(D[i-1, j-1] + match(i,j), D[i, j-1] + 1, D[i-1, j] + 1)

The recursive version branches three ways at every position. Since most calls only reduce one index, the running time is at least 3n -- exponential. Computing edit distance between two 11-character strings takes several seconds recursively. Anything longer disappears into oblivion.

What are the three possible operations on the last characters when computing edit distance?

Chapter 5: Edit Distance by DP

The edit distance recurrence has only |P| × |T| distinct subproblems (all (i, j) pairs). The recursive version recomputes them exponentially many times. The fix: store results in a 2D table.

python
def edit_distance(s, t):
    m, n = len(s), len(t)
    D = [[0] * (n+1) for _ in range(m+1)]
    for i in range(m+1): D[i][0] = i  # delete all of s
    for j in range(n+1): D[0][j] = j  # insert all of t
    for i in range(1, m+1):
        for j in range(1, n+1):
            match = 0 if s[i-1] == t[j-1] else 1
            D[i][j] = min(
                D[i-1][j-1] + match,  # match/sub
                D[i][j-1] + 1,          # insert
                D[i-1][j] + 1           # delete
            )
    return D[m][n]

This runs in O(|P| × |T|) time and space. Each cell is computed once, and each computation is O(1). The table is filled row by row, left to right -- the same order you'd read a book.

From exponential to polynomial in one step. The recursive version takes O(3n) time. The DP version takes O(nm). For two 1000-character strings, that's 106 vs 10477 operations. DP didn't just speed things up -- it made the impossible possible.

Reconstructing the alignment: Store a parent pointer in each cell recording which of the three choices was optimal. Trace back from D[m][n] to D[0][0] to reconstruct the actual edit operations.

The edit distance DP table has |P| x |T| cells. What does cell D[i][j] represent?

Chapter 6: Longest Increasing Subsequence

Given a sequence of numbers, find the longest subsequence that is strictly increasing. The subsequence doesn't need to be contiguous -- you can skip elements.

For example, in [3, 1, 4, 1, 5, 9, 2, 6], the LIS is [1, 4, 5, 9] or [1, 4, 5, 6] (length 4).

Define L[i] = the length of the longest increasing subsequence ending at position i. Then:

L[i] = 1 + max(L[j]) for all j < i where S[j] < S[i]
python
def lis(S):
    n = len(S)
    L = [1] * n  # every element is an LIS of length 1
    parent = [-1] * n
    for i in range(1, n):
        for j in range(i):
            if S[j] < S[i] and L[j] + 1 > L[i]:
                L[i] = L[j] + 1
                parent[i] = j
    return max(L)

This runs in O(n2) time. There exists a clever O(n log n) algorithm using binary search, but the O(n2) DP version illustrates the technique beautifully.

Tracing the actual subsequence: The parent array records, for each position, which earlier position extends to give the best LIS ending here. To reconstruct the subsequence, find the position with maximum L value, then follow parent pointers backward. This is the same trace-back technique used in edit distance and Floyd-Warshall.

LIS has surprising connections. It appears in patience sorting (a card game), the theory of Young tableaux (combinatorics), and the Erdos-Szekeres theorem (any sequence of length n2 + 1 contains a monotone subsequence of length n + 1).

Connection to edit distance: LIS can be solved as a special case of edit distance! Sort S to get T. The longest common subsequence of S and T must be an increasing subsequence of S (since T is sorted). This reduction gives the same O(n2) complexity.
What does L[i] represent in the LIS dynamic programming formulation?

Chapter 7: The Partition Problem

Given a set of n integers, can you partition them into two subsets with equal sum? This is a classic NP-complete problem, but DP gives a pseudopolynomial algorithm -- polynomial in the value of the sum, not just the number of elements.

Let S be the total sum. We want to know: is there a subset summing to S/2? Define P[i][j] = true if we can make sum j using a subset of the first i elements.

P[i][j] = P[i-1][j] or P[i-1][j - ai]

Either we exclude element ai (can we make sum j without it?) or include it (can we make sum j - ai without it?).

python
def can_partition(arr):
    total = sum(arr)
    if total % 2 != 0: return False
    target = total // 2
    dp = [False] * (target + 1)
    dp[0] = True
    for num in arr:
        for j in range(target, num - 1, -1):
            dp[j] = dp[j] or dp[j - num]
    return dp[target]
Pseudopolynomial: The algorithm runs in O(n · S) time, where S is the sum. This is polynomial in S but not in the number of bits needed to represent S. If S is exponentially large in n, this is slow. But for reasonable values, it's very practical. This is why the partition problem is "weakly" NP-complete -- hard in theory, often tractable in practice.
The partition problem DP runs in O(n x S) time. Why is this called "pseudopolynomial"?

Chapter 8: The DP Table Explorer

Watch the edit distance DP table being filled in cell by cell. See how each cell depends on its three neighbors (top, left, diagonal) and how the optimal alignment is traced back through parent pointers.

Edit Distance Table

Type two strings and click "Compute" to watch the DP table fill. Arrows show the optimal alignment path.

Enter two strings and click Compute
What to notice: Each cell looks at three neighbors: diagonally above-left (match/substitute), directly left (insert), directly above (delete). The minimum of these three determines the cell's value. The highlighted path from bottom-right to top-left traces the optimal alignment.
In the edit distance table, cell D[i][j] depends on which cells?

Chapter 9: Connections

Dynamic programming is arguably the most broadly useful algorithm design technique. Here is how it connects forward and backward:

ConceptWhere It Leads
Overlapping subproblemsThe key insight distinguishing DP from divide-and-conquer (which has no overlap)
Edit distanceDNA sequence alignment, spell-checking, diff utilities
Partition problemChapter 9: Pseudopolynomial algorithms for weakly NP-complete problems
Floyd-WarshallChapter 6: DP on graphs for all-pairs shortest paths
DAG shortest pathsChapter 5: DP on DAGs follows topological order
TSP via DPO(n2 2n) -- much better than n!, but still exponential for this NP-hard problem
Skiena's take-home lessons from Chapter 8:
• Dynamic programming is a technique for efficiently implementing recursive algorithms by storing partial results. Start with a correct recursion, then speed it up.
• Caching (memoization) is the easiest way to get DP benefits. Store results indexed by the recursive call parameters.
• The trick is recognizing overlapping subproblems. If different recursive calls compute the same thing, DP helps. If every call is unique (quicksort, DFS), caching is useless.
• DP is generally the right method for optimization problems on left-to-right objects: strings, sequences, polygons, rooted trees.
• Understanding the evaluation order is essential: fill the table so that every cell's dependencies are already computed.
When does caching NOT help a recursive algorithm?