Dynamic Programming — Skiena, Chapter 8

Chapter 0: The Trick

The most challenging algorithmic problems involve optimization -- finding a solution that maximizes or minimizes some function. Greedy algorithms are fast but often wrong. Exhaustive search is correct but exponentially slow.

Dynamic programming combines the best of both worlds. It systematically searches all possibilities (guaranteeing correctness) while storing results to avoid recomputing (guaranteeing efficiency).

The trick in one sentence: If your recursive algorithm computes the same subproblems over and over, store the answers in a table. Look them up instead of recomputing them. That's it. That's dynamic programming.

Dynamic programming is a technique for efficiently implementing a recursive algorithm by storing partial results. The recipe:

Step 1: Recurrence

Write a correct recursive algorithm. Don't worry about speed yet.

↓

Step 2: Overlapping subproblems?

Does the recursion recompute the same things? If yes, DP helps.

↓

Step 3: Cache or table

Store results in a table indexed by subproblem parameters. Look up before computing.

↓

Step 4: Evaluation order

Fill the table bottom-up so dependencies are always ready when needed.

DP is best suited for optimization problems on objects with an inherent left-to-right order: strings, sequences, rooted trees, polygons.

When does dynamic programming help?

When a recursive algorithm solves the same subproblems repeatedly -- storing results avoids redundant computation Always Only for sorting problems

Chapter 1: Fibonacci -- The Classic Example

The Fibonacci numbers are defined by the recurrence F(n) = F(n-1) + F(n-2), with F(0) = 0 and F(1) = 1. The series goes 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...

The naive recursive implementation is elegant but disastrously slow:

python
def fib_r(n):
    if n <= 1: return n
    return fib_r(n - 1) + fib_r(n - 2)

This takes exponential time. Why? The recursion tree branches at every call. F(4) is computed on both sides of F(6). F(2) is computed five times just for F(6). Since F(n+1)/F(n) ≈ 1.618 (the golden ratio), computing F(n) requires at least 1.6ⁿ calls.

The recursion tree reveals the waste. To compute F(6) = 8, the naive algorithm makes 25 function calls. To compute F(45), it takes over 7 minutes. You could compute it faster by hand. The tree has exponential nodes because it recomputes the same values over and over.

Fibonacci Recursion Tree

See how many times each F(k) is computed. Warm = first computation, dimmed = redundant recomputation.

Without caching: 25 calls for F(6). Toggle to see the difference.

The picture with caching is dramatic. The recursion tree collapses from 25 nodes to just 11 -- each F(k) is computed exactly once and then looked up. This is the essence of dynamic programming: pay once, look up forever.

Why does the naive recursive Fibonacci take exponential time?

Because the recursion tree branches into two calls at every level, and the same subproblems (like F(3), F(2)) are recomputed exponentially many times Because addition is slow Because Fibonacci numbers are large

Chapter 2: Caching vs Dynamic Programming

There are two ways to fix the Fibonacci disaster:

Memoization (top-down caching): Keep the recursion but check a cache before computing. If the answer is already stored, return it. Otherwise, compute, store, and return.

python
cache = {}
def fib_c(n):
    if n in cache: return cache[n]
    if n <= 1: cache[n] = n
    else: cache[n] = fib_c(n-1) + fib_c(n-2)
    return cache[n]

True DP (bottom-up): Eliminate recursion entirely. Fill a table from small values to large, in an order that guarantees dependencies are ready.

python
def fib_dp(n):
    f = [0] * (n + 1)
    f[1] = 1
    for i in range(2, n + 1):
        f[i] = f[i-1] + f[i-2]
    return f[n]

Both run in O(n) time. But true DP can be further optimized: since F(n) depends only on the last two values, we can use O(1) space instead of O(n).

Approach	Time	Space	Style
Naive recursion	O(1.6ⁿ)	O(n) stack	Elegant but fatal
Memoization	O(n)	O(n)	Easy to add to existing recursion
Bottom-up DP	O(n)	O(n) or O(1)	Most efficient, requires understanding evaluation order

Skiena's take-home lesson: Explicit caching of recursive call results provides most of the benefits of dynamic programming, including usually the same running time. If you prefer extra programming to more subtle thinking, memoization is fine. But true DP gives you the chance to optimize space.

What is the difference between memoization and bottom-up dynamic programming?

Memoization is top-down (recursive with caching); bottom-up DP fills a table iteratively in dependency order. Both avoid recomputation, but bottom-up can optimize space Memoization is always faster They are the same thing

Chapter 3: Binomial Coefficients

The binomial coefficient C(n, k) counts the ways to choose k items from n. It is defined by the recurrence from Pascal's triangle:

C(n, k) = C(n-1, k-1) + C(n-1, k)

with base cases C(n, 0) = C(n, n) = 1. Why does this work? Consider whether the nth element is in the chosen subset. If yes, pick k-1 more from n-1 items. If no, pick all k from n-1 items. No overlap, all cases covered.

Computing C(n, k) via the factorial formula n!/(k!(n-k)!) is dangerous -- the intermediate factorials easily overflow, even when the final answer fits in an integer. The DP approach fills Pascal's triangle directly:

python
def binomial(n, k):
    C = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        C[i][0] = 1
        if i <= k: C[i][i] = 1
    for i in range(2, n + 1):
        for j in range(1, min(i, k + 1)):
            C[i][j] = C[i-1][j-1] + C[i-1][j]
    return C[n][k]

Evaluation order matters. Each cell depends on two cells in the previous row. We fill row by row, left to right. By the time we need C[i-1][j-1] and C[i-1][j], they are already computed. Getting the evaluation order right is the entire art of dynamic programming.

Why is computing C(n, k) via n!/(k!(n-k)!) numerically dangerous?

The intermediate factorials overflow even when the final answer fits in an integer; the DP approach avoids this by computing sums of smaller values Division is slow It gives the wrong answer

Chapter 4: Edit Distance -- The Problem

How different are two strings? The edit distance between strings P and T is the minimum number of character operations (insert, delete, substitute) to transform P into T.

Operation	Example
Substitution	"shot" → "spot" (replace h with p)
Insertion	"ago" → "agog" (insert g)
Deletion	"hour" → "our" (delete h)

Think about the last characters of each string. What could have happened?

Match/Substitute

Align P[i] with T[j]. Cost 0 if they match, 1 if they don't. Then solve for P[1..i-1] and T[1..j-1].

↓

Insert

T[j] has no match in P. We "insert" it. Cost 1. Then solve for P[1..i] and T[1..j-1].

↓

Delete

P[i] has no match in T. We "delete" it. Cost 1. Then solve for P[1..i-1] and T[1..j].

D[i, j] = min(D[i-1, j-1] + match(i,j), D[i, j-1] + 1, D[i-1, j] + 1)

The recursive version branches three ways at every position. Since most calls only reduce one index, the running time is at least 3ⁿ -- exponential. Computing edit distance between two 11-character strings takes several seconds recursively. Anything longer disappears into oblivion.

What are the three possible operations on the last characters when computing edit distance?

Match/substitute (align both last characters), insert (add a character to P), delete (remove a character from P) Sort, search, insert Copy, move, swap

Chapter 5: Edit Distance by DP

The edit distance recurrence has only |P| × |T| distinct subproblems (all (i, j) pairs). The recursive version recomputes them exponentially many times. The fix: store results in a 2D table.

python
def edit_distance(s, t):
    m, n = len(s), len(t)
    D = [[0] * (n+1) for _ in range(m+1)]
    for i in range(m+1): D[i][0] = i  # delete all of s
    for j in range(n+1): D[0][j] = j  # insert all of t
    for i in range(1, m+1):
        for j in range(1, n+1):
            match = 0 if s[i-1] == t[j-1] else 1
            D[i][j] = min(
                D[i-1][j-1] + match,  # match/sub
                D[i][j-1] + 1,          # insert
                D[i-1][j] + 1           # delete
            )
    return D[m][n]

This runs in O(|P| × |T|) time and space. Each cell is computed once, and each computation is O(1). The table is filled row by row, left to right -- the same order you'd read a book.

From exponential to polynomial in one step. The recursive version takes O(3ⁿ) time. The DP version takes O(nm). For two 1000-character strings, that's 10⁶ vs 10⁴⁷⁷ operations. DP didn't just speed things up -- it made the impossible possible.

Reconstructing the alignment: Store a parent pointer in each cell recording which of the three choices was optimal. Trace back from D[m][n] to D[0][0] to reconstruct the actual edit operations.

The edit distance DP table has |P| x |T| cells. What does cell D[i][j] represent?

The minimum number of edit operations to transform P[1..i] into T[1..j] Whether P[i] equals T[j] The longest common subsequence of P and T

Chapter 6: Longest Increasing Subsequence

Given a sequence of numbers, find the longest subsequence that is strictly increasing. The subsequence doesn't need to be contiguous -- you can skip elements.

For example, in [3, 1, 4, 1, 5, 9, 2, 6], the LIS is [1, 4, 5, 9] or [1, 4, 5, 6] (length 4).

Define L[i] = the length of the longest increasing subsequence ending at position i. Then:

L[i] = 1 + max(L[j]) for all j < i where S[j] < S[i]

python
def lis(S):
    n = len(S)
    L = [1] * n  # every element is an LIS of length 1
    parent = [-1] * n
    for i in range(1, n):
        for j in range(i):
            if S[j] < S[i] and L[j] + 1 > L[i]:
                L[i] = L[j] + 1
                parent[i] = j
    return max(L)

This runs in O(n²) time. There exists a clever O(n log n) algorithm using binary search, but the O(n²) DP version illustrates the technique beautifully.

Tracing the actual subsequence: The parent array records, for each position, which earlier position extends to give the best LIS ending here. To reconstruct the subsequence, find the position with maximum L value, then follow parent pointers backward. This is the same trace-back technique used in edit distance and Floyd-Warshall.

LIS has surprising connections. It appears in patience sorting (a card game), the theory of Young tableaux (combinatorics), and the Erdos-Szekeres theorem (any sequence of length n² + 1 contains a monotone subsequence of length n + 1).

Connection to edit distance: LIS can be solved as a special case of edit distance! Sort S to get T. The longest common subsequence of S and T must be an increasing subsequence of S (since T is sorted). This reduction gives the same O(n²) complexity.

What does L[i] represent in the LIS dynamic programming formulation?

The length of the longest increasing subsequence that ends at position i The length of the longest subsequence starting at position i Whether position i is in the LIS

Chapter 7: The Partition Problem

Given a set of n integers, can you partition them into two subsets with equal sum? This is a classic NP-complete problem, but DP gives a pseudopolynomial algorithm -- polynomial in the value of the sum, not just the number of elements.

Let S be the total sum. We want to know: is there a subset summing to S/2? Define P[i][j] = true if we can make sum j using a subset of the first i elements.

P[i][j] = P[i-1][j] or P[i-1][j - a_i]

Either we exclude element a_i (can we make sum j without it?) or include it (can we make sum j - a_i without it?).

python
def can_partition(arr):
    total = sum(arr)
    if total % 2 != 0: return False
    target = total // 2
    dp = [False] * (target + 1)
    dp[0] = True
    for num in arr:
        for j in range(target, num - 1, -1):
            dp[j] = dp[j] or dp[j - num]
    return dp[target]

Pseudopolynomial: The algorithm runs in O(n · S) time, where S is the sum. This is polynomial in S but not in the number of bits needed to represent S. If S is exponentially large in n, this is slow. But for reasonable values, it's very practical. This is why the partition problem is "weakly" NP-complete -- hard in theory, often tractable in practice.

The partition problem DP runs in O(n x S) time. Why is this called "pseudopolynomial"?

Because it's polynomial in the numeric value of S, not in the number of bits needed to represent S; if S is exponentially large in n, the algorithm is exponential Because it only works half the time Because it's not a real polynomial algorithm

Chapter 8: The DP Table Explorer

Watch the edit distance DP table being filled in cell by cell. See how each cell depends on its three neighbors (top, left, diagonal) and how the optimal alignment is traced back through parent pointers.

Edit Distance Table

Type two strings and click "Compute" to watch the DP table fill. Arrows show the optimal alignment path.

Enter two strings and click Compute

What to notice: Each cell looks at three neighbors: diagonally above-left (match/substitute), directly left (insert), directly above (delete). The minimum of these three determines the cell's value. The highlighted path from bottom-right to top-left traces the optimal alignment.

In the edit distance table, cell D[i][j] depends on which cells?

D[i-1][j-1] (diagonal: match/sub), D[i][j-1] (left: insert), and D[i-1][j] (above: delete) All cells in the same row Only D[i-1][j-1]

Chapter 9: Connections

Dynamic programming is arguably the most broadly useful algorithm design technique. Here is how it connects forward and backward:

Concept	Where It Leads
Overlapping subproblems	The key insight distinguishing DP from divide-and-conquer (which has no overlap)
Edit distance	DNA sequence alignment, spell-checking, diff utilities
Partition problem	Chapter 9: Pseudopolynomial algorithms for weakly NP-complete problems
Floyd-Warshall	Chapter 6: DP on graphs for all-pairs shortest paths
DAG shortest paths	Chapter 5: DP on DAGs follows topological order
TSP via DP	O(n² 2ⁿ) -- much better than n!, but still exponential for this NP-hard problem

Skiena's take-home lessons from Chapter 8:
• Dynamic programming is a technique for efficiently implementing recursive algorithms by storing partial results. Start with a correct recursion, then speed it up.
• Caching (memoization) is the easiest way to get DP benefits. Store results indexed by the recursive call parameters.
• The trick is recognizing overlapping subproblems. If different recursive calls compute the same thing, DP helps. If every call is unique (quicksort, DFS), caching is useless.
• DP is generally the right method for optimization problems on left-to-right objects: strings, sequences, polygons, rooted trees.
• Understanding the evaluation order is essential: fill the table so that every cell's dependencies are already computed.

When does caching NOT help a recursive algorithm?

When all recursive calls have distinct parameter values (like quicksort or DFS), so nothing is ever recomputed When the problem is too easy When the recursion is too deep