Max subarray, Strassen, Master theorem — breaking problems into pieces.
You have a stack of 1,024 exams to sort by student ID. One approach: flip through the pile one by one, inserting each exam into the right place. That is insertion sort — O(n²). For 1,024 exams it takes roughly a million comparisons. There is a better way.
Split the stack in half. Give 512 exams to a friend. You each sort your half independently. Then you merge the two sorted halves by comparing the top exam from each pile and picking the smaller one. Two people doing 512 exams each is far less work than one person doing 1,024 — and the merge step takes only one pass through the combined pile. This is merge sort, and it runs in O(n log n). For 1,024 exams, that is about 10,000 comparisons instead of a million.
But merge sort is not just a sorting algorithm. It is an instance of a design pattern that solves hundreds of problems across computer science. That pattern is called divide-and-conquer, and it has exactly three steps:
The simulation below shows merge sort as the canonical divide-and-conquer algorithm. Watch how the array is split recursively until every piece is a single element, and then the sorted pieces are merged back together. Pay attention to how the work at each level of the tree sums to O(n).
Watch the three steps: divide (split in half), conquer (recurse to base case), combine (merge sorted halves). Each level of the tree does O(n) total work.
In this chapter, we study three problems that showcase divide-and-conquer beyond sorting. The maximum subarray problem shows how D&C can find patterns in data. Strassen's algorithm shows how D&C can beat the "obvious" O(n³) matrix multiplication. And the master theorem gives us a tool to analyze any D&C recurrence without doing the recursion tree by hand every time.
Each of these problems follows the same three-step pattern. Once you internalize it, you will see divide-and-conquer opportunities everywhere: in sorting, searching, computational geometry, number theory, linear algebra, and signal processing.
The power of divide-and-conquer comes from a simple mathematical fact. Suppose a problem of size n requires f(n) work to solve directly. If you split it into two problems of size n/2, each costs f(n/2). If f is superlinear (like n²), then 2 · f(n/2) = 2 · (n/2)² = n²/2 — half the work. Every time you split, you save. The savings compound recursively.
Consider a concrete example. Multiplying two n-digit numbers takes n² single-digit multiplications with the grade-school algorithm. Split each number into two halves of n/2 digits. Naively you need 4 half-size multiplications (which is 4 · (n/2)² = n² — no savings). But if you can cleverly reduce 4 multiplications to 3 (Karatsuba's trick), you get 3 · (n/2)² = 3n²/4 at each level. Applied recursively, this drops the exponent from 2 to log2(3) ≈ 1.585. That is the magic of D&C: small savings at each level compound into asymptotic improvements.
Every time you encounter a new problem and suspect D&C might work, ask these four questions:
| Question | What It Determines | Example (Merge Sort) |
|---|---|---|
| How do I split? | The divide step — must produce smaller instances of the same problem | Split array at midpoint |
| How many subproblems? | The constant a in T(n) = aT(n/b) + f(n) | a = 2 (left half + right half) |
| How much does each shrink? | The constant b — each subproblem has size n/b | b = 2 (each half has n/2 elements) |
| How do I combine? | The f(n) term — work to merge subsolutions | f(n) = O(n) merge step |
If you can answer all four questions, you have a D&C algorithm. If the combine step is too expensive (say O(n²)), the algorithm may not improve on brute force. The art is in designing a cheap combine step — or in reducing the number of subproblems.
See how different D&C recurrences produce different growth rates. The slider changes the number of subproblems a (with b=2 and f(n)=n fixed). Watch the curve steepen as a grows.
You bought a stock on some day and sold it on a later day. Looking at the daily price changes (not prices, but changes: +3, -2, +5, -1, ...), the profit you made is the sum of the daily changes from your buy day to your sell day. The question: which contiguous window of days maximizes your total profit?
This is the maximum subarray problem: given an array A of n numbers (positive and negative), find the contiguous subarray A[i..j] whose elements have the largest sum. If all numbers are positive, the answer is the whole array. If all are negative, the answer is the single least-negative element. The interesting case is a mix of positives and negatives.
The brute-force approach checks all O(n²) pairs (i, j) and sums each subarray in O(n) time, giving O(n³). We can precompute prefix sums to bring it down to O(n²). But divide-and-conquer does it in O(n log n).
Split the array at the midpoint. The maximum subarray must lie in exactly one of three places:
Cases 1 and 2 are recursive calls on halves. Case 3 is the combine step: we find the maximum subarray crossing the midpoint by scanning left from mid and right from mid+1, keeping running sums. This takes O(n) time.
The recurrence is T(n) = 2T(n/2) + Θ(n). This is the same recurrence as merge sort, so the solution is Θ(n log n).
There is a simpler and faster approach. Kadane's algorithm scans left to right, maintaining the maximum subarray ending at the current position. At each position, either extend the current subarray or start fresh.
python def kadane(A): """Return (max_sum, start, end) of maximum subarray.""" max_ending_here = 0 max_so_far = float('-inf') start = end = temp_start = 0 for j in range(len(A)): max_ending_here += A[j] if max_ending_here > max_so_far: max_so_far = max_ending_here start = temp_start end = j if max_ending_here < 0: max_ending_here = 0 temp_start = j + 1 return max_so_far, start, end # Example A = [-2, 1, -3, 4, -1, 2, 1, -5, 4] print(kadane(A)) # (6, 3, 6) → subarray [4, -1, 2, 1]
Kadane's is O(n) — strictly better than D&C's O(n log n). So why study the D&C version? Because the technique matters more than this specific problem. The same "left / right / crossing" decomposition appears in closest pair, count inversions, and many geometry problems where Kadane's trick does not apply.
Let us trace the D&C algorithm on A = [-2, 1, -3, 4, -1, 2, 1, -5, 4] completely by hand. The array has 9 elements, so mid = 4.
The crossing subarray [4, -1, 2, 1] wins with sum 6. Notice how the crossing case found something neither recursive call could: a subarray spanning the midpoint. This is why the crossing check is essential — without it, we would have reported sum 4 instead of 6.
Watch D&C split the array recursively, solving left/right/crossing at each level. Then see Kadane scan once left-to-right. The maximum subarray is highlighted in orange.
python def max_crossing_subarray(A, low, mid, high): """Find max subarray that crosses the midpoint. O(n).""" # Scan left from mid left_sum = float('-inf') total = 0 max_left = mid for i in range(mid, low - 1, -1): total += A[i] if total > left_sum: left_sum = total max_left = i # Scan right from mid+1 right_sum = float('-inf') total = 0 max_right = mid + 1 for j in range(mid + 1, high + 1): total += A[j] if total > right_sum: right_sum = total max_right = j return max_left, max_right, left_sum + right_sum def max_subarray_dc(A, low, high): """Find max subarray in A[low..high] using D&C. O(n log n).""" if low == high: # base case: single element return low, high, A[low] mid = (low + high) // 2 # Recurse on left and right halves ll, lr, lsum = max_subarray_dc(A, low, mid) rl, rr, rsum = max_subarray_dc(A, mid + 1, high) cl, cr, csum = max_crossing_subarray(A, low, mid, high) # Return the best of three cases if lsum ≥ rsum and lsum ≥ csum: return ll, lr, lsum elif rsum ≥ lsum and rsum ≥ csum: return rl, rr, rsum else: return cl, cr, csum # Test A = [-2, 1, -3, 4, -1, 2, 1, -5, 4] l, r, s = max_subarray_dc(A, 0, len(A) - 1) print(f"Max subarray A[{l}..{r}] = {A[l:r+1]}, sum = {s}") # Max subarray A[3..6] = [4, -1, 2, 1], sum = 6
Array: [-2, 1, -3, 4, -1, 2, 1, -5, 4]. Midpoint index = 4 (value -1).
| Case | Subarray | Max Sum |
|---|---|---|
| Left half [-2, 1, -3, 4, -1] | [4] at index 3 | 4 |
| Right half [2, 1, -5, 4] | [2, 1] at indices 5-6 | 3 |
| Crossing | [4, -1] left + [2, 1] right = [4, -1, 2, 1] | 3 + 3 = 6 |
The crossing subarray wins with sum 6. This matches Kadane's output: [4, -1, 2, 1] at indices 3..6.
A subtle edge case: what if every element is negative? The D&C algorithm handles this correctly because the base case returns single elements, and the max of all single-element returns is the least-negative element. Kadane's algorithm (as written above) also handles this, since we initialize max_so_far to negative infinity and track individual elements.
Some implementations of Kadane's reset max_ending_here to 0 when it goes negative. This variant returns 0 for all-negative arrays (the "empty subarray"). Whether the empty subarray is allowed depends on the problem statement — always clarify in an interview.
| Approach | Time | Space | Key Idea |
|---|---|---|---|
| Brute force (all pairs) | O(n³) | O(1) | Check all (i,j) pairs, sum each subarray |
| Brute force + prefix sums | O(n²) | O(n) | Precompute prefix sums for O(1) range sums |
| Divide and Conquer | O(n log n) | O(log n) stack | Left, right, crossing decomposition |
| Kadane's algorithm | O(n) | O(1) | Greedy: extend or restart at each position |
The progression from O(n³) to O(n) illustrates a deep point about algorithm design. Each improvement comes from a different insight: prefix sums avoid redundant addition, D&C avoids redundant pair-checking, and Kadane's exploits the sequential structure. The D&C approach is not the fastest here, but the technique — splitting into left, right, and crossing — generalizes to problems (like closest pair) where Kadane's trick does not work.
Multiplying two n×n matrices the way you learned in linear algebra takes Θ(n³) time: each of the n² entries in the result is the dot product of a row and a column, and each dot product sums n terms. For n=1000, that is a billion multiplications. Can we do better?
In 1969, Volker Strassen shocked the mathematics community by showing that the answer is yes. He found a way to multiply 2×2 matrices using 7 multiplications instead of 8. That one saved multiplication, applied recursively, drops the running time from O(n³) to O(n2.807). The improvement seems small for tiny matrices but is enormous at scale.
Partition each n×n matrix into four (n/2)×(n/2) submatrices:
From the definition of matrix multiplication:
This requires 8 multiplications of (n/2)×(n/2) matrices and 4 additions. The recurrence is T(n) = 8T(n/2) + Θ(n²). Solving it (we will learn how later in this chapter), the answer is T(n) = Θ(n³) — no better than the schoolbook method. The problem: 8 recursive multiplications. Each halving creates 8 subproblems, so the work grows as 8log n = nlog28 = n³.
Strassen defined 7 cleverly chosen products of sums and differences of submatrices:
Then the four quadrants of C are computed using only additions and subtractions of the Pi:
Let us verify Strassen with concrete numbers.
All four entries match. Seven multiplications, more additions, but the same result.
Compare the number of scalar multiplications for naive (8 recursive mults per level) vs Strassen (7 per level). The gap widens exponentially with matrix size.
| n | Naive (n³) | Strassen (~n2.807) | Speedup |
|---|---|---|---|
| 4 | 64 | ~50 | 1.3x |
| 64 | 262,144 | ~117,649 | 2.2x |
| 1024 | 1.07 × 109 | ~2.4 × 108 | 4.5x |
| 4096 | 6.87 × 1010 | ~7.8 × 109 | 8.8x |
In practice, the constant factor in Strassen is large (18 additions vs 4 for naive). Libraries like BLAS switch to Strassen only when n exceeds a crossover point (typically n ≥ 64-256). Below that, the cache-friendly naive algorithm with loop tiling wins.
Can we do better than 7? For 2×2 matrices, Hopcroft and Kerr (1971) proved that 7 is optimal — you cannot multiply 2×2 matrices with fewer than 7 multiplications. But for larger block sizes, you can sometimes do better. The current best algorithm for general matrix multiplication achieves exponent ω ≈ 2.371 (Alman-Williams 2024), though its constant factor is so enormous that it is never used in practice.
The practical state of the art is:
| Algorithm | Exponent | Practical? | When Used |
|---|---|---|---|
| Schoolbook | 3.000 | Yes | Small matrices (n < 64), GPU kernels |
| Strassen | 2.807 | Yes | Large dense matrices (n > 256), some BLAS implementations |
| Coppersmith-Winograd variants | 2.371 | No | Theoretical interest only |
python import numpy as np def strassen(A, B): """Multiply two square matrices using Strassen's algorithm.""" n = A.shape[0] if n ≤ 64: # crossover to naive for small matrices return A @ B mid = n // 2 A11, A12 = A[:mid, :mid], A[:mid, mid:] A21, A22 = A[mid:, :mid], A[mid:, mid:] B11, B12 = B[:mid, :mid], B[:mid, mid:] B21, B22 = B[mid:, :mid], B[mid:, mid:] # 7 recursive multiplications P1 = strassen(A11, B12 - B22) P2 = strassen(A11 + A12, B22) P3 = strassen(A21 + A22, B11) P4 = strassen(A22, B21 - B11) P5 = strassen(A11 + A22, B11 + B22) P6 = strassen(A12 - A22, B21 + B22) P7 = strassen(A11 - A21, B11 + B12) # Combine with additions only C = np.zeros((n, n)) C[:mid, :mid] = P5 + P4 - P2 + P6 C[:mid, mid:] = P1 + P2 C[mid:, :mid] = P3 + P4 C[mid:, mid:] = P5 + P1 - P3 - P7 return C
Notice the crossover at n=64. Below this threshold, we fall back to NumPy's built-in matrix multiplication (which uses optimized BLAS). This hybrid approach is how real implementations work — Strassen for the top levels of recursion, naive for the small base cases where cache performance matters more than asymptotic complexity.
The Pi formulas look magical. Let us verify C11 = P5 + P4 - P2 + P6 algebraically to see that Strassen did not just get lucky.
Everything cancels except the two terms we need. Strassen found these particular linear combinations through careful algebraic manipulation. The same verification works for C12, C21, and C22 — each Pi combination produces exactly the right pair of terms while canceling everything else.
Every divide-and-conquer algorithm has a natural recurrence that describes its running time. You split a problem of size n into a subproblems of size n/b, do f(n) work to divide and combine, and solve each subproblem recursively. The result is:
Where:
Let us see this for the algorithms we have studied:
| Algorithm | a | b | f(n) | Recurrence | Solution |
|---|---|---|---|---|---|
| Merge sort | 2 | 2 | Θ(n) | T(n) = 2T(n/2) + Θ(n) | Θ(n log n) |
| Binary search | 1 | 2 | Θ(1) | T(n) = T(n/2) + Θ(1) | Θ(log n) |
| Max subarray (D&C) | 2 | 2 | Θ(n) | T(n) = 2T(n/2) + Θ(n) | Θ(n log n) |
| Naive matrix mult. | 8 | 2 | Θ(n²) | T(n) = 8T(n/2) + Θ(n²) | Θ(n³) |
| Strassen | 7 | 2 | Θ(n²) | T(n) = 7T(n/2) + Θ(n²) | Θ(n2.807) |
| Karatsuba | 3 | 2 | Θ(n) | T(n) = 3T(n/2) + Θ(n) | Θ(n1.585) |
There are three standard methods to solve recurrences:
The substitution method has two steps: (1) guess the form of the solution, and (2) prove the guess is correct using mathematical induction.
Example: prove that T(n) = 2T(n/2) + n is O(n log n).
The key trick: we needed cn - n ≥ 0, i.e., c ≥ 1. Choose c = 1 and the proof goes through. The base case T(1) = 1 ≤ c · 1 · log 1 = 0 fails, but we can handle small cases separately (T(2) = 4 ≤ 2 · 2 · 1 = 4 works).
| Problem | Trick | Example |
|---|---|---|
| Residual term won't cancel | Subtract a lower-order term from the guess | T(n) ≤ cn - d instead of T(n) ≤ cn |
| Floors and ceilings | Assume n is an exact power of b, then argue the general case by monotonicity | T(n) = T(⌈n/2⌉) + 1: assume n = 2k |
| Guess too loose | Tighten by trying exact form: cn log n instead of cn² | Recursion tree suggests n log n, so try that |
| Need Ω bound | Same technique but with ≥ instead of ≤ | T(n) ≥ cn log n - d proves Ω(n log n) |
Prove that T(n) = 2T(⌊n/2⌋) + 1 is O(n).
The trick: when the residual does not cooperate, subtract a lower-order term from your guess. This is the most common substitution technique and it comes up in interviews. The guess T(n) ≤ cn - d still implies T(n) = O(n), since cn - d ≤ cn.
Enter a, b, and f(n) exponent. The simulation computes T(n) numerically and overlays your O-bound guess so you can see if it fits.
The substitution method requires a guess. But where does the guess come from? The recursion tree method gives you a visual way to derive it. Draw the tree of recursive calls, compute the work at each level, and sum the geometric series.
For T(n) = 2T(n/2) + cn (merge sort), the tree looks like this:
Every level does exactly cn work. There are log2(n) + 1 levels. Total: cn · (log2(n) + 1) = Θ(n log n). This confirms our substitution proof from Chapter 3.
The total work is a geometric series with ratio r = 3/16 < 1:
The ratio r = 3/16 < 1 means the series converges — the root level dominates. The total is Θ(n²), which is the same as f(n). This is Master Theorem Case 3 (the combine cost dominates).
This recurrence has unequal subproblems — the master theorem does not apply directly. But the recursion tree still works.
The shortest path from root to leaf is log3(n) (always taking the n/3 branch). The longest is log3/2(n) (always taking the 2n/3 branch). So the tree has between log3(n) and log3/2(n) full levels. Each full level costs cn. The total is Θ(n log n) — same as merge sort, even though the split is uneven.
The series is n + 2n + 4n + ... + n² — a growing geometric series with ratio 2. The last term (leaves) dominates: n². Total = Θ(n²). This is Master Theorem Case 1: the recursion creates so many subproblems (4 subproblems at half size) that the leaf work swamps the combine work.
Notice the contrast with merge sort's tree (2 subproblems at half size, O(n) combine): every level costs the same. With 4 subproblems at half size, each level doubles the previous level's cost, so the leaves dominate.
Recursion tree totals always reduce to geometric series. Here is the pattern you need:
| Series Ratio r | Behavior | Total Dominated By | Master Theorem Case |
|---|---|---|---|
| r < 1 (decreasing) | Converges: ∑ ≤ first / (1-r) | First term (root) | Case 3 |
| r = 1 (constant) | All terms equal | count × value = Θ(f(n) log n) | Case 2 |
| r > 1 (increasing) | Diverges: ∑ ~ last term | Last term (leaves) | Case 1 |
For T(n) = aT(n/b) + f(n) where f(n) = cnd, the ratio of consecutive levels is:
This single number tells you everything:
For merge sort: a=2, b=2, d=1. r = 2/21 = 1. Case 2. For Strassen: a=7, b=2, d=2. r = 7/4 > 1. Case 1. For T(n)=3T(n/4)+n²: a=3, b=4, d=2. r = 3/16 < 1. Case 3.
Watch the recursion tree build level by level. Each node shows its subproblem size and cost. The right panel shows total work per level. Select a recurrence to animate.
In an interview, you will often need to draw a recursion tree to derive a bound. Here is a systematic approach:
Drawing a recursion tree every time is tedious. The master theorem gives you the answer directly for any recurrence of the form T(n) = aT(n/b) + f(n), where a ≥ 1 and b > 1.
The key quantity is nlogba — this is the total number of leaves in the recursion tree. The theorem compares f(n) (work per level at the top) with nlogba (work at the bottom). Whichever is bigger wins.
Example 1: T(n) = 9T(n/3) + n
Example 2: T(n) = T(2n/3) + 1
Example 3: T(n) = 3T(n/4) + n log n
Example 4: T(n) = 2T(n/2) + n log n
The recursion tree for T(n) = aT(n/b) + f(n) has logb(n) levels. At level k, there are ak nodes, each doing f(n/bk) work. The total work at level k is ak · f(n/bk). The leaves (level logb(n)) contribute Θ(nlogba) total work, since there are alogbn = nlogba of them.
Now imagine the work at each level as a bar in a bar chart:
| Case | Shape of Bar Chart | Why | Total |
|---|---|---|---|
| Case 1 | Grows top-to-bottom (leaves tallest) | Many subproblems, each small — leaf-heavy tree | Dominated by leaves: Θ(nlogba) |
| Case 2 | All bars equal height | Work per level is exactly the same — flat tree | height × levels: Θ(nlogba log n) |
| Case 3 | Shrinks top-to-bottom (root tallest) | Combine step expensive, leaves cheap — root-heavy tree | Dominated by root: Θ(f(n)) |
Think of it as a tug-of-war between "recursion cost" (driven by a and b) and "combine cost" (f(n)). The master theorem tells you who wins.
The master theorem only handles recurrences with equal-size subproblems. What about T(n) = T(n/3) + T(2n/3) + n? The Akra-Bazzi method handles these generalized recurrences of the form:
Find p such that ∑ aibip = 1. Then T(n) = Θ(np(1 + ∫1n f(x)/xp+1 dx)). For T(n) = T(n/3) + T(2n/3) + n: find p such that (1/3)p + (2/3)p = 1. By inspection, p = 1 works: 1/3 + 2/3 = 1. The integral gives a log n factor. So T(n) = Θ(n log n), confirming our recursion tree analysis.
You do not need to memorize Akra-Bazzi for interviews, but knowing it exists shows depth. When the master theorem fails (unequal splits, non-polynomial f(n)), Akra-Bazzi is the fallback.
Case 3 has an extra condition: af(n/b) ≤ cf(n) for some c < 1 and all sufficiently large n. This is the regularity condition. It ensures that f(n) does not have pathological behavior (like oscillating). For all "normal" functions (polynomials, n log n, etc.), the regularity condition is satisfied automatically. You only need to worry about it for exotic functions.
To verify it: plug in and check that the inequality holds. For f(n) = n² with a=3, b=4: 3(n/4)² = 3n²/16 ≤ cn² requires c ≥ 3/16. Since 3/16 < 1, the condition is satisfied with c = 3/16.
Set a, b, and the exponent of f(n). The calculator identifies which case applies and gives the solution.
| Recurrence | a | b | nlogba | Case | T(n) |
|---|---|---|---|---|---|
| T = 2T(n/2) + 1 | 2 | 2 | n | 1 | Θ(n) |
| T = 2T(n/2) + n | 2 | 2 | n | 2 | Θ(n log n) |
| T = 2T(n/2) + n² | 2 | 2 | n | 3 | Θ(n²) |
| T = 4T(n/2) + n | 4 | 2 | n² | 1 | Θ(n²) |
| T = 4T(n/2) + n² | 4 | 2 | n² | 2 | Θ(n² log n) |
| T = 4T(n/2) + n³ | 4 | 2 | n² | 3 | Θ(n³) |
| T = 7T(n/2) + n² | 7 | 2 | n2.807 | 1 | Θ(n2.807) |
| T = 9T(n/3) + n | 9 | 3 | n² | 1 | Θ(n²) |
This is the payoff chapter. Below is a single simulation that visualizes four divide-and-conquer algorithms side by side. Select an algorithm, set the input size, and watch the recursive decomposition happen step by step. Each algorithm follows the same three-step pattern — divide, conquer, combine — but the specific strategy differs.
Select an algorithm and watch D&C in action. The array/points split recursively. Colors show recursion depth. Status shows current operation.
Each algorithm demonstrates a different aspect of divide-and-conquer:
| Algorithm | Divide | Conquer | Combine | Recurrence |
|---|---|---|---|---|
| Merge sort | Split at midpoint | Sort halves | Merge sorted halves (O(n)) | T(n) = 2T(n/2) + n |
| Quicksort | Partition around pivot | Sort subarrays | Nothing (in-place) | T(n) = T(k) + T(n-k-1) + n |
| Closest pair | Split by x-coordinate | Find closest in halves | Check strip near dividing line | T(n) = 2T(n/2) + n |
| Karatsuba | Split digits in half | 3 half-length mults | Shift and add | T(n) = 3T(n/2) + n |
Every D&C algorithm sits somewhere on a spectrum between "easy divide, hard combine" and "hard divide, easy combine". Understanding where each algorithm sits helps you design new ones.
| Algorithm | Divide Difficulty | Combine Difficulty | Where the Intelligence Is |
|---|---|---|---|
| Merge sort | Trivial (split at midpoint) | O(n) merge | Combine — merging correctly |
| Quicksort | O(n) partition | Trivial (nothing) | Divide — choosing good pivot |
| Closest pair | Trivial (split by x-median) | O(n) strip check | Combine — proving strip has O(1) comparisons per point |
| Strassen | O(n²) additions | O(n²) additions | Neither — the cleverness is in having 7 products instead of 8 |
| Binary search | O(1) comparison | Trivial (nothing) | Divide — deciding which half to keep |
| FFT | O(n) even/odd split | O(n) butterfly | Both — mathematical structure of roots of unity |
Closest pair is the most elegant D&C algorithm beyond sorting. Let us trace through the key ideas:
The recurrence is T(n) = 2T(n/2) + O(n log n) — not O(n) — because we sort the strip by y at each level. This gives T(n) = O(n log² n). However, with a clever pre-sorting trick (maintain a separate y-sorted list alongside the x-sorted list), we can reduce the combine step to O(n), giving the optimal T(n) = O(n log n).
Two practical properties distinguish D&C sorting algorithms in interviews:
Stability. A sorting algorithm is stable if equal elements maintain their relative order. Merge sort is stable (we take from the left array when equal). Quicksort is not stable in its standard implementation. This matters when sorting database records by a secondary key.
In-place. Merge sort requires O(n) extra space for the merge step. Quicksort partitions in-place using O(log n) stack space. When memory is tight, quicksort wins despite its O(n²) worst case (mitigated by random pivot selection).
python def merge_sort(A): if len(A) ≤ 1: return A mid = len(A) // 2 left = merge_sort(A[:mid]) # Conquer left half right = merge_sort(A[mid:]) # Conquer right half return merge(left, right) # Combine: O(n) merge def merge(L, R): result = [] i = j = 0 while i < len(L) and j < len(R): if L[i] ≤ R[j]: result.append(L[i]); i += 1 else: result.append(R[j]); j += 1 result.extend(L[i:]); result.extend(R[j:]) return result
python def quicksort(A, lo, hi): if lo < hi: p = partition(A, lo, hi) # Divide: all hard work here quicksort(A, lo, p - 1) # Conquer left quicksort(A, p + 1, hi) # Conquer right # Combine: nothing — array is sorted in place! def partition(A, lo, hi): pivot = A[hi] i = lo - 1 for j in range(lo, hi): if A[j] ≤ pivot: i += 1 A[i], A[j] = A[j], A[i] A[i + 1], A[hi] = A[hi], A[i + 1] return i + 1
Given n points in the plane, find the pair with the smallest Euclidean distance. The brute-force approach checks all O(n²) pairs. D&C brings it down to O(n log n).
The "at most 7 comparisons" is the non-obvious part, and it is the key insight that makes the algorithm O(n log n) instead of O(n²). Let us prove it carefully.
Consider the strip: all points within distance δ of the dividing line. We sort these points by y-coordinate. For each point p, we need to check if any subsequent point (in y-order) is closer than δ.
How many subsequent points can possibly be within distance δ? Imagine a δ×2δ rectangle centered on the dividing line, with p at the bottom. This rectangle extends δ to the left and δ to the right of the midline, and δ upward from p.
Divide this rectangle into 8 cells of size (δ/2)×(δ/2). Each cell has diagonal δ/√2 < δ. So each cell can contain at most one point (if it contained two, they would be closer than δ, contradicting the fact that δ is the minimum distance within each half). Therefore, the rectangle contains at most 8 points, and since p is one of them, there are at most 7 others to check.
This argument is subtle and frequently tested in interviews. The interviewer wants to hear: (1) sort strip by y, (2) for each point check only next 7, (3) the "8 points in a rectangle" packing argument.
Watch D&C find the closest pair: split by x-median, recurse on halves, check the strip. The strip (gray band) shrinks as δ decreases. The closest pair is highlighted.
python import math def closest_pair(points): """Return (dist, p1, p2) for closest pair.""" pts = sorted(points, key=lambda p: p[0]) return _cp(pts) def _cp(pts): n = len(pts) if n ≤ 3: return brute_force(pts) mid = n // 2 dl, pl1, pl2 = _cp(pts[:mid]) dr, pr1, pr2 = _cp(pts[mid:]) if dl < dr: delta, best = dl, (dl, pl1, pl2) else: delta, best = dr, (dr, pr1, pr2) # Check strip mid_x = pts[mid][0] strip = [p for p in pts if abs(p[0] - mid_x) < delta] strip.sort(key=lambda p: p[1]) for i in range(len(strip)): j = i + 1 while j < len(strip) and strip[j][1] - strip[i][1] < delta: d = math.dist(strip[i], strip[j]) if d < best[0]: best = (d, strip[i], strip[j]) j += 1 return best
Multiplying two n-digit numbers the grade-school way takes O(n²) single-digit multiplications. Karatsuba (1960) showed how to do it in O(nlog23) ≈ O(n1.585) using the same D&C trick as Strassen: reduce 4 multiplications to 3.
Split each n-digit number into two halves:
python def karatsuba(x, y): """Multiply two non-negative integers using Karatsuba.""" if x < 10 or y < 10: return x * y n = max(len(str(x)), len(str(y))) m = n // 2 p = 10 ** m x1, x0 = divmod(x, p) y1, y0 = divmod(y, p) z2 = karatsuba(x1, y1) z0 = karatsuba(x0, y0) z1 = karatsuba(x1 + x0, y1 + y0) - z2 - z0 return z2 * p * p + z1 * p + z0 print(karatsuba(1234, 5678)) # 7006652
Let us multiply 1234 × 5678 step by step:
Grade-school multiplication of 1234 × 5678 needs 4 × 4 = 16 digit-multiplications. Karatsuba used 3 half-size multiplications (each of 2-digit numbers). For 2-digit numbers the savings are modest, but for 1024-digit RSA keys, Karatsuba saves 25% at each recursion level, compounding to a 40% speedup overall.
Binary search is the simplest D&C algorithm: divide by 2, conquer one half (discard the other), combine step is trivial. T(n) = T(n/2) + O(1) = O(log n). Master theorem Case 2 with a=1, b=2.
What makes binary search special among D&C algorithms is that it only recurses into one subproblem (a=1). This is why it is logarithmic — each step cuts the search space in half without creating any additional work. Compare with merge sort (a=2): two subproblems of half size create linear work per level. One subproblem of half size creates constant work per level.
python def binary_search(A, target): """D&C perspective: divide by 2, conquer 1 half, combine trivially.""" lo, hi = 0, len(A) - 1 while lo ≤ hi: mid = (lo + hi) // 2 # Divide: find midpoint if A[mid] == target: return mid # Found it elif A[mid] < target: lo = mid + 1 # Conquer: discard left half else: hi = mid - 1 # Conquer: discard right half return -1 # Combine: nothing (trivial)
The iterative version above is equivalent to the recursive D&C formulation — we just unroll the tail recursion into a loop. This is a common optimization: when the recursion only goes into one branch (tail recursion), convert it to iteration to save stack space.
The Fast Fourier Transform evaluates a polynomial of degree n at n roots of unity in O(n log n) time, versus the naive O(n²). It splits the polynomial into even and odd coefficients (divide), recursively evaluates each half (conquer), and combines using the butterfly pattern (combine). T(n) = 2T(n/2) + O(n) = O(n log n).
Why does FFT matter? Because polynomial multiplication is secretly behind everything:
| Application | Why FFT Helps | Speedup |
|---|---|---|
| Polynomial multiplication | Multiply in coefficient form: O(n²). Convert to point-value via FFT, multiply pointwise O(n), convert back via inverse FFT. Total O(n log n). | O(n²) → O(n log n) |
| Large integer multiplication | Treat digits as polynomial coefficients | O(n²) → O(n log n) |
| Signal processing | Convolution in time domain = multiplication in frequency domain | O(n²) → O(n log n) |
| String matching | Pattern matching via polynomial multiplication with wildcards | O(nm) → O(n log n) |
The D&C structure of FFT is beautiful: a polynomial P(x) = a0 + a1x + a2x² + ... + an-1xn-1 splits into even powers Peven(x²) = a0 + a2x² + a4x4 + ... and odd powers Podd(x²) = a1 + a3x² + a5x4 + ..., so P(x) = Peven(x²) + x · Podd(x²). Two half-size FFTs plus O(n) combining. Same recurrence as merge sort. We will cover FFT in full depth in a later chapter (CLRS Ch 30).
| Case | Condition | Solution | Intuition |
|---|---|---|---|
| 1 | f(n) = O(nlogba - ε) | Θ(nlogba) | Leaves dominate — too many subproblems |
| 2 | f(n) = Θ(nlogba) | Θ(nlogba log n) | Equal work at every level |
| 3 | f(n) = Ω(nlogba + ε) | Θ(f(n)) | Root dominates — combine step expensive |
| Problem | Recurrence | Time | Key Trick |
|---|---|---|---|
| Merge sort | 2T(n/2) + n | O(n log n) | Linear merge |
| Quicksort (avg) | 2T(n/2) + n | O(n log n) | Random pivot |
| Binary search | T(n/2) + 1 | O(log n) | Discard half |
| Max subarray | 2T(n/2) + n | O(n log n) | Cross-midpoint scan |
| Strassen | 7T(n/2) + n² | O(n2.807) | 7 clever products |
| Karatsuba | 3T(n/2) + n | O(n1.585) | 3 half-digit mults |
| Closest pair | 2T(n/2) + n | O(n log n) | Strip has ≤7 comparisons |
| Count inversions | 2T(n/2) + n | O(n log n) | Count during merge |
| FFT | 2T(n/2) + n | O(n log n) | Even/odd split + butterfly |
| Select (median) | T(n/5) + T(7n/10) + n | O(n) | Median of medians |
Compute xn in O(log n) time using the D&C insight: xn = (xn/2)² if n is even, x · xn-1 if n is odd.
python def power(x, n): """Compute x^n in O(log n) multiplications.""" if n == 0: return 1 if n < 0: return 1 / power(x, -n) if n % 2 == 0: half = power(x, n // 2) return half * half # 1 multiplication, not 2 else: return x * power(x, n - 1)
An inversion is a pair (i, j) where i < j but A[i] > A[j]. Counting inversions measures how far an array is from sorted. Brute force: O(n²). D&C (modified merge sort): O(n log n). The trick: count cross-inversions during the merge step.
python def count_inversions(A): if len(A) ≤ 1: return A, 0 mid = len(A) // 2 left, l_inv = count_inversions(A[:mid]) right, r_inv = count_inversions(A[mid:]) merged, split_inv = merge_count(left, right) return merged, l_inv + r_inv + split_inv def merge_count(L, R): result, inv = [], 0 i = j = 0 while i < len(L) and j < len(R): if L[i] ≤ R[j]: result.append(L[i]); i += 1 else: result.append(R[j]); j += 1 inv += len(L) - i # all remaining left elements are inversions result.extend(L[i:]); result.extend(R[j:]) return result, inv
python def search_rotated(A, target): """Find target in rotated sorted array, O(log n).""" lo, hi = 0, len(A) - 1 while lo ≤ hi: mid = (lo + hi) // 2 if A[mid] == target: return mid if A[lo] ≤ A[mid]: # left half is sorted if A[lo] ≤ target < A[mid]: hi = mid - 1 else: lo = mid + 1 else: # right half is sorted if A[mid] < target ≤ A[hi]: lo = mid + 1 else: hi = mid - 1 return -1
Watch the modified merge sort count inversions. During each merge, when a right element is chosen before left elements are exhausted, the remaining left elements are all inversions (shown in red).
Given two sorted arrays of size m and n, find the median of the combined array in O(log(min(m,n))) time. This is one of the hardest D&C interview problems (LeetCode Hard).
The key insight: the median divides the combined array into two equal halves. We binary search for the correct partition point in the smaller array, then compute the partition in the larger array.
python def find_median(A, B): """Find median of two sorted arrays in O(log(min(m,n))).""" if len(A) > len(B): A, B = B, A # ensure A is shorter m, n = len(A), len(B) lo, hi = 0, m while lo ≤ hi: i = (lo + hi) // 2 # partition A at i j = (m + n + 1) // 2 - i # partition B at j left_A = A[i-1] if i > 0 else float('-inf') right_A = A[i] if i < m else float('inf') left_B = B[j-1] if j > 0 else float('-inf') right_B = B[j] if j < n else float('inf') if left_A ≤ right_B and left_B ≤ right_A: if (m + n) % 2 == 0: return (max(left_A, left_B) + min(right_A, right_B)) / 2 return max(left_A, left_B) elif left_A > right_B: hi = i - 1 else: lo = i + 1
Quickselect finds the k-th smallest element in O(n) average time using D&C — partition around a random pivot, then recurse into only the half that contains the target rank. Unlike quicksort, we recurse into one side, not both.
python import random def quickselect(A, k): """Find k-th smallest element (0-indexed). Average O(n).""" if len(A) == 1: return A[0] pivot = random.choice(A) lows = [x for x in A if x < pivot] highs = [x for x in A if x > pivot] pivots = [x for x in A if x == pivot] if k < len(lows): return quickselect(lows, k) elif k < len(lows) + len(pivots): return pivot else: return quickselect(highs, k - len(lows) - len(pivots))
The recurrence is T(n) = T(n/2) + O(n) on average (we recurse into one half). By the master theorem: a=1, b=2, f(n)=n, nlog21 = 1. Since f(n) = n polynomially dominates 1, Case 3 gives T(n) = O(n). Key: we recurse into only ONE subproblem, not two.
The worst case is O(n²) (always picking the min or max as pivot), but randomization makes this astronomically unlikely. The median-of-medians algorithm (CLRS Ch 9) guarantees O(n) worst case by choosing a "good enough" pivot deterministically — split into groups of 5, find median of each, then recursively find the median of those medians.
A recurring theme in this chapter: the number of subproblems a is the most important factor in D&C complexity. Let us see this concretely for T(n) = aT(n/2) + n:
| a (subproblems) | Solution | Exponent | n=1024 |
|---|---|---|---|
| 1 | O(n) | 1.000 | 1,024 |
| 2 | O(n log n) | ~1.000 × log | 10,240 |
| 3 | O(n1.585) | 1.585 | 59,049 |
| 4 | O(n²) | 2.000 | 1,048,576 |
| 7 | O(n2.807) | 2.807 | 282,475,249 |
| 8 | O(n³) | 3.000 | 1,073,741,824 |
Going from a=8 to a=7 (Strassen) reduces the work for n=1024 by nearly 4x. Going from a=4 to a=3 (Karatsuba) reduces it by 18x. Every saved subproblem pays exponential dividends. This is why researchers have spent decades trying to reduce the matrix multiplication constant from 7 to lower values — each reduction compounds across all recursion levels.
| Dimension | What They Ask | What a Strong Answer Includes |
|---|---|---|
| Concept | "State and prove the master theorem Case 2" | Recursion tree derivation showing equal work at every level, geometric series argument |
| Design | "Design an O(n log n) algorithm to count inversions" | Modified merge sort with cross-inversion counting during merge step |
| Code | "Implement pow(x, n) in O(log n)" | Handle negative exponents, avoid recomputing xn/2 twice |
| Debug | "My merge sort gives wrong output for [3,1,2]" | Check merge function — likely off-by-one in index handling or missing extend for remaining elements |
| Frontier | "What is the current best matrix multiplication exponent?" | ω ≈ 2.371 (Williams et al. 2024), but constant factor is astronomical — Strassen (ω=2.807) is used in practice up to n ~ 104 |
For rapid recall during interviews. Every entry here should be instant for you:
| Recurrence | a | b | f(n) | Case | Answer |
|---|---|---|---|---|---|
| T(n) = T(n/2) + O(1) | 1 | 2 | 1 | 2 | O(log n) — binary search |
| T(n) = T(n/2) + O(n) | 1 | 2 | n | 3 | O(n) — quickselect |
| T(n) = 2T(n/2) + O(1) | 2 | 2 | 1 | 1 | O(n) — tree traversal |
| T(n) = 2T(n/2) + O(n) | 2 | 2 | n | 2 | O(n log n) — merge sort |
| T(n) = 2T(n/2) + O(n²) | 2 | 2 | n² | 3 | O(n²) — root dominated |
| T(n) = 3T(n/2) + O(n) | 3 | 2 | n | 1 | O(n1.585) — Karatsuba |
| T(n) = 4T(n/2) + O(n) | 4 | 2 | n | 1 | O(n²) — leaf dominated |
| T(n) = 7T(n/2) + O(n²) | 7 | 2 | n² | 1 | O(n2.807) — Strassen |
| T(n) = 9T(n/3) + O(n) | 9 | 3 | n | 1 | O(n²) — leaf dominated |
| Mistake | Why It Happens | How to Avoid It |
|---|---|---|
| Forgetting the crossing/boundary case | Only check left and right halves, miss solutions spanning the midpoint | Always ask: "Can the answer span the divide boundary?" |
| Wrong base case | Off-by-one: returning wrong value for n=0 or n=1 | Test your base case manually with the smallest possible input |
| Recomputing subproblems | Accidentally overlapping subproblems (should switch to DP) | Check if the same subproblem appears multiple times in the recursion tree |
| O(n) combine with O(n) subproblems | T(n) = nT(n/2) + n gives exponential time | The subproblem count a must be a constant, not a function of n |
| Not handling odd-length arrays | mid = n/2 with integer division, but left and right sizes differ by 1 | Use mid = (lo + hi) // 2, left = A[lo..mid], right = A[mid+1..hi] |
Just as pow(x, n) computes scalar exponentiation in O(log n), we can compute matrix exponentiation Mn in O(k³ log n) time, where k is the matrix dimension. This is critical for computing the n-th Fibonacci number in O(log n):
python import numpy as np def matrix_power(M, n): """Compute M^n in O(k^3 log n) using repeated squaring.""" k = M.shape[0] result = np.eye(k, dtype=int) # identity matrix base = M.copy() while n > 0: if n % 2 == 1: result = result @ base base = base @ base # square the base n //= 2 return result # Fibonacci via matrix exponentiation # [F(n+1), F(n)] = [[1,1],[1,0]]^n * [F(1), F(0)] def fibonacci(n): if n ≤ 1: return n M = np.array([[1, 1], [1, 0]]) result = matrix_power(M, n - 1) return result[0][0] print(fibonacci(50)) # 12586269025 (in O(log 50) matrix mults)
This technique extends to any linear recurrence: if xn depends linearly on the previous k terms, you can express it as a k×k matrix raised to the n-th power. D&C (repeated squaring) makes this O(k³ log n) instead of the naive O(kn).
A significant inversion is a pair (i, j) where i < j and A[i] > 2·A[j]. This cannot be counted during the standard merge step because the "2x" condition breaks the merge invariant. Instead, count significant inversions in a separate pass before merging:
python def count_significant(A): """Count pairs (i,j) where i<j and A[i] > 2*A[j]. O(n log n).""" if len(A) ≤ 1: return A, 0 mid = len(A) // 2 left, l_inv = count_significant(A[:mid]) right, r_inv = count_significant(A[mid:]) # Count cross-significant inversions (before merge) count = 0 j = 0 for i in range(len(left)): while j < len(right) and left[i] > 2 * right[j]: j += 1 count += j # all right[0..j-1] form significant inversions with left[i] # Standard merge (for the recursive structure) merged = [] a = b = 0 while a < len(left) and b < len(right): if left[a] ≤ right[b]: merged.append(left[a]); a += 1 else: merged.append(right[b]); b += 1 merged.extend(left[a:]); merged.extend(right[b:]) return merged, l_inv + r_inv + count
This "count before merge" pattern is a common interview variant. The key insight: because both halves are sorted, the counting loop uses two pointers and runs in O(n), keeping the total at O(n log n).
In interviews, these signals suggest a D&C approach:
Divide-and-conquer is one of four major algorithm design paradigms in CLRS. Here is how they relate:
| Paradigm | Strategy | Subproblem Overlap? | CLRS Chapter |
|---|---|---|---|
| Divide & Conquer | Split into independent subproblems, recurse, combine | No — each subproblem is fresh | Ch 4 (this lesson) |
| Dynamic Programming | Split into overlapping subproblems, memoize | Yes — same subproblem solved many times | Ch 15 |
| Greedy | Make locally optimal choice at each step | No recursion needed | Ch 16 |
| Backtracking | Explore all possibilities, prune dead ends | Depends on problem | Ch 34 (NP-completeness) |
The most common algorithm design interview question is: "Should I use D&C or DP?" Here is a concrete decision procedure:
| Problem | Subproblem Overlap? | Best Approach | Why |
|---|---|---|---|
| Max subarray | No — left and right halves are independent | D&C (or Kadane's) | Each subproblem appears exactly once |
| Fibonacci | Yes — fib(3) computed many times | DP (memoization) | Exponential without memo, O(n) with |
| Matrix chain multiplication | Yes — same subchains reappear | DP (tabulation) | O(n³) DP vs exponential naive recursion |
| Merge sort | No — each subarray is unique | D&C | O(n log n), no repeating subproblems |
| Longest common subsequence | Yes — same (i,j) pairs reappear | DP (tabulation) | O(mn) table |
You now have three tools for analyzing any D&C recurrence T(n) = aT(n/b) + f(n):
And you have seen D&C applied to four domains: sorting (merge sort), optimization (max subarray), linear algebra (Strassen), and number theory (Karatsuba). The pattern is always the same: divide, conquer, combine. What changes is where the cleverness lives — in the divide step, the combine step, or in reducing the number of subproblems.
| Concept | Key Takeaway | Interview Relevance |
|---|---|---|
| D&C paradigm | Divide, conquer, combine — every D&C follows this template | Recognize problems that split naturally into independent subproblems |
| Max subarray | Left/right/crossing decomposition — the crossing case is what D&C uniquely provides | Kadane's O(n) is better here, but the technique generalizes |
| Strassen | Reducing subproblem count (8 → 7) changes the exponent asymptotically | Shows that small constant-factor improvements can compound exponentially |
| Recurrences | T(n) = aT(n/b) + f(n) captures all D&C algorithms | Must be able to write and solve recurrences on a whiteboard |
| Recursion trees | Visual method: draw tree, sum per level, identify geometric series | Best tool for generating guesses and handling non-standard recurrences |
| Master theorem | Three cases based on comparing f(n) with nlogba | Instant solution for standard recurrences — must know cold |
| Closest pair | Strip has O(1) comparisons per point — the non-obvious packing argument | Classic interview problem, tests ability to argue about the combine step |
| Karatsuba | Same trick as Strassen: reduce 4 multiplications to 3 | Shows D&C applies to number theory, not just arrays |
The single most important skill from this chapter is writing and solving recurrences. Every algorithm in the next 20 chapters of CLRS builds on this foundation. When you encounter a new algorithm, your first question should be: "What is the recurrence?" and your second should be: "Which master theorem case does it fall into?"
D&C is not always the right tool. Here are situations where it fails or underperforms:
| Situation | Why D&C Fails | Better Approach |
|---|---|---|
| Overlapping subproblems | Same subproblem solved exponentially many times (e.g., naive recursive Fibonacci) | Dynamic programming (memoization) |
| No natural decomposition | Problem does not split into smaller instances of itself (e.g., graph coloring) | Backtracking, heuristics |
| Expensive combine step | If combining costs as much as solving the original, no speedup | Redesign the algorithm or use a different paradigm |
| Sequential dependencies | Each step depends on the result of the previous step (e.g., some streaming problems) | Greedy, online algorithms |
| Small constant-factor overhead | Recursion overhead, cache misses from non-contiguous memory access | Iterative algorithms for small inputs (hybrid approach) |
The best real-world algorithms are hybrids. Python's Timsort uses insertion sort for small runs and merge sort for combining them. BLAS uses Strassen for large matrices and tiled naive for small ones. Quicksort implementations use insertion sort below n ≈ 16. Always think about the crossover point — the input size where the asymptotically faster algorithm actually becomes faster in wall-clock time.
| Domain | D&C Application | Why D&C? |
|---|---|---|
| Databases | External merge sort for disk-based sorting | Data too large for RAM — sort chunks, merge |
| Graphics | BSP trees, kd-trees for spatial partitioning | Recursively divide space to accelerate queries |
| Signal processing | FFT for frequency analysis, convolution | O(n log n) instead of O(n²) for polynomial multiply |
| Compilers | Recursive descent parsing | Naturally recursive grammar structure |
| ML/AI | Decision trees, k-d trees for nearest neighbor | Recursively partition feature space |
| Crypto | Karatsuba for big-integer multiplication | RSA keys are 2048+ bits — O(n²) is too slow |
| MapReduce | Distributed sorting and aggregation | Divide data across machines, merge results |