Partition, randomize, conquer — the fastest practical sorting algorithm.
You have one million numbers to sort. You already know merge sort — it runs in O(n log n) time, guaranteed. But merge sort has a dirty secret: it needs O(n) extra space. For your million numbers, that is another million-element array sitting in memory, used only to merge subarrays together. On an embedded system with 64 KB of RAM, that extra array might not fit. In a database sorting gigabytes of records, doubling memory usage is a non-starter.
Can we sort in O(n log n) time without that extra array? Can we rearrange the elements in-place, using only the original array and a constant amount of extra space?
Heapsort can do it in-place with O(n log n) worst case, but it has terrible cache behavior — it jumps around the array following tree indices (parent, left child, right child), causing cache misses on nearly every access. In practice, heapsort is 3-5x slower than merge sort despite having the same asymptotic complexity.
The answer is quicksort. Invented by Tony Hoare in 1960, it is the default sorting algorithm in most standard libraries — C's qsort, Java's Arrays.sort for primitives, and the backbone of Python's sorted() (which uses Timsort, a merge sort variant, but quicksort-family algorithms dominate for primitives). It works by a deceptively simple idea:
Notice the asymmetry with merge sort. In merge sort, the work happens after the recursion — you split blindly, then merge carefully. In quicksort, the work happens before the recursion — you partition carefully, then the recursion is trivial because the subarrays are already separated. The partition step does all the heavy lifting.
Let us see this in action. The simulation below shows a random array. Click Start to watch quicksort partition around a pivot (highlighted in orange), separate the elements, then recurse on each half. Notice how every partition places exactly one element (the pivot) in its correct final position.
Watch the pivot (orange) land in its final position after each partition. Green bars are sorted.
Two things to notice. First, the pivot always ends up in the right place — if you freeze the animation after any partition, that pivot's bar is exactly where it would be in the fully sorted array. Second, the partition is in-place: no extra array is allocated. Elements are swapped within the original array.
The pivot determines how evenly the array is split. The ideal pivot is the median — it splits the array into two equal halves. But finding the exact median is itself a non-trivial problem (we will see quickselect in Chapter 7). So in practice, we approximate a good pivot using randomization or sampling.
How bad can the pivot be? Think of the pivot's rank (its position in sorted order). If the pivot has rank r, the partition produces subarrays of size r-1 and n-r. The worst case is r=1 or r=n (the minimum or maximum), which gives a 0:(n-1) split. Any other rank gives a better split.
This is the single most important conceptual difference between the two great O(n log n) sorting algorithms, and interviewers love asking about it.
In merge sort, you split the array in half (trivial — just compute the midpoint), recurse on both halves, then merge the two sorted halves together (the hard part). The work happens after the recursion, on the way back up. Splitting costs nothing; merging costs O(n).
In quicksort, you partition the array around a pivot (the hard part), then recurse on both halves. The work happens before the recursion, on the way down. Partitioning costs O(n); there is nothing to do on the way back up because the subarrays are already separated.
| Property | Merge Sort | Quicksort |
|---|---|---|
| Where work happens | After recursion (merge step) | Before recursion (partition step) |
| Split/combine cost | Split: O(1), Merge: O(n) | Partition: O(n), Combine: O(0) |
| Extra space | O(n) for merge buffer | O(1) for partition (plus O(log n) stack) |
| Worst case | Always O(n log n) | O(n²) with bad pivots |
| Cache behavior | Poor (merge reads from 2 arrays + writes to 1) | Excellent (partition scans sequentially) |
The catch? Quicksort's worst case is O(n²). If the pivot is always the smallest or largest element, each partition only peels off one element, and we get n levels of recursion with O(n) work each. But this worst case is avoidable — with randomized pivot selection, the expected time is O(n log n) for any input. That is why quicksort is called "quick."
Tony Hoare invented quicksort in 1960, while he was a visiting student at Moscow State University. He was working on a machine translation project and needed to sort words into alphabetical order to look them up in a Russian-English dictionary stored in sorted order on magnetic tape. He first thought of insertion sort, but it was too slow. Then he had the insight: pick a word at random, split the dictionary into words before and after it, and sort the two halves independently.
He later recalled: "I was very pleased with my invention, but I was totally unable to prove that it always worked." It took another year before Hoare published the proof of correctness and the average-case analysis. The paper, "Quicksort" (Computer Journal, 1962), is one of the most cited in computer science. Hoare went on to win the Turing Award in 1980, partly for this contribution.
The algorithm's elegance lies in its recursive structure: one operation (partition) that places a single element in its correct position, applied recursively until every element has been placed. No temporary arrays, no complex data structures — just comparisons and swaps.
Sixty years later, quicksort (in its evolved forms) remains the most-used sorting algorithm in practice. Not bad for an idea born from a translation problem.
Quicksort is a divide-and-conquer algorithm with three steps:
Let us count the elements of this pattern more carefully. The partition step takes Θ(n) time (one linear scan). Each recursive call handles a subarray that is strictly smaller (by at least one element — the pivot, which is excluded from both recursive calls). So the recursion always terminates — the subarray shrinks by at least one element per level. The total work depends on how balanced the splits are, which we will analyze in detail in Chapter 3.
To build more intuition, let us trace quicksort on a tiny array: [3, 1, 2].
Two comparisons for three elements. The optimal lower bound for sorting 3 elements is ceil(log2(3!)) = ceil(log2(6)) = 3 comparisons. But we got lucky — the pivot happened to be the median. If the pivot were the minimum (e.g., input [2, 3, 1] with pivot 1), we would get a 0:2 split and need 3 comparisons total.
Let us also trace a small example where the pivot is NOT the median to see the unbalanced case:
This is the worst-case structure in miniature: the pivot is the extreme element, giving a 0:(n-1) split. One level of recursion is "wasted" because one subarray is empty. Scale this up to n elements and you get n levels of recursion.
Partition is the engine of quicksort. Everything depends on it. Get partition right, and quicksort is trivial — three lines of recursion. Get partition wrong, and everything falls apart: infinite loops, wrong output, stack overflows.
The partition subroutine takes a subarray A[lo..hi], picks a pivot element, and rearranges elements so that everything ≤ pivot is on the left and everything > pivot is on the right. After partition returns, the pivot is at some index q, sitting in its final sorted position — the exact index it would occupy in the fully sorted array. This is done in-place with O(n) time and O(1) extra space.
Think of partition as a bouncer at a nightclub. The pivot is the VIP threshold. Everyone at or below the threshold goes to the left room. Everyone above goes to the right room. The bouncer (partition) makes one pass through the line, checking each person, and at the end the two rooms are filled and the threshold is placed exactly at the door between them.
CLRS uses the Lomuto partition scheme, named after Nico Lomuto. It is not the fastest partition in practice (we will see Hoare's in Chapter 5), but it is the easiest to understand and prove correct. Here is how it works:
Let us trace Lomuto partition on the array [2, 8, 7, 1, 3, 5, 6, 4] with pivot = 4 (the last element). We track i (boundary of the ≤ region) and j (the scanner).
| Step | j | A[j] | A[j] ≤ 4? | Action | i | Array after |
|---|---|---|---|---|---|---|
| Init | - | - | - | i = -1 | -1 | [2, 8, 7, 1, 3, 5, 6, 4] |
| 1 | 0 | 2 | Yes | i++, swap A[0]↔A[0] | 0 | [2, 8, 7, 1, 3, 5, 6, 4] |
| 2 | 1 | 8 | No | skip | 0 | [2, 8, 7, 1, 3, 5, 6, 4] |
| 3 | 2 | 7 | No | skip | 0 | [2, 8, 7, 1, 3, 5, 6, 4] |
| 4 | 3 | 1 | Yes | i++, swap A[1]↔A[3] | 1 | [2, 1, 7, 8, 3, 5, 6, 4] |
| 5 | 4 | 3 | Yes | i++, swap A[2]↔A[4] | 2 | [2, 1, 3, 8, 7, 5, 6, 4] |
| 6 | 5 | 5 | No | skip | 2 | [2, 1, 3, 8, 7, 5, 6, 4] |
| 7 | 6 | 6 | No | skip | 2 | [2, 1, 3, 8, 7, 5, 6, 4] |
| Final | - | - | - | swap A[3]↔A[7] | 3 | [2, 1, 3, 4, 7, 5, 6, 8] |
Pivot 4 is now at index 3. Everything to its left (2, 1, 3) is ≤ 4. Everything to its right (7, 5, 6, 8) is > 4. The pivot is in its final sorted position — no matter how we sort the left and right halves, 4 will stay at index 3.
Step through the partition yourself in the simulation below. Click Step to advance one comparison at a time. Green cells are ≤ pivot, red cells are > pivot, gray cells are unexamined.
Click Step to advance. The orange bar is the pivot. Green = ≤ pivot, red = > pivot.
python def lomuto_partition(A, lo, hi): """Partition A[lo..hi] using A[hi] as pivot. Returns the final index of the pivot.""" pivot = A[hi] i = lo - 1 # boundary of the ≤ region for j in range(lo, hi): if A[j] <= pivot: i += 1 A[i], A[j] = A[j], A[i] # grow the ≤ region A[i + 1], A[hi] = A[hi], A[i + 1] # place pivot return i + 1
The function does exactly n - 1 comparisons (j runs from lo to hi - 1) and at most n swaps. Time: O(n). Space: O(1) — just the variables i, j, and pivot.
The correctness of quicksort rests entirely on the partition invariant. Let us state it precisely. At each iteration, the array A[lo..hi] has four regions:
Initialization: Before the loop starts, i = lo - 1 and j = lo. The ≤ region and > region are both empty. The entire array (except the pivot) is unexamined. The invariant holds trivially.
Maintenance: At each step, we examine A[j]. If A[j] ≤ pivot, we grow the ≤ region by swapping A[j] into position i+1 and advancing both i and j. If A[j] > pivot, we just advance j — A[j] is already in the > region (between i+1 and j). Either way, the invariant is maintained.
Termination: When j = hi, the unexamined region is empty. We swap the pivot (A[hi]) with A[i+1], placing it between the ≤ and > regions. The pivot is now at its correct sorted position.
Let us trace partition on [5, 3, 8, 6, 2, 7, 1, 4] with pivot = 4.
| j | A[j] | ≤ 4? | Action | i | Array |
|---|---|---|---|---|---|
| 0 | 5 | No | skip | -1 | [5, 3, 8, 6, 2, 7, 1, 4] |
| 1 | 3 | Yes | i++, swap A[0]↔A[1] | 0 | [3, 5, 8, 6, 2, 7, 1, 4] |
| 2 | 8 | No | skip | 0 | [3, 5, 8, 6, 2, 7, 1, 4] |
| 3 | 6 | No | skip | 0 | [3, 5, 8, 6, 2, 7, 1, 4] |
| 4 | 2 | Yes | i++, swap A[1]↔A[4] | 1 | [3, 2, 8, 6, 5, 7, 1, 4] |
| 5 | 7 | No | skip | 1 | [3, 2, 8, 6, 5, 7, 1, 4] |
| 6 | 1 | Yes | i++, swap A[2]↔A[6] | 2 | [3, 2, 1, 6, 5, 7, 8, 4] |
| done | - | - | swap A[3]↔A[7] | 3 | [3, 2, 1, 4, 5, 7, 8, 6] |
Pivot 4 lands at index 3. Left region [3, 2, 1] — all ≤ 4. Right region [5, 7, 8, 6] — all > 4. Notice the elements within each region are NOT sorted — that is the job of the recursive calls.
We have the partition subroutine. Quicksort itself is almost embarrassingly simple: partition the array, then recursively sort the two halves.
python def quicksort(A, lo, hi): """Sort A[lo..hi] in-place.""" if lo < hi: q = lomuto_partition(A, lo, hi) # pivot at index q quicksort(A, lo, q - 1) # sort left of pivot quicksort(A, q + 1, hi) # sort right of pivot # Usage: A = [2, 8, 7, 1, 3, 5, 6, 4] quicksort(A, 0, len(A) - 1) print(A) # [1, 2, 3, 4, 5, 6, 7, 8]
Three lines of logic. That is the entire algorithm. But let us carefully trace the recursion to see why it works.
Starting with [2, 8, 7, 1, 3, 5, 6, 4]:
| Call | Subarray | Pivot | After partition | Pivot index |
|---|---|---|---|---|
| quicksort(0,7) | [2,8,7,1,3,5,6,4] | 4 | [2,1,3,4,7,5,6,8] | 3 |
| quicksort(0,2) | [2,1,3] | 3 | [2,1,3] | 2 |
| quicksort(0,1) | [2,1] | 1 | [1,2] | 0 |
| quicksort(4,7) | [7,5,6,8] | 8 | [7,5,6,8] | 7 |
| quicksort(4,6) | [7,5,6] | 6 | [5,6,7] | 5 |
Each recursive call handles a smaller subarray. When a subarray has 0 or 1 elements (lo ≥ hi), we return immediately — a single element is already sorted. The pivot at each level is permanently placed, and eventually every element has been a pivot.
Let us visualize the recursion. Each node represents a call to quicksort(lo, hi), and the pivot chosen is shown in parentheses.
Count the partition operations: 5 non-trivial calls (the ones that actually partition). Each partition does O(subarray size) work. Total work: 8 + 3 + 2 + 4 + 3 = 20 comparisons. For n=8, O(n log n) would be about 8 × 3 = 24, so we are right in the ballpark.
How much memory does the recursion use? Each call to quicksort pushes a stack frame containing lo, hi, q, and the return address — about 32 bytes on a 64-bit machine. The maximum stack depth equals the maximum recursion depth.
Compare this to merge sort, which also uses O(log n) stack frames (always balanced recursion) but additionally allocates O(n) auxiliary space for the merge buffer — 4 MB for 1 million 32-bit integers. Quicksort's in-place nature is its biggest practical advantage.
Why is the final array sorted? By induction on the subarray size:
Base case: An array of 0 or 1 elements is trivially sorted. When lo ≥ hi, quicksort returns immediately without modifying the array. This is correct because a 0- or 1-element subarray is already sorted.
Inductive step: Assume quicksort correctly sorts arrays smaller than n. For an array of size n, partition places the pivot at index q. All elements in A[lo..q-1] are ≤ pivot, and all elements in A[q+1..hi] are > pivot. By the inductive hypothesis, the recursive calls sort A[lo..q-1] and A[q+1..hi] (both are smaller than n, since the pivot is excluded from both). After both recursive calls return, the full array A[lo..hi] is sorted: everything left of q is sorted and ≤ A[q], everything right of q is sorted and > A[q], and A[q] is in its correct position.
Termination: Each recursive call operates on a strictly smaller subarray (at least one element smaller, because the pivot is excluded). So the recursion reaches the base case in at most n levels.
Consider sorting cards by value, where we have [5♠, 3♥, 5♥, 1♦] with pivot = 5♥ (last element). After Lomuto partition:
The swap in step j=2 moved (5,a) from position 1 (where it had been swapped) to position 2 and later beyond the pivot. The relative order of equal elements is not preserved.
Quicksort does not allocate an auxiliary array — it sorts in-place. But it is not truly O(1) space. The recursion stack uses space proportional to the recursion depth. In the best case, the recursion depth is O(log n) (balanced partitions). In the worst case, it is O(n) (maximally unbalanced). We will fix the worst case in Chapter 4 and Chapter 6.
Watch the full quicksort execution below. The simulation highlights the active subarray, shows the pivot selection, partition, and recursive calls. Notice how the sorted region (green) grows as pivots land in their final positions.
Watch the recursion unfold. Orange = pivot, blue = active subarray, green = sorted.
Quicksort's performance depends entirely on how balanced the partitions are. Let us derive the running time for three scenarios.
The worst case occurs when every partition produces the most unbalanced split possible: one side has n-1 elements and the other has 0 elements. This happens when the pivot is always the minimum or maximum element — for example, when the input is already sorted and we always pick the last element as pivot.
The recursion tree is a long chain: n levels, each doing O(n), O(n-1), ..., O(1) work. Total: n + (n-1) + ... + 1 = n(n+1)/2 = O(n²). This is as bad as insertion sort — and it happens on already-sorted input, which is arguably the most common input in practice.
Let us trace the worst case on [1, 2, 3, 4, 5] with the last element as pivot.
| Call | Subarray | Pivot | After partition | Split | Comparisons |
|---|---|---|---|---|---|
| qs(0,4) | [1,2,3,4,5] | 5 | [1,2,3,4,5] | 4:0 | 4 |
| qs(0,3) | [1,2,3,4] | 4 | [1,2,3,4] | 3:0 | 3 |
| qs(0,2) | [1,2,3] | 3 | [1,2,3] | 2:0 | 2 |
| qs(0,1) | [1,2] | 2 | [1,2] | 1:0 | 1 |
| qs(0,0) | [1] | - | [1] | base | 0 |
Total comparisons: 4 + 3 + 2 + 1 = 10 = n(n-1)/2. For n = 1000, that is 499,500 comparisons. For n = 1,000,000, it is 499,999,500,000 — half a trillion comparisons. Every element is a pivot, but each partition only peels off the pivot itself, leaving n-1 elements for the next level. The pivot is always the maximum of the remaining subarray because the input is sorted.
The best case occurs when every partition splits the array exactly in half:
The recursion tree is perfectly balanced: log&sub2;(n) levels, each doing Θ(n) total work across all nodes. Total: O(n log n).
Here is the surprising result. Even if the partition is not perfectly balanced, quicksort is still O(n log n) as long as the split is proportional. Consider a 9:1 split — seems terrible, right?
The recursion tree has two branches of different lengths. The short branch (n/10 side) bottoms out at depth log10(n). The long branch (9n/10 side) bottoms out at depth log10/9(n). But at every level of the tree, the total work across all nodes is at most cn (because each element participates in at most one partition call per level). The total number of levels is log10/9(n) = O(log n). So the total work is O(n log n).
What if we alternate between best-case and worst-case splits? Suppose at even levels we get a balanced split and at odd levels we get a worst-case split. Is this still O(n log n)?
The bad partition costs an extra O(n) at that level but does not change the asymptotic bound. The good partition that follows immediately "repairs" the damage. This is why even mediocre pivot selection gives O(n log n) on average — you need consistently bad pivots (all worst-case) to get O(n²).
Let us put real numbers to these bounds for n = 1,000,000 (one million):
| Case | Depth | Total comparisons | Time at 1 GHz |
|---|---|---|---|
| Worst case | 1,000,000 | ~500,000,000,000 (500 billion) | ~500 seconds |
| Best case | 20 | ~20,000,000 | ~0.02 seconds |
| Average case | ~40 | ~28,000,000 (1.39n log n) | ~0.03 seconds |
The worst case is 25,000 times slower than the average case. This is not a theoretical concern — it happens in practice on sorted or nearly-sorted input, which is common (think: appending new records to an already-sorted database). This is why randomized pivot selection or median-of-three is essential.
The simulation below shows three recursion trees side-by-side: worst case (sorted input, always pick last element), best case (perfect median pivot), and average case (random pivots). Watch how the total work at each level stays consistent but the tree shape changes dramatically.
Compare the tree shapes. Worst case is a chain. Best case is balanced. Average case is slightly unbalanced but still O(log n) deep.
Why is the average case O(n log n) even though some splits are terrible? Here is the intuitive argument. A random pivot has a 50% chance of landing in the middle 50% of the sorted array (between the 25th and 75th percentile). If it does, the larger partition has at most 3n/4 elements. So on average, every other level has a good split.
After two levels (one bad, one good), the problem size drops from n to at most 3n/4. After 2 log4/3(n) levels, the problem size reaches 1. Since log4/3(n) = O(log n), the depth is O(log n). Each level does O(n) work (the partition scans), so the total is O(n log n).
More precisely, the expected number of levels to reduce from n to 1 is:
| Case | Pivot selection | Split ratio | Depth | Time |
|---|---|---|---|---|
| Worst | Always min or max | n-1 : 0 | n | Θ(n²) |
| Best | Always median | n/2 : n/2 | log n | Θ(n log n) |
| Average | Random element | Varies | O(log n) | Θ(n log n) |
| 9:1 split | 90th percentile | 9n/10 : n/10 | log10/9 n | Θ(n log n) |
The worst case of O(n²) is triggered by a specific input pattern (sorted data) combined with a specific pivot choice (always last element). An adversary who knows your algorithm can always craft an input that triggers the worst case. How do we defend against this? By making the algorithm randomized: choose the pivot uniformly at random.
python import random def randomized_partition(A, lo, hi): """Pick a random pivot, then use Lomuto partition.""" r = random.randint(lo, hi) # random index in [lo, hi] A[r], A[hi] = A[hi], A[r] # swap random element to end return lomuto_partition(A, lo, hi) def randomized_quicksort(A, lo, hi): if lo < hi: q = randomized_partition(A, lo, hi) randomized_quicksort(A, lo, q - 1) randomized_quicksort(A, q + 1, hi)
One extra line — swap a random element to the pivot position — and the worst case disappears probabilistically. No adversarial input can consistently trigger O(n²) because the adversary does not know which random choices we will make.
This is a profound principle in algorithm design: randomness defeats adversaries. A randomized algorithm with good expected performance is often preferable to a deterministic algorithm with good worst-case performance, because the randomized algorithm has good performance on ALL inputs (in expectation), while the deterministic algorithm may be slow on specific inputs.
Let us prove that randomized quicksort makes O(n log n) expected comparisons. This is one of the most elegant probabilistic analyses in CLRS.
Label the elements by their rank in sorted order: z1 < z2 < ... < zn. Define an indicator random variable Xij that equals 1 if zi and zj are ever compared, and 0 otherwise. The total number of comparisons is:
By linearity of expectation:
When are zi and zj compared? Only when one of them is chosen as pivot while the set {zi, zi+1, ..., zj} is still together in the same subarray. If any element zk with i < k < j is chosen as pivot first, then zi and zj end up on opposite sides and are never compared. So zi and zj are compared if and only if the first element from {zi, zi+1, ..., zj} to be chosen as pivot is zi or zj.
There are j - i + 1 elements in the set, each equally likely to be chosen first, and exactly 2 of them (zi and zj) trigger a comparison. Now we sum:
Let us compute the expected comparisons for n = 8 to make this real, not just asymptotic.
Adjacent elements (k=1) are always compared — if zi and zi+1 are adjacent in sorted order, the first one chosen as pivot must be one of them. Distant elements (k=7, i.e., the min and max) are compared only 25% of the time — it is likely that some element between them will be chosen as pivot first, separating them.
The expected time is O(n log n), but how likely is it that we actually get close to O(n²)? The answer: astronomically unlikely. By Chebyshev's inequality and more refined concentration bounds, the probability that randomized quicksort takes more than cn log n time (for a constant c > 2) decreases exponentially in n. For n = 1,000,000, the probability of exceeding 10 × n log n comparisons is less than 10-100. You are more likely to be hit by a meteorite while writing the sort function.
Randomized quicksort is a Las Vegas algorithm: it always produces the correct answer (a sorted array), but the running time varies. The expected time is O(n log n), but any particular run might be faster or slower. Contrast this with a Monte Carlo algorithm, which always runs in a fixed time but might produce a wrong answer with some small probability.
The distinction matters in practice. You never need to check if randomized quicksort produced the right answer — it always does. You just cannot predict exactly how long it will take (though the expected time and the variance are well understood).
For the interview, remember: when someone says "expected O(n log n)", they mean a Las Vegas algorithm. The randomness is in the algorithm's choices (pivot selection), not in the input. Even on adversarial input, the expected time (averaging over the algorithm's random choices) is O(n log n).
This is subtly different from "average-case O(n log n)" which means: if the input is drawn uniformly at random, the deterministic algorithm runs in O(n log n). An adversary can defeat a deterministic algorithm by choosing the input; they cannot defeat a randomized algorithm because the random choices are hidden from them.
What if the adversary can observe your random choices as you make them? For example, what if the adversary can somehow feed elements into the array adaptively? In the oblivious adversary model (the standard one), the adversary must commit to the input before the algorithm runs. Against an oblivious adversary, randomized quicksort is O(n log n) expected for any input.
Against an adaptive adversary (who can see your random choices and modify the input on the fly), randomized quicksort can be forced into O(n²). But this model is relevant only in cryptographic or adversarial computation settings, not in standard sorting.
The simulation below is dramatic. We sort the same already-sorted array twice: once with deterministic pivot (last element) and once with randomized pivot. Watch the recursion depth explode in the deterministic case.
Left: last-element pivot (worst case). Right: random pivot. Same sorted input array.
Lomuto partition is clean and simple, but it is not the only game in town. Real-world sorting libraries use more sophisticated partition schemes. Let us examine two important variants.
Tony Hoare's original partition (from his 1962 paper) uses two pointers that start at opposite ends of the array and walk toward each other. It is more elegant and does fewer swaps on average.
python def hoare_partition(A, lo, hi): """Hoare's partition: two pointers moving inward.""" pivot = A[lo] # pivot is the first element i = lo - 1 j = hi + 1 while True: # Move i right, skipping elements < pivot i += 1 while A[i] < pivot: i += 1 # Move j left, skipping elements > pivot j -= 1 while A[j] > pivot: j -= 1 # If pointers crossed, partition is done if i >= j: return j # Swap out-of-place elements A[i], A[j] = A[j], A[i]
Key differences from Lomuto:
| Property | Lomuto | Hoare |
|---|---|---|
| Pivot position | Last element | First element (or any; swap first) |
| Pointers | One pointer (j), one boundary (i) | Two pointers (i, j) moving inward |
| Average swaps | ~n/2 | ~n/6 (three times fewer!) |
| Pivot placement | Pivot ends at its final position | Pivot may NOT be at its final position |
| Return value | Pivot index q; recurse on [lo,q-1] and [q+1,hi] | Split point j; recurse on [lo,j] and [j+1,hi] |
| All-equal input | O(n²) — always n-1:0 split | O(n log n) — roughly balanced split |
The full quicksort using Hoare partition looks slightly different because Hoare returns a split point, not the pivot's final index:
python def quicksort_hoare(A, lo, hi): """Quicksort using Hoare's partition scheme.""" if lo < hi: j = hoare_partition(A, lo, hi) quicksort_hoare(A, lo, j) # NOTE: j, not j-1 quicksort_hoare(A, j + 1, hi) # NOTE: j+1, not j+2
The subtlety: Hoare's partition guarantees that A[lo..j] ≤ A[j+1..hi], but the pivot might be anywhere in A[lo..j]. So we recurse on A[lo..j] (inclusive of j, because j might not be the pivot) and A[j+1..hi]. Getting this wrong — e.g., recursing on (lo, j-1) — will either miss elements or cause infinite recursion.
Trace Hoare's partition on [8, 1, 6, 4, 0, 3, 9, 5] with pivot = A[0] = 8.
| Step | i | j | A[i] | A[j] | Action | Array after |
|---|---|---|---|---|---|---|
| Init | -1 | 8 | - | - | - | [8, 1, 6, 4, 0, 3, 9, 5] |
| 1 | 0 | 7 | 8 | 5 | i stops at 8 (≥8), j stops at 5 (≤8). Swap. | [5, 1, 6, 4, 0, 3, 9, 8] |
| 2 | 6 | 5 | 9 | 3 | i scans right, stops at 9 (≥8). j scans left, stops at 3 (≤8). i<j? 6>5, no! Return j=5. | [5, 1, 6, 4, 0, 3, 9, 8] |
Hoare returns j=5. The array is partitioned: A[0..5] = [5, 1, 6, 4, 0, 3] (all ≤ 8) and A[6..7] = [9, 8] (all ≥ 8). Notice the pivot 8 is NOT at position 5 — it has moved to position 7. This is the key difference from Lomuto: Hoare does not place the pivot in its final position. That is why with Hoare, we recurse on [lo..j] and [j+1..hi] (not [lo..q-1] and [q+1..hi]).
Also notice: only 1 swap was needed (positions 0 and 7). Lomuto would have needed ~4 swaps on this input. This is why Hoare's partition is preferred in practice — it does about 3x fewer swaps on average.
Consider the input [5, 5, 5, 5, 5] with both partition schemes:
Lomuto: Pivot = 5 (last element). Every element is ≤ 5, so the boundary i advances to the end. The pivot swaps with itself at position 4. Split: (4, 0). The next call has [5, 5, 5, 5], and the same thing happens. This is an (n-1):0 split every time, giving O(n²).
Hoare: Pivot = 5 (first element). i starts at -1, j starts at 5. i advances to 0 (A[0] = 5 ≥ 5, stop). j decrements to 4 (A[4] = 5 ≤ 5, stop). Swap A[0] and A[4]. i advances to 1, j decrements to 3. Swap. i=2, j=2, they cross. Return j=2. Split: (0..2) and (3..4) — roughly balanced! Hoare gives O(n log n) on all-equal input where Lomuto gives O(n²).
quicksort(lo, j) and quicksort(j+1, hi), NOT quicksort(lo, q-1) and quicksort(q+1, hi). Using the wrong bounds is one of the most common bugs in quicksort implementations. The pivot might be anywhere in A[lo..j] — it is not at index j specifically. This also means Hoare's partition cannot be used with quickselect directly (you would need to check if the pivot ended up at index k separately).What happens when many elements are equal to the pivot? Both Lomuto and Hoare still do O(n) work on the "equal" elements in subsequent recursive calls. With many duplicates, this wastes enormous time.
Three-way partition (Dijkstra's Dutch National Flag problem) splits the array into three regions: elements < pivot, elements = pivot, and elements > pivot. All the equal elements are placed together and excluded from future recursion.
python def three_way_partition(A, lo, hi): """Dutch National Flag partition. Returns (lt, gt) such that: A[lo..lt-1] < pivot A[lt..gt] = pivot A[gt+1..hi] > pivot""" pivot = A[lo] lt = lo # boundary of < region i = lo + 1 # current element gt = hi # boundary of > region while i <= gt: if A[i] < pivot: A[lt], A[i] = A[i], A[lt] lt += 1 i += 1 elif A[i] > pivot: A[gt], A[i] = A[i], A[gt] gt -= 1 # don't increment i — swapped element is unexamined else: i += 1 # equal to pivot, just advance return lt, gt def quicksort_3way(A, lo, hi): if lo < hi: lt, gt = three_way_partition(A, lo, hi) quicksort_3way(A, lo, lt - 1) # sort < region quicksort_3way(A, gt + 1, hi) # sort > region # = region is DONE — all duplicates of pivot are in place
Arrays.sort uses a dual-pivot variant with three-way partitioning.Trace Dijkstra's three-way partition on [4, 2, 4, 1, 4, 3, 4, 2] with pivot = 4 (A[0]).
| Step | i | lt | gt | A[i] | Action | Array |
|---|---|---|---|---|---|---|
| Init | 1 | 0 | 7 | 2 | - | [4, 2, 4, 1, 4, 3, 4, 2] |
| 1 | 1 | 0 | 7 | 2 | 2 < 4: swap A[lt]↔A[i], lt++, i++ | [2, 4, 4, 1, 4, 3, 4, 2] |
| 2 | 2 | 1 | 7 | 4 | 4 = 4: i++ | [2, 4, 4, 1, 4, 3, 4, 2] |
| 3 | 3 | 1 | 7 | 1 | 1 < 4: swap A[lt]↔A[i], lt++, i++ | [2, 1, 4, 4, 4, 3, 4, 2] |
| 4 | 4 | 2 | 7 | 4 | 4 = 4: i++ | [2, 1, 4, 4, 4, 3, 4, 2] |
| 5 | 5 | 2 | 7 | 3 | 3 < 4: swap A[lt]↔A[i], lt++, i++ | [2, 1, 3, 4, 4, 4, 4, 2] |
| 6 | 6 | 3 | 7 | 4 | 4 = 4: i++ | [2, 1, 3, 4, 4, 4, 4, 2] |
| 7 | 7 | 3 | 7 | 2 | 2 < 4: swap A[lt]↔A[i], lt++, i++ | [2, 1, 3, 2, 4, 4, 4, 4] |
| Done | 8 | 4 | 7 | - | i > gt, stop | [2, 1, 3, 2, 4, 4, 4, 4] |
Result: lt=4, gt=7. The array is split into three regions:
• A[0..3] = [2, 1, 3, 2] — all < 4 (needs further sorting)
• A[4..7] = [4, 4, 4, 4] — all = 4 (DONE, never recursed into!)
Four out of eight elements were equal to the pivot and are immediately in their final positions. With standard Lomuto partition, those four elements would participate in several more levels of recursion.
Three-way quicksort is actually entropy-optimal for inputs with many duplicates. The entropy of a sequence is:
where nk is the count of the k-th distinct value. Three-way quicksort runs in O(nH) time, which is O(n log n) when all elements are distinct (H = log n) and O(n) when all elements are equal (H = 0). This is provably optimal — no comparison-based sorting algorithm can do better.
The simulation below compares all three partition schemes on the same array. Try the "Many Duplicates" button to see three-way partitioning shine — the swap counts tell the story.
Compare swap counts across partition variants. Try "Many Duplicates" to see 3-way win.
This is the payoff. Everything you have learned — Lomuto, Hoare, three-way, randomized pivot, median-of-three — comes together in one interactive sorting laboratory. You control the algorithm, the input distribution, and the speed. Watch comparisons, swaps, and recursion depth in real-time.
std::sort actually does.Instead of picking the last element or a random element as pivot, examine three elements — the first, middle, and last — and use the median of the three. This avoids the worst case on sorted and reverse-sorted input (the most common adversarial patterns) with minimal overhead.
python def median_of_three(A, lo, hi): """Move the median of A[lo], A[mid], A[hi] to A[hi].""" mid = (lo + hi) // 2 if A[lo] > A[mid]: A[lo], A[mid] = A[mid], A[lo] if A[lo] > A[hi]: A[lo], A[hi] = A[hi], A[lo] if A[mid] > A[hi]: A[mid], A[hi] = A[hi], A[mid] # Now A[lo] ≤ A[mid] ≤ A[hi]. Use A[mid] as pivot. A[mid], A[hi - 1] = A[hi - 1], A[mid] return A[hi - 1]
Introsort (Musser, 1997) begins with quicksort but monitors the recursion depth. If it exceeds 2·floor(log2(n)), we know we have hit a bad pivot sequence. Introsort bails out and switches to heapsort, which guarantees O(n log n) worst case. The result: O(n log n) worst case with quicksort's excellent average-case constant factors.
python import math def introsort(A, lo, hi, depth_limit): if hi - lo < 16: insertion_sort(A, lo, hi) # small arrays: insertion sort return if depth_limit == 0: heapsort(A, lo, hi) # depth exceeded: heapsort return # Otherwise: quicksort with median-of-three pivot = median_of_three(A, lo, hi) q = lomuto_partition(A, lo, hi) introsort(A, lo, q - 1, depth_limit - 1) introsort(A, q + 1, hi, depth_limit - 1) def sort(A): """Production-grade sort: introsort.""" n = len(A) depth_limit = 2 * int(math.log2(n)) if n > 1 else 0 introsort(A, 0, n - 1, depth_limit)
Now — the big simulation. Sort up to 120 elements with your choice of algorithm and input pattern.
Choose an algorithm and input type. Watch comparisons, swaps, and recursion depth in real-time.
Quicksort has overhead per recursive call: function call, partition setup, and base-case checks. For tiny subarrays (n < 16), this overhead exceeds the actual sorting work. Insertion sort has almost zero overhead and is very fast on small arrays because:
(1) It compiles to a tight inner loop — compare, shift, repeat.
(2) Small arrays fit entirely in L1 cache, so memory access is instant.
(3) On nearly-sorted data (which small subarrays from quicksort often are), insertion sort is O(n).
python def insertion_sort(A, lo, hi): """Sort A[lo..hi] using insertion sort. Optimal for small subarrays (n < ~16).""" for i in range(lo + 1, hi + 1): key = A[i] j = i - 1 while j >= lo and A[j] > key: A[j + 1] = A[j] j -= 1 A[j + 1] = key
The threshold (16, 24, or 32) varies by implementation and hardware. C++ std::sort typically uses 16. Java's dual-pivot quicksort uses 47. The optimal value can be found by benchmarking on the target machine.
Let us think about this quantitatively. Quicksort on a subarray of size k has overhead: function call (~5 ns), partition setup (~3 ns), base-case check (~1 ns), plus the actual partition work. For k = 4, the partition does 3 comparisons and ~1.5 swaps. Total: ~15 ns of overhead + ~10 ns of real work = ~25 ns. Insertion sort on k = 4 does at most 6 comparisons (worst case 4·3/2 = 6) with a tight inner loop: ~12 ns total. The crossover point is where quicksort's overhead exceeds insertion sort's total work.
Vladimir Yaroslavskiy's dual-pivot quicksort (2009) uses two pivots to create three partitions:
Why is this faster? It does more comparisons per element (~1.9n vs ~1.4n per level), but the number of levels is fewer: each partition shrinks the subarray to ~n/3 instead of ~n/2, so the depth is log3(n) instead of log2(n). The net result is slightly fewer total cache misses and slightly fewer total swaps. Java benchmarks show a ~10% speedup over single-pivot quicksort on modern hardware.
Try these experiments:
| Experiment | Algorithm | Input | What to observe |
|---|---|---|---|
| 1 | Basic | Sorted | Watch the recursion depth explode — O(n) depth, O(n²) total work |
| 2 | Randomized | Sorted | Same input, but recursion depth stays ~O(log n) |
| 3 | Basic | Many Duplicates | Lots of wasted swaps on equal elements |
| 4 | Introsort | Sorted | Detects the pathological case and switches to heapsort |
| 5 | Median-of-Three | Random | Consistently balanced partitions with minimal overhead |
| 6 | Randomized | Many Duplicates | Still does extra work on duplicates — compare to 3-way |
| 7 | Basic | Nearly Sorted | Almost as bad as fully sorted — a few swaps don't help much |
After running these experiments, you should see these patterns:
Comparisons tell you about the algorithm's time complexity. O(n²) algorithms will show comparison counts in the tens of thousands for n=100. O(n log n) algorithms will show ~700 comparisons for n=100.
Swaps tell you about the algorithm's data movement. Fewer swaps = less time spent writing to memory. Hoare's partition typically does 3x fewer swaps than Lomuto.
Max depth tells you about stack space. O(n) depth means the algorithm could stack overflow on large inputs. O(log n) depth (5-7 for n=100) means the stack is safe. Introsort caps depth at 2 log n.
The showcase simulation lets you verify every claim in this lesson empirically. Do not just read about quicksort — watch it.
Sometimes you do not need to sort the entire array. You just need the k-th smallest element. Finding the median of a dataset. Finding the 95th percentile of response times. Finding the top-10 most frequent search queries. Must you sort the entire array first?
Sorting costs O(n log n). But finding a single order statistic (the k-th smallest element) should intuitively cost less — you are asking one question about one element, not fully ordering all n elements. Can we do better?
No. Quickselect (also by Hoare, 1961) uses the partition subroutine but only recurses into one side — the side that contains the k-th element. This reduces the expected time from O(n log n) to O(n).
After partitioning, the pivot is at index q. Three cases:
python def quickselect(A, lo, hi, k): """Return the k-th smallest element (0-indexed) in A[lo..hi].""" if lo == hi: return A[lo] q = randomized_partition(A, lo, hi) if k == q: return A[q] elif k < q: return quickselect(A, lo, q - 1, k) else: return quickselect(A, q + 1, hi, k) # Find the median of an array: A = [7, 3, 9, 1, 5, 8, 2] median = quickselect(A, 0, len(A) - 1, len(A) // 2) print(median) # 5
In quicksort, we recurse on both sides: T(n) = T(left) + T(right) + O(n). In quickselect, we recurse on only one side. With a random pivot, the expected size of that side is n/2 (on average, the pivot is near the median). So:
The geometric series converges to 2. We do at most 2n comparisons in expectation. Compare this to sorting, which requires Ω(n log n) comparisons. By only pursuing one branch of the recursion, we turned an O(n log n) algorithm into an O(n) algorithm.
Think of it this way: quicksort does O(n) work per level and has O(log n) levels, giving O(n log n). Quickselect does O(n) work at level 1, O(n/2) at level 2, O(n/4) at level 3, and so on — because it discards half the elements at each level. The total is a geometric series that converges to O(n).
| Task | Best algorithm | Time |
|---|---|---|
| Sort the entire array | Quicksort / introsort | O(n log n) |
| Find the k-th smallest | Quickselect | O(n) expected |
| Find the median | Quickselect with k = n/2 | O(n) expected |
| Find the top k elements (unsorted) | Quickselect with k, then take A[n-k:] | O(n) expected |
| Find the top k elements (sorted) | Quickselect + sort the k elements | O(n + k log k) |
| Find the top k elements (streaming, unknown n) | Min-heap of size k | O(n log k) |
The median of medians algorithm (also called BFPRT after its inventors: Blum, Floyd, Pratt, Rivest, Tarjan) guarantees a good pivot. The idea:
Find the 3rd smallest element (0-indexed) in [7, 3, 9, 1, 5, 8, 2, 6].
| Step | Subarray | Pivot | After partition | Pivot index q | k vs q |
|---|---|---|---|---|---|
| 1 | [7,3,9,1,5,8,2,6] | 6 (random) | [3,1,5,2,6,8,9,7] | 4 | 3 < 4 → go left |
| 2 | [3,1,5,2] | 2 (random) | [1,2,5,3] | 1 | 3 > 1 → go right |
| 3 | [5,3] | 3 (random) | [3,5] | 2 | 3 = target index → but we are in subarray starting at idx 2, so global idx = 2+0=2... wait, let us be more careful. |
Actually, let us trace this more carefully with global indices. The array is A = [7, 3, 9, 1, 5, 8, 2, 6]. We want k=3 (the element that would be at index 3 in sorted order, which is 5 since sorted = [1, 2, 3, 5, 6, 7, 8, 9]).
Selection algorithms are used constantly in real systems, often without people realizing it:
Databases: SELECT * FROM orders ORDER BY amount LIMIT 10 does not need to sort all orders. It needs the top 10, which is a selection problem.
Machine learning: Finding the median of a feature column for splitting in a decision tree. Computing percentiles for normalization.
Statistics: Interquartile range (IQR) requires finding the 25th and 75th percentiles. Outlier detection often uses percentile-based thresholds.
Image processing: Median filtering (replacing each pixel with the median of its neighbors) requires O(k) selection per pixel, not O(k log k) sorting.
Networking: Finding the 95th percentile latency from a stream of request times. Load balancers use percentile-based health checks.
Graphics: K-nearest-neighbor search often uses selection to find the k-th closest point, then takes all points closer than it.
Watch quickselect find the k-th element. Notice how it only recurses into one partition — the work drops geometrically each level.
Watch quickselect find the k-th smallest. Only ONE side is recursed into (highlighted). Gray elements are eliminated.
Quicksort's O(n log n) average case is the same as merge sort and heapsort. So why is quicksort the default in most standard libraries? The answer lies in constant factors, memory access patterns, and practical engineering.
Cache-friendliness. Quicksort accesses elements sequentially — the partition scan moves linearly through the array. Modern CPUs have multiple levels of cache (L1, L2, L3), and sequential access patterns exploit spatial locality: when you access A[i], the hardware pre-fetches A[i+1] through A[i+63] into cache. Merge sort, by contrast, alternates between reading from two subarrays and writing to a temporary buffer — three different memory regions that may not all fit in cache. On large arrays, cache misses dominate running time, and quicksort's sequential access pattern is 2-3x faster in practice.
In-place. Quicksort needs O(log n) stack space for recursion, but no auxiliary array. Merge sort needs O(n) extra space for the merge buffer. For sorting 1 GB of data, that is an extra 1 GB of memory. In memory-constrained environments (embedded systems, databases), this is a deal-breaker.
Small constant factors. Quicksort's inner loop (compare, conditionally swap, increment pointer) compiles to very tight machine code — a handful of instructions. Heapsort's inner loop involves index arithmetic for tree navigation (left child = 2i+1, parent = (i-1)/2) and more complex swap patterns, adding overhead even though the asymptotic complexity is the same.
| Situation | Problem with Quicksort | Better Choice |
|---|---|---|
| Need stability | Equal elements may be reordered | Merge sort (stable) |
| Need O(n log n) guarantee | Worst case is O(n²) | Merge sort or introsort |
| Linked lists | No random access for partition | Merge sort (natural for lists) |
| External sorting (disk) | Random access is slow on disk | Merge sort (sequential I/O) |
| Parallel sorting | Partition is sequential bottleneck | Merge sort (natural parallelism) |
| Nearly sorted data | O(n²) without randomization | Timsort (O(n) on sorted data) |
| Language/Library | Algorithm | Details |
|---|---|---|
| C++ std::sort | Introsort | Quicksort + heapsort fallback + insertion sort for small arrays |
| Java Arrays.sort (primitives) | Dual-pivot quicksort | Two pivots create three partitions; switches to insertion sort for small arrays |
| Java Arrays.sort (objects) | Timsort | Merge sort variant optimized for partially sorted data (stable) |
| Python sorted() | Timsort | Merge sort + insertion sort, stable, O(n) on sorted input |
| Rust sort_unstable | Pattern-defeating quicksort | Quicksort + heapsort fallback, detects patterns (sorted runs, duplicates) |
| Go sort.Sort | Pattern-defeating quicksort | Similar to Rust's, with median-of-three and insertion sort for small arrays |
| C qsort() | Varies by implementation | glibc uses merge sort (for stability). musl uses smoothsort. BSD uses introsort. |
| .NET Array.Sort | Introsort | Quicksort + heapsort + insertion sort, similar to C++ |
Notice a pattern: every modern standard library uses a hybrid algorithm, not pure quicksort. The hybrid combines quicksort's excellent average-case performance with fallbacks for edge cases (heapsort for worst-case depth, insertion sort for small arrays, three-way partition for duplicates). This is the engineering reality: no single algorithm is best for all inputs.
Python and Java (for objects) use Timsort, invented by Tim Peters in 2002. It is a hybrid merge sort that:
(1) Detects existing sorted runs in the input (ascending or descending).
(2) Uses insertion sort to extend short runs to a minimum run length (~32).
(3) Merges runs using a sophisticated merge strategy that minimizes comparisons.
Timsort's killer feature: it is adaptive. On already-sorted input, Timsort runs in O(n) time (it just detects the single run and returns). On nearly-sorted input with a few inversions, it runs in O(n + n_inversions). On random data, it falls back to O(n log n) like standard merge sort.
The trade-off: Timsort requires O(n) extra space (for the merge buffer) and is ~30% slower than quicksort on random data due to worse cache behavior. But it is stable and adaptive, making it ideal for sorting objects with complex comparison functions (where the number of comparisons matters more than cache behavior).
Let us quantify the cache advantage. A modern CPU has an L1 cache of ~32 KB with 64-byte cache lines. When sorting 4-byte integers, each cache line holds 16 elements.
This 3x difference in cache misses, combined with merge sort's O(n) memory allocation overhead, is why quicksort is 2-3x faster in practice despite doing 39% more comparisons.
Modern CPUs predict which way a branch (if/else) will go before they know the answer. If the prediction is correct, execution continues at full speed. If wrong, the CPU must flush its pipeline and restart — costing ~15 clock cycles.
In quicksort's partition loop, the branch if A[j] <= pivot is taken about 50% of the time on random data. This is the worst case for branch prediction — the predictor cannot guess better than random. Each misprediction costs ~15 cycles.
Pattern-defeating quicksort (pdqsort) addresses this with block partitioning: instead of branching on every comparison, it scans a block of ~64 elements, writes the indices of elements that need swapping into a small buffer, and then performs all swaps in a separate loop. The index-writing loop has no data-dependent branches (it always writes, using conditional moves), eliminating misprediction entirely. The swap loop is also branchless. The result: ~30% faster partition on random data.
Quicksort scans linearly (cache-friendly). Merge sort alternates between three memory regions (cache-unfriendly).
Standard quicksort uses O(n) stack space in the worst case (one frame per recursion level). We can guarantee O(log n) stack space with a simple trick: always recurse on the smaller partition, and use a loop for the larger one.
python def quicksort_tail(A, lo, hi): """Quicksort with tail call elimination. Stack depth is always O(log n).""" while lo < hi: q = randomized_partition(A, lo, hi) # Recurse on the smaller partition, loop on the larger if q - lo < hi - q: quicksort_tail(A, lo, q - 1) # smaller left side lo = q + 1 # iterate on larger right side else: quicksort_tail(A, q + 1, hi) # smaller right side hi = q - 1 # iterate on larger left side
Why does this limit the stack depth to O(log n)? Each recursive call is on a subarray of size at most n/2 (since we chose the smaller side). After log2(n) recursive calls, the subarray size reaches 1. The larger side is handled by the while loop, which reuses the current stack frame. No new stack frame is allocated for the larger partition.
This is a standard optimization that all production sorting libraries use. In C++, the compiler can sometimes perform tail call optimization automatically, but relying on the compiler is fragile — it is better to write the code explicitly as a loop + recursion on the smaller side.
The combination of tail call optimization with randomized pivot gives O(log n) expected stack depth and O(log n) guaranteed maximum stack depth. Without tail call optimization, randomized pivot gives O(log n) expected stack depth but O(n) worst case (astronomically unlikely but not impossible).
| Technique | Expected stack depth | Worst-case stack depth |
|---|---|---|
| Basic quicksort | O(log n) | O(n) |
| Randomized pivot | O(log n) | O(n) (extremely unlikely) |
| Tail call optimization | O(log n) | O(log n) guaranteed |
| Introsort (with heapsort fallback) | O(log n) | O(log n) guaranteed |
Scenario 1: "Sort takes quadratic time on some inputs."
Diagnosis: You are using deterministic pivot selection (first or last element) and the input is sorted or nearly sorted. Fix: switch to randomized pivot or median-of-three. Better: use introsort to guarantee O(n log n).
Scenario 2: "Sort produces wrong results."
Most common bugs in quicksort implementations: (1) Off-by-one in the partition boundary — using quicksort(A, lo, q) instead of quicksort(A, lo, q-1) with Lomuto partition causes infinite recursion when the pivot is at lo. (2) Forgetting to handle the base case (lo ≥ hi). (3) Using Hoare partition with Lomuto-style recursion bounds — Hoare returns a split point, not the pivot index.
Scenario 3: "Stack overflow on large inputs."
Recursion depth is O(n) in worst case. Fix: (1) Use randomized pivot to make O(n) depth astronomically unlikely. (2) Use tail call optimization — recurse on the smaller partition, iterate on the larger one, capping stack depth at O(log n). (3) Use introsort to switch to heapsort after depth 2·log(n).
Scenario 4: "Sort is very slow on data with many duplicates."
Diagnosis: Standard Lomuto partition always splits at the pivot, but if many elements equal the pivot, they all end up on one side (usually the left), creating a heavily unbalanced split. For n elements all equal, Lomuto gives an (n-1):0 split every time, resulting in O(n²). Fix: use three-way partition to group all equal elements together and exclude them from recursion.
Scenario 5: "Sort works correctly on small inputs but crashes on n > 50,000."
Diagnosis: Stack overflow from deep recursion. Python's default recursion limit is 1000. With worst-case pivot selection on sorted input, recursion depth equals n. Fix: (1) Add sys.setrecursionlimit(n + 100) (hacky). (2) Better: use tail call optimization to cap stack depth at O(log n). (3) Best: use iterative quicksort with an explicit stack.
python def quicksort_iterative(A): """Iterative quicksort using an explicit stack.""" stack = [(0, len(A) - 1)] while stack: lo, hi = stack.pop() if lo >= hi: continue q = randomized_partition(A, lo, hi) # Push the larger partition first (processed last) # so the smaller partition is processed next (stack discipline) if q - lo > hi - q: stack.append((lo, q - 1)) stack.append((q + 1, hi)) else: stack.append((q + 1, hi)) stack.append((lo, q - 1))
In interviews, you should be able to discuss the trade-offs between quicksort, merge sort, and heapsort fluently. Here is the complete picture:
| Property | Quicksort | Merge Sort | Heapsort |
|---|---|---|---|
| Average time | 1.39n log n | n log n | ~2n log n |
| Worst time | O(n²)* | O(n log n) | O(n log n) |
| Extra space | O(log n) stack | O(n) buffer | O(1) |
| Stable? | No | Yes | No |
| Cache behavior | Excellent (sequential) | Poor (3 memory regions) | Poor (tree-indexed jumps) |
| Adaptive? | No (without pdqsort) | Yes (Timsort variant) | No |
| Parallelizable? | Moderately (after partition) | Highly (natural parallelism) | Poorly |
| Best for | General-purpose in-memory | External sort, stability needed | Guaranteed worst case, no extra space |
* O(n²) worst case is eliminated by introsort (quicksort + heapsort fallback).
This is your cheat sheet. Every variant, every complexity, every common interview question — organized for rapid review.
| Algorithm | Best | Average | Worst | Space | Stable? |
|---|---|---|---|---|---|
| Quicksort (basic) | O(n log n) | O(n log n) | O(n²) | O(log n) | No |
| Quicksort (randomized) | O(n log n) | O(n log n) | O(n²)* | O(log n) | No |
| Quicksort (3-way) | O(n) | O(n log n) | O(n²)* | O(log n) | No |
| Introsort | O(n log n) | O(n log n) | O(n log n) | O(log n) | No |
| Quickselect | O(n) | O(n) | O(n²)* | O(log n) | N/A |
| Merge sort | O(n log n) | O(n log n) | O(n log n) | O(n) | Yes |
| Heapsort | O(n log n) | O(n log n) | O(n log n) | O(1) | No |
* Worst case is astronomically unlikely with randomized pivot. Expected time is the average column.
These are the implementations you should be able to write from memory in an interview.
1. Lomuto partition — The building block. 10 lines. Must handle the invariant correctly.
2. Quicksort — 5 lines wrapping partition. Get the recursion bounds right (lo, q-1) and (q+1, hi).
3. Randomized quicksort — Add one line: swap random element to pivot position.
4. Three-way partition — The Dutch National Flag. Three pointers: lt, i, gt. Handle the "do not increment i when swapping from gt" case.
5. Quickselect — Like quicksort but recurse on ONE side only. Return value, not void.
| Dimension | What they test | Example question | Staff-level answer adds |
|---|---|---|---|
| Concept | First-principles math | "Derive the expected number of comparisons in randomized quicksort" | Explains the indicator variable technique, harmonic series, and why 1.39n log n is acceptable despite being 39% more than merge sort's n log n |
| Design | System architecture | "Design a sorting module for a database engine" | Discusses introsort for in-memory, external merge sort for disk, stability requirements for complex keys, and fallback strategies |
| Code | Implementation skill | "Implement quickselect for the k-th smallest element" | Handles edge cases (k out of bounds, empty array), uses randomized pivot, discusses iterative vs recursive |
| Debug | Failure diagnosis | "Our sort is O(n²) on production data" | Checks input distribution (sorted? many duplicates?), pivot selection strategy, recursion depth monitoring |
| Frontier | Research awareness | "What is pattern-defeating quicksort?" | Explains pdqsort: detects sorted runs, falls back to heapsort, uses block partitioning to avoid branch mispredictions |
The latest evolution of quicksort is pattern-defeating quicksort (pdqsort), used in Rust's sort_unstable and Go's sort.Sort. It combines several ideas:
1. Block partitioning. Instead of branching on every comparison (which causes branch misprediction ~50% of the time on random data), pdqsort buffers the indices of elements that need swapping into small fixed-size blocks. This converts unpredictable branches into predictable sequential reads, giving a 15-30% speedup on modern CPUs.
2. Pattern detection. If the algorithm detects that the input is already sorted (the partition always puts the pivot at the end), it switches to a linear-time check and returns immediately. If it detects many equal elements, it switches to three-way partitioning.
3. Heapsort fallback. Like introsort, it monitors recursion depth and switches to heapsort if it exceeds 2·log(n), guaranteeing O(n log n) worst case.
4. Insertion sort for small arrays. For subarrays smaller than 24 elements, it uses insertion sort (with sentinel optimization).
Can we sort faster than O(n log n) using comparisons? No. Here is the proof, and interviewers sometimes ask for it.
Any comparison-based sorting algorithm can be modeled as a decision tree: a binary tree where each internal node is a comparison "A[i] ≤ A[j]?", the left child is the "yes" branch, and the right child is the "no" branch. Each leaf represents a specific permutation of the input.
There are n! possible permutations of n elements. The decision tree must have at least n! leaves (one for each permutation). A binary tree with L leaves has height at least log2(L). Therefore:
This means any comparison-based sorting algorithm must make at least Ω(n log n) comparisons in the worst case. Quicksort, merge sort, and heapsort are all optimal to within constant factors. The Ω(n log n) lower bound is tight — it is both necessary and achievable.
| Year | Contribution | Author | Impact |
|---|---|---|---|
| 1960 | Quicksort invented | Hoare | The original algorithm with Hoare partition |
| 1962 | Published with analysis | Hoare | First formal average-case analysis |
| 1975 | Comprehensive analysis | Sedgewick (PhD thesis) | Analyzed all major variants |
| 1993 | Engineering a Sort Function | Bentley & McIlroy | Median-of-three, fat partition, production-grade C qsort |
| 1997 | Introsort | Musser | Heapsort fallback, O(n log n) worst case |
| 2002 | Timsort | Peters | Merge sort + insertion sort, Python/Java standard |
| 2009 | Dual-pivot quicksort | Yaroslavskiy | Two pivots, three partitions, Java standard |
| 2021 | Pattern-defeating quicksort | Peters | Adaptive detection, Rust/Go standard |
| Problem | Technique | Time |
|---|---|---|
| K-th Largest Element (LC 215) | Quickselect with k = n-k | O(n) expected |
| Sort Colors (LC 75) | Three-way partition (Dutch National Flag) | O(n), one pass |
| Top K Frequent Elements (LC 347) | Quickselect on frequency counts | O(n) expected |
| Wiggle Sort II (LC 324) | Quickselect for median + 3-way partition | O(n) |
| K Closest Points to Origin (LC 973) | Quickselect on distance values | O(n) expected |
| Sort an Array (LC 912) | Randomized quicksort (or introsort) | O(n log n) expected |
Before we look at specific problems, let us establish a testing pattern for any quicksort-family algorithm. These are the inputs that break incorrect implementations:
| Test case | Why it breaks things | What to verify |
|---|---|---|
| Empty array [] | Edge case: lo > hi | Should return immediately, no crash |
| Single element [5] | Base case: lo == hi | Should return immediately |
| Two elements [2, 1] | Smallest non-trivial case | Should swap to [1, 2] |
| Already sorted [1,2,3,4,5] | Triggers worst case without randomization | Must still produce correct output |
| Reverse sorted [5,4,3,2,1] | Also triggers worst case | Must still produce correct output |
| All equal [3,3,3,3,3] | Lomuto gives n-1:0 split every time | Must terminate (not infinite loop) |
| Two distinct values [1,2,1,2,1] | Tests partition with many duplicates | Must sort correctly |
| Large random array (n=10000) | Tests performance, not just correctness | Should complete in < 1 second |
LeetCode 75: Given an array with values 0, 1, 2 (red, white, blue), sort it in one pass. This is literally the Dutch National Flag problem.
python def sortColors(nums): """Sort [0,1,2] in-place using three-way partition.""" lo = 0 # boundary of 0-region i = 0 # current element hi = len(nums) - 1 # boundary of 2-region while i <= hi: if nums[i] == 0: nums[lo], nums[i] = nums[i], nums[lo] lo += 1 i += 1 elif nums[i] == 2: nums[hi], nums[i] = nums[i], nums[hi] hi -= 1 # don't increment i else: i += 1
python def findKthLargest(nums, k): """Find the k-th largest element using quickselect.""" # k-th largest = (n-k)-th smallest (0-indexed) target = len(nums) - k def select(lo, hi): # Randomized partition r = random.randint(lo, hi) nums[r], nums[hi] = nums[hi], nums[r] pivot = nums[hi] i = lo - 1 for j in range(lo, hi): if nums[j] <= pivot: i += 1 nums[i], nums[j] = nums[j], nums[i] nums[i+1], nums[hi] = nums[hi], nums[i+1] q = i + 1 if q == target: return nums[q] elif q < target: return select(q + 1, hi) else: return select(lo, q - 1) return select(0, len(nums) - 1)
Here is how a complete quicksort interview question might unfold at a top tech company.
Interviewer: "Implement a function that sorts an array of integers in-place."
You (clarifying): "I'll implement quicksort. A few clarifications: Can I modify the input array? Are there constraints on space? Do I need stability?"
Interviewer: "In-place modification is fine. Minimize extra space. Stability not needed."
You: Write Lomuto partition first. Then write the quicksort wrapper. Then add randomized pivot selection.
Interviewer: "What's the time complexity?"
You: "O(n log n) expected with randomized pivot. Worst case O(n²), but the probability of significantly exceeding 2n ln n comparisons is exponentially small. If we need a hard guarantee, we can add introsort: monitor recursion depth and switch to heapsort after 2 log n levels, giving O(n log n) worst case."
Interviewer: "This works, but when I test it on an array of all identical elements, it is very slow. Why?"
You: "With Lomuto partition, an all-equal array always produces an (n-1):0 split because every element is ≤ pivot, so the entire array moves to the left of the pivot. The fix is three-way partition — Dijkstra's Dutch National Flag — which groups all elements equal to the pivot together and never recurses into them. On all-equal input, three-way partition runs in O(n): one partition call, both recursive calls have size 0."
You then write the three-way partition and modify quicksort to use it.
Interviewer: "Good. Now what if I asked you to find the median instead of sorting?"
You: "Quickselect. Same partition, but recurse only into the side containing the target index. Expected O(n) time."
This conversation demonstrates all five dimensions: concept (complexity analysis), design (choosing the right variant), code (clean implementation), debug (diagnosing the all-equal input), and frontier (mentioning introsort and pdqsort).
These are the bugs that will cost you the interview. Memorize them.
| Bug | Symptom | Fix |
|---|---|---|
| Wrong recursion bounds with Lomuto | Infinite recursion or wrong output | Recurse on (lo, q-1) and (q+1, hi), NOT (lo, q) and (q+1, hi) |
| Wrong recursion bounds with Hoare | Infinite recursion | Recurse on (lo, j) and (j+1, hi), NOT (lo, j-1) and (j+1, hi) |
| Forgetting base case | Stack overflow | Check lo < hi (not lo != hi) to handle empty subarrays |
| Partition returns wrong index | Sorted output has elements in wrong positions | Lomuto must return i+1 (not i). Test with [2,1]. |
| Off-by-one in random pivot | Pivot is sometimes out of bounds | Use randint(lo, hi) inclusive, not randint(lo, hi-1) |
| Not incrementing i in three-way when swapping from gt | Skips elements or infinite loop | When A[i] > pivot, swap A[i] and A[gt], decrement gt, but do NOT increment i (the swapped element is unexamined) |
| Using quicksort on linked list | O(n) time per partition just for traversal | Use merge sort for linked lists |
When asked to implement quicksort in an interview, write it in this order:
Step 1: Partition (the hard part). Write and test this first.
Step 2: Quicksort wrapper (3 lines). Base case, partition call, two recursive calls.
Step 3: Randomized pivot (1 extra line). Swap random element to pivot position.
Step 4: Mention optimizations. "In production I would add insertion sort for small subarrays, median-of-three, and introsort fallback."
This shows you know both the fundamentals and the engineering. Most candidates stop at Step 2.
| Resource | What you learn |
|---|---|
| CLRS Chapter 7 | The formal analysis: recurrences, indicator variables, expected comparisons |
| Sedgewick's PhD thesis (1975) | The most thorough analysis of quicksort variants ever written |
| Musser, "Introspective Sorting and Selection" (1997) | The introsort algorithm — quicksort with a heapsort safety net |
| Bentley & McIlroy, "Engineering a Sort Function" (1993) | How to build a production-grade sort. The paper behind C's qsort. |
| Yaroslavskiy, "Dual-Pivot Quicksort" (2009) | The algorithm behind Java's Arrays.sort for primitives |
| Peters, "Pattern-defeating Quicksort" (2021) | Modern quicksort that adapts to input patterns. Used in Rust and Go. |
Quicksort is more than a sorting algorithm. It is a masterclass in algorithm design:
The power of randomization. A single random choice (the pivot) transforms a potentially O(n²) algorithm into an expected O(n log n) algorithm. This principle — using randomness to defeat worst cases — appears throughout computer science: hash tables, skip lists, random forests, stochastic gradient descent.
The value of simplicity. The core algorithm is three lines. Every optimization (insertion sort for small arrays, median-of-three, introsort fallback) is layered on top without changing the fundamental structure. This makes quicksort easy to implement correctly, easy to debug, and easy to teach.
The importance of constants. Asymptotic analysis says O(n log n) = O(n log n), but in practice, quicksort beats merge sort by 2-3x because of cache behavior and in-place operation. The constant factors that big-O hides can dominate real-world performance.