Introduction to Algorithms (CLRS) — Chapter 7

Quicksort

Partition, randomize, conquer — the fastest practical sorting algorithm.

Prerequisites: Arrays + Recursion. That's it.
10
Chapters
9+
Simulations
5
Interview Dimensions

Chapter 0: The Problem

You have one million numbers to sort. You already know merge sort — it runs in O(n log n) time, guaranteed. But merge sort has a dirty secret: it needs O(n) extra space. For your million numbers, that is another million-element array sitting in memory, used only to merge subarrays together. On an embedded system with 64 KB of RAM, that extra array might not fit. In a database sorting gigabytes of records, doubling memory usage is a non-starter.

Can we sort in O(n log n) time without that extra array? Can we rearrange the elements in-place, using only the original array and a constant amount of extra space?

Heapsort can do it in-place with O(n log n) worst case, but it has terrible cache behavior — it jumps around the array following tree indices (parent, left child, right child), causing cache misses on nearly every access. In practice, heapsort is 3-5x slower than merge sort despite having the same asymptotic complexity.

The answer is quicksort. Invented by Tony Hoare in 1960, it is the default sorting algorithm in most standard libraries — C's qsort, Java's Arrays.sort for primitives, and the backbone of Python's sorted() (which uses Timsort, a merge sort variant, but quicksort-family algorithms dominate for primitives). It works by a deceptively simple idea:

The quicksort idea. Pick one element (the pivot). Rearrange the array so that everything ≤ pivot is on the left, and everything > pivot is on the right. The pivot lands in its final sorted position. Now recursively sort the left part and the right part. That is the entire algorithm.

Notice the asymmetry with merge sort. In merge sort, the work happens after the recursion — you split blindly, then merge carefully. In quicksort, the work happens before the recursion — you partition carefully, then the recursion is trivial because the subarrays are already separated. The partition step does all the heavy lifting.

Let us see this in action. The simulation below shows a random array. Click Start to watch quicksort partition around a pivot (highlighted in orange), separate the elements, then recurse on each half. Notice how every partition places exactly one element (the pivot) in its correct final position.

Quicksort Overview

Watch the pivot (orange) land in its final position after each partition. Green bars are sorted.

Click Start to begin

Two things to notice. First, the pivot always ends up in the right place — if you freeze the animation after any partition, that pivot's bar is exactly where it would be in the fully sorted array. Second, the partition is in-place: no extra array is allocated. Elements are swapped within the original array.

What Makes a Good Pivot?

The pivot determines how evenly the array is split. The ideal pivot is the median — it splits the array into two equal halves. But finding the exact median is itself a non-trivial problem (we will see quickselect in Chapter 7). So in practice, we approximate a good pivot using randomization or sampling.

How bad can the pivot be? Think of the pivot's rank (its position in sorted order). If the pivot has rank r, the partition produces subarrays of size r-1 and n-r. The worst case is r=1 or r=n (the minimum or maximum), which gives a 0:(n-1) split. Any other rank gives a better split.

Think of it this way. If you throw a dart at a ruler, the probability that it lands in the middle 50% (between the 25th and 75th percentile) is 1/2. So half the time, a random pivot gives at most a 3:1 split. The probability of getting a 9:1 split or worse is only 20%. The probability of consistently getting bad pivots for log(n) levels is astronomically small — like flipping a coin and getting tails 20 times in a row.

Quicksort vs Merge Sort: Where the Work Happens

This is the single most important conceptual difference between the two great O(n log n) sorting algorithms, and interviewers love asking about it.

In merge sort, you split the array in half (trivial — just compute the midpoint), recurse on both halves, then merge the two sorted halves together (the hard part). The work happens after the recursion, on the way back up. Splitting costs nothing; merging costs O(n).

In quicksort, you partition the array around a pivot (the hard part), then recurse on both halves. The work happens before the recursion, on the way down. Partitioning costs O(n); there is nothing to do on the way back up because the subarrays are already separated.

PropertyMerge SortQuicksort
Where work happensAfter recursion (merge step)Before recursion (partition step)
Split/combine costSplit: O(1), Merge: O(n)Partition: O(n), Combine: O(0)
Extra spaceO(n) for merge bufferO(1) for partition (plus O(log n) stack)
Worst caseAlways O(n log n)O(n²) with bad pivots
Cache behaviorPoor (merge reads from 2 arrays + writes to 1)Excellent (partition scans sequentially)
Think of it this way. Merge sort is like organizing a library by first dumping all books on the floor in two piles, organizing each pile separately, then carefully interleaving them back onto the shelves. Quicksort is like sorting mail: pick a letter (pivot), throw everything for "A-M" in the left bin and "N-Z" in the right bin, then sort each bin. The separation happens first; once everything is in the right bin, the bins just need internal sorting.

The catch? Quicksort's worst case is O(n²). If the pivot is always the smallest or largest element, each partition only peels off one element, and we get n levels of recursion with O(n) work each. But this worst case is avoidable — with randomized pivot selection, the expected time is O(n log n) for any input. That is why quicksort is called "quick."

A Brief History

Tony Hoare invented quicksort in 1960, while he was a visiting student at Moscow State University. He was working on a machine translation project and needed to sort words into alphabetical order to look them up in a Russian-English dictionary stored in sorted order on magnetic tape. He first thought of insertion sort, but it was too slow. Then he had the insight: pick a word at random, split the dictionary into words before and after it, and sort the two halves independently.

He later recalled: "I was very pleased with my invention, but I was totally unable to prove that it always worked." It took another year before Hoare published the proof of correctness and the average-case analysis. The paper, "Quicksort" (Computer Journal, 1962), is one of the most cited in computer science. Hoare went on to win the Turing Award in 1980, partly for this contribution.

The algorithm's elegance lies in its recursive structure: one operation (partition) that places a single element in its correct position, applied recursively until every element has been placed. No temporary arrays, no complex data structures — just comparisons and swaps.

Sixty years later, quicksort (in its evolved forms) remains the most-used sorting algorithm in practice. Not bad for an idea born from a translation problem.

The Divide-and-Conquer Structure

Quicksort is a divide-and-conquer algorithm with three steps:

Divide
Partition A[lo..hi] into two subarrays A[lo..q-1] and A[q+1..hi] such that every element in the left subarray is ≤ A[q] and every element in the right subarray is > A[q]. The pivot A[q] is in its final position.
Conquer
Recursively sort A[lo..q-1] and A[q+1..hi]. Because partition already separated the elements, sorting the subarrays independently produces a fully sorted array.
Combine
Nothing. The array is already sorted in-place. This is the beautiful part — there is no combine step.

Let us count the elements of this pattern more carefully. The partition step takes Θ(n) time (one linear scan). Each recursive call handles a subarray that is strictly smaller (by at least one element — the pivot, which is excluded from both recursive calls). So the recursion always terminates — the subarray shrinks by at least one element per level. The total work depends on how balanced the splits are, which we will analyze in detail in Chapter 3.

The Recursion Unwound

To build more intuition, let us trace quicksort on a tiny array: [3, 1, 2].

// quicksort([3, 1, 2], 0, 2)
// Pivot = A[2] = 2
// Partition: j=0, A[0]=3 > 2, skip. j=1, A[1]=1 ≤ 2, swap → [1, 3, 2]
// i=0, place pivot: swap A[1] with A[2] → [1, 2, 3]. q=1.
// Recurse left: quicksort([1], 0, 0) → base case, done
// Recurse right: quicksort([3], 2, 2) → base case, done
// Total: 2 comparisons, 2 swaps. Array sorted: [1, 2, 3]

Two comparisons for three elements. The optimal lower bound for sorting 3 elements is ceil(log2(3!)) = ceil(log2(6)) = 3 comparisons. But we got lucky — the pivot happened to be the median. If the pivot were the minimum (e.g., input [2, 3, 1] with pivot 1), we would get a 0:2 split and need 3 comparisons total.

Let us also trace a small example where the pivot is NOT the median to see the unbalanced case:

// quicksort([5, 3, 1], 0, 2)
// Pivot = A[2] = 1 (the minimum!)
// Partition: j=0, A[0]=5 > 1, skip. j=1, A[1]=3 > 1, skip.
// i=-1, place pivot: swap A[0] with A[2] → [1, 3, 5]. q=0.
// Recurse left: quicksort([], 0, -1) → base case (empty)
// Recurse right: quicksort([3, 5], 1, 2)
// Pivot = A[2] = 5. j=1, A[1]=3 ≤ 5, swap → [3, 5]. q=2.
// Two base cases. Done.
// Total: 3 comparisons. Same result, but the recursion was deeper (3 levels vs 2).

This is the worst-case structure in miniature: the pivot is the extreme element, giving a 0:(n-1) split. One level of recursion is "wasted" because one subarray is empty. Scale this up to n elements and you get n levels of recursion.

Concept check: Merge sort splits the array in half blindly, then merges carefully. Quicksort partitions carefully, then recurses trivially. What is the key consequence of this difference for memory usage?

Chapter 1: The Partition Algorithm

Partition is the engine of quicksort. Everything depends on it. Get partition right, and quicksort is trivial — three lines of recursion. Get partition wrong, and everything falls apart: infinite loops, wrong output, stack overflows.

The partition subroutine takes a subarray A[lo..hi], picks a pivot element, and rearranges elements so that everything ≤ pivot is on the left and everything > pivot is on the right. After partition returns, the pivot is at some index q, sitting in its final sorted position — the exact index it would occupy in the fully sorted array. This is done in-place with O(n) time and O(1) extra space.

Think of partition as a bouncer at a nightclub. The pivot is the VIP threshold. Everyone at or below the threshold goes to the left room. Everyone above goes to the right room. The bouncer (partition) makes one pass through the line, checking each person, and at the end the two rooms are filled and the threshold is placed exactly at the door between them.

CLRS uses the Lomuto partition scheme, named after Nico Lomuto. It is not the fastest partition in practice (we will see Hoare's in Chapter 5), but it is the easiest to understand and prove correct. Here is how it works:

1. Choose pivot
Use the last element A[hi] as the pivot. (Later we will randomize this choice.)
2. Maintain boundary i
Variable i starts at lo - 1. Everything at indices lo..i is ≤ pivot. Everything at i+1..j-1 is > pivot.
3. Scan with j
Scan j from lo to hi - 1. If A[j] ≤ pivot: increment i, swap A[i] with A[j]. This grows the "≤ pivot" region by one.
4. Place pivot
After the scan, swap A[i+1] with A[hi] (the pivot). Now the pivot is at index i+1, with all smaller elements to its left and all larger to its right.
5. Return i+1
This is the pivot's final index. Quicksort will recurse on A[lo..i] and A[i+2..hi].

Worked Example

Let us trace Lomuto partition on the array [2, 8, 7, 1, 3, 5, 6, 4] with pivot = 4 (the last element). We track i (boundary of the ≤ region) and j (the scanner).

StepjA[j]A[j] ≤ 4?ActioniArray after
Init---i = -1-1[2, 8, 7, 1, 3, 5, 6, 4]
102Yesi++, swap A[0]↔A[0]0[2, 8, 7, 1, 3, 5, 6, 4]
218Noskip0[2, 8, 7, 1, 3, 5, 6, 4]
327Noskip0[2, 8, 7, 1, 3, 5, 6, 4]
431Yesi++, swap A[1]↔A[3]1[2, 1, 7, 8, 3, 5, 6, 4]
543Yesi++, swap A[2]↔A[4]2[2, 1, 3, 8, 7, 5, 6, 4]
655Noskip2[2, 1, 3, 8, 7, 5, 6, 4]
766Noskip2[2, 1, 3, 8, 7, 5, 6, 4]
Final---swap A[3]↔A[7]3[2, 1, 3, 4, 7, 5, 6, 8]

Pivot 4 is now at index 3. Everything to its left (2, 1, 3) is ≤ 4. Everything to its right (7, 5, 6, 8) is > 4. The pivot is in its final sorted position — no matter how we sort the left and right halves, 4 will stay at index 3.

The invariant. At every point during the scan, the array has three regions: A[lo..i] contains elements ≤ pivot, A[i+1..j-1] contains elements > pivot, and A[j..hi-1] is unexamined. This invariant is what makes the algorithm correct. When j reaches hi, the unexamined region is empty, and we just place the pivot between the two sorted regions.

Step through the partition yourself in the simulation below. Click Step to advance one comparison at a time. Green cells are ≤ pivot, red cells are > pivot, gray cells are unexamined.

Lomuto Partition Step-Through

Click Step to advance. The orange bar is the pivot. Green = ≤ pivot, red = > pivot.

i = -1, j = 0. Click Step.

The Code

python
def lomuto_partition(A, lo, hi):
    """Partition A[lo..hi] using A[hi] as pivot.
    Returns the final index of the pivot."""
    pivot = A[hi]
    i = lo - 1          # boundary of the ≤ region
    for j in range(lo, hi):
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]   # grow the ≤ region
    A[i + 1], A[hi] = A[hi], A[i + 1]  # place pivot
    return i + 1

The function does exactly n - 1 comparisons (j runs from lo to hi - 1) and at most n swaps. Time: O(n). Space: O(1) — just the variables i, j, and pivot.

Why This Invariant Matters

The correctness of quicksort rests entirely on the partition invariant. Let us state it precisely. At each iteration, the array A[lo..hi] has four regions:

A[lo..i]    // all elements ≤ pivot
A[i+1..j-1]    // all elements > pivot
A[j..hi-1]    // unexamined elements
A[hi]    // the pivot itself

Initialization: Before the loop starts, i = lo - 1 and j = lo. The ≤ region and > region are both empty. The entire array (except the pivot) is unexamined. The invariant holds trivially.

Maintenance: At each step, we examine A[j]. If A[j] ≤ pivot, we grow the ≤ region by swapping A[j] into position i+1 and advancing both i and j. If A[j] > pivot, we just advance j — A[j] is already in the > region (between i+1 and j). Either way, the invariant is maintained.

Termination: When j = hi, the unexamined region is empty. We swap the pivot (A[hi]) with A[i+1], placing it between the ≤ and > regions. The pivot is now at its correct sorted position.

Loop invariant proof. This three-part structure (initialization, maintenance, termination) is the standard technique for proving correctness of iterative algorithms. You will see it throughout CLRS. For partition, the invariant guarantees that after the loop, every element to the left of the pivot is ≤ pivot and every element to the right is > pivot — exactly what quicksort needs.

Hand Trace: A Second Example

Let us trace partition on [5, 3, 8, 6, 2, 7, 1, 4] with pivot = 4.

jA[j]≤ 4?ActioniArray
05Noskip-1[5, 3, 8, 6, 2, 7, 1, 4]
13Yesi++, swap A[0]↔A[1]0[3, 5, 8, 6, 2, 7, 1, 4]
28Noskip0[3, 5, 8, 6, 2, 7, 1, 4]
36Noskip0[3, 5, 8, 6, 2, 7, 1, 4]
42Yesi++, swap A[1]↔A[4]1[3, 2, 8, 6, 5, 7, 1, 4]
57Noskip1[3, 2, 8, 6, 5, 7, 1, 4]
61Yesi++, swap A[2]↔A[6]2[3, 2, 1, 6, 5, 7, 8, 4]
done--swap A[3]↔A[7]3[3, 2, 1, 4, 5, 7, 8, 6]

Pivot 4 lands at index 3. Left region [3, 2, 1] — all ≤ 4. Right region [5, 7, 8, 6] — all > 4. Notice the elements within each region are NOT sorted — that is the job of the recursive calls.

Interview question: After Lomuto partition finishes on an 8-element array, how many elements are guaranteed to be in their final sorted position?

Chapter 2: The Quicksort Algorithm

We have the partition subroutine. Quicksort itself is almost embarrassingly simple: partition the array, then recursively sort the two halves.

python
def quicksort(A, lo, hi):
    """Sort A[lo..hi] in-place."""
    if lo < hi:
        q = lomuto_partition(A, lo, hi)  # pivot at index q
        quicksort(A, lo, q - 1)           # sort left of pivot
        quicksort(A, q + 1, hi)           # sort right of pivot

# Usage:
A = [2, 8, 7, 1, 3, 5, 6, 4]
quicksort(A, 0, len(A) - 1)
print(A)  # [1, 2, 3, 4, 5, 6, 7, 8]

Three lines of logic. That is the entire algorithm. But let us carefully trace the recursion to see why it works.

Recursion Trace

Starting with [2, 8, 7, 1, 3, 5, 6, 4]:

CallSubarrayPivotAfter partitionPivot index
quicksort(0,7)[2,8,7,1,3,5,6,4]4[2,1,3,4,7,5,6,8]3
quicksort(0,2)[2,1,3]3[2,1,3]2
quicksort(0,1)[2,1]1[1,2]0
quicksort(4,7)[7,5,6,8]8[7,5,6,8]7
quicksort(4,6)[7,5,6]6[5,6,7]5

Each recursive call handles a smaller subarray. When a subarray has 0 or 1 elements (lo ≥ hi), we return immediately — a single element is already sorted. The pivot at each level is permanently placed, and eventually every element has been a pivot.

The Recursion Tree

Let us visualize the recursion. Each node represents a call to quicksort(lo, hi), and the pivot chosen is shown in parentheses.

// Recursion tree for [2,8,7,1,3,5,6,4]:

quicksort(0,7)   pivot=4 → splits at q=3
  ┌— quicksort(0,2)   pivot=3 → splits at q=2
  │   ┌— quicksort(0,1)   pivot=1 → splits at q=0
  │   │   ┌— quicksort(0,-1)   base case
  │   │   └— quicksort(1,1)    base case
  │   └— quicksort(3,2)    base case (empty)
  └— quicksort(4,7)   pivot=8 → splits at q=7
      ┌— quicksort(4,6)   pivot=6 → splits at q=5
      │   ┌— quicksort(4,4)   base case
      │   └— quicksort(6,6)   base case
      └— quicksort(8,7)   base case (empty)

Count the partition operations: 5 non-trivial calls (the ones that actually partition). Each partition does O(subarray size) work. Total work: 8 + 3 + 2 + 4 + 3 = 20 comparisons. For n=8, O(n log n) would be about 8 × 3 = 24, so we are right in the ballpark.

Stack Frame Analysis

How much memory does the recursion use? Each call to quicksort pushes a stack frame containing lo, hi, q, and the return address — about 32 bytes on a 64-bit machine. The maximum stack depth equals the maximum recursion depth.

// Stack frames alive at any one time:
// Best case: O(log n) frames → 20 frames for n=1M → 640 bytes
// Worst case: O(n) frames → 1M frames for n=1M → 32 MB
// With tail call optimization: always O(log n) → always 640 bytes

Compare this to merge sort, which also uses O(log n) stack frames (always balanced recursion) but additionally allocates O(n) auxiliary space for the merge buffer — 4 MB for 1 million 32-bit integers. Quicksort's in-place nature is its biggest practical advantage.

Correctness Argument

Why is the final array sorted? By induction on the subarray size:

Base case: An array of 0 or 1 elements is trivially sorted. When lo ≥ hi, quicksort returns immediately without modifying the array. This is correct because a 0- or 1-element subarray is already sorted.

Inductive step: Assume quicksort correctly sorts arrays smaller than n. For an array of size n, partition places the pivot at index q. All elements in A[lo..q-1] are ≤ pivot, and all elements in A[q+1..hi] are > pivot. By the inductive hypothesis, the recursive calls sort A[lo..q-1] and A[q+1..hi] (both are smaller than n, since the pivot is excluded from both). After both recursive calls return, the full array A[lo..hi] is sorted: everything left of q is sorted and ≤ A[q], everything right of q is sorted and > A[q], and A[q] is in its correct position.

Termination: Each recursive call operates on a strictly smaller subarray (at least one element smaller, because the pivot is excluded). So the recursion reaches the base case in at most n levels.

Not stable. Quicksort is not a stable sort. Equal elements may be reordered during partition swaps. If you need stability (preserving the original order of equal elements), use merge sort. For primitives like integers and floats, stability does not matter. For objects with complex keys, it might.

Why Quicksort Is Unstable: A Concrete Example

Consider sorting cards by value, where we have [5♠, 3♥, 5♥, 1♦] with pivot = 5♥ (last element). After Lomuto partition:

// Initial: [5♠, 3♥, 5♥, 1♦] with pivot = 1♦? No, pivot = last = 1♦
// Actually let's use a clearer example.
// Array: [(5,a), (3,b), (2,c), (5,d), (4,e)] — value, label
// Pivot = (4,e). After Lomuto:
// j=0: (5,a) > 4, skip
// j=1: (3,b) ≤ 4, swap with pos 0 → [(3,b), (5,a), (2,c), (5,d), (4,e)]
// j=2: (2,c) ≤ 4, swap with pos 1 → [(3,b), (2,c), (5,a), (5,d), (4,e)]
// j=3: (5,d) > 4, skip
// Place pivot: swap pos 2 with pivot → [(3,b), (2,c), (4,e), (5,d), (5,a)]

// The two 5s: originally (5,a) came before (5,d)
// After partition: (5,d) is at index 3, (5,a) is at index 4
// Their relative order REVERSED! → unstable

The swap in step j=2 moved (5,a) from position 1 (where it had been swapped) to position 2 and later beyond the pivot. The relative order of equal elements is not preserved.

In-Place But Not Constant Space

Quicksort does not allocate an auxiliary array — it sorts in-place. But it is not truly O(1) space. The recursion stack uses space proportional to the recursion depth. In the best case, the recursion depth is O(log n) (balanced partitions). In the worst case, it is O(n) (maximally unbalanced). We will fix the worst case in Chapter 4 and Chapter 6.

Watch the full quicksort execution below. The simulation highlights the active subarray, shows the pivot selection, partition, and recursive calls. Notice how the sorted region (green) grows as pivots land in their final positions.

Full Quicksort Execution

Watch the recursion unfold. Orange = pivot, blue = active subarray, green = sorted.

Click Start for auto-play or Step to advance manually.
Concept check: In quicksort, work happens BEFORE the recursion (during partition). In merge sort, work happens AFTER the recursion (during merge). What does this mean for the base case?

Chapter 3: Performance Analysis

Quicksort's performance depends entirely on how balanced the partitions are. Let us derive the running time for three scenarios.

Worst Case: O(n²)

The worst case occurs when every partition produces the most unbalanced split possible: one side has n-1 elements and the other has 0 elements. This happens when the pivot is always the minimum or maximum element — for example, when the input is already sorted and we always pick the last element as pivot.

// Worst case recurrence:
T(n) = T(n-1) + T(0) + Θ(n)
= T(n-1) + Θ(n)    since T(0) = Θ(1)

// Expand by substitution:
T(n) = T(n-1) + cn
= T(n-2) + c(n-1) + cn
= T(n-3) + c(n-2) + c(n-1) + cn
= ...
= T(1) + c·2 + c·3 + ... + c·n
= Θ(1) + c · ∑k=2n k
= Θ(n²)    since ∑ k = n(n+1)/2

The recursion tree is a long chain: n levels, each doing O(n), O(n-1), ..., O(1) work. Total: n + (n-1) + ... + 1 = n(n+1)/2 = O(n²). This is as bad as insertion sort — and it happens on already-sorted input, which is arguably the most common input in practice.

Worst Case: Step by Step

Let us trace the worst case on [1, 2, 3, 4, 5] with the last element as pivot.

CallSubarrayPivotAfter partitionSplitComparisons
qs(0,4)[1,2,3,4,5]5[1,2,3,4,5]4:04
qs(0,3)[1,2,3,4]4[1,2,3,4]3:03
qs(0,2)[1,2,3]3[1,2,3]2:02
qs(0,1)[1,2]2[1,2]1:01
qs(0,0)[1]-[1]base0

Total comparisons: 4 + 3 + 2 + 1 = 10 = n(n-1)/2. For n = 1000, that is 499,500 comparisons. For n = 1,000,000, it is 499,999,500,000 — half a trillion comparisons. Every element is a pivot, but each partition only peels off the pivot itself, leaving n-1 elements for the next level. The pivot is always the maximum of the remaining subarray because the input is sorted.

The irony. Quicksort's worst case occurs on already-sorted input — the one case where you do not need to sort at all! A simple pre-check ("is the array already sorted?") in O(n) time could avoid this, but it would add overhead to every call. Randomized pivot selection is more elegant: it handles sorted input, reverse-sorted input, and any other adversarial pattern, all with one line of code.

Best Case: O(n log n)

The best case occurs when every partition splits the array exactly in half:

// Best case recurrence:
T(n) = 2T(n/2) + Θ(n)

// This is the same as merge sort!
// By the Master Theorem (case 2): a=2, b=2, f(n)=n, log_b(a)=1
T(n) = Θ(n log n)

The recursion tree is perfectly balanced: log&sub2;(n) levels, each doing Θ(n) total work across all nodes. Total: O(n log n).

Average Case: O(n log n)

Here is the surprising result. Even if the partition is not perfectly balanced, quicksort is still O(n log n) as long as the split is proportional. Consider a 9:1 split — seems terrible, right?

T(n) = T(9n/10) + T(n/10) + Θ(n)

The recursion tree has two branches of different lengths. The short branch (n/10 side) bottoms out at depth log10(n). The long branch (9n/10 side) bottoms out at depth log10/9(n). But at every level of the tree, the total work across all nodes is at most cn (because each element participates in at most one partition call per level). The total number of levels is log10/9(n) = O(log n). So the total work is O(n log n).

The key insight. Any split of constant proportionality (even 99:1!) gives O(n log n). The only thing that kills quicksort is when one side is O(1) and the other is O(n). As long as both sides get a constant fraction of the elements, the depth is logarithmic and the total work is n log n.

Alternating Good and Bad Partitions

What if we alternate between best-case and worst-case splits? Suppose at even levels we get a balanced split and at odd levels we get a worst-case split. Is this still O(n log n)?

// Level 0 (bad split): partition n into n-1 and 0
T(n) = T(n-1) + T(0) + cn = T(n-1) + cn

// Level 1 (good split): partition n-1 into (n-1)/2 and (n-1)/2
T(n-1) = 2T((n-1)/2) + c(n-1)

// Combining both levels:
T(n) = 2T((n-1)/2) + cn + c(n-1)
= 2T(n/2) + Θ(n)    (treating n-1 ≈ n)
= Θ(n log n)

The bad partition costs an extra O(n) at that level but does not change the asymptotic bound. The good partition that follows immediately "repairs" the damage. This is why even mediocre pivot selection gives O(n log n) on average — you need consistently bad pivots (all worst-case) to get O(n²).

Concrete Numbers

Let us put real numbers to these bounds for n = 1,000,000 (one million):

CaseDepthTotal comparisonsTime at 1 GHz
Worst case1,000,000~500,000,000,000 (500 billion)~500 seconds
Best case20~20,000,000~0.02 seconds
Average case~40~28,000,000 (1.39n log n)~0.03 seconds

The worst case is 25,000 times slower than the average case. This is not a theoretical concern — it happens in practice on sorted or nearly-sorted input, which is common (think: appending new records to an already-sorted database). This is why randomized pivot selection or median-of-three is essential.

The simulation below shows three recursion trees side-by-side: worst case (sorted input, always pick last element), best case (perfect median pivot), and average case (random pivots). Watch how the total work at each level stays consistent but the tree shape changes dramatically.

Recursion Trees: Worst vs Best vs Average

Compare the tree shapes. Worst case is a chain. Best case is balanced. Average case is slightly unbalanced but still O(log n) deep.

Click to generate recursion trees for n=16

The Intuition Behind Average Case

Why is the average case O(n log n) even though some splits are terrible? Here is the intuitive argument. A random pivot has a 50% chance of landing in the middle 50% of the sorted array (between the 25th and 75th percentile). If it does, the larger partition has at most 3n/4 elements. So on average, every other level has a good split.

After two levels (one bad, one good), the problem size drops from n to at most 3n/4. After 2 log4/3(n) levels, the problem size reaches 1. Since log4/3(n) = O(log n), the depth is O(log n). Each level does O(n) work (the partition scans), so the total is O(n log n).

More precisely, the expected number of levels to reduce from n to 1 is:

// Each level has a 1/2 chance of a "good" split (25-75 percentile)
// A good split reduces the larger subarray to at most 3n/4
// Expected levels for one good split: 2 (geometric distribution)
// Good splits needed to reach n=1: log_{4/3}(n)
// Total expected levels: 2 log_{4/3}(n) = O(log n)
// Total expected work: O(n) × O(log n) = O(n log n)

Comparison Table

CasePivot selectionSplit ratioDepthTime
WorstAlways min or maxn-1 : 0nΘ(n²)
BestAlways mediann/2 : n/2log nΘ(n log n)
AverageRandom elementVariesO(log n)Θ(n log n)
9:1 split90th percentile9n/10 : n/10log10/9 nΘ(n log n)
Debug scenario: Your quicksort implementation (using last element as pivot) runs in 0.5 seconds on random data of size 100,000 but takes over 5 minutes on data that is already sorted. What is happening, and what is the simplest fix?

Chapter 4: Randomized Quicksort

The worst case of O(n²) is triggered by a specific input pattern (sorted data) combined with a specific pivot choice (always last element). An adversary who knows your algorithm can always craft an input that triggers the worst case. How do we defend against this? By making the algorithm randomized: choose the pivot uniformly at random.

python
import random

def randomized_partition(A, lo, hi):
    """Pick a random pivot, then use Lomuto partition."""
    r = random.randint(lo, hi)   # random index in [lo, hi]
    A[r], A[hi] = A[hi], A[r]    # swap random element to end
    return lomuto_partition(A, lo, hi)

def randomized_quicksort(A, lo, hi):
    if lo < hi:
        q = randomized_partition(A, lo, hi)
        randomized_quicksort(A, lo, q - 1)
        randomized_quicksort(A, q + 1, hi)

One extra line — swap a random element to the pivot position — and the worst case disappears probabilistically. No adversarial input can consistently trigger O(n²) because the adversary does not know which random choices we will make.

Why randomization works. Without randomization, the algorithm's behavior is a deterministic function of the input. An adversary who knows your pivot strategy (e.g., "always last element") can construct an input that triggers worst case. With randomization, the algorithm's behavior depends on random coin flips that the adversary cannot predict. Even if the adversary provides the worst possible input array, the expected running time is O(n log n) because the random pivot choices will, on average, produce balanced partitions.

This is a profound principle in algorithm design: randomness defeats adversaries. A randomized algorithm with good expected performance is often preferable to a deterministic algorithm with good worst-case performance, because the randomized algorithm has good performance on ALL inputs (in expectation), while the deterministic algorithm may be slow on specific inputs.

Expected Number of Comparisons

Let us prove that randomized quicksort makes O(n log n) expected comparisons. This is one of the most elegant probabilistic analyses in CLRS.

Label the elements by their rank in sorted order: z1 < z2 < ... < zn. Define an indicator random variable Xij that equals 1 if zi and zj are ever compared, and 0 otherwise. The total number of comparisons is:

X = ∑i=1n-1j=i+1n Xij

By linearity of expectation:

E[X] = ∑i=1n-1j=i+1n Pr[zi and zj are compared]

When are zi and zj compared? Only when one of them is chosen as pivot while the set {zi, zi+1, ..., zj} is still together in the same subarray. If any element zk with i < k < j is chosen as pivot first, then zi and zj end up on opposite sides and are never compared. So zi and zj are compared if and only if the first element from {zi, zi+1, ..., zj} to be chosen as pivot is zi or zj.

Pr[zi compared with zj] = 2 / (j - i + 1)

There are j - i + 1 elements in the set, each equally likely to be chosen first, and exactly 2 of them (zi and zj) trigger a comparison. Now we sum:

E[X] = ∑i=1n-1j=i+1n 2/(j-i+1)

// Substitute k = j - i:
= ∑i=1n-1k=1n-i 2/(k+1)

// Each inner sum is bounded by the harmonic series:
< ∑i=1n-1k=1n 2/k
= ∑i=1n-1 2 Hn
= 2(n-1) Hn
= 2(n-1) · O(ln n)
= O(n log n)
The harmonic series is the hero. The expected comparisons are 2n ln n ≈ 1.39 n log2 n. Merge sort makes exactly n log2 n comparisons. So randomized quicksort does about 39% more comparisons on average — but it wins overall because each comparison involves swapping adjacent elements in a contiguous array (cache-friendly), while merge sort copies to an auxiliary array (cache-unfriendly).

Concrete Computation

Let us compute the expected comparisons for n = 8 to make this real, not just asymptotic.

// E[X] = ∑_{i=1}^{7} ∑_{j=i+1}^{8} 2/(j-i+1)

// For each gap k = j - i:
k=1:   7 pairs, each with prob 2/2 = 1.00 → 7.00
k=2:   6 pairs, each with prob 2/3 = 0.67 → 4.00
k=3:   5 pairs, each with prob 2/4 = 0.50 → 2.50
k=4:   4 pairs, each with prob 2/5 = 0.40 → 1.60
k=5:   3 pairs, each with prob 2/6 = 0.33 → 1.00
k=6:   2 pairs, each with prob 2/7 = 0.29 → 0.57
k=7:   1 pair,  each with prob 2/8 = 0.25 → 0.25
// Total: 16.92 expected comparisons

// For comparison:
// n log_2 n = 8 × 3 = 24 (merge sort)
// 1.39 × 8 × 3 = 33.3 (theoretical bound)
// Our exact answer 16.92 < 33.3 because the bound is loose for small n

Adjacent elements (k=1) are always compared — if zi and zi+1 are adjacent in sorted order, the first one chosen as pivot must be one of them. Distant elements (k=7, i.e., the min and max) are compared only 25% of the time — it is likely that some element between them will be chosen as pivot first, separating them.

Concentration: How Likely is the Worst Case?

The expected time is O(n log n), but how likely is it that we actually get close to O(n²)? The answer: astronomically unlikely. By Chebyshev's inequality and more refined concentration bounds, the probability that randomized quicksort takes more than cn log n time (for a constant c > 2) decreases exponentially in n. For n = 1,000,000, the probability of exceeding 10 × n log n comparisons is less than 10-100. You are more likely to be hit by a meteorite while writing the sort function.

Las Vegas vs Monte Carlo

Randomized quicksort is a Las Vegas algorithm: it always produces the correct answer (a sorted array), but the running time varies. The expected time is O(n log n), but any particular run might be faster or slower. Contrast this with a Monte Carlo algorithm, which always runs in a fixed time but might produce a wrong answer with some small probability.

The distinction matters in practice. You never need to check if randomized quicksort produced the right answer — it always does. You just cannot predict exactly how long it will take (though the expected time and the variance are well understood).

For the interview, remember: when someone says "expected O(n log n)", they mean a Las Vegas algorithm. The randomness is in the algorithm's choices (pivot selection), not in the input. Even on adversarial input, the expected time (averaging over the algorithm's random choices) is O(n log n).

This is subtly different from "average-case O(n log n)" which means: if the input is drawn uniformly at random, the deterministic algorithm runs in O(n log n). An adversary can defeat a deterministic algorithm by choosing the input; they cannot defeat a randomized algorithm because the random choices are hidden from them.

Can the adversary fight back?

What if the adversary can observe your random choices as you make them? For example, what if the adversary can somehow feed elements into the array adaptively? In the oblivious adversary model (the standard one), the adversary must commit to the input before the algorithm runs. Against an oblivious adversary, randomized quicksort is O(n log n) expected for any input.

Against an adaptive adversary (who can see your random choices and modify the input on the fly), randomized quicksort can be forced into O(n²). But this model is relevant only in cryptographic or adversarial computation settings, not in standard sorting.

The simulation below is dramatic. We sort the same already-sorted array twice: once with deterministic pivot (last element) and once with randomized pivot. Watch the recursion depth explode in the deterministic case.

Deterministic vs Randomized on Sorted Input

Left: last-element pivot (worst case). Right: random pivot. Same sorted input array.

Click Run to compare
Concept check: In the indicator variable analysis, Pr[zi compared with zj] = 2/(j-i+1). What does j-i+1 represent, and why is the numerator 2?

Chapter 5: Partition Variants

Lomuto partition is clean and simple, but it is not the only game in town. Real-world sorting libraries use more sophisticated partition schemes. Let us examine two important variants.

Hoare's Partition

Tony Hoare's original partition (from his 1962 paper) uses two pointers that start at opposite ends of the array and walk toward each other. It is more elegant and does fewer swaps on average.

python
def hoare_partition(A, lo, hi):
    """Hoare's partition: two pointers moving inward."""
    pivot = A[lo]        # pivot is the first element
    i = lo - 1
    j = hi + 1
    while True:
        # Move i right, skipping elements < pivot
        i += 1
        while A[i] < pivot:
            i += 1
        # Move j left, skipping elements > pivot
        j -= 1
        while A[j] > pivot:
            j -= 1
        # If pointers crossed, partition is done
        if i >= j:
            return j
        # Swap out-of-place elements
        A[i], A[j] = A[j], A[i]

Key differences from Lomuto:

PropertyLomutoHoare
Pivot positionLast elementFirst element (or any; swap first)
PointersOne pointer (j), one boundary (i)Two pointers (i, j) moving inward
Average swaps~n/2~n/6 (three times fewer!)
Pivot placementPivot ends at its final positionPivot may NOT be at its final position
Return valuePivot index q; recurse on [lo,q-1] and [q+1,hi]Split point j; recurse on [lo,j] and [j+1,hi]
All-equal inputO(n²) — always n-1:0 splitO(n log n) — roughly balanced split
Why fewer swaps? Lomuto scans left-to-right and swaps every element that is ≤ pivot into the left region — even elements that are already on the left side. Hoare only swaps when both pointers find an out-of-place element. On random data, about 1/3 of element pairs need swapping, giving ~n/6 swaps vs Lomuto's ~n/2.

Quicksort with Hoare Partition

The full quicksort using Hoare partition looks slightly different because Hoare returns a split point, not the pivot's final index:

python
def quicksort_hoare(A, lo, hi):
    """Quicksort using Hoare's partition scheme."""
    if lo < hi:
        j = hoare_partition(A, lo, hi)
        quicksort_hoare(A, lo, j)       # NOTE: j, not j-1
        quicksort_hoare(A, j + 1, hi)   # NOTE: j+1, not j+2

The subtlety: Hoare's partition guarantees that A[lo..j] ≤ A[j+1..hi], but the pivot might be anywhere in A[lo..j]. So we recurse on A[lo..j] (inclusive of j, because j might not be the pivot) and A[j+1..hi]. Getting this wrong — e.g., recursing on (lo, j-1) — will either miss elements or cause infinite recursion.

Hoare's Partition: Worked Example

Trace Hoare's partition on [8, 1, 6, 4, 0, 3, 9, 5] with pivot = A[0] = 8.

StepijA[i]A[j]ActionArray after
Init-18---[8, 1, 6, 4, 0, 3, 9, 5]
10785i stops at 8 (≥8), j stops at 5 (≤8). Swap.[5, 1, 6, 4, 0, 3, 9, 8]
26593i scans right, stops at 9 (≥8). j scans left, stops at 3 (≤8). i<j? 6>5, no! Return j=5.[5, 1, 6, 4, 0, 3, 9, 8]

Hoare returns j=5. The array is partitioned: A[0..5] = [5, 1, 6, 4, 0, 3] (all ≤ 8) and A[6..7] = [9, 8] (all ≥ 8). Notice the pivot 8 is NOT at position 5 — it has moved to position 7. This is the key difference from Lomuto: Hoare does not place the pivot in its final position. That is why with Hoare, we recurse on [lo..j] and [j+1..hi] (not [lo..q-1] and [q+1..hi]).

Also notice: only 1 swap was needed (positions 0 and 7). Lomuto would have needed ~4 swaps on this input. This is why Hoare's partition is preferred in practice — it does about 3x fewer swaps on average.

Hoare's Advantage on All-Equal Input

Consider the input [5, 5, 5, 5, 5] with both partition schemes:

Lomuto: Pivot = 5 (last element). Every element is ≤ 5, so the boundary i advances to the end. The pivot swaps with itself at position 4. Split: (4, 0). The next call has [5, 5, 5, 5], and the same thing happens. This is an (n-1):0 split every time, giving O(n²).

Hoare: Pivot = 5 (first element). i starts at -1, j starts at 5. i advances to 0 (A[0] = 5 ≥ 5, stop). j decrements to 4 (A[4] = 5 ≤ 5, stop). Swap A[0] and A[4]. i advances to 1, j decrements to 3. Swap. i=2, j=2, they cross. Return j=2. Split: (0..2) and (3..4) — roughly balanced! Hoare gives O(n log n) on all-equal input where Lomuto gives O(n²).

Critical implementation detail. When using Hoare's partition, the quicksort recursion MUST be quicksort(lo, j) and quicksort(j+1, hi), NOT quicksort(lo, q-1) and quicksort(q+1, hi). Using the wrong bounds is one of the most common bugs in quicksort implementations. The pivot might be anywhere in A[lo..j] — it is not at index j specifically. This also means Hoare's partition cannot be used with quickselect directly (you would need to check if the pivot ended up at index k separately).

Three-Way Partition (Dutch National Flag)

What happens when many elements are equal to the pivot? Both Lomuto and Hoare still do O(n) work on the "equal" elements in subsequent recursive calls. With many duplicates, this wastes enormous time.

Three-way partition (Dijkstra's Dutch National Flag problem) splits the array into three regions: elements < pivot, elements = pivot, and elements > pivot. All the equal elements are placed together and excluded from future recursion.

python
def three_way_partition(A, lo, hi):
    """Dutch National Flag partition.
    Returns (lt, gt) such that:
      A[lo..lt-1] < pivot
      A[lt..gt]   = pivot
      A[gt+1..hi] > pivot"""
    pivot = A[lo]
    lt = lo       # boundary of < region
    i  = lo + 1   # current element
    gt = hi       # boundary of > region

    while i <= gt:
        if A[i] < pivot:
            A[lt], A[i] = A[i], A[lt]
            lt += 1
            i += 1
        elif A[i] > pivot:
            A[gt], A[i] = A[i], A[gt]
            gt -= 1
            # don't increment i — swapped element is unexamined
        else:
            i += 1   # equal to pivot, just advance

    return lt, gt

def quicksort_3way(A, lo, hi):
    if lo < hi:
        lt, gt = three_way_partition(A, lo, hi)
        quicksort_3way(A, lo, lt - 1)   # sort < region
        quicksort_3way(A, gt + 1, hi)   # sort > region
        # = region is DONE — all duplicates of pivot are in place
When duplicates dominate. Consider sorting 1 million elements where only 100 distinct values exist (e.g., ages of people). Standard quicksort wastes time recursing into regions full of equal elements. Three-way partition is O(n) on an array where all elements are equal (one partition, both recursive calls have size 0). This is why Java's Arrays.sort uses a dual-pivot variant with three-way partitioning.

Three-Way Partition: Worked Example

Trace Dijkstra's three-way partition on [4, 2, 4, 1, 4, 3, 4, 2] with pivot = 4 (A[0]).

StepiltgtA[i]ActionArray
Init1072-[4, 2, 4, 1, 4, 3, 4, 2]
110722 < 4: swap A[lt]↔A[i], lt++, i++[2, 4, 4, 1, 4, 3, 4, 2]
221744 = 4: i++[2, 4, 4, 1, 4, 3, 4, 2]
331711 < 4: swap A[lt]↔A[i], lt++, i++[2, 1, 4, 4, 4, 3, 4, 2]
442744 = 4: i++[2, 1, 4, 4, 4, 3, 4, 2]
552733 < 4: swap A[lt]↔A[i], lt++, i++[2, 1, 3, 4, 4, 4, 4, 2]
663744 = 4: i++[2, 1, 3, 4, 4, 4, 4, 2]
773722 < 4: swap A[lt]↔A[i], lt++, i++[2, 1, 3, 2, 4, 4, 4, 4]
Done847-i > gt, stop[2, 1, 3, 2, 4, 4, 4, 4]

Result: lt=4, gt=7. The array is split into three regions:

• A[0..3] = [2, 1, 3, 2] — all < 4 (needs further sorting)

• A[4..7] = [4, 4, 4, 4] — all = 4 (DONE, never recursed into!)

Four out of eight elements were equal to the pivot and are immediately in their final positions. With standard Lomuto partition, those four elements would participate in several more levels of recursion.

Entropy-Optimal Sorting

Three-way quicksort is actually entropy-optimal for inputs with many duplicates. The entropy of a sequence is:

H = -∑k (nk/n) log(nk/n)

where nk is the count of the k-th distinct value. Three-way quicksort runs in O(nH) time, which is O(n log n) when all elements are distinct (H = log n) and O(n) when all elements are equal (H = 0). This is provably optimal — no comparison-based sorting algorithm can do better.

The simulation below compares all three partition schemes on the same array. Try the "Many Duplicates" button to see three-way partitioning shine — the swap counts tell the story.

Partition Showdown: Lomuto vs Hoare vs 3-Way

Compare swap counts across partition variants. Try "Many Duplicates" to see 3-way win.

Choose an input type
Design question: You are sorting an array of 10 million user records by age (integer 0-120). Which partition scheme would you choose, and why?

Chapter 6: The Sorting Laboratory

This is the payoff. Everything you have learned — Lomuto, Hoare, three-way, randomized pivot, median-of-three — comes together in one interactive sorting laboratory. You control the algorithm, the input distribution, and the speed. Watch comparisons, swaps, and recursion depth in real-time.

Real-world quicksort. Production sorting libraries do not use textbook quicksort. They combine multiple tricks: (1) Insertion sort for small subarrays (n < 16) — the overhead of recursion is not worth it for tiny arrays. (2) Median-of-three pivot selection — take the median of the first, middle, and last elements. Avoids worst case on sorted input without the cost of random number generation. (3) Tail call elimination — recurse on the smaller partition first, then loop back for the larger partition. Limits stack depth to O(log n). (4) Introsort — if recursion depth exceeds 2·log(n), abandon quicksort and switch to heapsort. Guarantees O(n log n) worst case. This is what C++ std::sort actually does.

Median-of-Three

Instead of picking the last element or a random element as pivot, examine three elements — the first, middle, and last — and use the median of the three. This avoids the worst case on sorted and reverse-sorted input (the most common adversarial patterns) with minimal overhead.

python
def median_of_three(A, lo, hi):
    """Move the median of A[lo], A[mid], A[hi] to A[hi]."""
    mid = (lo + hi) // 2
    if A[lo] > A[mid]:
        A[lo], A[mid] = A[mid], A[lo]
    if A[lo] > A[hi]:
        A[lo], A[hi] = A[hi], A[lo]
    if A[mid] > A[hi]:
        A[mid], A[hi] = A[hi], A[mid]
    # Now A[lo] ≤ A[mid] ≤ A[hi]. Use A[mid] as pivot.
    A[mid], A[hi - 1] = A[hi - 1], A[mid]
    return A[hi - 1]

Introsort: The Safety Net

Introsort (Musser, 1997) begins with quicksort but monitors the recursion depth. If it exceeds 2·floor(log2(n)), we know we have hit a bad pivot sequence. Introsort bails out and switches to heapsort, which guarantees O(n log n) worst case. The result: O(n log n) worst case with quicksort's excellent average-case constant factors.

python
import math

def introsort(A, lo, hi, depth_limit):
    if hi - lo < 16:
        insertion_sort(A, lo, hi)    # small arrays: insertion sort
        return
    if depth_limit == 0:
        heapsort(A, lo, hi)          # depth exceeded: heapsort
        return
    # Otherwise: quicksort with median-of-three
    pivot = median_of_three(A, lo, hi)
    q = lomuto_partition(A, lo, hi)
    introsort(A, lo, q - 1, depth_limit - 1)
    introsort(A, q + 1, hi, depth_limit - 1)

def sort(A):
    """Production-grade sort: introsort."""
    n = len(A)
    depth_limit = 2 * int(math.log2(n)) if n > 1 else 0
    introsort(A, 0, n - 1, depth_limit)

Now — the big simulation. Sort up to 120 elements with your choice of algorithm and input pattern.

The Sorting Laboratory

Choose an algorithm and input type. Watch comparisons, swaps, and recursion depth in real-time.

Algorithm
Input
Speed
Array size 40
Comparisons: 0 | Swaps: 0 | Max Depth: 0

Why Insertion Sort for Small Arrays?

Quicksort has overhead per recursive call: function call, partition setup, and base-case checks. For tiny subarrays (n < 16), this overhead exceeds the actual sorting work. Insertion sort has almost zero overhead and is very fast on small arrays because:

(1) It compiles to a tight inner loop — compare, shift, repeat.

(2) Small arrays fit entirely in L1 cache, so memory access is instant.

(3) On nearly-sorted data (which small subarrays from quicksort often are), insertion sort is O(n).

python
def insertion_sort(A, lo, hi):
    """Sort A[lo..hi] using insertion sort.
    Optimal for small subarrays (n < ~16)."""
    for i in range(lo + 1, hi + 1):
        key = A[i]
        j = i - 1
        while j >= lo and A[j] > key:
            A[j + 1] = A[j]
            j -= 1
        A[j + 1] = key

The threshold (16, 24, or 32) varies by implementation and hardware. C++ std::sort typically uses 16. Java's dual-pivot quicksort uses 47. The optimal value can be found by benchmarking on the target machine.

Analyzing the Threshold

Let us think about this quantitatively. Quicksort on a subarray of size k has overhead: function call (~5 ns), partition setup (~3 ns), base-case check (~1 ns), plus the actual partition work. For k = 4, the partition does 3 comparisons and ~1.5 swaps. Total: ~15 ns of overhead + ~10 ns of real work = ~25 ns. Insertion sort on k = 4 does at most 6 comparisons (worst case 4·3/2 = 6) with a tight inner loop: ~12 ns total. The crossover point is where quicksort's overhead exceeds insertion sort's total work.

// Quicksort per-call overhead: ~C_qs ns (function call + setup)
// Insertion sort on k elements: ~k²/4 comparisons (average case)
// Crossover: C_qs = k²/4 · C_cmp
// With C_qs ≈ 15 ns and C_cmp ≈ 2 ns:
// 15 = k²/4 · 2 → k² = 30 → k ≈ 5-6
// But in practice k = 16-32 is better because:
// (1) Insertion sort is branchless and tight on small arrays
// (2) Small arrays fit entirely in L1 cache
// (3) The quadratic term is still tiny for k < 32

Dual-Pivot Quicksort

Vladimir Yaroslavskiy's dual-pivot quicksort (2009) uses two pivots to create three partitions:

// Choose two pivots: p1 ≤ p2
// Partition into three regions:
A[lo..lt-1]    < p1
A[lt..gt]      ≥ p1 and ≤ p2
A[gt+1..hi]    > p2

// Three recursive calls instead of two:
quicksort(A, lo, lt - 1)
quicksort(A, lt, gt)
quicksort(A, gt + 1, hi)

Why is this faster? It does more comparisons per element (~1.9n vs ~1.4n per level), but the number of levels is fewer: each partition shrinks the subarray to ~n/3 instead of ~n/2, so the depth is log3(n) instead of log2(n). The net result is slightly fewer total cache misses and slightly fewer total swaps. Java benchmarks show a ~10% speedup over single-pivot quicksort on modern hardware.

Try these experiments:

ExperimentAlgorithmInputWhat to observe
1BasicSortedWatch the recursion depth explode — O(n) depth, O(n²) total work
2RandomizedSortedSame input, but recursion depth stays ~O(log n)
3BasicMany DuplicatesLots of wasted swaps on equal elements
4IntrosortSortedDetects the pathological case and switches to heapsort
5Median-of-ThreeRandomConsistently balanced partitions with minimal overhead
6RandomizedMany DuplicatesStill does extra work on duplicates — compare to 3-way
7BasicNearly SortedAlmost as bad as fully sorted — a few swaps don't help much

What the Numbers Tell You

After running these experiments, you should see these patterns:

Comparisons tell you about the algorithm's time complexity. O(n²) algorithms will show comparison counts in the tens of thousands for n=100. O(n log n) algorithms will show ~700 comparisons for n=100.

Swaps tell you about the algorithm's data movement. Fewer swaps = less time spent writing to memory. Hoare's partition typically does 3x fewer swaps than Lomuto.

Max depth tells you about stack space. O(n) depth means the algorithm could stack overflow on large inputs. O(log n) depth (5-7 for n=100) means the stack is safe. Introsort caps depth at 2 log n.

The showcase simulation lets you verify every claim in this lesson empirically. Do not just read about quicksort — watch it.

Chapter 7: Quickselect

Sometimes you do not need to sort the entire array. You just need the k-th smallest element. Finding the median of a dataset. Finding the 95th percentile of response times. Finding the top-10 most frequent search queries. Must you sort the entire array first?

Sorting costs O(n log n). But finding a single order statistic (the k-th smallest element) should intuitively cost less — you are asking one question about one element, not fully ordering all n elements. Can we do better?

No. Quickselect (also by Hoare, 1961) uses the partition subroutine but only recurses into one side — the side that contains the k-th element. This reduces the expected time from O(n log n) to O(n).

The Algorithm

After partitioning, the pivot is at index q. Three cases:

k = q
The pivot IS the k-th smallest. Return it. Done.
k < q
The k-th smallest is in the left partition. Recurse on A[lo..q-1].
k > q
The k-th smallest is in the right partition. Recurse on A[q+1..hi].
python
def quickselect(A, lo, hi, k):
    """Return the k-th smallest element (0-indexed) in A[lo..hi]."""
    if lo == hi:
        return A[lo]

    q = randomized_partition(A, lo, hi)

    if k == q:
        return A[q]
    elif k < q:
        return quickselect(A, lo, q - 1, k)
    else:
        return quickselect(A, q + 1, hi, k)

# Find the median of an array:
A = [7, 3, 9, 1, 5, 8, 2]
median = quickselect(A, 0, len(A) - 1, len(A) // 2)
print(median)  # 5

Why O(n) Expected Time?

In quicksort, we recurse on both sides: T(n) = T(left) + T(right) + O(n). In quickselect, we recurse on only one side. With a random pivot, the expected size of that side is n/2 (on average, the pivot is near the median). So:

T(n) = T(n/2) + O(n)    (expected, recurse on one side)
= O(n) + O(n/2) + O(n/4) + ...
= O(n) · (1 + 1/2 + 1/4 + ...)
= O(n) · 2
= O(n)

The geometric series converges to 2. We do at most 2n comparisons in expectation. Compare this to sorting, which requires Ω(n log n) comparisons. By only pursuing one branch of the recursion, we turned an O(n log n) algorithm into an O(n) algorithm.

Think of it this way: quicksort does O(n) work per level and has O(log n) levels, giving O(n log n). Quickselect does O(n) work at level 1, O(n/2) at level 2, O(n/4) at level 3, and so on — because it discards half the elements at each level. The total is a geometric series that converges to O(n).

Quickselect vs Sorting: When to Use Which

TaskBest algorithmTime
Sort the entire arrayQuicksort / introsortO(n log n)
Find the k-th smallestQuickselectO(n) expected
Find the medianQuickselect with k = n/2O(n) expected
Find the top k elements (unsorted)Quickselect with k, then take A[n-k:]O(n) expected
Find the top k elements (sorted)Quickselect + sort the k elementsO(n + k log k)
Find the top k elements (streaming, unknown n)Min-heap of size kO(n log k)
Worst case is still O(n²). If we are unlucky and always partition off just one element, quickselect degrades to T(n) = T(n-1) + O(n) = O(n²). Randomization makes this astronomically unlikely. For a deterministic O(n) worst-case algorithm, see the median of medians algorithm (CLRS Section 9.3), which uses groups of 5 to guarantee a good pivot. However, its constant factor is large (about 5x slower in practice), so randomized quickselect is preferred.

Median of Medians (Deterministic O(n) Selection)

The median of medians algorithm (also called BFPRT after its inventors: Blum, Floyd, Pratt, Rivest, Tarjan) guarantees a good pivot. The idea:

1. Divide into groups of 5
Split the n elements into ⌈n/5⌉ groups of 5 (the last group may have fewer).
2. Find each group's median
Sort each group of 5 (constant time per group) and take the middle element. This gives ⌈n/5⌉ medians.
3. Recursively find the median of medians
Call select recursively on the ⌈n/5⌉ medians to find their median. This is the pivot.
4. Partition around this pivot
Use this "median of medians" as the pivot for partition. It is guaranteed that at least 30% of elements are ≤ pivot and at least 30% are ≥ pivot.
5. Recurse on the correct side
As in quickselect, recurse only into the side containing the k-th element. The worst case is at most 70% of n elements.
// Recurrence for median of medians:
T(n) = T(n/5) + T(7n/10) + O(n)

// n/5 for the recursive call to find the median of medians
// 7n/10 for the worst-case partition (at most 70% of elements)
// O(n) for the partition step

// Since 1/5 + 7/10 = 9/10 < 1, the recurrence solves to:
T(n) = O(n)
Why groups of 5? Groups of 3 give a 2/3 : 1/3 split, and T(n) = T(n/3) + T(2n/3) + O(n) which is O(n log n), not O(n). Groups of 5 are the smallest group size that guarantees the recurrence converges. Groups of 7 or more also work but have larger constant factors.

Worked Example: Quickselect for k=3

Find the 3rd smallest element (0-indexed) in [7, 3, 9, 1, 5, 8, 2, 6].

StepSubarrayPivotAfter partitionPivot index qk vs q
1[7,3,9,1,5,8,2,6]6 (random)[3,1,5,2,6,8,9,7]43 < 4 → go left
2[3,1,5,2]2 (random)[1,2,5,3]13 > 1 → go right
3[5,3]3 (random)[3,5]23 = target index → but we are in subarray starting at idx 2, so global idx = 2+0=2... wait, let us be more careful.

Actually, let us trace this more carefully with global indices. The array is A = [7, 3, 9, 1, 5, 8, 2, 6]. We want k=3 (the element that would be at index 3 in sorted order, which is 5 since sorted = [1, 2, 3, 5, 6, 7, 8, 9]).

// Call quickselect(A, 0, 7, k=3)
// Suppose random pivot = A[7] = 6
// After Lomuto partition: [3, 1, 5, 2, 6, 8, 9, 7], q=4
// k=3 < q=4, so recurse left: quickselect(A, 0, 3, k=3)

// Subarray is [3, 1, 5, 2], pivot = A[3] = 2
// After partition: [1, 2, 5, 3], q=1
// k=3 > q=1, so recurse right: quickselect(A, 2, 3, k=3)

// Subarray is [5, 3], pivot = A[3] = 3
// After partition: [3, 5], q=2
// k=3 > q=2, so recurse right: quickselect(A, 3, 3, k=3)

// lo == hi == 3, return A[3] = 5. Answer: 5 ✓

Applications of Selection

Selection algorithms are used constantly in real systems, often without people realizing it:

Databases: SELECT * FROM orders ORDER BY amount LIMIT 10 does not need to sort all orders. It needs the top 10, which is a selection problem.

Machine learning: Finding the median of a feature column for splitting in a decision tree. Computing percentiles for normalization.

Statistics: Interquartile range (IQR) requires finding the 25th and 75th percentiles. Outlier detection often uses percentile-based thresholds.

Image processing: Median filtering (replacing each pixel with the median of its neighbors) requires O(k) selection per pixel, not O(k log k) sorting.

Networking: Finding the 95th percentile latency from a stream of request times. Load balancers use percentile-based health checks.

Graphics: K-nearest-neighbor search often uses selection to find the k-th closest point, then takes all points closer than it.

Watch quickselect find the k-th element. Notice how it only recurses into one partition — the work drops geometrically each level.

Quickselect: Finding the k-th Element

Watch quickselect find the k-th smallest. Only ONE side is recursed into (highlighted). Gray elements are eliminated.

Find k-th smallest (k) 7
Click Find to start
Interview question: You need to find the 10 largest elements in an array of 10 million numbers. What is the most efficient approach?

Chapter 8: Quicksort in Practice

Quicksort's O(n log n) average case is the same as merge sort and heapsort. So why is quicksort the default in most standard libraries? The answer lies in constant factors, memory access patterns, and practical engineering.

Why Quicksort Wins

Cache-friendliness. Quicksort accesses elements sequentially — the partition scan moves linearly through the array. Modern CPUs have multiple levels of cache (L1, L2, L3), and sequential access patterns exploit spatial locality: when you access A[i], the hardware pre-fetches A[i+1] through A[i+63] into cache. Merge sort, by contrast, alternates between reading from two subarrays and writing to a temporary buffer — three different memory regions that may not all fit in cache. On large arrays, cache misses dominate running time, and quicksort's sequential access pattern is 2-3x faster in practice.

In-place. Quicksort needs O(log n) stack space for recursion, but no auxiliary array. Merge sort needs O(n) extra space for the merge buffer. For sorting 1 GB of data, that is an extra 1 GB of memory. In memory-constrained environments (embedded systems, databases), this is a deal-breaker.

Small constant factors. Quicksort's inner loop (compare, conditionally swap, increment pointer) compiles to very tight machine code — a handful of instructions. Heapsort's inner loop involves index arithmetic for tree navigation (left child = 2i+1, parent = (i-1)/2) and more complex swap patterns, adding overhead even though the asymptotic complexity is the same.

When NOT to Use Quicksort

SituationProblem with QuicksortBetter Choice
Need stabilityEqual elements may be reorderedMerge sort (stable)
Need O(n log n) guaranteeWorst case is O(n²)Merge sort or introsort
Linked listsNo random access for partitionMerge sort (natural for lists)
External sorting (disk)Random access is slow on diskMerge sort (sequential I/O)
Parallel sortingPartition is sequential bottleneckMerge sort (natural parallelism)
Nearly sorted dataO(n²) without randomizationTimsort (O(n) on sorted data)

What Standard Libraries Actually Use

Language/LibraryAlgorithmDetails
C++ std::sortIntrosortQuicksort + heapsort fallback + insertion sort for small arrays
Java Arrays.sort (primitives)Dual-pivot quicksortTwo pivots create three partitions; switches to insertion sort for small arrays
Java Arrays.sort (objects)TimsortMerge sort variant optimized for partially sorted data (stable)
Python sorted()TimsortMerge sort + insertion sort, stable, O(n) on sorted input
Rust sort_unstablePattern-defeating quicksortQuicksort + heapsort fallback, detects patterns (sorted runs, duplicates)
Go sort.SortPattern-defeating quicksortSimilar to Rust's, with median-of-three and insertion sort for small arrays
C qsort()Varies by implementationglibc uses merge sort (for stability). musl uses smoothsort. BSD uses introsort.
.NET Array.SortIntrosortQuicksort + heapsort + insertion sort, similar to C++

Notice a pattern: every modern standard library uses a hybrid algorithm, not pure quicksort. The hybrid combines quicksort's excellent average-case performance with fallbacks for edge cases (heapsort for worst-case depth, insertion sort for small arrays, three-way partition for duplicates). This is the engineering reality: no single algorithm is best for all inputs.

Timsort: The Merge Sort Alternative

Python and Java (for objects) use Timsort, invented by Tim Peters in 2002. It is a hybrid merge sort that:

(1) Detects existing sorted runs in the input (ascending or descending).

(2) Uses insertion sort to extend short runs to a minimum run length (~32).

(3) Merges runs using a sophisticated merge strategy that minimizes comparisons.

Timsort's killer feature: it is adaptive. On already-sorted input, Timsort runs in O(n) time (it just detects the single run and returns). On nearly-sorted input with a few inversions, it runs in O(n + n_inversions). On random data, it falls back to O(n log n) like standard merge sort.

The trade-off: Timsort requires O(n) extra space (for the merge buffer) and is ~30% slower than quicksort on random data due to worse cache behavior. But it is stable and adaptive, making it ideal for sorting objects with complex comparison functions (where the number of comparisons matters more than cache behavior).

The design interview question. "You're building a database ORDER BY clause. Which sort algorithm?" The answer depends on context. For in-memory sorting of primitive columns: introsort (quicksort + heapsort fallback). For sorting with complex comparison keys where stability matters: Timsort. For sorting data that does not fit in memory (external sort): multi-way merge sort with B-way merges, where B is tuned to the disk block size. For sorting in a distributed system: sample sort (a parallel variant of quicksort that partitions data across machines).

Cache Analysis: Why Quicksort Really Wins

Let us quantify the cache advantage. A modern CPU has an L1 cache of ~32 KB with 64-byte cache lines. When sorting 4-byte integers, each cache line holds 16 elements.

// Quicksort's partition scan:
// j moves left-to-right through A[lo..hi]
// Access pattern: A[lo], A[lo+1], A[lo+2], ...
// Cache misses: n/16 (one miss per cache line)

// Merge sort's merge step:
// Read from A[lo..mid] and A[mid+1..hi] alternately
// Write to B[lo..hi] (auxiliary array)
// THREE active memory regions, potentially evicting each other
// Cache misses: up to 3n/16 in the worst case

// For n = 1,000,000:
// Quicksort total cache misses ≈ 20 × n/16 ≈ 1.25M
// Merge sort total cache misses ≈ 20 × 3n/16 ≈ 3.75M
// Each L1 cache miss costs ~4 ns
// Quicksort cache penalty: ~5 ms
// Merge sort cache penalty: ~15 ms

This 3x difference in cache misses, combined with merge sort's O(n) memory allocation overhead, is why quicksort is 2-3x faster in practice despite doing 39% more comparisons.

Branch Prediction

Modern CPUs predict which way a branch (if/else) will go before they know the answer. If the prediction is correct, execution continues at full speed. If wrong, the CPU must flush its pipeline and restart — costing ~15 clock cycles.

In quicksort's partition loop, the branch if A[j] <= pivot is taken about 50% of the time on random data. This is the worst case for branch prediction — the predictor cannot guess better than random. Each misprediction costs ~15 cycles.

Pattern-defeating quicksort (pdqsort) addresses this with block partitioning: instead of branching on every comparison, it scans a block of ~64 elements, writes the indices of elements that need swapping into a small buffer, and then performs all swaps in a separate loop. The index-writing loop has no data-dependent branches (it always writes, using conditional moves), eliminating misprediction entirely. The swap loop is also branchless. The result: ~30% faster partition on random data.

// Standard partition: 1 branch per element, ~50% misprediction
// Cost per element: ~4 cycles (compare) + ~7.5 cycles (misprediction penalty)
// = ~11.5 cycles per element

// Block partition: 0 branches per element (conditional moves)
// Cost per element: ~4 cycles (compare + cmov) + ~2 cycles (buffer write)
// = ~6 cycles per element (almost 2x faster!)
Memory Access Pattern Comparison

Quicksort scans linearly (cache-friendly). Merge sort alternates between three memory regions (cache-unfriendly).

Click to visualize memory accesses

Tail Call Optimization

Standard quicksort uses O(n) stack space in the worst case (one frame per recursion level). We can guarantee O(log n) stack space with a simple trick: always recurse on the smaller partition, and use a loop for the larger one.

python
def quicksort_tail(A, lo, hi):
    """Quicksort with tail call elimination.
    Stack depth is always O(log n)."""
    while lo < hi:
        q = randomized_partition(A, lo, hi)
        # Recurse on the smaller partition, loop on the larger
        if q - lo < hi - q:
            quicksort_tail(A, lo, q - 1)  # smaller left side
            lo = q + 1                      # iterate on larger right side
        else:
            quicksort_tail(A, q + 1, hi)  # smaller right side
            hi = q - 1                      # iterate on larger left side

Why does this limit the stack depth to O(log n)? Each recursive call is on a subarray of size at most n/2 (since we chose the smaller side). After log2(n) recursive calls, the subarray size reaches 1. The larger side is handled by the while loop, which reuses the current stack frame. No new stack frame is allocated for the larger partition.

This is a standard optimization that all production sorting libraries use. In C++, the compiler can sometimes perform tail call optimization automatically, but relying on the compiler is fragile — it is better to write the code explicitly as a loop + recursion on the smaller side.

The combination of tail call optimization with randomized pivot gives O(log n) expected stack depth and O(log n) guaranteed maximum stack depth. Without tail call optimization, randomized pivot gives O(log n) expected stack depth but O(n) worst case (astronomically unlikely but not impossible).

TechniqueExpected stack depthWorst-case stack depth
Basic quicksortO(log n)O(n)
Randomized pivotO(log n)O(n) (extremely unlikely)
Tail call optimizationO(log n)O(log n) guaranteed
Introsort (with heapsort fallback)O(log n)O(log n) guaranteed

Debug Scenarios

Scenario 1: "Sort takes quadratic time on some inputs."

Diagnosis: You are using deterministic pivot selection (first or last element) and the input is sorted or nearly sorted. Fix: switch to randomized pivot or median-of-three. Better: use introsort to guarantee O(n log n).

Scenario 2: "Sort produces wrong results."

Most common bugs in quicksort implementations: (1) Off-by-one in the partition boundary — using quicksort(A, lo, q) instead of quicksort(A, lo, q-1) with Lomuto partition causes infinite recursion when the pivot is at lo. (2) Forgetting to handle the base case (lo ≥ hi). (3) Using Hoare partition with Lomuto-style recursion bounds — Hoare returns a split point, not the pivot index.

Scenario 3: "Stack overflow on large inputs."

Recursion depth is O(n) in worst case. Fix: (1) Use randomized pivot to make O(n) depth astronomically unlikely. (2) Use tail call optimization — recurse on the smaller partition, iterate on the larger one, capping stack depth at O(log n). (3) Use introsort to switch to heapsort after depth 2·log(n).

Scenario 4: "Sort is very slow on data with many duplicates."

Diagnosis: Standard Lomuto partition always splits at the pivot, but if many elements equal the pivot, they all end up on one side (usually the left), creating a heavily unbalanced split. For n elements all equal, Lomuto gives an (n-1):0 split every time, resulting in O(n²). Fix: use three-way partition to group all equal elements together and exclude them from recursion.

Scenario 5: "Sort works correctly on small inputs but crashes on n > 50,000."

Diagnosis: Stack overflow from deep recursion. Python's default recursion limit is 1000. With worst-case pivot selection on sorted input, recursion depth equals n. Fix: (1) Add sys.setrecursionlimit(n + 100) (hacky). (2) Better: use tail call optimization to cap stack depth at O(log n). (3) Best: use iterative quicksort with an explicit stack.

python
def quicksort_iterative(A):
    """Iterative quicksort using an explicit stack."""
    stack = [(0, len(A) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        q = randomized_partition(A, lo, hi)
        # Push the larger partition first (processed last)
        # so the smaller partition is processed next (stack discipline)
        if q - lo > hi - q:
            stack.append((lo, q - 1))
            stack.append((q + 1, hi))
        else:
            stack.append((q + 1, hi))
            stack.append((lo, q - 1))

Comparison: The Big Three Sorting Algorithms

In interviews, you should be able to discuss the trade-offs between quicksort, merge sort, and heapsort fluently. Here is the complete picture:

PropertyQuicksortMerge SortHeapsort
Average time1.39n log nn log n~2n log n
Worst timeO(n²)*O(n log n)O(n log n)
Extra spaceO(log n) stackO(n) bufferO(1)
Stable?NoYesNo
Cache behaviorExcellent (sequential)Poor (3 memory regions)Poor (tree-indexed jumps)
Adaptive?No (without pdqsort)Yes (Timsort variant)No
Parallelizable?Moderately (after partition)Highly (natural parallelism)Poorly
Best forGeneral-purpose in-memoryExternal sort, stability neededGuaranteed worst case, no extra space

* O(n²) worst case is eliminated by introsort (quicksort + heapsort fallback).

The one-sentence summary for each. Quicksort: fastest in practice due to cache-friendliness, but needs randomization to avoid worst case. Merge sort: guaranteed O(n log n) and stable, but uses O(n) extra space. Heapsort: guaranteed O(n log n) and in-place, but poor cache behavior makes it 2-3x slower than quicksort in practice.
Design question: You are sorting 50 GB of records on a machine with 4 GB of RAM. Why is quicksort a poor choice, and what would you use instead?

Chapter 9: Interview Arsenal

This is your cheat sheet. Every variant, every complexity, every common interview question — organized for rapid review.

Complexity Summary

AlgorithmBestAverageWorstSpaceStable?
Quicksort (basic)O(n log n)O(n log n)O(n²)O(log n)No
Quicksort (randomized)O(n log n)O(n log n)O(n²)*O(log n)No
Quicksort (3-way)O(n)O(n log n)O(n²)*O(log n)No
IntrosortO(n log n)O(n log n)O(n log n)O(log n)No
QuickselectO(n)O(n)O(n²)*O(log n)N/A
Merge sortO(n log n)O(n log n)O(n log n)O(n)Yes
HeapsortO(n log n)O(n log n)O(n log n)O(1)No

* Worst case is astronomically unlikely with randomized pivot. Expected time is the average column.

Coding Drills

These are the implementations you should be able to write from memory in an interview.

1. Lomuto partition — The building block. 10 lines. Must handle the invariant correctly.

2. Quicksort — 5 lines wrapping partition. Get the recursion bounds right (lo, q-1) and (q+1, hi).

3. Randomized quicksort — Add one line: swap random element to pivot position.

4. Three-way partition — The Dutch National Flag. Three pointers: lt, i, gt. Handle the "do not increment i when swapping from gt" case.

5. Quickselect — Like quicksort but recurse on ONE side only. Return value, not void.

The Five Interview Dimensions Applied to Quicksort

DimensionWhat they testExample questionStaff-level answer adds
ConceptFirst-principles math"Derive the expected number of comparisons in randomized quicksort"Explains the indicator variable technique, harmonic series, and why 1.39n log n is acceptable despite being 39% more than merge sort's n log n
DesignSystem architecture"Design a sorting module for a database engine"Discusses introsort for in-memory, external merge sort for disk, stability requirements for complex keys, and fallback strategies
CodeImplementation skill"Implement quickselect for the k-th smallest element"Handles edge cases (k out of bounds, empty array), uses randomized pivot, discusses iterative vs recursive
DebugFailure diagnosis"Our sort is O(n²) on production data"Checks input distribution (sorted? many duplicates?), pivot selection strategy, recursion depth monitoring
FrontierResearch awareness"What is pattern-defeating quicksort?"Explains pdqsort: detects sorted runs, falls back to heapsort, uses block partitioning to avoid branch mispredictions

Pattern-Defeating Quicksort (Frontier)

The latest evolution of quicksort is pattern-defeating quicksort (pdqsort), used in Rust's sort_unstable and Go's sort.Sort. It combines several ideas:

1. Block partitioning. Instead of branching on every comparison (which causes branch misprediction ~50% of the time on random data), pdqsort buffers the indices of elements that need swapping into small fixed-size blocks. This converts unpredictable branches into predictable sequential reads, giving a 15-30% speedup on modern CPUs.

2. Pattern detection. If the algorithm detects that the input is already sorted (the partition always puts the pivot at the end), it switches to a linear-time check and returns immediately. If it detects many equal elements, it switches to three-way partitioning.

3. Heapsort fallback. Like introsort, it monitors recursion depth and switches to heapsort if it exceeds 2·log(n), guaranteeing O(n log n) worst case.

4. Insertion sort for small arrays. For subarrays smaller than 24 elements, it uses insertion sort (with sentinel optimization).

The practical upshot. pdqsort is never more than 5% slower than the best known algorithm for any input pattern, and it is significantly faster than introsort on inputs with patterns (sorted, nearly sorted, many duplicates, pipe-organ shaped). If you are implementing a sort for a standard library today, pdqsort is the state of the art.

Lower Bounds: Why O(n log n) Is Optimal

Can we sort faster than O(n log n) using comparisons? No. Here is the proof, and interviewers sometimes ask for it.

Any comparison-based sorting algorithm can be modeled as a decision tree: a binary tree where each internal node is a comparison "A[i] ≤ A[j]?", the left child is the "yes" branch, and the right child is the "no" branch. Each leaf represents a specific permutation of the input.

There are n! possible permutations of n elements. The decision tree must have at least n! leaves (one for each permutation). A binary tree with L leaves has height at least log2(L). Therefore:

// Minimum height (worst-case comparisons):
h ≥ log2(n!)

// By Stirling's approximation: n! ≈ (n/e)^n · sqrt(2πn)
log2(n!) ≈ n log2(n) - n log2(e) + O(log n)
= n log2(n) - 1.44n + O(log n)
= Ω(n log n)

This means any comparison-based sorting algorithm must make at least Ω(n log n) comparisons in the worst case. Quicksort, merge sort, and heapsort are all optimal to within constant factors. The Ω(n log n) lower bound is tight — it is both necessary and achievable.

Breaking the bound. The O(n log n) bound applies only to comparison-based sorts. Non-comparison sorts like counting sort, radix sort, and bucket sort can achieve O(n) time by exploiting structure in the keys (integer values, fixed-length strings). But they require assumptions about the input that quicksort does not need.

The Evolution of Quicksort: A Timeline

YearContributionAuthorImpact
1960Quicksort inventedHoareThe original algorithm with Hoare partition
1962Published with analysisHoareFirst formal average-case analysis
1975Comprehensive analysisSedgewick (PhD thesis)Analyzed all major variants
1993Engineering a Sort FunctionBentley & McIlroyMedian-of-three, fat partition, production-grade C qsort
1997IntrosortMusserHeapsort fallback, O(n log n) worst case
2002TimsortPetersMerge sort + insertion sort, Python/Java standard
2009Dual-pivot quicksortYaroslavskiyTwo pivots, three partitions, Java standard
2021Pattern-defeating quicksortPetersAdaptive detection, Rust/Go standard

Common Interview Problems

ProblemTechniqueTime
K-th Largest Element (LC 215)Quickselect with k = n-kO(n) expected
Sort Colors (LC 75)Three-way partition (Dutch National Flag)O(n), one pass
Top K Frequent Elements (LC 347)Quickselect on frequency countsO(n) expected
Wiggle Sort II (LC 324)Quickselect for median + 3-way partitionO(n)
K Closest Points to Origin (LC 973)Quickselect on distance valuesO(n) expected
Sort an Array (LC 912)Randomized quicksort (or introsort)O(n log n) expected

Quick Correctness Check

Before we look at specific problems, let us establish a testing pattern for any quicksort-family algorithm. These are the inputs that break incorrect implementations:

Test caseWhy it breaks thingsWhat to verify
Empty array []Edge case: lo > hiShould return immediately, no crash
Single element [5]Base case: lo == hiShould return immediately
Two elements [2, 1]Smallest non-trivial caseShould swap to [1, 2]
Already sorted [1,2,3,4,5]Triggers worst case without randomizationMust still produce correct output
Reverse sorted [5,4,3,2,1]Also triggers worst caseMust still produce correct output
All equal [3,3,3,3,3]Lomuto gives n-1:0 split every timeMust terminate (not infinite loop)
Two distinct values [1,2,1,2,1]Tests partition with many duplicatesMust sort correctly
Large random array (n=10000)Tests performance, not just correctnessShould complete in < 1 second

Sort Colors: Worked Example

LeetCode 75: Given an array with values 0, 1, 2 (red, white, blue), sort it in one pass. This is literally the Dutch National Flag problem.

python
def sortColors(nums):
    """Sort [0,1,2] in-place using three-way partition."""
    lo = 0        # boundary of 0-region
    i  = 0        # current element
    hi = len(nums) - 1  # boundary of 2-region

    while i <= hi:
        if nums[i] == 0:
            nums[lo], nums[i] = nums[i], nums[lo]
            lo += 1
            i += 1
        elif nums[i] == 2:
            nums[hi], nums[i] = nums[i], nums[hi]
            hi -= 1
            # don't increment i
        else:
            i += 1

K-th Largest Element: Worked Example

python
def findKthLargest(nums, k):
    """Find the k-th largest element using quickselect."""
    # k-th largest = (n-k)-th smallest (0-indexed)
    target = len(nums) - k

    def select(lo, hi):
        # Randomized partition
        r = random.randint(lo, hi)
        nums[r], nums[hi] = nums[hi], nums[r]
        pivot = nums[hi]
        i = lo - 1
        for j in range(lo, hi):
            if nums[j] <= pivot:
                i += 1
                nums[i], nums[j] = nums[j], nums[i]
        nums[i+1], nums[hi] = nums[hi], nums[i+1]
        q = i + 1

        if q == target:
            return nums[q]
        elif q < target:
            return select(q + 1, hi)
        else:
            return select(lo, q - 1)

    return select(0, len(nums) - 1)

Key Takeaways for Interviews

Mock Interview: End-to-End

Here is how a complete quicksort interview question might unfold at a top tech company.

Interviewer: "Implement a function that sorts an array of integers in-place."

You (clarifying): "I'll implement quicksort. A few clarifications: Can I modify the input array? Are there constraints on space? Do I need stability?"

Interviewer: "In-place modification is fine. Minimize extra space. Stability not needed."

You: Write Lomuto partition first. Then write the quicksort wrapper. Then add randomized pivot selection.

Interviewer: "What's the time complexity?"

You: "O(n log n) expected with randomized pivot. Worst case O(n²), but the probability of significantly exceeding 2n ln n comparisons is exponentially small. If we need a hard guarantee, we can add introsort: monitor recursion depth and switch to heapsort after 2 log n levels, giving O(n log n) worst case."

Interviewer: "This works, but when I test it on an array of all identical elements, it is very slow. Why?"

You: "With Lomuto partition, an all-equal array always produces an (n-1):0 split because every element is ≤ pivot, so the entire array moves to the left of the pivot. The fix is three-way partition — Dijkstra's Dutch National Flag — which groups all elements equal to the pivot together and never recurses into them. On all-equal input, three-way partition runs in O(n): one partition call, both recursive calls have size 0."

You then write the three-way partition and modify quicksort to use it.

Interviewer: "Good. Now what if I asked you to find the median instead of sorting?"

You: "Quickselect. Same partition, but recurse only into the side containing the target index. Expected O(n) time."

This conversation demonstrates all five dimensions: concept (complexity analysis), design (choosing the right variant), code (clean implementation), debug (diagnosing the all-equal input), and frontier (mentioning introsort and pdqsort).

The five things to know cold. (1) Quicksort is O(n log n) average, O(n²) worst. Randomized pivot fixes the worst case probabilistically. (2) Partition is the key subroutine — understand Lomuto, Hoare, and three-way. (3) Quickselect finds the k-th element in O(n) expected time by recursing on only one side. (4) Three-way partition is essential when duplicates are common (Sort Colors, real-world data with few distinct values). (5) In practice, production sorts use introsort (quicksort + heapsort fallback) or pattern-defeating quicksort, not textbook quicksort.

Implementation Bug Checklist

These are the bugs that will cost you the interview. Memorize them.

BugSymptomFix
Wrong recursion bounds with LomutoInfinite recursion or wrong outputRecurse on (lo, q-1) and (q+1, hi), NOT (lo, q) and (q+1, hi)
Wrong recursion bounds with HoareInfinite recursionRecurse on (lo, j) and (j+1, hi), NOT (lo, j-1) and (j+1, hi)
Forgetting base caseStack overflowCheck lo < hi (not lo != hi) to handle empty subarrays
Partition returns wrong indexSorted output has elements in wrong positionsLomuto must return i+1 (not i). Test with [2,1].
Off-by-one in random pivotPivot is sometimes out of boundsUse randint(lo, hi) inclusive, not randint(lo, hi-1)
Not incrementing i in three-way when swapping from gtSkips elements or infinite loopWhen A[i] > pivot, swap A[i] and A[gt], decrement gt, but do NOT increment i (the swapped element is unexamined)
Using quicksort on linked listO(n) time per partition just for traversalUse merge sort for linked lists

Whiteboard Template

When asked to implement quicksort in an interview, write it in this order:

Step 1: Partition (the hard part). Write and test this first.

Step 2: Quicksort wrapper (3 lines). Base case, partition call, two recursive calls.

Step 3: Randomized pivot (1 extra line). Swap random element to pivot position.

Step 4: Mention optimizations. "In production I would add insertion sort for small subarrays, median-of-three, and introsort fallback."

This shows you know both the fundamentals and the engineering. Most candidates stop at Step 2.

Recommended Reading

ResourceWhat you learn
CLRS Chapter 7The formal analysis: recurrences, indicator variables, expected comparisons
Sedgewick's PhD thesis (1975)The most thorough analysis of quicksort variants ever written
Musser, "Introspective Sorting and Selection" (1997)The introsort algorithm — quicksort with a heapsort safety net
Bentley & McIlroy, "Engineering a Sort Function" (1993)How to build a production-grade sort. The paper behind C's qsort.
Yaroslavskiy, "Dual-Pivot Quicksort" (2009)The algorithm behind Java's Arrays.sort for primitives
Peters, "Pattern-defeating Quicksort" (2021)Modern quicksort that adapts to input patterns. Used in Rust and Go.
Final question: An interviewer asks: "Implement a function that, given an unsorted array of n integers, returns the median in O(n) average time without sorting the entire array." What algorithm do you use, and what is the key insight?

A Closing Perspective

Quicksort is more than a sorting algorithm. It is a masterclass in algorithm design:

The power of randomization. A single random choice (the pivot) transforms a potentially O(n²) algorithm into an expected O(n log n) algorithm. This principle — using randomness to defeat worst cases — appears throughout computer science: hash tables, skip lists, random forests, stochastic gradient descent.

The value of simplicity. The core algorithm is three lines. Every optimization (insertion sort for small arrays, median-of-three, introsort fallback) is layered on top without changing the fundamental structure. This makes quicksort easy to implement correctly, easy to debug, and easy to teach.

The importance of constants. Asymptotic analysis says O(n log n) = O(n log n), but in practice, quicksort beats merge sort by 2-3x because of cache behavior and in-place operation. The constant factors that big-O hides can dominate real-world performance.

Closing thought. Tony Hoare, in his Turing Award lecture (1980), said: "There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies." Quicksort — three lines of recursion atop a simple partition loop — is a shining example of the first way. Its simplicity is its greatest strength: simple to implement, simple to analyze, simple to optimize, and yet powerful enough to be the default sorting algorithm in most programming languages sixty years after its invention.