Quickselect, median of medians — finding the k-th element in linear time.
You have a million server response times from the last hour. Your boss asks: "What is the 95th percentile latency?" In other words, what value is larger than 95% of all measurements? You need the element at rank 950,000 out of a million.
The obvious approach: sort the array, then index into position 950,000. Sorting takes O(n log n). For a million elements, that is roughly 20 million operations. You get your answer, but you also did an enormous amount of unnecessary work. You sorted ALL million elements into their correct positions, when you only needed ONE of them.
Think of it this way. Suppose you want to find the tallest person in a room. Do you line everyone up in height order? Of course not. You scan the room once, keeping track of the tallest person you have seen. One pass, O(n). Sorting is overkill.
Finding the minimum is easy. But what about the median? Or the 95th percentile? Or the k-th smallest element for arbitrary k? These are harder. You cannot just scan once — you need to know where k elements fall relative to the rest. But you still should not need to sort the entire array.
The answer is yes: both in expectation (with randomized quickselect) and in the worst case (with the median-of-medians algorithm). This chapter builds both algorithms from scratch.
Here is the roadmap for this lesson:
But first, let us see just how wasteful sorting is when you only need one element. The simulation below generates a random array and "sorts" it to find the k-th element. Watch how many comparisons are performed. Most of them are completely irrelevant to the answer.
We sort the whole array just to find one element. The orange bar is the target (k-th smallest). Gray bars are elements whose final position we never needed to know.
Notice the count. For 20 elements, sorting does roughly 60-80 comparisons. But we only needed ONE element. The rest of the sorting work — placing every other element in its exact position — is wasted effort. A selection algorithm would find the answer in roughly 20-40 comparisons. For a million elements, the savings go from millions of comparisons to just a few million. That is the difference between O(n log n) and O(n).
Some order statistics are trivially easy. Some are hard. The difficulty depends on where k falls relative to n.
| k | Name | Difficulty | Best Algorithm |
|---|---|---|---|
| 1 | Minimum | Easy (n - 1 comparisons) | Single scan |
| 2 | Second smallest | Moderate (n + log n - 2) | Tournament + rescan losers |
| ⌈n/2⌉ | Median | Hardest in this family | Quickselect / BFPRT |
| n - 1 | Second largest | Moderate (same as k=2) | Tournament |
| n | Maximum | Easy (n - 1 comparisons) | Single scan |
The median is the "hardest" order statistic because it maximizes the amount of information we need. Finding the minimum requires ruling out n - 1 candidates. Finding the median requires correctly classifying every element as above or below the answer — which requires examining every element. But we do NOT need to determine the relative order among elements above the median, nor among elements below it. That is why selection is O(n) while sorting is O(n log n).
Let us compare the exact operation counts for selection vs. sorting at various scales:
The gap between selection and sorting grows as n increases, because log n grows without bound while the constant factor for quickselect stays at ~3.4. For a billion elements, sorting does ~30 billion comparisons while selection does ~3.4 billion — nearly a 10x difference.
Before we tackle general selection, let us start with the two simplest order statistics: the minimum (k=1) and the maximum (k=n). These warm up our thinking about lower bounds and comparison counting — ideas we will need when we get to the harder algorithms.
The minimum algorithm is trivially simple. Scan the array, keeping track of the smallest element seen so far. Each new element requires exactly one comparison against the current minimum.
python def find_min(A): """Find the minimum element. Exactly n-1 comparisons.""" minimum = A[0] for i in range(1, len(A)): if A[i] < minimum: # one comparison per element minimum = A[i] return minimum
For an array of n elements, we make exactly n - 1 comparisons. Can we do better? No. Here is the proof: think of a tournament. Each element except the final winner must "lose" at least one comparison. That means at least n - 1 comparisons are needed just to determine which element never loses. So n - 1 is a tight lower bound for finding the minimum.
There is a more rigorous way to prove the n - 1 lower bound. Consider an adversary that constructs the worst-case input as the algorithm runs. The adversary's strategy: maintain a set of "potential minimums" — elements that have never lost a comparison. Initially, all n elements are potential minimums.
Each comparison eliminates at most one element from the potential minimum set (the loser). The algorithm can only conclude which element is the minimum when exactly one potential minimum remains. Starting from n potential minimums and removing at most one per comparison, we need at least n - 1 comparisons. This adversary argument generalizes to prove lower bounds for many selection problems.
Here is a subtler problem: find the second smallest element. The naive approach uses 2n - 3 comparisons (find the min with n - 1, remove it, find the min of the rest with n - 2). Can we do better?
Yes. The tournament method finds the second smallest in n + ⌈log2 n⌉ - 2 comparisons. The idea: run a knockout tournament to find the minimum. The second smallest must have lost to the minimum at some point — it was eliminated by the eventual winner. The minimum participated in ⌈log2 n⌉ comparisons, so there are only ⌈log2 n⌉ candidates for second place. Find the min among those candidates with ⌈log2 n⌉ - 1 additional comparisons.
Total: (n - 1) + (⌈log2 n⌉ - 1) = n + ⌈log2 n⌉ - 2. For n = 1,000,000, that is 1,000,018 comparisons instead of 1,999,997. A significant improvement.
Now here is a more interesting problem. What if you need BOTH the minimum and the maximum? The naive approach: scan once for the min (n - 1 comparisons), scan again for the max (n - 1 comparisons). Total: 2n - 2 comparisons.
Can we do better? Yes, with a clever trick: process elements in pairs.
Here is the idea. Take two elements at a time. Compare them to each other first (1 comparison). The smaller one can only be a candidate for the minimum, and the larger one can only be a candidate for the maximum. Now compare the smaller against the running min (1 comparison) and the larger against the running max (1 comparison). That is 3 comparisons for 2 elements, instead of 4 comparisons with the naive approach.
python def min_and_max(A): """Find both min and max using 3*floor(n/2) comparisons.""" n = len(A) if n % 2 == 0: # Even: compare first pair to initialize if A[0] < A[1]: lo, hi = A[0], A[1] else: lo, hi = A[1], A[0] start = 2 else: # Odd: first element is both min and max candidate lo = hi = A[0] start = 1 # Process remaining elements in pairs for i in range(start, n - 1, 2): if A[i] < A[i + 1]: # 1 comparison: pair against each other small, big = A[i], A[i + 1] else: small, big = A[i + 1], A[i] if small < lo: # 2nd comparison: small vs running min lo = small if big > hi: # 3rd comparison: big vs running max hi = big return lo, hi
For n elements, this performs 3⌊n/2⌋ comparisons. Compare that to the naive 2n - 2:
| n | Naive (2n-2) | Pair trick (3⌊n/2⌋) | Savings |
|---|---|---|---|
| 10 | 18 | 15 | 17% |
| 100 | 198 | 150 | 24% |
| 1,000 | 1,998 | 1,500 | 25% |
| 1,000,000 | 1,999,998 | 1,500,000 | 25% |
As n grows, the pair trick saves 25% of comparisons. This is provably optimal — you cannot find both min and max with fewer than 3⌊n/2⌋ - 2 comparisons.
Let us trace the pair trick on [47, 12, 83, 5, 29, 61, 38, 91] (n=8, even).
| Step | Pair | Internal Compare | vs Min | vs Max | Running Min | Running Max | Comps So Far |
|---|---|---|---|---|---|---|---|
| Init | (47, 12) | 47 > 12 → small=12, big=47 | — | — | 12 | 47 | 1 |
| 1 | (83, 5) | 83 > 5 → small=5, big=83 | 5 < 12 → yes | 83 > 47 → yes | 5 | 83 | 4 |
| 2 | (29, 61) | 29 < 61 → small=29, big=61 | 29 > 5 → no | 61 < 83 → no | 5 | 83 | 7 |
| 3 | (38, 91) | 38 < 91 → small=38, big=91 | 38 > 5 → no | 91 > 83 → yes | 5 | 91 | 10 |
Result: min = 5, max = 91, total = 10 comparisons. The formula gives 3 × ⌊8/2⌋ = 12 — but we saved 2 because the initialization pair costs only 1 comparison (not 3). The general formula accounting for initialization: for even n, the count is 3(n/2) - 2 = 3n/2 - 2. For n = 8: 3(4) - 2 = 10. Matches!
The naive approach would use 2(8 - 1) = 14 comparisons. We saved 4 comparisons, a 29% reduction.
Watch the pair trick in action. The simulation processes elements two at a time, comparing the pair first, then routing each to the right contest.
Elements are processed in pairs. Blue = current pair. Green = running min. Red = running max. Watch the comparison count.
We know how to find the min and max in O(n). But what about the k-th smallest for arbitrary k? Finding the median, the 95th percentile, or any order statistic?
The key insight comes from quicksort's partition. Remember what partition does: it picks a pivot, rearranges the array so all elements ≤ pivot are on the left and all elements > pivot are on the right, and places the pivot at its final sorted position. After partition, the pivot is at index q — it IS the (q+1)-th smallest element.
Now here is the trick. If k = q + 1, we are done — the pivot is our answer. If k < q + 1, the k-th smallest must be in the left subarray. If k > q + 1, it must be in the right subarray. Either way, we only need to recurse into ONE side. Quicksort recurses into both sides. Quickselect recurses into only one.
python import random def randomized_partition(A, lo, hi): """Lomuto partition with random pivot.""" r = random.randint(lo, hi) A[r], A[hi] = A[hi], A[r] # move random pivot to end pivot = A[hi] i = lo - 1 for j in range(lo, hi): if A[j] <= pivot: i += 1 A[i], A[j] = A[j], A[i] A[i + 1], A[hi] = A[hi], A[i + 1] return i + 1 def randomized_select(A, lo, hi, k): """Find k-th smallest element in A[lo..hi] (1-indexed).""" if lo == hi: return A[lo] # base case: one element q = randomized_partition(A, lo, hi) rank = q - lo + 1 # rank of pivot within A[lo..hi] if k == rank: return A[q] # pivot IS the k-th element elif k < rank: return randomized_select(A, lo, q - 1, k) # recurse left else: return randomized_select(A, q + 1, hi, k - rank) # recurse right
Let us find the 5th smallest element in [7, 3, 9, 1, 5, 8, 2, 6, 4] (n=9, so the 5th smallest is the median).
| Step | Subarray | Pivot | After Partition | Pivot Rank | Decision |
|---|---|---|---|---|---|
| 1 | [7,3,9,1,5,8,2,6,4] | 4 | [3,1,2,4,5,8,7,6,9] | 4 | k=5 > 4, go right with k=1 |
| 2 | [5,8,7,6,9] | 6 | [5,6,7,8,9] | 2 | k=1 < 2, go left |
| 3 | [5] | 5 | [5] | 1 | k=1 = 1. Found: 5 |
Three partition steps. The first scanned 9 elements, the second scanned 5, the third scanned 1. Total work: 9 + 5 + 1 = 15 comparisons. Sorting would have needed roughly 9 × log2(9) ≈ 28 comparisons. And we only recursed into one side each time — the right subarray at step 1, then the left at step 2.
The difference between one recursive call and two is the difference between O(n) and O(n log n). Let us see why with a concrete comparison.
Quicksort (two recursive calls): At each level of the recursion tree, every element participates in exactly one partition call. So each level does O(n) work total. With good pivots, there are O(log n) levels. Total: O(n log n).
Quickselect (one recursive call): At the first level, we scan all n elements. At the next level, we scan at most n/2 elements (on average). Then n/4. Then n/8. Total: n + n/2 + n/4 + ... = 2n = O(n). The geometric series converges because we throw away half the array at each step.
This is the same reason binary search is O(log n) instead of O(n) — halving the problem and taking only one branch turns a linear scan into a logarithmic one. Quickselect applies the same principle: halving the problem and doing O(n) work per level turns O(n log n) into O(n).
Because quickselect only recurses into one side, we can trivially convert it to an iterative algorithm. No recursion stack needed — just update lo and hi:
python def quickselect_iterative(A, k): """Find k-th smallest (1-indexed). In-place, O(1) extra space.""" lo, hi = 0, len(A) - 1 while lo < hi: # Random pivot r = random.randint(lo, hi) A[r], A[hi] = A[hi], A[r] pivot = A[hi] i = lo - 1 for j in range(lo, hi): if A[j] <= pivot: i += 1 A[i], A[j] = A[j], A[i] A[i + 1], A[hi] = A[hi], A[i + 1] q = i + 1 rank = q - lo + 1 if k == rank: return A[q] elif k < rank: hi = q - 1 # narrow to left side else: k -= rank lo = q + 1 # narrow to right side return A[lo]
This version uses O(1) extra space (no recursion stack). It is the version you should write in interviews — simpler, no risk of stack overflow, and clearly O(1) space.
Let us find the 1st smallest (minimum) and the 12th smallest (maximum) in [58, 23, 91, 47, 12, 85, 34, 69, 7, 56, 73, 40] (n = 12).
Finding k = 1 (minimum):
Two partition steps. Total comparisons: 11 + 3 = 14. The minimum algorithm would have done 11 comparisons (single scan). For k = 1, quickselect is slightly worse due to partition overhead — but asymptotically still O(n).
Finding k = 12 (maximum):
This illustrates an important debugging point: rank tracking in quickselect is where most bugs hide. The rank within the SUBARRAY changes each time we recurse. When we recurse right past a pivot at rank r, we subtract r from k. This is the number one source of off-by-one errors in interviews.
Step through quickselect yourself. Choose k, then click Step to watch the algorithm partition, decide which side to recurse into, and narrow down to the answer.
Orange = pivot. Green = found answer. Dimmed bars = eliminated (wrong side). Click Step to advance.
Quickselect inherits quicksort's Achilles' heel: bad pivots. If every pivot happens to be the minimum or maximum, partition peels off only one element per level. The recurrence becomes T(n) = T(n-1) + O(n), which gives T(n) = O(n²). For a million elements, that is a trillion operations. Catastrophic.
Can an adversary force this worst case on randomized quickselect? Not easily. The adversary would need to predict the random pivot choices in advance. With true randomness, the probability of getting more than 4 log n bad pivots in a row is at most (1/2)4 log n = 1/n4 — vanishingly small. For n = 1,000,000, the probability of O(n²) behavior is less than 1 in 1024. You are more likely to be struck by lightning while winning the lottery.
But just like randomized quicksort, the expected case for randomized quickselect is O(n). Let us prove this carefully.
When we pick a random pivot, its rank is uniformly distributed from 1 to n. The "good" pivots — those that split the array so that neither side has more than 3n/4 elements — are the pivots with rank between n/4 and 3n/4. That is half of all possible pivots.
Think of ranks 1 through n as a number line. The "bad zone" is ranks 1 to n/4 (too small — the left subarray is tiny) and ranks 3n/4 to n (too large — the right subarray is tiny). Everything in between is a "good" pivot that gives at most a 3:1 split. The good zone covers n/2 out of n ranks — exactly half.
The expected work forms a geometric series:
That is O(n) — linear. The constant factor is not great (roughly 8), but the asymptotic behavior is as good as finding the minimum. Let us see the formal recurrence.
CLRS proves the expected time more precisely using indicator random variables. Define Xk as the indicator that the pivot has rank k. Since we choose uniformly at random, Pr[Xk = 1] = 1/n for each k.
If the pivot has rank k and we want the i-th element with i > k, we recurse on the right subarray of size n - k. If i < k, we recurse on the left subarray of size k - 1. In the worst case for the recurrence (not knowing which side we recurse into), the subproblem has size max(k - 1, n - k).
The max(k-1, n-k) term is at most n-1 when k=1 or k=n, and at most ⌈n/2⌉ when k = ⌊n/2⌋. We can show that:
Each value from ⌈n/2⌉ to n-1 appears at most twice (once from the left half, once from the right). Substituting and solving with the guess E[T(n)] ≤ cn for some constant c:
The substitution works with c = 4a. So E[T(n)] = O(n). The expected number of comparisons is at most 4an for the partition-cost constant a. In practice, quickselect averages about 3.4n comparisons.
The simulation below shows quickselect finding the median under two scenarios. On the left: lucky pivots (always near the median). On the right: unlucky pivots (always near the extreme). Watch how the problem size shrinks.
Left: good pivots shrink the problem by ~half each step. Right: bad pivots peel off one element per step. Watch the total comparison count.
With good pivots, the problem shrinks geometrically: n, n/2, n/4, n/8... — total work is n + n/2 + n/4 + … = 2n = O(n). With bad pivots, it shrinks by 1 each time: n, n-1, n-2, … — total work is n + (n-1) + … + 1 = n(n+1)/2 = O(n²). The difference between O(n) and O(n²) comes down entirely to pivot quality.
Let us compute the expected number of comparisons for a concrete case: finding the median of n = 16 elements with randomized quickselect.
The constant factor of 3.4 comes from the full indicator variable analysis. For practical purposes, quickselect performs about 2.75n to 3.5n comparisons on average, depending on how many equal elements exist.
Standard Lomuto partition has a problem with duplicate elements. If the array is [5, 5, 5, 5, 5] and the pivot is 5, all elements go to the ≤ side, giving a 4:0 split. If every element is the same, we get O(n²) even with random pivots.
The fix is three-way partition (Dutch National Flag): split into three regions < pivot, = pivot, > pivot. If the target rank falls within the "= pivot" region, we are done immediately. This is the version used in practice.
python def three_way_partition(A, lo, hi): """Dutch National Flag partition. Returns (lt, gt) where A[lo..lt-1] < pivot, A[lt..gt] == pivot, A[gt+1..hi] > pivot.""" pivot = A[lo + random.randint(0, hi - lo)] lt, i, gt = lo, lo, hi while i <= gt: if A[i] < pivot: A[lt], A[i] = A[i], A[lt] lt += 1; i += 1 elif A[i] > pivot: A[i], A[gt] = A[gt], A[i] gt -= 1 else: i += 1 return lt, gt
std::nth_element function uses introselect: quickselect with three-way partition and a fallback to BFPRT (median of medians) if the recursion depth exceeds 2 log n. This gives O(n) worst-case guarantees while being fast in practice.Randomized quickselect is O(n) in expectation, but O(n²) in the worst case. In 1973, five researchers — Blum, Floyd, Pratt, Rivest, and Tarjan (BFPRT) — published one of the most elegant results in algorithm design: a deterministic O(n) selection algorithm. No randomness. Worst-case linear. The trick is a clever method for choosing a "guaranteed good" pivot.
This is the crucial part. The median of medians x is the median of ⌈n/5⌉ group medians. That means x is ≥ at least half of those medians, which is at least ⌈(1/2)⌈n/5⌉⌉ medians. Each of those medians is, in turn, ≥ at least 3 elements in its group (the median itself and the two smaller elements). So x is greater than or equal to at least:
elements. By the same argument, x is less than or equal to at least 3n/10 - 6 elements. So the partition around x has at most n - (3n/10 - 6) = 7n/10 + 6 elements on each side.
python def select(A, lo, hi, k): """Deterministic O(n) selection (BFPRT / Median of Medians).""" n = hi - lo + 1 if n <= 5: # Base case: sort small array, return k-th element sub = sorted(A[lo:hi+1]) return sub[k - 1] # Step 1 & 2: divide into groups of 5, find medians medians = [] for i in range(lo, hi + 1, 5): group = sorted(A[i:min(i + 5, hi + 1)]) medians.append(group[len(group) // 2]) # Step 3: recursively find median of medians mom = select(medians, 0, len(medians) - 1, (len(medians) + 1) // 2) # Step 4: partition around the median of medians pivot_idx = A.index(mom, lo, hi + 1) A[pivot_idx], A[hi] = A[hi], A[pivot_idx] q = lomuto_partition(A, lo, hi) # Step 5: recurse on the correct side rank = q - lo + 1 if k == rank: return A[q] elif k < rank: return select(A, lo, q - 1, k) else: return select(A, q + 1, hi, k - rank)
This is THE key visual for this entire chapter. The canvas below shows how median of medians works on a concrete array. Each group of 5 is shown as a column. The median of each group is highlighted. Then the median of those medians is found and used as the pivot. The guaranteed-eliminated region (at least 30%) is shown in gray.
Each column is a group of 5 (sorted within). Teal = group medians. Orange = median of medians (the pivot). Gray region = guaranteed eliminated. At least 30% of elements are eliminated regardless of input.
Let us trace BFPRT on the 15-element array [42, 17, 83, 5, 61, 29, 73, 11, 54, 38, 95, 22, 67, 3, 48] to find the 8th smallest (the median).
Step 1 — Divide into groups of 5:
Step 2 — Median of medians: medians = [42, 38, 48]. The median of [38, 42, 48] is 42. Use 42 as pivot.
Step 3 — Partition around 42: elements ≤ 42: [17, 5, 29, 11, 38, 22, 3, 42] (rank 8). Elements > 42: [83, 61, 73, 54, 95, 67, 48].
Step 4: We want the 8th smallest. Pivot rank = 8. Done! The answer is 42.
Notice: the pivot had rank exactly 8 (lucky!). Even if it had not, we would recurse on at most 10 elements (70% of 15) — guaranteed.
The recurrence is T(n) = T(⌈n/5⌉) + T(7n/10 + 6) + O(n). We want to show T(n) ≤ cn for some constant c. Assume it holds for smaller inputs:
Pick c = 20a and the inequality holds for n ≥ 140. For n < 140, sorting costs O(1) since 140 is a constant. So T(n) = O(n) with a constant around 20a, where a is the per-element cost of partition. In practice, this constant is around 40-70, which is why BFPRT is not used in real code.
The guarantee works through a beautiful cascading argument. Picture the elements arranged in a grid: each column is a group of 5, sorted vertically. The median of each group is in the middle row. The median of medians x is the median of the middle row.
Now, x is ≥ at least half the group medians (it is their median). Each of those group medians is ≥ the 2 elements below it in its group. So x is ≥ at least 3 elements in each of those groups. That gives us at least 3 × ⌈n/10⌉ elements that are guaranteed ≤ x.
By symmetry, at least 3 × ⌈n/10⌉ elements are guaranteed ≥ x. So each side of the partition has at least 30% of the elements, leaving at most 70% on the other side. No adversary can force a worse split.
Let us trace the full algorithm on a 25-element array to see every step in detail.
The guaranteed worst-case split was 70:30 (at most 17 elements on one side). The actual split was 52:48 — much better. The 30% guarantee is a floor, not the typical case. In practice, the median of medians tends to give near-even splits.
The BFPRT algorithm uses groups of 5. Why not 3? Why not 7? This is one of the most satisfying "why" questions in algorithm design, and the answer comes down to cold, hard recurrence math.
With groups of 3, we find ⌈n/3⌉ medians, then recursively find their median. The median of medians is ≥ at least 2 · ⌈(1/2)⌈n/3⌉⌉ ≥ n/3 - 4 elements. So the worst-case partition has at most n - n/3 + 4 = 2n/3 + 4 elements on the larger side.
The recurrence becomes:
Check the fractions: 1/3 + 2/3 = 1. That is not less than 1. The geometric series does not converge. This recurrence solves to T(n) = O(n log n), no better than just sorting! The benefit of median of medians evaporates entirely.
Actually, let us compute this precisely for each group size. The number of elements guaranteed to be eliminated is at least 3⌈(1/2)⌈n/g⌉⌉ if g is odd. The fraction of elements on the larger side of the partition is at most 1 - 3/(2g) + O(1/n). The full recurrence has subproblem sizes n/g (for the recursive median finding) and n(1 - 3/(2g)) (for the selection after partition).
| Group size g | Median-finding T(n/g) | Max subproblem | Sum of fractions | Result |
|---|---|---|---|---|
| 3 | T(n/3) | T(2n/3) | 1/3 + 2/3 = 1.00 | O(n log n) — fails! |
| 5 | T(n/5) | T(7n/10) | 1/5 + 7/10 = 0.90 | O(n) works |
| 7 | T(n/7) | T(5n/7) | 1/7 + 5/7 = 6/7 ≈ 0.86 | O(n) works |
| 9 | T(n/9) | T(13n/18) | 1/9 + 13/18 = 15/18 ≈ 0.83 | O(n) works |
Groups of 3 fail because the fractions sum to exactly 1. Groups of 5 are the smallest group size that works.
Let us trace through the argument more carefully for g = 3. With groups of 3, we get ⌈n/3⌉ group medians. The median of medians is ≥ at least 2 × ⌈(1/2)⌈n/3⌉⌉ elements (2 elements from each group whose median is ≤ x). So x is ≥ at least n/3 elements and ≤ at least n/3 elements. The worst-case partition leaves 2n/3 elements on one side.
So groups of 3 give T(n) = O(n log n) — the same as sorting. The whole point of median of medians was to beat sorting, so groups of 3 are useless. The algorithm becomes an elaborate way to achieve something you could do with merge sort.
What about even group sizes? Groups of 4 do not have a clean median (you would take the 2nd or 3rd element). The analysis still works for g = 4 — the guaranteed fraction is 2 × ⌈(1/2)⌈n/4⌉⌉, giving at most 3n/4 on one side. The recurrence is T(n) = T(n/4) + T(3n/4) + O(n), with fractions 1/4 + 3/4 = 1. Same problem as g = 3. So groups of 4 also fail.
The smallest odd group that works is 5. The smallest even group that works is 6. But groups of 6 require more work per group (sorting 6 elements costs up to 10 comparisons vs. 7 for 5 elements) with no improvement in the convergence rate. So 5 is the practical and theoretical sweet spot.
Let us compute the total work for each group size on n = 1000, tracing the worst-case recursion:
So groups of 7 have a better asymptotic constant (7 vs 10), but the per-group overhead (sorting 7 elements vs 5) adds about 2.3× more comparison work. Empirically, groups of 5 end up faster for typical input sizes. The break-even point where g=7 outperforms g=5 is around n > 108, which is rarely encountered in practice.
Groups of 7 also give O(n) — in fact, a better convergence rate (6/7 < 9/10). So why not use groups of 7? Because finding the median of 7 elements costs more comparisons than finding the median of 5 elements.
Sorting 5 elements takes at most 7 comparisons (or 6 with an optimal sorting network). Sorting 7 elements takes up to 16 comparisons. Per group, the extra work for groups of 7 outweighs the faster convergence rate. Groups of 5 hit the sweet spot: smallest group size that gives O(n), with minimal per-group overhead.
The introselect algorithm, used in C++ std::nth_element and other standard libraries, combines the best of both worlds:
Introselect is to selection what introsort is to sorting: use the fast randomized algorithm, but have a deterministic fallback to prevent worst-case blowup. The probability of triggering the fallback is negligible in practice — it is there as insurance, not as the primary algorithm.
Let us compare the actual comparison counts for n = 1000:
| Algorithm | Expected Comparisons | Worst-Case Comparisons |
|---|---|---|
| Sort + index | ~10,000 (merge sort) | ~10,000 |
| Quickselect | ~3,400 | ~500,000 (but astronomically unlikely) |
| BFPRT alone | ~50,000 | ~50,000 |
| Introselect | ~3,400 | ~50,000 |
BFPRT's worst case (50,000) is 5x better than quickselect's worst case (500,000), but 15x worse than quickselect's expected case (3,400). Introselect gets quickselect's expected case with BFPRT's worst-case guarantee — the best of both worlds.
The simulation below lets you see how the recurrence unfolds for different group sizes. Watch the problem shrink at each level and compare the total work.
Three bars show problem size shrinking per recursion level for groups of 3, 5, and 7. For g=3, the total work grows like n log n. For g=5 and g=7, it converges to O(n).
This is the payoff. We have three ways to find the k-th element: sort-then-index, randomized quickselect, and deterministic BFPRT. The simulation below runs all three side by side on the same array, counting every comparison. You pick k, you pick the array size, and you watch them race.
Three algorithms find the k-th smallest element. Watch comparison counts, recursion depth, and which elements get examined. Orange = target element. Green = found answer.
Things to try:
After running several races, you should notice three patterns:
Pattern 1: Sort is consistent but slow. Merge sort always does exactly the same number of comparisons for a given n. There is no variance. But it does the most work — about n log n comparisons — because it fully sorts the array.
Pattern 2: Quickselect is fast but variable. Run the race 10 times with the same n and k. You will get different comparison counts each time because of the random pivot selection. Sometimes quickselect gets lucky (good pivots, 2n comparisons). Sometimes unlucky (bad pivots, 5n comparisons). But the average is always around 3.4n.
Pattern 3: BFPRT is predictable but has overhead. BFPRT does more comparisons than quickselect's average because of the median-finding overhead. But it never has a bad run — the comparison count is always within a predictable range. For small n, BFPRT might actually do MORE work than sorting, because the constant factor is so large.
Order statistics are not just a theoretical exercise. They appear everywhere in systems, data science, and algorithm design. Let us tour the most important applications.
Every monitoring dashboard shows p50, p95, and p99 latencies. The p50 is the median (the 50th percentile). The p99 is the value larger than 99% of all observations. Each is an order statistic: the ⌊0.99n⌋-th smallest element.
For a static dataset, quickselect gives you any percentile in O(n). But what about a stream of data — new values arriving constantly? You cannot re-run quickselect every time a new element arrives.
The streaming median problem: maintain the median as elements arrive one by one. The classic solution uses two heaps:
python import heapq class StreamingMedian: def __init__(self): self.lo = [] # max-heap (negate values) self.hi = [] # min-heap def add(self, x): # Push to max-heap (negate for max-heap behavior) heapq.heappush(self.lo, -x) # Move max of lo to hi (rebalance) heapq.heappush(self.hi, -heapq.heappop(self.lo)) # Keep lo >= hi in size if len(self.hi) > len(self.lo): heapq.heappush(self.lo, -heapq.heappop(self.hi)) def median(self): if len(self.lo) > len(self.hi): return -self.lo[0] return (-self.lo[0] + self.hi[0]) / 2
Each insertion is O(log n) — two heap operations. Reading the median is O(1). This is the standard interview solution for "Find Median from Data Stream" (LeetCode 295).
The two-heap approach maintains three invariants at all times:
Given these invariants, the median is always accessible in O(1):
Let us trace the two-heap approach on the stream [5, 15, 1, 3, 8]:
| Element | Action | Max-Heap (lo) | Min-Heap (hi) | Median |
|---|---|---|---|---|
| 5 | Push 5 to lo, move max to hi, rebalance | [5] | [] | 5 |
| 15 | Push 15 to lo, move 15 to hi | [5] | [15] | (5+15)/2 = 10 |
| 1 | Push 1 to lo, move 5 to hi, rebalance: move 5 back | [1, 5] | [15] | 5 |
| 3 | Push 3 to lo, move 5 to hi | [1, 3] | [5, 15] | (3+5)/2 = 4 |
| 8 | Push 8 to lo, move 8 to hi, rebalance: move 5 back | [1, 3, 5] | [8, 15] | 5 |
After all 5 elements, the sorted array would be [1, 3, 5, 8, 15] — median is indeed 5. The two-heap approach tracked this correctly with O(log n) work per element, never needing to sort or scan the full dataset.
A balanced BST augmented with subtree sizes can find ANY order statistic in O(log n), not just the median. Each node stores the size of its left subtree. To find the k-th element, compare k to the left subtree size and recurse accordingly. This is the order-statistic tree from CLRS Chapter 14.
Trade-offs: BSTs support O(log n) insertion, deletion, AND rank queries. Two heaps only support median efficiently. But BSTs have larger constants and higher implementation complexity. For the streaming median problem specifically, two heaps are simpler and faster.
Watch the two-heap trick work in real time. Elements arrive one at a time. The max-heap (left half) and min-heap (right half) stay balanced, and the median is always available instantly at the heap roots.
Left (teal) = max-heap (smaller half). Right (orange) = min-heap (larger half). The median is always at the top of the larger heap. Click Add to push random values.
In the standard median, each element has equal weight. In a weighted median, each element xi has a weight wi (where weights sum to 1). The weighted median is the element xm such that the total weight of elements less than xm is at most 1/2, and the total weight of elements greater than xm is at most 1/2.
Application: the 1-D facility location problem. You have n customers at positions x1, ..., xn with demands w1, ..., wn. Where should you place a warehouse to minimize total weighted distance ∑ wi|xi - p|? The answer is the weighted median. This generalizes to the 1-median problem in operations research.
python def weighted_median(values, weights): """Find weighted median in O(n) expected time.""" # Pair values with weights, use quickselect-style approach pairs = list(zip(values, weights)) total_weight = sum(weights) def select(items, target_weight): if len(items) <= 2: items.sort() return items[0][0] # Random pivot pivot = random.choice(items) lo = [(v,w) for v,w in items if v < pivot[0]] eq = [(v,w) for v,w in items if v == pivot[0]] hi = [(v,w) for v,w in items if v > pivot[0]] w_lo = sum(w for _,w in lo) w_eq = sum(w for _,w in eq) if w_lo >= target_weight: return select(lo, target_weight) elif w_lo + w_eq >= target_weight: return pivot[0] else: return select(hi, target_weight - w_lo - w_eq) return select(pairs, total_weight / 2)
A common interview pattern: "Find the K largest elements." There are three approaches, each with different tradeoffs:
| Approach | Time | Space | Returns sorted? | When to use |
|---|---|---|---|---|
| Sort + slice | O(n log n) | O(1)* | Yes | K is close to n, or you need all K sorted |
| Min-heap of size K | O(n log K) | O(K) | No | Streaming data, K << n |
| Quickselect + partition | O(n) expected | O(1) | No | Batch data, K can be anything |
The mean is sensitive to outliers. One bad measurement can skew it arbitrarily. The median is robust — you can corrupt up to half the data points without changing the median by more than the gap between adjacent values. This property, called the breakdown point, is 50% for the median and 0% for the mean.
In systems monitoring, this matters enormously. If one request takes 10 seconds due to a timeout (while normal requests take 10ms), the mean latency jumps to ~100ms. The median stays at 10ms. The p95 and p99 catch the outlier. This is why monitoring dashboards show percentiles, not means.
Order statistics and selection appear in interviews constantly, often disguised. Here is your cheat sheet for the most common patterns.
| Dimension | What They Test | Your Response |
|---|---|---|
| CONCEPT | Do you know what an order statistic is? | The k-th smallest element; min (k=1), max (k=n), median (k=n/2). Selection vs. sorting. |
| DESIGN | Can you choose the right selection method? | Static: quickselect O(n) expected. Stream: two heaps O(log n) per insert. Top-K: quickselect + partial sort. |
| CODE | Can you implement quickselect cleanly? | Partition function + recursive select. Handle edge cases: duplicates, k out of bounds, empty array. |
| DEBUG | What goes wrong? | Off-by-one in rank (0-indexed vs 1-indexed). Stack overflow on O(n²) inputs. Infinite loop with all-equal elements. |
| FRONTIER | What are the limits? | BFPRT for worst-case O(n). Introselect. Lower bound of n + min(k, n-k) - 2 comparisons. Streaming median. |
Given an integer array nums and an integer k, return the k-th largest element. This is the k-th largest, not the k-th distinct largest.
python def findKthLargest(nums, k): """O(n) expected time using quickselect.""" # k-th largest = (n - k + 1)-th smallest target = len(nums) - k + 1 def select(lo, hi, rank): if lo == hi: return nums[lo] # Random pivot r = random.randint(lo, hi) nums[r], nums[hi] = nums[hi], nums[r] pivot = nums[hi] i = lo - 1 for j in range(lo, hi): if nums[j] <= pivot: i += 1 nums[i], nums[j] = nums[j], nums[i] nums[i + 1], nums[hi] = nums[hi], nums[i + 1] q = i + 1 r_val = q - lo + 1 if rank == r_val: return nums[q] elif rank < r_val: return select(lo, q - 1, rank) else: return select(q + 1, hi, rank - r_val) return select(0, len(nums) - 1, target)
Given an integer array nums and an integer k, return the k most frequent elements. Use quickselect on frequencies.
python from collections import Counter def topKFrequent(nums, k): # Count frequencies, then quickselect on counts counts = Counter(nums) unique = list(counts.keys()) def partition(lo, hi): pivot_freq = counts[unique[hi]] i = lo - 1 for j in range(lo, hi): if counts[unique[j]] >= pivot_freq: i += 1 unique[i], unique[j] = unique[j], unique[i] unique[i + 1], unique[hi] = unique[hi], unique[i + 1] return i + 1 def select(lo, hi): q = partition(lo, hi) if q == k - 1: return elif q < k - 1: select(q + 1, hi) else: select(lo, q - 1) select(0, len(unique) - 1) return unique[:k]
Implement the MedianFinder class with addNum(int num) and findMedian(). We covered the algorithm in Ch 7. Here is the complete, interview-ready implementation:
python import heapq class MedianFinder: def __init__(self): self.lo = [] # max-heap (negate values for heapq) self.hi = [] # min-heap def addNum(self, num: int) -> None: # Always push to lo first (as negative for max-heap) heapq.heappush(self.lo, -num) # Then push lo's max to hi (ensures lo's max ≤ hi's min) heapq.heappush(self.hi, -heapq.heappop(self.lo)) # Rebalance: lo should be ≥ hi in size if len(self.hi) > len(self.lo): heapq.heappush(self.lo, -heapq.heappop(self.hi)) def findMedian(self) -> float: if len(self.lo) > len(self.hi): return -self.lo[0] return (-self.lo[0] + self.hi[0]) / 2.0 # Usage: mf = MedianFinder() for x in [5, 15, 1, 3, 8]: mf.addNum(x) print(mf.findMedian()) # 5, 10, 5, 4, 5
Complexity: O(log n) per addNum, O(1) per findMedian. Space: O(n). This is optimal for the streaming median problem — you cannot do better than O(log n) amortized per insertion while supporting O(1) median queries.
Given two sorted arrays nums1 and nums2, find the median of the merged array in O(log(min(m,n))) time. This is a binary search problem, not a selection problem — but the underlying idea is the same: find the k-th element without merging. This is widely considered the hardest "standard" interview problem.
The key insight: we want to partition both arrays such that all elements in the left partitions are ≤ all elements in the right partitions, and the total left partition size is (m+n+1)/2. Binary search on the partition point in the shorter array determines the partition in the longer array.
python def findMedianSortedArrays(nums1, nums2): # Ensure nums1 is shorter if len(nums1) > len(nums2): nums1, nums2 = nums2, nums1 m, n = len(nums1), len(nums2) half = (m + n + 1) // 2 lo, hi = 0, m while lo <= hi: i = (lo + hi) // 2 # partition point in nums1 j = half - i # partition point in nums2 left1 = nums1[i - 1] if i > 0 else float('-inf') right1 = nums1[i] if i < m else float('inf') left2 = nums2[j - 1] if j > 0 else float('-inf') right2 = nums2[j] if j < n else float('inf') if left1 <= right2 and left2 <= right1: # Found the correct partition if (m + n) % 2 == 1: return max(left1, left2) return (max(left1, left2) + min(right1, right2)) / 2 elif left1 > right2: hi = i - 1 else: lo = i + 1
Given an integer array, reorder it such that nums[0] < nums[1] > nums[2] < nums[3] > nums[4].... The key insight: find the median using quickselect, then use three-way partition to place elements relative to the median, interleaving small and large values.
python def wiggleSort(nums): # Step 1: Find median in O(n) n = len(nums) median = findKthSmallest(nums, (n + 1) // 2) # Step 2: Three-way partition around median # using virtual indexing for O(1) space def newIdx(i): return (2 * i + 1) % (n | 1) lo, hi, i = 0, n - 1, 0 while i <= hi: if nums[newIdx(i)] > median: nums[newIdx(i)], nums[newIdx(lo)] = nums[newIdx(lo)], nums[newIdx(i)] lo += 1; i += 1 elif nums[newIdx(i)] < median: nums[newIdx(i)], nums[newIdx(hi)] = nums[newIdx(hi)], nums[newIdx(i)] hi -= 1 else: i += 1
A visual comparison of all the selection methods, showing how comparison counts scale with n.
Comparison counts for finding the median at different array sizes. Sort = O(n log n), Quickselect = O(n) expected, BFPRT = O(n) worst case.
| If you see... | Think... | Algorithm |
|---|---|---|
| "k-th largest/smallest" | Selection | Quickselect O(n) |
| "top K elements" | Partial sort | Quickselect + partition, or min-heap size K |
| "median of stream" | Streaming order stat | Two heaps (max + min) |
| "percentile" or "p99" | Order statistic | Quickselect on batch data |
| "minimize sum of distances" | Weighted median | Modified quickselect |
| "median of two sorted arrays" | Binary search on partition | Binary search O(log min(m,n)) |
| "wiggle sort" | Median + partition | Quickselect + three-way partition |
| "closest K to target" | Modified selection | Quickselect on |x - target| values |
Selection sits at the intersection of several fundamental topics. Let us map the connections and look at what lies beyond.
| Method | Time (avg / worst) | Space | When to Use |
|---|---|---|---|
| Sort + index | O(n log n) / O(n log n) | O(1) to O(n) | When you need multiple order statistics from same data |
| Quickselect | O(n) / O(n²) | O(log n) stack | Default choice. Expected O(n) is fast enough for almost everything |
| BFPRT | O(n) / O(n) | O(n) | Theoretical guarantee. Never used in practice alone |
| Introselect | O(n) / O(n) | O(n) | C++ std::nth_element. Quickselect with BFPRT fallback |
| Min-heap (size K) | O(n log K) / O(n log K) | O(K) | Streaming top-K when data does not fit in memory |
| Two heaps | O(log n) per insert / O(log n) | O(n) | Streaming median |
Selection and sorting are not separate problems — they sit on a spectrum. At one extreme, finding one order statistic (selection) is O(n). At the other, finding ALL order statistics (sorting) is O(n log n). What about finding k order statistics for 1 < k < n?
| Task | Best Time | Algorithm |
|---|---|---|
| 1 order statistic | O(n) | Quickselect |
| 2 order statistics (e.g., Q1 and Q3) | O(n) | Two quickselect calls |
| O(1) order statistics | O(n) | Constant number of quickselect calls |
| O(log n) order statistics | O(n log log n) | Complicated — rarely used |
| O(n) order statistics (all of them) | O(n log n) | Sorting |
If you need a constant number of percentiles (like p50, p95, p99 — that is 3 order statistics), you can find all three in O(n) with three quickselect calls. No need to sort.
The information-theoretic argument: sorting n elements requires log2(n!) = Θ(n log n) bits of information (distinguishing among n! permutations). Selection requires only log2(n) bits (which of the n elements is the answer). Linear-time selection is possible because we need exponentially less information than sorting.
The exact lower bound for selection is known. Finding the k-th smallest element requires at least n - 1 comparisons (just finding the minimum requires that many). The tight lower bound for finding the median specifically is approximately (2 + ε)n comparisons, proven by Dor and Zwick (1999). The exact optimal number of comparisons for finding the median remains an open problem.
| Problem | Lower Bound | Best Known Upper | Gap |
|---|---|---|---|
| Minimum | n - 1 | n - 1 | Tight |
| 2nd smallest | n + ⌈log2 n⌉ - 2 | n + ⌈log2 n⌉ - 2 | Tight |
| Median | (2 + ε)n | ~5.4305n | Open |
| General k-th | max(n-1, n + min(k, n-k) - 2) | ~O(n) | Varies |
| Chapter | Connection to Selection |
|---|---|
| Ch 7: Quicksort | Shares the partition subroutine. Quickselect = quicksort with one recursive call instead of two. |
| Ch 6: Heapsort | Priority queues enable streaming median (two heaps) and top-K (min-heap). |
| Ch 8: Sorting in Linear Time | Shows that Ω(n log n) is a lower bound for comparison sorting. Selection breaks this barrier. |
| Ch 14: Augmenting Data Structures | Order-statistic trees support O(log n) selection with O(log n) insertion. |
| Ch 5: Probabilistic Analysis | Indicator variables and linearity of expectation, used to analyze randomized quickselect. |
The BFPRT result was published in 1973 by Manuel Blum, Robert W. Floyd, Vaughan Pratt, Ronald Rivest, and Robert Tarjan — an extraordinary group. Floyd had already won the Turing Award (1978, for shortest paths and parsing). Tarjan would win it (1986, for data structures). Rivest co-invented RSA. Blum won it (1995, for computational complexity). The paper "Time Bounds for Selection" is a masterclass in algorithm design — a short, elegant paper that resolved a fundamental question.
What makes the result remarkable is not just the O(n) bound, but the technique: using recursion to solve a subproblem (finding a good pivot) within the same algorithm. The median of medians is itself found by SELECT — the algorithm calls itself to solve an auxiliary problem. This self-referential structure is what makes the algorithm beautiful and the proof subtle.
Despite 50 years of research since BFPRT, some questions about selection remain open:
"The purpose of computing is insight, not numbers."
— Richard Hamming