Counting sort, radix sort, bucket sort — breaking the O(n log n) barrier.
You have a million exam scores, each between 0 and 100. You need them sorted. You reach for merge sort — O(n log n), guaranteed. For n = 1,000,000 that is about 20 million operations. Not bad. But look at your data more carefully: the values are integers in a tiny range [0, 100]. You have a million numbers but only 101 distinct possible values. Do you really need 20 million comparisons?
No. You could walk the array once, count how many times each score appears, and reconstruct the sorted output in a single pass. That is two passes through the data: O(n). No comparisons needed at all.
This chapter is about a fundamental question in computer science: how fast can we sort? The answer depends on what operations you allow yourself to use.
Every sorting algorithm you have met so far — merge sort, quicksort, heapsort, insertion sort — works by comparing pairs of elements. "Is a[i] ≤ a[j]?" The algorithm's decisions are entirely driven by the answers to these yes/no questions. We call these comparison sorts.
The stunning result of this chapter: any comparison sort must make at least Ω(n log n) comparisons in the worst case. This is not a statement about a specific algorithm being slow. It is a mathematical proof that no comparison-based algorithm — not one that exists today, not one that will ever be invented — can beat n log n. It is a lower bound on the problem itself.
To see why comparison sorts cannot beat n log n, let us visualize what they actually do. Every comparison sort on n elements can be drawn as a decision tree: a binary tree where each internal node is a comparison "a[i] ≤ a[j]?", the left branch is "yes", the right branch is "no", and each leaf is one possible sorted order (a permutation).
For n = 3 elements (call them a, b, c), there are 3! = 6 possible orderings. The tree must have at least 6 leaves. Since a binary tree with L leaves has height at least ⌈log2 L⌉, the tree must have height at least ⌈log2 6⌉ = 3. That means at least 3 comparisons in the worst case.
Every path from root to leaf is one possible execution of a comparison sort on three elements. Count the leaves: there must be at least 6 (one per permutation). The longest path = worst-case comparisons.
The tree above has exactly 6 leaves (the 6 permutations of three elements) and height 3. That is tight: you can sort 3 elements with exactly 3 comparisons in the worst case. But as n grows, the number of leaves explodes: n! grows much faster than 2n, and the minimum height of the tree grows as Ω(n log n).
In Chapter 0 we saw the intuition: comparison sorts gather one bit per comparison, and they need enough bits to identify one of n! orderings. Now let us make this argument airtight.
A comparison sort is any sorting algorithm that determines the sorted order only by comparing pairs of input elements. Given elements a1, a2, ..., an, the algorithm may ask questions of the form "is ai ≤ aj?" and branch based on the answer. It may not look at the values of the elements directly — only at the results of comparisons.
We model any such algorithm as a decision tree:
| Tree element | Meaning |
|---|---|
| Internal node | A comparison ai : aj (is ai ≤ aj?) |
| Left child | Branch taken if ai ≤ aj (yes) |
| Right child | Branch taken if ai > aj (no) |
| Leaf | A permutation π that describes the sorted order |
| Root-to-leaf path | The sequence of comparisons for one particular input |
| Height of tree | Worst-case number of comparisons |
Theorem 8.1 (CLRS): Any comparison sort algorithm requires Ω(n log n) comparisons in the worst case.
Proof. The decision tree must have at least n! leaves (one for each permutation of n elements — each permutation is a possible input ordering, and the algorithm must be able to produce each one as output). A binary tree of height h has at most 2h leaves. Therefore:
Taking logarithms:
By Stirling's approximation, n! ≈ √(2πn) · (n/e)n, so:
This is why merge sort, heapsort, and the best quicksort variants all converge to O(n log n). They are optimal within the comparison model. To go faster, we must step outside the model entirely.
See how the theoretical lower bound log2(n!) compares to the actual worst-case comparisons of merge sort (n⌈log n⌉) and insertion sort (n(n-1)/2). Drag the slider to change n.
The lower bound says comparison sorts cannot beat Ω(n log n). It does not say:
1. That sorting itself is Ω(n log n). Only comparison-based sorting is. If you know the data is integers in [0, k], counting sort does it in O(n + k).
2. That all inputs require n log n comparisons. Many inputs are easier — insertion sort on nearly-sorted data is O(n). The lower bound is about the worst case.
3. That constant factors do not matter. Quicksort's O(n log n) average case has smaller constants than merge sort's O(n log n) worst case, which is why quicksort is often faster in practice despite a worse worst case.
Merge sort asks "which of these two elements is smaller?" over and over. But what if you already know where each element belongs — because you know its value, and you know how many elements are smaller? That is counting sort.
Suppose you have n integers, each in the range [0, k]. Counting sort works in three passes:
Input A = [2, 5, 3, 0, 2, 3, 0, 3], range k = 5.
Pass 1 (count): C = [2, 0, 2, 3, 0, 1]. Value 0 appears twice, value 2 appears twice, value 3 appears three times, value 5 appears once.
Pass 2 (prefix sum): C = [2, 2, 4, 7, 7, 8]. This means: elements ≤ 0 occupy positions 0-1, elements ≤ 2 occupy positions 0-3, elements ≤ 3 occupy positions 0-6, elements ≤ 5 occupy positions 0-7.
Pass 3 (place, right to left):
| i (R-to-L) | A[i] | C[A[i]] | Place at | Output B |
|---|---|---|---|---|
| 7 | 3 | 7 → 6 | B[6] | [_, _, _, _, _, _, 3, _] |
| 6 | 0 | 2 → 1 | B[1] | [_, 0, _, _, _, _, 3, _] |
| 5 | 3 | 6 → 5 | B[5] | [_, 0, _, _, _, 3, 3, _] |
| 4 | 2 | 4 → 3 | B[3] | [_, 0, _, 2, _, 3, 3, _] |
| 3 | 0 | 1 → 0 | B[0] | [0, 0, _, 2, _, 3, 3, _] |
| 2 | 3 | 5 → 4 | B[4] | [0, 0, _, 2, 3, 3, 3, _] |
| 1 | 5 | 8 → 7 | B[7] | [0, 0, _, 2, 3, 3, 3, 5] |
| 0 | 2 | 3 → 2 | B[2] | [0, 0, 2, 2, 3, 3, 3, 5] |
Result: [0, 0, 2, 2, 3, 3, 3, 5]. Sorted. No comparisons made.
Watch counting sort build the count array, compute prefix sums, and place elements into the output array. Click Step to advance one operation at a time, or Run to animate the whole thing.
Space: O(n + k) — the output array B[0..n-1] plus the count array C[0..k].
When is this useful? When k (the range of values) is not much larger than n. Sorting a million exam scores in [0, 100]? k = 100, n = 1,000,000. O(n + k) = O(n). Brilliant. Sorting a million 64-bit integers? k = 264. The count array alone would not fit in any computer's memory. Terrible. For large ranges, we need radix sort.
python def counting_sort(A, k): """Sort array A of integers in [0, k]. Returns new sorted array.""" n = len(A) C = [0] * (k + 1) # count array B = [0] * n # output array # Pass 1: count occurrences for val in A: C[val] += 1 # Pass 2: prefix sums for i in range(1, k + 1): C[i] += C[i - 1] # Pass 3: place elements (right-to-left for stability) for i in range(n - 1, -1, -1): val = A[i] C[val] -= 1 B[C[val]] = val return B
Counting sort is brilliant when the range k is small. But what if your integers are large — say, 9-digit numbers? k = 999,999,999 and a count array that big is absurd. Radix sort solves this by sorting one digit at a time, using counting sort on each digit position.
Your first instinct might be to sort by the most significant digit first, like sorting words in a dictionary (alphabetical from left). But that creates subproblems: after grouping by the first digit, you must recursively sort within each group. That is MSD (Most Significant Digit) radix sort — it works, but it is recursive and complex.
LSD (Least Significant Digit) radix sort is simpler and more elegant. You sort by the last digit first, then the second-to-last, and so on. It sounds backwards, but it works perfectly — as long as each digit-sort is stable.
Sort: [329, 457, 657, 839, 436, 720, 355]
Pass 1 — sort by ones digit (d=1):
| Value | Ones digit |
|---|---|
| 720 | 0 |
| 355 | 5 |
| 436 | 6 |
| 457 | 7 |
| 657 | 7 |
| 329 | 9 |
| 839 | 9 |
After pass 1: [720, 355, 436, 457, 657, 329, 839]. Note: 457 before 657 (stable), 329 before 839 (stable).
Pass 2 — sort by tens digit (d=2):
| Value | Tens digit |
|---|---|
| 720 | 2 |
| 329 | 2 |
| 436 | 3 |
| 839 | 3 |
| 355 | 5 |
| 457 | 5 |
| 657 | 5 |
After pass 2: [720, 329, 436, 839, 355, 457, 657]. Stability preserved: 720 before 329 (both tens digit 2, and 720 was before 329 after pass 1).
Pass 3 — sort by hundreds digit (d=3):
| Value | Hundreds digit |
|---|---|
| 329 | 3 |
| 355 | 3 |
| 436 | 4 |
| 457 | 4 |
| 657 | 6 |
| 720 | 7 |
| 839 | 8 |
After pass 3: [329, 355, 436, 457, 657, 720, 839]. Fully sorted.
Watch radix sort process [329, 457, 657, 839, 436, 720, 355] digit by digit. Each pass uses counting sort on one digit position (highlighted). Click Step to advance one pass, or Run to animate all passes.
In practice, radix sort with base 256 (8-bit digits, d = 4 for 32-bit integers) is extremely fast. Four passes of counting sort, each operating on just 256 buckets. Cache-friendly, branch-free, and predictable.
python def radix_sort(A, base=10): """LSD radix sort for non-negative integers.""" if not A: return A max_val = max(A) exp = 1 # current digit position while max_val // exp > 0: A = _counting_sort_by_digit(A, exp, base) exp *= base return A def _counting_sort_by_digit(A, exp, base): """Stable counting sort on the digit at position exp.""" n = len(A) B = [0] * n C = [0] * base # Count digit occurrences for val in A: digit = (val // exp) % base C[digit] += 1 # Prefix sums for i in range(1, base): C[i] += C[i - 1] # Place (right-to-left for stability) for i in range(n - 1, -1, -1): digit = (A[i] // exp) % base C[digit] -= 1 B[C[digit]] = A[i] return B
Counting sort assumes integers in a known range. Radix sort assumes integers (or strings) you can decompose digit-by-digit. What if your data is floating-point numbers uniformly distributed in [0, 1)? Neither counting sort nor radix sort applies cleanly. Enter bucket sort.
The idea is beautifully simple: if n numbers are uniformly distributed in [0, 1), divide the interval into n equal-sized buckets [0, 1/n), [1/n, 2/n), ..., [(n-1)/n, 1). On average, each bucket gets about one element. Sort each bucket (with insertion sort — efficient for tiny lists), then concatenate.
Input A = [0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68], n = 10.
| Bucket | Range | Elements | After sort |
|---|---|---|---|
| 0 | [0.0, 0.1) | - | - |
| 1 | [0.1, 0.2) | 0.17, 0.12 | 0.12, 0.17 |
| 2 | [0.2, 0.3) | 0.26, 0.21, 0.23 | 0.21, 0.23, 0.26 |
| 3 | [0.3, 0.4) | 0.39 | 0.39 |
| 4-5 | [0.4, 0.6) | - | - |
| 6 | [0.6, 0.7) | 0.68 | 0.68 |
| 7 | [0.7, 0.8) | 0.78, 0.72 | 0.72, 0.78 |
| 8 | [0.8, 0.9) | - | - |
| 9 | [0.9, 1.0) | 0.94 | 0.94 |
Concatenated: [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94].
Expected time: O(n) when input is uniformly distributed. Worst case: O(n2) when all elements land in the same bucket.
Watch elements get distributed into buckets, sorted within each bucket, and concatenated. Toggle between uniform and skewed distributions to see how bucket sizes change.
python def bucket_sort(A): """Sort floats in [0, 1) using bucket sort.""" n = len(A) buckets = [[] for _ in range(n)] # Distribute into buckets for x in A: idx = int(n * x) if idx == n: # edge case: x = 1.0 idx = n - 1 buckets[idx].append(x) # Sort each bucket (insertion sort) for b in buckets: _insertion_sort(b) # Concatenate result = [] for b in buckets: result.extend(b) return result def _insertion_sort(A): """In-place insertion sort.""" for i in range(1, len(A)): key = A[i] j = i - 1 while j >= 0 and A[j] > key: A[j + 1] = A[j] j -= 1 A[j + 1] = key
Time for the payoff. Below is a head-to-head comparison of every sorting algorithm we have studied: three comparison-based sorts (merge sort, quicksort, heapsort) and three linear-time sorts (counting sort, radix sort, bucket sort). You control the data — its size, its range, and its distribution. Watch the operation counts diverge as you find each algorithm's sweet spot and its weakness.
Configure the data, then click Race to run all algorithms on the same input and compare operation counts. Operations = comparisons (comparison sorts) or array reads+writes (linear sorts).
| Experiment | What happens | Why |
|---|---|---|
| Integers [0, n], n=1000 | Counting sort wins by a landslide | k = n, so counting sort is O(n). Comparison sorts do ~10,000 ops |
| Integers [0, n³], n=100 | Radix sort wins, counting sort chokes | k = 106 is too large for counting sort. Radix breaks the number into manageable digits |
| Uniform floats, n=500 | Bucket sort wins | Uniform distribution means ~1 element per bucket: O(n) |
| Skewed floats, n=500 | Bucket sort collapses, merge sort wins | Most elements land in a few buckets: O(n2) degradation |
| Nearly sorted, n=1000 | Insertion sort (inside bucket sort) is fast, but merge sort is consistent | Adaptive algorithms exploit existing order |
The lesson: there is no universally best sorting algorithm. The right choice depends on your data. Knowing these five factors — range, distribution, data type, stability needs, and memory budget — is what separates a junior programmer from an engineer.
A sort algorithm is a tool. Like any tool, the question is not "which is best?" but "which is best for this job?" Here is the decision framework.
Answer the questions to find the best sort for your situation. Click a path to follow it.
| Algorithm | Best | Average | Worst | Space | Stable? | When to use |
|---|---|---|---|---|---|---|
| Merge sort | O(n log n) | O(n log n) | O(n log n) | O(n) | Yes | General purpose, guaranteed worst case, need stability |
| Quicksort | O(n log n) | O(n log n) | O(n2) | O(log n) | No | General purpose, best average-case constants, in-place preferred |
| Heapsort | O(n log n) | O(n log n) | O(n log n) | O(1) | No | Strict O(1) extra space needed, guaranteed worst case |
| Counting sort | O(n+k) | O(n+k) | O(n+k) | O(n+k) | Yes | Small integer range k = O(n) |
| Radix sort | O(d(n+k)) | O(d(n+k)) | O(d(n+k)) | O(n+k) | Yes | Fixed-width integers/strings, many digits, k per digit is small |
| Bucket sort | O(n) | O(n) | O(n2) | O(n) | Yes* | Uniformly distributed floats in known range |
*Bucket sort is stable if the sub-sort is stable (insertion sort is stable).
| Scenario | Best algorithm | Why |
|---|---|---|
| Sort ages of all US residents (0-120) | Counting sort | k = 120 « n = 330M. O(n) trivially |
| Sort 10M 32-bit IP addresses | Radix sort (base 256) | 4 passes of counting sort on 256 buckets. Cache-friendly |
| Sort 1M uniformly random floats | Bucket sort | Uniform distribution, expected O(n) |
| Sort database rows by name | Merge sort | Strings need comparison, need stability for secondary keys |
| Sort 1000 sensor readings (arbitrary floats) | Quicksort (introsort) | Small n, in-place, best constants. std::sort in C++ |
| Sort-then-unique on log timestamps | Merge sort or Timsort | Nearly sorted input, stability helps, Timsort exploits runs |
Two students both scored 85 on the midterm. In the original roster, Alice is listed before Bob. After sorting by score, a stable sort keeps Alice before Bob. An unstable sort might swap them.
This sounds like a minor detail. It is not. Stability is the reason radix sort works. It is the reason database sorts compose. It is a property that interviewers ask about because it reveals whether you understand sorting at a deep level.
A sorting algorithm is stable if, whenever two elements have equal keys, they appear in the output in the same relative order as in the input. Formally: if A[i] = A[j] and i < j, then in the sorted output, A[i] appears before A[j].
1. Multi-key sorting. You want to sort students by grade, then by name within each grade. If you sort by name first, then sort by grade with a stable sort, students with the same grade remain in alphabetical order. Two stable sorts compose into a multi-key sort. This is exactly how radix sort works: sort by LSD, then next digit, etc.
2. Preserving meaningful order. Database rows often have a "natural" order (insertion time, row ID). A stable sort preserves this order among ties, which users expect.
3. Correctness of radix sort. If the digit-level sort is unstable, radix sort breaks. The LSD-first strategy relies on stability to preserve the order established by previous digit passes.
| Stable | Unstable |
|---|---|
| Counting sort | Quicksort |
| Merge sort | Heapsort |
| Insertion sort | Selection sort |
| Bubble sort | Shellsort |
| Radix sort (uses stable sub-sort) | Introsort (std::sort) |
| Timsort (Python sorted()) | Tree sort (BST-based) |
Elements with the same key have different colors. A stable sort preserves the left-to-right color order among equal keys. An unstable sort may scramble them. Compare the two side by side.
Any sort can be made stable by appending the original index as a tiebreaker. If a[i] = a[j] and i < j, compare as (a[i], i) < (a[j], j). This always breaks ties in favor of the earlier element. The cost: O(n) extra space for the index array, and slightly larger comparisons.
python def stable_sort(A): """Make any sort stable by using (value, index) pairs.""" decorated = [(val, i) for i, val in enumerate(A)] decorated.sort() # Python's sort compares tuples lexicographically return [val for val, _ in decorated]
This is your reference chapter. Cheat sheets, coding drills, and the patterns that interviewers love to test.
| Algorithm | Time (avg) | Time (worst) | Space | Stable | In-place | Comparison? |
|---|---|---|---|---|---|---|
| Insertion | O(n2) | O(n2) | O(1) | Yes | Yes | Yes |
| Merge | O(n log n) | O(n log n) | O(n) | Yes | No | Yes |
| Quick | O(n log n) | O(n2) | O(log n) | No | Yes | Yes |
| Heap | O(n log n) | O(n log n) | O(1) | No | Yes | Yes |
| Counting | O(n+k) | O(n+k) | O(n+k) | Yes | No | No |
| Radix | O(d(n+k)) | O(d(n+k)) | O(n+k) | Yes | No | No |
| Bucket | O(n) | O(n2) | O(n) | Yes* | No | Hybrid |
Implement counting sort from memory. This is a common interview question because it tests whether you understand prefix sums and stability.
python # Drill: implement from scratch in 2 minutes def counting_sort(A, k): C = [0] * (k + 1) B = [0] * len(A) for x in A: # count C[x] += 1 for i in range(1, k+1): # prefix sum C[i] += C[i-1] for i in range(len(A)-1, -1, -1): # place (R-to-L) C[A[i]] -= 1 B[C[A[i]]] = A[i] return B
Given an array, sort elements by frequency (most frequent first). If two elements have the same frequency, maintain their order of first appearance. This combines counting and stability.
python from collections import Counter def sort_by_frequency(A): """Sort by frequency (descending). Stable for ties.""" freq = Counter(A) # Decorate-sort-undecorate # Key: (-frequency, first_occurrence_index) first_seen = {} for i, x in enumerate(A): if x not in first_seen: first_seen[x] = i return sorted(A, key=lambda x: (-freq[x], first_seen[x])) # Example: [1,1,2,2,2,3] -> [2,2,2,1,1,3]
Given n elements, find the k most frequent. Bucket sort approach: O(n) time.
python from collections import Counter def top_k_frequent(A, k): """O(n) using bucket sort by frequency.""" freq = Counter(A) n = len(A) # Buckets: index = frequency, value = list of elements buckets = [[] for _ in range(n + 1)] for val, count in freq.items(): buckets[count].append(val) # Collect from highest frequency down result = [] for i in range(n, 0, -1): for val in buckets[i]: result.append(val) if len(result) == k: return result return result # Example: top_k_frequent([1,1,1,2,2,3], 2) -> [1, 2]
Radix sort is not just for integers. You can sort fixed-length strings by applying counting sort to each character position, from right to left (LSD).
python def radix_sort_strings(A, strlen): """LSD radix sort for fixed-length ASCII strings.""" R = 256 # alphabet size (ASCII) for d in range(strlen - 1, -1, -1): # Counting sort by character at position d C = [0] * R B = [""] * len(A) for s in A: C[ord(s[d])] += 1 for i in range(1, R): C[i] += C[i-1] for i in range(len(A)-1, -1, -1): ch = ord(A[i][d]) C[ch] -= 1 B[C[ch]] = A[i] A = B return A
| Question | Key idea | Algorithm |
|---|---|---|
| Sort an array where values are in [0, k] | Small range: count occurrences | Counting sort O(n+k) |
| Sort 1M 32-bit integers | Break into 4 bytes, sort each | Radix sort O(n) |
| Find k most frequent elements | Count then bucket by frequency | Counting + bucket O(n) |
| Sort colors (Dutch national flag) | 3-way partition | O(n) one-pass partitioning |
| Can we sort faster than n log n? | Only with non-comparison methods | Decision tree lower bound |
| Sort linked list | Merge sort (no random access needed) | Merge sort O(n log n) |
| Sort nearly-sorted array | Insertion sort is O(nk) when k swaps away | Insertion sort or Timsort |
Linear-time sorting does not exist in a vacuum. Every algorithm in this chapter builds on ideas from earlier chapters and opens doors to later ones.
| Topic | Connection |
|---|---|
| CLRS Ch 2 (Insertion Sort, Merge Sort) | Insertion sort is the sub-routine inside bucket sort. Merge sort is the gold standard comparison sort that counting/radix/bucket aim to beat |
| CLRS Ch 6 (Heapsort) | O(n log n) worst-case, O(1) space, but unstable. Cannot be used as radix sort's sub-routine |
| CLRS Ch 7 (Quicksort) | O(n log n) average, best constants in practice, but O(n2) worst case and unstable. The partition step is used in many linear-time selection and sorting problems |
| CLRS Ch 9 (Order Statistics) | Finding the k-th smallest element in O(n). Uses the partition idea from quicksort. Does NOT need to fully sort |
| Hash tables | Counting sort's count array is essentially a hash table for integers. The prefix sum step is a cumulative distribution function |
| Parallel algorithms | Radix sort parallelizes beautifully: each digit pass is independent across elements. GPU sort implementations (CUB, Thrust) use radix sort as the backbone |
This lesson brought you through levels 1-3. Level 4 comes with experience — writing production code that sorts billions of records, profiling cache misses in radix sort, and understanding why NVIDIA's CUB library uses a 4-bit radix.
"The purpose of computing is insight, not numbers." — Richard Hamming