Sorting in Linear Time — CLRS Chapter 8

Chapter 0: The Problem

You have a million exam scores, each between 0 and 100. You need them sorted. You reach for merge sort — O(n log n), guaranteed. For n = 1,000,000 that is about 20 million operations. Not bad. But look at your data more carefully: the values are integers in a tiny range [0, 100]. You have a million numbers but only 101 distinct possible values. Do you really need 20 million comparisons?

No. You could walk the array once, count how many times each score appears, and reconstruct the sorted output in a single pass. That is two passes through the data: O(n). No comparisons needed at all.

This chapter is about a fundamental question in computer science: how fast can we sort? The answer depends on what operations you allow yourself to use.

The Comparison Sorting Wall

Every sorting algorithm you have met so far — merge sort, quicksort, heapsort, insertion sort — works by comparing pairs of elements. "Is a[i] ≤ a[j]?" The algorithm's decisions are entirely driven by the answers to these yes/no questions. We call these comparison sorts.

The stunning result of this chapter: any comparison sort must make at least Ω(n log n) comparisons in the worst case. This is not a statement about a specific algorithm being slow. It is a mathematical proof that no comparison-based algorithm — not one that exists today, not one that will ever be invented — can beat n log n. It is a lower bound on the problem itself.

But we CAN break the barrier. The lower bound only applies to comparison sorts. If we exploit additional information about the data — like knowing the values are integers in a bounded range — we can sort in O(n) time. Counting sort, radix sort, and bucket sort do exactly this. They never compare two elements. They use the values themselves as addresses.

Decision Tree for Sorting 3 Elements

To see why comparison sorts cannot beat n log n, let us visualize what they actually do. Every comparison sort on n elements can be drawn as a decision tree: a binary tree where each internal node is a comparison "a[i] ≤ a[j]?", the left branch is "yes", the right branch is "no", and each leaf is one possible sorted order (a permutation).

For n = 3 elements (call them a, b, c), there are 3! = 6 possible orderings. The tree must have at least 6 leaves. Since a binary tree with L leaves has height at least ⌈log₂ L⌉, the tree must have height at least ⌈log₂ 6⌉ = 3. That means at least 3 comparisons in the worst case.

Decision Tree for Sorting [a, b, c]

Every path from root to leaf is one possible execution of a comparison sort on three elements. Count the leaves: there must be at least 6 (one per permutation). The longest path = worst-case comparisons.

The tree above has exactly 6 leaves (the 6 permutations of three elements) and height 3. That is tight: you can sort 3 elements with exactly 3 comparisons in the worst case. But as n grows, the number of leaves explodes: n! grows much faster than 2ⁿ, and the minimum height of the tree grows as Ω(n log n).

The core insight. A comparison sort is an information-gathering machine. Each comparison gives you one bit of information (yes or no). To distinguish between n! possible orderings, you need at least log₂(n!) bits. By Stirling's approximation, log₂(n!) ≈ n log₂ n. Therefore: at least n log n comparisons. No clever algorithm can dodge this — it is information-theoretic.

Quick check: Merge sort runs in O(n log n). The lower bound for comparison sorts is Ω(n log n). What does this tell us about merge sort?

Merge sort is asymptotically optimal among comparison sorts — you cannot beat its worst-case growth rate with any comparison-based algorithm Merge sort is always the fastest sorting algorithm in practice Merge sort cannot be improved at all

Chapter 1: Decision Tree Lower Bound

In Chapter 0 we saw the intuition: comparison sorts gather one bit per comparison, and they need enough bits to identify one of n! orderings. Now let us make this argument airtight.

The Model

A comparison sort is any sorting algorithm that determines the sorted order only by comparing pairs of input elements. Given elements a₁, a₂, ..., a_n, the algorithm may ask questions of the form "is a_i ≤ a_j?" and branch based on the answer. It may not look at the values of the elements directly — only at the results of comparisons.

We model any such algorithm as a decision tree:

Tree element	Meaning
Internal node	A comparison a_i : a_j (is a_i ≤ a_j?)
Left child	Branch taken if a_i ≤ a_j (yes)
Right child	Branch taken if a_i > a_j (no)
Leaf	A permutation π that describes the sorted order
Root-to-leaf path	The sequence of comparisons for one particular input
Height of tree	Worst-case number of comparisons

The Theorem

Theorem 8.1 (CLRS): Any comparison sort algorithm requires Ω(n log n) comparisons in the worst case.

Proof. The decision tree must have at least n! leaves (one for each permutation of n elements — each permutation is a possible input ordering, and the algorithm must be able to produce each one as output). A binary tree of height h has at most 2^h leaves. Therefore:

2^h ≥ n!

Taking logarithms:

h ≥ log₂(n!)

By Stirling's approximation, n! ≈ √(2πn) · (n/e)ⁿ, so:

// Stirling's approximation
log₂(n!) = log₂(√(2πn)) + n · log₂(n/e)

// The dominant term is n log(n/e) = n log n - n log e
log₂(n!) = n log₂ n - n log₂ e + O(log n)

// Therefore:
h ≥ n log₂ n - n log₂ e = Ω(n log n) ■

This is why merge sort, heapsort, and the best quicksort variants all converge to O(n log n). They are optimal within the comparison model. To go faster, we must step outside the model entirely.

A tighter bound for specific n. For n = 3: log₂(3!) = log₂(6) ≈ 2.585, so h ≥ 3. For n = 4: log₂(4!) = log₂(24) ≈ 4.585, so h ≥ 5. For n = 10: log₂(10!) ≈ 21.8, so at least 22 comparisons. These exact bounds match what the best comparison sorts achieve.

Lower Bound vs Actual Comparisons

See how the theoretical lower bound log₂(n!) compares to the actual worst-case comparisons of merge sort (n⌈log n⌉) and insertion sort (n(n-1)/2). Drag the slider to change n.

n 10

What the Lower Bound Does NOT Say

The lower bound says comparison sorts cannot beat Ω(n log n). It does not say:

1. That sorting itself is Ω(n log n). Only comparison-based sorting is. If you know the data is integers in [0, k], counting sort does it in O(n + k).

2. That all inputs require n log n comparisons. Many inputs are easier — insertion sort on nearly-sorted data is O(n). The lower bound is about the worst case.

3. That constant factors do not matter. Quicksort's O(n log n) average case has smaller constants than merge sort's O(n log n) worst case, which is why quicksort is often faster in practice despite a worse worst case.

Check: A new sorting algorithm is proposed that runs in O(n log log n) worst-case time. It only uses pairwise comparisons. Is this possible?

Yes, there is still room to improve comparison sorts No — the decision tree lower bound proves any comparison sort requires Ω(n log n), and n log log n is asymptotically less than n log n, so the claim is impossible It depends on whether the algorithm is deterministic or randomized

Chapter 2: Counting Sort

Merge sort asks "which of these two elements is smaller?" over and over. But what if you already know where each element belongs — because you know its value, and you know how many elements are smaller? That is counting sort.

The Algorithm

Suppose you have n integers, each in the range [0, k]. Counting sort works in three passes:

Pass 1: Count

Walk the input array A[0..n-1]. For each value v, increment C[v]. After this pass, C[v] = number of elements equal to v.

↓

Pass 2: Prefix Sum

For v = 1 to k: set C[v] = C[v] + C[v-1]. Now C[v] = number of elements ≤ v. This tells you: the last element with value v belongs at position C[v]-1 in the output.

↓

Pass 3: Place

Walk A from RIGHT to LEFT. For each A[i], place it at B[C[A[i]] - 1], then decrement C[A[i]]. Walking right-to-left preserves the original order of equal elements (stability).

Worked Example

Input A = [2, 5, 3, 0, 2, 3, 0, 3], range k = 5.

Pass 1 (count): C = [2, 0, 2, 3, 0, 1]. Value 0 appears twice, value 2 appears twice, value 3 appears three times, value 5 appears once.

Pass 2 (prefix sum): C = [2, 2, 4, 7, 7, 8]. This means: elements ≤ 0 occupy positions 0-1, elements ≤ 2 occupy positions 0-3, elements ≤ 3 occupy positions 0-6, elements ≤ 5 occupy positions 0-7.

Pass 3 (place, right to left):

i (R-to-L)	A[i]	C[A[i]]	Place at	Output B
7	3	7 → 6	B[6]	[_, _, _, _, _, _, 3, _]
6	0	2 → 1	B[1]	[_, 0, _, _, _, _, 3, _]
5	3	6 → 5	B[5]	[_, 0, _, _, _, 3, 3, _]
4	2	4 → 3	B[3]	[_, 0, _, 2, _, 3, 3, _]
3	0	1 → 0	B[0]	[0, 0, _, 2, _, 3, 3, _]
2	3	5 → 4	B[4]	[0, 0, _, 2, 3, 3, 3, _]
1	5	8 → 7	B[7]	[0, 0, _, 2, 3, 3, 3, 5]
0	2	3 → 2	B[2]	[0, 0, 2, 2, 3, 3, 3, 5]

Result: [0, 0, 2, 2, 3, 3, 3, 5]. Sorted. No comparisons made.

Stability is built in. We walk A right-to-left in Pass 3. This means that among elements with the same value, the one that appeared later in A is placed first (at a higher index), and the one that appeared earlier is placed after it (at a lower index). This preserves original relative order. Stability is critical for radix sort, which uses counting sort as a subroutine.

Counting Sort Step-Through

Watch counting sort build the count array, compute prefix sums, and place elements into the output array. Click Step to advance one operation at a time, or Run to animate the whole thing.

Press Step or Run to begin.

Complexity

// Pass 1: count occurrences
O(n)    one pass through input

// Pass 2: prefix sums
O(k)    one pass through count array

// Pass 3: place elements
O(n)    one pass through input (reversed)

// Total:
O(n + k)    linear when k = O(n)

Space: O(n + k) — the output array B[0..n-1] plus the count array C[0..k].

When is this useful? When k (the range of values) is not much larger than n. Sorting a million exam scores in [0, 100]? k = 100, n = 1,000,000. O(n + k) = O(n). Brilliant. Sorting a million 64-bit integers? k = 2⁶⁴. The count array alone would not fit in any computer's memory. Terrible. For large ranges, we need radix sort.

Code

python
def counting_sort(A, k):
    """Sort array A of integers in [0, k]. Returns new sorted array."""
    n = len(A)
    C = [0] * (k + 1)   # count array
    B = [0] * n           # output array

    # Pass 1: count occurrences
    for val in A:
        C[val] += 1

    # Pass 2: prefix sums
    for i in range(1, k + 1):
        C[i] += C[i - 1]

    # Pass 3: place elements (right-to-left for stability)
    for i in range(n - 1, -1, -1):
        val = A[i]
        C[val] -= 1
        B[C[val]] = val

    return B

Check: You have n = 10,000 integers, each in the range [0, 10,000,000]. Is counting sort a good choice?

Yes — it runs in O(n) which is always fast No — k = 10,000,000 dwarfs n = 10,000. The count array alone needs 10 million entries (40 MB for 4-byte ints), and the O(k) prefix sum pass dominates. Use radix sort or a comparison sort instead It depends on whether the integers are unique

Chapter 3: Radix Sort

Counting sort is brilliant when the range k is small. But what if your integers are large — say, 9-digit numbers? k = 999,999,999 and a count array that big is absurd. Radix sort solves this by sorting one digit at a time, using counting sort on each digit position.

The Key Insight: Sort Least Significant Digit First

Your first instinct might be to sort by the most significant digit first, like sorting words in a dictionary (alphabetical from left). But that creates subproblems: after grouping by the first digit, you must recursively sort within each group. That is MSD (Most Significant Digit) radix sort — it works, but it is recursive and complex.

LSD (Least Significant Digit) radix sort is simpler and more elegant. You sort by the last digit first, then the second-to-last, and so on. It sounds backwards, but it works perfectly — as long as each digit-sort is stable.

Why LSD-first works. After sorting by digit position d, all numbers are in correct order with respect to digits 1 through d (least significant through d). When you then sort by digit d+1, stability ensures that numbers with the same digit d+1 stay in their previous relative order — which was correct for digits 1 through d. By induction, after sorting on all d digits, the numbers are fully sorted.

Worked Example

Sort: [329, 457, 657, 839, 436, 720, 355]

Pass 1 — sort by ones digit (d=1):

Value	Ones digit
720	0
355	5
436	6
457	7
657	7
329	9
839	9

After pass 1: [720, 355, 436, 457, 657, 329, 839]. Note: 457 before 657 (stable), 329 before 839 (stable).

Pass 2 — sort by tens digit (d=2):

Value	Tens digit
720	2
329	2
436	3
839	3
355	5
457	5
657	5

After pass 2: [720, 329, 436, 839, 355, 457, 657]. Stability preserved: 720 before 329 (both tens digit 2, and 720 was before 329 after pass 1).

Pass 3 — sort by hundreds digit (d=3):

Value	Hundreds digit
329	3
355	3
436	4
457	4
657	6
720	7
839	8

After pass 3: [329, 355, 436, 457, 657, 720, 839]. Fully sorted.

Radix Sort Visualization

Watch radix sort process [329, 457, 657, 839, 436, 720, 355] digit by digit. Each pass uses counting sort on one digit position (highlighted). Click Step to advance one pass, or Run to animate all passes.

Press Step or Run to begin.

Complexity

// d = number of digits, k = base (radix)
// Each pass: counting sort on n values in [0, k-1] = O(n + k)
// Total: d passes

T(n) = O(d(n + k))

// For b-bit integers with radix r = 2^t (t bits per digit):
// d = b/t digits, each in [0, 2^t - 1], so k = 2^t
T(n) = O((b/t)(n + 2^t))

// Optimal choice: t = log n, giving k = n, so
T(n) = O(bn/log n)

// For 32-bit integers: O(32n/log n) = O(n/log n) × 32
// For n = 1M: ~32M/20 = 1.6M ops (vs ~20M for merge sort)

In practice, radix sort with base 256 (8-bit digits, d = 4 for 32-bit integers) is extremely fast. Four passes of counting sort, each operating on just 256 buckets. Cache-friendly, branch-free, and predictable.

Code

python
def radix_sort(A, base=10):
    """LSD radix sort for non-negative integers."""
    if not A:
        return A
    max_val = max(A)
    exp = 1  # current digit position
    while max_val // exp > 0:
        A = _counting_sort_by_digit(A, exp, base)
        exp *= base
    return A

def _counting_sort_by_digit(A, exp, base):
    """Stable counting sort on the digit at position exp."""
    n = len(A)
    B = [0] * n
    C = [0] * base

    # Count digit occurrences
    for val in A:
        digit = (val // exp) % base
        C[digit] += 1

    # Prefix sums
    for i in range(1, base):
        C[i] += C[i - 1]

    # Place (right-to-left for stability)
    for i in range(n - 1, -1, -1):
        digit = (A[i] // exp) % base
        C[digit] -= 1
        B[C[digit]] = A[i]

    return B

Check: In radix sort, why do we sort by the LEAST significant digit first (not the most significant)?

It is faster because smaller digits have fewer values MSD-first would produce incorrect results LSD-first is simpler: each pass uses a stable sort, so by induction, after sorting on digit d, all digits 1..d are correctly ordered. MSD-first works too, but requires recursive sub-grouping — LSD avoids that complexity entirely

Chapter 4: Bucket Sort

Counting sort assumes integers in a known range. Radix sort assumes integers (or strings) you can decompose digit-by-digit. What if your data is floating-point numbers uniformly distributed in [0, 1)? Neither counting sort nor radix sort applies cleanly. Enter bucket sort.

The Algorithm

The idea is beautifully simple: if n numbers are uniformly distributed in [0, 1), divide the interval into n equal-sized buckets [0, 1/n), [1/n, 2/n), ..., [(n-1)/n, 1). On average, each bucket gets about one element. Sort each bucket (with insertion sort — efficient for tiny lists), then concatenate.

Step 1: Create n empty buckets

Bucket i covers the interval [i/n, (i+1)/n).

↓

Step 2: Distribute

For each element x, compute bucket index ⌊n · x⌋ and insert x into that bucket.

↓

Step 3: Sort each bucket

Use insertion sort on each bucket (expected O(1) elements per bucket).

↓

Step 4: Concatenate

Walk through buckets 0, 1, ..., n-1 in order. Output all elements from each bucket.

Worked Example

Input A = [0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68], n = 10.

Bucket	Range	Elements	After sort
0	[0.0, 0.1)	-	-
1	[0.1, 0.2)	0.17, 0.12	0.12, 0.17
2	[0.2, 0.3)	0.26, 0.21, 0.23	0.21, 0.23, 0.26
3	[0.3, 0.4)	0.39	0.39
4-5	[0.4, 0.6)	-	-
6	[0.6, 0.7)	0.68	0.68
7	[0.7, 0.8)	0.78, 0.72	0.72, 0.78
8	[0.8, 0.9)	-	-
9	[0.9, 1.0)	0.94	0.94

Concatenated: [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94].

Why insertion sort? With uniform distribution, each bucket has O(1) expected elements. Insertion sort on a list of length m is O(m²), but when m is O(1), that is O(1). The total expected time across all buckets is O(n). If the distribution is heavily skewed, some buckets get many elements and the O(m²) hurts — bucket sort degrades to O(n²) worst case.

Expected Time Analysis

// Let n_i = number of elements in bucket i
// Insertion sort on bucket i costs O(n_i^2)
// Total cost = sum_{i=0}^{n-1} O(n_i^2)

E[T(n)] = O(n) + ∑_i=0^n-1 E[n_i²]

// For uniform distribution: E[n_i] = 1, Var[n_i] = (n-1)/n^2
// E[n_i^2] = Var[n_i] + (E[n_i])^2 = (n-1)/n^2 + 1 = 2 - 1/n

E[T(n)] = O(n) + n · (2 - 1/n) = O(n) + O(n) = O(n)

Expected time: O(n) when input is uniformly distributed. Worst case: O(n²) when all elements land in the same bucket.

Bucket Sort Visualization

Watch elements get distributed into buckets, sorted within each bucket, and concatenated. Toggle between uniform and skewed distributions to see how bucket sizes change.

Press Step or Run to begin.

Code

python
def bucket_sort(A):
    """Sort floats in [0, 1) using bucket sort."""
    n = len(A)
    buckets = [[] for _ in range(n)]

    # Distribute into buckets
    for x in A:
        idx = int(n * x)
        if idx == n:  # edge case: x = 1.0
            idx = n - 1
        buckets[idx].append(x)

    # Sort each bucket (insertion sort)
    for b in buckets:
        _insertion_sort(b)

    # Concatenate
    result = []
    for b in buckets:
        result.extend(b)
    return result

def _insertion_sort(A):
    """In-place insertion sort."""
    for i in range(1, len(A)):
        key = A[i]
        j = i - 1
        while j >= 0 and A[j] > key:
            A[j + 1] = A[j]
            j -= 1
        A[j + 1] = key

Check: Your data is 1 million floats, but they are all clustered between 0.50 and 0.51 (not uniformly distributed). How does bucket sort perform?

Terribly — almost all elements land in bucket 500 (of 1M buckets). That single bucket has ~1M elements, and insertion sort on it is O(n²). You should use merge sort or radix sort instead O(n) — bucket sort always runs in linear time O(n log n) — same as merge sort

Chapter 5: The Sorting Arena

Time for the payoff. Below is a head-to-head comparison of every sorting algorithm we have studied: three comparison-based sorts (merge sort, quicksort, heapsort) and three linear-time sorts (counting sort, radix sort, bucket sort). You control the data — its size, its range, and its distribution. Watch the operation counts diverge as you find each algorithm's sweet spot and its weakness.

This is the showcase. Spend time here. Try each data type. Notice when linear-time sorts demolish comparison sorts — and when they do not. The numbers tell the story.

Sorting Arena: Head-to-Head

Configure the data, then click Race to run all algorithms on the same input and compare operation counts. Operations = comparisons (comparison sorts) or array reads+writes (linear sorts).

Size (n)

Data type

Configure, then click Race.

What to Try

Experiment	What happens	Why
Integers [0, n], n=1000	Counting sort wins by a landslide	k = n, so counting sort is O(n). Comparison sorts do ~10,000 ops
Integers [0, n³], n=100	Radix sort wins, counting sort chokes	k = 10⁶ is too large for counting sort. Radix breaks the number into manageable digits
Uniform floats, n=500	Bucket sort wins	Uniform distribution means ~1 element per bucket: O(n)
Skewed floats, n=500	Bucket sort collapses, merge sort wins	Most elements land in a few buckets: O(n²) degradation
Nearly sorted, n=1000	Insertion sort (inside bucket sort) is fast, but merge sort is consistent	Adaptive algorithms exploit existing order

The lesson: there is no universally best sorting algorithm. The right choice depends on your data. Knowing these five factors — range, distribution, data type, stability needs, and memory budget — is what separates a junior programmer from an engineer.

Chapter 6: When to Use Which

A sort algorithm is a tool. Like any tool, the question is not "which is best?" but "which is best for this job?" Here is the decision framework.

Decision Tree

Algorithm Selection Guide

Answer the questions to find the best sort for your situation. Click a path to follow it.

The Complete Comparison

Algorithm	Best	Average	Worst	Space	Stable?	When to use
Merge sort	O(n log n)	O(n log n)	O(n log n)	O(n)	Yes	General purpose, guaranteed worst case, need stability
Quicksort	O(n log n)	O(n log n)	O(n²)	O(log n)	No	General purpose, best average-case constants, in-place preferred
Heapsort	O(n log n)	O(n log n)	O(n log n)	O(1)	No	Strict O(1) extra space needed, guaranteed worst case
Counting sort	O(n+k)	O(n+k)	O(n+k)	O(n+k)	Yes	Small integer range k = O(n)
Radix sort	O(d(n+k))	O(d(n+k))	O(d(n+k))	O(n+k)	Yes	Fixed-width integers/strings, many digits, k per digit is small
Bucket sort	O(n)	O(n)	O(n²)	O(n)	Yes*	Uniformly distributed floats in known range

*Bucket sort is stable if the sub-sort is stable (insertion sort is stable).

Real-World Usage

Scenario	Best algorithm	Why
Sort ages of all US residents (0-120)	Counting sort	k = 120 « n = 330M. O(n) trivially
Sort 10M 32-bit IP addresses	Radix sort (base 256)	4 passes of counting sort on 256 buckets. Cache-friendly
Sort 1M uniformly random floats	Bucket sort	Uniform distribution, expected O(n)
Sort database rows by name	Merge sort	Strings need comparison, need stability for secondary keys
Sort 1000 sensor readings (arbitrary floats)	Quicksort (introsort)	Small n, in-place, best constants. std::sort in C++
Sort-then-unique on log timestamps	Merge sort or Timsort	Nearly sorted input, stability helps, Timsort exploits runs

The real world uses hybrids. Python's sorted() uses Timsort (merge sort + insertion sort, exploits natural runs). C++'s std::sort uses introsort (quicksort + heapsort fallback). Java's Arrays.sort uses dual-pivot quicksort for primitives, Timsort for objects. These hybrids combine the best properties of multiple algorithms.

Check: You need to sort 50 million 64-bit integers. Memory is plentiful, and you need the result to be stable. Which algorithm?

Quicksort — fastest in practice Radix sort (base 256) — 8 passes of counting sort on 256 buckets. O(8(n+256)) = O(n). Stable by construction. With 50M elements, radix sort's O(n) handily beats merge sort's O(n log n) with its ~26 factor Counting sort — always best for integers

Chapter 7: Stability

Two students both scored 85 on the midterm. In the original roster, Alice is listed before Bob. After sorting by score, a stable sort keeps Alice before Bob. An unstable sort might swap them.

This sounds like a minor detail. It is not. Stability is the reason radix sort works. It is the reason database sorts compose. It is a property that interviewers ask about because it reveals whether you understand sorting at a deep level.

Formal Definition

A sorting algorithm is stable if, whenever two elements have equal keys, they appear in the output in the same relative order as in the input. Formally: if A[i] = A[j] and i < j, then in the sorted output, A[i] appears before A[j].

Why Stability Matters

1. Multi-key sorting. You want to sort students by grade, then by name within each grade. If you sort by name first, then sort by grade with a stable sort, students with the same grade remain in alphabetical order. Two stable sorts compose into a multi-key sort. This is exactly how radix sort works: sort by LSD, then next digit, etc.

2. Preserving meaningful order. Database rows often have a "natural" order (insertion time, row ID). A stable sort preserves this order among ties, which users expect.

3. Correctness of radix sort. If the digit-level sort is unstable, radix sort breaks. The LSD-first strategy relies on stability to preserve the order established by previous digit passes.

Radix sort's dependency on stability. Consider sorting [21, 12, 22, 11] with radix sort (base 10). Pass 1 (ones digit): [21, 11, 12, 22]. Now sort by tens digit. 21 and 22 both have tens digit 2. A stable sort puts 21 before 22 (their order from pass 1). An unstable sort might produce [..., 22, 21, ...] — WRONG. Stability is not a nice-to-have. It is a correctness requirement for radix sort.

Stability Classification

Stable	Unstable
Counting sort	Quicksort
Merge sort	Heapsort
Insertion sort	Selection sort
Bubble sort	Shellsort
Radix sort (uses stable sub-sort)	Introsort (std::sort)
Timsort (Python sorted())	Tree sort (BST-based)

Stability Visualizer

Elements with the same key have different colors. A stable sort preserves the left-to-right color order among equal keys. An unstable sort may scramble them. Compare the two side by side.

Click Run Both to sort with merge sort (stable) and quicksort (unstable).

Making an Unstable Sort Stable

Any sort can be made stable by appending the original index as a tiebreaker. If a[i] = a[j] and i < j, compare as (a[i], i) < (a[j], j). This always breaks ties in favor of the earlier element. The cost: O(n) extra space for the index array, and slightly larger comparisons.

python
def stable_sort(A):
    """Make any sort stable by using (value, index) pairs."""
    decorated = [(val, i) for i, val in enumerate(A)]
    decorated.sort()  # Python's sort compares tuples lexicographically
    return [val for val, _ in decorated]

Check: You sort a list of (name, score) pairs by score using heapsort. Then you sort the result by name using heapsort again. Are students with the same name in order by score?

Yes — two sorts always compose correctly Not necessarily — heapsort is unstable. The second sort by name may scramble the score order among students with the same name. Use a stable sort (merge sort) for the second pass, or sort by (name, score) in a single pass Only if there are no duplicate names

Chapter 8: Interview Arsenal

This is your reference chapter. Cheat sheets, coding drills, and the patterns that interviewers love to test.

The Master Cheat Sheet

Algorithm	Time (avg)	Time (worst)	Space	Stable	In-place	Comparison?
Insertion	O(n²)	O(n²)	O(1)	Yes	Yes	Yes
Merge	O(n log n)	O(n log n)	O(n)	Yes	No	Yes
Quick	O(n log n)	O(n²)	O(log n)	No	Yes	Yes
Heap	O(n log n)	O(n log n)	O(1)	No	Yes	Yes
Counting	O(n+k)	O(n+k)	O(n+k)	Yes	No	No
Radix	O(d(n+k))	O(d(n+k))	O(n+k)	Yes	No	No
Bucket	O(n)	O(n²)	O(n)	Yes*	No	Hybrid

Coding Drill 1: Counting Sort

Implement counting sort from memory. This is a common interview question because it tests whether you understand prefix sums and stability.

python
# Drill: implement from scratch in 2 minutes
def counting_sort(A, k):
    C = [0] * (k + 1)
    B = [0] * len(A)

    for x in A:             # count
        C[x] += 1

    for i in range(1, k+1): # prefix sum
        C[i] += C[i-1]

    for i in range(len(A)-1, -1, -1):  # place (R-to-L)
        C[A[i]] -= 1
        B[C[A[i]]] = A[i]

    return B

Coding Drill 2: Sort by Frequency

Given an array, sort elements by frequency (most frequent first). If two elements have the same frequency, maintain their order of first appearance. This combines counting and stability.

python
from collections import Counter

def sort_by_frequency(A):
    """Sort by frequency (descending). Stable for ties."""
    freq = Counter(A)
    # Decorate-sort-undecorate
    # Key: (-frequency, first_occurrence_index)
    first_seen = {}
    for i, x in enumerate(A):
        if x not in first_seen:
            first_seen[x] = i

    return sorted(A, key=lambda x: (-freq[x], first_seen[x]))

# Example: [1,1,2,2,2,3] -> [2,2,2,1,1,3]

Coding Drill 3: Top K Frequent Elements

Given n elements, find the k most frequent. Bucket sort approach: O(n) time.

python
from collections import Counter

def top_k_frequent(A, k):
    """O(n) using bucket sort by frequency."""
    freq = Counter(A)
    n = len(A)

    # Buckets: index = frequency, value = list of elements
    buckets = [[] for _ in range(n + 1)]
    for val, count in freq.items():
        buckets[count].append(val)

    # Collect from highest frequency down
    result = []
    for i in range(n, 0, -1):
        for val in buckets[i]:
            result.append(val)
            if len(result) == k:
                return result
    return result

# Example: top_k_frequent([1,1,1,2,2,3], 2) -> [1, 2]

Coding Drill 4: Radix Sort for Strings

Radix sort is not just for integers. You can sort fixed-length strings by applying counting sort to each character position, from right to left (LSD).

python
def radix_sort_strings(A, strlen):
    """LSD radix sort for fixed-length ASCII strings."""
    R = 256  # alphabet size (ASCII)
    for d in range(strlen - 1, -1, -1):
        # Counting sort by character at position d
        C = [0] * R
        B = [""] * len(A)

        for s in A:
            C[ord(s[d])] += 1
        for i in range(1, R):
            C[i] += C[i-1]
        for i in range(len(A)-1, -1, -1):
            ch = ord(A[i][d])
            C[ch] -= 1
            B[C[ch]] = A[i]
        A = B

    return A

Common Interview Questions

Question	Key idea	Algorithm
Sort an array where values are in [0, k]	Small range: count occurrences	Counting sort O(n+k)
Sort 1M 32-bit integers	Break into 4 bytes, sort each	Radix sort O(n)
Find k most frequent elements	Count then bucket by frequency	Counting + bucket O(n)
Sort colors (Dutch national flag)	3-way partition	O(n) one-pass partitioning
Can we sort faster than n log n?	Only with non-comparison methods	Decision tree lower bound
Sort linked list	Merge sort (no random access needed)	Merge sort O(n log n)
Sort nearly-sorted array	Insertion sort is O(nk) when k swaps away	Insertion sort or Timsort

Interview drill: You are given an array of n strings, each of length L, over a fixed alphabet of size R. What is the fastest possible sort, and what is its time complexity?

LSD radix sort: L passes of counting sort with R buckets each. Time: O(L(n + R)). This is O(nL) when R is constant (e.g., R = 26 for lowercase letters). It examines every character of every string exactly once per pass — you cannot sort without at least reading all nL characters, so this is optimal Quicksort: O(nL log n) — compare strings in O(L) each Bucket sort: O(n) expected

Chapter 9: Connections

Linear-time sorting does not exist in a vacuum. Every algorithm in this chapter builds on ideas from earlier chapters and opens doors to later ones.

Where It Connects

Topic	Connection
CLRS Ch 2 (Insertion Sort, Merge Sort)	Insertion sort is the sub-routine inside bucket sort. Merge sort is the gold standard comparison sort that counting/radix/bucket aim to beat
CLRS Ch 6 (Heapsort)	O(n log n) worst-case, O(1) space, but unstable. Cannot be used as radix sort's sub-routine
CLRS Ch 7 (Quicksort)	O(n log n) average, best constants in practice, but O(n²) worst case and unstable. The partition step is used in many linear-time selection and sorting problems
CLRS Ch 9 (Order Statistics)	Finding the k-th smallest element in O(n). Uses the partition idea from quicksort. Does NOT need to fully sort
Hash tables	Counting sort's count array is essentially a hash table for integers. The prefix sum step is a cumulative distribution function
Parallel algorithms	Radix sort parallelizes beautifully: each digit pass is independent across elements. GPU sort implementations (CUB, Thrust) use radix sort as the backbone

The Hierarchy of Sorting Knowledge

Level 1: Know the algorithms

Can implement merge sort, quicksort, counting sort from memory. Knows time/space complexities.

↓

Level 2: Know when to use which

Can analyze data properties (range, distribution, size) and choose the optimal algorithm. Understands stability.

↓

Level 3: Know the lower bounds

Can prove the Ω(n log n) lower bound. Understands what assumptions enable linear-time sorting. Knows the information-theoretic argument.

↓

Level 4: Know the systems

Understands Timsort, introsort, parallel radix sort. Knows how real libraries choose algorithms. Can design a sorting pipeline for production data.

This lesson brought you through levels 1-3. Level 4 comes with experience — writing production code that sorts billions of records, profiling cache misses in radix sort, and understanding why NVIDIA's CUB library uses a 4-bit radix.

What comes next? Chapter 9 (CLRS) covers order statistics: finding the k-th smallest element without fully sorting. The key idea is that selection is fundamentally easier than sorting — O(n) worst case, provably. It reuses the partition step from quicksort and the adversary argument from this chapter's lower bound proof.

"The purpose of computing is insight, not numbers." — Richard Hamming

Final check: What is the single most important idea from this chapter?

Counting sort is the fastest sorting algorithm The speed of sorting depends on what you know about your data. Comparison sorts are Ω(n log n) because they learn only 1 bit per comparison. If you know the value range (counting sort), digit structure (radix sort), or distribution (bucket sort), you can exploit that knowledge to sort in O(n). The right algorithm depends on the right assumptions Linear-time sorts are always better than comparison sorts