Introduction to Algorithms (CLRS) — Chapter 8

Sorting in Linear Time

Counting sort, radix sort, bucket sort — breaking the O(n log n) barrier.

Prerequisites: Comparison sorting + Big-O. That's it.
10
Chapters
9+
Simulations
5
Interview Dimensions

Chapter 0: The Problem

You have a million exam scores, each between 0 and 100. You need them sorted. You reach for merge sort — O(n log n), guaranteed. For n = 1,000,000 that is about 20 million operations. Not bad. But look at your data more carefully: the values are integers in a tiny range [0, 100]. You have a million numbers but only 101 distinct possible values. Do you really need 20 million comparisons?

No. You could walk the array once, count how many times each score appears, and reconstruct the sorted output in a single pass. That is two passes through the data: O(n). No comparisons needed at all.

This chapter is about a fundamental question in computer science: how fast can we sort? The answer depends on what operations you allow yourself to use.

The Comparison Sorting Wall

Every sorting algorithm you have met so far — merge sort, quicksort, heapsort, insertion sort — works by comparing pairs of elements. "Is a[i] ≤ a[j]?" The algorithm's decisions are entirely driven by the answers to these yes/no questions. We call these comparison sorts.

The stunning result of this chapter: any comparison sort must make at least Ω(n log n) comparisons in the worst case. This is not a statement about a specific algorithm being slow. It is a mathematical proof that no comparison-based algorithm — not one that exists today, not one that will ever be invented — can beat n log n. It is a lower bound on the problem itself.

But we CAN break the barrier. The lower bound only applies to comparison sorts. If we exploit additional information about the data — like knowing the values are integers in a bounded range — we can sort in O(n) time. Counting sort, radix sort, and bucket sort do exactly this. They never compare two elements. They use the values themselves as addresses.

Decision Tree for Sorting 3 Elements

To see why comparison sorts cannot beat n log n, let us visualize what they actually do. Every comparison sort on n elements can be drawn as a decision tree: a binary tree where each internal node is a comparison "a[i] ≤ a[j]?", the left branch is "yes", the right branch is "no", and each leaf is one possible sorted order (a permutation).

For n = 3 elements (call them a, b, c), there are 3! = 6 possible orderings. The tree must have at least 6 leaves. Since a binary tree with L leaves has height at least ⌈log2 L⌉, the tree must have height at least ⌈log2 6⌉ = 3. That means at least 3 comparisons in the worst case.

Decision Tree for Sorting [a, b, c]

Every path from root to leaf is one possible execution of a comparison sort on three elements. Count the leaves: there must be at least 6 (one per permutation). The longest path = worst-case comparisons.

The tree above has exactly 6 leaves (the 6 permutations of three elements) and height 3. That is tight: you can sort 3 elements with exactly 3 comparisons in the worst case. But as n grows, the number of leaves explodes: n! grows much faster than 2n, and the minimum height of the tree grows as Ω(n log n).

The core insight. A comparison sort is an information-gathering machine. Each comparison gives you one bit of information (yes or no). To distinguish between n! possible orderings, you need at least log2(n!) bits. By Stirling's approximation, log2(n!) ≈ n log2 n. Therefore: at least n log n comparisons. No clever algorithm can dodge this — it is information-theoretic.
Quick check: Merge sort runs in O(n log n). The lower bound for comparison sorts is Ω(n log n). What does this tell us about merge sort?

Chapter 1: Decision Tree Lower Bound

In Chapter 0 we saw the intuition: comparison sorts gather one bit per comparison, and they need enough bits to identify one of n! orderings. Now let us make this argument airtight.

The Model

A comparison sort is any sorting algorithm that determines the sorted order only by comparing pairs of input elements. Given elements a1, a2, ..., an, the algorithm may ask questions of the form "is ai ≤ aj?" and branch based on the answer. It may not look at the values of the elements directly — only at the results of comparisons.

We model any such algorithm as a decision tree:

Tree elementMeaning
Internal nodeA comparison ai : aj (is ai ≤ aj?)
Left childBranch taken if ai ≤ aj (yes)
Right childBranch taken if ai > aj (no)
LeafA permutation π that describes the sorted order
Root-to-leaf pathThe sequence of comparisons for one particular input
Height of treeWorst-case number of comparisons

The Theorem

Theorem 8.1 (CLRS): Any comparison sort algorithm requires Ω(n log n) comparisons in the worst case.

Proof. The decision tree must have at least n! leaves (one for each permutation of n elements — each permutation is a possible input ordering, and the algorithm must be able to produce each one as output). A binary tree of height h has at most 2h leaves. Therefore:

2h ≥ n!

Taking logarithms:

h ≥ log2(n!)

By Stirling's approximation, n! ≈ √(2πn) · (n/e)n, so:

// Stirling's approximation
log2(n!) = log2(√(2πn)) + n · log2(n/e)

// The dominant term is n log(n/e) = n log n - n log e
log2(n!) = n log2 n - n log2 e + O(log n)

// Therefore:
h ≥ n log2 n - n log2 e = Ω(n log n)    ■

This is why merge sort, heapsort, and the best quicksort variants all converge to O(n log n). They are optimal within the comparison model. To go faster, we must step outside the model entirely.

A tighter bound for specific n. For n = 3: log2(3!) = log2(6) ≈ 2.585, so h ≥ 3. For n = 4: log2(4!) = log2(24) ≈ 4.585, so h ≥ 5. For n = 10: log2(10!) ≈ 21.8, so at least 22 comparisons. These exact bounds match what the best comparison sorts achieve.
Lower Bound vs Actual Comparisons

See how the theoretical lower bound log2(n!) compares to the actual worst-case comparisons of merge sort (n⌈log n⌉) and insertion sort (n(n-1)/2). Drag the slider to change n.

n 10

What the Lower Bound Does NOT Say

The lower bound says comparison sorts cannot beat Ω(n log n). It does not say:

1. That sorting itself is Ω(n log n). Only comparison-based sorting is. If you know the data is integers in [0, k], counting sort does it in O(n + k).

2. That all inputs require n log n comparisons. Many inputs are easier — insertion sort on nearly-sorted data is O(n). The lower bound is about the worst case.

3. That constant factors do not matter. Quicksort's O(n log n) average case has smaller constants than merge sort's O(n log n) worst case, which is why quicksort is often faster in practice despite a worse worst case.

Check: A new sorting algorithm is proposed that runs in O(n log log n) worst-case time. It only uses pairwise comparisons. Is this possible?

Chapter 2: Counting Sort

Merge sort asks "which of these two elements is smaller?" over and over. But what if you already know where each element belongs — because you know its value, and you know how many elements are smaller? That is counting sort.

The Algorithm

Suppose you have n integers, each in the range [0, k]. Counting sort works in three passes:

Pass 1: Count
Walk the input array A[0..n-1]. For each value v, increment C[v]. After this pass, C[v] = number of elements equal to v.
Pass 2: Prefix Sum
For v = 1 to k: set C[v] = C[v] + C[v-1]. Now C[v] = number of elements ≤ v. This tells you: the last element with value v belongs at position C[v]-1 in the output.
Pass 3: Place
Walk A from RIGHT to LEFT. For each A[i], place it at B[C[A[i]] - 1], then decrement C[A[i]]. Walking right-to-left preserves the original order of equal elements (stability).

Worked Example

Input A = [2, 5, 3, 0, 2, 3, 0, 3], range k = 5.

Pass 1 (count): C = [2, 0, 2, 3, 0, 1]. Value 0 appears twice, value 2 appears twice, value 3 appears three times, value 5 appears once.

Pass 2 (prefix sum): C = [2, 2, 4, 7, 7, 8]. This means: elements ≤ 0 occupy positions 0-1, elements ≤ 2 occupy positions 0-3, elements ≤ 3 occupy positions 0-6, elements ≤ 5 occupy positions 0-7.

Pass 3 (place, right to left):

i (R-to-L)A[i]C[A[i]]Place atOutput B
737 → 6B[6][_, _, _, _, _, _, 3, _]
602 → 1B[1][_, 0, _, _, _, _, 3, _]
536 → 5B[5][_, 0, _, _, _, 3, 3, _]
424 → 3B[3][_, 0, _, 2, _, 3, 3, _]
301 → 0B[0][0, 0, _, 2, _, 3, 3, _]
235 → 4B[4][0, 0, _, 2, 3, 3, 3, _]
158 → 7B[7][0, 0, _, 2, 3, 3, 3, 5]
023 → 2B[2][0, 0, 2, 2, 3, 3, 3, 5]

Result: [0, 0, 2, 2, 3, 3, 3, 5]. Sorted. No comparisons made.

Stability is built in. We walk A right-to-left in Pass 3. This means that among elements with the same value, the one that appeared later in A is placed first (at a higher index), and the one that appeared earlier is placed after it (at a lower index). This preserves original relative order. Stability is critical for radix sort, which uses counting sort as a subroutine.
Counting Sort Step-Through

Watch counting sort build the count array, compute prefix sums, and place elements into the output array. Click Step to advance one operation at a time, or Run to animate the whole thing.

Press Step or Run to begin.

Complexity

// Pass 1: count occurrences
O(n)    one pass through input

// Pass 2: prefix sums
O(k)    one pass through count array

// Pass 3: place elements
O(n)    one pass through input (reversed)

// Total:
O(n + k)    linear when k = O(n)

Space: O(n + k) — the output array B[0..n-1] plus the count array C[0..k].

When is this useful? When k (the range of values) is not much larger than n. Sorting a million exam scores in [0, 100]? k = 100, n = 1,000,000. O(n + k) = O(n). Brilliant. Sorting a million 64-bit integers? k = 264. The count array alone would not fit in any computer's memory. Terrible. For large ranges, we need radix sort.

Code

python
def counting_sort(A, k):
    """Sort array A of integers in [0, k]. Returns new sorted array."""
    n = len(A)
    C = [0] * (k + 1)   # count array
    B = [0] * n           # output array

    # Pass 1: count occurrences
    for val in A:
        C[val] += 1

    # Pass 2: prefix sums
    for i in range(1, k + 1):
        C[i] += C[i - 1]

    # Pass 3: place elements (right-to-left for stability)
    for i in range(n - 1, -1, -1):
        val = A[i]
        C[val] -= 1
        B[C[val]] = val

    return B
Check: You have n = 10,000 integers, each in the range [0, 10,000,000]. Is counting sort a good choice?

Chapter 3: Radix Sort

Counting sort is brilliant when the range k is small. But what if your integers are large — say, 9-digit numbers? k = 999,999,999 and a count array that big is absurd. Radix sort solves this by sorting one digit at a time, using counting sort on each digit position.

The Key Insight: Sort Least Significant Digit First

Your first instinct might be to sort by the most significant digit first, like sorting words in a dictionary (alphabetical from left). But that creates subproblems: after grouping by the first digit, you must recursively sort within each group. That is MSD (Most Significant Digit) radix sort — it works, but it is recursive and complex.

LSD (Least Significant Digit) radix sort is simpler and more elegant. You sort by the last digit first, then the second-to-last, and so on. It sounds backwards, but it works perfectly — as long as each digit-sort is stable.

Why LSD-first works. After sorting by digit position d, all numbers are in correct order with respect to digits 1 through d (least significant through d). When you then sort by digit d+1, stability ensures that numbers with the same digit d+1 stay in their previous relative order — which was correct for digits 1 through d. By induction, after sorting on all d digits, the numbers are fully sorted.

Worked Example

Sort: [329, 457, 657, 839, 436, 720, 355]

Pass 1 — sort by ones digit (d=1):

ValueOnes digit
7200
3555
4366
4577
6577
3299
8399

After pass 1: [720, 355, 436, 457, 657, 329, 839]. Note: 457 before 657 (stable), 329 before 839 (stable).

Pass 2 — sort by tens digit (d=2):

ValueTens digit
7202
3292
4363
8393
3555
4575
6575

After pass 2: [720, 329, 436, 839, 355, 457, 657]. Stability preserved: 720 before 329 (both tens digit 2, and 720 was before 329 after pass 1).

Pass 3 — sort by hundreds digit (d=3):

ValueHundreds digit
3293
3553
4364
4574
6576
7207
8398

After pass 3: [329, 355, 436, 457, 657, 720, 839]. Fully sorted.

Radix Sort Visualization

Watch radix sort process [329, 457, 657, 839, 436, 720, 355] digit by digit. Each pass uses counting sort on one digit position (highlighted). Click Step to advance one pass, or Run to animate all passes.

Press Step or Run to begin.

Complexity

// d = number of digits, k = base (radix)
// Each pass: counting sort on n values in [0, k-1] = O(n + k)
// Total: d passes

T(n) = O(d(n + k))

// For b-bit integers with radix r = 2^t (t bits per digit):
// d = b/t digits, each in [0, 2^t - 1], so k = 2^t
T(n) = O((b/t)(n + 2t))

// Optimal choice: t = log n, giving k = n, so
T(n) = O(bn/log n)

// For 32-bit integers: O(32n/log n) = O(n/log n) × 32
// For n = 1M: ~32M/20 = 1.6M ops (vs ~20M for merge sort)

In practice, radix sort with base 256 (8-bit digits, d = 4 for 32-bit integers) is extremely fast. Four passes of counting sort, each operating on just 256 buckets. Cache-friendly, branch-free, and predictable.

Code

python
def radix_sort(A, base=10):
    """LSD radix sort for non-negative integers."""
    if not A:
        return A
    max_val = max(A)
    exp = 1  # current digit position
    while max_val // exp > 0:
        A = _counting_sort_by_digit(A, exp, base)
        exp *= base
    return A

def _counting_sort_by_digit(A, exp, base):
    """Stable counting sort on the digit at position exp."""
    n = len(A)
    B = [0] * n
    C = [0] * base

    # Count digit occurrences
    for val in A:
        digit = (val // exp) % base
        C[digit] += 1

    # Prefix sums
    for i in range(1, base):
        C[i] += C[i - 1]

    # Place (right-to-left for stability)
    for i in range(n - 1, -1, -1):
        digit = (A[i] // exp) % base
        C[digit] -= 1
        B[C[digit]] = A[i]

    return B
Check: In radix sort, why do we sort by the LEAST significant digit first (not the most significant)?

Chapter 4: Bucket Sort

Counting sort assumes integers in a known range. Radix sort assumes integers (or strings) you can decompose digit-by-digit. What if your data is floating-point numbers uniformly distributed in [0, 1)? Neither counting sort nor radix sort applies cleanly. Enter bucket sort.

The Algorithm

The idea is beautifully simple: if n numbers are uniformly distributed in [0, 1), divide the interval into n equal-sized buckets [0, 1/n), [1/n, 2/n), ..., [(n-1)/n, 1). On average, each bucket gets about one element. Sort each bucket (with insertion sort — efficient for tiny lists), then concatenate.

Step 1: Create n empty buckets
Bucket i covers the interval [i/n, (i+1)/n).
Step 2: Distribute
For each element x, compute bucket index ⌊n · x⌋ and insert x into that bucket.
Step 3: Sort each bucket
Use insertion sort on each bucket (expected O(1) elements per bucket).
Step 4: Concatenate
Walk through buckets 0, 1, ..., n-1 in order. Output all elements from each bucket.

Worked Example

Input A = [0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68], n = 10.

BucketRangeElementsAfter sort
0[0.0, 0.1)--
1[0.1, 0.2)0.17, 0.120.12, 0.17
2[0.2, 0.3)0.26, 0.21, 0.230.21, 0.23, 0.26
3[0.3, 0.4)0.390.39
4-5[0.4, 0.6)--
6[0.6, 0.7)0.680.68
7[0.7, 0.8)0.78, 0.720.72, 0.78
8[0.8, 0.9)--
9[0.9, 1.0)0.940.94

Concatenated: [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94].

Why insertion sort? With uniform distribution, each bucket has O(1) expected elements. Insertion sort on a list of length m is O(m2), but when m is O(1), that is O(1). The total expected time across all buckets is O(n). If the distribution is heavily skewed, some buckets get many elements and the O(m2) hurts — bucket sort degrades to O(n2) worst case.

Expected Time Analysis

// Let n_i = number of elements in bucket i
// Insertion sort on bucket i costs O(n_i^2)
// Total cost = sum_{i=0}^{n-1} O(n_i^2)

E[T(n)] = O(n) + ∑i=0n-1 E[ni2]

// For uniform distribution: E[n_i] = 1, Var[n_i] = (n-1)/n^2
// E[n_i^2] = Var[n_i] + (E[n_i])^2 = (n-1)/n^2 + 1 = 2 - 1/n

E[T(n)] = O(n) + n · (2 - 1/n) = O(n) + O(n) = O(n)

Expected time: O(n) when input is uniformly distributed. Worst case: O(n2) when all elements land in the same bucket.

Bucket Sort Visualization

Watch elements get distributed into buckets, sorted within each bucket, and concatenated. Toggle between uniform and skewed distributions to see how bucket sizes change.

Press Step or Run to begin.

Code

python
def bucket_sort(A):
    """Sort floats in [0, 1) using bucket sort."""
    n = len(A)
    buckets = [[] for _ in range(n)]

    # Distribute into buckets
    for x in A:
        idx = int(n * x)
        if idx == n:  # edge case: x = 1.0
            idx = n - 1
        buckets[idx].append(x)

    # Sort each bucket (insertion sort)
    for b in buckets:
        _insertion_sort(b)

    # Concatenate
    result = []
    for b in buckets:
        result.extend(b)
    return result

def _insertion_sort(A):
    """In-place insertion sort."""
    for i in range(1, len(A)):
        key = A[i]
        j = i - 1
        while j >= 0 and A[j] > key:
            A[j + 1] = A[j]
            j -= 1
        A[j + 1] = key
Check: Your data is 1 million floats, but they are all clustered between 0.50 and 0.51 (not uniformly distributed). How does bucket sort perform?

Chapter 5: The Sorting Arena

Time for the payoff. Below is a head-to-head comparison of every sorting algorithm we have studied: three comparison-based sorts (merge sort, quicksort, heapsort) and three linear-time sorts (counting sort, radix sort, bucket sort). You control the data — its size, its range, and its distribution. Watch the operation counts diverge as you find each algorithm's sweet spot and its weakness.

This is the showcase. Spend time here. Try each data type. Notice when linear-time sorts demolish comparison sorts — and when they do not. The numbers tell the story.
Sorting Arena: Head-to-Head

Configure the data, then click Race to run all algorithms on the same input and compare operation counts. Operations = comparisons (comparison sorts) or array reads+writes (linear sorts).

Size (n)
Data type
Configure, then click Race.

What to Try

ExperimentWhat happensWhy
Integers [0, n], n=1000Counting sort wins by a landslidek = n, so counting sort is O(n). Comparison sorts do ~10,000 ops
Integers [0, n³], n=100Radix sort wins, counting sort chokesk = 106 is too large for counting sort. Radix breaks the number into manageable digits
Uniform floats, n=500Bucket sort winsUniform distribution means ~1 element per bucket: O(n)
Skewed floats, n=500Bucket sort collapses, merge sort winsMost elements land in a few buckets: O(n2) degradation
Nearly sorted, n=1000Insertion sort (inside bucket sort) is fast, but merge sort is consistentAdaptive algorithms exploit existing order

The lesson: there is no universally best sorting algorithm. The right choice depends on your data. Knowing these five factors — range, distribution, data type, stability needs, and memory budget — is what separates a junior programmer from an engineer.

Chapter 6: When to Use Which

A sort algorithm is a tool. Like any tool, the question is not "which is best?" but "which is best for this job?" Here is the decision framework.

Decision Tree

Algorithm Selection Guide

Answer the questions to find the best sort for your situation. Click a path to follow it.

The Complete Comparison

AlgorithmBestAverageWorstSpaceStable?When to use
Merge sortO(n log n)O(n log n)O(n log n)O(n)YesGeneral purpose, guaranteed worst case, need stability
QuicksortO(n log n)O(n log n)O(n2)O(log n)NoGeneral purpose, best average-case constants, in-place preferred
HeapsortO(n log n)O(n log n)O(n log n)O(1)NoStrict O(1) extra space needed, guaranteed worst case
Counting sortO(n+k)O(n+k)O(n+k)O(n+k)YesSmall integer range k = O(n)
Radix sortO(d(n+k))O(d(n+k))O(d(n+k))O(n+k)YesFixed-width integers/strings, many digits, k per digit is small
Bucket sortO(n)O(n)O(n2)O(n)Yes*Uniformly distributed floats in known range

*Bucket sort is stable if the sub-sort is stable (insertion sort is stable).

Real-World Usage

ScenarioBest algorithmWhy
Sort ages of all US residents (0-120)Counting sortk = 120 « n = 330M. O(n) trivially
Sort 10M 32-bit IP addressesRadix sort (base 256)4 passes of counting sort on 256 buckets. Cache-friendly
Sort 1M uniformly random floatsBucket sortUniform distribution, expected O(n)
Sort database rows by nameMerge sortStrings need comparison, need stability for secondary keys
Sort 1000 sensor readings (arbitrary floats)Quicksort (introsort)Small n, in-place, best constants. std::sort in C++
Sort-then-unique on log timestampsMerge sort or TimsortNearly sorted input, stability helps, Timsort exploits runs
The real world uses hybrids. Python's sorted() uses Timsort (merge sort + insertion sort, exploits natural runs). C++'s std::sort uses introsort (quicksort + heapsort fallback). Java's Arrays.sort uses dual-pivot quicksort for primitives, Timsort for objects. These hybrids combine the best properties of multiple algorithms.
Check: You need to sort 50 million 64-bit integers. Memory is plentiful, and you need the result to be stable. Which algorithm?

Chapter 7: Stability

Two students both scored 85 on the midterm. In the original roster, Alice is listed before Bob. After sorting by score, a stable sort keeps Alice before Bob. An unstable sort might swap them.

This sounds like a minor detail. It is not. Stability is the reason radix sort works. It is the reason database sorts compose. It is a property that interviewers ask about because it reveals whether you understand sorting at a deep level.

Formal Definition

A sorting algorithm is stable if, whenever two elements have equal keys, they appear in the output in the same relative order as in the input. Formally: if A[i] = A[j] and i < j, then in the sorted output, A[i] appears before A[j].

Why Stability Matters

1. Multi-key sorting. You want to sort students by grade, then by name within each grade. If you sort by name first, then sort by grade with a stable sort, students with the same grade remain in alphabetical order. Two stable sorts compose into a multi-key sort. This is exactly how radix sort works: sort by LSD, then next digit, etc.

2. Preserving meaningful order. Database rows often have a "natural" order (insertion time, row ID). A stable sort preserves this order among ties, which users expect.

3. Correctness of radix sort. If the digit-level sort is unstable, radix sort breaks. The LSD-first strategy relies on stability to preserve the order established by previous digit passes.

Radix sort's dependency on stability. Consider sorting [21, 12, 22, 11] with radix sort (base 10). Pass 1 (ones digit): [21, 11, 12, 22]. Now sort by tens digit. 21 and 22 both have tens digit 2. A stable sort puts 21 before 22 (their order from pass 1). An unstable sort might produce [..., 22, 21, ...] — WRONG. Stability is not a nice-to-have. It is a correctness requirement for radix sort.

Stability Classification

StableUnstable
Counting sortQuicksort
Merge sortHeapsort
Insertion sortSelection sort
Bubble sortShellsort
Radix sort (uses stable sub-sort)Introsort (std::sort)
Timsort (Python sorted())Tree sort (BST-based)
Stability Visualizer

Elements with the same key have different colors. A stable sort preserves the left-to-right color order among equal keys. An unstable sort may scramble them. Compare the two side by side.

Click Run Both to sort with merge sort (stable) and quicksort (unstable).

Making an Unstable Sort Stable

Any sort can be made stable by appending the original index as a tiebreaker. If a[i] = a[j] and i < j, compare as (a[i], i) < (a[j], j). This always breaks ties in favor of the earlier element. The cost: O(n) extra space for the index array, and slightly larger comparisons.

python
def stable_sort(A):
    """Make any sort stable by using (value, index) pairs."""
    decorated = [(val, i) for i, val in enumerate(A)]
    decorated.sort()  # Python's sort compares tuples lexicographically
    return [val for val, _ in decorated]
Check: You sort a list of (name, score) pairs by score using heapsort. Then you sort the result by name using heapsort again. Are students with the same name in order by score?

Chapter 8: Interview Arsenal

This is your reference chapter. Cheat sheets, coding drills, and the patterns that interviewers love to test.

The Master Cheat Sheet

AlgorithmTime (avg)Time (worst)SpaceStableIn-placeComparison?
InsertionO(n2)O(n2)O(1)YesYesYes
MergeO(n log n)O(n log n)O(n)YesNoYes
QuickO(n log n)O(n2)O(log n)NoYesYes
HeapO(n log n)O(n log n)O(1)NoYesYes
CountingO(n+k)O(n+k)O(n+k)YesNoNo
RadixO(d(n+k))O(d(n+k))O(n+k)YesNoNo
BucketO(n)O(n2)O(n)Yes*NoHybrid

Coding Drill 1: Counting Sort

Implement counting sort from memory. This is a common interview question because it tests whether you understand prefix sums and stability.

python
# Drill: implement from scratch in 2 minutes
def counting_sort(A, k):
    C = [0] * (k + 1)
    B = [0] * len(A)

    for x in A:             # count
        C[x] += 1

    for i in range(1, k+1): # prefix sum
        C[i] += C[i-1]

    for i in range(len(A)-1, -1, -1):  # place (R-to-L)
        C[A[i]] -= 1
        B[C[A[i]]] = A[i]

    return B

Coding Drill 2: Sort by Frequency

Given an array, sort elements by frequency (most frequent first). If two elements have the same frequency, maintain their order of first appearance. This combines counting and stability.

python
from collections import Counter

def sort_by_frequency(A):
    """Sort by frequency (descending). Stable for ties."""
    freq = Counter(A)
    # Decorate-sort-undecorate
    # Key: (-frequency, first_occurrence_index)
    first_seen = {}
    for i, x in enumerate(A):
        if x not in first_seen:
            first_seen[x] = i

    return sorted(A, key=lambda x: (-freq[x], first_seen[x]))

# Example: [1,1,2,2,2,3] -> [2,2,2,1,1,3]

Coding Drill 3: Top K Frequent Elements

Given n elements, find the k most frequent. Bucket sort approach: O(n) time.

python
from collections import Counter

def top_k_frequent(A, k):
    """O(n) using bucket sort by frequency."""
    freq = Counter(A)
    n = len(A)

    # Buckets: index = frequency, value = list of elements
    buckets = [[] for _ in range(n + 1)]
    for val, count in freq.items():
        buckets[count].append(val)

    # Collect from highest frequency down
    result = []
    for i in range(n, 0, -1):
        for val in buckets[i]:
            result.append(val)
            if len(result) == k:
                return result
    return result

# Example: top_k_frequent([1,1,1,2,2,3], 2) -> [1, 2]

Coding Drill 4: Radix Sort for Strings

Radix sort is not just for integers. You can sort fixed-length strings by applying counting sort to each character position, from right to left (LSD).

python
def radix_sort_strings(A, strlen):
    """LSD radix sort for fixed-length ASCII strings."""
    R = 256  # alphabet size (ASCII)
    for d in range(strlen - 1, -1, -1):
        # Counting sort by character at position d
        C = [0] * R
        B = [""] * len(A)

        for s in A:
            C[ord(s[d])] += 1
        for i in range(1, R):
            C[i] += C[i-1]
        for i in range(len(A)-1, -1, -1):
            ch = ord(A[i][d])
            C[ch] -= 1
            B[C[ch]] = A[i]
        A = B

    return A

Common Interview Questions

QuestionKey ideaAlgorithm
Sort an array where values are in [0, k]Small range: count occurrencesCounting sort O(n+k)
Sort 1M 32-bit integersBreak into 4 bytes, sort eachRadix sort O(n)
Find k most frequent elementsCount then bucket by frequencyCounting + bucket O(n)
Sort colors (Dutch national flag)3-way partitionO(n) one-pass partitioning
Can we sort faster than n log n?Only with non-comparison methodsDecision tree lower bound
Sort linked listMerge sort (no random access needed)Merge sort O(n log n)
Sort nearly-sorted arrayInsertion sort is O(nk) when k swaps awayInsertion sort or Timsort
Interview drill: You are given an array of n strings, each of length L, over a fixed alphabet of size R. What is the fastest possible sort, and what is its time complexity?

Chapter 9: Connections

Linear-time sorting does not exist in a vacuum. Every algorithm in this chapter builds on ideas from earlier chapters and opens doors to later ones.

Where It Connects

TopicConnection
CLRS Ch 2 (Insertion Sort, Merge Sort)Insertion sort is the sub-routine inside bucket sort. Merge sort is the gold standard comparison sort that counting/radix/bucket aim to beat
CLRS Ch 6 (Heapsort)O(n log n) worst-case, O(1) space, but unstable. Cannot be used as radix sort's sub-routine
CLRS Ch 7 (Quicksort)O(n log n) average, best constants in practice, but O(n2) worst case and unstable. The partition step is used in many linear-time selection and sorting problems
CLRS Ch 9 (Order Statistics)Finding the k-th smallest element in O(n). Uses the partition idea from quicksort. Does NOT need to fully sort
Hash tablesCounting sort's count array is essentially a hash table for integers. The prefix sum step is a cumulative distribution function
Parallel algorithmsRadix sort parallelizes beautifully: each digit pass is independent across elements. GPU sort implementations (CUB, Thrust) use radix sort as the backbone

The Hierarchy of Sorting Knowledge

Level 1: Know the algorithms
Can implement merge sort, quicksort, counting sort from memory. Knows time/space complexities.
Level 2: Know when to use which
Can analyze data properties (range, distribution, size) and choose the optimal algorithm. Understands stability.
Level 3: Know the lower bounds
Can prove the Ω(n log n) lower bound. Understands what assumptions enable linear-time sorting. Knows the information-theoretic argument.
Level 4: Know the systems
Understands Timsort, introsort, parallel radix sort. Knows how real libraries choose algorithms. Can design a sorting pipeline for production data.

This lesson brought you through levels 1-3. Level 4 comes with experience — writing production code that sorts billions of records, profiling cache misses in radix sort, and understanding why NVIDIA's CUB library uses a 4-bit radix.

What comes next? Chapter 9 (CLRS) covers order statistics: finding the k-th smallest element without fully sorting. The key idea is that selection is fundamentally easier than sorting — O(n) worst case, provably. It reuses the partition step from quicksort and the adversary argument from this chapter's lower bound proof.

"The purpose of computing is insight, not numbers." — Richard Hamming

Final check: What is the single most important idea from this chapter?