Introduction to Algorithms (CLRS) — Chapter 11

Hash Tables

Hashing, chaining, open addressing, perfect hashing — O(1) lookup for everything.

Prerequisites: Arrays + Modular arithmetic. That's it.
10
Chapters
9+
Simulations
5
Interview Dimensions

Chapter 0: The Problem

You are building a spell checker. Your dictionary contains 100,000 words. A user types a word and you need to answer one question: is this word in the dictionary? How fast can you answer?

The simplest approach: store all 100K words in an unsorted array. To check a word, scan every element. That is linear search — O(n) time. For 100K words, that means up to 100,000 comparisons per lookup. If the user types 50 words per minute, your spell checker makes 5 million comparisons per minute. For a single lookup this is already slow. For a database serving millions of queries per second, it is catastrophic.

Can we do better? Sort the array and use binary search: O(log n). For 100K words, that is about 17 comparisons per lookup. Dramatically better. But we can do better still.

What if we could answer in O(1) — constant time, regardless of how many words are in the dictionary? Not 17 comparisons, not 100,000. Just one. That is the promise of a hash table: a data structure that maps any key to an array index using a hash function, then looks up the value at that index directly.

The core idea. A hash table is an array where the index is computed from the key itself. Instead of searching for where a key lives, you calculate where it lives. The function that does this calculation is called a hash function. If two different keys produce the same index (a collision), we need a strategy to handle it. That is the entire subject of this lesson.

The simulation below lets you see the difference. Click each lookup strategy and watch how many steps it takes to find (or not find) a word in a dictionary of varying size.

Dictionary Lookup: Three Strategies

Click "Lookup" to search for a key. Watch the number of comparisons for each strategy.

Dictionary size 50

Notice the pattern: as dictionary size grows, linear search becomes painfully slow, binary search grows gently (logarithmically), and hash table lookup stays flat — just 1 or 2 steps regardless of size. That constant-time lookup is why hash tables are the most widely used data structure in practice.

Let us put some numbers on this. A modern CPU can compare two 64-bit integers in about 1 nanosecond. For a dictionary of n = 1,000,000 keys:

MethodComparisonsTime (approx)
Linear scan1,000,000 (worst case)1 millisecond
Binary search20 (log₂ 10⁶)20 nanoseconds
Hash table1-2 (expected)1-2 nanoseconds

Binary search is already fast — 20 ns is plenty for most applications. But hash tables are 10x faster still. When your database serves 100,000 queries per second, that 10x matters. When your compiler resolves millions of symbol lookups during compilation, it matters even more.

Hash tables power Python dictionaries, JavaScript objects, database indexes, caches, compilers (symbol tables), routers (IP lookup), and virtually every application that needs fast key-value access. Understanding how they work — and when they break — is fundamental.

Here is the roadmap for this lesson. We will build hash tables from the ground up, starting with the simplest possible version (direct addressing), then adding the hash function (Chapter 2), handling collisions with chaining (Chapter 3) and open addressing (Chapter 4), analyzing performance mathematically (Chapter 5), achieving worst-case O(1) with perfect hashing (Chapter 6), surveying real-world implementations (Chapter 7), and finally practicing design and debugging patterns (Chapters 8-9).

Why Collisions Are Inevitable

By the pigeonhole principle, if you have more possible keys than table slots (|U| > m), at least two keys must map to the same slot. But collisions happen much sooner than you might expect. The birthday paradox tells us: in a room of just 23 people, there is a >50% chance two share a birthday (out of 365 days). Similarly, with a hash table of m slots, after inserting only about √(2m) keys, you have a >50% chance of a collision. For m = 1000 slots, that is just ~45 keys — not 1000.

This is not a flaw of hash tables. It is a mathematical certainty. The entire field of hash table design is about gracefully handling collisions, not avoiding them.

The birthday paradox in practice. The birthday paradox explains why UUIDs can collide sooner than you think, why hash collisions in cryptography are easier to find than preimages, and why your hash table with 1000 slots gets its first collision after only ~40 insertions. The math: after inserting k keys into m slots, the probability of at least one collision is approximately 1 - e-k(k-1)/(2m). Setting this to 0.5 and solving: k ≈ √(2m · ln2) ≈ 1.18√m. For m = 365 (birthdays): k ≈ 23. For m = 1000: k ≈ 37. For m = 264 (64-bit hashes): k ≈ 5 × 109 — about 5 billion items before a 50% collision chance.
Concept check: You have a sorted array of 1,000,000 elements. Binary search takes at most how many comparisons? And a hash table?

Chapter 1: Direct-Address Tables

Before we build a hash table, let us start with the simplest possible version. Suppose your keys are integers in the range [0, m-1]. Then you can just use an array of size m and store the value for key k at position T[k]. Done.

This is a direct-address table. Insert, delete, and search are all O(1) — you just compute the index and access it directly. No hash function needed, no collisions possible. It is the dream data structure... with one fatal flaw.

The Space Problem

What if your keys are Social Security numbers (9 digits, range 000000000 to 999999999)? You need an array of 1 billion slots, even if you only store 1,000 people. That is 109 slots for 103 keys — a 99.9999% waste of memory. At 8 bytes per slot, that is 8 GB for a table holding 1,000 entries.

What if your keys are strings? The universe of possible strings is effectively infinite. A 10-character ASCII string has 12810 ≈ 1021 possibilities. You cannot allocate an array of 1021 slots. Even if your keys are short English words (say, up to 20 characters), the universe is impossibly large for direct addressing.

What if your keys are IPv4 addresses? The range is 0 to 232-1, which is about 4 billion. A direct-address table would need 4 billion slots. That is feasible (16 GB at 4 bytes per slot), and in practice some routers do use direct addressing for IP lookups when memory is available. But for IPv6 addresses (128 bits), the universe has 2128 ≈ 3.4 × 1038 possible keys. Direct addressing is impossible.

The fundamental trade-off. Direct addressing gives O(1) everything, but requires space proportional to the universe of possible keys, not the number of keys actually stored. When the universe is much larger than the number of stored keys (which is almost always), we need a way to compress the universe down to a manageable array size. That compression is what a hash function does.

The simulation below shows a direct-address table. Insert a few keys and notice how most slots remain empty. This waste is exactly what hash tables fix.

Direct-Address Table

Enter a key (0-15) and a value, then click Insert. Notice the empty slots.

Operations on Direct-Address Tables

python
class DirectAddressTable:
    def __init__(self, m):
        """Create table with slots 0..m-1, all initially None."""
        self.table = [None] * m   # O(m) space

    def insert(self, key, value):
        self.table[key] = value     # O(1)

    def search(self, key):
        return self.table[key]     # O(1) — returns None if absent

    def delete(self, key):
        self.table[key] = None    # O(1)

Every operation is a single array access. No comparisons, no loops, no searching. The index IS the key. The cost? Space is O(m) where m is the universe size, not the number of stored keys n. When m >> n (which is the common case for real applications), this is unacceptable.

Think about the trade-off as a spectrum. On one end: direct addressing uses space O(|U|) but achieves O(1) time with no collisions. On the other end: a single linked list uses space O(n) but requires O(n) time. Hash tables sit in the sweet spot: space O(n) (or slightly more) and O(1) expected time, at the cost of potential collisions.

OperationTimeSpace
InsertO(1)O(m) where m = universe size
SearchO(1)
DeleteO(1)
When direct addressing works. Direct-address tables are not useless — they are perfect when the universe is small and dense. Counting sort uses a direct-address table. Bitmap sets for small integer domains use direct addressing. If your keys are ASCII characters (0-127), a 128-element array is the best data structure. The key question is always: is m (universe size) close to n (number of stored keys)?

The Bridge to Hash Tables

Direct addressing works when the universe is small. Hash tables work when it is not. The key insight: even if the universe is huge (all possible strings, all possible 64-bit integers), the number of keys you actually store is typically much smaller. A hash function compresses the universe [0, |U|-1] down to [0, m-1] where m is a manageable size proportional to the number of stored keys n.

This compression comes at a cost: two different keys might map to the same slot. That is a collision, and dealing with collisions is the central challenge of hash table design. The next eight chapters are about choosing hash functions that minimize collisions and resolving the collisions that inevitably occur.

Direct Addressing
Universe size m = key range. Array of size m. O(1) everything. Space: O(m). Works when m is small.
↓ add hash function
Hash Table
Table size m « |U|. Hash function h: U → [0,m-1]. O(1) expected. Space: O(n+m). Works for any universe.
Concept check: You need to track which of the 26 lowercase letters appear in a string. Which data structure is most appropriate?

Chapter 2: Hash Functions

A hash table uses an array of size m (much smaller than the universe U) and a hash function h that maps each key k in U to a slot h(k) in [0, m-1]. The value for key k is stored at position T[h(k)].

The problem is obvious: since |U| > m, multiple keys must map to the same slot. When h(k₁) = h(k₂) for k₁ ≠ k₂, we have a collision. A good hash function minimizes collisions by distributing keys as uniformly as possible across all m slots.

The Division Method

The simplest hash function: divide the key by m and take the remainder.

h(k) = k mod m

If keys are strings, first convert to an integer. A common approach: treat each character as a digit in a base-128 number. For the string "cat": c=99, a=97, t=116, so the numeric key is 99×128² + 97×128 + 116 = 1,634,420.

Choosing m wisely. The division method is sensitive to the choice of m. If m is a power of 2 (say 2k), then h(k) = k mod 2k simply extracts the lowest k bits of k. If your keys have a pattern in their lower bits (common with memory addresses, which are always multiples of 4 or 8), you get terrible clustering. Choose m as a prime number not close to a power of 2. For example, if you need ~1000 slots, use m = 997 (prime) rather than m = 1024 (210).

The Multiplication Method

Multiply the key by a constant A (0 < A < 1), extract the fractional part, then scale to [0, m-1].

h(k) = ⌊m · (k · A mod 1)⌋

Here "k · A mod 1" means "take the fractional part of k·A." Let us trace through an example to make this concrete:

// Key k = 123456, table size m = 1024, A = 0.6180339887
k × A = 123456 × 0.6180339887 = 76300.0041...
k × A mod 1 = 0.0041... // fractional part only
m × 0.0041... = 4.2...
⌊4.2⌋ = 4 // h(123456) = 4

The beauty: the value of m does not matter much — you can use any m, even a power of 2. Knuth suggests A ≈ (√5 - 1)/2 ≈ 0.6180339887, the golden ratio conjugate, which distributes keys particularly well.

Why the golden ratio? The three-distance theorem says that the fractional parts {k · A} for k = 1, 2, 3, ... partition [0,1] into at most three distinct gap sizes. For A = the golden ratio, these gaps are as equal as possible — giving the most uniform distribution. This is why the golden ratio appears everywhere in quasi-random sequences, low-discrepancy sampling, and hash functions.

Bit-Shifting Implementation

In practice, the multiplication method is implemented using integer arithmetic for speed. Choose w = word size (32 or 64 bits), and let s = ⌊A · 2w⌋. Then:

h(k) = (k · s) >> (w - p)     where m = 2p

No floating-point needed. The multiply-and-shift extracts the middle bits of the product, which are the most "mixed" bits — they depend on all bits of both k and s. This is why Knuth calls it a "multiply-shift" hash.

Universal Hashing

Both the division and multiplication methods are deterministic. An adversary who knows which hash function you use can craft keys that all collide, degrading your O(1) table to O(n). This is not theoretical — hash collision attacks have been used against web servers to cause denial-of-service.

Universal hashing defends against this by choosing the hash function randomly at runtime from a family of functions. No adversary can choose bad keys without knowing which function was selected.

A universal family for integer keys: pick a prime p ≥ |U|, then choose random a ∈ {1, ..., p-1} and b ∈ {0, ..., p-1}.

ha,b(k) = ((a · k + b) mod p) mod m

For any two distinct keys k₁ ≠ k₂, the probability of collision is at most 1/m — exactly the same as if h were truly random. This is the definition of a universal hash family.

Let us verify this with a concrete example. Let p = 17, m = 5. Choose a = 3, b = 7. Then:

h3,7(k) = ((3k + 7) mod 17) mod 5

h(0) = (7 mod 17) mod 5 = 7 mod 5 = 2
h(1) = (10 mod 17) mod 5 = 10 mod 5 = 0
h(2) = (13 mod 17) mod 5 = 13 mod 5 = 3
h(3) = (16 mod 17) mod 5 = 16 mod 5 = 1
h(4) = (19 mod 17) mod 5 = 2 mod 5 = 2
h(5) = (22 mod 17) mod 5 = 5 mod 5 = 0

Keys 0 and 4 collide (both map to slot 2), and keys 1 and 5 collide (both map to slot 0). But for a different random choice of a and b, the collisions would be different. That is the power of universality: no fixed set of keys can consistently cause collisions across all hash functions in the family.

Hash Function Explorer

Enter numeric keys and see where they land with different hash functions. Watch for collisions (red).

Table size (m) 11
python
import random

def hash_division(k, m):
    return k % m

def hash_multiplication(k, m, A=0.6180339887):
    return int(m * ((k * A) % 1))

class UniversalHash:
    def __init__(self, m, p=104729):
        """Pick random a, b at creation time."""
        self.m = m
        self.p = p  # prime ≥ universe size
        self.a = random.randint(1, p - 1)
        self.b = random.randint(0, p - 1)

    def __call__(self, k):
        return ((self.a * k + self.b) % self.p) % self.m
Hash functions for strings in practice. Real hash functions for strings do not convert the entire string to a huge integer first. They process one character at a time using a rolling formula like: h = h · 31 + char. Java's String.hashCode() uses exactly this with multiplier 31. Python's built-in hash() uses SipHash (a cryptographic function resistant to collision attacks). The principle is the same: spread keys uniformly across the table.

Worked Example: Hashing Three Keys

Let us hash the keys 42, 130, and 77 with table size m = 11 using all three methods.

Division method (h(k) = k mod m):

h(42) = 42 mod 11 = 9
h(130) = 130 mod 11 = 9 // Collision with 42!
h(77) = 77 mod 11 = 0

Multiplication method (h(k) = ⌊m · (k · 0.618 mod 1)⌋):

h(42) = ⌊11 · (42 × 0.618 mod 1)⌋ = ⌊11 × 0.956⌋ = 10
h(130) = ⌊11 · (130 × 0.618 mod 1)⌋ = ⌊11 × 0.34⌋ = 3
h(77) = ⌊11 · (77 × 0.618 mod 1)⌋ = ⌊11 × 0.586⌋ = 6

Notice: the multiplication method spread all three keys to different slots, while the division method caused a collision between 42 and 130. This does not mean multiplication is always better — it depends on the specific key distribution. But it illustrates why having multiple hash function strategies matters.

The Simple Uniform Hashing Assumption

Most theoretical analysis of hash tables assumes simple uniform hashing (SUH): each key is equally likely to hash to any of the m slots, independently of where other keys hash. This is an idealization — no deterministic hash function can achieve this because the distribution depends on the actual key set. But universal hashing comes close: for any fixed set of keys, a randomly chosen universal hash function distributes them nearly uniformly in expectation.

The SUH assumption lets us derive clean expected-case bounds. When your hash function is good (close to uniform), these bounds are accurate in practice. When your hash function is bad (clustering, many collisions), performance degrades. The gap between theory and practice is entirely about hash function quality.

How to Hash Composite Keys

What if your key is not a simple integer but a struct with multiple fields? For example, a (first_name, last_name) pair. The standard approach: combine the hashes of individual fields using a mixing function.

python
# Bad: XOR loses information (hash("AB","CD") == hash("CD","AB"))
def bad_hash(a, b):
    return hash(a) ^ hash(b)  # symmetric — order doesn't matter!

# Good: multiply-and-add preserves order
def good_hash(a, b):
    return hash(a) * 31 + hash(b)  # asymmetric — ("AB","CD") != ("CD","AB")

# Best: use Python's built-in tuple hashing
def best_hash(a, b):
    return hash((a, b))  # Python handles it

The "multiply by a prime and add" pattern is used by Java (31), Boost (multiply by golden ratio and XOR), and many others. The key requirement: the combining function must be order-sensitive (hash(a,b) ≠ hash(b,a)) and avalanching (a small change in any field should change many bits of the output).

Concept check: You are designing a hash table for a web server. An attacker can send arbitrary request keys. Which hash function strategy should you use?

Chapter 3: Collision Resolution — Chaining

Two keys hash to the same slot. Now what? The simplest strategy: store all keys that hash to slot j in a linked list hanging off that slot. This is called chaining (or separate chaining).

How Chaining Works

Insert(k, v): Compute h(k) — this takes O(1). Prepend the (k, v) pair to the head of the linked list at slot h(k). Time: O(1). We prepend rather than append because prepending to a singly-linked list is O(1) (just update the head pointer), while appending is O(nj) (walk to the end).

Search(k): Compute h(k). Walk the linked list at slot h(k), comparing each key to k. If found, return the value. If you reach the end of the list without finding k, the key is not in the table. Time: O(length of the chain at slot h(k)).

Delete(k): Search for k in its chain, then remove the node. With a doubly-linked list, once you have a pointer to the node, removal is O(1) (update the predecessor's next pointer and the successor's previous pointer). Total: O(search time) = O(length of chain). With a singly-linked list, you also need the predecessor, so you track it during the search.

Why prepend, not append? Prepending is O(1) because you just set new_node.next = head and head = new_node. Appending requires traversing to the end of the list, which takes O(nj). Some implementations check for duplicates before inserting (to support update semantics), which requires traversing the chain anyway — in that case, appending costs the same as prepending. But if you know there are no duplicates (or do not care), prepending is always faster.

The Load Factor

Define the load factor as α = n/m, where n is the number of stored keys and m is the table size. The load factor is the average chain length. If you have 100 keys in a table with 50 slots, α = 2, meaning each chain has 2 elements on average.

The load factor is everything. Under the assumption of simple uniform hashing (every key is equally likely to hash to any slot, independent of other keys), the expected time for an unsuccessful search is Θ(1 + α), and the expected time for a successful search is Θ(1 + α/2). Both are O(1) when α = O(1) — that is, when you keep the table size proportional to the number of keys. This is why hash tables resize when the load factor exceeds a threshold (typically 0.75).
Hash Table with Chaining

Insert keys and watch chains grow. Adjust table size to see how load factor affects performance.

Table size (m) 7

Worked Example

Insert keys 5, 28, 19, 15, 20, 33, 12, 17, 10, 0 into a hash table with m = 7 using h(k) = k mod 7.

Key kh(k) = k mod 7Slot
55 mod 7 = 55
2828 mod 7 = 00
1919 mod 7 = 55 (collision with 5)
1515 mod 7 = 11
2020 mod 7 = 66
3333 mod 7 = 55 (collision with 5, 19)
1212 mod 7 = 55 (collision with 5, 19, 33)
1717 mod 7 = 33
1010 mod 7 = 33 (collision with 17)
00 mod 7 = 00 (collision with 28)

Result: slot 0 has chain [0, 28], slot 1 has [15], slot 2 is empty, slot 3 has [10, 17], slot 4 is empty, slot 5 has [12, 33, 19, 5], slot 6 has [20]. Load factor: α = 10/7 ≈ 1.43. The longest chain has 4 elements (slot 5). An unsuccessful search in slot 5 requires 4 comparisons. This is why keeping α ≤ 1 matters.

Proving the Expected Search Time

Why is the expected unsuccessful search time Θ(1 + α)? Here is the argument, step by step.

Under simple uniform hashing, each of the n keys is equally likely to hash to any of the m slots. The expected number of keys in any given slot j is n/m = α. An unsuccessful search for key k first computes h(k) in O(1), then traverses the entire chain at slot h(k). The expected chain length is α. Total: 1 + α.

For a successful search, the analysis is slightly different. We search for a key that is in the table. The expected probes depend on where the key sits in the chain. On average, the key is at position (chain length)/2 from the head. The expected chain length is 1 + (n-1)/m ≈ 1 + α. So the expected successful search examines about α/2 keys past the first, giving Θ(1 + α/2).

The O(1) guarantee. Both 1 + α and 1 + α/2 are O(1) when α = O(1). This means: as long as you keep the table size m proportional to n (the number of stored keys), all operations take expected constant time. When n grows, resize the table to keep α bounded. The cost of resizing is O(n) (rehash all keys), but it happens only every O(n) insertions, so the amortized cost per insertion is still O(1).

Resizing: Amortized Analysis

When α exceeds a threshold (typically 0.75), we double the table size and rehash all n keys. This costs O(n). But how often does it happen?

If the table starts at size m = 1 and we double each time it fills, after inserting 2k elements the table has been resized k times. The total rehashing cost is 1 + 2 + 4 + ... + 2k = 2k+1 - 1 < 2n. So the total cost of n insertions is O(n) for the inserts themselves plus O(n) for all resizing combined — that is O(1) amortized per insertion.

python
class ChainingHashTable:
    def __init__(self, m=7):
        self.m = m
        self.table = [[] for _ in range(m)]
        self.n = 0

    def _hash(self, key):
        return key % self.m

    def insert(self, key, value):
        h = self._hash(key)
        # Check if key exists — update if so
        for i, (k, v) in enumerate(self.table[h]):
            if k == key:
                self.table[h][i] = (key, value)
                return
        self.table[h].append((key, value))
        self.n += 1

    def search(self, key):
        h = self._hash(key)
        for k, v in self.table[h]:
            if k == key:
                return v
        return None

    def delete(self, key):
        h = self._hash(key)
        self.table[h] = [(k, v) for k, v in self.table[h] if k != key]
        self.n -= 1

    def load_factor(self):
        return self.n / self.m
Concept check: A hash table with chaining has m = 10 slots and n = 30 keys. What is the expected number of comparisons for an unsuccessful search (under simple uniform hashing)?

Chapter 4: Open Addressing

Chaining uses linked lists, which means pointer overhead and cache-unfriendly memory access. Open addressing takes a different approach: store all elements directly in the table array. No linked lists. No pointers. When a collision occurs, you probe — try another slot, then another, until you find an empty one.

The probe sequence is the order in which slots are examined. Given a key k and a probe number i (starting from 0), the probe function h(k, i) returns the slot to try on the i-th probe.

Linear Probing

h(k, i) = (h'(k) + i) mod m

Try the home slot h'(k), then the next slot, then the next, wrapping around. Simple and cache-friendly (sequential memory access), but suffers from primary clustering: long runs of occupied slots tend to grow longer, because any key that hashes into the cluster must traverse the entire cluster to find an empty slot, extending it further.

Quadratic Probing

h(k, i) = (h'(k) + c₁i + c₂i²) mod m

Instead of stepping by 1 each time, step by increasing amounts (1, 3, 5, 7... with c₁=0, c₂=1, or similar). This breaks up primary clusters but introduces secondary clustering: keys with the same home slot follow the same probe sequence. Better than linear, but not ideal.

Double Hashing

h(k, i) = (h₁(k) + i · h₂(k)) mod m

Use a second hash function to determine the step size. Different keys with the same home slot will have different step sizes, so their probe sequences diverge immediately. This produces the best distribution among the three methods — close to the theoretical ideal of uniform probing.

The critical rule for h₂. The step size h₂(k) must never be zero (or you loop forever on the first slot). A common choice: h₂(k) = 1 + (k mod (m-1)), which always gives a positive step. Also, h₂(k) must be coprime with m to guarantee that the probe sequence visits every slot. The easiest way: make m prime.

Why Clustering Matters: A Hand Calculation

Consider m = 7, linear probing, and keys arriving in this order: 14, 21, 28, 35, 7, 0.

h'(14) = 14 mod 7 = 0 → slot 0 (empty, placed)
h'(21) = 21 mod 7 = 0 → slot 0 (full), probe slot 1 (empty, placed)
h'(28) = 28 mod 7 = 0 → slots 0,1 full, probe slot 2 (empty, placed)
h'(35) = 35 mod 7 = 0 → slots 0,1,2 full, probe slot 3 (placed) // 4 probes!
h'(7) = 7 mod 7 = 0 → slots 0,1,2,3 full, probe slot 4 (placed) // 5 probes!
h'(0) = 0 mod 7 = 0 → slots 0,1,2,3,4 full, probe slot 5 (placed) // 6 probes!

All six keys hash to slot 0, creating a cluster from slots 0 through 5. The last key needs 6 probes. Now any new key hashing to slots 0-5 must traverse the entire cluster. This is the worst case for linear probing — all keys collide. With a better hash function (one that does not map all these multiples of 7 to the same slot), this would not happen.

With double hashing and h₂(k) = 1 + (k mod 6):

h₂(14) = 1 + 14 mod 6 = 3. Probes: 0, 3, 6, 2, 5, 1 // visits all slots!
h₂(21) = 1 + 21 mod 6 = 4. Probes: 0, 4, 1, 5, 2, 6
h₂(28) = 1 + 28 mod 6 = 5. Probes: 0, 5, 3, 1, 6, 4

Different step sizes mean the probe sequences diverge immediately. Key 28 with step size 5 probes slot 0 (full), then slot 5 (empty) — just 2 probes instead of linear probing's 3. The savings compound as the table fills.

Open Addressing: Three Probing Strategies

Insert keys and watch the probe sequences. Notice how linear probing creates clusters while double hashing spreads keys out.

Worked Example: Linear Probing

Table size m = 11 (prime). Hash function h'(k) = k mod 11. Insert keys: 10, 22, 31, 4, 15, 28, 17, 88, 59.

Keyh'(k)ProbesFinal slot
1010slot 10 (empty)10
220slot 0 (empty)0
319slot 9 (empty)9
44slot 4 (empty)4
154slot 4 (occupied by 4), slot 5 (empty)5
286slot 6 (empty)6
176slot 6 (occupied), slot 7 (empty)7
880slot 0 (occupied), slot 1 (empty)1
594slot 4 (occ), 5 (occ), 6 (occ), 7 (occ), 8 (empty)8

Notice key 59: it hashes to slot 4 but needs 5 probes because of the cluster from slots 4 through 7. This is primary clustering in action. With double hashing, 59 would jump by h₂(59) = 1 + (59 mod 10) = 10 slots per step, landing at slot (4 + 10) mod 11 = 3 on the second probe — directly to an empty slot.

Deletion in Open Addressing

Deletion is tricky. You cannot simply set a slot to "empty" because it might break a probe chain. If key A was placed past slot j because j was occupied, and you later delete the occupant of slot j, a search for A will stop at the now-empty slot j and incorrectly report A as absent.

The solution: use a special DELETED sentinel (also called a tombstone). Search treats DELETED slots as occupied (keep probing). Insert treats them as empty (can place a new key there). This works but degrades performance over time as tombstones accumulate.

This is why chaining is often preferred for tables with frequent deletions. Chaining has no tombstone problem. Open addressing is best when the table is relatively static or deletion is rare — which is the common case for read-heavy workloads like caches and symbol tables.

Searching in Open Addressing

To search for key k, follow the same probe sequence used for insertion: compute h(k,0), h(k,1), h(k,2), ... until you either find k (success) or hit an empty slot (k is not in the table). If you encounter a DELETED tombstone, keep probing — the key might be beyond it.

python
class OpenAddressHashTable:
    DELETED = "__DELETED__"

    def __init__(self, m=11):
        self.m = m
        self.table = [None] * m
        self.n = 0

    def _probe(self, key, i):
        # Linear probing
        return (hash(key) + i) % self.m

    def insert(self, key, value):
        for i in range(self.m):
            slot = self._probe(key, i)
            if self.table[slot] is None or self.table[slot] == self.DELETED:
                self.table[slot] = (key, value)
                self.n += 1
                return
            if self.table[slot][0] == key:
                self.table[slot] = (key, value)  # update
                return
        raise Exception("Table full")

    def search(self, key):
        for i in range(self.m):
            slot = self._probe(key, i)
            if self.table[slot] is None:
                return None  # empty slot = not found
            if self.table[slot] != self.DELETED and self.table[slot][0] == key:
                return self.table[slot][1]
        return None

    def delete(self, key):
        for i in range(self.m):
            slot = self._probe(key, i)
            if self.table[slot] is None:
                return  # not found
            if self.table[slot] != self.DELETED and self.table[slot][0] == key:
                self.table[slot] = self.DELETED  # tombstone
                self.n -= 1
                return

Chaining vs Open Addressing: The Trade-Off

PropertyChainingOpen Addressing
Memory layoutPointer-chasing (bad for cache)Contiguous array (cache-friendly)
Space overheadExtra pointer per entryNo extra pointers
Load factorCan exceed 1.0 (chains grow)Must stay below 1.0
DeletionClean — just remove from listTombstones — accumulate, degrade perf
ClusteringNo clusteringPrimary/secondary clustering possible
Best forUnknown key count, frequent deletesKnown size, read-heavy, low α
The cache argument. On modern CPUs, a cache miss costs ~100 nanoseconds while a cache hit costs ~1 ns. Chaining follows pointers to heap-allocated nodes scattered in memory — almost guaranteed cache misses. Open addressing stores everything in one contiguous array. For small hash tables that fit in L1/L2 cache, open addressing can be 2-5x faster than chaining despite having the same asymptotic complexity. This is why most high-performance hash table implementations (Python dict, Rust HashMap, Google Swiss Table) use open addressing.
Concept check: In a hash table using open addressing with linear probing, you insert keys A, B, C that all hash to slot 5. You then delete B (slot 6). What happens when you search for C?

Chapter 5: Analysis of Hashing

We have seen that hash table performance depends on the load factor α = n/m. But how exactly? In this chapter we derive the expected probe counts and build an interactive graph that reveals the dramatic difference between probing strategies.

Theoretical Expected Probes

Under simple uniform hashing, the expected number of probes for different strategies is:

StrategyUnsuccessful SearchSuccessful Search
Chaining1 + α1 + α/2
Linear Probing½(1 + 1/(1-α)²)½(1 + 1/(1-α))
Double Hashing1/(1-α)-(1/α) ln(1-α)

Let us unpack what these formulas mean with concrete numbers:

αChaining (unsucc)Linear Probing (unsucc)Double Hashing (unsucc)
0.251.251.391.33
0.501.502.502.00
0.751.758.504.00
0.901.9050.5010.00
0.951.95200.5020.00

At 90% full, linear probing needs 50 probes on average for an unsuccessful search. At 95% full, it needs 200. Chaining at 95% needs just 1.95. This is why you must resize before the table gets too full.

The magic threshold: α = 0.75. Most real implementations resize when α exceeds 0.75. At this point, linear probing needs 8.5 probes (acceptable but getting expensive), and double hashing needs 4 (comfortable). After resizing (doubling m), α drops to ~0.375 and probe counts drop to near-ideal levels. Java's HashMap, Python's dict, and C++'s unordered_map all use thresholds between 0.5 and 0.75.
THE PROBE COUNT GRAPH

Drag the vertical line to see expected probe counts at any load factor. Watch how linear probing explodes near α = 1. The red dashed line marks the typical resize threshold of 0.75.

Load factor α 0.50

Deriving the Unsuccessful Search Bound for Double Hashing

Under the uniform hashing assumption (each probe sequence is equally likely to be any permutation of [0, m-1]), the probability that the first probe hits an occupied slot is n/m = α. Given that, the probability the second is also occupied is (n-1)/(m-1) ≤ α. The probability of needing at least i probes is at most αi-1.

The expected number of probes is:

E[probes] = ∑i=0 αi = 1/(1-α)

This is the geometric series. At α = 0.5, we get 2 probes. At α = 0.9, we get 10. At α = 0.99, we get 100. The function has a vertical asymptote at α = 1 (table full), which is why you must never let a hash table fill completely.

Deriving the Successful Search Bound

For a successful search, the expected probes depend on when the key was inserted. A key inserted when the load factor was α' needed 1/(1-α') probes on average. Averaging over all n keys (inserted at load factors 0/m, 1/m, ..., (n-1)/m):

E[probes, successful] = (1/n) ∑i=0n-1 1/(1 - i/m) = (m/n) ∑i=0n-1 1/(m-i)

This sum is a harmonic series fragment. Using the integral approximation ∑ 1/(m-i) ≈ ln(m/(m-n)) = ln(1/(1-α)), we get:

E[probes, successful] ≈ (1/α) ln(1/(1-α))

At α = 0.5: (1/0.5)·ln(2) ≈ 1.39 probes. At α = 0.9: (1/0.9)·ln(10) ≈ 2.56 probes. Even at high load, successful searches are much cheaper than unsuccessful ones because the key was inserted when the table was less full.

Why Linear Probing Is Worse

Linear probing does not satisfy the uniform hashing assumption because its probe sequences overlap. If keys A, B, C hash to consecutive slots 5, 6, 7, they form a cluster. Any future key hashing to slots 3, 4, 5, 6, or 7 must probe through the entire cluster. The cluster grows at its boundaries, and larger clusters grow faster — a positive feedback loop.

Knuth's analysis (1963!) showed that the expected probes for linear probing are:

Unsuccessful: ½(1 + 1/(1-α)²) // much worse than 1/(1-α)
Successful: ½(1 + 1/(1-α))

The unsuccessful search has (1-α)² in the denominator instead of (1-α). At α = 0.9, this gives 50.5 probes vs double hashing's 10. The squared term comes from the clustering effect: you need to find a gap in the cluster, and the expected gap size shrinks quadratically with load factor.

Amortized Cost of Resizing

When do you resize, and what does it cost? Most implementations use a doubling strategy: when α exceeds a threshold (typically 0.75), create a new table of size 2m and rehash all n elements.

The rehashing costs O(n) — you must recompute the hash and insert every element into the new, larger table. But this O(n) cost is spread over the O(n) insertions that filled the table to the threshold. Using the accounting method of amortized analysis:

// Charge each insert $3 instead of $1
// $1 pays for the insert itself
// $2 is "saved" for the eventual resize
// When the table doubles from m to 2m:
// m/2 elements inserted since last resize, each saved $2
// Bank balance = $m, rehashing m elements costs $m
// The bank exactly covers the resize cost!
Amortized insert cost = O(1)

This is the same argument used for dynamic arrays (ArrayList, Python list). Although individual operations can be expensive (O(n) for a resize), the expensive operations are rare enough that the average cost per operation remains O(1).

A visual way to think about it: plot the cumulative cost of n insertions. Without resizing, the line is straight (slope 1). With resizing, the line has occasional jumps (when a resize happens) but the slope is still O(1) — the jumps are bounded by the line 3n.

Shrinking the Table

Most implementations only grow, never shrink. If you insert 1 million keys and then delete 999,999, you still have a table of size ~1.3 million for a single key — massive memory waste. To fix this, some implementations halve the table when α drops below 0.25. But shrinking adds complexity: you must avoid thrashing (repeatedly growing and shrinking at the threshold boundary). The standard solution uses asymmetric thresholds: grow at α = 0.75, shrink at α = 0.25. The gap between 0.25 and 0.75 provides hysteresis that prevents thrashing.

Empirical Validation

Theory assumes uniform hashing. Does it hold in practice? The simulation below inserts N random keys and measures actual probe counts, overlaid on the theoretical curves.

Theory vs Practice

Click "Run Experiment" to insert random keys and measure actual probe counts. Circles are measured values; lines are theoretical predictions.

Concept check: Your hash table uses double hashing with α = 0.8. On average, how many probes does an unsuccessful search take? What if you resize to halve α?

Chapter 6: Perfect Hashing

Everything we have seen so far gives O(1) expected time. Can we guarantee O(1) worst-case? If we know all keys in advance (a static set), the answer is yes. The technique is called perfect hashing.

The Two-Level Scheme

The idea: use two levels of hashing. The first level is an ordinary hash table with chaining. If nj keys hash to slot j, the second level is a small hash table of size nj² with its own hash function — chosen to have zero collisions.

Wait — a table of size nj² for just nj keys? That sounds wasteful. But by the birthday paradox in reverse: if you hash nj keys into nj² slots using a random universal hash function, the expected number of collisions is less than 1/2. So by trying a few random hash functions, you can quickly find one with no collisions at all.

The space miracle. Even though each second-level table uses nj² slots, the total space across all second-level tables is Θ(n). Here is why: the sum of nj² over all j is at most 2n when the first-level hash function is chosen from a universal family. (This is because a universal hash function guarantees that the expected sum of nj² equals n + n(n-1)/m, which is Θ(n) when m = n.) The proof uses the fact that ∑nj² = n + 2 · (number of collisions at first level), and a universal hash function gives expected O(n/m) collisions per pair.

Construction Algorithm

1. Choose first-level hash
Pick h from a universal family. Hash all n keys. Count nj for each slot j. If ∑nj² > 4n, pick a new h. (Expected ≤ 2 tries.)
2. Build second-level tables
For each slot j with nj > 0: create a table of size mj = nj². Try random hash functions from a universal family until one produces zero collisions. Expected ≤ 2 tries per slot.
3. Done
Lookup: compute first-level hash to find the bucket, then second-level hash to find the slot. Two hash computations, zero collisions. O(1) worst-case.
Two-Level Perfect Hashing

Click "Build Perfect Hash" to construct a two-level hash table from the given keys. Notice: NO collisions at the second level.

PropertyChainingOpen AddressingPerfect Hashing
Search (worst)O(n)O(n)O(1)
Search (expected)O(1)O(1)O(1)
InsertO(1)O(1) amortizedO(n) rebuild
SpaceO(n + m)O(m)O(n)
Static or dynamic?DynamicDynamicStatic

Why nj² Slots Suffice

This is one of the most elegant arguments in CLRS. Consider hashing nj keys into a table of size mj = nj² using a universal hash function. The expected number of colliding pairs is:

E[collisions] = C(nj, 2) · (1/mj) = nj(nj-1)/2 · 1/nj² = (nj-1)/(2nj) < 1/2

Since the expected number of collisions is less than 1/2, by Markov's inequality, the probability of any collision is less than 1/2. So with probability at least 1/2, a random universal hash function gives zero collisions. On average, you need only 2 random tries to find a perfect function for each bucket.

The total space across all second-level tables is ∑ nj². Using the identity ∑ nj² = n + 2 · C (where C is the number of colliding pairs at the first level), and the fact that a universal hash gives expected C = n(n-1)/(2m), we get expected total space Θ(n) when m = Θ(n). So the whole structure uses linear space — no waste.

When to use perfect hashing. Perfect hashing is ideal for static sets that are queried frequently: programming language keyword tables (C has 44 keywords — perfect hash them), country code lookups, HTTP status codes, routing tables that change rarely, and read-only databases. The construction cost is O(n) expected, and every lookup after that is guaranteed O(1). The GNU `gperf` tool generates perfect hash functions for static key sets.
Concept check: In a two-level perfect hash, why is the second-level table size nj² instead of nj?

Chapter 7: Hash Tables in Practice

Real-world hash tables are more sophisticated than the textbook versions. Each major language and runtime has made different engineering trade-offs. Let us tour the landscape.

Python dict

Python's dict is one of the most studied hash table implementations in any language, and understanding its design choices is instructive. It uses open addressing with a custom probing strategy. The probe sequence mixes in higher-order bits of the hash to break up clustering:

j = ((5 * j) + 1 + perturb) mod 2k
perturb >>= 5 // perturb starts as the full hash, decays to 0

Table size is always a power of 2 (for fast modulo via bitmask: hash & (size-1) instead of hash % size, which avoids an expensive integer division). Load factor threshold: 2/3 (~0.667), which is lower than Java's 0.75 because open addressing is more sensitive to high load factors than chaining. On resize, the table doubles. Since Python 3.6, dict preserves insertion order using a compact layout: a dense array of (hash, key, value) entries plus a sparse index array of pointers. This reduced memory by 20-25%.

A key performance detail: Python caches the hash value of each key. Computing hash("hello") costs ~50 ns, but comparing two hashes (integers) costs ~1 ns. By storing the hash alongside each key, Python avoids recomputing it during probing and can quickly reject non-matching entries by comparing hashes before comparing keys. This is especially important for string keys, where full comparison is expensive.

Python 3.6 also introduced a compact representation. Instead of one large array of (hash, key, value) triples with empty slots, it uses two arrays: a dense "entries" array (no gaps) and a sparse "indices" array (small integers pointing into entries). This preserves insertion order as a free side effect — iterating the entries array gives items in insertion order.

python
# Conceptual layout of Python dict internals (simplified)
# Sparse index array (1 byte per slot when table small):
indices = [None, 0, None, 2, None, 1, None, None]

# Dense entries array (no gaps, preserves insertion order):
entries = [
    (hash("cat"), "cat", 3),    # entry 0
    (hash("dog"), "dog", 7),    # entry 1
    (hash("bird"), "bird", 1),  # entry 2
]
# To find "dog": compute hash, probe indices array, follow pointer to entries[1]

Java HashMap

Java's HashMap uses chaining, but with a twist: when a chain exceeds 8 elements, it converts from a linked list to a red-black tree (called "treeification"). This caps worst-case search at O(log n) instead of O(n). When the chain shrinks below 6, it converts back to a list. Load factor threshold: 0.75. Table size: power of 2.

Why the threshold of 8? Under random hashing with α = 0.75, the probability of any bucket having 8 or more elements follows a Poisson distribution with λ = 0.75. P(X ≥ 8) ≈ 0.000006 — roughly one in 166,000 buckets. So treeification almost never triggers in normal use. It is a safety net against collision attacks (or pathologically bad hash functions), not a common-case optimization. The choice of 8 balances the overhead of maintaining a tree (more memory per node, more complex code) against the protection it provides.

Java also uses a technique called tree-bin thinning: when a treeified bin shrinks below 6 elements (through deletions), it converts back to a linked list. The gap between 6 (un-treeify) and 8 (treeify) prevents thrashing at the boundary.

Robin Hood Hashing

Robin Hood hashing is a variant of open addressing with a beautiful invariant: when inserting, if the new key has a longer probe distance than the key currently in the slot, swap them. The new key takes the slot, and the displaced key continues probing. This "steals from the rich" (short probe distance) to "give to the poor" (long probe distance).

The result: the variance of probe distances drops dramatically. Instead of a few keys having very long probe chains while most have short ones, all keys have similar (moderate) probe distances. This makes the worst-case much better while keeping the average nearly the same.

Cuckoo Hashing

Cuckoo hashing (named after the cuckoo bird, which lays eggs in other birds' nests) uses two hash functions h₁ and h₂ and two tables T₁ and T₂. Each key is stored in either T₁[h₁(k)] or T₂[h₂(k)]. Lookup checks exactly 2 locations — worst-case O(1).

Insert: try T₁[h₁(k)]. If occupied, evict the occupant and place the new key there. The evicted key goes to its alternative location in T₂. If that is also occupied, evict again. This chain of evictions (the "cuckoo" process) terminates quickly with high probability. If it cycles (after a threshold number of evictions), rehash with new functions.

python
class CuckooHashTable:
    def __init__(self, size=11):
        self.size = size
        self.T1 = [None] * size
        self.T2 = [None] * size
        self.max_evictions = 20

    def h1(self, k): return k % self.size
    def h2(self, k): return (k // self.size) % self.size

    def lookup(self, key):
        if self.T1[self.h1(key)] == key: return True
        if self.T2[self.h2(key)] == key: return True
        return False  # exactly 2 lookups, always

    def insert(self, key):
        for _ in range(self.max_evictions):
            slot = self.h1(key)
            if self.T1[slot] is None:
                self.T1[slot] = key; return
            key, self.T1[slot] = self.T1[slot], key  # evict

            slot = self.h2(key)
            if self.T2[slot] is None:
                self.T2[slot] = key; return
            key, self.T2[slot] = self.T2[slot], key  # evict
        # Too many evictions — need to rehash with new functions
        raise Exception("Cycle detected — rehash needed")

Cuckoo hashing has a strict load factor limit: with two tables each of size m, you can store at most ~0.5m keys (load factor ~0.5) before cycles become frequent. Variants with 3 or 4 hash functions push this to α ≈ 0.9.

The beauty of cuckoo hashing for interviews: lookup is worst-case O(1), which no other dynamic hash table achieves. The cost is paid at insertion time, where the expected amortized cost is O(1) but worst-case can trigger a full rehash. For read-heavy workloads (which describes most real systems — typical read-to-write ratios are 100:1 or higher), this trade-off is excellent.

Bloom Filters

Not a hash table, but closely related: a Bloom filter answers "is x in the set?" using k hash functions and a bit array of m bits. On insert, set bits h₁(x), h₂(x), ..., hk(x) to 1. On query, check if all k bits are 1. False positives are possible (all bits happen to be set by other keys), but false negatives are impossible (if x was inserted, its bits are definitely set).

The false positive probability for n elements, m bits, and k hash functions is approximately:

P(false positive) ≈ (1 - e-kn/m)k

The optimal number of hash functions is k = (m/n) · ln(2). For example, with 10 bits per element (m/n = 10), the optimal k = 7 and the false positive rate is about 0.8%. That is: using just 10 bits per element (1.25 bytes), you can test set membership with 99.2% accuracy. A full hash set would need 8-16 bytes per element.

Bloom filters are used everywhere: databases check a Bloom filter before reading from disk (avoiding expensive I/O for keys that definitely are not stored), CDNs use them to avoid caching one-hit wonders, spell checkers use them as a fast first pass, and Bitcoin nodes use them to request only relevant transactions from peers.

Variants include counting Bloom filters (replace bits with counters to support deletion), cuckoo filters (support deletion with lower false positive rates), and quotient filters (cache-friendly alternative). For interview purposes, remember: Bloom filter = probabilistic set membership, no false negatives, tunable false positive rate via m/n ratio and k hash functions.

python
class BloomFilter:
    def __init__(self, size=1000, num_hashes=7):
        self.bits = [False] * size
        self.size = size
        self.k = num_hashes

    def _hashes(self, item):
        # Generate k hash values using double hashing
        h1 = hash(item) % self.size
        h2 = (hash(item) // self.size + 1) % self.size
        return [(h1 + i * h2) % self.size for i in range(self.k)]

    def add(self, item):
        for h in self._hashes(item):
            self.bits[h] = True

    def might_contain(self, item):
        return all(self.bits[h] for h in self._hashes(item))
Robin Hood Hashing

Insert keys and watch Robin Hood hashing balance probe distances. The bar above each slot shows its probe distance. Rich keys (short distance) get displaced by poor keys (long distance).

ImplementationCollision StrategyResize atWorst LookupSpecial Feature
Python dictOpen (perturbation probing)α = 2/3O(n)Compact layout, insertion order
Java HashMapChaining → treeα = 0.75O(log n)Treeification at chain length 8
C++ unordered_mapChainingα = 1.0O(n)Bucket interface, local iterators
Robin HoodOpen (displacement)α ≈ 0.9O(log n) expectedLow variance probe distance
CuckooTwo tables, evictionα ≈ 0.5O(1)Constant worst-case lookup

Hopscotch Hashing

Hopscotch hashing (Herlihy, Shavit, Tzafrir, 2008) is a hybrid between chaining and open addressing. Each slot has a "neighborhood" of size H (typically 32, matching the word size). A bitmap in each slot records which of the next H positions contain keys that hash to this slot. Lookup examines only the H slots in the neighborhood — a fixed constant, making worst-case lookup O(H) = O(1).

On insert, if the target neighborhood is full, the algorithm displaces an element outside the neighborhood into a further slot, then cascades displacements closer to the target. This is similar to cuckoo hashing but works within a single table. The bounded neighborhood size is particularly valuable for concurrent hash tables: a thread only needs to lock its neighborhood, not the entire table, enabling fine-grained locking and high parallelism.

Hopscotch hashing achieves load factors up to 0.9 with excellent performance, and its bitmap operations are efficiently implementable using SIMD instructions — a single instruction can check all 32 neighborhood slots simultaneously.

The Evolution of Hash Table Design

YearInnovationKey Idea
1953Chaining (Luhn)Linked lists at each slot
1954Open addressing (Amdahl, Elmore, Peterson)Probe within the table itself
1963Linear probing analysis (Knuth)First rigorous analysis of clustering
1979Universal hashing (Carter, Wegman)Randomized defense against adversarial inputs
1984Perfect hashing (FKS)O(1) worst-case for static sets
2001Cuckoo hashing (Pagh, Rodler)Worst-case O(1) lookup for dynamic sets
2005Robin Hood hashing (Celis)Variance reduction via displacement
2008Hopscotch hashingBounded neighborhoods for concurrency
2017Swiss Table (Google)SIMD-parallel probing, metadata bytes
Swiss Table (Google's abseil). Modern C++ code at Google uses "Swiss tables" (flat_hash_map). Key innovations: SIMD-parallel probing (check 16 slots at once using SSE2 instructions), metadata bytes per slot for fast filtering, and tombstone-free deletion via "shift-back" cleanup. It runs 2-3x faster than std::unordered_map in benchmarks. Rust's HashMap also uses this design (hashbrown crate).
Design question: You need a data structure for a read-heavy cache where lookup speed is critical and the key set changes infrequently. Which hash table variant is best?

Chapter 8: Design & Debug

Hash tables appear in nearly every system design interview. This chapter covers the most common design patterns and failure modes.

Design Challenge: LRU Cache

"Design a cache with O(1) get and O(1) put. When the cache exceeds capacity, evict the least recently used item."

This is the classic LRU (Least Recently Used) cache, and the answer is a hash map plus a doubly-linked list. The hash map gives O(1) lookup. The doubly-linked list maintains access order — the most recently used item is at the head, the least recently used is at the tail.

get(key)
Look up key in hash map → O(1). If found, move the node to the head of the linked list (mark as recently used) → O(1). Return the value.
put(key, value)
If key exists: update value, move to head. If key is new: create node, add to head, add to hash map. If over capacity: remove tail node from list and hash map → all O(1).
python
class Node:
    def __init__(self, key, val):
        self.key, self.val = key, val
        self.prev = self.nxt = None

class LRUCache:
    def __init__(self, capacity):
        self.cap = capacity
        self.map = {}  # key -> Node
        # Sentinel nodes: head.nxt = MRU, tail.prev = LRU
        self.head = Node(0, 0)
        self.tail = Node(0, 0)
        self.head.nxt = self.tail
        self.tail.prev = self.head

    def _remove(self, node):
        node.prev.nxt = node.nxt
        node.nxt.prev = node.prev

    def _add_front(self, node):
        node.nxt = self.head.nxt
        node.prev = self.head
        self.head.nxt.prev = node
        self.head.nxt = node

    def get(self, key):
        if key not in self.map:
            return -1
        node = self.map[key]
        self._remove(node)
        self._add_front(node)
        return node.val

    def put(self, key, val):
        if key in self.map:
            self._remove(self.map[key])
        node = Node(key, val)
        self.map[key] = node
        self._add_front(node)
        if len(self.map) > self.cap:
            lru = self.tail.prev
            self._remove(lru)
            del self.map[lru.key]
LRU Cache Simulator

Get and put keys in a capacity-3 LRU cache. Watch the linked list reorder on access. Least recently used is evicted when full.

Design Challenge: Spell Checker

A spell checker needs to answer "is this word valid?" for a dictionary of ~200K words. A hash set (hash table with no values, just keys) gives O(1) lookup. But if memory is tight, a Bloom filter uses only ~10 bits per word (250 KB total) with a 1% false positive rate — meaning it might say "yes" for a misspelled word, but never says "no" for a real word. Use the Bloom filter as a first pass, then check a compact on-disk dictionary for the rare false positives.

Debug Scenario: Sudden Slowdown

"Your web service uses a hash map to cache user sessions. Response times were 2ms but suddenly jumped to 200ms. What happened?"

Systematic diagnosis:

HypothesisHow to checkFix
Load factor too high (no resize)Log n and m, compute αTrigger resize at α > 0.75
Hash collision attackCheck if many keys map to same slotSwitch to SipHash or universal hashing
Broken hash functionCheck distribution of h(k) across slotsReplace hash function, ensure it uses all bits of the key
Tombstone accumulation (open addressing)Count DELETED slots vs totalPeriodic full rehash to eliminate tombstones
Memory thrashingCheck if table exceeds L3 cacheReduce table size, improve data locality

Collision Attacks

In 2011, researchers demonstrated that many web frameworks (PHP, Java, Python, Ruby) were vulnerable to hash collision attacks. By crafting HTTP POST parameters that all hash to the same slot, an attacker could force the server's hash table to degrade to O(n) per operation. A single malicious request with 100K parameters could tie up a CPU core for minutes.

The fix: SipHash, a keyed hash function designed by Jean-Philippe Aumasson and Daniel J. Bernstein. SipHash takes a 128-bit secret key, making it impossible for an attacker to predict which keys collide. Python (since 3.4), Rust, Ruby, and Perl all switched to SipHash for their built-in hash tables.

Debug Scenario: Memory Leak with Hash Tables

"Your Python service's memory grows without bound even though you are deleting entries from a dict."

Common cause: the dict stores objects that reference each other (circular references), and the garbage collector's cycle detector runs less frequently than your insertion rate. Or: you are storing large values and only deleting keys — the dict's internal array does not shrink automatically in most implementations. In CPython, dict.clear() followed by reassigning to a new dict is the only way to reclaim the backing array's memory.

Another subtlety: if you use a hash table with tombstones (open addressing), deleting keys does not actually free the slots. The table stays at its peak size. Periodic "compaction" (creating a new table and re-inserting all live keys) is necessary to truly reclaim space. This is analogous to database vacuuming (PostgreSQL's VACUUM command does exactly this for its hash indexes).

Debug Scenario: Hash Table Slower Than Expected

"Our hash map operations are taking 10x longer than expected. The load factor is only 0.3."

Low load factor should mean fast operations. If it is slow anyway, investigate:

CauseDiagnosisFix
Expensive hash functionProfile: is hash() taking most of the time?Cache hash values, use cheaper hash
Expensive key comparisonProfile: is equals() taking most of the time?Compare hash values first (fast reject)
Table too large for cacheTable size * entry size > L3 cache sizeShrink table, use compact representation
Many tombstonesCount DELETED vs occupied slotsCompact/rebuild the table
Pointer chasing (chaining)Profile cache misses with perf statSwitch to open addressing

Debug Scenario: Non-Deterministic Iteration Order

"Our test suite passes locally but fails in CI. The output order of items changes between runs."

In Python 3.3+, hash randomization is enabled by default (PYTHONHASHSEED is random each run). This means dict iteration order can vary between runs for dicts with non-integer keys. In Python 3.7+, dicts preserve insertion order, but sets do not. If your code depends on iteration order of a set, it will be non-deterministic. Fix: sort the output, or use an OrderedDict/list where order matters.

The hashCode/equals contract. In Java (and similar languages), if two objects are equal (a.equals(b) is true), they must have the same hash code. If you override equals() but not hashCode(), the hash table will store "equal" objects in different slots, leading to duplicate entries and failed lookups. This is the single most common hash table bug in Java code. Always override both or neither.

Design Scenario: Rate Limiter

"Design a rate limiter that allows at most K requests per client per minute."

Use a hash map from client_id to a list of timestamps. On each request, check if the client has made K requests in the last 60 seconds. Implementation:

python
import time
from collections import defaultdict, deque

class RateLimiter:
    def __init__(self, max_requests=10, window_secs=60):
        self.max_requests = max_requests
        self.window = window_secs
        self.clients = defaultdict(deque)  # client_id -> deque of timestamps

    def allow(self, client_id):
        now = time.time()
        q = self.clients[client_id]

        # Evict timestamps older than the window
        while q and now - q[0] > self.window:
            q.popleft()

        if len(q) < self.max_requests:
            q.append(now)
            return True  # allowed
        return False  # rate limited

The hash map gives O(1) lookup per client. The deque gives O(1) timestamp eviction. Total: O(1) amortized per request. This is the sliding window counter pattern — one of the most common uses of hash maps in backend engineering.

Design Scenario: Distributed Session Store

"You have 100 web servers. Users can hit any server. How do you store session data so that any server can access any user's session?"

Option 1: Consistent hashing across servers. hash(session_id) determines which server "owns" the session. On a cache miss, the server computes the consistent hash, routes to the owner, and caches locally. On server failure, only 1/N sessions need to remapped.

Option 2: Centralized hash table (Redis, Memcached). All servers read/write to a shared in-memory store. Simple but introduces a single point of failure and a network hop per access.

Option 3: Replicated hash table. Each server has a full copy. Writes are broadcast to all servers. Good for read-heavy workloads but write propagation adds latency. The interviewer expects you to compare these trade-offs: consistency, availability, partition tolerance (CAP theorem).

Debug question: You implement a hash table in Java. You override equals() for your custom key class to compare by value, but forget to override hashCode(). What happens when you insert two equal keys?

Chapter 9: Interview Arsenal

Cheat Sheet

VariantAvg SearchWorst SearchSpaceBest For
ChainingO(1+α)O(n)O(n+m)General purpose, frequent deletes
Linear ProbingO(1/(1-α))O(n)O(m)Cache-friendly, low α
Double HashingO(1/(1-α))O(n)O(m)Better than linear at high α
Robin HoodO(1/(1-α))O(log n) expO(m)Low-variance reads
CuckooO(1)O(1)O(n)Read-heavy, static-ish sets
PerfectO(1)O(1)O(n)Static sets (keywords, configs)
Bloom FilterO(k)O(k)O(m) bitsMembership test, no false negatives

Common Interview Problems

1. Two Sum (LeetCode #1) — Given an array and a target, find two elements that sum to the target. Brute force O(n²): try every pair. With a hash map: for each element x, check if (target - x) is in the map. If yes, return the pair. If no, add x to the map and continue. Single pass, O(n) time, O(n) space. This is the quintessential "use a hash map" interview problem — if you see "find a complement" or "find a pair," think hash map.

python
def two_sum(nums, target):
    seen = {}  # value -> index
    for i, x in enumerate(nums):
        complement = target - x
        if complement in seen:
            return [seen[complement], i]
        seen[x] = i
    return []

2. Group Anagrams (LeetCode #49) — Group strings that are anagrams of each other. Key insight: two strings are anagrams if and only if they have the same sorted characters. Use tuple(sorted(s)) as the hash map key. This is an example of the "canonical form" pattern: transform each element into a canonical representative, then group by representative using a hash map. The same pattern works for grouping isomorphic strings, grouping equivalent fractions, etc.

python
from collections import defaultdict

def group_anagrams(strs):
    groups = defaultdict(list)
    for s in strs:
        key = tuple(sorted(s))  # O(k log k) per string
        groups[key].append(s)
    return list(groups.values())

3. First Non-Repeating Character (LeetCode #387) — Find the first character in a string that appears exactly once. Two passes: first pass counts frequencies in a hash map, second pass finds the first character with count 1. This is the "frequency counting" pattern — use a hash map to count occurrences, then query the counts. The same technique solves: find majority element (Boyer-Moore is better but hash map works), find top K frequent elements, check if two strings are permutations.

python
def first_unique(s):
    freq = {}
    for c in s:
        freq[c] = freq.get(c, 0) + 1
    for c in s:
        if freq[c] == 1:
            return c
    return None

4. Subarray Sum Equals K — Count subarrays whose elements sum to k. Maintain a running prefix sum. At each index, check if (prefix_sum - k) exists in the hash map — that means there is a subarray ending here that sums to k. O(n) time, O(n) space.

python
def subarray_sum(nums, k):
    count = 0
    prefix = 0
    seen = {0: 1}  # prefix_sum -> count of occurrences
    for x in nums:
        prefix += x
        if prefix - k in seen:
            count += seen[prefix - k]
        seen[prefix] = seen.get(prefix, 0) + 1
    return count

5. Consistent Hashing — For distributed systems: place servers on a ring of size 232 using hash(server_id). To route a key, compute hash(key) and walk clockwise to the first server on the ring. When a server is added or removed, only keys between the new/removed server and its predecessor need to be remapped — roughly n/k keys move (where k is the number of servers) instead of the n keys that would move with naive modular hashing (hash(key) % k changes for every key when k changes). This is how Amazon DynamoDB, Apache Cassandra, and CDN routing work.

To ensure even load distribution, each physical server is represented by multiple virtual nodes at different points on the ring. With 100-200 virtual nodes per server, load imbalance stays below 10%.

6. Hash Table from Scratch — Implement a hash table with insert, search, delete, and automatic resizing. This is a full implementation exercise that tests all concepts from this lesson. The code is in Chapter 3 (chaining version) and the comprehensive version with resize is shown below in the cheat sheet.

7. Longest Substring Without Repeating Characters (LeetCode #3) — Use a hash map (or set) to track characters in the current window. Slide the window rightward, adding characters to the set. When a duplicate is found, slide the left boundary past the previous occurrence. The hash set gives O(1) membership testing at each step, making the algorithm O(n).

python
class HashMap:
    def __init__(self, initial_cap=8):
        self.cap = initial_cap
        self.size = 0
        self.buckets = [[] for _ in range(self.cap)]
        self.LOAD_THRESHOLD = 0.75

    def _hash(self, key):
        return hash(key) % self.cap

    def _resize(self):
        old = self.buckets
        self.cap *= 2
        self.buckets = [[] for _ in range(self.cap)]
        self.size = 0
        for bucket in old:
            for k, v in bucket:
                self.put(k, v)  # rehash into new table

    def put(self, key, value):
        if self.size / self.cap >= self.LOAD_THRESHOLD:
            self._resize()
        h = self._hash(key)
        for i, (k, v) in enumerate(self.buckets[h]):
            if k == key:
                self.buckets[h][i] = (key, value)
                return
        self.buckets[h].append((key, value))
        self.size += 1

    def get(self, key):
        h = self._hash(key)
        for k, v in self.buckets[h]:
            if k == key:
                return v
        raise KeyError(key)

    def remove(self, key):
        h = self._hash(key)
        for i, (k, v) in enumerate(self.buckets[h]):
            if k == key:
                self.buckets[h].pop(i)
                self.size -= 1
                return
        raise KeyError(key)

Hash Table Patterns for Interviews

Hash tables appear in interview problems in five recurring patterns. Recognizing the pattern is 80% of the solution.

PatternWhen to useExample problems
Complement lookup"Find a pair that satisfies a condition"Two Sum, Two Sum II, 4Sum, K-diff pairs
Frequency counting"Count occurrences, find duplicates, find majority"First unique char, top K frequent, valid anagram
Canonical grouping"Group elements by some equivalence"Group anagrams, isomorphic strings
Sliding window + set"Find longest/shortest subarray with distinct elements"Longest substring no repeat, min window substring
Prefix sum + map"Count subarrays with a target sum"Subarray sum equals K, continuous subarray sum

When you see a problem, ask: "Can I reduce this to a lookup?" If yes, a hash map likely helps. If the problem involves ordering (sorted output, range queries, nearest neighbor), a hash map alone will not suffice — consider a BST, heap, or sorted structure.

Implementing LRU Cache Step by Step

The LRU cache is one of the most frequently asked system design questions. Let us walk through the exact thought process an interviewer expects:

Step 1: Identify the operations. get(key) returns the value if key exists, else -1. put(key, value) inserts or updates. Both must be O(1). When capacity is exceeded, evict the least recently used key.

Step 2: Choose data structures. O(1) lookup suggests a hash map. O(1) insertion/removal at arbitrary positions suggests a doubly-linked list. The key insight: combine both. The hash map maps keys to linked list nodes. The linked list maintains recency order.

Step 3: Define invariants. The head of the list is always the most recently used. The tail is the least recently used. Every key in the hash map has a corresponding node in the list, and vice versa. These invariants must hold after every operation.

Step 4: Handle edge cases. What if get() is called on a key at the tail? It must move to the head. What if put() updates an existing key? Move it to the head. What if put() inserts into a full cache? Remove the tail, delete from the hash map, then insert the new key.

This is the pattern: state the operations, choose data structures, define invariants, handle edges. Interviewers evaluate your ability to think systematically, not just write code.

Key Concepts to Nail in Interviews

The five things every interviewer expects you to know: 1. Hash tables give O(1) average-case for insert/search/delete, NOT O(1) worst-case (unless perfect hashing or cuckoo hashing). 2. Collisions are inevitable (pigeonhole principle) — the question is how you resolve them. 3. Load factor determines performance. Resize at α ≈ 0.75. 4. Hash function quality matters. Bad hash = bad distribution = slow table. 5. hashCode/equals contract: equal objects must have equal hash codes.

Complexity Quick Reference

Copy this table to your whiteboard at the start of any hash table interview question:

OperationChaining (avg)Chaining (worst)Open Addr (avg)Open Addr (worst)
InsertO(1)O(n)O(1/(1-α))O(n)
Search (found)O(1 + α/2)O(n)O((1/α)ln(1/(1-α)))O(n)
Search (not found)O(1 + α)O(n)O(1/(1-α))O(n)
DeleteO(1 + α)O(n)Tombstone + searchO(n)
SpaceO(n + m)O(m)

Remember: all "average" bounds assume simple uniform hashing and α = O(1). The O(n) worst cases happen with adversarial inputs or degenerate hash functions.

Connections

TopicRelationship to Hash Tables
Balanced BSTs (Ch 13)O(log n) guaranteed vs hash table's O(1) average. BSTs support ordered operations (min, max, range queries) that hash tables cannot.
Dynamic ProgrammingMemoization uses hash tables to cache subproblem results.
Graph algorithmsAdjacency lists are often hash maps (vertex → neighbor list).
Counting problemsFrequency tables, histogram computation — all hash maps.
String algorithmsRabin-Karp uses rolling hash for substring search. Trie vs hash map trade-offs.
CryptographyCryptographic hash functions (SHA-256) guarantee collision resistance. Hash tables need only uniformity.
Distributed systemsConsistent hashing for load balancing. Rendezvous hashing for server selection.

Recommended Reading

ResourceWhat You Learn
CLRS Chapter 11Rigorous proofs of all bounds. Universal hashing proof. Perfect hashing O(n) space proof.
Knuth TAOCP Vol 3, Ch 6.4The original analysis of linear probing (1963). Still the deepest treatment available.
"A Cool and Practical Alternative to Traditional Hash Tables" (Pagh & Rodler, 2001)The cuckoo hashing paper. Elegant and readable.
Matt Kulukundis, CppCon 2017: "Designing a Fast, Efficient, Cache-friendly Hash Table"The Swiss Table talk. How Google rethought hash tables using SIMD.
Python source code: Objects/dictobject.cHow Python's dict actually works. The perturbation probe sequence. The compact layout.

Common Mistakes in Interviews

MistakeWhy it is wrongWhat to say instead
"Hash tables are O(1)"Only in the expected/average case. Worst case is O(n) for all standard variants."Expected O(1) under simple uniform hashing. Worst case O(n), but pathological only with adversarial inputs or a bad hash function."
"Just use a hash map" (for everything)Hash maps do not support ordered operations: min, max, range queries, predecessor/successor."Hash map for unordered access. BST or sorted array for ordered queries. Know when each is appropriate."
Forgetting to handle hash collisions in custom implementationsWithout collision resolution, keys silently overwrite each other."I will use chaining. Each slot is a linked list. Insert appends; search traverses."
Using mutable objects as hash keysIf the object changes after insertion, its hash changes, and the table loses the entry."Hash keys must be immutable. In Python, use tuples (not lists) as dict keys."

When NOT to Use Hash Tables

Hash tables are not always the answer. Use a different data structure when you need:

NeedBetter ChoiceWhy
Ordered iterationBalanced BST (TreeMap)In-order traversal in O(n)
Range queries ("all keys between 10 and 50")Balanced BST or B-treeHash has no notion of "between"
Find min/maxHeap or BSTO(1) min with heap vs O(n) scan of hash table
Memory-constrained membership testBloom filter~10 bits/element vs ~50+ bytes/element
Worst-case O(log n) guaranteeBalanced BSTHash table worst case is O(n)
Persistent/functional data structureHash Array Mapped Trie (HAMT)Structural sharing for immutable updates
Closing thought. "The most important property of a program is whether it accomplishes the intention of its user." — C.A.R. Hoare. Hash tables are the most practically important data structure because they directly solve the most common programming need: looking things up quickly. Every time you write dict[key] in Python or access a JSON object by key, you are using a hash table. Understanding how they work — the math, the engineering trade-offs, the failure modes — separates someone who uses tools from someone who builds them.
Final question: You need a data structure that supports: (a) O(1) average insert and lookup, (b) ordered iteration by key, and (c) O(log n) find-minimum. Can a single hash table do this?