Arrays, linked lists, stacks, queues, dictionaries, binary search trees, priority queues, and hashing. The building blocks everything else stands on.
You need to store a million phone numbers. Sometimes you need to look one up by name. Sometimes you need to add a new entry. Sometimes you need to find the smallest one. No single arrangement of data is the fastest at everything.
An unsorted list makes insertion trivial -- just tack it onto the end. But searching requires scanning the whole thing. A sorted list makes search fast via binary search, but now every insertion has to shift elements around to maintain order.
This is the fundamental tension of data structure design: optimizing for one operation often makes another expensive. The art is choosing the structure that best matches the operations you actually need.
This chapter surveys the essential data structures: arrays, linked lists, stacks, queues, trees, and hash tables. For each one, we will ask the same question: what operations does it make fast, and what does it make slow?
The two most fundamental ways to store a sequence are contiguous (arrays) and linked (linked lists). Every other data structure is built from one or both of these primitives.
An array stores elements in a contiguous block of memory. Element i lives at a fixed offset from the start, so accessing any element takes O(1) time. But inserting or deleting in the middle requires shifting all subsequent elements -- O(n) work.
A linked list stores each element in a separate node containing data plus a pointer to the next node. Insertion and deletion are O(1) once you have a pointer to the right spot. But finding that spot requires walking the list from the head -- O(n) in the worst case.
| Operation | Array | Linked List |
|---|---|---|
| Access i-th element | O(1) | O(n) |
| Search | O(n) unsorted, O(log n) sorted | O(n) |
| Insert at front | O(n) | O(1) |
| Delete (given pointer) | O(n) | O(1) |
Click operations to see the cost difference. The array is the top row, the linked list is below.
Stacks and queues are containers -- data structures where retrieval order depends on insertion order, not on content. They restrict how you access data, and that restriction is their power.
A stack retrieves by last-in, first-out (LIFO) order. Think of a stack of plates: you always take the top one. The operations are push (add to top) and pop (remove from top). Both take O(1) time.
A queue retrieves by first-in, first-out (FIFO) order. Think of a line at a bank: the first person in line is the first served. The operations are enqueue (add to back) and dequeue (remove from front). Both take O(1) time.
Push values into both, then pop/dequeue them. Notice the LIFO vs FIFO difference.
A dictionary is the workhorse of data structures. It stores items so you can find them by key. The three fundamental operations are search, insert, and delete.
Many real problems reduce to dictionary operations. Remove duplicates from a mailing list? Insert each name, searching first to check if it already exists. Find the most frequent word in a document? Use names as keys and counts as values.
The big question is what data structure implements the dictionary. The table below shows the worst-case costs for two simple implementations:
| Operation | Unsorted Array | Sorted Array |
|---|---|---|
| Search | O(n) | O(log n) |
| Insert | O(1) | O(n) |
| Delete | O(1)* | O(n) |
| Min / Max | O(n) | O(1) |
| Successor | O(n) | O(1) |
*Deletion from an unsorted array is O(1) by a clever trick: swap the deleted element with the last element, then decrement the size counter. No shifting needed.
Neither array implementation gives us fast search AND fast insert simultaneously. Binary search trees and hash tables will fix this, as we'll see next.
A binary search tree (BST) is a rooted binary tree where every node satisfies one rule: all keys in the left subtree are smaller, and all keys in the right subtree are larger. This single invariant makes search, insertion, and deletion all O(h), where h is the tree height.
To search for key k: compare with the root. If k is smaller, recurse left. If larger, recurse right. If equal, you found it. Each comparison eliminates one subtree.
To insert key k: search for k. When you fall off the tree (reach a null pointer), that's where k belongs. Create a new node there.
To find the minimum: follow left pointers until you can't anymore. The leftmost node holds the smallest key. Maximum is the mirror image -- follow right pointers.
Click "Insert Random" to add nodes. Watch the tree grow. Try "Insert Sorted" to see degeneration.
A priority queue is a data structure that supports inserting elements and efficiently extracting the one with the highest (or lowest) priority. It combines the best features of sorted and unsorted structures.
The key operations are:
The most common implementation is the binary heap -- a complete binary tree stored in an array where every parent is smaller than its children (in a min-heap). The heap gives us:
| Operation | Heap | Sorted Array | Unsorted Array |
|---|---|---|---|
| Insert | O(log n) | O(n) | O(1) |
| Extract-Min | O(log n) | O(1) | O(n) |
| Find-Min | O(1) | O(1) | O(n) |
Heaps are the engine behind heapsort (Chapter 4) and Dijkstra's shortest path (Chapter 6). Whenever you need "give me the smallest/largest item quickly," think heap.
What if we could do search, insert, and delete all in O(1) time? Hash tables come remarkably close to this ideal.
A hash function maps each key to an integer index in an array of size m. To insert key k, compute h(k) and store it in slot h(k). To search, compute h(k) and look there. If two keys hash to the same slot, we have a collision.
The two main collision strategies are:
• Chaining: Each slot holds a linked list. Colliding keys go into the same list. Search scans the list at h(k).
• Open addressing: On collision, probe subsequent slots until an empty one is found.
Insert keys and watch them hash to slots. Collisions form chains. Try inserting many keys to see the load increase.
Time to see all these data structures compete head-to-head. This simulation lets you pick an operation and watch how each structure performs it, step by step.
Select an operation to see the cost (number of steps) for each data structure. Green = fast (O(1) or O(log n)), yellow = medium (O(log n)), red = slow (O(n)).
Data structures are not isolated tools -- they are the foundations that make algorithms fast. Here is where each structure appears later in the book:
| Data Structure | Where It Leads |
|---|---|
| Heap / Priority Queue | Chapter 4: Heapsort, Chapter 6: Prim's & Dijkstra's |
| Stack | Chapter 5: DFS (implicit stack), Chapter 7: Backtracking |
| Queue | Chapter 5: BFS |
| Hash Table | Duplicate detection, string matching, caching DP results |
| BST / Balanced Tree | Chapter 6: Kruskal's union-find, sweep-line algorithms |
| Array | Chapter 8: DP tables, Chapter 4: Binary search |