Data Structures — Skiena, Chapter 3

Chapter 0: The Trade-off

You need to store a million phone numbers. Sometimes you need to look one up by name. Sometimes you need to add a new entry. Sometimes you need to find the smallest one. No single arrangement of data is the fastest at everything.

An unsorted list makes insertion trivial -- just tack it onto the end. But searching requires scanning the whole thing. A sorted list makes search fast via binary search, but now every insertion has to shift elements around to maintain order.

This is the fundamental tension of data structure design: optimizing for one operation often makes another expensive. The art is choosing the structure that best matches the operations you actually need.

The core insight: There is no single "best" data structure. The right choice depends entirely on which operations you perform most frequently. Data structure design is about balancing trade-offs, not achieving perfection.

This chapter surveys the essential data structures: arrays, linked lists, stacks, queues, trees, and hash tables. For each one, we will ask the same question: what operations does it make fast, and what does it make slow?

Why can't a single data structure be the fastest at every operation?

Because computers are too slow Because organizing data to speed up one operation often slows down another Because programming languages are limited

Chapter 1: Arrays vs Linked Lists

The two most fundamental ways to store a sequence are contiguous (arrays) and linked (linked lists). Every other data structure is built from one or both of these primitives.

An array stores elements in a contiguous block of memory. Element i lives at a fixed offset from the start, so accessing any element takes O(1) time. But inserting or deleting in the middle requires shifting all subsequent elements -- O(n) work.

A linked list stores each element in a separate node containing data plus a pointer to the next node. Insertion and deletion are O(1) once you have a pointer to the right spot. But finding that spot requires walking the list from the head -- O(n) in the worst case.

The key trade-off:
• Arrays win at random access and memory efficiency (no pointer overhead). They have better cache locality because elements sit next to each other in memory.
• Linked lists win at dynamic resizing and fast insertion/deletion. They never overflow unless memory itself is full.

Operation	Array	Linked List
Access i-th element	O(1)	O(n)
Search	O(n) unsorted, O(log n) sorted	O(n)
Insert at front	O(n)	O(1)
Delete (given pointer)	O(n)	O(1)

Array vs Linked List Operations

Click operations to see the cost difference. The array is the top row, the linked list is below.

You need to frequently insert elements at the front of a sequence and rarely need random access. Which structure is better?

Array (O(1) access) Linked list (O(1) insertion at front, and you rarely need random access) Neither, both are equally good

Chapter 2: Stacks & Queues

Stacks and queues are containers -- data structures where retrieval order depends on insertion order, not on content. They restrict how you access data, and that restriction is their power.

A stack retrieves by last-in, first-out (LIFO) order. Think of a stack of plates: you always take the top one. The operations are push (add to top) and pop (remove from top). Both take O(1) time.

A queue retrieves by first-in, first-out (FIFO) order. Think of a line at a bank: the first person in line is the first served. The operations are enqueue (add to back) and dequeue (remove from front). Both take O(1) time.

When to use which: Stacks appear naturally in recursion (the call stack), expression evaluation, and undo operations. Queues appear in breadth-first search, job scheduling, and any situation where fairness (first-come, first-served) matters. If retrieval order doesn't matter, prefer a stack -- it's simpler.

Stack vs Queue

Push values into both, then pop/dequeue them. Notice the LIFO vs FIFO difference.

You push A, B, C onto a stack, then pop twice. What do you get (in order)?

A, B C, B (last in, first out) B, C

Chapter 3: Dictionaries

A dictionary is the workhorse of data structures. It stores items so you can find them by key. The three fundamental operations are search, insert, and delete.

Many real problems reduce to dictionary operations. Remove duplicates from a mailing list? Insert each name, searching first to check if it already exists. Find the most frequent word in a document? Use names as keys and counts as values.

The big question is what data structure implements the dictionary. The table below shows the worst-case costs for two simple implementations:

Operation	Unsorted Array	Sorted Array
Search	O(n)	O(log n)
Insert	O(1)	O(n)
Delete	O(1)*	O(n)
Min / Max	O(n)	O(1)
Successor	O(n)	O(1)

*Deletion from an unsorted array is O(1) by a clever trick: swap the deleted element with the last element, then decrement the size counter. No shifting needed.

Skiena's take-home lesson: Data structure design must balance all the different operations it supports. The fastest data structure for both operations A and B may well not be the fastest for either A or B alone.

Neither array implementation gives us fast search AND fast insert simultaneously. Binary search trees and hash tables will fix this, as we'll see next.

An unsorted array supports O(1) insertion and O(1) deletion (via swap trick). What's the catch?

Search takes O(n) because the array is unsorted It uses too much memory There is no catch

Chapter 4: Binary Search Trees

A binary search tree (BST) is a rooted binary tree where every node satisfies one rule: all keys in the left subtree are smaller, and all keys in the right subtree are larger. This single invariant makes search, insertion, and deletion all O(h), where h is the tree height.

To search for key k: compare with the root. If k is smaller, recurse left. If larger, recurse right. If equal, you found it. Each comparison eliminates one subtree.

To insert key k: search for k. When you fall off the tree (reach a null pointer), that's where k belongs. Create a new node there.

To find the minimum: follow left pointers until you can't anymore. The leftmost node holds the smallest key. Maximum is the mirror image -- follow right pointers.

The height matters: All BST operations take O(h) time, where h is the tree's height. A balanced tree on n nodes has h = O(log n), giving us O(log n) search, insert, and delete. But a tree built by inserting sorted data degenerates into a linked list with h = n. Balanced tree variants (AVL trees, red-black trees) add rotations to guarantee O(log n) height.

BST Builder

Click "Insert Random" to add nodes. Watch the tree grow. Try "Insert Sorted" to see degeneration.

What is the worst-case height of a BST with n nodes?

n (degenerates to a linked list when keys are inserted in sorted order) log n n/2

Chapter 5: Priority Queues

A priority queue is a data structure that supports inserting elements and efficiently extracting the one with the highest (or lowest) priority. It combines the best features of sorted and unsorted structures.

The key operations are:

Insert(Q, x)

Add element x to the priority queue.

↓

Extract-Min(Q)

Remove and return the element with the smallest key.

The most common implementation is the binary heap -- a complete binary tree stored in an array where every parent is smaller than its children (in a min-heap). The heap gives us:

Operation	Heap	Sorted Array	Unsorted Array
Insert	O(log n)	O(n)	O(1)
Extract-Min	O(log n)	O(1)	O(n)
Find-Min	O(1)	O(1)	O(n)

The heap trick: A heap stores the array so that element at position k has children at positions 2k and 2k+1, and its parent at position ⌊k/2⌋. No pointers needed! The tree is implicit in the array indices. This means a heap uses half the memory of a pointer-based tree.

Heaps are the engine behind heapsort (Chapter 4) and Dijkstra's shortest path (Chapter 6). Whenever you need "give me the smallest/largest item quickly," think heap.

In a min-heap stored as an array, where is the minimum element?

Always at position 1 (the root) At the last position Could be anywhere

Chapter 6: Hashing

What if we could do search, insert, and delete all in O(1) time? Hash tables come remarkably close to this ideal.

A hash function maps each key to an integer index in an array of size m. To insert key k, compute h(k) and store it in slot h(k). To search, compute h(k) and look there. If two keys hash to the same slot, we have a collision.

The two main collision strategies are:

• Chaining: Each slot holds a linked list. Colliding keys go into the same list. Search scans the list at h(k).
• Open addressing: On collision, probe subsequent slots until an empty one is found.

Expected O(1), worst case O(n): With a good hash function and a table that isn't too full, each slot holds about n/m items on average (called the load factor). With chaining and load factor ≤ 1, expected search time is O(1). But all n keys could hash to the same slot, giving O(n) worst case. A good hash function makes this astronomically unlikely.

Hash Table Visualizer

Insert keys and watch them hash to slots. Collisions form chains. Try inserting many keys to see the load increase.

Load: 0/7

A hash table with chaining has 10 slots and 30 keys. What is the expected chain length?

10 3 (load factor = 30/10 = 3, so each chain has ~3 elements on average) 1

Chapter 7: The Operations Explorer

Time to see all these data structures compete head-to-head. This simulation lets you pick an operation and watch how each structure performs it, step by step.

Data Structure Comparison

Select an operation to see the cost (number of steps) for each data structure. Green = fast (O(1) or O(log n)), yellow = medium (O(log n)), red = slow (O(n)).

n 100

Try this: Set n to 1000 and switch between operations. Notice how no single structure is green for everything. The hash table is fastest for search/insert/delete but can't do min or successor efficiently. The sorted array is great for min/successor/search but terrible at insert. The BST (balanced) is the best all-rounder -- O(log n) for everything.

Which data structure provides O(log n) performance for ALL five dictionary operations?

Hash table Sorted array Balanced BST (all operations take O(log n) or better)

Chapter 8: Connections

Data structures are not isolated tools -- they are the foundations that make algorithms fast. Here is where each structure appears later in the book:

Data Structure	Where It Leads
Heap / Priority Queue	Chapter 4: Heapsort, Chapter 6: Prim's & Dijkstra's
Stack	Chapter 5: DFS (implicit stack), Chapter 7: Backtracking
Queue	Chapter 5: BFS
Hash Table	Duplicate detection, string matching, caching DP results
BST / Balanced Tree	Chapter 6: Kruskal's union-find, sweep-line algorithms
Array	Chapter 8: DP tables, Chapter 4: Binary search

Skiena's take-home lessons from Chapter 3:
• Picking the wrong data structure can make an otherwise efficient algorithm useless. The right structure for the job matters enormously.
• Stacks (LIFO) and queues (FIFO) are simple but arise constantly in graph algorithms and recursion.
• Binary search trees give O(log n) dictionary operations -- but only when balanced. Sorted input destroys them.
• Hash tables give expected O(1) for search/insert/delete, but they cannot support min, max, or ordered traversal.
• When you need both fast search AND ordered operations, a balanced BST is the answer.

You need to repeatedly extract the minimum element from a dynamic collection. Which data structure is best?

Min-heap / priority queue (O(log n) extract-min, O(1) find-min) Hash table Unsorted array