Skiena, Chapter 7

Combinatorial Search

Backtracking, pruning, Sudoku, and heuristic methods. How to systematically explore vast solution spaces and how to give up gracefully with simulated annealing.

Prerequisites: Chapters 1-5 (recursion, DFS). That's it.
10
Chapters
5+
Simulations
10
Quizzes

Chapter 0: The Combinatorial Explosion

A modern computer performs billions of operations per second. That sounds like a lot, until you count how many configurations exist for even modest problems.

Problemn = 10n = 20n = 30
All subsets (2n)1,024~1 million~1 billion
All permutations (n!)3.6 million~1018~1032

One million items means all arrangements of about 10-11 objects, but not more. One million subsets means all combinations of about 20 items, but not more. For anything significantly larger, we need to be smart about what we search.

The two strategies: When the problem is small enough, backtracking systematically enumerates all solutions with clever pruning to skip dead ends. When it's too large for exact solutions, heuristic methods like simulated annealing find good-enough answers fast. This chapter teaches both.

The key insight: you almost never need to look at every configuration. With careful pruning, you can eliminate vast swaths of the search space at once, reducing impossible searches to seconds.

How many permutations of 15 objects exist?

Chapter 1: Backtracking

Backtracking is a systematic method for iterating through all possible configurations of a search space. It builds solutions incrementally, one component at a time, and abandons a partial solution as soon as it determines that it cannot lead to a valid complete solution.

We model solutions as vectors a = (a1, a2, ..., an), where each ai is chosen from a set Si of candidates. At each step, we extend the partial solution by one element. If we reach a complete solution, we process it. If the partial solution cannot possibly lead to a valid solution, we backtrack -- undo the last choice and try the next candidate.

python
def backtrack(a, k, input):
    if is_a_solution(a, k, input):
        process_solution(a, k, input)
    else:
        k += 1
        candidates = construct_candidates(a, k, input)
        for c in candidates:
            a[k] = c
            make_move(a, k, input)
            backtrack(a, k, input)
            unmake_move(a, k, input)  # clean up
Backtracking is DFS on an implicit graph: The solution space forms a tree where each node is a partial solution and each edge represents adding one element. Backtracking is depth-first search on this tree. DFS is preferred over BFS because it uses space proportional to the tree height (the solution length), not the tree width (which grows exponentially).

The framework is general. Five application-specific functions plug in to customize it: is_a_solution (are we done?), construct_candidates (what choices exist?), process_solution (output the result), make_move and unmake_move (update/restore data structures).

Why is depth-first search preferred over breadth-first for backtracking?

Chapter 2: Constructing All Subsets

How do we generate all 2n subsets of {1, ..., n}? Set up a vector of n cells. The value of ai (true or false) says whether item i is in the subset. Each element doubles the possibilities, giving exactly 2n subsets.

python
def all_subsets(n):
    def bt(a, k):
        if k == n:
            print([i+1 for i in range(n) if a[i]])
        else:
            for val in [True, False]:
                a[k] = val
                bt(a, k + 1)
    bt([None] * n, 0)

Since True always comes before False, the full set is generated first and the empty set last. For {1, 2, 3}:

{1,2,3}, {1,2}, {1,3}, {1}, {2,3}, {2}, {3}, {}

The tree has 2n leaves. At each level, we branch into two choices (include or exclude item i). The tree has depth n and 2n leaves -- one for each subset. Every internal node is visited at most twice (once going down, once coming back up).
Subset Generation Tree

Click "Step" to advance through the subset tree for {1,2,3}. Left branch = include, right = exclude. Warm = current path.

Click Step to generate subsets

This pattern generalizes beyond subsets. The backtracking framework generates any combinatorial object by changing only the construct_candidates function. For subsets: 2 candidates (true/false). For permutations: all unused elements. For graph paths: all unvisited neighbors. Same framework, different candidates.

In the subset-generation backtracking tree, what does each level represent?

Chapter 3: Constructing All Permutations

How many permutations of {1, ..., n}? There are n choices for the first position, n-1 for the second, n-2 for the third, and so on. Total: n! = n × (n-1) × ... × 1.

The candidates for position k are the elements not yet used in positions 1 through k-1. We track used elements with a boolean array for O(1) lookup per candidate.

python
def all_permutations(n):
    def bt(a, k, used):
        if k == n:
            print(a[:])
        else:
            for i in range(1, n + 1):
                if not used[i]:
                    a[k] = i
                    used[i] = True
                    bt(a, k + 1, used)
                    used[i] = False
    bt([0] * n, 0, [False] * (n + 1))

This generates permutations in lexicographic order: 123, 132, 213, 231, 312, 321. The tree has n! leaves and depth n.

Subsets vs Permutations: Subsets have 2 candidates per position (include/exclude), giving a binary tree with 2n leaves. Permutations have a decreasing number of candidates (n, n-1, ..., 1), giving a tree with n! leaves. Both use the same backtracking framework -- only construct_candidates differs.
What are the candidates for position k when generating permutations?

Chapter 4: Search Pruning

Backtracking guarantees correctness by enumerating all possibilities. But most of those possibilities are garbage. Pruning means cutting off branches of the search tree that cannot lead to valid or optimal solutions.

Consider finding the optimal traveling salesman tour. We could generate all n! permutations and check each tour's cost. But suppose our partial tour starts with vertices (v1, v2) and the edge (v1, v2) doesn't even exist in the graph. The next (n-2)! permutations starting with (v1, v2) are all wasted. Much better to prune immediately.

Pruning is the difference between impossible and instant. Skiena's Sudoku experiments illustrate this dramatically. With naive backtracking, a hard puzzle never finished. With most-constrained-square selection plus look-ahead pruning, the same puzzle solved in under 11,000 steps. That's a speedup of billions.

Two pruning strategies that work broadly:

StrategyIdeaEffect
Feasibility pruningAbandon partial solutions that violate constraintsAvoids obviously bad paths
Optimality pruningAbandon partial solutions that can't beat the best known answerBranch-and-bound style cutoff
Symmetry breakingOnly explore one representative from each set of symmetric configurationsCan reduce search by factor of n! or more
Look-aheadCheck if the remaining subproblem has any feasible extension at allDetects dead ends early
In Sudoku, the "most constrained first" heuristic selects the open square with the fewest remaining candidates. Why does this dramatically speed up the search?

Chapter 5: Sudoku by Backtracking

Sudoku is a perfect showcase for backtracking with pruning. The rules: fill a 9 × 9 grid so every row, column, and 3 × 3 box contains the digits 1-9 with no repeats.

The state space: each open square must be filled with a digit. The candidates for square (i, j) are the digits 1-9 that don't appear in row i, column j, or the box containing (i, j). We backtrack when a square has zero candidates.

Skiena tested four backtracking variants on Sudoku puzzles of varying difficulty:

Next SquareLook-ahead?EasyMediumHard
ArbitraryNo1,904,832863,305never finished
ArbitraryYes12714212,507,212
Most constrainedNo48841,243,838
Most constrainedYes486510,374
The lesson is stark. Smart square selection (most constrained first) combined with look-ahead pruning reduced the hard puzzle from "never finished" to 10,374 steps. That is the power of pruning -- not just a constant-factor speedup, but the difference between impossible and trivial.

Look-ahead checks whether any other open square has zero candidates. If so, there is no way to complete the puzzle from the current state, so we backtrack immediately instead of wasting effort filling in more squares before discovering the dead end.

What is "look-ahead" pruning in the context of Sudoku?

Chapter 6: Random Sampling

When the search space is too large for exhaustive search, we need heuristic methods. The simplest is random sampling (Monte Carlo): repeatedly generate random solutions, evaluate them, and keep the best.

python
def random_sampling(problem, n_samples):
    best = random_solution(problem)
    best_cost = cost(best)
    for _ in range(n_samples):
        s = random_solution(problem)
        c = cost(s)
        if c < best_cost:
            best, best_cost = s, c
    return best

When does random sampling work well?

Many acceptable solutions
If most random solutions are good, you'll find a good one quickly.
No exploitable structure
When the landscape has no gradient to follow, random is as good as anything.

When does it fail? For TSP, the number of reasonable tours is vanishingly small compared to all permutations. Random sampling will almost never find a good tour, no matter how many samples you take.

For the traveling salesman problem on 20 cities, why is random sampling a terrible strategy?

Chapter 7: Simulated Annealing

Simulated annealing is inspired by the physical process of cooling metal. A hot material has atoms bouncing wildly; as it cools, they settle into a low-energy crystal structure. The algorithm starts "hot" (accepting random moves, even bad ones) and gradually cools (becoming increasingly picky).

Start
Begin with a random solution S and a high temperature T.
Neighbor
Generate a random neighbor S' of S (a small modification).
Accept?
If S' is better, always accept it. If S' is worse, accept it with probability e-(cost(S')-cost(S))/T. This probability decreases as T drops.
Cool
Reduce T slightly. Repeat until frozen.
python
def simulated_annealing(problem, T=1.0, cooling=0.999, min_T=0.001):
    s = random_solution(problem)
    best = s
    while T > min_T:
        s_new = random_neighbor(s)
        delta = cost(s_new) - cost(s)
        if delta < 0 or random() < exp(-delta / T):
            s = s_new
        if cost(s) < cost(best):
            best = s
        T *= cooling
    return best
Why accept bad moves? Pure hill-climbing gets trapped in local optima -- valleys surrounded by worse solutions in every direction. By occasionally accepting a worse solution (especially early on when T is high), simulated annealing can escape local optima and explore more of the landscape. As T decreases, it settles into the best region it's found.

The three critical design choices: (1) the cooling schedule (how fast T decreases), (2) the neighborhood structure (what counts as a "small" move), and (3) the initial temperature (hot enough to escape any local minimum).

For TSP, a natural neighborhood is swapping two cities in the tour (a "2-opt" move). The gradient-descent version (always accept improvements, never accept worse) is called hill climbing. It is fast but gets trapped in local optima. Simulated annealing is hill climbing with strategic occasional mistakes.

MethodAccepts Worse?GuaranteeSpeed
Random samplingN/A (no neighbors)NoneFast per sample, needs many
Hill climbingNeverLocal optimum onlyVery fast
Simulated annealingSometimes (decreasing)Converges to global (in theory)Moderate
Genetic algorithmsVia crossover/mutationNoneSlow per generation
Why does simulated annealing sometimes accept worse solutions?

Chapter 8: The N-Queens Solver

The classic N-Queens problem: place N queens on an N × N chessboard so that no two queens attack each other (no shared row, column, or diagonal). This is a perfect backtracking problem with beautiful pruning opportunities.

N-Queens Backtracking Solver

Click "Solve" to watch backtracking find all solutions. Queens are placed row by row. Red = conflict detected (backtrack!).

Choose N and click Solve
What to notice: The solver places queens one row at a time. When it reaches a row where no column is safe (every position is attacked), it backtracks to the previous row and tries the next column. Watch how pruning eliminates most of the search tree -- the solver never tries placing a queen in a column that's already under attack.
In the N-Queens backtracking solution, what constraint makes column c invalid for row r?

Chapter 9: Connections

Combinatorial search bridges the gap between exact algorithms and the hard problems of the next two chapters:

ConceptWhere It Leads
BacktrackingChapter 8: DP caches overlapping subproblems that backtracking recomputes
PruningBranch-and-bound extends pruning with optimality bounds
HeuristicsChapter 9: NP-complete problems have no exact efficient algorithm, making heuristics essential
Simulated annealingWidely used for VLSI layout, protein folding, scheduling
Search tree = DFSChapter 5: Backtracking is DFS on an implicit search tree
Skiena's take-home lessons from Chapter 7:
• Backtracking ensures correctness by enumerating all possibilities, and ensures efficiency by never visiting a state more than once.
• Clever pruning can make short work of surprisingly hard combinatorial search problems. Proper pruning has a greater impact on search time than any other factor.
• Simulated annealing is the most reliable heuristic search method for hard optimization problems. It is simple to implement, applicable to any problem with a cost function, and often produces excellent results.
• The key decisions in simulated annealing are the cooling schedule, neighborhood structure, and initial temperature.
• When you can't solve a problem exactly, heuristics that probably return something good are better than algorithms that definitely take forever.
You need to solve a combinatorial optimization problem with n = 50 objects. Exhaustive search over all permutations (50!) is impossible. What do you do?