Kochenderfer & Wheeler, Chapter 3

Bracketing

Trap the minimum in an interval, then squeeze until there's nowhere left to hide.

Prerequisites: Chapter 2 (Derivatives and Gradients).
10
Chapters
2
Simulations
10
Quizzes

Chapter 0: Why Bracket?

Before tackling high-dimensional optimization, let's master the simplest case: minimizing a function of a single variable f(x). The key idea is bracketing — finding an interval [a, b] that is guaranteed to contain a minimum, then shrinking it.

The core idea: If you know three points a < b < c where f(a) > f(b) and f(c) > f(b), and the function is unimodal (has exactly one minimum), then the minimum must lie in [a, c]. Now you just need a strategy for shrinking that interval.

Bracketing methods are foundational. They appear as subroutines in nearly every multivariate optimization algorithm — whenever you need to do a line search (optimize along a single direction), you are solving a 1D bracketing problem.

Find a Bracket
Identify [a, b, c] where the minimum is trapped.
Shrink the Bracket
Evaluate f at an interior point and discard a subinterval.
Converge
Repeat until the interval is smaller than tolerance ε.
Why are 1D bracketing methods important for multivariate optimization?

Chapter 1: Unimodality

A function is unimodal on an interval [a, b] if there is a unique minimizer x* in [a, b] such that f is strictly decreasing on [a, x*] and strictly increasing on [x*, b].

Key insight: Unimodality is the crucial assumption that makes bracketing work. If f has multiple local minima in [a, b], evaluating at an interior point doesn't tell you which sub-interval contains the global minimum. With unimodality, one evaluation eliminates a chunk of the interval.

Given a unimodal function and a bracket [a, b], evaluate at two interior points x1 < x2:

The key question is: where should you place x1 and x2 to shrink the interval as fast as possible?

For a unimodal function, if f(x1) < f(x2) with x1 < x2, the minimum lies in:

Chapter 2: Finding an Initial Bracket

Before we can shrink a bracket, we need to find one. The textbook presents an expanding search: start with a point and a step size, then keep expanding in the downhill direction until you overshoot the minimum.

pseudocode
function bracket_minimum(f, x; s=1e-2, k=2.0)
  a, ya = x, f(x)
  b, yb = a + s, f(a + s)
  if yb > ya
    a, b = b, a       # swap: go the other direction
    s = -s
  end
  while true
    c, yc = b + s, f(b + s)
    if yc > yb
      return (a, c)    # bracket found: ya > yb and yc > yb
    end
    a, ya, b, yb = b, yb, c, yc
    s *= k             # expand step size
  end

The expansion factor k = 2 doubles the step at each iteration. This ensures the bracket is found in O(log(distance to minimum)) steps, but the resulting bracket may be much wider than necessary — hence the need for shrinking algorithms.

The bracket expansion algorithm stops when:

Chapter 3: Fibonacci Search

If you know in advance exactly how many function evaluations you can afford (say n), Fibonacci search is provably optimal — it shrinks the interval as much as possible for n evaluations.

The placement of evaluation points is determined by Fibonacci numbers: 1, 1, 2, 3, 5, 8, 13, 21, .... After n evaluations, the interval shrinks by a factor of 1/Fn.

Key insight: Each evaluation point is placed so that it can be reused in the next iteration. Fibonacci search achieves this perfectly by placing points at ratios determined by consecutive Fibonacci numbers: the points are at fractions Fn-2/Fn and Fn-1/Fn from the left.

The catch: you must decide n in advance. If you don't know your budget ahead of time, the golden section search (next chapter) is more practical.

Evaluations nFnInterval Shrinkage
581/8 = 12.5%
10891/89 ≈ 1.1%
20109461/10946 ≈ 0.009%
What makes Fibonacci search impractical for open-ended optimization?

Chapter 4: Golden Section Search

The golden section search is the limit of Fibonacci search as n → ∞. It places evaluation points at the golden ratio φ = (1 + √5) / 2 ≈ 1.618.

At each step, the interval shrinks by a factor of 1/φ ≈ 0.618. The magic property: the remaining point from the previous iteration is already in the right position for the next iteration, so you only need one new function evaluation per step.

ρ = φ − 1 = (√5 − 1) / 2 ≈ 0.618
Why the golden ratio? It is the unique ratio where removing a square from a golden rectangle leaves a smaller golden rectangle. In our context, it means the shrinkage factor is the same whether we discard the left or right subinterval, and the surviving evaluation point falls at exactly the right position for the next step.
pseudocode
function golden_section_search(f, a, b; n)
  ρ = (√5 - 1) / 2
  d = ρ * (b - a)
  x1, x2 = a + d, b - d      # symmetric placement
  y1, y2 = f(x1), f(x2)
  for i = 1 to n - 2
    if y1 < y2
      a, x2, y2 = x2, x1, y1
      d *= ρ
      x1 = a + d
      y1 = f(x1)
    else
      b, x1, y1 = x1, x2, y2
      d *= ρ
      x2 = b - d
      y2 = f(x2)
  return y1 < y2 ? x1 : x2
How many new function evaluations does golden section search need per iteration?

Chapter 5: Quadratic Fit Search

Golden section search ignores the function values — it only uses comparisons. Quadratic fit search does better by fitting a parabola through three points and jumping to its minimum.

Given three bracket points (a, f(a)), (b, f(b)), (c, f(c)), fit a quadratic q(x) = αx² + βx + γ and compute its minimizer x* = −β/(2α).

Key insight: Near a smooth minimum, the function is well-approximated by a parabola (Taylor expansion). Quadratic fit search exploits this local shape, converging much faster than golden section when the function is smooth. It achieves superlinear convergence.

The danger: if the function is not well-approximated by a quadratic (e.g., near a kink or in a flat region), the quadratic fit can give a terrible prediction. Robust implementations combine quadratic fit with golden section as a fallback.

MethodConvergence RateAssumptions
Golden sectionLinear (factor 0.618)Unimodality only
Quadratic fitSuperlinearSmoothness near minimum
Bisection (derivative)Linear (factor 0.5)Derivative available
Quadratic fit search converges faster than golden section because:

Chapter 6: Shubert-Piyavskii Method

What if you want to find the global minimum, not just a local one? The Shubert-Piyavskii method can do this — but it requires the function to be Lipschitz continuous with a known constant L:

|f(x) − f(y)| ≤ L |x − y|    for all x, y

The Lipschitz constant L bounds how fast the function can change. Given L and a set of evaluated points, we can construct a sawtooth lower bound: from each evaluated point, draw lines of slope ±L. The function must lie above this sawtooth.

The idea: The minimum of the sawtooth lower bound tells us the best-case value the function could achieve in the unexplored region. Evaluate f at that point, update the sawtooth, and repeat. This is guaranteed to converge to the global minimum.

The catch: you need to know L. Overestimating L makes the search conservative (slow). Underestimating L means you might miss the global minimum. In practice, L is often unknown.

The Shubert-Piyavskii method requires what property of the function?

Chapter 7: Bisection Method

If you have access to the derivative f'(x), you can use the bisection method. The idea: at the midpoint m of [a, b], evaluate f'(m):

Each step halves the interval, giving a shrinkage factor of 0.5 per evaluation (better than golden section's 0.618). The tradeoff: you need the derivative, which is an extra cost.

Comparison: Bisection on the derivative converts root-finding (find where f'=0) into a bracketing problem. It is conceptually the same as the bisection method for finding roots of equations, applied to f' instead of f.
If f'(m) < 0 at the midpoint, the minimum is:

Chapter 8: Golden Section Simulator

Watch golden section search in action. The canvas below shows a unimodal function with the current bracket [a, b] and the two evaluation points. Click "Step" to advance the algorithm.

Golden Section Search

Click Step to advance. The bracket (green region) shrinks by factor φ each step.

Step 0 | Bracket width: 6.00
After 10 steps of golden section search, the bracket width shrinks by approximately:

Chapter 9: Summary

MethodRequiresShrinkage/stepConvergence
Fibonacci searchUnimodality, fixed budgetOptimal for n evalsLinear
Golden sectionUnimodality× 0.618Linear
Quadratic fitSmoothnessVariableSuperlinear
Shubert-PiyavskiiLipschitz constantVariableGlobal guarantee
BisectionDerivative available× 0.5Linear
Looking ahead: Chapter 4 introduces the multivariate analog: local descent. The line search subroutine used in local descent algorithms is exactly the 1D bracketing problem. Golden section and quadratic fit appear inside almost every gradient-based optimizer.
Which bracketing method provides a global optimality guarantee?