What is optimization? A mathematical framework for finding the best design, decision, or parameter setting — and the algorithms that get us there.
You are designing a wing for an aircraft. The wing has to generate enough lift, but you also want to minimize drag. The shape of the airfoil is controlled by a handful of parameters — thicknesses at various points along the chord. Which values should you choose?
You could try a few by hand. But the search space is enormous: seven parameters, each with a continuous range. Trying every combination is impossible. You need a systematic procedure — an algorithm — that explores the space intelligently and converges on the best design.
Optimization is everywhere:
This chapter sets the stage. We will formalize what an optimization problem looks like, define what a "solution" means, and survey the landscape of algorithms covered in this book.
The quest to find "the best" is ancient. Pythagoras (569-475 BCE) claimed that mathematics could model the world. Euclid (325-265 BCE) proved that a square is the rectangle with maximum area for a given perimeter — an early optimization result.
Fermat (1601-1665) realized you could find optima by setting the derivative to zero. Lagrange (1736-1813) extended this to constrained problems with his method of Lagrange multipliers, combining the objective and constraints into a single function called the Lagrangian.
The modern era began in the mid-20th century with the rise of electronic computers:
| Year | Milestone | Contributor |
|---|---|---|
| 1940 | Linear programming formulation | Kantorovich |
| 1947 | Simplex algorithm | Dantzig |
| 1952 | Dynamic programming | Bellman |
| 1960s | Quasi-Newton methods (BFGS) | Broyden, Fletcher, Goldfarb, Shanno |
| 2014+ | Adam optimizer for deep learning | Kingma & Ba |
Today, optimization powers everything from training transformer models with billions of parameters (ChatGPT, 2022) to designing aircraft wings and planning medical radiotherapy.
A typical engineering design optimization workflow has four stages:
The designer's job is to formulate the problem correctly. The algorithm's job is to solve it. A brilliant algorithm applied to a poorly formulated problem will still produce a bad answer.
There are also pitfalls. An optimization algorithm may exploit modeling errors, producing a solution that is "optimal" with respect to the model but useless in practice. The designer must always validate the result against physical reality.
Every optimization problem can be written in a standard form:
Here x is the design point — a vector [x1, ..., xn] of design variables. f is the objective function we want to minimize. X is the feasible set — the collection of all allowed designs.
The feasible set is typically defined by constraints:
This gives a triangular feasible region in 2D. The solution x* is the point inside that triangle where f is smallest.
The no free lunch theorem (Wolpert & Macready, 1997) says no single optimization algorithm is best for all problems. Every algorithm makes assumptions about the objective function — smoothness, convexity, Lipschitz continuity. Choosing the right algorithm depends on knowing your problem's structure.
Optimization problems appear everywhere in engineering and science. Here are key examples from the textbook:
| Domain | Design Variables | Objective | Constraints |
|---|---|---|---|
| Aircraft Design | Airfoil thicknesses at 7 chord points | Minimize drag | Maintain lift, minimum thickness |
| Deep Learning | Millions of weights & biases | Minimize prediction error (loss) | Implicit: network architecture |
| Portfolio | Fraction of budget per asset | Maximize return, minimize risk | Weights ≥ 0, sum to 1 |
| Radiotherapy | Beam positions, shapes, intensities | Minimize radiation to healthy tissue | Tumor dose ≥ threshold |
| Robotics | Sequence of moves | Minimize number of steps | Avoid obstacles |
Notice the variety: continuous vs. discrete variables, convex vs. non-convex objectives, few vs. millions of dimensions. Different structures call for different algorithms — which is why this book covers so many.
When we minimize f, we seek a global minimizer: a point x* where f(x*) is the smallest value f achieves. But proving global optimality is generally hard. Often the best we can do is verify a local minimum.
A point x* is a local minimizer if f(x*) ≤ f(x) for all x in some neighborhood around x*. There are two flavors:
| Type | Definition | Example |
|---|---|---|
| Strong (strict) local min | f(x*) < f(x) for all nearby x ≠ x* | Bottom of a bowl |
| Weak local min | f(x*) ≤ f(x) for nearby x, but equality possible | Flat valley floor |
| Global min | f(x*) ≤ f(x) for all x in the feasible set | The absolute lowest point |
A function may have many local minima but only one global minimum value (though possibly achieved at multiple points). The landscape of local minima is what makes optimization hard — and interesting.
For differentiable functions, we can state necessary and sufficient conditions for a point to be a local minimum.
Univariate case: A point x* is at a local minimum if:
For a guaranteed strong local min, we need the stronger condition f''(x*) > 0.
Multivariate case: Replace the derivative with the gradient and the second derivative with the Hessian matrix:
The Hessian ∇²f is the matrix of all second partial derivatives. Positive semidefinite means all eigenvalues are ≥ 0, which generalizes the univariate condition f'' ≥ 0 to multiple dimensions.
For a guaranteed strong local min: ∇f(x*) = 0 and ∇²f(x*) is positive definite (all eigenvalues strictly > 0).
Let's build intuition for what optimization looks like in 2D. The canvas below shows contour lines of a function f(x1, x2) = x1² + 4x2² (an elliptic paraboloid). Click anywhere to place a "design point" and see its function value.
Click on the canvas to place a point. The contour lines show constant values of f. The global minimum is at the origin.
The textbook is organized into five parts that build on each other:
The progression is logical: first learn to differentiate (Ch 2), then optimize in 1D (Ch 3), then in many dimensions without constraints (Ch 4-9), then with constraints (Ch 10-14), and finally tackle special structures (Ch 15-20).
Optimization is the science of finding the best solution from a set of feasible alternatives, guided by an objective function and constrained by the problem's requirements.
| Concept | What It Means |
|---|---|
| Design point x | A vector of design variables we can adjust |
| Objective function f(x) | The quantity we want to minimize |
| Feasible set X | The set of allowed design points (defined by constraints) |
| Global minimum | The lowest f-value over the entire feasible set |
| Local minimum | The lowest f-value in a neighborhood |
| First-order condition | ∇f(x*) = 0: gradient vanishes at a minimum |
| Second-order condition | ∇²f(x*) positive semidefinite: curvature is non-negative |
| No free lunch | No algorithm is universally best; structure matters |