Kochenderfer & Wheeler, Chapter 10

Constraints

Real problems have limits. Wings can't be infinitely thin, budgets can't be negative. Learn to optimize within boundaries.

Prerequisites: Chapter 5 (First-Order Methods).
10
Chapters
1
Simulations
10
Quizzes

Chapter 0: Why Constraints?

Every previous chapter assumed the domain was all of Rn. In practice, a hedge fund manager cannot sell more stock than they own, an airplane cannot have zero-thickness wings, and you cannot spend negative dollars.

The core idea: Constrained optimization finds the best solution within a feasible set X ⊂ Rn. Adding constraints can change the solution (if the unconstrained optimum is infeasible) or leave it unchanged (if it already satisfies the constraints). The challenge is to use the optimization methods we already know while respecting the boundaries.

The general constrained optimization problem is: minimize f(x) subject to x ∈ X. This chapter presents a toolbox of techniques to handle this: transformations, projections, Lagrange multipliers, penalty methods, and barrier methods.

Constraints are necessary in optimization when:

Chapter 1: Constraint Types

Any optimization problem can be rewritten using two types of constraints:

TypeFormExample
Equalityh(x) = 0Budget must exactly equal $100
Inequalityg(x) ≤ 0Weight must not exceed 50 kg

Greater-than inequalities g(x) ≥ 0 become less-than by negation: −g(x) ≤ 0. Equality constraints can be decomposed into two inequalities: h(x) ≤ 0 and −h(x) ≤ 0.

Why use functions instead of set membership? The functions h(x) and g(x) provide information about how far a point is from being feasible. This gradient information helps algorithms steer toward feasibility, whereas a simple "in or out" check gives no direction.
The advantage of expressing constraints as g(x) ≤ 0 rather than set membership is:

Chapter 2: Removing Constraints

Sometimes you can transform the problem so constraints vanish. Two approaches:

1. Bound transforms: An interval constraint a ≤ x ≤ b can be removed by substituting:

x = (b+a)/2 + (b−a)/2 · 2x̂ / (1 + x̂²)

This maps all of R to [a,b], so optimizing over x̂ ∈ R automatically satisfies a ≤ x ≤ b.

2. Equality substitution: If h(x) = c1x1 + c2x2 = 0, solve for x2 = −(c1/c2)x1 and substitute into f, eliminating one variable and one constraint.

Key insight: Removing constraints by transformation is the cleanest approach when possible. It reduces the constrained problem to an unconstrained one, letting you use any method from Chapters 4–9. The limitation: not all constraints can be easily transformed away.
A bound transform x = ta,b(x̂) converts a constrained problem into:

Chapter 3: Projected Methods

Another approach: run your favorite unconstrained method, and after each step, project back onto the feasible set. The projection finds the closest feasible point to the unconstrained iterate:

xfeasible = argminx ∈ X ||x − xunc||

For box constraints a ≤ x ≤ b, the projection is trivial: clamp each component to its nearest bound. For linear constraints, it becomes a quadratic program.

Think of it this way: Projected gradient descent is like walking downhill on a table that has edges. You follow the gradient, but when you reach an edge, you slide along it instead of falling off. The projection is the "slide" step.
For box constraints, projecting an infeasible point involves:

Chapter 4: Lagrange Multipliers

For equality constraints h(x) = 0, the method of Lagrange multipliers gives necessary conditions for optimality. The idea: at a constrained minimum, the gradient of f must be aligned with the gradient of h.

∇f(x) = λ∇h(x)

where λ is the Lagrange multiplier. We form the Lagrangian:

L(x, λ) = f(x) − λh(x)

Setting ∇L = 0 gives us both the alignment condition (∇f = λ∇h) and the constraint satisfaction (h(x) = 0).

The intuition: If you are walking along a mountain ridge (the constraint curve h(x) = 0), you are at a local minimum along that ridge when the slope of f is perpendicular to the ridge. But the gradient of h is also perpendicular to the ridge. So at the constrained minimum, ∇f and ∇h point in the same direction — they are aligned.
At a constrained minimum with h(x) = 0, the gradients ∇f and ∇h are:

Chapter 5: Inequality Constraints

With inequalities g(x) ≤ 0, the optimum can be either on the constraint boundary (constraint is active, g(x*) = 0) or in the interior (constraint is inactive, g(x*) < 0).

When active: the Lagrange condition holds with ∇f + μ∇g = 0, and we need μ ≥ 0 (the penalty pushes inward). When inactive: ∇f = 0 (as if unconstrained) and μ = 0 (the constraint does not matter).

This leads to complementary slackness: μ · g(x*) = 0. Either the constraint is tight (g = 0) or the multiplier is zero (μ = 0) — but not both positive simultaneously.

Key insight: Complementary slackness elegantly captures the two cases. An active constraint contributes to the optimality conditions through its multiplier μ > 0. An inactive constraint is irrelevant and has μ = 0. The constraint and its multiplier "take turns" being zero.
Complementary slackness means:

Chapter 6: KKT Conditions

The Karush-Kuhn-Tucker (KKT) conditions are the first-order necessary conditions for optimality with both equality and inequality constraints. They generalize the Lagrange conditions:

ConditionRequirementMeaning
Feasibilityg(x*) ≤ 0, h(x*) = 0Point satisfies all constraints
Dual feasibilityμ ≥ 0Inequality multipliers push inward
Complementary slacknessμ ⊙ g = 0Inactive constraints have zero multiplier
Stationarity∇f + ∑μi∇gi + ∑λj∇hj = 0Gradient of f lies in span of constraint gradients
The big picture: KKT conditions are to constrained optimization what ∇f = 0 is to unconstrained optimization. Any local minimum must satisfy KKT (assuming constraint qualification). For convex problems, KKT conditions are also sufficient — any KKT point is a global minimum.
The KKT conditions require that the gradient of f at the optimum:

Chapter 7: Penalty Methods

Penalty methods convert constrained problems to unconstrained ones by adding a penalty for constraint violations:

minimize f(x) + ρ · p(x)

Common penalty functions:

PenaltyFormulaProperty
Count∑(gi(x) > 0)Sharp boundary but no gradient info
Quadratic∑max(gi, 0)²Smooth but may need large ρ
Augmented Lagrangianρ/2 ∑hi² + ∑λihiWorks with smaller ρ

The augmented Lagrangian (method of multipliers) adds linear penalty terms that are updated each iteration: λ(k+1) = λ(k) + ρh(x(k+1)). This avoids needing ρ → ∞.

Think of it this way: The count penalty is a wall. The quadratic penalty is a slope. The augmented Lagrangian is a slope with an adjustable tilt. Each provides different tradeoffs between sharpness and smoothness.
The augmented Lagrangian improves on quadratic penalties by:

Chapter 8: Interior Point Methods

Interior point methods (barrier methods) keep all iterates strictly feasible by adding a barrier function that approaches infinity at constraint boundaries:

minimize f(x) + (1/ρ) pbarrier(x)

Common barriers: the inverse barrier p = −∑1/gi(x) and the log barrier p = −∑log(−gi(x)). As ρ increases, the barrier weakens and the solution approaches the true constrained optimum.

Interior vs. penalty: Penalty methods approach the optimum from outside the feasible set (infeasible iterates). Interior point methods approach from inside (always feasible). Interior point methods are preferred when intermediate infeasible solutions are unacceptable — for example, in real-time control where every iterate must be a valid action.

Interior point methods require a strictly feasible starting point. If one is not known, you can find it by solving an auxiliary problem: minimize s subject to g(x) ≤ s·1, starting from a large s.

Interior point methods guarantee that every iterate is:

Chapter 9: Summary

MethodApproachBest For
TransformRemove constraints algebraicallySimple bounds, linear equalities
Projected descentProject infeasible iteratesBox constraints, convex feasible sets
Lagrange multipliersAnalytical optimality conditionsEquality constraints, theory
Penalty methodsPenalize infeasibilityGeneral constraints, simple to implement
Augmented LagrangianPenalty + adaptive multipliersGeneral equality constraints
Interior pointBarrier function, stay feasibleWhen feasibility is required at all times
Looking ahead: Chapter 11 explores duality — transforming a constrained minimization into a maximization problem that provides lower bounds and enables powerful algorithms like ADMM.
Which method approaches the constrained optimum from inside the feasible set?