Kochenderfer & Wheeler, Chapter 10

Constraints

Real problems have limits. Wings can't be infinitely thin, budgets can't be negative. Learn to optimize within boundaries.

Prerequisites: Chapter 5 (First-Order Methods).

Chapters

Simulations

Quizzes

Chapter 0: Why Constraints?

Every previous chapter assumed the domain was all of Rⁿ. In practice, a hedge fund manager cannot sell more stock than they own, an airplane cannot have zero-thickness wings, and you cannot spend negative dollars.

The core idea: Constrained optimization finds the best solution within a feasible set X ⊂ Rⁿ. Adding constraints can change the solution (if the unconstrained optimum is infeasible) or leave it unchanged (if it already satisfies the constraints). The challenge is to use the optimization methods we already know while respecting the boundaries.

The general constrained optimization problem is: minimize f(x) subject to x ∈ X. This chapter presents a toolbox of techniques to handle this: transformations, projections, Lagrange multipliers, penalty methods, and barrier methods.

Constraints are necessary in optimization when:

Design variables must satisfy physical, economic, or logical limits The objective function is quadratic The function has no local minima

Chapter 1: Constraint Types

Any optimization problem can be rewritten using two types of constraints:

Type	Form	Example
Equality	h(x) = 0	Budget must exactly equal $100
Inequality	g(x) ≤ 0	Weight must not exceed 50 kg

Greater-than inequalities g(x) ≥ 0 become less-than by negation: −g(x) ≤ 0. Equality constraints can be decomposed into two inequalities: h(x) ≤ 0 and −h(x) ≤ 0.

Why use functions instead of set membership? The functions h(x) and g(x) provide information about how far a point is from being feasible. This gradient information helps algorithms steer toward feasibility, whereas a simple "in or out" check gives no direction.

The advantage of expressing constraints as g(x) ≤ 0 rather than set membership is:

The function g provides gradient information about the distance to feasibility It eliminates the need for optimization It guarantees convexity

Chapter 2: Removing Constraints

Sometimes you can transform the problem so constraints vanish. Two approaches:

1. Bound transforms: An interval constraint a ≤ x ≤ b can be removed by substituting:

x = (b+a)/2 + (b−a)/2 · 2x̂ / (1 + x̂²)

This maps all of R to [a,b], so optimizing over x̂ ∈ R automatically satisfies a ≤ x ≤ b.

2. Equality substitution: If h(x) = c₁x₁ + c₂x₂ = 0, solve for x₂ = −(c₁/c₂)x₁ and substitute into f, eliminating one variable and one constraint.

Key insight: Removing constraints by transformation is the cleanest approach when possible. It reduces the constrained problem to an unconstrained one, letting you use any method from Chapters 4–9. The limitation: not all constraints can be easily transformed away.

A bound transform x = t_a,b(x̂) converts a constrained problem into:

An unconstrained problem over x̂ that automatically satisfies the bounds A problem with more constraints A linear program

Chapter 3: Projected Methods

Another approach: run your favorite unconstrained method, and after each step, project back onto the feasible set. The projection finds the closest feasible point to the unconstrained iterate:

x_feasible = argmin_{x ∈ X} ||x − x_unc||

For box constraints a ≤ x ≤ b, the projection is trivial: clamp each component to its nearest bound. For linear constraints, it becomes a quadratic program.

Think of it this way: Projected gradient descent is like walking downhill on a table that has edges. You follow the gradient, but when you reach an edge, you slide along it instead of falling off. The projection is the "slide" step.

For box constraints, projecting an infeasible point involves:

Clamping each component to the nearest bound Solving a quadratic program Restarting the optimization

Chapter 4: Lagrange Multipliers

For equality constraints h(x) = 0, the method of Lagrange multipliers gives necessary conditions for optimality. The idea: at a constrained minimum, the gradient of f must be aligned with the gradient of h.

∇f(x) = λ∇h(x)

where λ is the Lagrange multiplier. We form the Lagrangian:

L(x, λ) = f(x) − λh(x)

Setting ∇L = 0 gives us both the alignment condition (∇f = λ∇h) and the constraint satisfaction (h(x) = 0).

The intuition: If you are walking along a mountain ridge (the constraint curve h(x) = 0), you are at a local minimum along that ridge when the slope of f is perpendicular to the ridge. But the gradient of h is also perpendicular to the ridge. So at the constrained minimum, ∇f and ∇h point in the same direction — they are aligned.

At a constrained minimum with h(x) = 0, the gradients ∇f and ∇h are:

Aligned (one is a scalar multiple of the other) Perpendicular Both zero

Chapter 5: Inequality Constraints

With inequalities g(x) ≤ 0, the optimum can be either on the constraint boundary (constraint is active, g(x*) = 0) or in the interior (constraint is inactive, g(x*) < 0).

When active: the Lagrange condition holds with ∇f + μ∇g = 0, and we need μ ≥ 0 (the penalty pushes inward). When inactive: ∇f = 0 (as if unconstrained) and μ = 0 (the constraint does not matter).

This leads to complementary slackness: μ · g(x*) = 0. Either the constraint is tight (g = 0) or the multiplier is zero (μ = 0) — but not both positive simultaneously.

Key insight: Complementary slackness elegantly captures the two cases. An active constraint contributes to the optimality conditions through its multiplier μ > 0. An inactive constraint is irrelevant and has μ = 0. The constraint and its multiplier "take turns" being zero.

Complementary slackness means:

For each constraint, either g(x*) = 0 or its multiplier μ = 0 All constraints must be active at the optimum All multipliers must be positive

Chapter 6: KKT Conditions

The Karush-Kuhn-Tucker (KKT) conditions are the first-order necessary conditions for optimality with both equality and inequality constraints. They generalize the Lagrange conditions:

Condition	Requirement	Meaning
Feasibility	g(x) ≤ 0, h(x) = 0	Point satisfies all constraints
Dual feasibility	μ ≥ 0	Inequality multipliers push inward
Complementary slackness	μ ⊙ g = 0	Inactive constraints have zero multiplier
Stationarity	∇f + ∑μ_i∇g_i + ∑λ_j∇h_j = 0	Gradient of f lies in span of constraint gradients

The big picture: KKT conditions are to constrained optimization what ∇f = 0 is to unconstrained optimization. Any local minimum must satisfy KKT (assuming constraint qualification). For convex problems, KKT conditions are also sufficient — any KKT point is a global minimum.

The KKT conditions require that the gradient of f at the optimum:

Lies in the span of the gradients of the active constraints Is zero Points away from the feasible region

Chapter 7: Penalty Methods

Penalty methods convert constrained problems to unconstrained ones by adding a penalty for constraint violations:

minimize f(x) + ρ · p(x)

Common penalty functions:

Penalty	Formula	Property
Count	∑(g_i(x) > 0)	Sharp boundary but no gradient info
Quadratic	∑max(g_i, 0)²	Smooth but may need large ρ
Augmented Lagrangian	ρ/2 ∑h_i² + ∑λ_ih_i	Works with smaller ρ

The augmented Lagrangian (method of multipliers) adds linear penalty terms that are updated each iteration: λ^(k+1) = λ^(k) + ρh(x^(k+1)). This avoids needing ρ → ∞.

Think of it this way: The count penalty is a wall. The quadratic penalty is a slope. The augmented Lagrangian is a slope with an adjustable tilt. Each provides different tradeoffs between sharpness and smoothness.

The augmented Lagrangian improves on quadratic penalties by:

Not requiring the penalty weight to approach infinity Eliminating all constraints Using derivatives of the constraints

Chapter 8: Interior Point Methods

Interior point methods (barrier methods) keep all iterates strictly feasible by adding a barrier function that approaches infinity at constraint boundaries:

minimize f(x) + (1/ρ) p_barrier(x)

Common barriers: the inverse barrier p = −∑1/g_i(x) and the log barrier p = −∑log(−g_i(x)). As ρ increases, the barrier weakens and the solution approaches the true constrained optimum.

Interior vs. penalty: Penalty methods approach the optimum from outside the feasible set (infeasible iterates). Interior point methods approach from inside (always feasible). Interior point methods are preferred when intermediate infeasible solutions are unacceptable — for example, in real-time control where every iterate must be a valid action.

Interior point methods require a strictly feasible starting point. If one is not known, you can find it by solving an auxiliary problem: minimize s subject to g(x) ≤ s·1, starting from a large s.

Interior point methods guarantee that every iterate is:

Strictly inside the feasible set On the constraint boundary Infeasible but approaching feasibility

Chapter 9: Summary

Method	Approach	Best For
Transform	Remove constraints algebraically	Simple bounds, linear equalities
Projected descent	Project infeasible iterates	Box constraints, convex feasible sets
Lagrange multipliers	Analytical optimality conditions	Equality constraints, theory
Penalty methods	Penalize infeasibility	General constraints, simple to implement
Augmented Lagrangian	Penalty + adaptive multipliers	General equality constraints
Interior point	Barrier function, stay feasible	When feasibility is required at all times

Looking ahead: Chapter 11 explores duality — transforming a constrained minimization into a maximization problem that provides lower bounds and enables powerful algorithms like ADMM.

Which method approaches the constrained optimum from inside the feasible set?

Quadratic penalty Count penalty Interior point (barrier) method