Kochenderfer & Wheeler, Chapter 11

Duality

Every constrained minimization problem has a hidden mirror image — a maximization problem that bounds it from below.

Prerequisites: Chapter 10 (Constraints, Lagrangians, KKT conditions).
10
Chapters
2
Simulations
10
Quizzes

Chapter 0: Why Duality?

In the previous chapter we learned how to handle constraints using Lagrange multipliers, penalty methods, and the KKT conditions. But there was always a lingering question: how do we know if our solution is actually optimal?

Duality gives us a powerful answer. Every constrained minimization problem has a dual problem — a maximization problem whose solution provides a lower bound on the original (primal) problem. If those bounds meet, we have proof of optimality.

The core idea: Think of the primal problem as asking "how low can the objective go while respecting constraints?" The dual asks "how high can a lower bound go?" When they agree, you've found the answer — and you have a certificate to prove it.

Why should you care? Three reasons:

Verification
The dual gives a lower bound. If primal = dual, you know you're optimal.
Alternative Solving
Sometimes the dual is easier to solve than the primal.
Economic Interpretation
Dual variables are "shadow prices" — they tell you the cost of tightening a constraint.

The concept of duality is rooted in the generalized Lagrangian from Chapter 10. Recall that for a constrained problem with inequality constraints g(x) ≤ 0 and equality constraints h(x) = 0, the Lagrangian is:

L(x, μ, λ) = f(x) + ∑i μi gi(x) + ∑j λj hj(x)

The dual variables μ and λ are not just algebraic artifacts — they encode how much each constraint "costs" the objective. This chapter unpacks that idea fully.

What does the dual problem provide for the primal problem?

Chapter 1: Primal & Dual Forms

Let's make the two forms concrete. Start with the general constrained problem (the primal):

minimizex f(x)   subject to   g(x) ≤ 0,   h(x) = 0

We rewrite it using the generalized Lagrangian. The primal form is:

minimizex maximizeμ≥0, λ L(x, μ, λ)

Why does this work? If any constraint is violated, the inner maximization drives the Lagrangian to infinity — so the outer minimization avoids infeasible points. If all constraints are satisfied, the Lagrangian reduces to f(x) plus terms that the multipliers push toward zero.

Now we simply swap the order of min and max to get the dual form:

maximizeμ≥0, λ minimizex L(x, μ, λ)
Key insight: The only difference between primal and dual is the order of min and max. Primal: min over x first, then max over multipliers. Dual: max over multipliers first, then min over x. This simple swap changes everything.

The inner minimization in the dual is often written as a function of the multipliers alone:

D(μ, λ) = minimizex L(x, μ, λ)

This is called the dual function. It is always concave, regardless of whether the original problem is convex. That means we can maximize it using gradient ascent and it has no spurious local maxima.

PropertyPrimalDual
DirectionMinimizationMaximization
VariablesDesign xMultipliers μ, λ
Inner problemmax over μ, λmin over x
Always convex?Not necessarilyYes (concave max)
How is the dual problem obtained from the primal?

Chapter 2: Weak Duality

Here is one of the most useful results in optimization. It comes directly from a fact about min and max that holds for any function:

maxa minb f(a, b) ≤ minb maxa f(a, b)

This is the max-min inequality. Think of it this way: if you get to choose second (after seeing what the other player did), you can always do at least as well as if you had to choose first. Applied to our Lagrangian:

d* ≤ p*

where d* is the optimal dual value and p* is the optimal primal value. This relationship is called weak duality, and it always holds — no assumptions about convexity needed.

Why this matters: For any feasible μ ≥ 0 and any λ, the dual function D(μ, λ) is a lower bound on p*. This means every dual-feasible point gives you a "floor" on how good the primal solution can be. If you find a primal solution with value p and a dual solution with value d, the true optimum is trapped in the interval [d, p].

Consider a concrete example. Suppose you're minimizing sin(x) subject to x² ≤ 1. The primal optimum is at x = −1, giving p* ≈ −0.841. The Lagrangian is:

L(x, μ) = sin(x) + μ(x² − 1)

For any μ ≥ 0, we can minimize over x to get D(μ). Each such D(μ) is guaranteed to be at most −0.841. This is weak duality in action.

μD(μ) = minx L(x, μ)≤ p* = −0.841?
0.0−∞ (sin has no min)Yes
0.5≈ −1.05Yes
1.0≈ −1.14Yes
d* ≈ 0.42≈ −0.841Equal!

In the last row, the dual achieves the primal value exactly. That's strong duality — the subject of the next chapter.

Weak duality tells us that for any dual-feasible point, D(μ, λ) is:

Chapter 3: Strong Duality

Weak duality says d* ≤ p*. But when does d* = p*? This is called strong duality, and it means the dual problem gives us not just a bound but the exact answer.

The difference p* − d* is the duality gap. When strong duality holds, the gap is zero.

When does strong duality hold? The most common sufficient condition is: the problem is convex (convex objective, convex inequality constraints, affine equality constraints) and Slater's condition is satisfied — there exists at least one strictly feasible point where all inequality constraints hold with strict inequality.

Slater's condition is a constraint qualification. It rules out pathological cases where the constraints barely touch the feasible region in a degenerate way. For most practical problems with convex structure, Slater's condition holds.

Convex Problem
f convex, gi convex, hj affine
+ Slater's condition
Strong Duality
d* = p*, zero duality gap
↓ implies
KKT Sufficient
Any KKT point is globally optimal

Strong duality has deep practical consequences. If you solve the dual and get d*, then you know p* = d*. You can also recover the primal solution from the dual solution. This is the basis of many efficient optimization algorithms.

For non-convex problems, strong duality may not hold. The duality gap can be positive, meaning the dual gives a useful but imperfect bound. However, even a nonzero gap is useful — it tells you the worst-case suboptimality of your current solution.

What condition (beyond convexity) guarantees strong duality?

Chapter 4: The Duality Gap

The duality gap δ = p* − d* measures how far apart the primal and dual optima are. It serves as a convergence certificate: when an algorithm reduces the gap below a threshold, you can stop.

Key insight: You don't need to know p* to use the duality gap. If your current primal solution has value p and your current dual solution has value d, then the true optimum lies in [d, p]. When p − d is small enough, you're done.

Many modern optimization algorithms (especially interior point methods) maintain both primal and dual iterates. At each step, they compute the gap and use it as a stopping criterion. This is far more reliable than checking gradient norms alone.

ScenarioDuality GapInterpretation
Gap = 0Strong dualityOptimal solution found
Gap > 0 (small)Near-optimalClose to the true optimum
Gap > 0 (large)Weak duality onlyNon-convex problem, or poor dual bound

In practice, solvers report both the primal objective and the dual objective. The difference — the duality gap — is the most trustworthy measure of solution quality.

An optimizer reports primal value 3.2 and dual value 3.0. What do you know?

Chapter 5: The Dual Function

The dual function D(μ, λ) = minx L(x, μ, λ) deserves a closer look. It has a remarkable property: it is always concave, even when the original problem is non-convex.

Why is it concave? For any fixed x, the Lagrangian L(x, μ, λ) is an affine function of μ and λ. The dual function is the pointwise minimum of these affine functions. The minimum of a family of affine functions is concave.

Practical consequence: Since D is concave, maximizing D (the dual problem) is a convex optimization problem. Gradient ascent will converge to the global maximum — no local optima to worry about. This is true even when the primal is non-convex.

To evaluate D(μ, λ) for given multiplier values, you solve an unconstrained minimization of L over x. If the Lagrangian is unbounded below for some μ, λ, then D(μ, λ) = −∞. The effective domain of the dual function is the set of multiplier values where D is finite.

The gradient of D with respect to the multipliers is especially nice. At the minimizer x*(μ, λ):

μ D = g(x*)     ∇λ D = h(x*)

The constraint violations themselves are the gradients! This makes dual ascent particularly intuitive: update the multipliers in the direction of constraint violation.

Why is the dual function always concave?

Chapter 6: Primal-Dual Interior Point

Chapter 10 introduced interior point methods: replace hard constraint boundaries with smooth logarithmic barriers and solve a sequence of unconstrained problems. Primal-dual methods extend this idea by simultaneously maintaining both primal variables x and dual variables μ, λ.

The idea: instead of solving the barrier problem for x alone, we solve the KKT system for both x and the multipliers at the same time. This gives us the duality gap as a free byproduct — a built-in stopping criterion.

Key insight: The logarithmic barrier turns inequality constraints gi(x) ≤ 0 into smooth penalties −(1/ρ) ∑ log(−gi(x)). As ρ → ∞, the barrier approximates the hard constraint. The dual variables emerge naturally from the KKT conditions of the barrier problem.

At each iteration, the primal-dual method solves a linearized KKT system:

μi = −1 / (ρ gi(x))

This links the dual variables directly to how close each constraint is to being active. A constraint far from its boundary has a small multiplier. A nearly-active constraint has a large multiplier.

The algorithm alternates:

Newton Step
Solve the linearized KKT system for Δx, Δμ, Δλ
Line Search
Ensure primal feasibility (gi < 0) and dual feasibility (μi > 0)
Check Gap
Compute duality gap; if small, stop

Primal-dual interior point methods are the workhorse of modern optimization solvers. They handle linear, quadratic, and general convex programs with remarkable efficiency.

In a primal-dual interior point method, what provides the stopping criterion?

Chapter 7: Dual Ascent

Since the dual function D(μ, λ) is concave, we can maximize it using gradient ascent on the multipliers. This is called dual ascent (or dual decomposition).

The algorithm is elegant:

pseudocode
# Dual Ascent
Initialize μ ≥ 0, λ
for k = 1, 2, ...:
  x* ← argminx L(x, μ, λ)     # solve inner minimization
  μ ← max(0, μ + α g(x*))    # ascent on inequality multipliers
  λ ← λ + α h(x*)              # ascent on equality multipliers

Notice the update rules. For inequality constraints, μi increases when gi(x*) > 0 (constraint violated) and decreases when gi(x*) < 0 (constraint slack). The max(0, ...) projection ensures μ ≥ 0. For equality constraints, λj adjusts in proportion to hj(x*).

Intuition: If a constraint is being violated, increase its penalty (multiplier). If it's slack, decrease the penalty. The multipliers self-adjust until the right balance is found. It's like a thermostat: too hot → increase cooling; too cold → decrease cooling.

Dual ascent converges when strong duality holds and the step size α is chosen appropriately (e.g., diminishing step sizes αk → 0 with ∑ αk = ∞).

A powerful extension is the method of multipliers (augmented Lagrangian method), which adds a quadratic penalty to the Lagrangian to improve convergence:

Lρ(x, μ, λ) = f(x) + ∑i μi gi(x) + ∑j λj hj(x) + (ρ/2) ||h(x)||²
In dual ascent, what determines the direction of the multiplier update?

Chapter 8: Duality Visualizer

Let's see primal and dual problems side by side. Below is an interactive visualization of the problem: minimize sin(x) subject to x² ≤ 1.

The left panel shows the primal objective (black curve) with the feasible region shaded. The right panel shows the dual function D(μ) = minx [sin(x) + μ(x² − 1)]. Drag the μ slider to see how the Lagrangian (purple curves, left) and the dual value (red dot, right) change.

Primal-Dual Visualization

Drag the slider to change μ. Watch the Lagrangian curve (purple) and dual value (red dot) move.

μ0.80
What to notice: As you increase μ, the Lagrangian curve bends upward (the penalty for violating x² ≤ 1 increases). The dual value D(μ) first rises then falls — the peak is the dual optimum d*. At the peak, d* = p* (strong duality holds for this convex problem). The duality gap is zero.
At the optimal μ*, the dual function D(μ*) equals:

Chapter 9: Summary

Duality transforms a constrained minimization problem into a mirror-image maximization problem. The dual always provides a lower bound (weak duality), and for convex problems with Slater's condition, it provides the exact answer (strong duality).

ConceptWhat It Means
Primal problemThe original constrained minimization
Dual problemmaxμ≥0,λ minx L(x, μ, λ)
Dual function Dminx L(x, μ, λ) — always concave
Weak dualityd* ≤ p* always
Strong dualityd* = p* when convex + Slater's
Duality gapp* − d* — measures suboptimality
Dual ascentGradient ascent on D — update multipliers by constraint violation
Primal-dual methodsUpdate x and μ,λ simultaneously using KKT
Looking ahead: Duality is the foundation for linear programming (Chapter 12), where every LP has a well-defined dual LP. It also powers interior point methods for quadratic and convex programs. In machine learning, duality appears in SVMs (the kernel trick is a dual formulation) and in maximum entropy models.
For a non-convex problem, the dual function D is: