Every constrained minimization problem has a hidden mirror image — a maximization problem that bounds it from below.
In the previous chapter we learned how to handle constraints using Lagrange multipliers, penalty methods, and the KKT conditions. But there was always a lingering question: how do we know if our solution is actually optimal?
Duality gives us a powerful answer. Every constrained minimization problem has a dual problem — a maximization problem whose solution provides a lower bound on the original (primal) problem. If those bounds meet, we have proof of optimality.
Why should you care? Three reasons:
The concept of duality is rooted in the generalized Lagrangian from Chapter 10. Recall that for a constrained problem with inequality constraints g(x) ≤ 0 and equality constraints h(x) = 0, the Lagrangian is:
The dual variables μ and λ are not just algebraic artifacts — they encode how much each constraint "costs" the objective. This chapter unpacks that idea fully.
Let's make the two forms concrete. Start with the general constrained problem (the primal):
We rewrite it using the generalized Lagrangian. The primal form is:
Why does this work? If any constraint is violated, the inner maximization drives the Lagrangian to infinity — so the outer minimization avoids infeasible points. If all constraints are satisfied, the Lagrangian reduces to f(x) plus terms that the multipliers push toward zero.
Now we simply swap the order of min and max to get the dual form:
The inner minimization in the dual is often written as a function of the multipliers alone:
This is called the dual function. It is always concave, regardless of whether the original problem is convex. That means we can maximize it using gradient ascent and it has no spurious local maxima.
| Property | Primal | Dual |
|---|---|---|
| Direction | Minimization | Maximization |
| Variables | Design x | Multipliers μ, λ |
| Inner problem | max over μ, λ | min over x |
| Always convex? | Not necessarily | Yes (concave max) |
Here is one of the most useful results in optimization. It comes directly from a fact about min and max that holds for any function:
This is the max-min inequality. Think of it this way: if you get to choose second (after seeing what the other player did), you can always do at least as well as if you had to choose first. Applied to our Lagrangian:
where d* is the optimal dual value and p* is the optimal primal value. This relationship is called weak duality, and it always holds — no assumptions about convexity needed.
Consider a concrete example. Suppose you're minimizing sin(x) subject to x² ≤ 1. The primal optimum is at x = −1, giving p* ≈ −0.841. The Lagrangian is:
For any μ ≥ 0, we can minimize over x to get D(μ). Each such D(μ) is guaranteed to be at most −0.841. This is weak duality in action.
| μ | D(μ) = minx L(x, μ) | ≤ p* = −0.841? |
|---|---|---|
| 0.0 | −∞ (sin has no min) | Yes |
| 0.5 | ≈ −1.05 | Yes |
| 1.0 | ≈ −1.14 | Yes |
| d* ≈ 0.42 | ≈ −0.841 | Equal! |
In the last row, the dual achieves the primal value exactly. That's strong duality — the subject of the next chapter.
Weak duality says d* ≤ p*. But when does d* = p*? This is called strong duality, and it means the dual problem gives us not just a bound but the exact answer.
The difference p* − d* is the duality gap. When strong duality holds, the gap is zero.
Slater's condition is a constraint qualification. It rules out pathological cases where the constraints barely touch the feasible region in a degenerate way. For most practical problems with convex structure, Slater's condition holds.
Strong duality has deep practical consequences. If you solve the dual and get d*, then you know p* = d*. You can also recover the primal solution from the dual solution. This is the basis of many efficient optimization algorithms.
For non-convex problems, strong duality may not hold. The duality gap can be positive, meaning the dual gives a useful but imperfect bound. However, even a nonzero gap is useful — it tells you the worst-case suboptimality of your current solution.
The duality gap δ = p* − d* measures how far apart the primal and dual optima are. It serves as a convergence certificate: when an algorithm reduces the gap below a threshold, you can stop.
Many modern optimization algorithms (especially interior point methods) maintain both primal and dual iterates. At each step, they compute the gap and use it as a stopping criterion. This is far more reliable than checking gradient norms alone.
| Scenario | Duality Gap | Interpretation |
|---|---|---|
| Gap = 0 | Strong duality | Optimal solution found |
| Gap > 0 (small) | Near-optimal | Close to the true optimum |
| Gap > 0 (large) | Weak duality only | Non-convex problem, or poor dual bound |
In practice, solvers report both the primal objective and the dual objective. The difference — the duality gap — is the most trustworthy measure of solution quality.
The dual function D(μ, λ) = minx L(x, μ, λ) deserves a closer look. It has a remarkable property: it is always concave, even when the original problem is non-convex.
Why is it concave? For any fixed x, the Lagrangian L(x, μ, λ) is an affine function of μ and λ. The dual function is the pointwise minimum of these affine functions. The minimum of a family of affine functions is concave.
To evaluate D(μ, λ) for given multiplier values, you solve an unconstrained minimization of L over x. If the Lagrangian is unbounded below for some μ, λ, then D(μ, λ) = −∞. The effective domain of the dual function is the set of multiplier values where D is finite.
The gradient of D with respect to the multipliers is especially nice. At the minimizer x*(μ, λ):
The constraint violations themselves are the gradients! This makes dual ascent particularly intuitive: update the multipliers in the direction of constraint violation.
Chapter 10 introduced interior point methods: replace hard constraint boundaries with smooth logarithmic barriers and solve a sequence of unconstrained problems. Primal-dual methods extend this idea by simultaneously maintaining both primal variables x and dual variables μ, λ.
The idea: instead of solving the barrier problem for x alone, we solve the KKT system for both x and the multipliers at the same time. This gives us the duality gap as a free byproduct — a built-in stopping criterion.
At each iteration, the primal-dual method solves a linearized KKT system:
This links the dual variables directly to how close each constraint is to being active. A constraint far from its boundary has a small multiplier. A nearly-active constraint has a large multiplier.
The algorithm alternates:
Primal-dual interior point methods are the workhorse of modern optimization solvers. They handle linear, quadratic, and general convex programs with remarkable efficiency.
Since the dual function D(μ, λ) is concave, we can maximize it using gradient ascent on the multipliers. This is called dual ascent (or dual decomposition).
The algorithm is elegant:
pseudocode # Dual Ascent Initialize μ ≥ 0, λ for k = 1, 2, ...: x* ← argminx L(x, μ, λ) # solve inner minimization μ ← max(0, μ + α g(x*)) # ascent on inequality multipliers λ ← λ + α h(x*) # ascent on equality multipliers
Notice the update rules. For inequality constraints, μi increases when gi(x*) > 0 (constraint violated) and decreases when gi(x*) < 0 (constraint slack). The max(0, ...) projection ensures μ ≥ 0. For equality constraints, λj adjusts in proportion to hj(x*).
Dual ascent converges when strong duality holds and the step size α is chosen appropriately (e.g., diminishing step sizes αk → 0 with ∑ αk = ∞).
A powerful extension is the method of multipliers (augmented Lagrangian method), which adds a quadratic penalty to the Lagrangian to improve convergence:
Let's see primal and dual problems side by side. Below is an interactive visualization of the problem: minimize sin(x) subject to x² ≤ 1.
The left panel shows the primal objective (black curve) with the feasible region shaded. The right panel shows the dual function D(μ) = minx [sin(x) + μ(x² − 1)]. Drag the μ slider to see how the Lagrangian (purple curves, left) and the dual value (red dot, right) change.
Drag the slider to change μ. Watch the Lagrangian curve (purple) and dual value (red dot) move.
Duality transforms a constrained minimization problem into a mirror-image maximization problem. The dual always provides a lower bound (weak duality), and for convex problems with Slater's condition, it provides the exact answer (strong duality).
| Concept | What It Means |
|---|---|
| Primal problem | The original constrained minimization |
| Dual problem | maxμ≥0,λ minx L(x, μ, λ) |
| Dual function D | minx L(x, μ, λ) — always concave |
| Weak duality | d* ≤ p* always |
| Strong duality | d* = p* when convex + Slater's |
| Duality gap | p* − d* — measures suboptimality |
| Dual ascent | Gradient ascent on D — update multipliers by constraint violation |
| Primal-dual methods | Update x and μ,λ simultaneously using KKT |