Ch 11: Duality — Kochenderfer & Wheeler

Chapter 0: Why Duality?

In the previous chapter we learned how to handle constraints using Lagrange multipliers, penalty methods, and the KKT conditions. But there was always a lingering question: how do we know if our solution is actually optimal?

Duality gives us a powerful answer. Every constrained minimization problem has a dual problem — a maximization problem whose solution provides a lower bound on the original (primal) problem. If those bounds meet, we have proof of optimality.

The core idea: Think of the primal problem as asking "how low can the objective go while respecting constraints?" The dual asks "how high can a lower bound go?" When they agree, you've found the answer — and you have a certificate to prove it.

Why should you care? Three reasons:

Verification

The dual gives a lower bound. If primal = dual, you know you're optimal.

↓

Alternative Solving

Sometimes the dual is easier to solve than the primal.

↓

Economic Interpretation

Dual variables are "shadow prices" — they tell you the cost of tightening a constraint.

The concept of duality is rooted in the generalized Lagrangian from Chapter 10. Recall that for a constrained problem with inequality constraints g(x) ≤ 0 and equality constraints h(x) = 0, the Lagrangian is:

L(x, μ, λ) = f(x) + ∑_i μ_i g_i(x) + ∑_j λ_j h_j(x)

The dual variables μ and λ are not just algebraic artifacts — they encode how much each constraint "costs" the objective. This chapter unpacks that idea fully.

What does the dual problem provide for the primal problem?

An upper bound on the optimal value A lower bound on the optimal value The exact optimal value in all cases

Chapter 1: Primal & Dual Forms

Let's make the two forms concrete. Start with the general constrained problem (the primal):

minimize_x f(x) subject to g(x) ≤ 0, h(x) = 0

We rewrite it using the generalized Lagrangian. The primal form is:

minimize_x maximize_{μ≥0, λ} L(x, μ, λ)

Why does this work? If any constraint is violated, the inner maximization drives the Lagrangian to infinity — so the outer minimization avoids infeasible points. If all constraints are satisfied, the Lagrangian reduces to f(x) plus terms that the multipliers push toward zero.

Now we simply swap the order of min and max to get the dual form:

maximize_{μ≥0, λ} minimize_x L(x, μ, λ)

Key insight: The only difference between primal and dual is the order of min and max. Primal: min over x first, then max over multipliers. Dual: max over multipliers first, then min over x. This simple swap changes everything.

The inner minimization in the dual is often written as a function of the multipliers alone:

D(μ, λ) = minimize_x L(x, μ, λ)

This is called the dual function. It is always concave, regardless of whether the original problem is convex. That means we can maximize it using gradient ascent and it has no spurious local maxima.

Property	Primal	Dual
Direction	Minimization	Maximization
Variables	Design x	Multipliers μ, λ
Inner problem	max over μ, λ	min over x
Always convex?	Not necessarily	Yes (concave max)

How is the dual problem obtained from the primal?

By adding more constraints By removing the Lagrangian By swapping the order of min and max in the Lagrangian

Chapter 2: Weak Duality

Here is one of the most useful results in optimization. It comes directly from a fact about min and max that holds for any function:

max_a min_b f(a, b) ≤ min_b max_a f(a, b)

This is the max-min inequality. Think of it this way: if you get to choose second (after seeing what the other player did), you can always do at least as well as if you had to choose first. Applied to our Lagrangian:

d* ≤ p*

where d* is the optimal dual value and p* is the optimal primal value. This relationship is called weak duality, and it always holds — no assumptions about convexity needed.

Why this matters: For any feasible μ ≥ 0 and any λ, the dual function D(μ, λ) is a lower bound on p*. This means every dual-feasible point gives you a "floor" on how good the primal solution can be. If you find a primal solution with value p and a dual solution with value d, the true optimum is trapped in the interval [d, p].

Consider a concrete example. Suppose you're minimizing sin(x) subject to x² ≤ 1. The primal optimum is at x = −1, giving p* ≈ −0.841. The Lagrangian is:

L(x, μ) = sin(x) + μ(x² − 1)

For any μ ≥ 0, we can minimize over x to get D(μ). Each such D(μ) is guaranteed to be at most −0.841. This is weak duality in action.

μ	D(μ) = min_x L(x, μ)	≤ p* = −0.841?
0.0	−∞ (sin has no min)	Yes
0.5	≈ −1.05	Yes
1.0	≈ −1.14	Yes
d* ≈ 0.42	≈ −0.841	Equal!

In the last row, the dual achieves the primal value exactly. That's strong duality — the subject of the next chapter.

Weak duality tells us that for any dual-feasible point, D(μ, λ) is:

A lower bound on the primal optimum p* An upper bound on the primal optimum p* Equal to the primal optimum p*

Chapter 3: Strong Duality

Weak duality says d* ≤ p*. But when does d* = p*? This is called strong duality, and it means the dual problem gives us not just a bound but the exact answer.

The difference p* − d* is the duality gap. When strong duality holds, the gap is zero.

When does strong duality hold? The most common sufficient condition is: the problem is convex (convex objective, convex inequality constraints, affine equality constraints) and Slater's condition is satisfied — there exists at least one strictly feasible point where all inequality constraints hold with strict inequality.

Slater's condition is a constraint qualification. It rules out pathological cases where the constraints barely touch the feasible region in a degenerate way. For most practical problems with convex structure, Slater's condition holds.

Convex Problem

f convex, g_i convex, h_j affine

+ Slater's condition

Strong Duality

d* = p*, zero duality gap

↓ implies

KKT Sufficient

Any KKT point is globally optimal

Strong duality has deep practical consequences. If you solve the dual and get d*, then you know p* = d*. You can also recover the primal solution from the dual solution. This is the basis of many efficient optimization algorithms.

For non-convex problems, strong duality may not hold. The duality gap can be positive, meaning the dual gives a useful but imperfect bound. However, even a nonzero gap is useful — it tells you the worst-case suboptimality of your current solution.

What condition (beyond convexity) guarantees strong duality?

The objective must be linear Slater's condition: a strictly feasible point exists All constraints must be equalities

Chapter 4: The Duality Gap

The duality gap δ = p* − d* measures how far apart the primal and dual optima are. It serves as a convergence certificate: when an algorithm reduces the gap below a threshold, you can stop.

Key insight: You don't need to know p* to use the duality gap. If your current primal solution has value p and your current dual solution has value d, then the true optimum lies in [d, p]. When p − d is small enough, you're done.

Many modern optimization algorithms (especially interior point methods) maintain both primal and dual iterates. At each step, they compute the gap and use it as a stopping criterion. This is far more reliable than checking gradient norms alone.

Scenario	Duality Gap	Interpretation
Gap = 0	Strong duality	Optimal solution found
Gap > 0 (small)	Near-optimal	Close to the true optimum
Gap > 0 (large)	Weak duality only	Non-convex problem, or poor dual bound

In practice, solvers report both the primal objective and the dual objective. The difference — the duality gap — is the most trustworthy measure of solution quality.

An optimizer reports primal value 3.2 and dual value 3.0. What do you know?

The true optimum is between 3.0 and 3.2 The true optimum is exactly 3.1 The problem is infeasible

Chapter 5: The Dual Function

The dual function D(μ, λ) = min_x L(x, μ, λ) deserves a closer look. It has a remarkable property: it is always concave, even when the original problem is non-convex.

Why is it concave? For any fixed x, the Lagrangian L(x, μ, λ) is an affine function of μ and λ. The dual function is the pointwise minimum of these affine functions. The minimum of a family of affine functions is concave.

Practical consequence: Since D is concave, maximizing D (the dual problem) is a convex optimization problem. Gradient ascent will converge to the global maximum — no local optima to worry about. This is true even when the primal is non-convex.

To evaluate D(μ, λ) for given multiplier values, you solve an unconstrained minimization of L over x. If the Lagrangian is unbounded below for some μ, λ, then D(μ, λ) = −∞. The effective domain of the dual function is the set of multiplier values where D is finite.

The gradient of D with respect to the multipliers is especially nice. At the minimizer x*(μ, λ):

∇_μ D = g(x*) ∇_λ D = h(x*)

The constraint violations themselves are the gradients! This makes dual ascent particularly intuitive: update the multipliers in the direction of constraint violation.

Why is the dual function always concave?

Because the primal is always convex Because it is the pointwise minimum of affine functions Because the multipliers must be nonnegative

Chapter 6: Primal-Dual Interior Point

Chapter 10 introduced interior point methods: replace hard constraint boundaries with smooth logarithmic barriers and solve a sequence of unconstrained problems. Primal-dual methods extend this idea by simultaneously maintaining both primal variables x and dual variables μ, λ.

The idea: instead of solving the barrier problem for x alone, we solve the KKT system for both x and the multipliers at the same time. This gives us the duality gap as a free byproduct — a built-in stopping criterion.

Key insight: The logarithmic barrier turns inequality constraints g_i(x) ≤ 0 into smooth penalties −(1/ρ) ∑ log(−g_i(x)). As ρ → ∞, the barrier approximates the hard constraint. The dual variables emerge naturally from the KKT conditions of the barrier problem.

At each iteration, the primal-dual method solves a linearized KKT system:

μ_i = −1 / (ρ g_i(x))

This links the dual variables directly to how close each constraint is to being active. A constraint far from its boundary has a small multiplier. A nearly-active constraint has a large multiplier.

The algorithm alternates:

Newton Step

Solve the linearized KKT system for Δx, Δμ, Δλ

↓

Line Search

Ensure primal feasibility (g_i < 0) and dual feasibility (μ_i > 0)

↓

Check Gap

Compute duality gap; if small, stop

Primal-dual interior point methods are the workhorse of modern optimization solvers. They handle linear, quadratic, and general convex programs with remarkable efficiency.

In a primal-dual interior point method, what provides the stopping criterion?

The gradient norm of f A fixed number of iterations The duality gap between primal and dual objectives

Chapter 7: Dual Ascent

Since the dual function D(μ, λ) is concave, we can maximize it using gradient ascent on the multipliers. This is called dual ascent (or dual decomposition).

The algorithm is elegant:

pseudocode
# Dual Ascent
Initialize μ ≥ 0, λ
for k = 1, 2, ...:
  x* ← argmin_x L(x, μ, λ)     # solve inner minimization
  μ ← max(0, μ + α g(x*))    # ascent on inequality multipliers
  λ ← λ + α h(x*)              # ascent on equality multipliers

Notice the update rules. For inequality constraints, μ_i increases when g_i(x*) > 0 (constraint violated) and decreases when g_i(x*) < 0 (constraint slack). The max(0, ...) projection ensures μ ≥ 0. For equality constraints, λ_j adjusts in proportion to h_j(x*).

Intuition: If a constraint is being violated, increase its penalty (multiplier). If it's slack, decrease the penalty. The multipliers self-adjust until the right balance is found. It's like a thermostat: too hot → increase cooling; too cold → decrease cooling.

Dual ascent converges when strong duality holds and the step size α is chosen appropriately (e.g., diminishing step sizes α_k → 0 with ∑ α_k = ∞).

A powerful extension is the method of multipliers (augmented Lagrangian method), which adds a quadratic penalty to the Lagrangian to improve convergence:

L_ρ(x, μ, λ) = f(x) + ∑_i μ_i g_i(x) + ∑_j λ_j h_j(x) + (ρ/2) ||h(x)||²

In dual ascent, what determines the direction of the multiplier update?

The constraint violation at the current primal solution The gradient of the objective f A random direction

Chapter 8: Duality Visualizer

Let's see primal and dual problems side by side. Below is an interactive visualization of the problem: minimize sin(x) subject to x² ≤ 1.

The left panel shows the primal objective (black curve) with the feasible region shaded. The right panel shows the dual function D(μ) = min_x [sin(x) + μ(x² − 1)]. Drag the μ slider to see how the Lagrangian (purple curves, left) and the dual value (red dot, right) change.

Primal-Dual Visualization

Drag the slider to change μ. Watch the Lagrangian curve (purple) and dual value (red dot) move.

μ0.80

What to notice: As you increase μ, the Lagrangian curve bends upward (the penalty for violating x² ≤ 1 increases). The dual value D(μ) first rises then falls — the peak is the dual optimum d*. At the peak, d* = p* (strong duality holds for this convex problem). The duality gap is zero.

At the optimal μ*, the dual function D(μ*) equals:

Zero The primal optimum p* Positive infinity

Chapter 9: Summary

Duality transforms a constrained minimization problem into a mirror-image maximization problem. The dual always provides a lower bound (weak duality), and for convex problems with Slater's condition, it provides the exact answer (strong duality).

Concept	What It Means
Primal problem	The original constrained minimization
Dual problem	max_μ≥0,λ min_x L(x, μ, λ)
Dual function D	min_x L(x, μ, λ) — always concave
Weak duality	d* ≤ p* always
Strong duality	d* = p* when convex + Slater's
Duality gap	p* − d* — measures suboptimality
Dual ascent	Gradient ascent on D — update multipliers by constraint violation
Primal-dual methods	Update x and μ,λ simultaneously using KKT

Looking ahead: Duality is the foundation for linear programming (Chapter 12), where every LP has a well-defined dual LP. It also powers interior point methods for quadratic and convex programs. In machine learning, duality appears in SVMs (the kernel trick is a dual formulation) and in maximum entropy models.

For a non-convex problem, the dual function D is:

Still concave — concavity is guaranteed regardless of the primal Also non-convex Undefined