Converting constrained problems to unconstrained ones — and doing it well.
Given the equality-constrained problem:
$$\min_{\mathbf{x}} f(\mathbf{x}) \quad \text{s.t.} \quad \mathbf{c}(\mathbf{x}) = \mathbf{0}$$

we replace the constraints with a penalty term:

$$\min_{\mathbf{x}}\; f(\mathbf{x}) + \mu \sum_i c_i(\mathbf{x})^2 \quad \longleftrightarrow \quad \min_{\mathbf{x}}\; f(\mathbf{x}) + \mu\,\mathbf{c}(\mathbf{x})^T\mathbf{c}(\mathbf{x})$$

Large $\mu$ penalizes constraint violation heavily, pushing the solution toward feasibility.
Example: $\min\; x + y$ subject to $x^2 + y^2 - 1 = 0$.
With small $\mu$, the surface is a gentle bowl — easy to optimize but the minimum is far from the constraint circle. With large $\mu$, the surface forms a steep-walled trough along the circle — the minimum is close to the constraint, but the landscape is highly ill-conditioned.
*(Figure: the dashed circle is the constraint $x^2 + y^2 = 1$; the star marks the true constrained solution at $(-1/\sqrt{2}, -1/\sqrt{2})$.)*
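Both effects can be checked numerically. The following is a minimal sketch, assuming SciPy is available: it warm-starts each penalized subproblem at the previous solution and tracks the conditioning of the penalized objective's Hessian (which, since the objective $x + y$ is linear, equals the Hessian of the penalty term).

```python
# Quadratic penalty for  min x + y  s.t.  x^2 + y^2 - 1 = 0.
# Minimal sketch assuming SciPy; warm-starts each subproblem and
# tracks the conditioning of the penalized objective's Hessian.
import numpy as np
from scipy.optimize import minimize

def penalized(z, mu):
    c = z[0]**2 + z[1]**2 - 1.0          # constraint value c(x)
    return z[0] + z[1] + mu * c**2       # f(x) + mu * c(x)^2

def penalty_hessian(z, mu):
    c = z[0]**2 + z[1]**2 - 1.0
    # analytic Hessian of the penalty term: 4*mu*c*I + 8*mu*z z^T
    return 4.0 * mu * c * np.eye(2) + 8.0 * mu * np.outer(z, z)

z = np.array([-1.0, -1.0])               # warm start
for mu in [1.0, 10.0, 100.0, 1000.0]:
    z = minimize(penalized, z, args=(mu,)).x
    cond = np.linalg.cond(penalty_hessian(z, mu))
    print(f"mu={mu:7.1f}  x={z}  cond(H)={cond:.1f}")
# x drifts toward (-1/sqrt(2), -1/sqrt(2)) while cond(H) grows with mu
```

The minimizer approaches the constraint circle only as $\mu$ grows, and the condition number grows roughly linearly in $\mu$, which is exactly the trough-shaped ill-conditioning described above.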
If we use the global minimizer of each penalized subproblem, then as $\mu_k \to \infty$ the solutions converge to a solution of the constrained problem.
If we approximately minimize each subproblem (to gradient norm $\|\mathbf{g}(\mathbf{x}_k)\| \le \tau_k$ with $\tau_k \to 0$), then a limit point of the sequence is either:

- a stationary point of the infeasibility measure $\|\mathbf{c}(\mathbf{x})\|^2$, which may not be feasible, or
- a KKT point of the original constrained problem, when the constraint gradients at the limit are linearly independent.
The penalty method applies the same $\mu$ to every constraint. But some constraints interact more strongly with the objective than others.
The fix: instead of just penalizing, also estimate Lagrange multipliers $\lambda$ for each constraint:
$$\mathcal{L}(\mathbf{x}; \lambda, \mu) = f(\mathbf{x}) - \lambda^T\mathbf{c}(\mathbf{x}) + \frac{\mu}{2}\|\mathbf{c}(\mathbf{x})\|^2$$

If we minimize in $\mathbf{x}$ alone, the stationarity condition is:

$$\nabla_\mathbf{x} \mathcal{L} = \mathbf{g}_f(\mathbf{x}) - \mathbf{J}_c(\mathbf{x})^T(\lambda - \mu\,\mathbf{c}(\mathbf{x})) = 0$$

Compare with the KKT condition for the original problem:

$$\mathbf{g}_f(\mathbf{x}^*) - \mathbf{J}_c(\mathbf{x}^*)^T\lambda^* = 0$$

At a solution where $\mathbf{c}(\mathbf{x}^*) = 0$, these match if $\lambda = \lambda^*$. This suggests the multiplier update:

$$\lambda_{k+1} = \lambda_k - \mu_k\,\mathbf{c}(\mathbf{x}_k)$$

The Lagrange multipliers $\lambda_k$ allow the method to weight different constraints differently, directly addressing the "hanging net" problem.
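The resulting loop can be sketched on the running circle example. A minimal illustration, assuming SciPy; note that $\mu$ is held fixed throughout, so it is the multiplier update, not $\mu \to \infty$, that drives the constraint violation to zero.

```python
# Augmented Lagrangian for  min x + y  s.t.  c(x) = x^2 + y^2 - 1 = 0.
# Minimal sketch assuming SciPy; mu is held fixed to show that the
# multiplier update, not mu -> infinity, achieves feasibility.
import numpy as np
from scipy.optimize import minimize

def c(z):
    return z[0]**2 + z[1]**2 - 1.0

def aug_lag(z, lam, mu):
    # L(x; lambda, mu) = f(x) - lambda * c(x) + (mu/2) * c(x)^2
    return z[0] + z[1] - lam * c(z) + 0.5 * mu * c(z)**2

lam, mu = 0.0, 10.0
z = np.array([-1.0, -1.0])
for k in range(10):
    z = minimize(aug_lag, z, args=(lam, mu)).x   # inner minimization in x
    lam = lam - mu * c(z)                        # lambda_{k+1} = lambda_k - mu_k c(x_k)
# z approaches (-1/sqrt(2), -1/sqrt(2)) and lam approaches
# the true multiplier lambda* = -1/sqrt(2)
```

Here $\lambda^* = -1/\sqrt{2}$ follows from the KKT condition $(1, 1) = \lambda^* \cdot (2x^*, 2y^*)$ at $x^* = y^* = -1/\sqrt{2}$.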
Recall the KKT conditions for the equality-constrained problem $\min f(\mathbf{x})$ s.t. $\mathbf{c}(\mathbf{x}) = 0$:

$$\mathbf{g}_f(\mathbf{x}^*) - \mathbf{J}_c(\mathbf{x}^*)^T\lambda^* = 0, \qquad \mathbf{c}(\mathbf{x}^*) = 0$$
Suppose $(\mathbf{x}^*, \lambda^*)$ is a KKT point satisfying second-order sufficient conditions, and the constraint Jacobian $\mathbf{J}_c(\mathbf{x}^*)$ has full rank. Then for $\mu$ sufficiently large, $\mathbf{x}^*$ is a strict local minimizer of $\mathcal{L}(\mathbf{x}; \lambda^*, \mu)$.
This means if we know the true multipliers, we can solve the augmented Lagrangian subproblem for a finite $\mu$.
Under the same conditions, the augmented Lagrangian algorithm converges: the multiplier estimates $\lambda_k \to \lambda^*$ and the iterates $\mathbf{x}_k \to \mathbf{x}^*$, with $\mu$ held at any sufficiently large finite value; there is no need to drive $\mu \to \infty$.
For inequality constraints $\mathbf{d}(\mathbf{x}) \ge 0$, barrier (or interior-point) methods add a logarithmic penalty that prevents iterates from leaving the feasible region:
$$\min_{\mathbf{x}}\; f(\mathbf{x}) - \mu \sum_i \log\big(d_i(\mathbf{x})\big)$$

As $\mu \to 0$, the barrier term weakens and the solution approaches the constrained optimum. Unlike penalty methods, iterates stay feasible throughout.
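A one-dimensional sketch makes the barrier path explicit. For $\min x$ subject to $x - 1 \ge 0$, the subproblem $\min_x\; x - \mu\log(x - 1)$ has the closed-form minimizer $x = 1 + \mu$ (set the derivative $1 - \mu/(x-1)$ to zero), so the approach to $x^* = 1$ can be verified directly. This assumes SciPy is available.

```python
# Log barrier for  min x  s.t.  x - 1 >= 0.
# Minimal sketch assuming SciPy; each subproblem minimizer is 1 + mu,
# so iterates stay strictly feasible and approach x* = 1 as mu -> 0.
import numpy as np
from scipy.optimize import minimize_scalar

def barrier_obj(x, mu):
    return x - mu * np.log(x - 1.0)      # defined only for x > 1

for mu in [1.0, 0.1, 0.01, 0.001]:
    res = minimize_scalar(barrier_obj, bounds=(1.0 + 1e-12, 3.0),
                          method="bounded", args=(mu,))
    print(f"mu={mu:6.3f}  x={res.x:.6f}  (exact: {1.0 + mu:.6f})")
```

Every iterate satisfies $x > 1$ strictly, in contrast to the penalty method, whose iterates approach the feasible set from outside.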
A common practical approach combines a barrier for the inequality constraints with one of the equality-constraint techniques above.
Reference: Chapter 16, Griva, Sofer & Nash.
| Method | Key Idea |
|---|---|
| SQP | Sequential Quadratic Programming: at each step, solve a QP that approximates the NLP locally. Combines a quadratic model of the Lagrangian with linearized constraints. SNOPT is a well-known implementation. |
| Gradient projection | Project the gradient step onto the feasible set. Natural and efficient for bound constraints ($\ell \le \mathbf{x} \le \mathbf{u}$), where projection is just clamping. |
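The gradient-projection idea for bound constraints fits in a few lines, since projecting onto a box is componentwise clamping. A toy fixed-step sketch (function names here are illustrative, not from any particular library):

```python
# Gradient projection for bound constraints  l <= x <= u:
# take a gradient step, then project back onto the box, which for
# bounds is just componentwise clamping. Minimal fixed-step sketch.
import numpy as np

def project_box(x, lower, upper):
    return np.clip(x, lower, upper)          # projection = clamping

def gradient_projection(grad, x0, lower, upper, step=0.1, iters=200):
    x = project_box(np.asarray(x0, dtype=float), lower, upper)
    for _ in range(iters):
        x = project_box(x - step * grad(x), lower, upper)
    return x

# Example: min (x0 - 2)^2 + (x1 + 1)^2  subject to  0 <= x <= 1.
grad = lambda x: 2.0 * (x - np.array([2.0, -1.0]))
x = gradient_projection(grad, [0.5, 0.5], 0.0, 1.0)
# the unconstrained minimizer (2, -1) lies outside the box; the
# iterates converge to the clamped point (1, 0)
```

The projection costs $O(n)$ per step, which is why this method is attractive for problems whose only constraints are bounds.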