Nonlinear Programming

Constrained optimization: the setup, why it's hard, and the overarching strategy.

The three problem types

Equality constrained

$$\begin{aligned}\min_{\mathbf{x}} &\quad f(\mathbf{x}) \\ \text{s.t.} &\quad \mathbf{c}(\mathbf{x}) = \mathbf{0}\end{aligned}$$

Inequality constrained

$$\begin{aligned}\min_{\mathbf{x}} &\quad f(\mathbf{x}) \\ \text{s.t.} &\quad \mathbf{d}(\mathbf{x}) \ge \mathbf{0}\end{aligned}$$

General optimization

$$\begin{aligned}\min_{\mathbf{x}} &\quad f(\mathbf{x}) \\ \text{s.t.} &\quad \mathbf{c}(\mathbf{x}) = \mathbf{0} \\ &\quad \mathbf{d}(\mathbf{x}) \ge \mathbf{0}\end{aligned}$$

Reference: Chapter 17 of Nocedal & Wright (Penalty Methods and Augmented Lagrangians).

Reducing to equality + bounds

The general problem, with both equality and inequality constraints, can be transformed into one with only equalities and bound constraints. Introduce one slack variable per inequality, $\mathbf{s} \ge 0$, and rewrite $\mathbf{d}(\mathbf{x}) \ge \mathbf{0}$ as $\mathbf{d}(\mathbf{x}) - \mathbf{s} = \mathbf{0}$:

$$\begin{aligned}\min_{\mathbf{x}} &\quad f(\mathbf{x}) \\ \text{s.t.} &\quad \mathbf{c}(\mathbf{x}) = \mathbf{0} \\ &\quad \mathbf{d}(\mathbf{x}) \ge \mathbf{0}\end{aligned}$$
$\Longrightarrow$
$$\begin{aligned}\min_{\mathbf{x},\mathbf{s}} &\quad f(\mathbf{x}) \\ \text{s.t.} &\quad \mathbf{c}(\mathbf{x}) = \mathbf{0} \\ &\quad \mathbf{d}(\mathbf{x}) - \mathbf{s} = \mathbf{0} \\ &\quad \boldsymbol{\ell} \le \mathbf{x} \le \mathbf{u},\; \mathbf{s} \ge \mathbf{0}\end{aligned}$$
So handling equality constraints and bounds suffices! This is why methods like LANCELOT focus on equality constraints with bound constraints $\boldsymbol{\ell} \le \mathbf{x} \le \mathbf{u}$ — the general problem reduces to this form.
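A minimal sketch of this reduction in code, working over the stacked variable $z = (\mathbf{x}, \mathbf{s})$. The helper name `to_equality_form` and its interface are illustrative, not from any particular library:

```python
import numpy as np

def to_equality_form(f, c, d, n, m_ineq):
    """Rewrite min f(x) s.t. c(x)=0, d(x)>=0 as an equality + bound
    problem in z = (x, s)."""
    def obj(z):
        return f(z[:n])                   # objective ignores the slacks

    def eq(z):
        x, s = z[:n], z[n:]
        # stack c(x) = 0 and d(x) - s = 0 into one equality system
        return np.concatenate([c(x), d(x) - s])

    # bounds: x free here (or keep its own bounds), slacks nonnegative
    lower = np.concatenate([np.full(n, -np.inf), np.zeros(m_ineq)])
    upper = np.full(n + m_ineq, np.inf)
    return obj, eq, lower, upper

# tiny check: min x^2 s.t. x - 1 >= 0  becomes
#             min x^2 s.t. x - 1 - s = 0, s >= 0
obj, eq, lower, upper = to_equality_form(
    f=lambda x: x[0] ** 2,
    c=lambda x: np.zeros(0),
    d=lambda x: np.array([x[0] - 1.0]),
    n=1, m_ineq=1)
print(eq(np.array([1.0, 0.0])))   # x = 1, s = 0 is feasible: prints [0.]
```

The resulting `(obj, eq, lower, upper)` tuple is exactly the equality-plus-bounds form that a LANCELOT-style solver expects.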

Why nonlinear constraints are dangerous

Nonlinear equality constraints can hide NP-hard problems. Consider the constraint:

$$x_i^2 - 1 = 0 \quad \Longrightarrow \quad x_i = \pm 1$$

This forces each variable to be $+1$ or $-1$. With $n$ such constraints, the feasible set has $2^n$ isolated points — we've encoded an integer programming problem as a nonlinear program!
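To see the combinatorial blow-up concretely, here is a brute-force enumeration (illustrative only) of the feasible set for small $n$:

```python
import itertools

# The feasible set of x_i^2 - 1 = 0 for i = 1..n is exactly {-1, +1}^n:
# 2^n isolated points, with no continuous path between them.
n = 3
grid = itertools.product([-1.0, 1.0], repeat=n)
feasible = [x for x in grid
            if all(abs(xi ** 2 - 1.0) < 1e-12 for xi in x)]
print(len(feasible))   # 2**3 = 8
```

Minimizing even a linear objective over this set is a binary integer program, so any method that handled such constraints exactly would be solving NP-hard problems.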

Lesson: no algorithm can solve general nonlinear programs efficiently in all cases; as this example shows, they subsume NP-hard combinatorial problems. The best we can realistically aim for is local solutions and KKT points.

Key insight: don't force feasibility

A natural instinct is to project every iterate onto the constraint manifold. This is a bad idea.

Consider optimizing on the surface of a hypersphere ($\|\mathbf{x}\|^2 = 1$). If we force every iterate to lie on the sphere, we can only take steps that follow the curvature — tiny arcs. But if we allow the iterate to cut through the interior, we can take much larger steps.

We only care about satisfying constraints at the solution. Intermediate iterates can violate constraints freely. This is the key idea behind penalty and augmented Lagrangian methods.
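As a preview of the penalty approach, here is a minimal sketch on the sphere example: replace the constraint $\|\mathbf{x}\|^2 = 1$ with a quadratic penalty and tighten it over time. The iterates start at the center of the sphere and roam off the constraint surface freely; the step sizes and penalty schedule are illustrative choices, not a canonical implementation.

```python
import numpy as np

# min c.x  s.t. ||x||^2 = 1, solved via the penalty objective
#     c.x + (mu/2) * (||x||^2 - 1)^2
c = np.array([1.0, 2.0])
x = np.zeros(2)                            # start at the center: infeasible!

for mu in [1.0, 10.0, 100.0, 1000.0]:      # increasing penalty weight
    for _ in range(5000):                  # crude gradient descent
        grad = c + 2.0 * mu * (x @ x - 1.0) * x
        x = x - (0.1 / mu) * grad
    print(mu, np.linalg.norm(x) - 1.0)     # constraint violation shrinks

# the constrained minimizer is -c / ||c||
print(x, -c / np.linalg.norm(c))
```

Each intermediate iterate violates $\|\mathbf{x}\|^2 = 1$ slightly, yet the final answer is nearly feasible and near the true minimizer; feasibility is only enforced in the limit $\mu \to \infty$.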

The overarching strategy

Approximate the constrained problem by something easier (unconstrained or bound-constrained), and solve a sequence of these simpler problems.

This creates a nested structure of iterations, like the movie Inception:

- Level 0 (Reality): nonlinear optimization via a sequence of subproblems
- Level 1 (Dream): unconstrained / quadratic / linear subproblems
- Level 2 (Dream): Newton / quasi-Newton iterations
- Level 3 (Limbo): iterative linear solvers for the Newton system

The key insight from this nesting: don't solve inner subproblems too accurately. You can "live an entire lifetime" at Level 3 for almost no cost at Level 0. Early outer iterations only need rough inner solutions; accuracy matters only as you approach the answer.
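The "don't over-solve" advice can be made concrete on penalty subproblems for the sphere problem $\min\, \mathbf{c}^\top\mathbf{x}$ s.t. $\|\mathbf{x}\|^2 = 1$. The sketch below compares stopping each inner solve at a loose, $\mu$-dependent gradient tolerance against always solving to near machine accuracy; the schedules and constants are illustrative.

```python
import numpy as np

c = np.array([1.0, 2.0])

def run(inner_tol_for):
    """Penalty loop; inner gradient descent stops at the given tolerance."""
    x, steps = np.zeros(2), 0
    for mu in [1.0, 10.0, 100.0, 1000.0]:
        tol = inner_tol_for(mu)
        for _ in range(100_000):
            grad = c + 2.0 * mu * (x @ x - 1.0) * x
            if np.linalg.norm(grad) <= tol:
                break                      # inner solve "good enough"
            x = x - (0.1 / mu) * grad
            steps += 1
    return x, steps

x_loose, n_loose = run(lambda mu: 1.0 / mu)   # rough early, tight late
x_tight, n_tight = run(lambda mu: 1e-8)       # always near machine-tight
print(n_loose, n_tight)   # loose schedule typically needs far fewer steps
```

Both runs end at essentially the same near-feasible answer; the loose schedule simply refuses to polish subproblems whose penalty parameter is about to change anyway.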

Next: the two main methods for converting constrained problems into unconstrained ones — penalty methods and augmented Lagrangians.