When $n$ is a million, the linear algebra inside Newton-type methods stops being free.
There's no formal cutoff. Operationally:
| Problem | Variables | Constraints |
|---|---|---|
| Metric-constrained clustering (Veldt & Gleich) | $1.6 \times 10^8$ | $3 \times 10^{12}$ |
| Aerospace trajectory / CFD | $10^5$–$10^7$ | varies |
| Modern LLM training (GPT-class) | $10^{12}$+ (parameters) | — |
| PDE-constrained inverse problems | $10^6$–$10^9$ (mesh DOFs) | PDE |
The Veldt/Gleich problem took days to weeks on a cluster. With 3 trillion constraints, you can't even list the constraints — you have to generate them on the fly inside the solver.
For this lecture, we assume that evaluating the objective and its derivatives is cheap; the cost that matters is the linear algebra inside each iteration. If even one function call takes hours (e.g., a CFD simulation), you're in surrogate-optimization territory, which is a different lecture entirely.
Recall the asymptotic notations: $f(n) = O(g(n))$ means $f(n) \le C\, g(n)$ for some constant $C$ and all sufficiently large $n$; $f(n) = o(g(n))$ means $f(n)/g(n) \to 0$ as $n \to \infty$.

So "$f$ costs $o(n^2)$" says the cost grows strictly slower than $n^2$. Acceptable: $O(n)$, $O(n \log n)$, $O(n^{3/2})$, and even $O(n^2 / \log n)$, since $1/\log n \to 0$. Not acceptable: anything that is genuinely $\Theta(n^2)$ or worse.
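One way to see the little-o condition concretely: a cost $f(n)$ is $o(n^2)$ exactly when the ratio $f(n)/n^2$ shrinks as $n$ grows. A quick numerical sketch (the candidate costs here are illustrative):

```python
import math

# A cost f(n) is o(n^2) exactly when f(n)/n^2 shrinks as n grows.
candidates = {
    "n log n": lambda n: n * math.log(n),
    "n^(3/2)": lambda n: n ** 1.5,
    "n^2":     lambda n: float(n) ** 2,   # NOT o(n^2): the ratio never shrinks
}

for name, f in candidates.items():
    ratios = [f(n) / n**2 for n in (10**4, 10**6, 10**8)]
    print(f"{name:>8}: " + "  ".join(f"{r:.2e}" for r in ratios))
```

The first two ratios head toward zero; the last sits at $1$ no matter how large $n$ gets.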
Newton's method, stripped down:

```
x_0 given
while not done:
    solve H_k p_k = -g_k        # ← the bottleneck
    line search to find α_k
    x_{k+1} = x_k + α_k p_k
```
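One way the loop above might be fleshed out, as a minimal NumPy sketch: `f`, `grad`, and `hess` are assumed user-supplied callables, and the Armijo constant `1e-4` and backtracking factor `0.5` are illustrative defaults, not prescriptions.

```python
import numpy as np

def newton(f, grad, hess, x0, tol=1e-8, max_iter=50):
    """Damped Newton: solve H_k p_k = -g_k, then backtrack on alpha_k."""
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        p = np.linalg.solve(hess(x), -g)   # the O(n^3) bottleneck for dense H
        alpha = 1.0
        # Armijo backtracking: shrink the step until sufficient decrease holds
        while f(x + alpha * p) > f(x) + 1e-4 * alpha * (g @ p):
            alpha *= 0.5
        x = x + alpha * p
    return x

# Demo on a smooth, strictly convex test function: f(x) = sum_i exp(x_i) + x_i^2
f    = lambda x: np.sum(np.exp(x) + x**2)
grad = lambda x: np.exp(x) + 2 * x
hess = lambda x: np.diag(np.exp(x) + 2.0)

x_star = newton(f, grad, hess, np.array([2.0, -3.0]))
```

Note that the per-iteration work outside the `np.linalg.solve` call is only $O(n)$: gradients, inner products, and a few function evaluations, exactly as the text claims.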
The line search is fine — it's just inner products and a few function calls. The trouble is the linear solve.
For a dense symmetric $\mH \in \RR^{n \times n}$, Cholesky factorization costs $\sim n^3/3$ flops ($n^3/6$ multiply-add pairs), and merely storing the matrix takes $n(n+1)/2$ doubles.
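To make these numbers concrete at $n = 10^6$, assuming 8-byte doubles and a hypothetical machine sustaining $10^{12}$ flops/s:

```python
# Back-of-envelope cost of one dense Cholesky-based Newton step at n = 10^6,
# assuming 8-byte doubles and a (hypothetical) sustained rate of 1 Tflop/s.
n = 10**6
storage_bytes = n * (n + 1) // 2 * 8   # symmetric storage, doubles
flops = n**3 / 3                       # ~n^3/3 flops (n^3/6 multiply-add pairs)
print(f"storage: {storage_bytes / 1e12:.1f} TB")                  # → 4.0 TB
print(f"time at 1 Tflop/s: {flops / 1e12 / 3600:.0f} hours")      # → 93 hours
```

Four terabytes for the matrix alone and nearly four days per factorization, every iteration: that is what "dense Newton is hopeless" means in practice.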
If the problem is huge and dense Newton is hopeless, you have three options:
| Route | Idea | Covered in |
|---|---|---|
| 1. Use a simpler method | Drop the Hessian: gradient descent, conjugate gradient, SGD. | Next lecture |
| 2. Use scalable linear algebra | Exploit sparsity, banding, or low-rank structure in $\mH$. | Part 2 |
| 3. Change the method | Build a low-memory approximation to $\mH^{-1}$ that never gets formed explicitly. | Part 3 — L-BFGS |
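As a taste of route 2: iterative solvers such as conjugate gradient touch $\mH$ only through the map $v \mapsto \mH v$, so $\mH$ never has to be formed. A hand-rolled sketch, with a made-up tridiagonal operator standing in for a structured Hessian:

```python
import numpy as np

# Conjugate gradient needs only Hessian-vector products v -> H v, so H is
# never materialized. The tridiagonal H below is purely illustrative; each
# product costs O(n) instead of the O(n^2) a dense matvec would.
def cg(matvec, b, tol=1e-10, max_iter=500):
    x = np.zeros_like(b)
    r = b.copy()                     # residual b - A x, with x = 0 initially
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol * np.linalg.norm(b):
            break
        p = r + (rs_new / rs) * p    # new search direction, H-conjugate to p
        rs = rs_new
    return x

def Hv(v):
    # SPD tridiagonal stand-in: (Hv)_i = 4 v_i - v_{i-1} - v_{i+1}
    out = 4.0 * v
    out[1:] -= v[:-1]
    out[:-1] -= v[1:]
    return out

n = 200_000
g = np.sin(np.arange(n))             # stand-in gradient
p_newton = cg(Hv, -g)                # solves H p = -g, matrix-free
```

Here a $200{,}000$-variable Newton system is solved in a fraction of a second, because the solver's memory footprint is a handful of length-$n$ vectors rather than an $n \times n$ matrix.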