What breaks when $n$ is a million? When you can exploit structure, and when you need a whole new method: limited-memory quasi-Newton.
What makes optimization "large-scale"? Newton's method at $n = 10^5$: memory walls, cubic time, and a live quiz. Little-$o$ vs big-$O$ for function evaluations.
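To make the memory wall concrete before diving in, here is a back-of-envelope calculation (assuming a dense float64 Hessian and the standard $n^3/3$ flop count for a Cholesky factorization; the $n = 10^5$ figure comes from the blurb above):

```python
n = 100_000  # problem dimension from the section above

# A dense n-by-n Hessian in float64 costs 8 bytes per entry.
hessian_bytes = n * n * 8
print(f"Hessian storage: {hessian_bytes / 1e9:.0f} GB")  # 80 GB

# Cholesky factorization of a dense SPD matrix takes ~n^3/3 flops.
cholesky_flops = n ** 3 / 3
print(f"Cholesky cost: {cholesky_flops:.1e} flops")  # 3.3e+14
```

Eighty gigabytes just to *store* the Hessian, before a single factorization: that is the memory wall in one line of arithmetic.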
Start here →

A show-and-tell: log-barrier diagonals, banded, arrowhead, sparse QPs, low-rank-plus-diagonal. When your $\mathbf{H}$ has structure, scalable linear algebra wins.
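One taste of how structure pays off: for a low-rank-plus-diagonal Hessian $\mathbf{H} = \mathbf{D} + \mathbf{U}\mathbf{U}^\top$, the Woodbury identity solves $\mathbf{H}\mathbf{x} = \mathbf{b}$ in $O(nk^2)$ rather than $O(n^3)$. A minimal NumPy sketch (sizes and names here are illustrative, not taken from the demos):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 10_000, 5                 # illustrative sizes: n large, rank k small
d = rng.uniform(1.0, 2.0, n)     # positive diagonal of D
U = rng.standard_normal((n, k))  # low-rank factor
b = rng.standard_normal(n)

# Woodbury identity:
# (D + U U^T)^{-1} b = D^{-1} b - D^{-1} U (I_k + U^T D^{-1} U)^{-1} U^T D^{-1} b
Dinv_b = b / d
Dinv_U = U / d[:, None]
S = np.eye(k) + U.T @ Dinv_U     # small k-by-k capacitance matrix
x = Dinv_b - Dinv_U @ np.linalg.solve(S, U.T @ Dinv_b)

# Check (D + U U^T) x = b without ever forming the n-by-n matrix.
residual = d * x + U @ (U.T @ x) - b
print(np.linalg.norm(residual))  # tiny: machine precision
```

The only dense solve is the $k \times k$ capacitance system, so the cost is $O(nk^2)$ plus a few length-$n$ vector operations.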
Explore →

The main event: BFGS without storing the matrix. Derive the two-loop recursion step-by-step. Interactive demos of the algorithm, $m$-sensitivity, $\mathbf{H}_0$ scaling, and a head-to-head vs full BFGS.
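As a preview of the derivation, the two-loop recursion computes $\mathbf{H}_k \nabla f$ from the $m$ stored pairs $(\mathbf{s}_i, \mathbf{y}_i)$ without ever materializing $\mathbf{H}_k$. A sketch in NumPy (function and variable names are mine, not the demo's):

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list, gamma):
    """Two-loop recursion: return H_k @ grad without forming H_k.

    s_list/y_list hold the m most recent pairs, oldest first:
    s_i = x_{i+1} - x_i, y_i = grad_{i+1} - grad_i (curvature y_i @ s_i > 0).
    The initial matrix is H_0 = gamma * I.
    """
    q = grad.copy()
    alphas = []
    # First loop: newest pair back to oldest.
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / (y @ s)
        alpha = rho * (s @ q)
        q -= alpha * y
        alphas.append(alpha)
    r = gamma * q  # apply H_0
    # Second loop: oldest pair forward to newest.
    for s, y, alpha in zip(s_list, y_list, reversed(alphas)):
        rho = 1.0 / (y @ s)
        beta = rho * (y @ r)
        r += (alpha - beta) * s
    return r  # = H_k @ grad
```

Each loop is $m$ dot products and axpy updates, so one step costs $O(mn)$ memory and time; $\gamma = \mathbf{s}^\top\mathbf{y} / \mathbf{y}^\top\mathbf{y}$ is the usual scaling of $\mathbf{H}_0$.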
Discover →