Solution 2: Fit a local quadratic model by interpolation and solve it with a trust-region method.
Newton's method needs the Hessian $\nabla^2 f(\mathbf{x}) \in \mathbb{R}^{n \times n}$. Could we estimate it via finite differences? Each column requires a gradient evaluation, and each gradient costs $O(n)$ function evaluations, so:
$$\text{FD Hessian cost} = O(n) \times O(n) = O(n^2) \text{ function evaluations}$$

For expensive black-box functions, $O(n^2)$ evaluations per step is often unacceptable. The quadratic model approach achieves Newton-like behavior using a different strategy: rather than estimating derivatives, we interpolate the function directly.
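To make the count concrete, here is a minimal sketch of a forward-difference Hessian that tallies its own function evaluations; the helper name `fd_hessian` and the call counter are illustrative, not part of the lecture. It uses exactly $1 + n + n^2$ evaluations, confirming the $O(n^2)$ cost:

```python
import numpy as np

def fd_hessian(f, x, h=1e-4):
    """Estimate the Hessian of f at x by forward differences,
    counting how many times f is evaluated along the way."""
    n = len(x)
    calls = [0]
    def fc(z):                     # wrapper that counts evaluations
        calls[0] += 1
        return f(z)
    I = np.eye(n)
    f0 = fc(x)                                              # 1 call
    fi = np.array([fc(x + h * I[i]) for i in range(n)])     # n calls
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):                                  # n^2 calls
            fij = fc(x + h * (I[i] + I[j]))
            H[i, j] = (fij - fi[i] - fi[j] + f0) / h**2
    return H, calls[0]             # calls = 1 + n + n^2 = O(n^2)
```

For a quadratic $f$ the second difference is exact up to rounding, so the recovered Hessian matches the true one closely.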
Build a full quadratic model of $f$ around the current point $\mathbf{x}_k$:
$$m_k(\mathbf{x}_k + \mathbf{p}) = c + \mathbf{q}^T\mathbf{p} + \tfrac{1}{2}\mathbf{p}^T\mathbf{G}\mathbf{p}$$

Here $c$ is a scalar, $\mathbf{q} \in \mathbb{R}^n$ plays the role of the gradient, and $\mathbf{G} \in \mathbb{R}^{n \times n}$ (symmetric) plays the role of the Hessian.
We determine $c$, $\mathbf{q}$, $\mathbf{G}$ by interpolation: choose sample points $\mathbf{y}_1, \ldots, \mathbf{y}_p$ and require
$$m_k(\mathbf{y}_\ell) = f(\mathbf{y}_\ell), \quad 1 \le \ell \le p$$

Then we optimize this model using a trust-region method.
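The fitting step can be sketched as follows, assuming we parameterize the symmetric $\mathbf{G}$ by its upper triangle so the unknowns are exactly $(c, \mathbf{q}, \mathbf{G})$; the function name `fit_quadratic_model` and the choice of sample points are illustrative, not from the lecture:

```python
import numpy as np

def fit_quadratic_model(f, points):
    """Fit m(y) = c + q^T y + 0.5 y^T G y (G symmetric) by interpolation.

    points: (p, n) array of sample points with p = (n+1)(n+2)/2,
    one interpolation condition per unknown."""
    p, n = points.shape
    assert p == (n + 1) * (n + 2) // 2, "need one point per unknown"
    iu = np.triu_indices(n)          # upper-triangle indices of G
    rows = []
    for y in points:
        outer = np.outer(y, y)
        # in 0.5 y^T G y each off-diagonal G_ij (i<j) appears twice,
        # so its coefficient is y_i y_j; the diagonal's is 0.5 y_i^2
        quad = outer[iu] * np.where(iu[0] == iu[1], 0.5, 1.0)
        rows.append(np.concatenate(([1.0], y, quad)))
    A = np.array(rows)               # the p x p interpolation matrix
    b = np.array([f(y) for y in points])
    sol = np.linalg.solve(A, b)      # one condition m(y_l) = f(y_l) per row
    c, q = sol[0], sol[1:1 + n]
    G = np.zeros((n, n))
    G[iu] = sol[1 + n:]
    G = G + G.T - np.diag(np.diag(G))  # symmetrize
    return c, q, G
```

Fitting an exactly quadratic $f$ recovers $c$, $\mathbf{q}$, and $\mathbf{G}$ to machine precision, provided the sample points are in sufficiently general position so that the interpolation matrix is nonsingular.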
The quadratic model has three groups of unknowns:
| Component | Count |
|---|---|
| Constant $c$ | 1 |
| Linear term $\mathbf{q}$ | $n$ |
| Quadratic term $\mathbf{G}$ (symmetric) | $\frac{n(n+1)}{2}$ |
| Total | $1 + n + \frac{n(n+1)}{2} = O(n^2)$ |
We have $p = 1 + n + \frac{n(n+1)}{2} = \frac{(n+1)(n+2)}{2} = O(n^2)$ parameters, so we choose exactly $p$ sample points, giving $p$ interpolation conditions $m_k(\mathbf{y}_\ell) = f(\mathbf{y}_\ell)$. This gives a square $p \times p$ linear system for the unknowns $(c, \mathbf{q}, \mathbf{G})$, nonsingular as long as the sample points are in sufficiently general position.
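The count is easy to check in code (the helper name is illustrative):

```python
def num_quadratic_params(n):
    """Unknowns in a full quadratic model: constant + linear + symmetric quadratic."""
    return 1 + n + n * (n + 1) // 2   # = (n+1)(n+2)//2

# e.g. n = 10 gives 66 unknowns; n = 50 gives 1326
```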
When we move to a new iterate $\mathbf{x}_{k+1}$, we don't throw away the whole model. We keep most of the sample points $\mathbf{y}_\ell$ and replace just one (typically the oldest or worst). This changes one row of the interpolation system.
Re-solving the $p \times p$ system from scratch would cost $O(p^3) = O(n^6)$. But when only one row changes, we can instead apply a rank-1 update to the factored system — analogous to the Sherman–Morrison formula — for $O(p^2) = O(n^4)$.
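A minimal sketch of the rank-1 idea: replacing row $i$ of $A$ is the update $A + \mathbf{e}_i(\mathbf{v}_{\text{new}} - \mathbf{v}_{\text{old}})^T$, so Sherman–Morrison refreshes the inverse in $O(p^2)$. For simplicity this sketch updates an explicit inverse rather than a factorization, and the function name `replace_row` is illustrative:

```python
import numpy as np

def replace_row(A_inv, A, i, new_row):
    """Given A and A^{-1}, replace row i of A with new_row and
    update the inverse in O(p^2) via Sherman-Morrison:
    (A + u v^T)^{-1} = A^{-1} - (A^{-1} u)(v^T A^{-1}) / (1 + v^T A^{-1} u)."""
    p = A.shape[0]
    u = np.zeros(p)
    u[i] = 1.0                     # row replacement is a rank-1 change
    v = new_row - A[i]
    Au = A_inv @ u                 # O(p^2)
    vA = v @ A_inv                 # O(p^2)
    A_inv_new = A_inv - np.outer(Au, vA) / (1.0 + v @ Au)
    A_new = A.copy()
    A_new[i] = new_row
    return A_inv_new, A_new
```

The update is valid whenever the denominator $1 + \mathbf{v}^T A^{-1}\mathbf{u}$ is nonzero, i.e. the modified matrix remains nonsingular.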
For moderate $n$ (say 10–50), the $O(n^4)$ arithmetic is cheap compared to function evaluations. As the instructor noted: "This little toy in my pocket can do billions of computations per second" — $n^4$ for $n = 50$ is about $6 \times 10^6$, a fraction of a second.