Direct Search Methods

Solutions 3 & 4: Search along fixed directions or use a stencil pattern.

Solution 3: Coordinate / Direction Search

Fix a sequence of search directions that span $\mathbb{R}^n$, and cycle through them:

$$\mathbf{p} = \mathbf{e}_1, -\mathbf{e}_1, \ldots, \mathbf{e}_n, -\mathbf{e}_n, \mathbf{e}_1, -\mathbf{e}_1, \ldots$$

At each step, do a line search (or evaluate a few step sizes) along the current direction.
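As a rough illustration, the cycle above can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the name `coordinate_search`, the shared step size, and the halving rule are all assumptions made for the example, and it tries a single step size per direction rather than a full line search.

```python
import numpy as np

def coordinate_search(f, x0, step=1.0, shrink=0.5, tol=1e-8, max_sweeps=10_000):
    """Cycle through +e_1, -e_1, ..., +e_n, -e_n with one trial step each.

    Illustrative assumptions: one step size shared by all directions,
    multiplied by `shrink` whenever a full sweep finds no improvement.
    """
    x = np.asarray(x0, dtype=float).copy()
    fx = f(x)
    n = x.size
    for _ in range(max_sweeps):
        improved = False
        for i in range(n):
            for sign in (1.0, -1.0):
                trial = x.copy()
                trial[i] += sign * step      # step along +e_i or -e_i
                f_trial = f(trial)
                if f_trial < fx:             # accept any strict decrease
                    x, fx = trial, f_trial
                    improved = True
        if not improved:
            step *= shrink                   # no direction helped: refine
            if step < tol:
                break
    return x, fx

# Separable quadratic: each coordinate can be minimized on its own.
f = lambda x: (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2
x_star, f_star = coordinate_search(f, [0.0, 0.0])
print(x_star, f_star)
```

On this separable example the iterate reaches the minimizer $(1, -2)$ after a handful of sweeps; the remaining work is just shrinking the step until the stopping tolerance is met.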

A related variant sweeps through the coordinate pairs forward and then backward:

$$\pm\mathbf{e}_1, \ldots, \pm\mathbf{e}_n, \pm\mathbf{e}_n, \pm\mathbf{e}_{n-1}, \ldots, \pm\mathbf{e}_1, \ldots$$

"Brutally slow in general.
Wickedly fast when applicable.
(Like a scalpel.)"

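This works well when the function is separable or nearly so (each coordinate can be optimized independently). It degrades badly when there are strong cross-coordinate dependencies: in a narrow valley that is not aligned with the axes, every axis-aligned step must be tiny, so progress zigzags slowly.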

Solution 4: Pattern Search

Instead of cycling through one direction at a time, evaluate $f$ at a whole stencil of points around the current iterate:

[Figure: two stencils centered at $\mathbf{x}_k$. Left: cross stencil ($\pm\mathbf{e}_i$). Right: Y-stencil (non-axis-aligned).]

Each stencil point is:

$$\mathbf{x}_k + \gamma_k \mathbf{p}_k, \quad \mathbf{p}_k \in \mathcal{D}_k$$

The algorithm:

  1. Evaluate $f$ at all stencil points
  2. If the best stencil point gives sufficient decrease, move there
  3. Otherwise, reduce $\gamma_k$ (shrink the stencil) and re-evaluate
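The three steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the halving factor, and the specific sufficient-decrease test $f(\text{best}) < f(\mathbf{x}_k) - c\,\gamma_k^2$ are choices made for the example, not something fixed by the description above.

```python
import numpy as np

def cross_stencil(n):
    """Rows are e_1, ..., e_n, -e_1, ..., -e_n."""
    eye = np.eye(n)
    return np.vstack([eye, -eye])

def pattern_search(f, x0, directions, gamma=1.0, shrink=0.5, c=1e-4,
                   gamma_min=1e-6, max_iter=10_000):
    """Pattern search with a fixed direction set (rows of `directions`).

    Assumed sufficient-decrease test: f(best) < f(x) - c * gamma**2.
    """
    x = np.asarray(x0, dtype=float).copy()
    D = np.asarray(directions, dtype=float)
    fx = f(x)
    for _ in range(max_iter):
        if gamma < gamma_min:
            break
        # Step 1: evaluate f at every stencil point x + gamma * p.
        points = x + gamma * D
        values = np.array([f(p) for p in points])
        best = int(np.argmin(values))
        if values[best] < fx - c * gamma ** 2:
            # Step 2: sufficient decrease, so move to the best point.
            x, fx = points[best], values[best]
        else:
            # Step 3: no sufficient decrease, so shrink the stencil.
            gamma *= shrink
    return x, fx

# Coupled quadratic with minimizer (2, -1) and minimum value -3.
f = lambda x: x[0] ** 2 + x[1] ** 2 + x[0] * x[1] - 3.0 * x[0]
x_star, f_star = pattern_search(f, [0.0, 0.0], cross_stencil(2))
print(x_star, f_star)
```

Passing the direction set as an argument keeps the loop independent of the stencil shape, so a non-axis-aligned stencil can be swapped in without touching the algorithm.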

The Zoutendijk condition

For pattern search to converge, the stencil must satisfy a geometric condition: for any direction $\mathbf{v}$, there is some stencil direction $\mathbf{p}$ that is not too far from $\mathbf{v}$:

Cosine measure (Zoutendijk condition)
$$\min_{\mathbf{v} \in \mathbb{R}^n,\ \mathbf{v} \ne \mathbf{0}} \; \max_{\mathbf{p} \in \mathcal{D}_k} \frac{\mathbf{v}^T\mathbf{p}}{\|\mathbf{p}\|\,\|\mathbf{v}\|} \ge \delta > 0$$

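This ensures that some stencil direction always makes an angle with the negative gradient whose cosine is at least $\delta$, so whenever $\nabla f(\mathbf{x}_k) \ne \mathbf{0}$ at least one stencil direction is a descent direction. In other words, the stencil must positively span $\mathbb{R}^n$: every vector must be a nonnegative combination of stencil directions, and merely spanning in the linear sense is not enough. There can be no "blind spots."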
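A quick numerical check can make the "blind spot" idea concrete. The sketch below estimates the cosine measure in the plane by sampling unit vectors $\mathbf{v}$ on a grid of angles. The function name and the sampling approach are assumptions made for illustration, and the Y-stencil is assumed here to consist of three directions 120° apart.

```python
import numpy as np

def cosine_measure_2d(directions, num_angles=3600):
    """Estimate min over unit v of max over p of cos(angle(v, p)) in 2D.

    Sampling v on a uniform angle grid only approximates the true minimum.
    """
    D = np.asarray(directions, dtype=float)
    D = D / np.linalg.norm(D, axis=1, keepdims=True)      # normalize each p
    theta = np.linspace(0.0, 2.0 * np.pi, num_angles, endpoint=False)
    V = np.column_stack([np.cos(theta), np.sin(theta)])   # unit vectors v
    cosines = V @ D.T                                      # cosines[i, j] = v_i . p_j
    return cosines.max(axis=1).min()

cross = [(1, 0), (-1, 0), (0, 1), (0, -1)]
basis_only = [(1, 0), (0, 1)]                  # spans R^2, but not positively
y_stencil = [(0.0, 1.0),
             (-np.sqrt(3) / 2, -0.5),
             (np.sqrt(3) / 2, -0.5)]           # assumed: 120 degrees apart

print(cosine_measure_2d(cross))       # approximately  0.707
print(cosine_measure_2d(basis_only))  # approximately -0.707 (blind spot)
print(cosine_measure_2d(y_stencil))   # approximately  0.5
```

The set $\{\mathbf{e}_1, \mathbf{e}_2\}$ spans the plane in the linear sense, yet its measure is negative: for $\mathbf{v}$ pointing into the third quadrant, neither direction makes an acute angle with $\mathbf{v}$.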

The cross stencil $\{\pm\mathbf{e}_1, \ldots, \pm\mathbf{e}_n\}$ satisfies this with $\delta = 1/\sqrt{n}$.
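To see where $1/\sqrt{n}$ comes from, here is a short derivation, taking $\|\mathbf{v}\| = 1$ without loss of generality. For the cross stencil, the inner maximum is the largest absolute component of $\mathbf{v}$:

$$\max_{\mathbf{p} \in \{\pm\mathbf{e}_i\}} \frac{\mathbf{v}^T\mathbf{p}}{\|\mathbf{p}\|\,\|\mathbf{v}\|} = \max_i |v_i| = \|\mathbf{v}\|_\infty, \qquad 1 = \sum_{i=1}^n v_i^2 \le n\,\|\mathbf{v}\|_\infty^2 \;\Longrightarrow\; \|\mathbf{v}\|_\infty \ge \frac{1}{\sqrt{n}}$$

Equality holds at $\mathbf{v} = (1, \ldots, 1)/\sqrt{n}$, so the bound is tight. Note that it degrades as the dimension grows: the worst-case direction is the diagonal, which sits far from every axis.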

Can we do better?

Both coordinate search and pattern search use fixed directions. They don't adapt to the shape of the function. Is there a method that uses function evaluations to figure out a good search direction?

Yes: Nelder-Mead uses a simplex of $n+1$ points to build a local "linear" model that identifies the downhill direction.