Lecture 30 — Global Optimization & Meta-heuristics

The fantasy of "global"

Local optimization gives you a point $\mathbf{x}^\star$ where $\nabla f(\mathbf{x}^\star) = 0$, which is a stationary point and nothing more. Global optimization promises more: the best $\mathbf{x}^\star$ in the entire feasible set. For a generic non-convex $f$ on $[0,1]^n$, known only through function evaluations, no algorithm can find the true global minimum without, in the worst case, looking everywhere. "Looking everywhere" costs a number of evaluations exponential in $n$.
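
To put a number on "exponential in $n$", here is a small worked bound. It rests on an assumption the lecture does not make: that $f$ is Lipschitz with constant $L$ in the max-norm. Let $G_m$ be the grid of cell centers with $m$ points per axis, so every point of $[0,1]^n$ lies within $1/(2m)$ of a grid point.

$$
|f(\mathbf{x}) - f(\mathbf{y})| \le L\,\|\mathbf{x}-\mathbf{y}\|_\infty
\quad\Longrightarrow\quad
\min_{\mathbf{g}\in G_m} f(\mathbf{g}) - f(\mathbf{x}^\star) \le \frac{L}{2m},
\qquad |G_m| = m^n .
$$

Reaching accuracy $\varepsilon$ this way needs $m \ge L/(2\varepsilon)$, i.e. about $\left(L/(2\varepsilon)\right)^n$ evaluations. With $L = 1$, $\varepsilon = 0.01$, and $n = 10$ that is $50^{10} \approx 9.8 \times 10^{16}$ evaluations. The grid is only one way to "look everywhere", so read this as an illustration of the scaling, not as a proof of the lower bound.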

So when someone says they have a global optimization method, they mean one of two things:

Provably global on a special class (e.g., branch-and-bound on integer programs, $\alpha$BB on factorable nonlinear programs, polynomial optimization with sum-of-squares). These have real guarantees but are restricted to problems with structure that can be exploited.
"Asymptotically global" by random exploration — given infinite time, the algorithm visits every region. That includes simulated annealing, genetic algorithms, particle swarm, ant colony, tabu search, …

"Eventually visits every point" is not an algorithm. Pure uniform random search has the same property and is the trivial baseline. The interesting question for any meta-heuristic is: does it find good solutions quickly on the problems I actually care about?

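The trivial baseline mentioned above, pure uniform random search on $[0,1]^n$, fits in a few lines. This is a minimal sketch; the function name `random_search`, its parameters, and the `sphere` test objective are illustrative choices, not something from the lecture. Any meta-heuristic worth using should beat it at the same evaluation budget.

```python
import random

def random_search(f, n, budget, seed=0):
    """Uniform random search on [0, 1]^n. Returns (best_x, best_value)."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(budget):
        x = [rng.random() for _ in range(n)]  # independent uniform sample
        fx = f(x)
        if fx < best_f:                       # keep the best point seen so far
            best_x, best_f = x, fx
    return best_x, best_f

def sphere(x):
    # Hypothetical smooth test objective with its minimum at (0.5, ..., 0.5).
    return sum((xi - 0.5) ** 2 for xi in x)

if __name__ == "__main__":
    for budget in (10, 100, 1000):
        print(budget, random_search(sphere, n=3, budget=budget)[1])
```

Comparing at a fixed `budget` is the point: it is the budget control that benchmark comparisons often lack.
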
The unifying frame: structured random exploration

Almost every meta-heuristic is a randomized greedy procedure on a (typically combinatorial) search space. Strip away the metaphor and you find the same three pieces:

Piece	What it does
A neighborhood / mutation operator	Generates new candidate solutions from old ones — often randomly, often local.
A bias toward better solutions	Selection, acceptance, pheromone reinforcement, fitness-proportionate sampling, etc.
A way to escape local optima	Temperature, mutation rate, crossover, tabu list, restarts.

Once you see this skeleton, the bestiary becomes much shorter:

Simulated annealing: single chain + cooling

Random-walk MCMC with a temperature that decays to zero. Already covered in Part 1. The "escape" mechanism is high temperature early on; the "bias" is the Metropolis acceptance ratio.
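
As a reminder of the mechanics, here is a minimal sketch of simulated annealing on a continuous objective. The Gaussian proposal, the geometric cooling schedule, and all parameter names and defaults are assumptions made for illustration; they are not necessarily the choices used in Part 1.

```python
import math
import random

def simulated_annealing(f, x0, steps=5000, t0=1.0, cooling=0.999,
                        step_size=0.1, seed=0):
    """Minimize f from x0. Returns (best_x, best_value)."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    best_x, best_f = list(x), fx
    t = t0
    for _ in range(steps):
        # Neighborhood operator: a Gaussian random-walk step.
        cand = [xi + rng.gauss(0.0, step_size) for xi in x]
        fc = f(cand)
        # Metropolis rule: always accept improvements; accept a worse
        # candidate with probability exp(-(fc - fx) / t).
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = list(x), fx
        t *= cooling  # geometric cooling: temperature decays toward zero
    return best_x, best_f
```

With these defaults the temperature ends near $0.999^{5000} \approx 0.007$, so the chain explores freely early on and behaves almost greedily by the end.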

Genetic algorithms: population + recombination

A population of candidate solutions evolves by selection (better solutions reproduce more), crossover (combine two solutions), and mutation (random perturbation). The "escape" mechanism is mutation plus diversity in the population.
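
The three operators can be sketched on bit strings as follows. This is one common variant among many, chosen for brevity: binary tournament selection, one-point crossover, independent bit-flip mutation, and a single elite carried over each generation. The function name, parameters, and defaults are illustrative assumptions.

```python
import random

def genetic_algorithm(fitness, n_bits, pop_size=40, generations=60,
                      mutation_rate=0.02, seed=0):
    """Maximize fitness over bit strings of length n_bits. Returns the best one."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def tournament(population):
        # Selection: the fitter of two random individuals gets to reproduce.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [max(pop, key=fitness)[:]]       # elitism: keep the current best
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n_bits)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit independently with small probability.
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)

if __name__ == "__main__":
    # "OneMax": fitness is the number of ones, so the optimum is all ones.
    print(genetic_algorithm(sum, n_bits=20))
```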

Ant colony optimization: indirect memory

Many "ants" each construct a candidate solution by random walk on a graph, biased by pheromone levels on edges. Edges in good solutions get more pheromone; pheromone evaporates over time. Originally for the traveling salesman; now applied to vehicle routing and scheduling. The "memory" is in the shared environment, not the agents.
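
A minimal sketch of the idea on a small symmetric TSP instance. It follows the general shape of the basic "ant system" (construct tours, evaporate, deposit), but the exact weighting `pheromone**alpha * (1/distance)**beta`, the deposit rule `1/length`, and every parameter value here are illustrative assumptions, not a tuned or canonical implementation.

```python
import math
import random

def ant_colony_tsp(dist, n_ants=20, iterations=50, evaporation=0.5,
                   alpha=1.0, beta=2.0, seed=0):
    """dist: symmetric matrix of positive distances. Returns (tour, length)."""
    rng = random.Random(seed)
    n = len(dist)
    pher = [[1.0] * n for _ in range(n)]  # shared memory: pheromone per edge
    best_tour, best_len = None, float("inf")

    def tour_length(tour):
        return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

    for _ in range(iterations):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                cur = tour[-1]
                cands = sorted(unvisited)
                # Random walk biased by pheromone and by inverse distance.
                weights = [pher[cur][j] ** alpha * (1.0 / dist[cur][j]) ** beta
                           for j in cands]
                nxt = rng.choices(cands, weights=weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            tours.append((tour_length(tour), tour))
        # Evaporation: every edge loses a fixed fraction of its pheromone.
        for i in range(n):
            for j in range(n):
                pher[i][j] *= 1.0 - evaporation
        # Reinforcement: edges on shorter tours receive more pheromone.
        for length, tour in tours:
            for i in range(n):
                a, b = tour[i], tour[(i + 1) % n]
                pher[a][b] += 1.0 / length
                pher[b][a] += 1.0 / length
        it_len, it_tour = min(tours)
        if it_len < best_len:
            best_tour, best_len = it_tour, it_len
    return best_tour, best_len
```

Note that no ant carries state between iterations; everything learned lives in `pher`.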

Particle swarm optimization: population + velocity

Particles fly through the search space with velocities pulled toward the best point each particle has seen and the best the swarm has seen. Conceptually a leaky momentum-based sampler with social information.
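
The velocity update is easier to read as code. Below is a minimal sketch of a common textbook form on $[0,1]^n$: each velocity is a damped copy of the old one plus random pulls toward the particle's own best point and the swarm's best point. The coefficient names and values (`inertia`, `c_personal`, `c_social`) and the clipping to the box are illustrative assumptions.

```python
import random

def particle_swarm(f, n, n_particles=20, iterations=100, inertia=0.7,
                   c_personal=1.5, c_social=1.5, seed=0):
    """Minimize f on [0, 1]^n. Returns (best_x, best_value)."""
    rng = random.Random(seed)
    xs = [[rng.random() for _ in range(n)] for _ in range(n_particles)]
    vs = [[0.0] * n for _ in range(n_particles)]
    pbest = [x[:] for x in xs]               # best point each particle has seen
    pbest_f = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]  # best point the swarm has seen

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(n):
                r1, r2 = rng.random(), rng.random()
                vs[i][d] = (inertia * vs[i][d]
                            + c_personal * r1 * (pbest[i][d] - xs[i][d])
                            + c_social * r2 * (gbest[d] - xs[i][d]))
                # Move, then clip back into the box.
                xs[i][d] = min(1.0, max(0.0, xs[i][d] + vs[i][d]))
            fx = f(xs[i])
            if fx < pbest_f[i]:
                pbest[i], pbest_f[i] = xs[i][:], fx
                if fx < gbest_f:
                    gbest, gbest_f = xs[i][:], fx
    return gbest, gbest_f
```

The `inertia < 1` factor is the "leak" in the momentum; the `gbest` term is the social information.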

Tabu search: explicit memory

Greedy local search that keeps a short list of recently visited solutions and forbids revisiting them. The search always moves to the best allowed neighbor, even when that neighbor is worse than the current solution; the "tabu list" is what keeps it from cycling straight back. Used mostly on combinatorial problems with cheap neighborhood moves (e.g., swap two edges).
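
A minimal sketch on bit strings with a single-bit-flip neighborhood. The tabu list here stores whole solutions, as described above; practical implementations often store recent moves instead. The function name, the parameters, and the `trap` objective are hypothetical, chosen so that plain greedy descent gets stuck at the start.

```python
from collections import deque

def tabu_search(cost, start, iterations=200, tabu_size=10):
    """Minimize cost over bit tuples. Returns (best_solution, best_cost)."""
    current = tuple(start)
    best, best_cost = current, cost(current)
    tabu = deque([current], maxlen=tabu_size)  # short memory of recent solutions
    for _ in range(iterations):
        # Neighborhood: all solutions that differ in exactly one bit.
        neighbors = [current[:i] + (1 - current[i],) + current[i + 1:]
                     for i in range(len(current))]
        allowed = [s for s in neighbors if s not in tabu]
        if not allowed:
            break
        # Move to the best allowed neighbor, even if it is worse than current.
        current = min(allowed, key=cost)
        tabu.append(current)
        if cost(current) < best_cost:
            best, best_cost = current, cost(current)
    return list(best), best_cost

def trap(s):
    # Hypothetical objective: all-zeros is a local minimum (cost 0),
    # all-ones is the global minimum (cost -5), with a barrier in between.
    return {0: 0, 1: 2, 2: 1, 3: -1, 4: -5}[sum(s)]

if __name__ == "__main__":
    print(tabu_search(trap, [0, 0, 0, 0]))  # → ([1, 1, 1, 1], -5)
```

Starting from all zeros, every neighbor is worse, so pure descent would stop immediately. The tabu rule forces an uphill step and then blocks the step back.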

You can write down a 50-line skeleton common to all of them: maintain state, propose a perturbation, score, accept or update memory, repeat. The names differ; the math is mostly bookkeeping.
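
One way to write that skeleton for the single-state case, as a sketch. The callback names (`init`, `propose`, `score`, `accept`) are hypothetical; population methods would carry a list of states and a memory object through the same loop.

```python
import random

def stochastic_search(init, propose, score, accept, steps, seed=0):
    """Generic loop: maintain state, propose, score, accept, repeat.

    Minimizes score. Returns (best_state, best_score).
    """
    rng = random.Random(seed)
    state = init(rng)
    state_score = score(state)
    best, best_score = state, state_score
    for step in range(steps):
        cand = propose(state, rng)            # neighborhood / mutation operator
        cand_score = score(cand)
        # The acceptance rule holds both the bias toward better solutions
        # and the escape mechanism (it may accept worse candidates).
        if accept(state_score, cand_score, step, rng):
            state, state_score = cand, cand_score
        if state_score < best_score:
            best, best_score = state, state_score
    return best, best_score

if __name__ == "__main__":
    # Greedy hill climbing is the special case that never accepts a worse move.
    print(stochastic_search(
        init=lambda rng: 0.0,
        propose=lambda x, rng: x + rng.gauss(0.0, 0.5),
        score=lambda x: (x - 2.0) ** 2,
        accept=lambda old, new, step, rng: new <= old,
        steps=2000,
    ))
```

Swapping `accept` for a temperature-dependent Metropolis rule turns the same loop into simulated annealing; the loop itself does not change.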

A skeptical view

I'll be upfront: I'm fairly skeptical of meta-heuristics as a general optimization technology. The skepticism boils down to four points.

The metaphors are doing a lot of work. Calling a stochastic local search "evolution" or an "ant colony" doesn't add explanatory power; it adds vocabulary. Stripped of metaphor, most of these are stochastic local search with a perturbation rule.
Convergence guarantees are weak. "Will find the optimum given infinite time" describes uniform random search too. Finite-time convergence rates are typically not available, or are problem-specific.
Comparisons in the literature are noisy. A new algorithm beats baselines on a benchmark; the baselines were tuned poorly; nobody re-runs with budget control. "X outperforms Y on this benchmark" rarely transfers.
If you have problem structure, use it. A linear program, an integer program, a convex relaxation, or even a good local method with random restarts will usually beat a meta-heuristic when the structure is known. Meta-heuristics are most defensible when you have a black-box objective and no structure to exploit.

Hedge: I am not an expert on these methods. There are subareas (CMA-ES for continuous black-box optimization, modern evolutionary strategies for reinforcement learning, ant colony for some combinatorial problems) where the techniques are competitive and well-studied. The Wikipedia articles on metaheuristic, genetic algorithm, ant colony optimization, simulated annealing, and CMA-ES are good entry points. Form your own view.

When meta-heuristics actually help

The honest case for them:

Combinatorial problems with a cheap neighborhood operator. TSP with 2-opt, scheduling with swap moves, layout problems. Random-restart hill-climbing already does very well, and meta-heuristics are small variations on it.
Black-box objectives with no derivatives and no structure. Hyperparameter tuning over discrete choices, simulator-based optimization, cases where every evaluation is a Monte Carlo simulation.
Anytime algorithms. You need some answer in 5 minutes, a better one in an hour, the best you can do overnight. A population method gives you a smooth quality-vs-time trade-off without a stopping criterion.
"Jostling things around a bit." Plenty of practical problems are mostly-convex with a few annoying local minima. A few random perturbations followed by local descent often beats a single elaborate algorithm.

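The last item, perturb and then descend, can be sketched in one dimension. This is a minimal illustration in the spirit of basin-hopping-style methods, not a reference implementation: the finite-difference gradient, the step sizes, and the `wavy` objective are all assumptions made for the example.

```python
import math
import random

def local_descent(f, x, lr=0.01, steps=500, h=1e-6):
    """Plain gradient descent in 1-D with a central-difference gradient."""
    for _ in range(steps):
        grad = (f(x + h) - f(x - h)) / (2 * h)
        x -= lr * grad
    return x

def perturb_and_descend(f, x0, n_kicks=20, kick=1.0, seed=0):
    """Local descent, then repeated random kicks, each followed by descent."""
    rng = random.Random(seed)
    best_x = local_descent(f, x0)
    best_f = f(best_x)
    for _ in range(n_kicks):
        # Jostle the best point found so far, then descend again.
        x = local_descent(f, best_x + rng.gauss(0.0, kick))
        if f(x) < best_f:                 # keep the result only if it improves
            best_x, best_f = x, f(x)
    return best_x, best_f

def wavy(x):
    # Hypothetical objective: a shallow bowl with a few local minima on top.
    return 0.1 * x * x + math.sin(3 * x)

if __name__ == "__main__":
    print("descent only:", wavy(local_descent(wavy, 3.5)))
    print("with kicks:  ", perturb_and_descend(wavy, 3.5)[1])
```

The result can never be worse than plain local descent from the same start, because a kicked run is kept only when it improves on the current best.
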
What none of them buy you is a global guarantee in finite time. If you find someone selling that, look closely at the assumptions.