CS520 - Lecture 29

Other types of methods for large-scale optimization

Computational Methods in Optimization

CS 520

David F. Gleich

Purdue University

Alternating Optimization

Non-negative matrix factorization

$\min_{W \ge 0,\, H \ge 0} \|A - WH\|_F^2$

Fix $W$, solve for $H$. Fix $H$, solve for $W$.

Repeat.
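A minimal sketch of this alternating scheme, assuming the Frobenius-norm NMF objective and using SciPy's `scipy.optimize.nnls` to solve each nonnegative least-squares subproblem exactly (one column of $H$, or one row of $W$, at a time). The function name `nmf_anls`, the random initialization, and the fixed iteration count are illustrative assumptions, not part of the lecture.

```python
import numpy as np
from scipy.optimize import nnls


def nmf_anls(A, k, iters=50, seed=0):
    """Alternating nonnegative least squares for min ||A - W H||_F^2 with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    history = []
    for _ in range(iters):
        # Fix W, solve for H: one NNLS problem per column of A.
        for j in range(n):
            H[:, j], _ = nnls(W, A[:, j])
        # Fix H, solve for W: one NNLS problem per row of A (using H^T as the matrix).
        for i in range(m):
            W[i, :], _ = nnls(H.T, A[i, :])
        # Record the objective after each full sweep.
        history.append(np.linalg.norm(A - W @ H, "fro") ** 2)
    return W, H, history


rng = np.random.default_rng(1)
A = rng.random((20, 4)) @ rng.random((4, 15))  # nonnegative, rank-4 test matrix
W, H, history = nmf_anls(A, k=4)
```

Because each subproblem is solved exactly and the previous iterate is always feasible, the recorded objective should never increase. That by itself does not settle whether the iterates converge.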

Does it converge?

Block coordinate descent

Gauss-Seidel

Alternating direction

~1980s: lots of study

~2010-2015: renewed study in machine learning, compressed sensing, and sparse 1-norm problems

Bertsekas in the book Nonlinear Programming

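Suppose $f$ is continuously differentiable and we want

$\min_x f(x_1, \ldots, x_m)$ where $x = (x_1, \ldots, x_m) \in X = X_1 \times \cdots \times X_m$ and each $X_i$ is closed and convex.

Think of each $x_i$ as a block of variables. Block coordinate descent updates one block at a time:

$x_i^{k+1} \in \arg\min_{\xi \in X_i} f(x_1^{k+1}, \ldots, x_{i-1}^{k+1}, \xi, x_{i+1}^{k}, \ldots, x_m^{k})$

If the minimum of each of these subproblems is uniquely attained, then every limit point of the sequence $\{x^k\}$ is a stationary point.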
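To illustrate the setting, here is a sketch of block coordinate descent on an assumed test problem: the strongly convex quadratic $f(x) = \tfrac12 x^T Q x - b^T x$ with $Q$ symmetric positive definite and no constraints. Every block subproblem then has a unique minimizer, obtained by a small linear solve, and the method is block Gauss-Seidel applied to $Qx = b$. The function name and block partition are hypothetical choices for illustration.

```python
import numpy as np


def block_coordinate_descent(Q, b, blocks, sweeps=200):
    """Minimize 0.5 x^T Q x - b^T x by exactly minimizing over one block at a time."""
    x = np.zeros_like(b)
    for _ in range(sweeps):
        for idx in blocks:
            Qii = Q[np.ix_(idx, idx)]
            # Contribution of all *other* blocks to the block-i gradient: sum_{j != i} Q_ij x_j.
            rest = Q[idx, :] @ x - Qii @ x[idx]
            # Setting the block-i gradient Q_ii x_i + rest - b_i to zero gives the unique minimizer.
            x[idx] = np.linalg.solve(Qii, b[idx] - rest)
    return x


rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
Q = M @ M.T + 6 * np.eye(6)  # symmetric positive definite
b = rng.standard_normal(6)
blocks = [np.arange(0, 2), np.arange(2, 4), np.arange(4, 6)]
x = block_coordinate_descent(Q, b, blocks)
```

Here the unique stationary point is the solution of $Qx = b$, so the result can be checked against a direct solve.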

Bertsekas in the book Nonlinear Programming

Suppose there are just two blocks ($m = 2$)

[Grippo & Sciandrone]

Then we don’t need a unique minimizer anymore, and we can treat more general convex problems!

The Alternating Direction Method of Multipliers is a more general setting

More general problem theory
Take your problem, break it into solvable pieces, and then put a Lagrange multiplier on the equality constraint that couples them

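e.g.

$\min_{x, z} f(x) + g(z)$ subject to $Ax + Bz = c$

Then the augmented Lagrangian is

$L_\rho(x, z, y) = f(x) + g(z) + y^T(Ax + Bz - c) + \frac{\rho}{2}\|Ax + Bz - c\|_2^2$

solve for $x$ given $z$ and the Lagrange multipliers $y$: $x^{k+1} = \arg\min_x L_\rho(x, z^k, y^k)$

solve for $z$ given $x$ and the Lagrange multipliers $y$: $z^{k+1} = \arg\min_z L_\rho(x^{k+1}, z, y^k)$

update the Lagrange multipliers: $y^{k+1} = y^k + \rho(Ax^{k+1} + Bz^{k+1} - c)$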
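As a concrete sketch of the three ADMM steps, consider the sparse 1-norm (lasso) problem $\min_{x,z} \tfrac12\|Dx - b\|_2^2 + \lambda\|z\|_1$ subject to $x - z = 0$. This test problem, the data matrix name $D$, and the fixed penalty $\rho = 1$ are assumptions chosen for illustration. The $x$-step becomes a linear solve and the $z$-step becomes soft thresholding.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise minimizer of t*|z| + 0.5*(z - v)^2."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def admm_lasso(D, b, lam, rho=1.0, iters=2000):
    """ADMM for min 0.5||D x - b||^2 + lam ||z||_1 subject to x - z = 0."""
    n = D.shape[1]
    x, z, y = np.zeros(n), np.zeros(n), np.zeros(n)
    K = D.T @ D + rho * np.eye(n)  # same system matrix for every x-update
    Dtb = D.T @ b
    for _ in range(iters):
        # x-step: minimize the augmented Lagrangian over x with z, y fixed.
        x = np.linalg.solve(K, Dtb - y + rho * z)
        # z-step: minimize over z with x, y fixed (soft thresholding).
        z = soft_threshold(x + y / rho, lam / rho)
        # Multiplier step: y <- y + rho * (x - z).
        y = y + rho * (x - z)
    return x, z, y


rng = np.random.default_rng(0)
D = rng.standard_normal((30, 10)) / np.sqrt(30)
x_true = np.zeros(10)
x_true[:3] = [1.0, -2.0, 0.5]
b = D @ x_true + 0.01 * rng.standard_normal(30)
lam = 0.1 * np.max(np.abs(D.T @ b))
x, z, y = admm_lasso(D, b, lam)
```

At a solution the two copies agree ($x = z$) and $\|D^T(b - Dz)\|_\infty \le \lambda$, which is the lasso optimality condition; both are easy checks on the output.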

A favorite example of ADMM

Overlapping, non-exhaustive clustering

Hou, Whang, Gleich, Dhillon. Fast Multiplier Methods for Non-exhaustive Overlapping Clustering

Stochastic Gradient Descent

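SGD works nicely when your objective function is separable, e.g., $f(x) = \frac{1}{n}\sum_{i=1}^n f_i(x)$: the gradient of one randomly chosen term $f_i$ is an unbiased estimate of $\nabla f(x)$, because $f$ is an expectation over the terms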

we’ve seen examples of this in other lectures
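A minimal SGD sketch on a separable least-squares objective $f(x) = \frac{1}{n}\sum_i \tfrac12 (a_i^T x - b_i)^2$, sampling one term per step. The consistent test system (so every term is minimized at the same point), the constant step size, and the function name are assumptions for illustration.

```python
import numpy as np


def sgd_least_squares(A, b, step=0.02, epochs=20, seed=0):
    """SGD on f(x) = (1/n) sum_i 0.5 * (a_i^T x - b_i)^2, one sampled term per step."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs * n):
        i = rng.integers(n)          # pick one term uniformly at random
        residual = A[i] @ x - b[i]
        x -= step * residual * A[i]  # gradient of the i-th term only
    return x


rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5))
x_true = rng.standard_normal(5)
b = A @ x_true  # consistent system
x = sgd_least_squares(A, b)
```

Each step touches a single row of `A`, which is what makes the method cheap when $n$ is large; with noisy data a decreasing step size would usually be needed instead of a constant one.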

This doesn’t work so well when your objective is like:

max time s.t. the equations of motion (e.g. raptor)

min cost s.t. the object is buildable

min fuel s.t. we get to mars

In these cases, reformulate to get an expectation a different way!