Other types of methods for large-scale optimization 
Computational Methods in Optimization
CS 520
David F. Gleich
Purdue University 
Alternating Optimization 
Non-negative matrix factorization 

 
Fix the first factor, solve for the second.

Fix the second factor, solve for the first.

Repeat.
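
The fix-one-solve-for-the-other loop can be sketched on the simplest case, a rank-one factorization. This is a minimal illustration, not the formulation from the slide: the names `w`, `h`, `rank1_nmf`, and `objective` are assumptions, and the objective is taken to be the squared Frobenius error of $A - w h^T$ with $w \ge 0$, $h \ge 0$. With one factor fixed, each entry of the other solves a scalar nonnegative least-squares problem, so the update is a clipped ratio.

```python
def objective(A, w, h):
    """Squared Frobenius error ||A - w h^T||_F^2."""
    return sum((A[i][j] - w[i] * h[j]) ** 2
               for i in range(len(w)) for j in range(len(h)))


def rank1_nmf(A, iters=50):
    """Alternate exact nonnegative updates of w and h for A ~ w h^T."""
    m, n = len(A), len(A[0])
    w = [1.0] * m  # arbitrary positive starting guess (an assumption)
    h = [1.0] * n
    history = []
    for _ in range(iters):
        # Fix w, solve for h: clip the unconstrained minimizer at zero.
        ww = sum(wi * wi for wi in w)
        if ww > 0.0:
            h = [max(0.0, sum(w[i] * A[i][j] for i in range(m)) / ww)
                 for j in range(n)]
        # Fix h, solve for w: the same scalar problem, row by row.
        hh = sum(hj * hj for hj in h)
        if hh > 0.0:
            w = [max(0.0, sum(A[i][j] * h[j] for j in range(n)) / hh)
                 for i in range(m)]
        history.append(objective(A, w, h))
    return w, h, history
```

Because each half-step minimizes the objective exactly over one factor, the values in `history` cannot increase. That alone does not say where the iterates end up, which is the question taken up next.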

Does it converge? 
Block coordinate descent 

Gauss-Seidel
Alternating direction

~1980s Lots of study

~2010-2015 More study among ML / Compressed sensing / sparse 1-norm
Bertsekas in the book Nonlinear Programming
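Suppose $f$ is continuously differentiable and we solve

minimize $f(x_1, \ldots, x_m)$ where each $x_i$ is in a convex domain $X_i$.
Think of each $x_i$ as a block of variables.

If the minimum of $f(x_1, \ldots, x_{i-1}, \xi, x_{i+1}, \ldots, x_m)$ over $\xi \in X_i$
is uniquely attained for every block $i$, then every limit point of the sequence of subproblem solutions is a stationary point.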
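
A small case where the uniqueness condition holds is a strictly convex quadratic with one variable per block. The sketch below assumes $f(x) = \tfrac{1}{2} x^T Q x - b^T x$ with $Q$ symmetric positive definite and no constraints. The function name and the test problem are illustrative, not from the lecture. Each one-variable subproblem has a unique minimizer because $Q_{ii} > 0$, and cycling through the variables is exactly the Gauss-Seidel iteration for $Qx = b$.

```python
def gauss_seidel_descent(Q, b, sweeps=100):
    """Cyclic exact coordinate minimization of 0.5*x'Qx - b'x."""
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Minimize over x[i] with the other coordinates held fixed:
            # set the i-th partial derivative (Qx - b)[i] to zero.
            rest = sum(Q[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - rest) / Q[i][i]
    return x
```

For `Q = [[4.0, 1.0], [1.0, 3.0]]` and `b = [1.0, 2.0]`, the stationary point is the solution of $Qx = b$, namely $(1/11, 7/11)$, and the iterates approach it quickly.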
Bertsekas in the book Nonlinear Programming
Suppose there are just two blocks 




[Grippo & Sciandrone] 
Then we don’t need a unique minimizer any more and we can treat more general convex problems! 
The Alternating Direction Method of Multipliers is a more general setting 
  • More general problem theory
  • Take your problem, break it into solvable pieces, and then put a Lagrange multiplier on the equality constraint

e.g.
e.g.

Then the augmented Lagrangian is 
solve for the first block of variables given the second block and the Lagrangian multipliers
solve for the second block of variables given the first block and the Lagrangian multipliers
update the Lagrangian multipliers, update
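
The three steps can be sketched on a toy splitting. Everything specific here is an assumption for illustration rather than the example from the slides: the problem is minimize $\tfrac{1}{2}\|x - a\|^2 + \lambda \|z\|_1$ subject to $x - z = 0$, the augmented Lagrangian is taken in the common form $f(x) + g(z) + y^T(x - z) + \tfrac{\rho}{2}\|x - z\|^2$, and the code stores the scaled multiplier $u = y/\rho$. The names `admm_l1_denoise` and `soft_threshold` are hypothetical.

```python
def soft_threshold(v, t):
    """Minimizer of t*|z| + 0.5*(z - v)^2 over scalar z."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def admm_l1_denoise(a, lam=1.0, rho=1.0, iters=200):
    """ADMM for min 0.5*||x - a||^2 + lam*||z||_1 subject to x = z."""
    n = len(a)
    x = [0.0] * n
    z = [0.0] * n
    u = [0.0] * n  # scaled multiplier u = y / rho
    for _ in range(iters):
        # x-step: smooth quadratic piece, solved in closed form.
        x = [(a[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        # z-step: 1-norm piece, solved by soft thresholding.
        z = [soft_threshold(x[i] + u[i], lam / rho) for i in range(n)]
        # Multiplier step: accumulate the constraint violation x - z.
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return x, z, u
```

This toy problem has the closed-form answer `soft_threshold(a[i], lam)` in each coordinate, which makes it easy to check that the split pieces agree at convergence. The point of the splitting is that each piece is easy on its own even when the combined problem is not.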

A favorite example of ADMM 
Overlapping, non-exhaustive clustering 
Hou, Whang, Gleich, Dhillon. Fast Multiplier Methods for Non-exhaustive Overlapping Clustering 
Stochastic Gradient Descent 
SGD works nicely when your objective function is separable

we’ve seen examples of this in other lectures
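
As a minimal sketch of the separable case, assume the objective is an average of per-sample terms, $f(x) = \tfrac{1}{n} \sum_i f_i(x)$ with $f_i(x) = \tfrac{1}{2}(a_i^T x - b_i)^2$. The function name, step size, and data below are illustrative assumptions. Each step samples one index and moves along the gradient of that single term, which is an unbiased estimate of the full gradient.

```python
import random


def sgd_least_squares(rows, targets, step=0.5, iters=5000, seed=0):
    """SGD on f(x) = (1/n) * sum_i 0.5*(rows[i].x - targets[i])^2."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    n = len(rows)
    x = [0.0] * len(rows[0])
    for _ in range(iters):
        i = rng.randrange(n)  # pick one term of the sum uniformly
        residual = sum(a * xi for a, xi in zip(rows[i], x)) - targets[i]
        # The gradient of the sampled term f_i is residual * rows[i].
        x = [xi - step * residual * a for a, xi in zip(rows[i], x)]
    return x


# Noise-free line fit: targets are exactly 2 + 3*t.
ts = [0.0, 0.25, 0.5, 0.75, 1.0]
rows = [[1.0, t] for t in ts]
targets = [2.0 + 3.0 * t for t in ts]
x_fit = sgd_least_squares(rows, targets)
```

A constant step size reaches the exact minimizer here only because the data are noise-free, so every $f_i$ is minimized at the same point. With noisy data a decaying step size is the usual choice.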

This doesn’t work so well when your objective is like

max time s.t. the equations of motion (e.g. raptor)

min cost s.t. the object is buildable

min fuel s.t. we get to mars 

In these cases, reformulate to get an expectation a different way!