# Homework 4

$\newcommand{\eps}{\varepsilon} \newcommand{\kron}{\otimes} \DeclareMathOperator{\diag}{diag} \DeclareMathOperator{\trace}{trace} \DeclareMathOperator{\rank}{rank} \DeclareMathOperator{\minimize}{minimize} \DeclareMathOperator{\subjectto}{subject to} \newcommand{\mat}[1]{\boldsymbol{#1}} \renewcommand{\vec}[1]{\boldsymbol{\mathrm{#1}}} \newcommand{\vecalt}[1]{\boldsymbol{#1}} \newcommand{\conj}[1]{\overline{#1}} \newcommand{\normof}[1]{\|#1\|} \newcommand{\onormof}[2]{\|#1\|_{#2}} \newcommand{\MIN}[2]{\begin{array}{ll} \minimize_{#1} & {#2} \end{array}} \newcommand{\MINone}[3]{\begin{array}{ll} \minimize_{#1} & {#2} \\ \subjectto & {#3} \end{array}} \newcommand{\OPTone}{\MINone} \newcommand{\itr}[2]{#1^{(#2)}} \newcommand{\itn}[1]{^{(#1)}} \newcommand{\prob}{\mathbb{P}} \newcommand{\probof}[1]{\prob\left\{ #1 \right\}} \newcommand{\pmat}[1]{\begin{pmatrix} #1 \end{pmatrix}} \newcommand{\bmat}[1]{\begin{bmatrix} #1 \end{bmatrix}} \newcommand{\spmat}[1]{\left(\begin{smallmatrix} #1 \end{smallmatrix}\right)} \newcommand{\sbmat}[1]{\left[\begin{smallmatrix} #1 \end{smallmatrix}\right]} \newcommand{\RR}{\mathbb{R}} \newcommand{\CC}{\mathbb{C}} \newcommand{\eye}{\mat{I}} \newcommand{\mA}{\mat{A}} \newcommand{\mB}{\mat{B}} \newcommand{\mC}{\mat{C}} \newcommand{\mD}{\mat{D}} \newcommand{\mE}{\mat{E}} \newcommand{\mF}{\mat{F}} \newcommand{\mG}{\mat{G}} \newcommand{\mH}{\mat{H}} \newcommand{\mI}{\mat{I}} \newcommand{\mJ}{\mat{J}} \newcommand{\mK}{\mat{K}} \newcommand{\mL}{\mat{L}} \newcommand{\mM}{\mat{M}} \newcommand{\mN}{\mat{N}} \newcommand{\mO}{\mat{O}} \newcommand{\mP}{\mat{P}} \newcommand{\mQ}{\mat{Q}} \newcommand{\mR}{\mat{R}} \newcommand{\mS}{\mat{S}} \newcommand{\mT}{\mat{T}} \newcommand{\mU}{\mat{U}} \newcommand{\mV}{\mat{V}} \newcommand{\mW}{\mat{W}} \newcommand{\mX}{\mat{X}} \newcommand{\mY}{\mat{Y}} \newcommand{\mZ}{\mat{Z}} \newcommand{\mLambda}{\mat{\Lambda}} \newcommand{\mPbar}{\bar{\mP}} \newcommand{\ones}{\vec{e}} \newcommand{\va}{\vec{a}} 
\newcommand{\vb}{\vec{b}} \newcommand{\vc}{\vec{c}} \newcommand{\vd}{\vec{d}} \newcommand{\ve}{\vec{e}} \newcommand{\vf}{\vec{f}} \newcommand{\vg}{\vec{g}} \newcommand{\vh}{\vec{h}} \newcommand{\vi}{\vec{i}} \newcommand{\vj}{\vec{j}} \newcommand{\vk}{\vec{k}} \newcommand{\vl}{\vec{l}} \newcommand{\vm}{\vec{l}} \newcommand{\vn}{\vec{n}} \newcommand{\vo}{\vec{o}} \newcommand{\vp}{\vec{p}} \newcommand{\vq}{\vec{q}} \newcommand{\vr}{\vec{r}} \newcommand{\vs}{\vec{s}} \newcommand{\vt}{\vec{t}} \newcommand{\vu}{\vec{u}} \newcommand{\vv}{\vec{v}} \newcommand{\vw}{\vec{w}} \newcommand{\vx}{\vec{x}} \newcommand{\vy}{\vec{y}} \newcommand{\vz}{\vec{z}} \newcommand{\vpi}{\vecalt{\pi}}$


Please answer the following questions in complete sentences in a typed manuscript and submit the solution to me in class on February 16th, 2012.

## Problem 1: Covering the basics.

1. Decrease alone is insufficient! Construct a sequence of iterates $\{x_k\}$ such that $f(x_{k+1}) < f(x_k)$ for a convex function $f$, but that does not converge to a minimizer of $f$. (Hint: think $f(x) = x^2$.)

2. Back to calculus. Let
   $$L_k(\alpha) = f(\vx_k + \alpha \vp_k)$$
   be the line search function at the $k$th iteration of an optimization algorithm, where $\vp_k$ is the search direction and $g$ denotes the gradient of $f$. Use the definition of the directional derivative to show that $L_k'(\alpha) = g(\vx_k + \alpha \vp_k)^T \vp_k$.
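One way to sanity-check this identity before relying on it in code is a central-difference comparison. The sketch below (Python/NumPy for illustration; the quadratic $f$ and the choices of `xk` and `pk` are made up for the check) compares the claimed derivative against a finite difference of $L_k$:

```python
import numpy as np

# Check L_k'(alpha) = g(x_k + alpha p_k)^T p_k numerically on a small
# quadratic f(x) = 0.5 x^T Q x - c^T x; Q, c, xk, pk are illustrative.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
c = np.array([1.0, -1.0])

f = lambda x: 0.5 * x @ Q @ x - c @ x
g = lambda x: Q @ x - c                  # gradient of f

xk = np.array([0.5, -0.3])               # current iterate
pk = np.array([1.0, 2.0])                # search direction
L = lambda a: f(xk + a * pk)             # line search function

alpha, h = 0.7, 1e-6
fd = (L(alpha + h) - L(alpha - h)) / (2 * h)   # central difference
exact = g(xk + alpha * pk) @ pk                # the claimed derivative
print(abs(fd - exact))                   # agreement to roundoff
```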

## Problem 2: Finally! An optimization algorithm

Non-negative least squares is an important variant of least squares. Formally, it is:

$$\MINone{\vx}{\normof{\mA \vx - \vb}^2}{\vx \ge 0.}$$

We’ll see this problem again when we study constrained optimization. Here, we’ll investigate a log-barrier function to approximate it in an unconstrained manner:

$$\MIN{\vx}{f(\vx) = \normof{\mA \vx - \vb}^2 - \mu \sum_i \log(x_i),}$$

where $\log$ is an elementwise function; use the definition where $\log(x) = -\infty$ if $x \le 0$.
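With this convention the objective can be evaluated directly; points outside the positive orthant simply receive the value $+\infty$. A minimal sketch (Python/NumPy for illustration; `barrier_objective` is a hypothetical helper name, not part of the course code):

```python
import numpy as np

def barrier_objective(x, A, b, mu):
    """Log-barrier objective ||Ax - b||^2 - mu * sum(log(x)), using the
    convention log(x_i) = -inf for x_i <= 0, so the objective is +inf there."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        return np.inf                    # infeasible for the barrier
    r = A @ x - b
    return r @ r - mu * np.sum(np.log(x))
```

For example, with $\mA = \eye$, $\vb = \ones$, and $\vx = \ones$, both the residual and the log terms vanish and the objective is zero.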

1. Determine the gradient and Hessian of $f$.

2. What did you learn in class that you should always do after step 1? Do it.
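If that step involves comparing analytic derivatives against finite differences, a generic checker can be sketched as follows (Python for illustration; `check_gradient` is a hypothetical helper name):

```python
import numpy as np

def check_gradient(f, g, x, h=1e-6):
    """Return the max componentwise gap between the analytic gradient g(x)
    and a central-difference approximation of the gradient of f at x."""
    x = np.asarray(x, dtype=float)
    fd = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        fd[i] = (f(x + e) - f(x - e)) / (2 * h)
    return np.max(np.abs(fd - g(x)))

# Example: f(x) = ||x||^2 has gradient 2x, so the gap should be tiny.
err = check_gradient(lambda x: x @ x, lambda x: 2 * x, np.array([0.3, -1.2]))
```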

3. Implement a backtracking line search routine that satisfies sufficient decrease. Convince your professor and TA that your implementation does not have any flaws. Discuss any flaws you know of.
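For concreteness, a minimal backtracking routine that enforces the Armijo sufficient-decrease condition might look like the following (Python rather than Matlab for illustration; the parameters `rho` and `c` are conventional choices, not values prescribed by the assignment):

```python
import numpy as np

def backtracking(f, fx, gx, x, p, alpha0=1.0, rho=0.5, c=1e-4, max_iter=50):
    """Shrink alpha until the Armijo (sufficient decrease) condition
    f(x + alpha p) <= f(x) + c alpha g(x)^T p holds."""
    alpha = alpha0
    slope = gx @ p                       # directional derivative; must be < 0
    for _ in range(max_iter):
        if f(x + alpha * p) <= fx + c * alpha * slope:
            return alpha
        alpha *= rho                     # backtrack
    return alpha                         # flaw: may return without sufficient decrease

# Example on f(x) = ||x||^2 with the steepest descent direction p = -g.
f = lambda x: x @ x
x = np.array([1.0, 2.0])
gx = 2 * x
a = backtracking(f, f(x), gx, x, -gx)
```

One flaw worth discussing: if `max_iter` is exhausted (for instance, when `p` is not a descent direction), the routine returns a step that violates the condition.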

4. Modify the `gradient_descent_1.m` and `newtons_method_1.m` functions to use your backtracking line search. Read page 59 of Nocedal and Wright and follow the advice there. (Use equation (3.60) if applicable.)
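As a sketch of how the pieces fit together, here is a steepest descent driver with the backtracking loop inlined (Python rather than a drop-in edit to `gradient_descent_1.m`; the infinity-norm stopping test mirrors the convergence criterion used later in this problem):

```python
import numpy as np

def gradient_descent(f, g, x0, tol=1e-4, max_iter=10000):
    """Steepest descent with Armijo backtracking (illustrative sketch)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        gx = g(x)
        if np.max(np.abs(gx)) < tol:     # infinity-norm stopping test
            break
        p = -gx                          # steepest descent direction
        alpha, fx = 1.0, f(x)
        while f(x + alpha * p) > fx + 1e-4 * alpha * (gx @ p):
            alpha *= 0.5                 # backtrack until sufficient decrease
        x = x + alpha * p
    return x

# Example: minimize ||x - (1, 2)||^2 from the origin.
target = np.array([1.0, 2.0])
xstar = gradient_descent(lambda x: (x - target) @ (x - target),
                         lambda x: 2 * (x - target),
                         np.zeros(2))
```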

5. Suppose that $\vx\itn{0}$ is strictly positive.
Does $\vx\itn{k}$ stay strictly positive with these two algorithms? Discuss or prove.

6. Show plots of the convergence in terms of function values and of infinity norms of the gradients of the methods for the data:

        A = [0.0372    0.2869
             0.6861    0.7071
             0.6233    0.6245
             0.6344    0.6170];
        b = [0.8587
             0.1781
             0.0747
             0.8405];
        mu = 0.1;
7. Suppose we say that a method converges when the infinity norm of the gradient is less than $10^{-4}$. Plot the number of function evaluations for your new steepest descent method and for Newton's method for the values $\mu \in \{10^{-1}, 10^{-2}, 10^{-3}, 10^{-4}, 10^{-5}, 10^{-6}\}$.
Show your solutions for each of these cases, and compare them to the solutions from Matlab’s `fminunc` routine.
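For the `fminunc` comparison, `scipy.optimize.minimize` plays the analogous role in this Python sketch (the data are from part 6; the starting point and the choice of the derivative-free Nelder-Mead method are assumptions made for illustration):

```python
import numpy as np
from scipy.optimize import minimize

A = np.array([[0.0372, 0.2869],
              [0.6861, 0.7071],
              [0.6233, 0.6245],
              [0.6344, 0.6170]])
b = np.array([0.8587, 0.1781, 0.0747, 0.8405])
mu = 0.1

def f(x):
    """Barrier objective; +inf outside the positive orthant."""
    if np.any(x <= 0):
        return np.inf
    r = A @ x - b
    return r @ r - mu * np.sum(np.log(x))

x0 = np.array([1.0, 1.0])                # assumed starting point
res = minimize(f, x0, method='Nelder-Mead')
print(res.x, res.fun)                    # minimizer stays strictly positive
```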