\documentclass[]{article}
\usepackage{lmodern}
\usepackage{amssymb,amsmath}
\usepackage{ifxetex,ifluatex}
\usepackage{fixltx2e} % provides \textsubscript
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\else % if luatex or xelatex
\ifxetex
\usepackage{mathspec}
\makeatletter % undo the wrong changes made by mathspec
\let\RequirePackage\original@RequirePackage
\let\usepackage\RequirePackage
\makeatother
\else
\usepackage{fontspec}
\fi
\defaultfontfeatures{Ligatures=TeX,Scale=MatchLowercase}
\fi
% use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
% use microtype if available
\IfFileExists{microtype.sty}{%
\usepackage{microtype}
\UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
}{}
\usepackage{hyperref}
\hypersetup{unicode=true,
pdftitle={Homework 4},
pdfborder={0 0 0},
breaklinks=true}
\urlstyle{same} % don't use monospace font for urls
\IfFileExists{parskip.sty}{%
\usepackage{parskip}
}{% else
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}
}
\setlength{\emergencystretch}{3em} % prevent overfull lines
\providecommand{\tightlist}{%
\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
\setcounter{secnumdepth}{0}
% Redefines (sub)paragraphs to behave more like sections
\ifx\paragraph\undefined\else
\let\oldparagraph\paragraph
\renewcommand{\paragraph}[1]{\oldparagraph{#1}\mbox{}}
\fi
\ifx\subparagraph\undefined\else
\let\oldsubparagraph\subparagraph
\renewcommand{\subparagraph}[1]{\oldsubparagraph{#1}\mbox{}}
\fi
\input{preamble.tex}
\title{Homework 4}
\begin{document}
\maketitle
\section{Homework 4}\label{homework-4}
Please answer the following questions in complete sentences in a clearly
prepared manuscript and submit the solution by the due date on
Blackboard (around Sunday, September 23rd, 2018).
Remember that this is a graduate class. There may be elements of the
problem statements that require you to fill in appropriate assumptions.
You are also responsible for determining what evidence to include. An
answer alone is rarely sufficient, but an overly verbose description is
not required either. Use your judgement to focus your discussion on the
most interesting pieces. The answer to ``should I include `something' in
my solution?'' will almost always be: Yes, if you think it helps support
your answer.
\subsection{Problem 0: Homework
checklist}\label{problem-0-homework-checklist}
\begin{itemize}
\item
Please identify anyone, whether or not they are in the class, with
whom you discussed your homework. This problem is worth 1 point, but
on a multiplicative scale.
\item
Make sure you have included your source-code and prepared your
solution according to the most recent Piazza note on homework
submissions.
\end{itemize}
\subsection{Problem 1: Generalizing coordinate and gradient
descent}\label{problem-1-generalizing-coordinate-and-gradient-descent}
The coordinate descent and gradient descent algorithms for solving
\(\mA \vx = \vb\) with a symmetric positive definite matrix \(\mA\)
correspond to minimizing
\[ f(\vx) = \tfrac{1}{2} \vx^T \mA \vx - \vx^T \vb \] by, at each step,
exactly minimizing this function over a single coordinate (coordinate
descent) or exactly minimizing it along the direction of the gradient
(gradient descent). Let \(\vx_k\) represent the approximate solution
after \(k\) steps, and let \(\vg_k = \mA \vx_k - \vb\) be the gradient
at the \(k\)th step. Then the algorithms can be written:
\[ \vx_{k+1} = \vx_{k} - \frac{[\vg_k]_i}{A_{i,i}} \ve_i \qquad \vx_{k+1} = \vx_{k} - \frac{\vg_k^T \vg_k}{\vg_k^T \mA \vg_k} \vg_k \]
respectively.
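As a point of reference, the two updates above can be sketched directly in
code. This is a naive sketch only: it recomputes the full gradient with a
fresh matrix-vector product at every step, which is exactly the cost that
Problems 2 and 3 ask you to avoid, and the small matrix and vectors below
are purely illustrative.

```python
import numpy as np

def coordinate_step(A, x, b, i):
    """One exact coordinate-descent step on coordinate i (formula above)."""
    g = A @ x - b                    # full gradient: illustrative, not efficient
    x = x.copy()
    x[i] -= g[i] / A[i, i]
    return x

def gradient_step(A, x, b):
    """One exact line-search step along the negative gradient."""
    g = A @ x - b
    alpha = (g @ g) / (g @ (A @ g))  # exact minimizer of f along -g
    return x - alpha * g

# tiny SPD example (illustrative data only)
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = np.zeros(2)
for k in range(25):
    x = gradient_step(A, x, b)       # after these steps, A @ x is close to b
```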
Suppose that we wish to generalize coordinate descent to \emph{two}
coordinates. Then the update can be written:
\[ \vx_{k+1} = \vx_k + \mE_{ij} \vc_k \] where
\(\mE_{ij} = \bmat{ \ve_i & \ve_j }\) and \(\vc_k = \bmat{c_i \\ c_j}\).
That is, we have to choose two coefficients: \(c_i, c_j\) in order to
figure out the next step. Like coordinate descent, we are going to pick
these to exactly minimize \(f(\vx_{k+1})\) as a function of
\(c_i, c_j\).
Give the resulting iterative algorithm.
\subsection{Problem 2: Implementing gradient
descent}\label{problem-2-implementing-gradient-descent}
For gradient descent, the most expensive computational step is computing
matrix-vector products with the matrix \(\mA\). Show how to implement
this algorithm with only a single matrix-vector product per iteration.
Your algorithm should stop when the residual is sufficiently small and,
including the check of the stopping condition, may use only one
matrix-vector product per iteration.
(Hint: you will have to figure out how to \emph{update} the expression
for the gradient and solution and reformulate the algorithm!)
\subsection{Problem 3: Implementing coordinate
descent}\label{problem-3-implementing-coordinate-descent}
For coordinate descent, the most expensive step is computing
\([\vg_k]_i\). Use the routines from Homework 2 to compute this term
efficiently. Develop an algorithm to solve \(\mA \vx = \vb\) via
coordinate descent with two strategies to pick the coordinates: (1) by
cycling through 1 to \(n\) (called cyclic coordinate descent) and (2) by
randomly picking a coordinate (called random coordinate descent). If row
(or column) \(i\) of the matrix has \(k\) non-zeros, your algorithm must
do work proportional to \(k\) in each iteration.
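The two selection strategies differ only in how the coordinate index is
chosen each iteration. A minimal sketch of both, together with an
\(O(k)\)-work evaluation of \([\vg_k]_i\), is below; it assumes the matrix
is stored in CSR form via SciPy, which is an assumption for illustration
only, since your Homework 2 routines may store the matrix differently.

```python
import random
import numpy as np
import scipy.sparse as sp

def gradient_entry(A_csr, x, b, i):
    """[g]_i = (A x - b)_i using only row i of a CSR matrix: O(nnz(row i)) work."""
    lo, hi = A_csr.indptr[i], A_csr.indptr[i + 1]
    cols = A_csr.indices[lo:hi]   # column indices of row i's non-zeros
    vals = A_csr.data[lo:hi]      # the non-zero values themselves
    return vals @ x[cols] - b[i]

def pick_cyclic(k, n):
    """Strategy (1): cycle through coordinates 0, 1, ..., n-1 in order."""
    return k % n

def pick_random(n, rng=random):
    """Strategy (2): choose a coordinate uniformly at random."""
    return rng.randrange(n)
```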
\subsection{Problem 4: Comparing gradient and coordinate descent on our
2d
mesh.}\label{problem-4-comparing-gradient-and-coordinate-descent-on-our-2d-mesh.}
For the 2d mesh with \(n=10\), \(n=20\), and \(n=40\), compare the
performance of the algorithms from problems 2 and 3 in terms of the
\emph{norm of the relative residual} against the \emph{number of
non-zeros used}. Use a relative residual tolerance of \(10^{-4}\).
We are not comparing by number of iterations because each iteration does
vastly different amounts of work. An iteration of coordinate descent
does a very small amount of work, and we need \(n\) iterations of cyclic
coordinate descent to give us the same amount of work as 1 iteration of
gradient descent. Hence, we'll use the total number of non-zero entries
used. So if a row/column used in coordinate descent has \(k\) non-zero
entries, then we increase our work count by \(k\); each iteration of
gradient descent increases the work count by the total number of
non-zeros in \(\mA\).
Your answer should probably have both a table and a plot of how the
residual decreases with \emph{work}. You should show how the residual
decreases from \(10^{-1}\) to \(10^{-4}\) as a function of work. This
suggests log-scaling the \(y\)-axis.
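One possible shape for such a plot, assuming you have collected
(work, relative residual) pairs for each method, is sketched below with
matplotlib; the data and method names are purely illustrative placeholders.

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend for saving figures
import matplotlib.pyplot as plt

def plot_convergence(curves, tol=1e-4):
    """curves: dict mapping a method name -> (work_counts, relative_residuals)."""
    fig, ax = plt.subplots()
    for name, (work, res) in curves.items():
        ax.semilogy(work, res, label=name)           # log-scaled y-axis
    ax.axhline(tol, linestyle="--", linewidth=0.8)   # stopping tolerance
    ax.set_xlabel("non-zero entries used (work)")
    ax.set_ylabel("relative residual")
    ax.legend()
    return fig

# illustrative data only; real curves come from your experiments
fig = plot_convergence({"gradient": ([100, 200, 300], [1e-1, 1e-2, 1e-4]),
                        "cyclic cd": ([40, 80, 120], [1e-1, 3e-2, 1e-3])})
```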
\subsection{Problem 5: Using these algorithms on the Candyland linear
systems.}\label{problem-5-using-these-algorithms-on-the-candyland-linear-systems.}
Describe what happens when you use these algorithms on the linear system
of equations for Candyland.
\end{document}