\documentclass[]{article}
\usepackage{lmodern}
\usepackage{amssymb,amsmath}
\usepackage{ifxetex,ifluatex}
\usepackage{fixltx2e} % provides \textsubscript
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\else % if luatex or xelatex
\ifxetex
\usepackage{mathspec}
\makeatletter % undo the wrong changes made by mathspec
\let\RequirePackage\original@RequirePackage
\let\usepackage\RequirePackage
\makeatother
\else
\usepackage{fontspec}
\fi
\defaultfontfeatures{Ligatures=TeX,Scale=MatchLowercase}
\fi
% use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
% use microtype if available
\IfFileExists{microtype.sty}{%
\usepackage{microtype}
\UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
}{}
\usepackage{hyperref}
\hypersetup{unicode=true,
pdftitle={Homework 1},
pdfborder={0 0 0},
breaklinks=true}
\urlstyle{same} % don't use monospace font for urls
\IfFileExists{parskip.sty}{%
\usepackage{parskip}
}{% else
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}
}
\setlength{\emergencystretch}{3em} % prevent overfull lines
\providecommand{\tightlist}{%
\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
\setcounter{secnumdepth}{0}
% Redefines (sub)paragraphs to behave more like sections
\ifx\paragraph\undefined\else
\let\oldparagraph\paragraph
\renewcommand{\paragraph}[1]{\oldparagraph{#1}\mbox{}}
\fi
\ifx\subparagraph\undefined\else
\let\oldsubparagraph\subparagraph
\renewcommand{\subparagraph}[1]{\oldsubparagraph{#1}\mbox{}}
\fi
\input{preamble.tex}
\title{Homework 1}
\begin{document}
\maketitle
\section{Homework 1}\label{homework-1}
Please answer the following questions in complete sentences in a clearly
prepared manuscript and submit the solution by the due date on
Blackboard (around Sunday, September 2nd, 2018).
Remember that this is a graduate class. There may be elements of the
problem statements that require you to fill in appropriate assumptions.
You are also responsible for determining what evidence to include. An
answer alone is rarely sufficient, nor is an overly verbose
description necessary. Use your judgment to focus your discussion on the
most interesting pieces. The answer to ``should I include `something' in
my solution?'' will almost always be: Yes, if you think it helps support
your answer.
\subsection{Problem 0: Homework
checklist}\label{problem-0-homework-checklist}
\begin{itemize}
\item
Please identify anyone, whether or not they are in the class, with
whom you discussed your homework. This problem is worth 1 point, but
on a multiplicative scale.
\item
Make sure you have included your source-code and prepared your
solution according to the most recent Piazza note on homework
submissions.
\end{itemize}
\subsection{Problem 1: Operations}\label{problem-1-operations}
Compute the following by hand or using Julia. Throughout, the vector
\(\ve = \bmat{1 & \ldots & 1}^T\) denotes the all-ones vector.
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\item
\(\bmat{ 1 & 1 & 2 \\ 3 & 5 & 8 \\ 13 & 21 & 34 } \bmat{ 1 & -2 & 3 \\ -4 & 5 & -6 \\ 7 & -8 & 9} =\)
?
\item
\(\vx =\) \texttt{ones(1000,1)} \(\vy =\) \texttt{1:1000}
\(\vx^T \vy =\) ?
(Optional extra question -- worth no points -- who is always credited
with discovering a very efficient way to compute this as a child?)
\emph{Numpy users} \(\vx =\) \texttt{ones((1000,1))} \(\vy =\)
\texttt{arange(1.,1001.){[}:,\ newaxis{]}}
\item
\(\vx = \bmat{ 1.5 & 2 & -3}^T\). (Assume \(\ve\) is
\(4 \times 1\).)\\
\(\ve \vx^T =\)?\\
\(\vx \ve^T =\)?
\item
\(\vx = \bmat{-5 & 4 & 2}^T\). (Assume \(\ve_i\) is \(3 \times 1\).)
\(\ve_1 \vx^T =\)?\\
\(\vx \ve_3^T =\)?
\end{enumerate}
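As a syntax refresher only (not a solution), here is how these kinds of
operations look in Julia, shown on different data than the problem uses:
\begin{verbatim}
using LinearAlgebra

A = [1 2; 3 4]
B = [0 1; 1 0]
A * B              # matrix-matrix product

x = ones(5, 1)     # a 5-by-1 matrix of ones
y = collect(1:5)   # the vector [1, 2, 3, 4, 5]
x' * y             # inner product (1-element result, since x is a matrix)

u = [1.0, 2.0]
v = [3.0, 4.0, 5.0]
u * v'             # outer product: a 2-by-3 matrix
\end{verbatim}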
\subsection{Problem 2: A proof}\label{problem-2-a-proof}
Let \(\mA\) and \(\mC\) be invertible matrices. We'll prove that the
inverse of \(\bmat{ \mA & \mB \\ 0 & \mC }\) is easy to determine!
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\item
Show that the inverse of \[ \bmat{ 1 & a \\ 0 & 1 } \] is
\[ \bmat{ 1 & -a \\ 0 & 1 }. \]
\item
Now, show that the inverse of \[ \bmat{ \mI & \mA \\ 0 & \mI } \] is
\[ \bmat{ \mI & -\mA \\ 0 & \mI }. \]
\item
  Recall that for general \(\mA\) and \(\mB\) (not those in the
  problem!), \((\mA \mB)^{-1} = \mB^{-1} \mA^{-1}\). Use this fact, and
the result of problem 2.2 to determine the inverse to
\(\bmat{ \mA & \mB \\ 0 & \mC }\) when \(\mA\) and \(\mC\) are
invertible. Alternatively, give the inverse of
\(\bmat{ \mA & \mB \\ 0 & \mC }\). Hint: think about diagonal
matrices!
\end{enumerate}
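A quick numerical sanity check of the pattern in part 2 (no substitute
for the proof) can be done in Julia with small random matrices:
\begin{verbatim}
using LinearAlgebra

n = 3
A = randn(n, n)
In = Matrix{Float64}(I, n, n)        # an explicit n-by-n identity
M    = [In A; zeros(n, n) In]        # the block matrix [I A; 0 I]
Minv = [In (-A); zeros(n, n) In]     # the claimed inverse [I -A; 0 I]
isapprox(M * Minv, Matrix{Float64}(I, 2n, 2n))  # true, up to roundoff
\end{verbatim}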
\subsection{Problem 3: Simplifying a matrix
expression}\label{problem-3-simplifying-a-matrix-expression}
The following derivation occurred when I was working on a proof with a
student that we recently used in a research paper. We have an
expression:
\[ \vq(\vx) = \text{Diag}((\mI - \mH) \vx) \cdot (\mI + \mH) \vx \]
where \(\mH\) is a square, non-negative matrix and
\[ \text{Diag}(\vz) = \bmat{ z_1 & 0 & \ldots & 0\\ 0 & z_2 & \ldots & 0\\ & & \ddots \\ 0 & \ldots & 0 & z_n} \]
(which is a diagonal matrix where \(\vz\) is on the diagonal). We needed
to show that: \[ \vy^T \vq(\vx) = \vx^T \mC \vx \] for some matrix
\(\mC\) that depends on \(\vy\). Your goal in this problem is to work
out an expression for \(\mC\).
This entire problem can be done element-wise, but the proof and results
are fairly simple if you embrace matrix notations.
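In Julia, the \(\text{Diag}\) operation defined above corresponds to
\texttt{Diagonal} from the standard \texttt{LinearAlgebra} package; a
minimal illustration:
\begin{verbatim}
using LinearAlgebra

z = [1.0, 2.0, 3.0]
D = Diagonal(z)     # the matrix Diag(z): z on the diagonal, zeros elsewhere
D * ones(3)         # multiplying the all-ones vector recovers z
\end{verbatim}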
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\item
(Very easy.) As a warm-up, show that if \(\mF\) and \(\mG\) are square
diagonal matrices, then \(\mF \mG = \mG \mF\).
\item
Show that
\(\vq(\vx) = \text{Diag}(\vx) \vx - \text{Diag}(\mH \vx) \mH \vx\).
(Note that you need to use some properties of diagonal matrices in
order to show this.)
\item
  Write an expression for \(\mC\) in terms of \(\vy\) using the
simplified version from part 2.
\end{enumerate}
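A numerical spot-check of the identity in part 2 (again, not a proof),
assuming random data:
\begin{verbatim}
using LinearAlgebra

n = 5
H = rand(n, n)      # a square, non-negative matrix
x = randn(n)
q1 = Diagonal((I - H) * x) * ((I + H) * x)        # the definition
q2 = Diagonal(x) * x - Diagonal(H * x) * (H * x)  # the simplified form
isapprox(q1, q2)    # true, up to roundoff
\end{verbatim}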
\subsection{Problem 4: Deep neural
nets}\label{problem-4-deep-neural-nets}
\textbf{This problem will be more difficult if you haven't used Julia or
Matlab before, so get started early! It's designed to teach you about
writing \texttt{for} loops to construct a matrix operation for a
particular task.}
Deep neural networks for image recognition are a new technology that
demonstrates impressive results. In this problem, we will build matrices
that let us simulate what would happen for a random deep neural net.
A deep neural network is a sequence of matrix-vector products and
non-linear functions. Let \(\vx\) represent the input to the neural net.
Then a deep net computes a function such as:
\[ \text{deepnet}(\vx) = f( \mW_k \cdots f(\mW_3 f(\mW_2 f(\mW_1 \vx))) \cdots ) \]
where \(f\) is a non-linear function and \(\mW_i\) is a weight matrix
where \(\mW_i\) has fewer rows than columns. (So the output of
\(\mW_i \vx\) is a \emph{smaller} vector than the input.) These are
called \emph{deep} neural networks because the number of layers \(k\) is
usually large (hundreds). A popular choice for \(f\) is the function:
\[ f(x) = \begin{cases} x & x > 0 \\ 0 & x \le 0 \end{cases} \] which is
called the ReLU (rectified linear unit). (These units model a ``neuron''
that ``activates'' whenever \(x\) is positive and is non-active when
\(x\) is non-positive.) Also, the function could vary with the layer.
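In code, the ReLU is a one-liner; for instance, in Julia, applied
elementwise via broadcasting:
\begin{verbatim}
f(x) = max(x, zero(x))   # ReLU on a scalar
f.([-1.5, 0.0, 2.0])     # broadcast over a vector: gives [0.0, 0.0, 2.0]
\end{verbatim}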
Other choices involved in deep neural net architecture are:
\begin{itemize}
\tightlist
\item
  how many layers (or how many functions and weight matrices)?
\item
  what is the shape of each weight matrix?
\end{itemize}
We don't wish to get into too many of the details here. We are going to
have you evaluate a simple neural network architecture.
Given an input image as a \(32 \times 32\) pixel image, where each pixel
is a real-valued number between \(0\) and \(1\), we want to simulate the
following neural network architecture:
\[ \text{ournet}(\vx) = f(\mW_3 f(\mW_2 f(\mW_1 \vx))) \] where
\begin{itemize}
\tightlist
\item
  \(\mW_1\) is a \(256 \times 1024\) matrix that is a rudimentary
  edge-detector; the result of \(f(\mW_1 \vx)\) should be large if there
  is an edge (see below for how to do this);
\item
\(f\) is a ReLU unit;
\item
\(\mW_2\) is another \(64 \times 256\) matrix which is the same type
of edge detector.
\item
\(\mW_3\) is a \(1 \times 64\) matrix that simply sums the output of
all the inputs. (Hence, \(\mW_3 = \ve^T\) where \(\ve \in \RR^{64}\).)
\end{itemize}
So the challenge here is to build \(\mW_1\) and \(\mW_2\). As mentioned,
these are extremely crude edge-detectors. (There are much better things
you can do, but I wanted to keep this fairly simple.)
We'll illustrate the construction of our edge detector with a
\(4 \times 4\) image. Suppose this input is represented by a matrix:
\[ \text{input} = \bmat{ x_1 & x_5 & x_9 & x_{13} \\
x_2 & x_6 & x_{10} & x_{14} \\
x_3 & x_7 & x_{11} & x_{15} \\
x_4 & x_8 & x_{12} & x_{16} }. \] What we want to do is output
a \(2 \times 2\) result:
\[ \text{output} = \bmat{y_1 & y_3 \\ y_2 & y_4} \] where
\[ \begin{aligned}
y_1 & = x_1 + x_6 - x_5 - x_2 \\
y_2 & = x_3 + x_8 - x_7 - x_4 \\
y_3 & = x_9 + x_{14} - x_{13} - x_{10} \\
y_4 & = x_{11} + x_{16} - x_{15} - x_{12} \\
\end{aligned} \] This particular formula comes from applying a
\(2 \times 2\) computational stencil: \[ \bmat{ + & - \\ - & + } \] to
each \(2 \times 2\) region of the input image to reduce it to a single
number. (As mentioned a few times, if this number is positive, it means
there is an edge or high-contrast region somewhere inside that
\(2 \times 2\) block, but there are much better ways to solve this
problem! This is really just a simple example.)
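Before building any matrices, it may help to apply the stencil directly
to the image with nested loops; the helper name below
(\texttt{apply\_stencil}) is ours, not part of the assignment, and you
can use it later as a check on your matrix-based version:
\begin{verbatim}
# Apply the 2x2 stencil [+ -; - +] to each non-overlapping 2x2 block
# of an nin-by-nin image X, giving an (nin/2)-by-(nin/2) result.
function apply_stencil(X)
    nin = size(X, 1)
    Y = zeros(div(nin, 2), div(nin, 2))
    for i = 1:2:nin, j = 1:2:nin
        Y[div(i + 1, 2), div(j + 1, 2)] =
            X[i, j] + X[i + 1, j + 1] - X[i, j + 1] - X[i + 1, j]
    end
    return Y
end
\end{verbatim}
For the \(4 \times 4\) example above, the \((1,1)\) entry of the result
is exactly \(x_1 + x_6 - x_5 - x_2\), matching the formula for \(y_1\).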
Now, I described the input and output in terms of \emph{matrices} above.
However, the deep neural network architecture works with a \emph{vector}
input and produces a \emph{vector} output. So we have to rework this a
little bit.
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\item
Note that the input and output matrices have a linear ordering
\(x_1, \ldots, x_{16}\) and \(y_1, \ldots, y_4\). Let
\(\vx \in \RR^{16}\) and \(\vy \in \RR^{4}\). Write down the matrix
\(\mA\) such that \(\vy = \mA \vx\).
\item
We will want to run this on real images. Download
\begin{itemize}
\tightlist
\item
\url{http://www.cs.purdue.edu/homes/dgleich/cs515-2018/homeworks/image1.png}
\item
\url{http://www.cs.purdue.edu/homes/dgleich/cs515-2018/homeworks/image2.png}
\item
\url{http://www.cs.purdue.edu/homes/dgleich/cs515-2018/homeworks/image3.png}
\end{itemize}
  (Before you do the next step, you may have to add the following
  packages.)
\begin{verbatim}
using Pkg; Pkg.add(["Images","FileIO","ImageMagick"]) # on Linux
using Pkg; Pkg.add(["Images","FileIO","QuartzImageIO"]) # on OSX
\end{verbatim}
Load these images in Julia and convert them into matrices.
\begin{verbatim}
using FileIO, Images
X1 = Float64.(load("image1.png"))
X2 = Float64.(load("image2.png"))
X3 = Float64.(load("image3.png"))
\end{verbatim}
Report the result of
\begin{verbatim}
using LinearAlgebra
tr(X1+X2+X3) # this computes the trace
\end{verbatim}
If you wish to look at the images (not required) then run
\begin{verbatim}
colorview(Gray, X1)
\end{verbatim}
Alternatively, you can use
\begin{verbatim}
using Plots
pyplot()
heatmap(X1,yflip=true, color=:gray)
gui()
\end{verbatim}
\item
In what follows, we'll talk about two different types of indices. The
image index of a pixel is a pair \((i,j)\) that identifies a row and
column for the pixel in the image. The vector index of a pixel is the
index of that pixel in a linear ordering of the image elements. For
instance, in the sample from part 1, pixel (3,2) has linear index
  \(7\). Also, pixel (1,4) has linear index \(13\). Julia can help us
  build a map between pixel indices and linear or vector indices:
\begin{verbatim}
N = reshape(1:(4*4), 4, 4)
\end{verbatim}
  This creates the pixel-index-to-linear-index map for the problem above
because
\begin{verbatim}
N[1,4]
N[3,2]
\end{verbatim}
return the appropriate entry.
In your own words, explain what the \texttt{reshape} operation does.
\item
  Now we need to construct the matrix \(\mW_i\) in order to apply the
  edge detector that we've built.\\
  I'm giving you the following template, which I hope you can fill in.
Feel free to construct \(\mW_{1}\) and \(\mW_{2}\) any way you choose,
but the following should provide some guidance.
\begin{verbatim}
function crude_edge_detector(nin,nout)
Nx = # build a map using reshape that takes X indices to x
Ny =
W = zeros(nout^2,nin^2)
for i=1:nin
for j=1:nin
xi =
yj =
W[yj, xi] =
end
end
return W
end
W1 = crude_edge_detector(32,16)
W2 = crude_edge_detector(16,8)
\end{verbatim}
  Show the non-zero entries in the first row of \texttt{W2} as well as
  the corresponding indices.
\item
Now write a function to evaluate the neural net output on an image
  that we explained above. Note: your code should not recompute the edge
  detectors \texttt{W1} and \texttt{W2} each time; doing so will lose
  1/3 of the points on this question.
\begin{verbatim}
function net(x)
end
\end{verbatim}
  To call \texttt{net}, we need to convert an image into a vector. You
  can use the \texttt{reshape} command to do this, or in Julia, you can
  use the \texttt{vec} command.
Show the results of the following commands
\begin{verbatim}
@show net(vec(Float64.(load("image1.png"))))
@show net(vec(Float64.(load("image2.png"))))
@show net(vec(Float64.(load("image3.png"))))
\end{verbatim}
(Hint, I get
\texttt{net(vec(Float64.(load("image1.png"))))\ =\ 0.08235294117647171})
The original images can be accessed from
\begin{itemize}
\tightlist
  \item
    \url{https://c1.staticflickr.com/8/7015/6554001581_3370ca8802_b.jpg}
  \item
    \url{https://c1.staticflickr.com/3/2880/33053003793_5840a879fb_b.jpg}
  \item
    \url{https://c1.staticflickr.com/5/4138/4814209463_10a00d0b2d_b.jpg}
\end{itemize}
Do these results make sense?
\item
Now suppose we change the edge detector to use the stencil
\[ \bmat{ - & + \\ + & - } \] instead. Show the output from the same
neural net architecture using the new matrices \(W_1\) and \(W_2\).
\item
  The matrices \texttt{W1} and \texttt{W2} have very few non-zero
  entries compared to the number of zeros. This is a case where we could
  consider using sparse matrices instead of dense matrices. One
  efficient way of creating a sparse matrix in Julia is to produce a
  list of the non-zero elements and then call \texttt{sparse} to
  assemble the matrix. Write the following function
\begin{verbatim}
using SparseArrays
function sparse_crude_edge_detector(nin,nout)
Nx = # build a map using reshape that takes X indices to x
Ny =
nnz = # this is the number of non-zeros
I = zeros(Int, nnz) # the row index
J = zeros(Int, nnz) # the column index
V = zeros(nnz) # the value
index = 1
for i=1:nin
for j=1:nin
I[index] =
J[index] =
V[index] =
index += 1
end
end
return sparse(I,J,V,nout^2,nin^2)
end
\end{verbatim}
Write this function and make sure that the result is equivalent to
what you got with your original function.
\begin{verbatim}
sW1 = sparse_crude_edge_detector(32,16)
W1 = crude_edge_detector(32,16)
sW1 == W1 # test for equality
\end{verbatim}
\item
Let \texttt{x\ =\ vec(Float64.(load("image1.png")))}. Then show the
time to compute \texttt{W1*x} vs. \texttt{sW1*x}. (In Julia, this is
\texttt{@time}). Repeat this a few times to make sure you have the
correct time. Report the fastest time for each.
\end{enumerate}
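For the timing comparison in the last part, a minimal pattern, shown on
random data rather than the images so it is not a solution (note that
the first \texttt{@time} call also measures compilation, which is one
reason the problem says to repeat the measurement):
\begin{verbatim}
using SparseArrays

A = zeros(256, 1024)
A[:, 1:4] = randn(256, 4)   # a dense matrix that is mostly zeros
sA = sparse(A)              # the same matrix in sparse storage
x = randn(1024)
@time A * x                 # dense matrix-vector product
@time sA * x                # sparse matrix-vector product
\end{verbatim}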
\end{document}