$\newcommand{\eps}{\varepsilon} \newcommand{\kron}{\otimes} \DeclareMathOperator{\diag}{diag} \DeclareMathOperator{\trace}{trace} \DeclareMathOperator{\rank}{rank} \DeclareMathOperator*{\minimize}{minimize} \DeclareMathOperator*{\maximize}{maximize} \DeclareMathOperator{\subjectto}{subject to} \newcommand{\mat}[1]{\boldsymbol{#1}} \renewcommand{\vec}[1]{\boldsymbol{\mathrm{#1}}} \newcommand{\vecalt}[1]{\boldsymbol{#1}} \newcommand{\conj}[1]{\overline{#1}} \newcommand{\normof}[1]{\|#1\|} \newcommand{\onormof}[2]{\|#1\|_{#2}} \newcommand{\MIN}[2]{\begin{array}{ll} \minimize_{#1} & {#2} \end{array}} \newcommand{\MINone}[3]{\begin{array}{ll} \minimize_{#1} & {#2} \\ \subjectto & {#3} \end{array}} \newcommand{\MINthree}[5]{\begin{array}{ll} \minimize_{#1} & {#2} \\ \subjectto & {#3} \\ & {#4} \\ & {#5} \end{array}} \newcommand{\MAX}[2]{\begin{array}{ll} \maximize_{#1} & {#2} \end{array}} \newcommand{\MAXone}[3]{\begin{array}{ll} \maximize_{#1} & {#2} \\ \subjectto & {#3} \end{array}} \newcommand{\itr}[2]{#1^{(#2)}} \newcommand{\itn}[1]{^{(#1)}} \newcommand{\prob}{\mathbb{P}} \newcommand{\probof}[1]{\prob\left\{ #1 \right\}} \newcommand{\pmat}[1]{\begin{pmatrix} #1 \end{pmatrix}} \newcommand{\bmat}[1]{\begin{bmatrix} #1 \end{bmatrix}} \newcommand{\spmat}[1]{\left(\begin{smallmatrix} #1 \end{smallmatrix}\right)} \newcommand{\sbmat}[1]{\left[\begin{smallmatrix} #1 \end{smallmatrix}\right]} \newcommand{\RR}{\mathbb{R}} \newcommand{\CC}{\mathbb{C}} \newcommand{\eye}{\mat{I}} \newcommand{\mA}{\mat{A}} \newcommand{\mB}{\mat{B}} \newcommand{\mC}{\mat{C}} \newcommand{\mD}{\mat{D}} \newcommand{\mE}{\mat{E}} \newcommand{\mF}{\mat{F}} \newcommand{\mG}{\mat{G}} \newcommand{\mH}{\mat{H}} \newcommand{\mI}{\mat{I}} \newcommand{\mJ}{\mat{J}} \newcommand{\mK}{\mat{K}} \newcommand{\mL}{\mat{L}} \newcommand{\mM}{\mat{M}} \newcommand{\mN}{\mat{N}} \newcommand{\mO}{\mat{O}} \newcommand{\mP}{\mat{P}} \newcommand{\mQ}{\mat{Q}} \newcommand{\mR}{\mat{R}} \newcommand{\mS}{\mat{S}} 
\newcommand{\mT}{\mat{T}} \newcommand{\mU}{\mat{U}} \newcommand{\mV}{\mat{V}} \newcommand{\mW}{\mat{W}} \newcommand{\mX}{\mat{X}} \newcommand{\mY}{\mat{Y}} \newcommand{\mZ}{\mat{Z}} \newcommand{\mLambda}{\mat{\Lambda}} \newcommand{\mSigma}{\ensuremath{\mat{\Sigma}}} \newcommand{\mPbar}{\bar{\mP}} \newcommand{\ones}{\vec{e}} \newcommand{\va}{\vec{a}} \newcommand{\vb}{\vec{b}} \newcommand{\vc}{\vec{c}} \newcommand{\vd}{\vec{d}} \newcommand{\ve}{\vec{e}} \newcommand{\vf}{\vec{f}} \newcommand{\vg}{\vec{g}} \newcommand{\vh}{\vec{h}} \newcommand{\vi}{\vec{i}} \newcommand{\vj}{\vec{j}} \newcommand{\vk}{\vec{k}} \newcommand{\vl}{\vec{l}} \newcommand{\vm}{\vec{m}} \newcommand{\vn}{\vec{n}} \newcommand{\vo}{\vec{o}} \newcommand{\vp}{\vec{p}} \newcommand{\vq}{\vec{q}} \newcommand{\vr}{\vec{r}} \newcommand{\vs}{\vec{s}} \newcommand{\vt}{\vec{t}} \newcommand{\vu}{\vec{u}} \newcommand{\vv}{\vec{v}} \newcommand{\vw}{\vec{w}} \newcommand{\vx}{\vec{x}} \newcommand{\vy}{\vec{y}} \newcommand{\vz}{\vec{z}} \newcommand{\vpi}{\vecalt{\pi}} \newcommand{\vlambda}{\vecalt{\lambda}}$

# Homework 1

• Update 1: Fix a small typo in Problem 4 with the indices.
• Update 2: Another small typo fix in Problem 4 with the X1, X2, X3 matrices and filenames.

Please answer the following questions in complete sentences in a clearly prepared manuscript and submit the solution by the due date on Blackboard (around Sunday, September 3rd, 2017).

Remember that this is a graduate class. There may be elements of the problem statements that require you to fill in appropriate assumptions. You are also responsible for determining what evidence to include. An answer alone is rarely sufficient, but neither is an overly verbose description required. Use your judgement to focus your discussion on the most interesting pieces. The answer to "should I include 'something' in my solution?" will almost always be: Yes, if you think it helps support your answer.

## Problem 0: Homework checklist

• Please identify anyone, whether or not they are in the class, with whom you discussed your homework. This problem is worth 1 point, but on a multiplicative scale.

• Make sure you have included your source-code and prepared your solution according to the most recent Piazza note on homework submissions.

## Problem 1: Operations

Compute the following by hand or using Julia. The vector $\ve = \bmat{1 & \ldots & 1}^T$ is the all-ones vector.

1. $\bmat{ 1 & 1 & 2 \\ 3 & 5 & 8 \\ 13 & 21 & 34 } \bmat{ 1 & -2 & 3 \\ -4 & 5 & -6 \\ 7 & -8 & 9} =$ ?

2. Let $\vx =$ ones(1000,1) and $\vy =$ 1:1000. $\vx^T \vy =$ ?

(Optional extra question -- worth no points -- who is always credited with discovering a very efficient way to compute this as a child?)

Numpy users: take $\vx =$ ones((1000,1)) and $\vy =$ arange(1., 1001.)[:, newaxis].

3. $\vx = \bmat{ 1.5 & 2 & -3}^T$. (Assume $\ve$ is $4 \times 1$.)
$\ve \vx^T =$?
$\vx \ve^T =$?

4. $\vx = \bmat{-5 & 4 & 2}^T$. (Assume $\ve_i$ is the $i$-th standard basis vector, of size $3 \times 1$.) $\ve_1 \vx^T =$?
$\vx \ve_3^T =$?
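Parts 2–4 of this problem can be sanity-checked numerically. Here is a sketch in NumPy (one of the languages this assignment already uses); the vector z below is a throwaway example for the outer products, not the problem's data:

```python
import numpy as np

# Part 2: the inner product x^T y.
x = np.ones((1000, 1))
y = np.arange(1., 1001.)[:, np.newaxis]
result = (x.T @ y).item()   # a 1x1 matrix collapsed to a scalar

# Parts 3-4: outer products with the all-ones and standard basis vectors,
# demonstrated on a throwaway vector z.
z = np.array([1., 2., 3.])
e = np.ones(4)
A = np.outer(e, z)          # every row of A is a copy of z^T
e1 = np.zeros(3); e1[0] = 1.0
B = np.outer(e1, z)         # z^T appears in the first row, zeros elsewhere
```

The shapes are worth checking: an outer product of a $4 \times 1$ vector with a $3 \times 1$ vector is $4 \times 3$.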

## Problem 2: A proof

Let $\mA$ and $\mC$ be invertible matrices. We'll prove that the inverse of $\bmat{ \mA & \mB \\ 0 & \mC }$ is easy to determine!

1. Show that the inverse of $\bmat{ \mA & 0 \\ 0 & \mC }$ is $\bmat{ \mA^{-1} & 0 \\ 0 & \mC^{-1} }$.

2. Now, show that the inverse of $\bmat{ \eye & \mB \\ 0 & \eye }$ is $\bmat{ \eye & -\mB \\ 0 & \eye }$.

3. Recall that for general invertible $\mA$ and $\mB$ (not necessarily those in this problem!), $(\mA \mB)^{-1} = \mB^{-1} \mA^{-1}$. Use this fact and the result of problem 2.2 to determine the inverse of $\bmat{ \mA & \mB \\ 0 & \mC }$ when $\mA$ and $\mC$ are invertible. Alternatively, just give the inverse of $\bmat{ \mA & \mB \\ 0 & \mC }$ directly. Hint: think about diagonal matrices!
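The identity $(\mA \mB)^{-1} = \mB^{-1} \mA^{-1}$ is easy to sanity-check numerically. A short NumPy sketch, using random Gaussian matrices (which are invertible with probability 1):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))   # random square matrices; invertible
B = rng.standard_normal((4, 4))   # with probability 1

lhs = np.linalg.inv(A @ B)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)
# The two expressions agree up to floating-point roundoff.
assert np.allclose(lhs, rhs)
```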

## Problem 3: A statistical test

This problem will be more difficult if you haven't used Julia or Matlab before, so get started early! It's designed to give you some simple experience.

One way of viewing a rank-1 matrix is as any matrix that can be written as a single outer-product, that is, $\mA = \vx \vy^T$ for some $\vx$ and some $\vy$.

In this problem, we'll look at the expected value of the random matrix $\mB$ where $\mB$ is a random rank-1 matrix following a simple probability distribution.

1. The first question is just to get you to document your initial impressions and is not worth any points. Suppose that $\vx$ and $\vy$ are length $n$ vectors with elements drawn from a standard normal distribution. Let $\mB = \vx \vy^T$. Then, let $\mC = E[\mB] = E[\vx \vy^T]$, where $E$ is the expected-value operator. What do you think the rank of $\mC$ is (guessing is fine!)? (Remember this isn't worth any points, so don't think about it too long or get hung up on small details.)

2. Now, let's check your guess! We can approximate $\mC$ using what's called a Monte Carlo approximation. Suppose that $\mB_i = \vx_i \vy_i^T$ is a single matrix generated where $\vx_i$ and $\vy_i$ are instances of length-$n$ vectors with elements drawn from a standard normal distribution. Then the Monte Carlo approximation to $\mC$ is given by $\mC \approx \frac{1}{N} \sum_{i=1}^{N} \mB_i = \frac{1}{N} \sum_{i=1}^{N} \vx_i \vy_i^T$. We can generate $\vx_i$ and $\vy_i$ in Matlab/Numpy/Julia via

Matlab

x = randn(n,1);
y = randn(n,1);


Numpy/Python

import numpy
x = numpy.random.randn(n,1) # Numpy
y = numpy.random.randn(n,1)


Julia

x = randn(n); # Julia
y = randn(n);


Using your language of choice, evaluate the Monte Carlo approximation to $\mC$ with $N = 10000$ and $n = 6$, and report its rank.
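If it helps to get started, the Monte Carlo loop can be sketched as follows (shown in NumPy; translating to Julia or Matlab is straightforward, and the function name here is just for illustration):

```python
import numpy as np

def monte_carlo_C(n, N, seed=0):
    """Approximate C = E[x y^T] by averaging N random outer products."""
    rng = np.random.default_rng(seed)
    C = np.zeros((n, n))
    for _ in range(N):
        x = rng.standard_normal(n)   # fresh samples on every iteration
        y = rng.standard_normal(n)
        C += np.outer(x, y)
    return C / N

C = monte_carlo_C(n=6, N=10000)
# Inspect numpy.linalg.matrix_rank(C) to answer the question.
```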

3. Did you find the rank surprising given your initial guess? If so, comment on which you think is correct: your code (part 2) or your guess (part 1). (Hint: your code should be correct, but maybe you don't trust an approximation?)

## Problem 4: Deep neural nets

This problem will be more difficult if you haven't used Julia or Matlab before, so get started early! It's designed to teach you about writing for loops to construct a matrix operation for a particular task.

Deep neural networks for image recognition are a new technology that demonstrates impressive results. In this problem, we will build matrices that let us simulate what would happen for a random deep neural net.

A deep neural network is a sequence of matrix-vector products and non-linear functions. Let $\vx$ represent the input to the neural net. Then a deep net computes a function such as $\vy = \mW_k f( \mW_{k-1} f( \cdots f( \mW_1 \vx ) \cdots ) )$, where $f$ is a non-linear function and $\mW_i$ is a weight matrix with fewer rows than columns. (So the output of $\mW_i \vx$ is a smaller vector than the input.) These are called deep neural networks because the number of layers $k$ is usually large (hundreds). A popular choice for $f$ is the elementwise function $f(x) = \max(x, 0)$, which is called the ReLU (rectified linear unit). (These units model a "neuron" that "activates" whenever $x$ is positive and is non-active when $x$ is non-positive.) Also, the function could vary with the layer. Other choices involved in deep neural net architecture include the number of layers (that is, how many functions and weight matrices) and the shapes of the weight matrices. We don't wish to get into too many of the details here. We are going to have you evaluate a simple neural network architecture.
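As a toy numerical sketch of this layered composition (the sizes and random weights below are made up purely for illustration and are not the matrices used later in this problem):

```python
import numpy as np

def relu(x):
    """Elementwise ReLU: max(x, 0)."""
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
# Toy weight matrices with fewer rows than columns,
# so each layer shrinks the vector: 8 -> 4 -> 1.
W1 = rng.standard_normal((4, 8))
W2 = rng.standard_normal((1, 4))

x = rng.standard_normal(8)
y = W2 @ relu(W1 @ x)   # a two-layer net; y is a length-1 vector
```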

Given an input image as a $32 \times 32$ pixel image, where each pixel is a real-valued number between $0$ and $1$, we want to simulate the following neural network architecture: $y = \mW_3 f( \mW_2 f( \mW_1 \vx ) )$, where

• $\mW_1$ is a $256 \times 1024$ matrix that is a rudimentary edge-detector, the result of $f(\mW_1 \vx)$ should be large if there is an edge (see below for how to do this);
• $f$ is a ReLU unit;
• $\mW_2$ is another $64 \times 256$ matrix which is the same type of edge detector.
• $\mW_3$ is a $1 \times 64$ matrix that simply sums the output of all the inputs. (Hence, $\mW_3 = \ve^T$ where $\ve \in \RR^{64}$.)

So the challenge here is to build $\mW_1$ and $\mW_2$. As mentioned, these are extremely crude edge-detectors. (There are much better things you can do, but I wanted to keep this fairly simple.)

We'll illustrate the construction of our edge detector with a $4 \times 4$ image. Suppose this input is represented by the matrix $\mX = \bmat{ x_1 & x_5 & x_9 & x_{13} \\ x_2 & x_6 & x_{10} & x_{14} \\ x_3 & x_7 & x_{11} & x_{15} \\ x_4 & x_8 & x_{12} & x_{16} }$. What we want to do is output a $2 \times 2$ result $\mY = \bmat{ y_1 & y_3 \\ y_2 & y_4 }$, where each $y_k$ is computed from the corresponding non-overlapping $2 \times 2$ block of the input (for instance, $y_1$ is computed from $x_1$, $x_2$, $x_5$, and $x_6$). This particular formula comes from applying a $2 \times 2$ computational stencil to each $2 \times 2$ region of the input image to reduce it to a single number. (As mentioned a few times, if this number is positive, it means there is an edge or high-contrast region somewhere inside that $2 \times 2$ block, but there are much better ways to solve this problem! This is really just a simple example.)

Now, I described the input and output in terms of matrices above. However, the deep neural network architecture works with a vector input and produces a vector output. So we have to rework this a little bit.

1. Note that the input and output matrices have a linear ordering $x_1, \ldots, x_{16}$ and $y_1, \ldots, y_4$. Let $\vx \in \RR^{16}$ and $\vy \in \RR^{4}$. Write down the matrix $\mA$ such that $\vy = \mA \vx$.

2. Load these images in Julia and convert them into matrices.

using FileIO
X1 = Float64.(load("image1.png"))
X2 = Float64.(load("image2.png"))
X3 = Float64.(load("image3.png"))


Report the result of

trace(X1+X2+X3)


If you wish to look at the images (not required) then run

colorview(Gray, X1)

3. In what follows, we'll talk about two different types of indices. The image index of a pixel is a pair $(i,j)$ that identifies a row and column for the pixel in the image. The vector index of a pixel is the index of that pixel in a linear ordering of the image elements. For instance, in the sample from part 1, pixel (3,2) has linear index $7$. Also, pixel (1,4) has linear index $13$. Julia can help us build a map between pixel indices and linear (vector) indices:

N = reshape(1:(4*4), 4, 4);


This creates the map from pixel index to linear index for the problem above, because

N[1,4]
N[3,2]


return the appropriate entries.

In your own words, explain what the reshape operation does.
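For those working in NumPy instead of Julia, the same column-major map needs order='F' (Julia's reshape is column-major by default), which this small sketch demonstrates:

```python
import numpy as np

# Column-major ('F' = Fortran order) matches Julia's reshape(1:(4*4), 4, 4).
N = np.arange(1, 17).reshape(4, 4, order='F')

# With Julia-style 1-based pixel indices (i, j):
assert N[3 - 1, 2 - 1] == 7    # pixel (3,2) -> linear index 7
assert N[1 - 1, 4 - 1] == 13   # pixel (1,4) -> linear index 13
```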

4. Now we need to construct the matrices $\mW_1$ and $\mW_2$ in order to apply the edge detector that we've built.
I'm giving you the following template, which I hope you can fill in. Feel free to construct $\mW_{1}$ and $\mW_{2}$ any way you choose, but the following should provide some guidance.

function crude_edge_detector(nin,nout)
    Nx = <fill in> # build a map using reshape that takes X indices to x
    Ny = <fill in>
    W = zeros(nout^2,nin^2)
    for i=1:nin
        for j=1:nin
            xi = <fill in>
            yj = <fill in>
            W[yj, xi] = <fill in>
        end
    end
    return W
end

W1 = crude_edge_detector(32,16)
W2 = crude_edge_detector(16,8)


Show the non-zero entries in the first row of $\mW_2$, as well as their corresponding indices.

5. Now write a function to evaluate the neural net output on an image, as explained above. Note: your code should not recompute the edge detectors W1 and W2 on each call; doing so will lose one-third of the points on this question.

function net(x)
<fill in multiple lines>
end


To call net, we need to convert an image into a vector. You can use the reshape command to do this; in Julia, you can also use the vec command.
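For NumPy users, the analogous flattening must also be column-major to match Julia's vec; a quick sketch:

```python
import numpy as np

X = np.array([[1., 3.],
              [2., 4.]])
# Column-major flattening matches Julia's vec(X).
x = X.flatten(order='F')
assert (x == np.array([1., 2., 3., 4.])).all()
```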

Show the results of the following commands

@show net(vec(Float64.(load("image1.png"))))

(Hint: I get net(vec(Float64.(load("image1.png")))) = 0.08235294117647171.) The original images can be accessed from
6. Now suppose we change the edge detector to use the stencil instead. Show the output from the same neural net architecture using the new matrices $W_1$ and $W_2$.