
Homework 3

Please answer the following questions in complete sentences in a typed, clearly prepared manuscript and submit the solution by the due date on Blackboard (Monday, September 18th, 2017, about 4am)

Problem 0: Homework checklist

• Please identify anyone, whether or not they are in the class, with whom you discussed your homework. This problem is worth 1 point, but on a multiplicative scale.

• Make sure you have included your source-code and prepared your solution according to the most recent Piazza note on homework submissions.

Problem 1: Compute it!

Produce the analytic SVDs of the following matrices. (That is, no numerical approximations, but feel free to let Matlab give you a good guess!). It's important to think about these questions because I may give similar questions on a midterm or final and you'll be expected to remember these. It's also handy to think about how to construct these, even though we haven't seen any algorithms yet. You should be able to work them all out directly from the definition.

1. $\bmat{0 & 3 \\ 0 & 0}$

2. $\bmat{5 & 0 \\ 2 & 0}$

3. $\bmat{5 & -5 \\ 2 & -2 \\ 0 & 0}$

4. $\bmat{2 & 0 \\ 0 & -1}$

Problem 2: Fun with the SVD.

The following quesitons are some warm up questions about the theory of the SVD. Each of these is designed to teach you something about the SVD. So there is usually a nice fact that I want you to know.

1. Find a matrix with two different SVD factorizations.
What is the same in both cases?

2. What is the SVD of $\mA = \vf \vg^T$ for arbitrary vectors $\vf$, $\vg$?

3. Show that $||\mA^{-1}|| = 1/\sigma_{\min}$.

4. If $\mQ$ is an orthogonal matrix, what is it's SVD? In particular, what are the singular values?

Problem 3: The celebrated Eckart-Young theorem

One of the early communities to use the SVD for data analyais were the psychometricians. They were trying to understand how to analyze matrices of test scores!

If $N$ individuals are each subjected to $n$ tests, it is a fundamental postulate of factor theory that the resulting $n \times N$ score matrix a can be adequately approximated by another matrix whose rank $r$ is less than the smaller of $n$ or $N$.
Closely associated to this postulate is the purely mathematical problem of finding that matrix of rank $r$ which most closely approximates a given matrix of higher rank $R$. It will be shown that if the least-squares criterion of approximation be adopted, this problem has a general solution which is relatively simple in a theoretical sense, though the amount of numerical work involved in applications may be prohibitive. Certain conclusions can be drawn from the theoretical solution which may be of importance in practical work. -- Eckart and Young

In this question, we will work through a proof of the Eckart-Young theorem. You are welcome to go look at the paper for the original proof. However, I think the more modern steps below will be easier to follow. Note that you can also find this same proof in many references. Again, you are free to look at these and even copy them. However, I will ask that you explain everything! If you find yourself reading these and thinking, "hmm... why is that true...?" Those are exactly the steps we are expecting you to explain.

The Eckart-Young theorem states:

Let $\mA \in \RR^{m \times n}$, without loss of generality, assume $m \ge n$. Let $\mA = \mU \mat{\Sigma} \mV^T$ be the SVD. Let $\mU = \bmat{\vu_1, \ldots, \vu_m}$, $\mat{\Sigma} = \diag(\sigma_1,\ldots,\sigma_n)$, and $\mV = \bmat{\vv_1, \ldots, \vv_n}$. Define Then: for any matrix $\mB$ of rank $k$.

It's helpful to translate this into real language. Suppose I give you a matrix $\mA$, and ask for the closest matrix to $\mA$ but with rank $k$ instead of rank $\mA$ instead. Eckart-Young tells us that we can construct such as matrix from the SVD.

In this problem, we'll use the matrix 2-norm.

1. Explain why we don't lose any generality with the assumption $m \ge n$.

2. Warmup Suppose that $\mA$ is a diagonal matrix with non-negative entries. What is the best diagonal rank $k$ approximation to $\mA$? (Yes, you have to prove it. No, you cannot assume the Eckart-Young theorem is true!)

3. Warmup Show that $\normof{\mA - \mA_n} = 0$.

4. Warmup Show that $\normof{\mA - \mA_R} = 0$ where $R$ is the rank of $\mA$. Why does this mean we can assume that $k < R$ without losing any further generality?

5. Show that $\normof{\mA - \mA_k} = \sigma_{k+1}$.

6. We'll proceed by contradiction now. Suppose that a rank $k$ matrix $\mB$ exists with $\normof{\mA - \mB} < \sigma_{k+1}$. Briefly justify why:

7. Using the last result, we can choose $\vx$ to be anything we want. Note that $\mB$ is rank $k$. This means it must have a null-space of dimension $n - k$. Show that, for any vector $\vw$ in the null-space of $\mB$ that:

8. Now we have a subspace of $\RR^{n}$ such that $\normof{\mA \vw}$ is bounded above. We'll now show that there's another subspace where it's bounded below! Let $\vz \in \text{span}(\vv_1, \ldots, \vv_{k+1})$. Show that $\normof{\mA \vz} \ge \sigma_{k+1} \normof{\vz}$. (Hint, note that $\mV_{k+1} = \bmat{\vv_1, \ldots, \vv_{k+1}}$ is an orthogonal basis for that span.)

9. What's the dimension of the space where $\mA \vw$ is bounded above? What's the dimension of the space where $\mA \vz$ is bounded below? Why does this lead to a contradiction?

10. Give yourself a pat on the back!

Problem 4: Eigendigits

matread(download(
"http://www.cs.nyu.edu/~roweis/data/usps_all.mat"))


as we did in the Cosine similarity file

1. Describe how the data is stored. Use the heatmap or colorview function in Julia to draw a picture of each individual digit. (Or something else in another language to draw a sample picture of each digit.

(If you are having trouble with this one, post on Piazza and we will try and be helpful. We just want you to be able to work with the data and we will guide you through what's done here...)

2. Describe a scheme to center the vector associated with each digit picture. That is, if $\vf$ is the vector associated with a digit, then explain how to add a scalar to $\vf$ such that $\vx = \vf + \gamma \ve$ and the sum of $\vx$ is zero.

3. Describe and implement a matrix operation to center a matrix of all digits. (Hint, you will likely have to reshape the data-cube that the data comes in...)

4. Compute the singular value decomposition of the matrix of centered images as a "pixels $\times$ centered digits" matrix.
What is the largest singular value?

5. Plot the leading singular vector $\vu_1$ as a $16\times 16$ pixel image. What do you see?

6. Repeat the same analysis for just for the pictures associated with each digit. That is, you should compute the dominant eigenvalue for 10 matrices, one associated with each digit, as well as producing the picture of the leading singular vector for those 10 matrices. Report any key differences from the picture in number 5.