Please answer the following questions in complete sentences in a clearly prepared manuscript and submit the solution by the end of class on Thursday, September 20th, 2012. You should submit your PDF file to Blackboard http://mycourses.purdue.edu as well as your Matlab files.
Please identify anyone, whether or not they are in the class, with whom you discussed your homework. This problem is worth 1 point, but on a multiplicative scale.
Produce the analytic SVDs of the following matrices. (That is, no numerical approximations, but feel free to let Matlab give you a good guess!). It’s important to think about these questions because I may give similar questions on a midterm or final and you’ll be expected to remember these. It’s also handy to think about how to construct these, even though we haven’t seen any algorithms yet. You should be able to work them all out directly from the definition.
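For example, to get a numerical guess in Matlab (the matrix below is just a placeholder; substitute the matrices from the problem):
A = [3 0; 0 -2];     % placeholder matrix, not one of the assigned ones
[U,S,V] = svd(A)     % compare the computed factors against the definition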
The following questions are some warm-up questions about the theory of the SVD. Each of these is designed to teach you something about the SVD, so there is usually a nice fact that I want you to know.
What is where is an orthogonal matrix?
Find a matrix with two different SVD factorizations. (Remember, someone asked about this in class!) What is the same in both cases?
Show that .
In this problem, we’ll work through the details of how to show that the SVD encodes the rank of a matrix.
Show that the rank of a diagonal matrix is the number of non-zeros.
Show that the rank of when and are full-rank.
Briefly explain why a square orthogonal matrix is full rank.
Explain why we’ve determined that the rank of a matrix is given by the number of non-zero singular values.
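As a quick numerical illustration of this fact (not part of the required answer):
A = randn(6,3)*randn(3,5);      % a 6-by-5 matrix with rank at most 3
s = svd(A);
tol = max(size(A))*eps(s(1));   % tolerance for treating tiny singular values as zero
[sum(s > tol)  rank(A)]         % counting the nonzero singular values recovers the rank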
One of the early communities to use the SVD for data analysis was the psychometricians. They were trying to understand how to analyze matrices of test scores!
If individuals are each subjected to tests, it is a fundamental postulate of factor theory that the resulting score matrix a can be adequately approximated by another matrix whose rank is less than the smaller of or .
Closely associated to this postulate is the purely mathematical problem of finding that matrix of rank which most closely approximates a given matrix of higher rank . It will be shown that if the least-squares criterion of approximation be adopted, this problem has a general solution which is relatively simple in a theoretical sense, though the amount of numerical work involved in applications may be prohibitive. Certain conclusions can be drawn from the theoretical solution which may be of importance in practical work. – [Eckart and Young](http://dx.doi.org/10.1007/BF02288367)
In this question, we will work through a proof of the Eckart-Young theorem. You are welcome to go look at the paper for the original proof. However, I think the more modern steps below will be easier to follow. Note that you can also find this same proof in many references. Again, you are free to look at these and even copy them. However, I will ask that you explain everything! If you find yourself reading these and thinking, “hmm… why is that true…?” Those are exactly the steps we are expecting you to explain.
The Eckart-Young theorem states:
Let $\mA$ be an $m \times n$ matrix and, without loss of generality, assume $m \ge n$. Let $\mA = \mU \mSigma \mV^T$ be the SVD, with singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$, left singular vectors $\vu_1, \ldots, \vu_m$, and right singular vectors $\vv_1, \ldots, \vv_n$. Define
\mA_k = \sum_{i=1}^k \sigma_i \vu_i \vv_i^T.
Then:
\normof{\mA - \mA_k} \le \normof{\mA - \mB}
for any matrix $\mB$ of rank $k$.
It’s helpful to translate this into real language. Suppose I give you a matrix $\mA$ and ask for the closest matrix to $\mA$ with rank $k$ instead of its full rank. Eckart-Young tells us that we can construct such a matrix from the SVD.
In this problem, we’ll use the matrix 2-norm.
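Before starting the proof, it may help to see the statement numerically with a quick sketch like this (a sanity check, not a proof):
A = randn(8,5); k = 2;
[U,S,V] = svd(A);
Ak = U(:,1:k)*S(1:k,1:k)*V(:,1:k)';    % the truncated SVD A_k
[norm(A - Ak)  S(k+1,k+1)]             % these two numbers should agree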
Explain why we don’t lose any generality with the assumption $m \ge n$.
Warmup Suppose that $\mD$ is a diagonal matrix with non-negative entries. What is the best diagonal rank-$k$ approximation to $\mD$? (Yes, you have to prove it. No, you cannot assume the Eckart-Young theorem is true!)
Warmup Show that .
Warmup Show that where is the rank of . Why does this mean we can assume that without losing any further generality?
Show that .
We’ll proceed by contradiction now. Suppose that a rank-$k$ matrix $\mB$ exists with $\normof{\mA - \mB} < \sigma_{k+1}$. Briefly justify why:
\normof{(\mA - \mB) \vx} < \sigma_{k+1} \normof{\vx}
for any vector $\vx$. Using the last result, we can choose $\vx$ to be anything we want. Note that $\mB$ is rank $k$. This means it must have a null-space of dimension $n - k$. Show that, for any vector $\vw$ in the null-space of $\mB$:
\normof{\mA \vw} < \sigma_{k+1} \normof{\vw}.
Now we have a subspace of $\mathbb{R}^n$ on which $\normof{\mA \vw}$ is bounded above. We’ll now show that there’s another subspace where it’s bounded below! Let $\vw$ be any vector in $\operatorname{span}\{\vv_1, \ldots, \vv_{k+1}\}$. Show that $\normof{\mA \vw} \ge \sigma_{k+1} \normof{\vw}$. (Hint, note that $\vv_1, \ldots, \vv_{k+1}$ is an orthogonal basis for that span.)
What’s the dimension of the space where $\normof{\mA \vw}$ is bounded above? What’s the dimension of the space where $\normof{\mA \vw}$ is bounded below? Why does this lead to a contradiction?
Give yourself a pat on the back!
Download the matrix ‘Yale_32x32.mat’ from http://www.cad.zju.edu.cn/home/dengcai/Data/Yale/Yale_32x32.mat Recall, you can use
urlwrite('http://www.cad.zju.edu.cn/home/dengcai/Data/Yale/Yale_32x32.mat','Yale_32x32.mat');
as we did in the matlab/cosine_similarity.m file
Compute the singular value decomposition of the matrix of images as a “pixels-by-faces” matrix. What is the largest singular value?
Plot the leading singular vector as a pixel image. What do you see?
3. If you’re curious about this, this website has more info http://en.wikipedia.org/wiki/Eigenface. Note that it describes something slightly different from what we are doing in the first two parts. The key difference is that our matrix isn’t centered with respect to the images.
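If it helps to get started, here is a minimal sketch of the first two parts; it assumes the .mat file stores the images in a variable named fea with one 32-by-32 image per row, so check the variable names after loading:
load Yale_32x32.mat                 % assumed to contain fea, one image per row
A = double(fea)';                   % pixels-by-faces matrix
[U,S,V] = svd(A,'econ');            % economy-size SVD
S(1,1)                              % the largest singular value
imagesc(reshape(U(:,1),32,32)); colormap(gray); axis image;   % leading singular vector as an image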
Implement a backsolve as a function in Matlab. Show your code.
Construct a random upper-triangular linear system via:
A = triu(randn(n)); b = randn(n,1);
Compare the performance of your backsolve to Matlab’s \ method to solve a linear system.
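One possible sketch of the comparison, assuming your routine from the previous part is saved as backsolve.m:
n = 2000;
A = triu(randn(n)); b = randn(n,1);
tic; x1 = backsolve(A,b); t1 = toc;     % your back-substitution routine
tic; x2 = A\b;            t2 = toc;     % Matlab's built-in solver
fprintf('backsolve: %.4f s   backslash: %.4f s   rel. diff: %.2e\n', ...
    t1, t2, norm(x1-x2)/norm(x2));      % the difference may be large if the random system is ill-conditioned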
In this problem, we’ll meet one of the most common matrices studied in numerical linear algebra: the Laplacian. We arrive at this matrix by discretizing a partial differential equation. Poisson’s equation is:
\Delta u = f
where $u$ is a continuous function defined over the unit plane (i.e. $0 \le x, y \le 1$), $f$ is a continuous function defined over the same region, and $\Delta$ is the Laplacian operator:
\Delta u = \partial^2 u/\partial x^2 + \partial^2 u/\partial y^2.
Given a function $f$, we want to find a function $u$ that satisfies this equation. There are many approaches to solve this problem theoretically and numerically. We’ll take a numerical approach here.
Suppose we discretize the function $u$ at regular points $x_i = ih$ and $y_j = jh$ for $i, j = 0, \ldots, n$, where $h = 1/n$, so that we have:
u(x,y) \approx \begin{matrix} u(x_0, y_0) & \cdots & u(x_n, y_0) \\ \vdots & \ddots & \vdots \\ u(x_0, y_n) & \cdots & u(x_n, y_n) \end{matrix}.
For this discretization, note that
\begin{aligned} \Delta u(x_i, y_j) & \approx {n^2} \left( u(x_{i-1},y_{j}) - 2 u(x_i,y_j) + u(x_{i+1},y_j) \right) \\ & \qquad+
{n^2} \left( u(x_{i},y_{j-1}) - 2 u(x_i,y_j) + u(x_{i},y_{j+1}) \right) \\ &
= {n^2} \left( u(x_{i-1},y_{j})+ u(x_{i},y_{j-1}) - 4 u(x_i,y_j) + u(x_{i+1},y_j) + u(x_{i},y_{j+1}) \right) \\ &
= f(x_i, y_j).
\end{aligned}
What we’ve done here is use the approximation:
\partial^2 u / \partial x^2 \approx \frac{1}{h^2} \left( u(x-h) - 2 u(x) + u(x+h) \right)
for both partial terms.
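If you want to convince yourself of this approximation numerically, here is a small check using the test function u(x) = sin(pi x), whose exact second derivative is -pi^2 sin(pi x):
h = 1e-3; x = 0.3;
approx = (sin(pi*(x-h)) - 2*sin(pi*x) + sin(pi*(x+h)))/h^2;   % the three-point stencil
exact  = -pi^2*sin(pi*x);
[approx exact]                                                % should agree to several digits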
We need this equation to hold at each point $(x_i, y_j)$. But note that there are some issues with this equation at the boundary values (where $x = 0$ or $1$, or where $y = 0$ or $1$).
For this problem, we’ll make it very simple and set:
u(0,y_j) = u(1,y_j) = u(x_i,0) = u(x_i,1) = 0.
Now, we’ll do what we always do! Turn this into some type of matrix equation!
Let $\mU$ be an $(n+1) \times (n+1)$ matrix that we’ll index from zero instead of one:
\mU = \bmat{U_{0,0} & \cdots & U_{0,n} \\ \vdots & \ddots & \vdots \\ U_{n,0} & \cdots & U_{n,n} }.
where $U_{i,j} = u(x_i, y_j)$. At this point, we are nearly done. What we are going to do is turn Poisson’s equation into a linear system. This will be somewhat like how we turned image resampling into a matrix-vector equation in the last homework.
In order to write $\mU$ as a vector, we’ll keep the convention from last time:
\mU = \bmat{ u_1 & \cdots & u_{n+1} \\ u_{n+2} & \cdots & u_{2(n+1)} \\ \vdots & \ddots & \vdots \\ u_{n(n+1) + 1} & \cdots & u_{(n+1)(n+1)} }.
Let $\vu$ be the vector of $(n+1)^2$ elements here. Note that our approximation to $\Delta u(x_i, y_j)$ just involved a linear combination of the elements of $\vu$. This means we have a linear system:
\mA \vu = \vf
where $\mA$ and $\vf$ correspond to all the rows in:
\frac{1}{h^2} \left( u(x_{i-1},y_{j})+ u(x_{i},y_{j-1}) - 4 u(x_i,y_j) + u(x_{i+1},y_j) + u(x_{i},y_{j+1}) \right) = f(x_i, y_j).
Let . Write down the linear equation for including all the boundary conditions. Note that you can encode the boundary conditions by adding a row of where: .
Write a Matlab code to construct the matrix and vector when and . Here’s some pseudo-code to help out:
n = 10;
A = zeros((n+1)^2,(n+1)^2);
f = zeros((n+1)^2,1);
G = reshape(1:((n+1)^2), n+1, n+1)';
for i=0:n
for j=0:n
row = G(i+1,j+1);
if i==0 || j == 0 || i==n || j==n
% we are on a boundary
f(row) = 0;
<fill in statement to set A(row,:)>
else
% we are NOT on a boundary
f(row) = n^2;
<fill in statement to set A(row,:)>
end
end
end
Solve for $\vu$ using Matlab’s backslash solver and show the result using the mesh function.
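Once A and f are built by your code above, the last step looks roughly like this (the reshape assumes the same row ordering as the matrix G in the pseudo-code):
u = A\f;                      % solve the discretized Poisson system
U = reshape(u, n+1, n+1)';    % undo the row-wise vectorization
mesh(0:1/n:1, 0:1/n:1, U);    % plot the solution over the unit square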
I’ve removed problem 8
I collaborated with ________ while working on this problem.
In class, we showed that the first column of a QR factorization of a column vector is:
\va = \underbrace{(\va/\normof{\va})}_{\mQ} \underbrace{\normof{\va}}_{\mR}.
Next, we showed that the full QR factorization is given by:
\va = \mH \normof{\va} \ve_1,
where $\mH$ is a Householder reflector. This requires that $\mH \va = \normof{\va} \ve_1$.
I want you to show that directly from the definition of $\mH$, that is:
\mH = \eye - 2 \frac{\vv \vv^T}{\vv^T \vv}
where $\vv = \va - \normof{\va} \ve_1$. I gave this one a few stabs myself and couldn’t come up with anything that wasn’t a tedious derivation. But I didn’t think about it that long. I’d love to see if there’s a slick proof of this result, again, directly from the definition – and hence, this bonus question. To get full points, you can just slog it out based on the definition.
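Before slogging through the algebra, it can help to check the claim numerically for a random vector, building the reflector straight from the definition above:
a = randn(5,1);
e1 = zeros(5,1); e1(1) = 1;
v = a - norm(a)*e1;                 % the Householder vector from the definition
H = eye(5) - 2*(v*v')/(v'*v);       % the reflector
norm(a - H*(norm(a)*e1))            % should be near machine precision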