Modern matrix methods for large scale data and networks 2012

Minisymposium at SIAM Applied Linear Algebra

Organized by David F. Gleich

Every few years, the new applications for matrix methods arise and challenge existing paradigms. The talks in this mini-symposium sample some of the research that has arisen out of new applications in large scale machine learning, network problems, and data analysis. Much of the research presented at this mini-symposium will have an interesting twist on a classical matrix problem – linear systems, least squares, or eigenvalues – that better fits the current problems.

Updates: 2012-07-04 Slides posted! Click the talk titles below.

Talks

Nonlinear Eigenproblems in Data Analysis and Graph Partitioning

Matthias Hein, Saarland University

It turns out that many problems in data analysis and beyond have natural formulations as nonlinear eigenproblems. In this talk I present our recent line of research in this area. This incluces the efficient computation of nonlinear eigenvectors via a generalization of the inverse power method and a general result showing that a certain class of NP-hard combinatorial problems such as balanced graph cuts or the (constrained) maximum density subgraph have tight relaxations as nonlinear eigenproblems.

LSRN: A Parallel Iterative Solver for Strongly Over- or Under-Determined Systems

Xiangrui Meng, Stanford University
with Michael Saunders (Stanford), and Michael Mahoney (Stanford)

We develop a parallel iterative least squares solver named LSRN that is based on random normal projection. It computes the unique minimum-length solution to $\min_{x \in \mathbb{R}^n} \|A x - b\|_2$ , where $A \in \mathbb{R}^{m \times n}$ can be rank-deficient with $m \gg n$ or $m \ll n$ . $A$ can be a dense matrix, a sparse matrix, or a linear operator, and LSRN speeds up automatically on sparse matrices and fast operators. The preconditioning phase consists of a random normal projection, which is embarrassingly parallel, and a singular value decomposition of size $\lceil \gamma \min(m,n) \rceil \times \min(m,n)$ , where $\gamma$ is moderately larger than $1$ , e.g., $\gamma = 2$ . We show that the preconditioned system is well-conditioned with a strong concentration result on the extreme singular values. Hence the number of iterations is fully predictable if we apply LSQR or the Chebyshev semi-iterative method to the preconditioned system. The latter method is particularly efficient for solving large-scale problems on clusters with high communication cost. LSRN is also capable of handling certain types of Tikhonov regularization. Numerical results show that LSRN outperforms LAPACK’s DGELSD on large-scale dense problems and MATLAB’s backslash (SuiteSparseQR) on sparse problems on a shared memory machine, and LSRN scales well on Amazon Elastic Compute Cloud (EC2) clusters.

Solving Large Dense Linear Systems with Covariance Matrices

Jie Chen, Argonne National Labs

Gaussian processes are a fundamental tool in spatial/temporal statistics, and they have broad applications to fields such as nuclear engineering and climate science. The maximum likelihood approach for fitting a Gaussian model requires the manipulation of the covariance matrix through inversion, logarithmic action and trace computation, all of which pose significant challenges for large dense matrices. We consider a reformulation of the maximum likelihood through a stochastic approximation framework, which narrows down the several linear algebra challenges to solving a linear system with the covariance matrix for multiple right-hand sides. In this talk I will present an iterative technique integrating the designs of the solver, the preconditioner and the matrix-vector multiplications, which lead to a scalable method for solving the maximum likelihood problem in data analysis.

Fast Coordinate Descent Methods with Variable Selection for Non-negative Matrix Factorization

Inderjit S. Dhillon, UT Austin
with Cho-Jui Hsieh, UT Austin

Non-negative Matrix Factorization (NMF) is a dimension reduction method for non-negative matrices, and has proven to be useful in many areas, such as text mining, bioinformatics and image processing. In this talk, I will present a new coordinate descent algorithm for solving the least squares NMF problem. The algorithm uses a variable selection scheme that employs the gradient of the objective function. This new method is considerably faster in practice, especially when the solution is sparse, as is often the case in real applications. In these cases, our method benefits by selecting important variables to update more often, thus resulting in higher speed. As an example, on a text dataset RCV1, our method is 7 times faster than FastHals, which is a cyclic coordinate descent algorithm, and more than 15 times faster when the sparsity is increased by adding an L1 penalty. We also develop new coordinate descent methods when error in NMF is measured by KL- divergence by applying the Newton method to solve the one-variable sub-problems. Experiments indicate that our algorithm for minimizing the KL-divergence is faster than the Lee and Seung multiplicative rule by a factor of 10 on the CBCL image dataset.

David Gleich