Refereed conference publications

Reid Andersen, David F. Gleich, and Vahab S. Mirrokni. Overlapping clusters for distributed computation. In Poster proceedings of the SIAM Workshop on Combinatorial Scientific Computing (CSC), 2011. Poster. [ bib | local ]

Paul G. Constantine and David F. Gleich. Tall and skinny QR factorizations in MapReduce architectures. In Proceedings of the second international workshop on MapReduce and its applications, MapReduce '11, pages 43-50, New York, NY, USA, 2011. ACM. [ bib | DOI | local ]

David F. Gleich and Lek-Heng Lim. Rank aggregation via nuclear norm minimization. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '11, pages 60-68, New York, NY, USA, 2011. ACM. [ bib | DOI | local ]

The process of rank aggregation is intimately intertwined with the structure of skew-symmetric matrices. We apply recent advances in the theory and algorithms of matrix completion to skew-symmetric matrices. This combination of ideas produces a new method for ranking a set of items. The essence of our idea is that a rank aggregation describes a partially filled skew-symmetric matrix. We extend an algorithm for matrix completion to handle skew-symmetric data and use that to extract ranks for each item. Our algorithm applies to both pairwise comparison and rating data. Because it is based on matrix completion, it is robust to both noise and incomplete data. We show a formal recovery result for the noiseless case and present a detailed study of the algorithm on synthetic data and Netflix ratings.
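The core idea above, that pairwise comparisons partially fill a skew-symmetric matrix whose completion yields scores, can be illustrated with a small least-squares sketch. This is not the paper's matrix-completion algorithm; it only uses the fact that a consistent comparison matrix has the rank-2 form Y = s1' - 1s', so each observed entry constrains a score difference s[i] - s[j]:

```python
import numpy as np

def fit_scores(n, pairs):
    """Least-squares scores s with s[i] - s[j] ~ y for each observed (i, j, y).

    The observations form a partially filled skew-symmetric matrix Y with
    Y[i, j] = y and Y[j, i] = -y; a consistent ranking is a rank-2 completion.
    """
    A = np.zeros((len(pairs), n))
    b = np.zeros(len(pairs))
    for k, (i, j, y) in enumerate(pairs):
        A[k, i], A[k, j], b[k] = 1.0, -1.0, y
    s, *_ = np.linalg.lstsq(A, b, rcond=None)
    return s - s.mean()  # remove the additive degree of freedom

# three items with true scores (1, 0, -1); one comparison per pair
pairs = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 2.0)]
s = fit_scores(3, pairs)
print(np.argsort(-s))  # best item first: [0 1 2]
```

With noisy or very sparse data, the paper instead extends a low-rank matrix completion solver to respect the skew-symmetric structure.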

David F. Gleich, Paul G. Constantine, Abraham Flaxman, and Asela Gunawardana. Tracking the random surfer: empirically measured teleportation parameters in PageRank. In WWW '10: Proceedings of the 19th international conference on World wide web, pages 381-390, April 2010. [ bib | DOI | local ]

PageRank computes the importance of each node in a directed graph under a random surfer model governed by a teleportation parameter. Commonly denoted alpha, this parameter models the probability of following an edge inside the graph or, when the graph comes from a network of web pages and links, clicking a link on a web page. We empirically measure the teleportation parameter based on browser toolbar logs and a click trail analysis. For a particular user or machine, such analysis produces a value of alpha. We find that these values nicely fit a Beta distribution with mean edge-following probability between 0.3 and 0.7, depending on the site. Using these distributions, we compute PageRank scores in which the teleportation parameter is a distribution rather than a constant. These new metrics are evaluated on the graph of pages in Wikipedia.
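A minimal sketch of computing PageRank with respect to a distribution over the teleportation parameter: sample alpha from a Beta distribution and average the resulting PageRank vectors. The 3-node matrix and Beta(2, 3) parameters are illustrative, not the paper's measured values:

```python
import numpy as np

def pagerank(P, alpha, tol=1e-12):
    """Power iteration for PageRank with column-stochastic P and uniform teleport."""
    n = P.shape[0]
    v = np.full(n, 1.0 / n)
    x = v.copy()
    while True:
        x_new = alpha * (P @ x) + (1 - alpha) * v
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new

def expected_pagerank(P, a, b, samples=100, seed=0):
    """E[x(alpha)] with alpha ~ Beta(a, b), by simple Monte Carlo averaging."""
    rng = np.random.default_rng(seed)
    alphas = rng.beta(a, b, size=samples)
    return np.mean([pagerank(P, al) for al in alphas], axis=0)

# tiny example: column j holds the out-link distribution of node j
P = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
x = expected_pagerank(P, a=2.0, b=3.0)
print(x)  # a probability vector: entries sum to 1
```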

Pooya Esfandiar, Francesco Bonchi, David F. Gleich, Chen Greif, Laks V. S. Lakshmanan, and Byung-Won On. Fast Katz and commuters: Efficient approximation of social relatedness over large networks. In Algorithms and Models for the Web Graph, 2010. [ bib | DOI | local ]

Motivated by social network data mining problems such as link prediction and collaborative filtering, significant research effort has been devoted to computing topological measures including the Katz score and the commute time. Existing approaches typically approximate all pairwise relationships simultaneously. In this paper, we are interested in computing two quantities: the score for a single pair of nodes, and the top-k nodes with the best scores from a given source node. For the pairwise problem, we apply an iterative algorithm that computes upper and lower bounds for the measures we seek. This algorithm exploits a relationship between the Lanczos process and a quadrature rule. For the top-k problem, we propose an algorithm that only accesses a small portion of the graph and is related to techniques used in personalized PageRank computation. To test the scalability and accuracy of our algorithms we experiment with three real-world networks and find that these algorithms run in milliseconds to seconds without any preprocessing.
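For intuition, the Katz scores from a source node can be computed on a small graph with a dense direct solve. This is a sketch only; the paper's algorithms avoid forming the inverse and instead bound individual entries via the Lanczos/quadrature connection or touch only a small portion of the graph:

```python
import numpy as np

def katz_scores(A, alpha, j):
    """Katz scores from node j to all nodes: column j of sum_{k>=1} (alpha A)^k.

    Requires alpha < 1 / lambda_max(A) so the series converges; then the sum
    equals (I - alpha A)^{-1} - I, obtained here by one linear solve.
    """
    n = A.shape[0]
    e_j = np.zeros(n)
    e_j[j] = 1.0
    x = np.linalg.solve(np.eye(n) - alpha * A, e_j)
    return x - e_j  # drop the k = 0 term

# path graph 0 - 1 - 2: from node 0, the neighbor 1 scores highest
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
s = katz_scores(A, alpha=0.3, j=0)
print(np.argsort(-s))  # top-k nodes by Katz score from node 0
```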

Mohsen Bayati, Margot Gerritsen, David F. Gleich, Amin Saberi, and Ying Wang. Algorithms for large, sparse network alignment problems. In Proceedings of the 9th IEEE International Conference on Data Mining, pages 705-710, December 2009. [ bib | DOI | arXiv | local ]

We propose a new distributed algorithm for sparse variants of the network alignment problem that occurs in a variety of data mining areas including systems biology, database matching, and computer vision. Our algorithm uses a belief propagation heuristic and provides near optimal solutions for an NP-hard combinatorial optimization problem. We show that our algorithm is faster and outperforms or nearly ties existing algorithms on synthetic problems, a problem in bioinformatics, and a problem in ontology matching. We also provide a unified framework for studying and comparing all network alignment solvers.
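A sketch of the kind of objective such alignment methods optimize (the notation and weights here are illustrative, not the paper's exact formulation): a one-to-one matching between two graphs is scored by its total match weight plus the number of overlapped edges it preserves, and maximizing this is the NP-hard combinatorial problem the belief propagation heuristic targets.

```python
import numpy as np

def alignment_objective(A, B, W, match, a=1.0, b=1.0):
    """Score a one-to-one matching for network alignment.

    match maps nodes of graph A to nodes of graph B.  The objective rewards
    total match weight W[i, i'] plus the number of overlapped edges: pairs
    (i, j) adjacent in A whose images (match[i], match[j]) are adjacent in B.
    """
    weight = sum(W[i, ip] for i, ip in match.items())
    overlap = 0
    for i, j in zip(*np.nonzero(np.triu(A))):
        if i in match and j in match and B[match[i], match[j]]:
            overlap += 1
    return a * weight + b * overlap

# two triangles: the identity matching preserves all three edges
A = B = np.array([[0, 1, 1],
                  [1, 0, 1],
                  [1, 1, 0]])
W = np.ones((3, 3))
print(alignment_objective(A, B, W, {0: 0, 1: 1, 2: 2}))  # 3.0 + 3 = 6.0
```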

Paul G. Constantine and David F. Gleich. Using polynomial chaos to compute the influence of multiple random surfers in the PageRank model. In Anthony Bonato and Fan Chung Graham, editors, Proceedings of the 5th Workshop on Algorithms and Models for the Web Graph (WAW2007), volume 4863 of Lecture Notes in Computer Science, pages 82-95. Springer, 2007. [ bib | DOI | local ]

The PageRank equation computes the importance of pages in a web graph relative to a single random surfer with a constant teleportation coefficient. To be globally relevant, the teleportation coefficient should account for the influence of all users. Therefore, we correct the PageRank formulation by modeling the teleportation coefficient as a random variable distributed according to user behavior. With this correction, the PageRank values themselves become random. We present two methods to quantify the uncertainty in the random PageRank: a Monte Carlo sampling algorithm and an algorithm based on the truncated polynomial chaos expansion of the random quantities. With each of these methods, we compute the expectation and standard deviation of the PageRanks. Our statistical analysis shows that the standard deviations of the PageRanks are uncorrelated with the PageRank vector.
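The Monte Carlo method can be sketched directly: sample the teleportation coefficient, solve each deterministic PageRank system, and take the elementwise mean and standard deviation. The 4-node matrix and Beta(2, 2) distribution are illustrative placeholders:

```python
import numpy as np

def pagerank_solve(P, alpha):
    """PageRank by direct solve: x = (1 - alpha) (I - alpha P)^{-1} v."""
    n = P.shape[0]
    v = np.full(n, 1.0 / n)
    return (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * P, v)

def pagerank_uncertainty(P, a, b, samples=300, seed=0):
    """Monte Carlo mean and standard deviation of PageRank, alpha ~ Beta(a, b)."""
    rng = np.random.default_rng(seed)
    xs = np.array([pagerank_solve(P, al) for al in rng.beta(a, b, samples)])
    return xs.mean(axis=0), xs.std(axis=0)

# small example: column j holds the out-link distribution of node j
P = np.array([[0.0, 0.5, 0.0, 0.0],
              [1.0, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 1.0],
              [0.0, 0.0, 0.5, 0.0]])
mean, std = pagerank_uncertainty(P, a=2.0, b=2.0)
print(mean, std)
```

The polynomial chaos approach replaces the sampling loop with a truncated expansion of x(alpha) in orthogonal polynomials, trading many solves for a single larger coupled system.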

David F. Gleich and Leonid Zhukov. Scalable computing with power-law graphs: Experience with parallel PageRank. In SuperComputing 2005, November 2005. Poster. [ bib | local | .pdf ]

Dennis DeCoste, David F. Gleich, Tejaswi Kasturi, Sathiya Keerthi, Omid Madani, Seung-Taek Park, David M. Pennock, Corey Porter, Sumit Sanghai, Farial Shahnaz, and Leonid Zhukov. Recommender systems research at Yahoo! Research Labs. In Beyond Personalization, San Diego, CA, January 2005. Position Statement. [ bib | local ]

We describe some of the ongoing projects at Yahoo! Research Labs that involve recommender systems. We discuss problems and solutions related to recommender systems that are relevant to Yahoo!’s business.

David F. Gleich, Leonid Zhukov, Matthew Rasmussen, and Kevin Lang. The World of Music: SDP embedding of high dimensional data. In Information Visualization 2005, 2005. Interactive Poster. [ bib | local ]

In this paper we investigate the use of Semidefinite Programming (SDP) optimization for high dimensional data layout and graph visualization. We developed a set of interactive visualization tools and used them on music artist ratings data from Yahoo!. The computed layout preserves a natural grouping of the artists and provides visual assistance for browsing large music collections.

David F. Gleich and Leonid Zhukov. An SVD based term suggestion and ranking system. In ICDM '04: Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM'04), pages 391-394, Brighton, UK, November 2004. IEEE Computer Society. [ bib | DOI | local ]

In this paper, we consider the application of the singular value decomposition (SVD) to a search term suggestion system in a pay-for-performance search market. We propose a novel positive and negative refinement method based on orthogonal subspace projections. We demonstrate that SVD subspace-based methods: 1) expand coverage by reordering the results, and 2) enhance the clustered structure of the data. The numerical experiments reported in this paper were performed on Overture's pay-for-performance search market data.
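An LSA-style sketch of SVD-based term suggestion on hypothetical toy data: project terms into a low-rank subspace and rank them by cosine similarity to a query term. The `suggest_terms` function and rank-2 truncation are illustrative, and the paper's positive/negative refinement via orthogonal subspace projections is not shown:

```python
import numpy as np

def suggest_terms(X, terms, query_idx, k=2, topn=3):
    """Rank terms by cosine similarity to a query term in a rank-k SVD subspace.

    X is a term-by-document matrix; T holds the term coordinates in the
    latent space spanned by the top k singular vectors.
    """
    U, sv, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :k] * sv[:k]
    q = T[query_idx]
    sims = T @ q / (np.linalg.norm(T, axis=1) * np.linalg.norm(q) + 1e-12)
    order = np.argsort(-sims)
    return [terms[i] for i in order if i != query_idx][:topn]

# toy term-by-document occurrence matrix (rows: terms, columns: documents)
terms = ["car", "auto", "insurance", "flower"]
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 1]], dtype=float)
print(suggest_terms(X, terms, terms.index("car")))  # "auto" ranks first
```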