The goal of this project was to investigate solutions methods for the PageRank problem on a distributed parallel architecture. We implemented a parallel PageRank computation and experimental system using the MPI and PETSc libraries. This webpage documents some of our results on this project.
Presentations and Publications
Our System and Results
For the PageRank problem, there are many computational approaches. The following flowchart summarizes our options for computing PageRank in parallel.
The following picture describes our system. At the base we have a 120-node Beowulf cluster of RLX nodes. Each node has dual Intel Xeon 2.8 GHz processors and 4 GB RAM. These nodes are connected via a 120-port Gigabit Ethernet switch.
Above the hardware layer, we use the MPI standard to control the operation of the cluster and use the PETSc library for its efficient parallel numerical routines. At the top, we have our Parallel PageRank code which is a modular, experimental framework to rapidly prototype and evaluate parallel methods for PageRank computation.
Our results are summarized in the following table show the benefit of this approach.
For the 6 graphs presented here, we present runtimes on the number of processors listed underneath the graph. The size of the graph is given in number of pages (nodes) first and number of links (edges) second. For each method, we list the number of iterations taken, the time per iteration, and the total time. The shaded boxes are the best method in a row. A star denotes the use of a preconditioner. The residual used for convergence was 10^-7.
A natural question with a parallel system is the scaling of the problem. The following plot indicates that on a fully-connected network topology, PageRank scales well.