Performance Evaluation and Optimization for Scientific Kernels
Principal Investigators: Ananth Grama, Ahmed Sameh, Vivek Sarin
Sponsor: Silicon Graphics Inc.
This project investigates a range of algorithmic, architectural, and
system software issues related to performance evaluation and
optimization on SGI machines. Our ongoing research has focused on the
use of threaded APIs for optimized parallel kernels in linear algebra
on the Origin 2000. We have achieved parallel efficiencies in excess of
85% on reduced-error multipole techniques for up to 32 processors using
POSIX threads. Similar performance has also been demonstrated for
multilevel solvers used in modeling incompressible particulate fluid
flows. The proposed project extends this research in several
directions:
- Study the impact of heterogeneous communication layers (Origin 2000
  multiclusters of over 64 processors) on performance of threaded APIs.
  Specifically, we are investigating the impact of higher-latency
  meta-routers and the ability of threads to mask additional latency.
- Use performance data (cache misses, synchronization penalties,
  context switches) to optimize scheduling for both uniprocessor and
  parallel performance. This study focuses on exploiting the iterative
  nature of most applications to enhance data access locality.
- Investigate architectural implications of parallel overheads.
  Specifically, identify the machine parameters that most critically
  affect the performance of various kernels, along with techniques for
  alleviating performance bottlenecks.
- Develop algorithmic improvements for minimizing overheads. Often, a
  more expensive algorithm is capable of yielding better processor
  performance. Investigating such algorithms may be key to faster
  problem solving.
The issues addressed in this proposal impact critical design decisions
for hardware, software, and application engineers. In addition to
methodologies for improving performance, the project will result in
optimized kernels for target applications in sparse linear algebra and
n-body methods.
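
For reference, the efficiency figure quoted above can be read in terms
of the usual definitions of speedup and parallel efficiency, assuming
the report uses them in the standard sense. The symbols T_1 (time on
one processor), T_p (time on p processors), S, and E are notation
introduced here for illustration and do not appear in the report. The
implied speedup is plain arithmetic on the stated 85% and 32
processors, not a separately reported measurement.

```latex
S(p) = \frac{T_1}{T_p}, \qquad
E(p) = \frac{S(p)}{p} = \frac{T_1}{p\,T_p}

% Under these definitions, an efficiency above 85% on 32 processors gives
E(32) > 0.85 \;\Longrightarrow\; S(32) = 32\,E(32) > 32 \times 0.85 = 27.2
```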
1998 Annual Research Report
Department of Computer Sciences