This course examines
both theoretical and practical aspects of numerical algorithm design
and implementation on parallel computing platforms. In particular, it
provides an understanding of the tradeoff between arithmetic complexity
and management of hierarchical memory structures, roundoff
characteristics if different from the sequential schemes, and
performance evaluation and enhancement. Applications are derived from
computational science and engineering.
Credit:
3 hours
Prerequisite: CS 525, and CS 514 or CS 515; or consent
of instructor.
Format: 3 hours of lecture per week, and
laboratory work (mainly using ITaP parallel computing platforms).
Required textbook: None
Instructor: Ahmed H. Sameh
Outline
Chapter 1 --
Preliminaries
Models of parallel architecture
Principles of parallel algorithm design
Exploiting data locality - vector
registers and caches
Limits of parallelism
Amdahl's law and variations
Scalability
Basic kernels for dense and sparse matrix
computations
Part I - Dense Matrix Computations >
Chapter 2 - Direct
Linear System Solvers >
LU-factorization and variations >
Triangular system solvers >
Narrow-banded system solvers >
block cyclic reduction >
the Spike algorithm >
tridiagonal systems >
Toeplitz system solvers >
Fast fourier transforms >
Circulat systems >
Chapter 3 - Linear
Least Squares Schemes >
Orthogonal factorization >
Givens reduction >
Householder reduction >
Gram-Schmidt schemes >
Block algorithms >
Saddle-point formulation >
Domain decomposition >
Chapter 4 - The
Symmetric Eigenvalue Problem >
Jacobi schemes >
Reduction to tridiagonal form >
Tridiagonal eigenvalue problem solvers
for obtaining >
all eigenvalues or eigenpairs >
selected eigenvalues or eigenpairs >
Chapter 5 - The Singular-value
Decomposition >
Reduction to bidiagonal form and the QR
schemes >
Jacobi schemes >
>
Part II - Sparse Matrix
Computations >
Chapter 6 - Sparse
Matrices >
Sources
o discretization of partial differential equations >