This research addresses combinatorial and algorithmic problems related to searching and matching of strings and slightly more complicated structures such as regular expressions, arrays, trees, and compounds thereof. These problems arise in a vast variety of applications, and their impact is widely perceived as more and more crucial in future information infrastructures, as unprecedented volumes of information are increasingly amassed, disseminated and shared. A list would include the design of structures for the efficient management of large repositories of strings, arrays, and special types of graphs, fundamental primitives such as the various variants of exact and approximate searching, specific applications such as the identification of periodicities, and other regularities, efficient implementations of ancillary functions such as compression and encoding of elementary discrete objects, etc. The main objective of this project is to abstract and clearly identify these problems, develop new techniques and efficient algorithms to solve them, both by serial and parallel/distributed computation, and implement the new algorithms.
This research represents a joint effort by teams having bases at
Polytechnic University, Georgia Tech, and Purdue University,
respectively. These teams have marked commonalities in backgrounds
and complementary expertise in highly specialized subareas of the
field. The research represents in part a natural extension of the
work carried out by the PI's under previous NSF sponsorships.