Algorithmic and Combinatorial issues in Pattern Matching

Principal Investigators: Alberto Apostolico

This research addresses combinatorial and algorithmic problems related to searching and matching of strings and slightly more complicated structures such as regular expressions, arrays, trees, and compounds thereof. These problems arise in a vast variety of applications, and their impact is widely perceived as more and more crucial in future information infrastructures, as unprecedented volumes of information are increasingly amassed, disseminated and shared. A list would include the design of structures for the efficient management of large repositories of strings, arrays, and special types of graphs, fundamental primitives such as the various variants of exact and approximate searching, specific applications such as the identification of periodicities and other regularities, efficient implementations of ancillary functions such as compression and encoding of elementary discrete objects, etc. The main objective of this project is to abstract and clearly identify these problems, develop new techniques and efficient algorithms to solve them, both by serial and parallel/distributed computation, and implement the new algorithms. This research represents a joint effort by teams having bases at Polytechnic University, Georgia Tech. and Purdue University, respectively. These teams have marked commonalities in backgrounds and complementary expertise in highly specialized subareas of the field. The research represents in part a natural extension of the work carried out by the PI's under previous NSF sponsorships.