My main interest is in Algorithms and Databases. Main focus of my current work is in designing provably I/O efficient and compression-aware indexing data structures.
My Erdos Number is 3. Here are 2 vertex disjoint paths.
Path1: Rahul Shah---Sachin Lodha---Endre Szemeredi---Paul Erdos
Path2: Rahul Shah---Martin Farach-Colton---Noga Alon---Paul Erdos
My thesis:
Mainly consists of two separate parts:
(1) is the problem motivated by computational biology (phylogeny construction), simply worded –Given n points and n(n-1)/2 pairwise distances between them, the goal is to find a weighted tree (phylogeny) which best fits these distances. We consider ordinal phylogeny where instead of pairwise distances only ordinal information (in form of total order or partial order) is available on the pairwise distances. Main result: Given a total order on pairwise distances, it is NP-hard to determine whether there is a weighted tree which maintains the order.Current Projects:
Entropy Compressed Data Structures:
(4) Space lower bounds for Compressed Text Indexing
Uncertainty Management in Databases:
(1) Modeling
(2) Indexing
(3) Join Processing
(4) Query estimation and optimization
Parallel Disk Model:
(1) Online Paging on PDM with optimal competitive bounds.
Other Projects:
Mining deviants in time series data streams:
A method for finding (mathematically well defined) outliers on data streams using polylogarithmic space and per-item processing time.
Rank-aware Databases:
Predicting input size for top-k queries. Cost estimation for query optimization.
SP-GiST bulk loading:
Method for I/O efficiently bulk loading class of space partitioning trees.
Content based Multicast:
Framework and algorithms for efficiently distributing heterogeneous multicast content.
Refer to my publications page for manuscripts.