My main interest is in Algorithms and Databases. Main focus of my current work is in designing provably I/O efficient and compression-aware indexing data structures.

 

My Erdos Number is 3. Here are 2 vertex disjoint paths.

Path1: Rahul Shah---Sachin Lodha---Endre Szemeredi---Paul Erdos

Path2: Rahul Shah---Martin Farach-Colton---Noga Alon---Paul Erdos

 

 

My thesis:

Mainly consists of two separate parts:

(1) is the problem motivated by computational biology (phylogeny construction), simply worded –Given n points and n(n-1)/2 pairwise distances between them, the goal is to find a weighted tree (phylogeny) which best fits these distances. We consider ordinal phylogeny where instead of pairwise distances only ordinal information (in form of total order or partial order) is available on the pairwise distances. Main result: Given a total order on pairwise distances, it is NP-hard to determine whether there is a weighted tree which maintains the order.
(2) broadly classifies as dynamic programming improvement techniques, especially when dynamic programming is based on some distance metric and is carried over a tree. Main result: Many tree dynamic problems (esp. from location theory) which were quadratic time can be done in O(nlogn) time.  

 

[Abstract]

 

Current Projects:

 

Entropy Compressed Data Structures:

(1) Data-aware Dictionary Compression

(2) Dynamic Compressed Text Indexing, Labelled Trees and XML compression

(3) External Memory Compressed Text Indexing

(4) Space lower bounds for Compressed Text Indexing

 

Uncertainty Management in Databases:

(1) Modeling

(2) Indexing

(3) Join Processing

(4) Query estimation and optimization

 

Parallel Disk Model:

(1) Online Paging on PDM with optimal competitive bounds.

 

Other Projects:

 

Mining deviants in time series data streams:

A method for finding (mathematically well defined) outliers on data streams using polylogarithmic space and per-item processing time.

 

Rank-aware Databases:

Predicting input size for top-k queries. Cost estimation for query optimization.

  

SP-GiST bulk loading:

Method for I/O efficiently bulk loading class of space partitioning trees.

 

Content based Multicast:

Framework and algorithms for efficiently distributing heterogeneous multicast content.

  

Refer to my publications page for manuscripts.