III: Small: Towards Scalable and Comprehensive Uncertain Data Management

Sponsor: National Science Foundation

This material is based upon work supported by the National Science Foundation under Grant No: IIS-09168724

Principal Investigators:
Sunil Prabhakar
Xiangyu Zhang

Graduate Students:

Project Summary

Due to the importance of uncertain data for a large number of applications, there has been significant recent interest in database support for uncertain data. Existing work in this area includes new models for uncertain data, prototype implementations, and efficient query processing algorithms for specific types of queries. Despite the recent efforts, several important aspects of uncertain data management remain unexplored. This project addresses two of these areas: Query Optimization and Support for Non-Relational Operators. The first goal is about efficient execution of uncertain data queries. As with traditional data, efficient execution is necessary for ensuring the viability of uncertain data management systems. However, due to the complications of ensuring correct results, and the need for CPU-intensive operations over probability distributions, the goal is critical and challenging. In this project, automatic query optimizations are developed, through query rewriting rules that involve probability threshold operators, corresponding access methods, and cost estimation functions. The difficulty of handling uncertainty when dealing with non-relational operators has been expressed in many domains. The project aims to advance the capability of tracking the exact impact of uncertain inputs as data is processed by arbitrary programs, leveraging advanced techniques from the area of program analysis. A key problem with traditional Monte Carlo based solutions lies in correctly identifying independence in the output of Monte Carlo simulations. Data lineage tracing, which identifies the set of inputs used to compute an output value, is used to address the challenge. Furthermore, a program dependence tracing based approach is devised to trace the propagation of uncertainty during execution of arbitrary binary code. The technique does not rely on Monte Carlo simulations, and does not require access to source code or domain knowledge.

Goals, Objectives, and Targeted Activities

The goals of the project are to enhance uncertain data management beyond relational operators and provide efficient evaluation of relational queries over uncertain data.



Any opinions, findings and conclusions or recomendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

Last Modified by Sunil Prabhakar on 23rd February, 2014.