Better Uncertain Data Processing via Program Analysis

Uncertain data processing is becoming increasingly important. In scientific computation, data are collected through instruments or sensors that may be exposed to rough environmental conditions, which introduces errors. Programs that process these data may therefore draw faulty conclusions. For example, slightly altering a parameter of the program used to process experimental data may cause a protein to be mistakenly classified as a cancer indicator. Such parameters are uncertain because biologists choose them based on their experience. The resulting mistakes can be highly costly, because the faulty results may guide expensive follow-up wet-bench experiments.
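To make the parameter example concrete, the following is a minimal sketch of a brute-force robustness check. It is only an illustration and does not represent the technique developed in this project. The `classify` function, its threshold rule, and the names `is_robust`, `rel_uncertainty`, and `steps` are all hypothetical, and the threshold is assumed to be positive.

```python
def classify(intensity, threshold):
    """Hypothetical classifier: flag a protein when its measured
    intensity exceeds an expert-chosen threshold."""
    return intensity > threshold


def is_robust(intensity, threshold, rel_uncertainty, steps=20):
    """Return True if the classification stays the same for every sampled
    threshold within +/- rel_uncertainty of the nominal (positive) value."""
    baseline = classify(intensity, threshold)
    low = threshold * (1 - rel_uncertainty)
    high = threshold * (1 + rel_uncertainty)
    for i in range(steps + 1):
        # Evenly spaced samples across the uncertainty interval.
        perturbed = low + (high - low) * i / steps
        if classify(intensity, perturbed) != baseline:
            return False  # a small parameter change flips the result
    return True


# A measurement far from the threshold is stable under a 5% perturbation.
stable = is_robust(10.0, 5.0, 0.05)
# A measurement close to the threshold is not.
fragile = is_robust(5.1, 5.0, 0.05)
```

Re-running the program once per sampled parameter value, as this sketch does, becomes expensive for realistic programs with many uncertain inputs.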

Traditionally, uncertainty analysis is conducted on the underlying mathematical models. However, modern data processing uses more complex models and relies on computers and programs. In this project, we aim to address the uncertain data processing problem from the program analysis perspective.

Recently, we have made the following progress.

Funding

Towards Scalable and Comprehensive Uncertain Data Management, NSF-III-0916874, 2009-2012.

Students

Publications

ICSE W. N. Sumner, T. Bao, X. Zhang, and S. Prabhakar. Coalescing Executions for Fast Uncertainty Analysis,
IEEE/ACM International Conference on Software Engineering, 2011.

VLDB Mingwu Zhang, Xiangyu Zhang, Xiang Zhang, and Sunil Prabhakar. Tracing Lineage Beyond Relational Operators,
the 33rd International Conference on Very Large Data Bases, 2007.
