Probabilistic and Uncertain Databases

Managing probabilistic and uncertain data is a topic of much current interest among database researchers. This page is meant to serve as an information-sharing resource for projects in this area.



Research Groups

The following groups are listed here (in alphabetical order) because they participated in some ad hoc meetings on probabilistic/uncertain databases. This is not intended to be an exhaustive list of people doing this sort of research.


Avatar, IBM Almaden
http://www.almaden.ibm.com/software/projects/avatar/

Description under construction...


HeisenData, Intel/Berkeley
http://www.intel-research.net/berkeley/

Description under construction...


MystiQ, University of Washington
http://www.cs.washington.edu/homes/suciu/project-mystiq.html

MystiQ is a probabilistic database system that focuses on efficient processing of SQL queries. It combines two query evaluation techniques. The first pushes the computation of the output probabilities into the database engine using a technique called "safe plans"; the second runs a Monte Carlo simulation in the middleware, guiding the simulation steps to quickly identify and rank the top k most probable answers. MystiQ supports SELECT-FROM-WHERE-GROUPBY queries, is compatible with PostgreSQL, SQL Server, and DB2, and is available under license from the University of Washington.
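
To make the Monte Carlo idea concrete, here is a minimal Python sketch. It is not MystiQ's code or API: it uses plain possible-world sampling rather than the guided simulation described above, and it assumes a tuple-independent database. The names `tuples`, `lineage`, and `top_k`, and all of the probabilities, are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

# Hypothetical base tuples, each present independently with the given probability.
tuples = {"r1": 0.9, "r2": 0.5, "s1": 0.8, "s2": 0.3}

# Hypothetical lineage: each answer maps to its derivations. An answer is
# present when every tuple of at least one derivation is present.
lineage = {
    "alice": [("r1", "s1")],
    "bob": [("r2", "s1"), ("r2", "s2")],
    "carol": [("r1", "s2")],
}

def sample_world(tuples, rng):
    """Draw one possible world by keeping each tuple with its own probability."""
    return {t for t, p in tuples.items() if rng.random() < p}

def answers_in(world, lineage):
    """Return the answers that have at least one derivation fully inside the world."""
    return {
        answer
        for answer, derivations in lineage.items()
        if any(all(t in world for t in d) for d in derivations)
    }

def top_k(tuples, lineage, k, trials=20000, seed=0):
    """Estimate each answer's probability by sampling and return the k most probable."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        counts.update(answers_in(sample_world(tuples, rng), lineage))
    estimates = {answer: counts[answer] / trials for answer in lineage}
    return sorted(estimates.items(), key=lambda kv: kv[1], reverse=True)[:k]

if __name__ == "__main__":
    for answer, prob in top_k(tuples, lineage, k=2):
        print(answer, round(prob, 3))
```

For this toy input the exact probabilities can be computed by hand under the independence assumption: 0.9 * 0.8 = 0.72 for alice, 0.5 * (1 - 0.2 * 0.7) = 0.43 for bob, and 0.9 * 0.3 = 0.27 for carol. The sampled estimates should land close to these values. The hand computation for bob (an independent project over s1 and s2, then a join with r2) is the kind of step a safe plan would evaluate inside the database engine.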


Orion, Purdue University
http://orion.cs.purdue.edu/

Orion is a general-purpose database management system with native support for the collection, storage, and processing of uncertain and imprecise data. The system handles both continuous and discrete uncertain values, and provides various indexes for efficient query evaluation. It is implemented in C and PL/pgSQL, and runs entirely inside PostgreSQL.
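
As an illustration of what querying continuous and discrete uncertain values can mean, the following Python sketch evaluates a probabilistic range query with a threshold. It does not reflect Orion's implementation or interface: the classes `GaussianValue` and `DiscreteValue`, the function `range_query`, and the sample readings are all hypothetical, and the choice of a Gaussian for the continuous case is an assumption made only for this example.

```python
import math
from dataclasses import dataclass

@dataclass
class GaussianValue:
    """A continuous uncertain value, assumed here to be normally distributed."""
    mean: float
    stddev: float

    def cdf(self, x):
        # Standard closed form of the normal CDF via the error function.
        return 0.5 * (1.0 + math.erf((x - self.mean) / (self.stddev * math.sqrt(2.0))))

    def prob_in_range(self, lo, hi):
        return self.cdf(hi) - self.cdf(lo)

@dataclass
class DiscreteValue:
    """A discrete uncertain value given as a mapping from value to probability."""
    pmf: dict

    def prob_in_range(self, lo, hi):
        return sum(p for v, p in self.pmf.items() if lo <= v <= hi)

def range_query(rows, lo, hi, threshold):
    """Return (key, probability) for rows whose value lies in [lo, hi]
    with probability at least `threshold`."""
    result = []
    for key, value in rows:
        p = value.prob_in_range(lo, hi)
        if p >= threshold:
            result.append((key, p))
    return result

# Hypothetical sensor readings mixing both kinds of uncertainty.
rows = [
    ("sensor_a", GaussianValue(mean=20.0, stddev=2.0)),
    ("sensor_b", DiscreteValue(pmf={15: 0.2, 19: 0.5, 21: 0.3})),
    ("sensor_c", GaussianValue(mean=30.0, stddev=1.0)),
]

if __name__ == "__main__":
    for key, p in range_query(rows, lo=18, hi=22, threshold=0.5):
        print(key, round(p, 3))
```

With these readings, sensor_a and sensor_b qualify for the range [18, 22] at threshold 0.5, while sensor_c does not because almost all of its probability mass lies far above the range. A real system would use indexes to avoid computing the probability for every row, which this sketch does not attempt.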


Prob DBs, University of Maryland
http://www.cs.umd.edu/areas/Databases/

We are working on several projects related to probabilistic databases. Our current focus is on (a) developing techniques to represent and query complex tuple correlations in probabilistic databases, (b) spatio-temporal probabilistic databases, (c) supporting statistical modeling of data inside relational database systems (MauveDB), and (d) machine learning methods for relational and semi-structured data (also known as Statistical Relational Learning).


Trio, Stanford University
http://infolab.stanford.edu/trio/

In the Trio project at Stanford, we are building a new kind of database management system: one in which data, uncertainty of the data, and data lineage are all first-class citizens. Trio is based on an extended relational model called ULDBs, and it supports a SQL-based query language called TriQL. We have completed an initial working prototype of the Trio system.



Meetings and Events

Here are some recent and upcoming events of interest to researchers working on probabilistic data management and uncertainty in database systems.


TDM'06 (University of Twente)
http://wwwhome.cs.utwente.nl/~tdm/

A workshop on "Uncertainty in Databases" was held at the University of Twente (Enschede, The Netherlands) on June 6, 2006. "The goal of this full day workshop is to bring together researchers and companies interested in the issue of uncertainty in data, and in particular, those concerned with problems and perspectives of handling uncertain data in the domain of databases." Slides and notes are available on the workshop website.


"UPDB Day" (Stanford University)

Our first meeting was held at Stanford on September 22, 2006. Representatives from IBM Almaden, Intel Berkeley, Maryland, Purdue, Stanford, and Washington attended. Major results of this meeting included setting up a mailing list, creating this web page, and proposing to share data sets. The next meeting was tentatively planned for late spring 2007, to be hosted by Maryland. Slides and notes from some of the presentations are available:



Last Updated: 12-Feb-2007 at 2:10 PM by cmayfiel