===============================================================================
Trio (Stanford) - Jennifer Widom
===============================================================================

* Lineage supports uncertainty
  - representation, correlation, efficiency
  - goal for both internal and external

* Current Trio Model
  - more general than discrete or sets
  - "maybe annotations" (tuple level)
  - doesn't require confidence values

* ULDBs: Interesting Questions
  - Extraneous alternatives (minimality)
  - Extraction (drop tables -> errors)

* Future work (research questions)
  - Uncertainty/lineage in schemas
  - Optimizer issues and statistics
  - alternatives per tuple?
  - cluster by attribute or verticalize?

===============================================================================
MystiQ (Washington) - Dan Suciu
===============================================================================

* Tables interpreted as events
  - configuration file (metadata)
  - point probabilities, guaranteed ranking

* Safe plans vs. Monte Carlo simulations
  - heuristics, statistics (like group size)

* Applications of ProbDB
  - Fuzzy object matching: IMDB + AMZN
  - Information extraction (e.g. in IR)
  - Cleaning of sensor data

===============================================================================
Administrivia
===============================================================================

* TODO
  - proceedings page (Purdue)
  - email key participants for slides
  - see http://www-db.stanford.edu/sdt/
  - mailing list (Stanford)
  - sharing data sets
    - IMDB with prob values (Washington)
    - Yahoo! product data (Stanford)
    - Enron and Blogger (IBM)

* Results
  1. Mailing List
  2. Web Page
  3. Shared Data Sets
  4. Meet regularly

* Key Questions
  - approaches: thresholding versus ranking (i.e. top-k)
  - prob values: where they come from, how to use them

===============================================================================
Avatar (IBM Almaden) - Sriram Raghavan
===============================================================================

* Information Extraction (IE)
  - i.e. how UPDBs may solve our problems
  - Classical IE: accuracy and scalability

* Avatar IES for IE, like DBMS for data
  - declarative IE (for interactions)
  - annotation store (tags for patterns)
  - simple interface (for average user)

* Two problems (annotators make mistakes)
  - High precision -> recall problem
  - derived annotator anomaly (beta thresholds)

* Applications of UPDBs
  - Precisions for annotator rules
  - We're asking the "reverse" question

* Approach: Computing Probability Models
  - Simplified with Naive Bayes assumption
  - input uncertainties low, outputs high

===============================================================================
Prob Databases (Maryland) - Amol, Lise, Prithviraj
===============================================================================

* Model-based Views (MauveDB)
  - Step 1: Process data with stat model
  - Present user with inference views
  - Efficiently update in stream context

* Statistical Relational Learning
  - apply traditional models to relational data
  - answers "where do [prob] numbers come from?"
  - both attribute and structural uncertainty
  - introduce aggregates for complex rel's
  - building conditional probability trees
  - same tricks for existence uncertainty
  - Probabilistic DBs (compare and contrast)

* Arbitrarily correlated data
  - Probabilistic graphical models (from ML)
  - Boolean-valued random variables
  - Factor functions over random vars
  - Similar to carrying event expressions
  - Query evaluation as inference problem

===============================================================================
Orion (Purdue) - Sunil Prabhakar
===============================================================================

* Initial focus on attribute uncertainty
  - what about moving objects / predictions?
  - brief survey of applications (e.g. sensors)

* Independence between attributes of the same tuple
  - what about aggregates? subqueries? arithmetic?
  - joining uncertain attributes? (w/o threshold)

* Several highlights of prototype
  - Quality metrics of probabilistic results
  - Index structures (e.g. PTI) with thresholds
  - Comparing uncertain attributes: resolution

* {a,b} JOIN {a,b} -> {a,b} | {a,b}
  - Problem when assuming attributes are independent

===============================================================================
HeisenData (IR Berkeley) - Minos Garofalakis
===============================================================================

* PDM for the Digital (Smart) Home
  - many sensors and actuators
  - example app: people tracking

* Requirements
  1. Handle uncertainty and correlation
  2. Share knowledge across applications
  3. Real-time & retrospective reasoning

* Existing Approaches
  - insufficient granularity
  - assume tuple dependence

* Current apps only use DBMS as a store
  - SELECT * -> crunch -> INSERT *

* HeisenData Project Goals
  - Query evidence or model directly
  - Basis for ML app development

* Data Model: Possible Worlds
  - Evidence Data + Dependencies
  - Prob_model(World | Evidence)

* Hierarchical graphical models
  - Inherit base model, e.g. spatial dependencies
  - Use it to complement evidence (e.g. missed readings)
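
Sketch (not from the talk): one way to read Prob_model(World | Evidence) in the possible-worlds data model, using a toy people-tracking setup. Everything concrete here is an assumption made up for illustration: the two rooms, the uniform prior, the detection and false-alarm rates, and the function names. A missed reading is handled by summing over both values the sensor could have reported, which is one simple way a model can complement incomplete evidence.

```python
from itertools import product

# Hypothetical toy setup: one person is in exactly one of two rooms.
ROOMS = ("kitchen", "hall")
PRIOR = {"kitchen": 0.5, "hall": 0.5}  # assumed prior over the true location
P_DETECT = 0.8  # assumed chance a sensor fires when the person is in its room
P_FALSE = 0.1   # assumed chance a sensor fires when its room is empty


def world_probability(location, readings):
    """Joint probability of one possible world: a true location plus one
    boolean reading per room sensor."""
    p = PRIOR[location]
    for room in ROOMS:
        p_fire = P_DETECT if room == location else P_FALSE
        p *= p_fire if readings[room] else 1.0 - p_fire
    return p


def posterior(evidence):
    """P(location | evidence), computed by enumerating every world that is
    consistent with the evidence.

    `evidence` maps a room to its observed reading. Rooms that are left out
    count as missed readings and are summed over.
    """
    unobserved = [room for room in ROOMS if room not in evidence]
    weights = {}
    for location in ROOMS:
        total = 0.0
        for values in product((False, True), repeat=len(unobserved)):
            readings = dict(evidence)
            readings.update(zip(unobserved, values))
            total += world_probability(location, readings)
        weights[location] = total
    norm = sum(weights.values())
    return {location: w / norm for location, w in weights.items()}


if __name__ == "__main__":
    # The kitchen sensor fired; the hall reading was missed.
    print(posterior({"kitchen": True}))
    # Both readings are available.
    print(posterior({"kitchen": True, "hall": False}))
```

Brute-force enumeration only works for a handful of variables. It is meant to show the semantics, not how a system such as HeisenData would evaluate queries.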