Demos and Apps
U-MAP: A System for Usage-Based Schema Matching and Mapping - Publication Tutorial Poster
U-MAP is the first system that uses usage information buried in database query logs to tackle data integration and exchange challenges. Using these query logs, U-MAP first generates correspondences between the attributes of the source and target schemas, and then derives complex mapping rules that transform data records from one schema to the other.
GDR: A System for Guided Data Repair - Try it! Publication
Improving data quality is a time-consuming, labor-intensive and often domain-specific operation. Existing data repair approaches are either fully automated or inefficient at involving users interactively. We present a demo of GDR, a Guided Data Repair system that uses a novel approach to efficiently involve the user alongside automatic data repair techniques to reach better data quality as quickly as possible. Specifically, GDR generates data repairs and acquires feedback on those that would be most beneficial in improving the data quality. GDR quantifies the data quality benefit of generated repairs by combining mechanisms from decision theory and active learning. Based on these benefit scores, groups of repairs are ranked and displayed to the user. User feedback is used to train a machine learning component that eventually replaces the user in deciding on the validity of a suggested repair. We describe how the generated repairs are ranked and displayed to the user in a "useful-looking" way and demonstrate how data quality can be effectively improved with minimal feedback from the user.
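The ranking step described above can be pictured with a small sketch. It assumes each suggested repair carries an estimated probability of being valid and an estimated quality gain, and that a group's benefit is the sum of their products. The `Repair` fields, `group_benefit`, and `rank_groups` are hypothetical names for illustration, not GDR's actual scoring.

```python
from dataclasses import dataclass


@dataclass
class Repair:
    cell: str         # hypothetical identifier of the cell to change
    new_value: str    # suggested replacement value
    p_correct: float  # assumed estimate that the repair is valid
    gain: float       # assumed quality improvement if the repair is applied


def group_benefit(repairs):
    """Expected benefit of a group: sum of probability-weighted gains."""
    return sum(r.p_correct * r.gain for r in repairs)


def rank_groups(groups):
    """Return group names ordered from highest to lowest expected benefit."""
    return sorted(groups, key=lambda name: group_benefit(groups[name]), reverse=True)


groups = {
    "zip": [Repair("t3.zip", "47906", 0.2, 3.0)],
    "city": [
        Repair("t1.city", "Lafayette", 0.9, 2.0),
        Repair("t2.city", "Lafayette", 0.5, 1.0),
    ],
}
ranking = rank_groups(groups)
```

In this toy setting the user would be shown the highest-ranked group first, and the feedback collected could then be used to update the `p_correct` estimates.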
Privometer: Privacy Protection in Social Networks - Install in Facebook Publication
The increasing popularity of social networks, such as Facebook and Orkut, has raised several privacy concerns. Traditional ways of safeguarding the privacy of personal information by hiding sensitive attributes are no longer adequate. Research shows that probabilistic classification techniques can effectively infer such private information. The disclosed sensitive information of friends, group affiliations and even participation in activities, such as tagging and commenting, are considered background knowledge in this process. Privometer is a privacy protection tool that measures the amount of sensitive information leakage in a user profile and suggests self-sanitization actions to regulate the amount of leakage. In contrast to previous research, where inference techniques use publicly available profile information, we consider an augmented model where a potentially malicious application installed in the profiles of the user's friends can access substantially more information. In our model, merely hiding the sensitive information is not sufficient to protect the user's privacy. We present an implementation of Privometer in Facebook.
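To make the idea of leakage from friends' disclosures concrete, here is a minimal sketch. It assumes a very simple estimate: the inferred distribution of a hidden attribute is the relative frequency of the values disclosed by friends, and leakage is the confidence of the most likely value. This is an illustration only, not Privometer's inference model; `infer_distribution` and `leakage` are hypothetical names.

```python
from collections import Counter


def infer_distribution(friend_values):
    """Estimate a hidden attribute from friends' disclosed values.

    `friend_values` holds one entry per friend; None means the friend
    does not disclose the attribute.
    """
    disclosed = [v for v in friend_values if v is not None]
    if not disclosed:
        return {}
    counts = Counter(disclosed)
    total = len(disclosed)
    return {value: count / total for value, count in counts.items()}


def leakage(friend_values):
    """Leakage as the confidence of the most likely inferred value (0 if none)."""
    return max(infer_distribution(friend_values).values(), default=0.0)


score = leakage(["A", "A", "B", None])
```

Under this assumption, a self-sanitization action would be any change, such as hiding a friendship link, that lowers the score.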
LitCloud: A Literature Social Network - Try it!
LitCloud is an innovative platform that harvests community-contributed literature references and leverages the "social engagement" of a community of scientists, captured through the interactions the community has with these references, to filter, rank, and facilitate the discovery and use of these references. The basic premise of LitCloud is that many scientists follow the tables of contents of scientific publications by subscribing, through RSS readers such as Google Reader, to the RSS feeds that publishers usually provide. They then access, star, tag, comment on, and email references that interest them. LitCloud therefore aims to harvest these references, capture these users' actions, and aggregate them to filter and rank the references. Researchers and educators using LitCloud will benefit from a "More Signal, Less Noise" effect: the content shared through LitCloud is considerably more focused and of better quality, because the social nature of LitCloud yields better, more relevant recommendations. More specifically, LitCloud provides the following benefits: (i) Fight information overload and discover references, new leads, and information of interest to the user. (ii) Discover popular, recent, and top-rated papers with high impact factors within specific disciplines, or get personalized recommendations. (iii) Find and follow experts and discover the references they have highlighted.
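The aggregation of user actions into a ranking can be sketched as a weighted count. The action names follow the description above, but the weights and the `rank_references` function are assumptions made for illustration; the source does not state how LitCloud actually scores references.

```python
from collections import defaultdict

# Hypothetical weights: stronger signals of interest count for more.
ACTION_WEIGHTS = {"access": 1.0, "tag": 2.0, "comment": 2.0, "star": 3.0, "email": 4.0}


def rank_references(events, weights=ACTION_WEIGHTS):
    """Aggregate (user, reference, action) events into a ranked list.

    Returns (reference, score) pairs, highest score first; ties are
    broken alphabetically by reference so the order is deterministic.
    """
    scores = defaultdict(float)
    for _user, reference, action in events:
        scores[reference] += weights.get(action, 0.0)  # unknown actions add nothing
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


events = [
    ("u1", "paper-A", "access"),
    ("u2", "paper-A", "star"),
    ("u1", "paper-B", "email"),
    ("u3", "paper-B", "access"),
    ("u3", "paper-C", "tag"),
]
ranked = rank_references(events)
```

A personalized variant could restrict `events` to the actions of experts a user follows before ranking.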
TAILOR: A Record Linkage Toolbox - Download Source Download Java Code Publication
Data cleaning is a vital process that ensures the quality of data stored in real-world databases. Data cleaning problems are frequently encountered in many research areas, such as knowledge discovery in databases, data warehousing, system integration and e-services. The process of identifying the record pairs that represent the same entity (duplicate records), commonly known as record linkage, is one of the essential elements of data cleaning. In this paper, we address the record linkage problem by adopting a machine learning approach. Three models are proposed and are analyzed empirically. Since no existing model, including those proposed in this paper, has been proved to be superior, we have developed an interactive Record Linkage Toolbox named TAILOR. Users of TAILOR can build their own record linkage models by tuning system parameters and by plugging in in-house developed and public domain tools. The proposed toolbox serves as a framework for the record linkage process, and is designed in an extensible way to interface with existing and future record linkage models.
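The record linkage process described above can be illustrated with a minimal sketch: compare two records field by field to obtain a comparison vector, then decide whether the pair is a match. A fixed threshold on the average similarity stands in here for a learned decision model, so this is not one of TAILOR's models; the function names, the fields, and the 0.8 threshold are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def comparison_vector(rec_a, rec_b, fields):
    """One similarity score per compared field."""
    return [similarity(rec_a[f], rec_b[f]) for f in fields]


def is_match(rec_a, rec_b, fields, threshold=0.8):
    """Declare a match when the average field similarity reaches the threshold."""
    vector = comparison_vector(rec_a, rec_b, fields)
    return sum(vector) / len(vector) >= threshold


fields = ["name", "city"]
a = {"name": "John Smith", "city": "Boston"}
b = {"name": "john smith", "city": "BOSTON"}
c = {"name": "xyz", "city": "qqq"}

same_entity = is_match(a, b, fields)
different_entity = is_match(a, c, fields)
```

Replacing `is_match` with a classifier trained on labeled comparison vectors is the kind of model swap that a toolbox design is meant to support.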