With the advent of Web 2.0, web
sites actively involve visitors in the provision of content, called user
generated content. User generated content takes multiple forms, e.g., video
sharing, products, social networking, and reviews. We focus on user reviews.
With the rise of social media, entities (e.g., hotels, restaurants, books)
presented on web sites are now accompanied by user generated content in the
form of reviews. Many data-intensive applications collect and integrate such
data from a variety of Web sources. A key task in this process is entity
matching, which is the problem of determining the records from these sources
that refer to the same real-world entities. Traditional approaches use the
record representation of entities to accomplish this task. We argue that user
generated content, a hitherto untapped source of entity information, can also
be used in entity matching.
In this talk, we present opportunities, challenges, and preliminary results in
entity matching with user generated content.