2nd International Workshop on
Business intelligencE and the WEB
Uppsala, Sweden, March 25, 2011
Sihem Amer-Yahia (Senior Research Scientist at Yahoo! Barcelona).

Title: I Am Complex: Cluster Me, Don't Just Rank Me

Short bio: Sihem spent 7 years as a Member of Technical Staff at AT&T Labs and has been a Senior Research Scientist at Yahoo! since May 2006. She holds a Ph.D. in Computer Science from U. Paris-Orsay and INRIA. Her research is on data management, query processing and relevance models that leverage social behavior for online content serving. Sihem has chaired the SIGMOD 2009 Tutorials and several conference tracks including CIKM 2008, VLDB 2009, ICDE 2010, WWW 2010, SIGMOD 2011, and EDBT/ICDT 2012. She is a member of the Board of Trustees of the VLDB Endowment and of the ACM SIGMOD Executive Committee. Sihem serves as an Area Editor for TODS, the VLDB Journal, and Information Systems.

Abstract: A large number of online applications are built over high-dimensional data. That is the case for shopping, where products have several features (e.g., price and color), dating, where personal profiles are described using several dimensions (e.g., physical features and political views), and entertainment (e.g., movie genre and director, restaurant ambiance and location). I will argue that the "10 blue links" experience we are used to in Web search (keywords as input, ranked list as output) is inappropriate when querying highly structured data. For example, a user looking for 1- or 2-bedroom apartments sorted by price will see a large number of cheap 1-bedrooms in undesirable neighborhoods before seeing any apartment with different features. An alternative to ranking is to cluster results on their attributes and describe the clusters (e.g., cheap 2-bedrooms with 2 baths). However, not all clusters will be of interest to users. I will discuss two approaches: persona-driven search, for which we have preliminary ideas in restaurant search, and rank-aware clustering, for which we have results of a large-scale user study and a performance evaluation over datasets from a leading dating site.

Xin Luna Dong (AT&T Labs-Research).

Title: SOLOMON: Seeking the Truth Via Copying Detection

Short bio: Dr. Xin Luna Dong is a researcher at AT&T Labs-Research. She received a Ph.D. in Computer Science and Engineering from the University of Washington in 2007, a Master's Degree in Computer Science from Peking University in China in 2001, and a Bachelor's Degree in Computer Science from Nankai University in China in 1998. Her research interests include databases, information retrieval and machine learning, with an emphasis on data integration, data cleaning, personal information management, and web search. She has led the Solomon project, whose goal is to detect copying between structured sources and to leverage the results in various aspects of data integration, and the Semex personal information management system, which received the Best Demo award (one of the top 3) at SIGMOD'05. She co-chaired WebDB'10 and has served on the program committees of SIGMOD'11, VLDB'11, PVLDB'10, WWW'10, ICDE'10, VLDB'09, etc.

Abstract: We live in the Information Era, with access to a huge amount of information from a variety of data sources. However, data sources vary in quality and often provide conflicting, out-of-date and incomplete data. Data sources can also easily copy, reformat and modify data from other sources, propagating erroneous data. These issues make the identification of high-quality information and sources non-trivial. In this talk we present the SOLOMON system, whose core is a module that detects copying between sources. We show how we can effectively detect copying relationships between data sources, leverage the results in various aspects of data integration, and provide a user-friendly interface that helps users identify the sources that best suit their information needs.
