2nd International Workshop on
Business intelligencE and the WEB
Uppsala, Sweden, March 25, 2011
Prof. Felix Naumann (Chair of Information Systems at the Hasso Plattner Institute, Potsdam, Germany).

Title: Dr. Crowdsource: or How I Learned to Stop Worrying and Love Web Data

Short bio: Felix Naumann studied mathematics, economics, and computer science at the University of Technology in Berlin. After receiving his diploma in 1997, he joined the graduate school at the Humboldt University of Berlin. He completed his PhD thesis on data quality in 2000. Before moving to the University of Potsdam, he worked at the IBM Almaden Research Center and served as an assistant professor for information integration at the Humboldt University of Berlin. Since 2006 he has held the Chair of Information Systems at the Hasso Plattner Institute.

Abstract: The wealth of freely available, structured information on the Web is constantly growing. Driving domains are public data from and about governments and administrations, scientific data, and data about media, such as articles, books, and albums. In addition, general-purpose datasets, such as DBpedia and Freebase from the linked open data community, serve as a focal point for many data sets. Thus, it is possible to query or integrate data from multiple sources and create new, integrated data sets with added value. Yet integration is far from simple: it happens at the technical level by ingesting data in various formats, at the structural level by providing a common ontology and mapping the data sources' structures to it, and at the semantic level by linking multiple records about the same real-world entities and fusing these representations into a clean and consistent record. The talk highlights the extreme heterogeneity of web data and points to methods for overcoming it, which involve a multitude of tasks that must be completed: source selection to identify appropriate and high-quality sources, data extraction to create structured data, scrubbing to standardize and clean data, entity matching to associate different occurrences of the same entity, and finally data transformation and data fusion to combine all data about an entity in a single, consistent representation.
To be held in conjunction with the EDBT/ICDT 2011 Joint Conference
Fax: +34 965909326 | beweb2011 [at] easychair.org | Last updated: 09:43 14/02/2011
[DLSI - UA]     University of Trento     HP