Title: Benchmarking and Evaluation Campaigns: the good, the bad, and the metrics
Presenter: Julio Gonzalo, UNED.
Abstract:
Competitive evaluation exercises have become mainstream in the Natural
Language Processing and Information Retrieval research communities.
Based on experiences at WePS (the Web People Search evaluation
initiative) and CLEF (the Cross-Language Evaluation Forum), we will
discuss the benefits and risks of focusing research around evaluation
campaigns, and highlight some unresolved dilemmas. The presentation will
place special emphasis on metric design, a key feature that is often
overlooked in mainstream experimental designs, and we will conclude by
presenting a new metric, the Unanimous Improvement Ratio (UIR), which
supports the analysis of experimental results when more than one metric
is involved in the evaluation, as is the case with Precision and Recall
in many evaluation settings.
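To make the idea behind UIR concrete, here is a minimal Python sketch. It assumes the following formulation (our reading of the metric, not necessarily the exact version presented in the talk): system a "unanimously improves or equals" system b on a test case when a scores at least as high as b on every metric; UIR(a, b) is the number of test cases where a unanimously improves or equals b, minus the number where b unanimously improves or equals a, divided by the total number of test cases. The function names and the example Precision (P) and Recall (R) scores are hypothetical.

```python
from typing import Mapping, Sequence

# Per-test-case scores: one dict per test case, mapping metric name -> score.
Scores = Sequence[Mapping[str, float]]


def unanimously_geq(a: Mapping[str, float], b: Mapping[str, float],
                    metrics: Sequence[str]) -> bool:
    """True if a scores at least as high as b on every metric."""
    return all(a[m] >= b[m] for m in metrics)


def uir(scores_a: Scores, scores_b: Scores, metrics: Sequence[str]) -> float:
    """Unanimous Improvement Ratio of system a over system b (sketch).

    Assumed definition:
      (#cases where a >= b on all metrics - #cases where b >= a on all metrics)
      / #cases
    Ties on every metric count for both systems and therefore cancel out.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("both systems need scores on the same non-empty set of test cases")
    pairs = list(zip(scores_a, scores_b))
    a_wins = sum(unanimously_geq(a, b, metrics) for a, b in pairs)
    b_wins = sum(unanimously_geq(b, a, metrics) for a, b in pairs)
    return (a_wins - b_wins) / len(pairs)


# Hypothetical Precision/Recall scores of two systems on four test cases.
system_a = [{"P": 0.8, "R": 0.6}, {"P": 0.7, "R": 0.7},
            {"P": 0.5, "R": 0.9}, {"P": 0.6, "R": 0.6}]
system_b = [{"P": 0.6, "R": 0.5}, {"P": 0.7, "R": 0.6},
            {"P": 0.6, "R": 0.8}, {"P": 0.6, "R": 0.6}]

print(uir(system_a, system_b, ["P", "R"]))  # → 0.5
```

In this example system a unanimously improves or equals b on the first, second and fourth test cases, b does so only on the fourth (a tie), and on the third case the metrics disagree, so it counts for neither system: (3 - 1) / 4 = 0.5. Under this formulation the ratio ranges from -1 to 1, and test cases where Precision rises but Recall falls never contribute, which is exactly the kind of trade-off that a single averaged score can hide.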
About the author:
Julio Gonzalo is a member of the nlp.uned.es research group, where he
conducts research on the application of Language Engineering to
Multilingual Information Access problems, in particular the development
of evaluation metrics and methodologies. He has been involved in the
organization of CLEF (the international evaluation campaign for
Multilingual Information Access applications) since 2001, and he is a
co-organizer of WePS (the Web People Search evaluation campaign). More
information is available at http://nlp.uned.es/~julio



