1st Workshop on Evaluation of Human Language Technologies for Modern Iberian Languages (IBEREVAL)
SEPLN 2010 satellite workshop


Title: Benchmarking and Evaluation Campaigns: the good, the bad, and the metrics

Presenter: Julio Gonzalo, UNED.

Competitive evaluation exercises have become mainstream in the Natural Language Processing and Information Retrieval research communities. Based on experiences at WePS (the Web People Search evaluation initiative) and CLEF (the Cross-Language Evaluation Forum), we will discuss the benefits and risks of focusing research around evaluation campaigns, and highlight some unresolved dilemmas. The presentation will place special emphasis on metric design, a key aspect that is often overlooked in mainstream experimental designs, and will conclude with the presentation of a new metric, the Unanimous Improvement Ratio (UIR), which contributes to the analysis of experimental results when more than one metric is involved in the evaluation, as is the case with Precision and Recall in many evaluation settings.
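The abstract does not define UIR; as a rough illustration of the underlying idea, a comparison between two systems can be called a "unanimous improvement" when one system scores at least as well on every metric and strictly better on at least one. The sketch below is a hypothetical formalization under that assumption, not the metric as presented in the talk; all function names and the example data are invented for illustration.

```python
# Hypothetical sketch of a "unanimous improvement" comparison between two
# systems evaluated with several metrics (e.g. Precision and Recall) over a
# set of test cases. This is one plausible reading of the idea behind UIR,
# not the definition from the presentation.

def unanimously_better(scores_a, scores_b):
    """True if A scores >= B on every metric and > B on at least one."""
    return (all(a >= b for a, b in zip(scores_a, scores_b))
            and any(a > b for a, b in zip(scores_a, scores_b)))

def unanimous_improvement_ratio(cases_a, cases_b):
    """Fraction of test cases where A unanimously improves on B, minus the
    fraction where B unanimously improves on A (an assumed formulation)."""
    n = len(cases_a)
    a_wins = sum(unanimously_better(a, b) for a, b in zip(cases_a, cases_b))
    b_wins = sum(unanimously_better(b, a) for a, b in zip(cases_a, cases_b))
    return (a_wins - b_wins) / n

# Each case holds (precision, recall) for the two systems on one test case.
system_a = [(0.8, 0.6), (0.7, 0.7), (0.5, 0.9)]
system_b = [(0.7, 0.5), (0.7, 0.7), (0.6, 0.8)]
print(unanimous_improvement_ratio(system_a, system_b))  # 1 of 3 cases is a
# unanimous win for A; the third case is a Precision/Recall trade-off, so
# neither system counts as unanimously better there.
```

Note how the trade-off case (A has higher Recall, B has higher Precision) contributes to neither side, which is precisely the situation a multi-metric comparison method must handle.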

About the author:
Julio Gonzalo is a member of the research group, where he conducts research on the application of Language Engineering to Multilingual Information Access problems, and in particular on the development of evaluation metrics and methodologies. He has been involved in the organization of CLEF (the international evaluation campaign for Multilingual Information Access applications) since 2001, and he is co-organizer of WePS (the Web People Search evaluation campaign). More information at


ibereval10 [at]   |   Last updated: 12:52 05/07/2010