Tweet Translation Workshop at SEPLN 2015

Official website of the workshop

TweetMT is a workshop and shared task on machine translation (MT) of tweets. It will take place in September 2015 in Alicante, co-located with SEPLN 2015. The objective of the task is to bring together interested researchers to experiment with and compare different approaches to tweet MT. This workshop follows two previous workshops that were also organized at SEPLN: TweetNorm2013 and TweetLID2014.

Machine translation of tweets is a complex task that depends greatly on the type of data involved. Translating tweets is very different from translating well-formed texts, such as those published through a content management system. Tweets are often written on mobile devices, which makes spelling quality even worse, and they frequently contain errors, unusual symbols and non-standard diacritics. Tweets also differ structurally from standard text: they include tweet-specific features such as hashtags, user mentions and retweets, among others. Tweet translation can be tackled either as direct translation (tweet to tweet) or as indirect translation (normalizing the tweet into standard text, translating that text and, if needed, generating a tweet from the result). Although the direct approach is attractive, the lack of parallel or comparable tweets for the working languages tends to push us towards the indirect approach. Some authors also try to retrieve similar tweets in other languages using cross-lingual information retrieval (CLIR). Work in this area is still scarce in the literature, but interest is clearly growing. An important reference point is the work done on translating SMS messages during the Haiti earthquake.
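To make the indirect approach more concrete, here is a minimal Python sketch of its three stages: normalize the tweet, translate the standard text, then rebuild the tweet by reinserting its tweet-specific tokens. The normalization lexicon, the placeholder scheme and the `translate` callable are hypothetical stand-ins chosen for illustration; a real system would plug in a full normalizer and an MT engine, and this is not the method of any particular participant.

```python
import re

# Hypothetical normalization lexicon (Spanish SMS-style abbreviations);
# a real normalizer would be far larger and language-specific.
NORMALIZATION = {"q": "que", "xq": "porque", "tb": "también"}

# Tweet-specific tokens (mentions, hashtags, URLs) kept out of translation.
PROTECTED = re.compile(r"@\w+|#\w+|https?://\S+")


def normalize(tweet: str) -> str:
    """Stage 1: replace non-standard spellings with standard forms."""
    return " ".join(NORMALIZATION.get(tok.lower(), tok) for tok in tweet.split())


def protect(text: str) -> tuple[str, list[str]]:
    """Swap protected tokens for numbered placeholders before translation."""
    saved: list[str] = []

    def to_placeholder(match: re.Match) -> str:
        saved.append(match.group(0))
        return f"__PH{len(saved) - 1}__"

    return PROTECTED.sub(to_placeholder, text), saved


def restore(text: str, saved: list[str]) -> str:
    """Stage 3: put the original mentions, hashtags and URLs back."""
    for i, token in enumerate(saved):
        text = text.replace(f"__PH{i}__", token)
    return text


def translate_tweet(tweet: str, translate) -> str:
    """Indirect pipeline: normalize, translate (stage 2), restore tokens.

    `translate` is any str -> str function standing in for an MT system
    (hypothetical; no specific engine is assumed).
    """
    masked, saved = protect(normalize(tweet))
    return restore(translate(masked), saved)
```

For example, with a toy "translator" that simply upper-cases its input, `translate_tweet("@ana xq no vienes? #Alicante", str.upper)` returns `"@ana PORQUE NO VIENES? #Alicante"`: the abbreviation is normalized before translation, while the mention and hashtag pass through untouched. Masking these tokens is one simple way to keep an MT system trained on standard text from mangling them.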

The current task will focus on MT of tweets between languages of the Iberian Peninsula (Basque, Catalan, Galician, Portuguese and Spanish) as well as English. The organizing committee will release development data, including parallel tweets, that will enable participants to train their systems. For the final evaluation, participants will have to submit automatic translations of a number of tweet corpora within a short period of time. The evaluation will be carried out using automatic metrics that measure the distance between the system output and the reference translations. These corpora are not meant to be representative of all the types of messages found in informal communication. Instead, this is an initial attempt that starts by addressing one of the simplest parts of the problem. We plan to use more informal and varied corpora in future tasks as we make progress on these initial issues.
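The announcement does not name the evaluation metrics. As one illustration of an "automatic distance to the reference", here is a sketch of corpus-level word error rate (WER): the word-level Levenshtein distance between each system output and its reference, summed and divided by the total number of reference words. Treat it as an assumption for illustration only, not the task's official scorer; MT evaluations commonly also report metrics such as BLEU or TER.

```python
def word_edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Levenshtein distance over words: minimum insertions, deletions and substitutions."""
    prev = list(range(len(ref) + 1))  # distances from an empty hypothesis
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # drop hypothesis word h
                curr[j - 1] + 1,         # add missing reference word r
                prev[j - 1] + (h != r),  # substitute (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer(hyps: list[str], refs: list[str]) -> float:
    """Total word edits divided by total reference words; lower is better."""
    if len(hyps) != len(refs):
        raise ValueError("need exactly one hypothesis per reference")
    edits = sum(word_edit_distance(h.split(), r.split()) for h, r in zip(hyps, refs))
    words = sum(len(r.split()) for r in refs)
    if words == 0:
        raise ValueError("references contain no words")
    return edits / words
```

For instance, `corpus_wer(["el gato come"], ["el gato comió pescado"])` gives `0.5`: one substitution plus one missing word, over four reference words.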

The workshop aims to be a forum where researchers will have a chance to compare their methods, systems and results.

Organizers

Iñaki Alegria

Universidad del País Vasco / Euskal Herriko Unibertsitatea

Barrio Sarriena, s/n
48940 Leioa, Vizcaya
+34 946 01 20 00

Collaborators

Nora Aranberri	Universidad del País Vasco / Euskal Herriko Unibertsitatea Barrio Sarriena, s/n 48940 Leioa, Vizcaya +34 946 01 20 00
Cristina España	Universitat Politècnica de Catalunya Calle Jordi Girona, 31 08034 Barcelona +34 934 01 62 00
Eva Martínez
Pablo Gamallo	Universidade de Santiago de Compostela Praza do Obradoiro, s/n 15782 Santiago de Compostela +34 881 811 000
Hugo Oliveira	Universidade de Coimbra Palácio dos Grilos Rua da Ilha 3000-214 Coimbra +351 239 859 900
Iñaki San Vicente	Elhuyar
Antonio Toral (DCU, Dublin)	Dublin City University Glasnevin, Dublin 9 Ireland +353 1 700 5000
Arkaitz Zubiaga	University of Warwick Coventry CV4 7AL United Kingdom +44 24 7652 3523

Sponsors