Resources

For many years, the Natural Language Processing and Information Systems Group has carried out research on information technology and language processing. This work has produced linguistic resources such as tagged corpora, document collections and ontologies, which we make available for research purposes.

EmotiBlog

The EmotiBlog annotation scheme and corpus are a multilingual resource created to detect subjectivity in the new textual genres of Web 2.0, with the aim of improving Sentiment Analysis tasks. The corpus has been annotated at a fine-grained level for sentiment analysis in 3 different domains. Notice: This resource is free for research use and must be properly cited with the following scientific article:

Digital Media Asset

Today's generation of Internet devices has changed how users interact with media, turning them from passive, unidirectional users into proactive and interactive ones. Users can use these devices to comment on or rate a TV show and to search for related information about characters, facts or personalities. This phenomenon is known as second screen. This resource provides an ontology for representing Media Assets as part of the SAM project, an EU-funded research project. SAM focuses on developing an advanced digital media delivery platform based on second screen interaction and content syndication within a social media context, providing open and standardised ways of characterising, discovering and syndicating digital assets.

Concit-Corpus: Context Citation Analysis to learn Function, Polarity and Influence

Citation analysis based on counting methods distorts impact factor assessment. To enrich impact factor calculation, it is necessary to understand the kind of influence that one author's contributions have on another's work. For this purpose, citation content analysis is required to determine the function, polarity and influence of each citation within the context of the article that mentions it. In this corpus, we focus on defining an annotation scheme aimed at creating a publicly accessible corpus that can serve as the basis for collaborative work in this field, in order to develop citation content analysis that yields criteria for impact evaluation.

DrugSemantics Gold Standard

The DrugSemantics gold standard consists of 5 Summaries of Product Characteristics (SPCs) written in Spanish. The SPCs were retrieved from the Medicines Online Information Center (CIMA), which belongs to the Spanish Agency for Medicines and Health Products (AEMPS).

ONTOLegoLangUAge

ONTOLegoLangUAge is an ontology motivated by the importance of associating linguistic information with standard ontologies and expressive models, beyond the label systems implemented in RDF and OWL. It is crucial to correctly capture the relation between natural language constructs and ontological structures.

Semantic Package

This ontology aims to capture the semantics of documents through a set of key aspects of texts, such as the temporal dimension, the presence of named entities, the detection of opinionated information, and conceptual classifications. In addition, the ontology provides a lexical dimension, which takes into account the sentences of each document and a possible summary derived from it.