The Natural Language Processing and Information Systems Group is a research group at the University of Alicante focused on Natural Language Processing. Our goal is to bring computers closer to human language, making it easier for humans and machines to understand each other, so that mechanical tasks can be replaced with more productive ones.
Our research focuses on human language technologies (HLT) and deals with topics such as the resolution of lexical, morphological and structural ambiguities, automatic summarization, applications, information retrieval systems and question answering. This research is funded at the European, national and regional levels. The funding has allowed us to develop innovative applications and tools and to attend the most relevant conferences in our research area, resulting in a large number of publications in the best forums, journals and media.
On this web page we present all the projects, products and publications that our group has worked on and developed.
About us
Since 1993, the group has grown from a small number of professors in the Department of Languages and Information Systems (DLSI), thanks to the effort and motivation of all its members. The group currently has more than thirty members.
Some of our skills
Creation of new surveillance or decision-making processes, generation of new documents for specific purposes, analysis of sentiment and opinion in texts, production of automatic summaries, and automatic simplification of texts to make them accessible to a wider audience.
Some statistics about our research
Projects
Publications
Products and resources
Lines of code
Products
Oportunity
GPLSI Oportunity is a web application that automatically tracks and classifies information from newspapers, websites and official bulletins so that you get only the information that interests you and the rest is discarded. For each tender and award published in a bulletin, the system automatically extracts the most relevant information.
Social Analytics
GPLSI Social Analytics is an application that retrieves messages from Twitter and Instagram users on a specific topic and automatically evaluates the opinions expressed in them. This allows you to monitor people's opinions on different topics, such as a tourist destination or an election. With this tool it is possible to analyze and predict opinions and trends.
Social Observer
GPLSI Social Observer is an application that retrieves tweets on a specific topic and automatically evaluates the opinions expressed in them. This allows you to monitor people's opinions about a celebrity, politician, soccer team, product, stock market investment, or even an election. With this tool it is possible to analyze and predict opinions and trends.
Social Rankings
GPLSI Social Rankings is a web application that lets you monitor in real time how different entities, brands, products and people are rated on Twitter. It uses sentiment analysis and opinion mining techniques to automatically classify tweets as positive or negative, and then uses that information to compute a numerical rating for each entity. The result is a ranking that shows at a glance which entities are rated highest and lowest, as well as how their ratings evolve over time. With this tool it is possible to carry out different analyses to predict opinions and trends.
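As a rough illustration of the classify-then-rate idea described above, the following Python sketch turns already-labelled tweets into a ranking. It is not the GPLSI Social Rankings implementation: the function name `rank_entities`, the sample data, and the rating formula (share of positive tweets scaled to 0-10) are all assumptions made for the example, and the sentiment classification step is taken as given.

```python
from collections import defaultdict


def rank_entities(labeled_tweets):
    """Rank entities by their share of positive tweets, scaled to 0-10.

    labeled_tweets: iterable of (entity, label) pairs, where label is
    "positive" or "negative" (the output of some sentiment classifier).
    The 0-10 scale is an assumption for illustration only.
    """
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for entity, label in labeled_tweets:
        counts[entity][label] += 1

    ratings = {}
    for entity, c in counts.items():
        total = c["positive"] + c["negative"]
        ratings[entity] = round(10 * c["positive"] / total, 1)

    # Highest rating first; ties broken alphabetically for stable output.
    return sorted(ratings.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample: tweets already classified by a sentiment model.
sample = [
    ("BrandA", "positive"), ("BrandA", "positive"), ("BrandA", "negative"),
    ("BrandB", "negative"), ("BrandB", "negative"), ("BrandB", "positive"),
    ("BrandC", "positive"),
]
ranking = rank_entities(sample)
for entity, rating in ranking:
    print(f"{entity}: {rating}")
```

Recomputing the ranking over successive time windows would give the evolution of ratings over time mentioned above.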
Projects
We are working on a large number of projects funded at the European, national and regional levels, and also privately. These projects generate resources, software and products that are successfully used in commercial products or as a basis for other projects.
Publications
We publish in the best and most well-known research forums and media, thanks to our constant and active participation in the most relevant conferences in our research area.
Resources
Tagged corpora, document collections and ontologies at your disposal for research.
Context Citation Analysis to learn Function, Polarity and Influence
Citation analysis based on counting methods distorts impact factor assessment. To enrich impact factor calculation, it is necessary to understand the kind of influence that an author's contributions have on another's work. This requires citation content analysis to determine the function, polarity and influence of a citation in the context of the article that mentions it. In this corpus, we focus on defining an annotation scheme aimed at creating a publicly accessible corpus that can serve as the basis for collaborative work in this field, in order to develop citation content analysis that yields criteria for impact evaluation.
Today's generation of Internet devices has changed how users interact with media, turning them from passive, unidirectional users into proactive, interactive ones. Users can use these devices to comment on or rate a TV show and to search for related information about characters, facts or personalities. This phenomenon is known as second screen. This resource provides an ontology for representing Media Assets as part of the SAM project, an EU-funded research project that focuses on developing an advanced digital media delivery platform based on second screen interaction and content syndication within a social media context, providing open and standardised ways of characterising, discovering and syndicating digital assets.
The DrugSemantics gold standard consists of 5 Summaries of Product Characteristics (SPCs) written in Spanish. The SPCs were retrieved from the Medicines Online Information Center (CIMA), which belongs to the Spanish Agency for Medicines and Health Products (AEMPS).
This corpus is annotated with 10 Named Entity (NE) types related to pharmacotherapeutic care, namely: Chemical Composition, Disease, Drug, Excipient, Food, Medicament, Pharmaceutical Form, Route, Therapeutic Action and Unit of Measurement. It contains 2,241 NEs, 780 sentences and 226,729 tokens.
DrugSemantics was designed for developing and testing Spanish NE recognition tools in the pharmacotherapeutic domain.
The Emotiblog annotation scheme and corpus form a multilingual resource created to detect subjectivity in the new textual genres of Web 2.0, with the aim of improving Sentiment Analysis tasks. The corpus has been annotated for sentiment at a fine level of granularity in 3 different domains. Note: this resource is free for research use and should be properly cited by referencing the following scientific article
ONTOLegolangUAge is an ontology that motivates the importance of associating linguistic information with standard ontologies and expressive models, beyond the label systems implemented in RDF and OWL. It is crucial to capture correctly the relation between natural language constructs and ontological structures.
ONTOLegolangUAge details the whole development life cycle of Language Generation and Deconstruction. It is based on a model that proposes, in a first Human Language Processing (HLP) phase, splitting texts into basic linguistic units (called L-Bricks), which are then combined to infer knowledge in a later Human Language Generation (HLG) phase.
This ontology aims to capture the semantics of documents through a set of key aspects in texts, such as the temporal dimension, presence of named entities, detection of opinionated information, or conceptual classifications. In addition, the ontology provides a lexical dimension, where the sentence of each document, and a possible summary derived from it, are taken into account. These are determining factors for setting up our own interpretation of possible scenarios (a meta-level specification) and vocabulary. Since our ontology aims to be reused by a large community, we tried to establish basic NLP terminology that was hierarchized by experts in this research field.
Contact us
- Contact page
- E-mail: gplsi.contact@dlsi.ua.es
- Phone number: +34 965 90 70 76