Reusable Text Analysis for Applications
Abstract: Diverse applications such as document summarization, question answering, and information extraction benefit from similar linguistic representations. In this talk, I will present a text analysis system that combines named-entity recognition and shallow semantic analysis processes in a modular way. Its architecture is based on redefining text analysis as a set of classification tasks, thereby making the process open to a machine learning approach (in our case, memory-based learning). I will illustrate the use of the linguistic representations supplied by the text analysis system in a number of projects currently underway in our lab: (i) automatic subtitling from transcripts for the deaf and hard of hearing, (ii) information extraction from biomedical text, (iii) question answering from the WWW, and (iv) extraction of ontological knowledge from text.
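The abstract does not detail the classifier, but memory-based learning is, at its core, k-nearest-neighbour classification over stored training instances. The following is a minimal illustrative sketch (not the speaker's actual system); the feature windows, labels, and overlap distance are hypothetical choices for the example.

```python
from collections import Counter

def classify(instance, memory, k=3):
    """Memory-based classification: store all training instances and
    label a new instance by the majority class of its k closest matches.
    Distance is simple feature overlap (a hypothetical choice)."""
    def distance(a, b):
        return sum(x != y for x, y in zip(a, b))
    nearest = sorted(memory, key=lambda ex: distance(instance, ex[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy named-entity task: classify a token from a window of
# (previous word, word, next word) features -- purely illustrative.
memory = [
    (("in", "London", "on"), "LOCATION"),
    (("met", "Smith", "at"), "PERSON"),
    (("in", "Paris", "on"), "LOCATION"),
    (("saw", "Jones", "at"), "PERSON"),
]
print(classify(("in", "Berlin", "on"), memory))  # prints "LOCATION"
```

The appeal of recasting text analysis this way is modularity: each subtask (entity tagging, role labeling) becomes an independent classifier over local features, and classifiers can be swapped or retrained without touching the rest of the pipeline.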
Rada Mihalcea is an Assistant Professor of Computer Science at the University of North Texas. Her research interests are in Natural Language Processing, Machine Learning, and Information Retrieval. Specifically, she is currently working on the following problems: Word Sense Disambiguation, Building and Exploiting Parallel Texts, Multilingual NLP, Graph-based Ranking Algorithms for Natural Language Processing, Building Annotated Corpora with Volunteer Contributions over the Web, Semantic Parsing, and Semantic-based Information Retrieval. She has published more than fifty papers in conference proceedings and journals, and is working on research projects related, among others, to word sense disambiguation, semantic parsing for open text, and graph-based ranking algorithms for text processing. For more information, visit her Home Page.
Graph Theory Meets Computational Linguistics: Text Processing with Graph-based Ranking Algorithms
Abstract: Since the early days of artificial intelligence, associative networks have been proposed as representations that enable the storage of language units and the relationships that interconnect them, allowing for a variety of inference and reasoning processes, and simulating some of the functionalities of the human mind. The symbolic structures that emerge from these representations correspond naturally to graphs -- relational structures capable of encoding the meaning and structure of a cohesive text, following closely the associative memory representations. The activation or ranking of nodes in such graph structures mimics to some extent the functioning of human memory, and can be turned into a rich source of knowledge useful for several language processing applications.
In this talk, I will present a new framework for the application of graph-based ranking algorithms to structures derived from text, and show how the synergy between graph-theoretical algorithms and graph-based text representations can result in efficient unsupervised methods for several natural language processing tasks. I will illustrate this framework with several text processing applications, including word sense disambiguation, extractive summarization, and keyphrase extraction. I will also outline a number of other applications that can find successful solutions within this framework, and conclude with a discussion of opportunities and challenges for future research.