Content Selection through Paraphrase Detection: Capturing different Semantic Realisations of the Same Idea

Summarisation can be seen as an instance of Natural Language Generation (NLG), where “what to say” corresponds to the identification of relevant information and “how to say it” is associated with the final creation of the summary. When dealing with data coming from the Semantic Web (e.g., RDF triples), the challenge arises of how to produce a good summary. For instance, given the RDF properties from the infobox of a Wikipedia page, how could a summary expressed in natural language text be generated? And how could this summary sound as natural as possible (i.e., be an abstractive summary) rather than being merely a set of selected sentences output together (i.e., an extractive summary)? This would require successfully mapping the RDF information to a semantic representation of natural language sentences (e.g., predicate-argument (pred-arg) structures). Towards the long-term objective of generating abstractive summaries from Semantic Web data, the specific goal of this paper is to propose and validate an approach for mapping linguistic structures that encode the same meaning with different words (e.g., sentence-to-sentence, pred-arg-to-pred-arg, RDF-to-TEXT) using a continuous semantic representation of text. The idea is to decide the level of document representation to work with, convert the text into that representation, and perform a pairwise comparison to decide to what extent the two elements of a pair can be mapped to each other. To achieve this, different methods were analysed, including traditional WordNet-based ones as well as more recent ones based on word embeddings. Our approach was tested and validated in the context of document-abstract sentence mapping to check whether it was appropriate for identifying important information. The approach performed well, indicating that it can be relied on and applied to further contexts (e.g., mapping RDF into natural language).
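
The three steps named in the abstract (choose a representation level, convert the text, compare pairwise) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes sentence-level representation, averaged word vectors and cosine similarity. The tiny embedding table and the function names (`sentence_vector`, `cosine`, `best_matches`) are hypothetical and invented purely for illustration; a real system would load pretrained embeddings.

```python
import math

# Hypothetical toy word vectors, invented for illustration only.
# A real system would load pretrained word embeddings instead.
TOY_EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "feline": [0.85, 0.15, 0.05],
    "sat": [0.1, 0.8, 0.1],
    "rested": [0.15, 0.75, 0.2],
    "stock": [0.0, 0.1, 0.9],
    "fell": [0.1, 0.2, 0.8],
}


def sentence_vector(sentence):
    """Average the vectors of the known words; None if no word is known."""
    tokens = [t for t in sentence.lower().split() if t in TOY_EMBEDDINGS]
    if not tokens:
        return None
    dim = len(TOY_EMBEDDINGS[tokens[0]])
    return [
        sum(TOY_EMBEDDINGS[t][i] for t in tokens) / len(tokens)
        for i in range(dim)
    ]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_matches(abstract_sentences, document_sentences):
    """For each abstract sentence, find the most similar document sentence."""
    results = []
    for abstract_sentence in abstract_sentences:
        va = sentence_vector(abstract_sentence)
        best_sentence, best_score = None, -1.0
        for document_sentence in document_sentences:
            vd = sentence_vector(document_sentence)
            # Sentences with no known words get a neutral score of 0.0.
            if va is not None and vd is not None:
                score = cosine(va, vd)
            else:
                score = 0.0
            if score > best_score:
                best_sentence, best_score = document_sentence, score
        results.append((abstract_sentence, best_sentence, best_score))
    return results


if __name__ == "__main__":
    pairs = best_matches(["the feline rested"], ["the cat sat", "stock fell"])
    for abstract_sentence, document_sentence, score in pairs:
        print(abstract_sentence, "->", document_sentence, round(score, 3))
```

With these toy vectors, the abstract sentence is paired with the document sentence that expresses the same idea in different words, which is the kind of mapping the abstract describes. The same pairwise loop would apply unchanged if the units were pred-arg structures or verbalised RDF triples, provided each unit can be turned into a vector.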

Authors:
Lloret, Elena
Garden, Claire
Publication type:
Conference proceedings
Journal name:
-
Book title:
2nd International Workshop on Natural Language Generation and the Semantic Web
Subtitle:
WebNLG
Peer reviewed:
International:
Publishable:
Year of publication:
2016