A Statistical, Grammar-Based Approach to Micro-Planning

While there has been much work in recent years on data-driven natural language generation, little attention has been paid to the fine-grained interactions that arise during micro-planning between aggregation, surface realization and sentence segmentation. In this talk, I will argue for a hybrid symbolic/statistical approach that jointly models the interactions between syntactic, aggregation and sentence segmentation choices. The approach integrates a small hand-written grammar, a statistical hypertagger and a surface realization algorithm. It is applied to the verbalization of knowledge base queries and tested on 13 knowledge bases to demonstrate domain independence. We evaluate our approach in several ways. A quantitative analysis shows that the hybrid approach outperforms a purely symbolic approach in terms of both speed and coverage. Results from a human study indicate that users find the output of this hybrid statistical/symbolic system more fluent than that of both a template-based and a purely symbolic grammar-based approach. Finally, we illustrate by means of examples that our approach can account for various factors impacting aggregation, sentence segmentation and surface realization.

Claire Gardent
(Joint work with Laura Perez-Beltrachini)

Thursday, 11:30h, September 17th 2015