CIDEGENT: The limits and future of data-driven approaches:

A comparative study of deep learning, knowledge-based and rule-based models and methods in Natural Language Processing

Data-driven models and, most prominently, Deep Learning (DL), have taken the world by storm. DL is used almost everywhere, in almost every discipline, and Natural Language Processing (NLP) is no exception. DL has been very promising so far, delivering improvements for almost every NLP task and application. However, as seen on numerous occasions, the outputs of DL models are not always ideal, the failure of Neural Machine Translation to successfully translate multiword expressions being an obvious example. In addition, earlier studies report that machine learning approaches to anaphora resolution do not necessarily fare better than the ‘old-fashioned’ rule-based ones.

While it is widely accepted that DL techniques are superior to rule-based and knowledge-based ones, this four-year GenT project funded by the Valencian Regional Government will seek to establish formally, through proper evaluations spanning different datasets, whether DL always performs better and, if not, in what circumstances. In addition, the project will seek answers on how to boost the performance of successful DL-based applications even further. The study will examine different NLP tasks and applications and, to the best of our knowledge, will be the first to establish the extent to which DL guarantees improvement over rule-based methods and whether combining DL methods with various techniques, models and resources would offer further improvements.

Large Language Models (LLMs) will also be included in this study. When the original project proposal was written, LLMs practically did not exist. Now we shall also compare the performance of applications powered by LLMs such as ChatGPT, Gemini, Claude, Mistral and Falcon.

The NLP tasks and applications to be experimented with will include (but will not be limited to) anaphora resolution, translation memories, sentiment analysis, hate speech detection and text categorisation.

The project will contrast domain-specific Deep Learning models with general-purpose LLMs.

Prompt engineering will also be experimented with. The premise is that suitable prompting techniques can further improve the performance of LLMs.
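As an illustration of the kind of prompting technique that could be evaluated, the sketch below (in which all function names and prompt wording are hypothetical, not taken from the project) contrasts a zero-shot prompt with a few-shot prompt for sentiment analysis, one of the tasks listed above. The few-shot variant prepends labelled examples so the model can infer the expected task format.

```python
def zero_shot_prompt(text: str) -> str:
    """Ask the model directly, with no examples."""
    return (
        "Classify the sentiment of the following text as positive, "
        f"negative or neutral.\n\nText: {text}\nSentiment:"
    )

def few_shot_prompt(text: str, examples: list[tuple[str, str]]) -> str:
    """Prepend labelled (text, sentiment) examples before the query."""
    demos = "\n".join(
        f"Text: {t}\nSentiment: {label}" for t, label in examples
    )
    return (
        "Classify the sentiment of each text as positive, negative "
        f"or neutral.\n\n{demos}\nText: {text}\nSentiment:"
    )

# Hypothetical labelled examples for the few-shot variant.
examples = [
    ("The film was a delight.", "positive"),
    ("I want my money back.", "negative"),
]
prompt = few_shot_prompt("The plot dragged on forever.", examples)
```

Either prompt string would then be passed to an LLM; the project's evaluations would compare the quality of the resulting outputs across such prompting strategies.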

This four-year project is expected to show the best way forward for future NLP research and to identify the ‘route’ to increased performance.