Data-driven models and, most prominently, Deep Learning (DL) have taken the world by storm. DL is used in almost every discipline, and Natural Language Processing (NLP) is no exception. DL has been very promising so far, delivering improvements for almost every NLP task and application. However, as seen on numerous occasions, the outputs of DL models are not always ideal; the failure of Neural Machine Translation to translate multiword expressions correctly is an obvious example. In addition, earlier studies have reported that machine learning approaches to anaphora resolution do not necessarily fare better than the ‘old-fashioned’ rule-based ones.