Authorship verification, combining linguistic features and different similarity functions

Authorship analysis is an important task in many text applications, for example in digital forensic text analysis. We therefore propose an authorship analysis method that compares the average similarity between a text of unknown authorship and all the known texts of an author. Under this approach, a text that was not written by an author should not exceed the average similarity observed among that author's known texts. An unknown text is attributed to the author only if (1) its average similarity to the author's texts exceeds the average similarity obtained among the texts written by that author, and (2) this average is the highest when compared with its average similarity to each of the other authors. For each linguistic feature, we obtain a majority vote across different similarity functions; for the final decision, we divide the number of features whose vote attributes the unknown text to the author by the total number of features analyzed. The results obtained for each language in the PAN 2015
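The decision rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The linguistic features (word unigrams and word bigrams), the three similarity functions (cosine, set-based Dice, MinMax), and all function and variable names are hypothetical choices made for the example. The sketch also assumes that each author has at least two known texts, so that an intra-author average exists, and that ties for the highest average count in the author's favour. Each (feature, similarity function) pair casts one vote. A feature counts as positive when a strict majority of its similarity functions vote yes, and the final score is the fraction of positive features.

```python
from collections import Counter
from itertools import combinations
from math import sqrt
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical types: an extractor maps raw text to sparse feature counts;
# a similarity function maps two count vectors to a score in [0, 1].
Extractor = Callable[[str], Counter]
Similarity = Callable[[Counter, Counter], float]


def word_unigrams(text: str) -> Counter:
    return Counter(text.lower().split())


def word_bigrams(text: str) -> Counter:
    toks = text.lower().split()
    return Counter(zip(toks, toks[1:]))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b[k] for k, v in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def dice(a: Counter, b: Counter) -> float:
    # Set-based Dice coefficient on feature types (counts are ignored).
    sa, sb = set(a), set(b)
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


def minmax(a: Counter, b: Counter) -> float:
    # Sum of per-feature minima divided by sum of per-feature maxima.
    keys = set(a) | set(b)
    den = sum(max(a[k], b[k]) for k in keys)
    return sum(min(a[k], b[k]) for k in keys) / den if den else 0.0


def vote(unknown: str, corpus: Dict[str, List[str]], author: str,
         extract: Extractor, sim: Similarity) -> bool:
    """One vote (one feature, one similarity function) for `author`."""
    u = extract(unknown)
    # Average similarity of the unknown text to each author's known texts.
    avg_to = {a: mean(sim(u, extract(t)) for t in texts) for a, texts in corpus.items()}
    known = [extract(t) for t in corpus[author]]
    if len(known) < 2:
        raise ValueError("need at least two known texts for the intra-author average")
    # Average pairwise similarity among the author's own known texts.
    intra = mean(sim(x, y) for x, y in combinations(known, 2))
    # Condition 1: exceed the intra-author average.
    # Condition 2: be the highest average over all authors (ties allowed here).
    return avg_to[author] > intra and avg_to[author] == max(avg_to.values())


def verify(unknown: str, corpus: Dict[str, List[str]], author: str,
           features: Dict[str, Extractor], sims: List[Similarity]) -> float:
    """Fraction of features whose majority vote attributes `unknown` to `author`."""
    positive = 0
    for extract in features.values():
        votes = [vote(unknown, corpus, author, extract, s) for s in sims]
        if sum(votes) > len(votes) / 2:  # strict majority over similarity functions
            positive += 1
    return positive / len(features)


FEATURES: Dict[str, Extractor] = {"words": word_unigrams, "bigrams": word_bigrams}
SIMILARITIES: List[Similarity] = [cosine, dice, minmax]
EXAMPLE_CORPUS = {
    "A": ["the cat sat on the mat", "the dog sat on the log", "the cat lay on the rug"],
    "B": ["quantum fields obey gauge symmetry",
          "gauge bosons mediate quantum forces",
          "symmetry breaking gives quantum mass"],
}

if __name__ == "__main__":
    print(verify("the cat sat on the rug", EXAMPLE_CORPUS, "A", FEATURES, SIMILARITIES))  # → 1.0
```

In this toy corpus, the unknown text is, on average, closer to author A's texts than those texts are to each other, and it shares no words with author B. Both features therefore vote for A, and the score is 1.0. The same text scored against B yields 0.0.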

Castro, Daniel
Adame, Yaritza
Pelaez, Maria
Muñoz, Rafael
CLEF 2015 Conference and Labs of the Evaluation Forum, PAN Lab
