Evaluation of Deep Learning-based Authorship Attribution Methods on Hungarian Texts

Title: Evaluation of Deep Learning-based Authorship Attribution Methods on Hungarian Texts
Publication Type: Conference Paper
Year of Publication: 2022
Authors: Oldal, Laura Gulyás; Kertész, Gábor
Conference Name: 2022 IEEE 10th Jubilee International Conference on Computational Cybernetics and Cyber-Medical Systems (ICCC)
Keywords: authorship analysis, authorship attribution, cybernetics, Deep Learning, feature extraction, Human Behavior, Metrics, natural language processing, pubcrawl, statistical analysis, stylometry, text analysis
Abstract: The range of text analysis methods in the field of natural language processing (NLP) has become increasingly extensive thanks to the growing computational resources of the 21st century. As a result, many deep learning-based solutions have been proposed for authorship attribution, as they offer more flexibility and automated feature extraction compared to traditional statistical methods. A number of solutions have appeared for the attribution of English texts; however, the number of methods designed for the Hungarian language is extremely small. Hungarian is a morphologically rich language with flexible sentence formation and an alphabet that differs from those of other languages. Furthermore, language-specific tools such as a POS tagger, pretrained word embeddings, and a dependency parser are required. As a result, methods designed for other languages cannot be directly applied to Hungarian texts. In this paper, we review deep learning-based authorship attribution methods for English texts and offer techniques for adapting these solutions to the Hungarian language. As part of the paper, we collected a new dataset consisting of Hungarian literary works by 15 authors. In addition, we extensively evaluate the implemented methods on the new dataset.
DOI: 10.1109/ICCC202255925.2022.9922818
Citation Key: oldal_evaluation_2022