Evaluation of Deep Learning-based Authorship Attribution Methods on Hungarian Texts

Submitted by grigby1 on Fri, 02/03/2023 - 5:07pm

Title	Evaluation of Deep Learning-based Authorship Attribution Methods on Hungarian Texts
Publication Type	Conference Paper
Year of Publication	2022
Authors	Oldal, Laura Gulyás; Kertész, Gábor
Conference Name	2022 IEEE 10th Jubilee International Conference on Computational Cybernetics and Cyber-Medical Systems (ICCC)
Keywords	authorship analysis, authorship attribution, cybernetics, Deep Learning, feature extraction, Human Behavior, Metrics, natural language processing, pubcrawl, statistical analysis, stylometry, text analysis
Abstract	The range of text analysis methods in the field of natural language processing (NLP) has become increasingly extensive thanks to the growing computational resources of the 21st century. As a result, many deep learning-based solutions have been proposed for authorship attribution, as they offer more flexibility and automated feature extraction compared to traditional statistical methods. A number of solutions have appeared for the attribution of English texts; however, the number of methods designed for the Hungarian language is extremely small. Hungarian is a morphologically rich language, its sentence formation is flexible, and its alphabet differs from those of other languages. Furthermore, language-specific tools such as a POS tagger, pretrained word embeddings, and a dependency parser are required. As a result, methods designed for other languages cannot be directly applied to Hungarian texts. In this paper, we review deep learning-based authorship attribution methods for English texts and offer techniques for adapting these solutions to the Hungarian language. As part of the paper, we collected a new dataset consisting of Hungarian literary works by 15 authors. In addition, we extensively evaluate the implemented methods on the new dataset.
DOI	10.1109/ICCC202255925.2022.9922818
Citation Key	oldal_evaluation_2022
