Computational Identification of Author Style on Electronic Libraries - Case of Lexical Features

Title: Computational Identification of Author Style on Electronic Libraries - Case of Lexical Features
Publication Type: Conference Paper
Year of Publication: 2022
Authors: Ouamour, S., Sayoud, H.
Conference Name: 2022 5th International Symposium on Informatics and its Applications (ISIA)
Keywords: Author Style Analysis, digital libraries, feature extraction, Human Behavior, Informatics, Libraries, Metrics, natural language processing, Noise measurement, pubcrawl, stylometry, Task Analysis, text analysis, text mining
Abstract: In this work, we present a thorough study of authorship attribution carried out on a digital library called the HAT corpus. A dataset of 300 documents written by 100 different authors was extracted from the web digital library and processed for author style analysis. All the documents are written in Arabic and relate to the topic of travel. Three important rules in stylometry should be respected: a minimum document size, the same topic for all documents, and the same genre. We made a particular effort to respect these conditions during corpus preparation. Three lexical features (fixed-length words, rare words, and suffixes) are used and evaluated with a centroid-based Manhattan distance. The identification approach shows interesting results, with an accuracy of about 0.94.
DOI: 10.1109/ISIA55826.2022.9993513
Citation Key: ouamour_computational_2022
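
The abstract mentions a centroid-based Manhattan distance applied to lexical features such as suffixes. The following is a minimal sketch of that general idea, not the authors' implementation: each author is represented by the mean (centroid) of their documents' feature vectors, and a query document is attributed to the author whose centroid is nearest in Manhattan (L1) distance. Everything specific here is an assumption made for illustration: the function names, the suffix length of 3, the tiny suffix vocabulary, and the English toy sentences standing in for the Arabic documents of the corpus.

```python
from collections import Counter


def suffix_profile(text, vocab, n=3):
    """Relative frequency of each suffix in `vocab`, measured over all
    whitespace-separated words of length >= n (n=3 is an assumed choice)."""
    counts = Counter(w[-n:] for w in text.split() if len(w) >= n)
    total = sum(counts.values()) or 1  # avoid division by zero on empty text
    return [counts[s] / total for s in vocab]


def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def attribute(query_vec, centroids):
    """Return the author whose centroid is closest to `query_vec`."""
    return min(centroids, key=lambda author: manhattan(query_vec, centroids[author]))


# Hypothetical toy training data: two authors, two short documents each.
train = {
    "author_a": ["walking talking singing", "running jumping walking"],
    "author_b": ["walked talked jumped", "played walked stayed"],
}
vocab = ["ing", "ked", "ped", "yed"]  # assumed suffix vocabulary

# One centroid per author, built from that author's document profiles.
centroids = {
    author: centroid([suffix_profile(doc, vocab) for doc in docs])
    for author, docs in train.items()
}

predicted = attribute(suffix_profile("singing running talking", vocab), centroids)
print(predicted)
```

In this sketch the other two feature types named in the abstract (fixed-length words and rare words) would simply be alternative profile functions feeding the same centroid and distance steps.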