Title | Computational Identification of Author Style on Electronic Libraries - Case of Lexical Features |
Publication Type | Conference Paper |
Year of Publication | 2022 |
Authors | Ouamour, S., Sayoud, H. |
Conference Name | 2022 5th International Symposium on Informatics and its Applications (ISIA) |
Keywords | Author Style Analysis, digital libraries, feature extraction, Human Behavior, Informatics, Libraries, Metrics, natural language processing, Noise measurement, stylometry, Task Analysis, text analysis, text mining |
Abstract | In the present work, we present a thorough study conducted on a digital library, the HAT corpus, for the purpose of authorship attribution. A dataset of 300 documents written by 100 different authors was extracted from this web digital library and processed for author style analysis. All the documents deal with the topic of travel and are written in Arabic. Three important rules in stylometry should be respected: a minimum document size, the same topic for all documents, and the same genre. In this work, we made a particular effort to respect these conditions carefully during corpus preparation. Three lexical features (fixed-length words, rare words, and suffixes) are then used and evaluated with a centroid-based Manhattan distance. The identification approach gives interesting results, with an accuracy of about 0.94. |
DOI | 10.1109/ISIA55826.2022.9993513 |
Citation Key | ouamour_computational_2022 |
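The abstract names the classifier (a centroid-based Manhattan distance) but does not give implementation details. The following is a minimal Python sketch of that idea for one of the three features, suffixes, under stated assumptions: a suffix is taken to be the last n characters of each whitespace-separated word, feature values are relative frequencies over a top-k suffix vocabulary, and all function names are hypothetical. It is not the authors' code and does not reproduce their fixed-length-word or rare-word features.

```python
from collections import Counter
from typing import Dict, List, Sequence

Vector = List[float]


def suffixes(text: str, n: int) -> List[str]:
    """Last n characters of every word at least n long (assumed suffix definition)."""
    return [w[-n:] for w in text.split() if len(w) >= n]


def build_vocab(texts: Sequence[str], n: int = 2, k: int = 50) -> List[str]:
    """The k most frequent suffixes across the training texts (hypothetical choice of k)."""
    counts: Counter = Counter()
    for t in texts:
        counts.update(suffixes(t, n))
    return [s for s, _ in counts.most_common(k)]


def suffix_features(text: str, vocab: Sequence[str], n: int = 2) -> Vector:
    """Relative frequency of each vocabulary suffix among the text's suffixes."""
    counts = Counter(suffixes(text, n))
    total = sum(counts.values()) or 1  # avoid division by zero on empty text
    return [counts[s] / total for s in vocab]


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 (Manhattan) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def centroids(training: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Mean feature vector (centroid) of each author's training documents."""
    result: Dict[str, Vector] = {}
    for author, vecs in training.items():
        dim = len(vecs[0])
        result[author] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return result


def attribute(vec: Sequence[float], cents: Dict[str, Vector]) -> str:
    """Author whose centroid is nearest to vec under Manhattan distance."""
    return min(cents, key=lambda author: manhattan(vec, cents[author]))


if __name__ == "__main__":
    # Hypothetical toy data for illustration only; not the HAT corpus.
    train_texts = {
        "author_a": ["walking talking singing", "running jumping ring"],
        "author_b": ["cat hat bat", "that mat flat"],
    }
    all_texts = [t for docs in train_texts.values() for t in docs]
    vocab = build_vocab(all_texts, n=2, k=10)
    training = {
        author: [suffix_features(t, vocab) for t in docs]
        for author, docs in train_texts.items()
    }
    cents = centroids(training)
    print(attribute(suffix_features("bring sing", vocab), cents))  # prints author_a
```

Each author is represented by a single mean vector, and an unseen document is assigned to the author whose centroid has the smallest L1 distance. A single centroid per author keeps the model small, which suits a dataset like this one with about three documents per author (300 documents, 100 authors).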