On Sequential Selection of Attributes to Be Discretized for Authorship Attribution

Title: On Sequential Selection of Attributes to Be Discretized for Authorship Attribution
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Baron, G.
Conference Name: 2017 IEEE International Conference on INnovations in Intelligent SysTems and Applications (INISTA)
ISBN Number: 978-1-5090-5795-5
Keywords: authorship attribution, Bayes methods, Computer science, data mining, data mining techniques, discretization, Electronic mail, Entropy, forward sequential selection, Human Behavior, Indexes, merging, Metrics, naive Bayes, naïve Bayes classifier, pattern classification, pubcrawl, sequential selection, stylometry, Training
Abstract

Various data mining techniques are employed in the stylometry domain for authorship attribution tasks. Discretization of the input data is sometimes applied to improve the decision system, and in many cases this approach yields better classification results. In other situations, however, discretization decreased the overall performance of the system. This raises the question of what happens when only selected attributes are discretized. The paper presents the results of research on forward sequential selection of the attributes to be discretized, and reports the influence of this approach on the performance of a decision system based on a naive Bayes classifier in the authorship attribution domain. Several basic discretization methods and different approaches to discretizing the test datasets are taken into consideration.
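
The general idea of forward sequential selection of attributes to discretize can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes equal-width binning with three bins, a mixed naive Bayes model (Gaussian likelihoods for continuous attributes, Laplace-smoothed frequency tables for discretized ones), cut points learned on the training data and reused for the evaluation data, and accuracy on a held-out set as the selection criterion. All function and parameter names (`equal_width_bins`, `train_nb`, `forward_select`, `X_val`, and so on) are hypothetical.

```python
import math
from collections import Counter

def equal_width_bins(values, k=3):
    """Return k-1 cut points splitting [min, max] into k equal-width bins."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]

def to_bin(x, cuts):
    """Bin index of x; values outside the training range fall into the edge bins."""
    return sum(x > c for c in cuts)

def train_nb(X, y, discretized, k=3):
    """Fit naive Bayes; attributes whose index is in `discretized` are binned."""
    classes = sorted(set(y))
    model = {"prior": {c: y.count(c) / len(y) for c in classes}, "attrs": []}
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        if j in discretized:
            cuts = equal_width_bins(col, k)  # assumption: cuts come from training data
            table = {}
            for c in classes:
                counts = Counter(to_bin(v, cuts) for v, t in zip(col, y) if t == c)
                total = sum(counts.values())
                # Laplace smoothing keeps every bin probability non-zero.
                table[c] = [(counts[b] + 1) / (total + k) for b in range(k)]
            model["attrs"].append(("disc", cuts, table))
        else:
            stats = {}
            for c in classes:
                vals = [v for v, t in zip(col, y) if t == c]
                mean = sum(vals) / len(vals)
                var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-9
                stats[c] = (mean, var)
            model["attrs"].append(("cont", None, stats))
    return model

def predict(model, row):
    """Return the class with the highest log-posterior for one sample."""
    best, best_score = None, -math.inf
    for c, prior in model["prior"].items():
        score = math.log(prior)
        for x, (kind, cuts, params) in zip(row, model["attrs"]):
            if kind == "disc":
                score += math.log(params[c][to_bin(x, cuts)])
            else:
                mean, var = params[c]
                score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = c, score
    return best

def accuracy(X_train, y_train, X_val, y_val, discretized):
    model = train_nb(X_train, y_train, discretized)
    hits = sum(predict(model, r) == t for r, t in zip(X_val, y_val))
    return hits / len(y_val)

def forward_select(X_train, y_train, X_val, y_val):
    """Greedily add the attribute whose discretization helps most; stop when none helps."""
    chosen = set()
    best = accuracy(X_train, y_train, X_val, y_val, chosen)
    while True:
        scored = [(accuracy(X_train, y_train, X_val, y_val, chosen | {j}), j)
                  for j in range(len(X_train[0])) if j not in chosen]
        if not scored:
            break
        top_acc, top_j = max(scored)
        if top_acc <= best:
            break
        chosen.add(top_j)
        best = top_acc
    return chosen, best
```

Calling `forward_select(X_train, y_train, X_val, y_val)` on lists of numeric feature rows and class labels returns the set of attribute indices chosen for discretization together with the accuracy reached. Other discretization methods or other ways of handling the test data could be swapped in at `equal_width_bins` and `to_bin`.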

URL: https://ieeexplore.ieee.org/document/8001162
DOI: 10.1109/INISTA.2017.8001162
Citation Key: baron_sequential_2017