On Sequential Selection of Attributes to Be Discretized for Authorship Attribution
Title | On Sequential Selection of Attributes to Be Discretized for Authorship Attribution |
Publication Type | Conference Paper |
Year of Publication | 2017 |
Authors | Baron, G. |
Conference Name | 2017 IEEE International Conference on INnovations in Intelligent SysTems and Applications (INISTA) |
ISBN Number | 978-1-5090-5795-5 |
Keywords | authorship attribution, Bayes methods, Computer science, data mining, data mining techniques, discretization, Electronic mail, Entropy, forward sequential selection, Human Behavior, Indexes, merging, Metrics, naive Bayes, naïve Bayes classifier, pattern classification, pubcrawl, sequential selection, stylometry, Training |
Abstract | Different data mining techniques are employed in the stylometry domain for performing authorship attribution tasks. Sometimes, to improve the decision system, discretization of the input data can be applied. In many cases such an approach yields better classification results. On the other hand, there were situations in which discretization decreased the overall performance of the system. Therefore, the question arose what the result would be if only some selected attributes were discretized. The paper presents the results of research on forward sequential selection of attributes to be discretized. The influence of such an approach on the performance of a decision system based on the Naive Bayes classifier in the authorship attribution domain is presented. Some basic discretization methods and different approaches to discretization of the test datasets are taken into consideration. |
URL | https://ieeexplore.ieee.org/document/8001162 |
DOI | 10.1109/INISTA.2017.8001162 |
Citation Key | baron_sequential_2017 |
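
The abstract describes forward sequential selection of attributes to be discretized. The following is a minimal sketch of that general idea, not the paper's actual procedure: the function names (`equal_width_bins`, `discretize_columns`, `forward_select`), the equal-width binning, the stopping rule (stop when no candidate improves the score), and the `score` callback are all assumptions made for illustration. In a real experiment `score` would presumably discretize the given attributes, train a Naive Bayes classifier, and return its accuracy on held-out texts.

```python
from typing import Callable, List, Sequence, Set, Tuple

Row = List[float]


def equal_width_bins(values: Sequence[float], k: int = 3) -> List[int]:
    """Map each value to one of k equal-width bins (indices 0..k-1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / k
    # min(...) keeps the maximum value inside the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def discretize_columns(rows: Sequence[Row], columns: Set[int], k: int = 3) -> List[Row]:
    """Return a copy of rows in which only the chosen columns are discretized."""
    out = [list(r) for r in rows]
    for c in columns:
        binned = equal_width_bins([r[c] for r in rows], k)
        for r, b in zip(out, binned):
            r[c] = float(b)
    return out


def forward_select(
    n_attributes: int,
    score: Callable[[Set[int]], float],
) -> Tuple[List[int], float]:
    """Greedy forward selection of the attributes to discretize.

    `score` (hypothetical callback) receives a set of attribute indices to
    discretize and returns a quality measure where higher is better.
    Assumed stopping rule: stop as soon as no candidate improves the score.
    """
    selected: List[int] = []
    best = score(set())  # baseline: nothing discretized
    remaining = set(range(n_attributes))
    while remaining:
        # Try adding each remaining attribute to the current selection.
        candidates = [(score(set(selected) | {a}), a) for a in sorted(remaining)]
        # Ties are broken in favor of the lowest attribute index.
        top_score, top_attr = max(candidates, key=lambda t: (t[0], -t[1]))
        if top_score <= best:
            break
        selected.append(top_attr)
        remaining.remove(top_attr)
        best = top_score
    return selected, best
```

A typical use would wrap `discretize_columns` and a classifier inside `score`, so that each candidate set is evaluated on the same training and test split; how the test data is discretized is a separate design choice that this sketch leaves open.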