Authorship Attribution of The Golden Lotus Based on Text Classification Methods

Title: Authorship Attribution of The Golden Lotus Based on Text Classification Methods
Publication Type: Conference Paper
Year of Publication: 2019
Authors: Tang, Xuemei, Liang, Shichen, Liu, Zhiying
Conference Name: Proceedings of the 2019 3rd International Conference on Innovation in Artificial Intelligence
Date Published: March 2019
Publisher: Association for Computing Machinery
Conference Location: Suzhou, China
ISBN Number: 978-1-4503-6128-6
Keywords: authorship attribution, Human Behavior, human factors, machine learning, Metrics, pubcrawl, stylometry, text classification, The Golden Lotus

In this paper, we explore the authorship attribution of The Golden Lotus using traditional machine-learning text classification methods. There are four candidate authors: Shizhen Wang, Wei Xu, Kaixian Li and Zhideng Wang. Our dataset consists of the poems in The Golden Lotus and poems by the four candidate authors. Based on the characteristics of classical Chinese poetry, we use Chinese characters, rhyme, genre and overlapped words as features. We apply six supervised machine learning classifiers (Logistic Regression, Random Forests, Decision Tree, Naive Bayes, SVM and KNN) to both binary and multi-class text classification. The results of both experiments show that the writing style of Wei Xu's poems is the most similar to that of The Golden Lotus, suggesting that, among the four candidates, Wei Xu is the most likely author of The Golden Lotus.
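The abstract describes training classifiers on features of the candidates' poems and then comparing the disputed poems against each candidate. As a minimal sketch of just one piece of that pipeline, the Python below implements multinomial Naive Bayes (one of the six classifiers named) over character-unigram counts only; the rhyme, genre and overlapped-word features are omitted. The class `CharNaiveBayes`, the helper `attribute`, its majority-vote attribution rule, and the placeholder training strings are hypothetical illustrations under these assumptions, not the authors' code, data or procedure.

```python
from collections import Counter, defaultdict
import math

# Punctuation (full-width and ASCII) ignored when counting characters.
PUNCTUATION = set("，。、；：？！,.;:?!")


def char_features(text):
    """Character-unigram counts, skipping whitespace and punctuation."""
    return Counter(ch for ch in text if not ch.isspace() and ch not in PUNCTUATION)


class CharNaiveBayes:
    """Multinomial Naive Bayes over character counts with add-alpha smoothing.

    Works unchanged for the binary case (two labels) and the
    multi-class case (more than two labels).
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per label
        self.char_counts = defaultdict(Counter)      # label -> character counts
        for text, label in zip(texts, labels):
            self.char_counts[label].update(char_features(text))
        self.vocab = set()
        for counts in self.char_counts.values():
            self.vocab.update(counts)
        self.totals = {lab: sum(c.values()) for lab, c in self.char_counts.items()}
        return self

    def log_scores(self, text):
        """Unnormalised log-posterior for every label."""
        feats = char_features(text)
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for label in self.class_counts:
            score = math.log(self.class_counts[label] / n_docs)  # log prior
            denom = self.totals[label] + self.alpha * v
            for ch, n in feats.items():
                if ch not in self.vocab:
                    continue  # characters never seen in training carry no evidence
                score += n * math.log((self.char_counts[label][ch] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def attribute(model, disputed_texts):
    """Hypothetical rule: the candidate predicted for the most disputed poems wins."""
    votes = Counter(model.predict(t) for t in disputed_texts)
    return votes.most_common(1)[0][0], votes


if __name__ == "__main__":
    # Placeholder strings and labels only; not real poems or authors.
    texts = ["a a b", "a b a", "c c d", "d c c"]
    labels = ["author_a", "author_a", "author_b", "author_b"]
    model = CharNaiveBayes().fit(texts, labels)
    print(model.predict("a a a"))  # prints author_a
    print(attribute(model, ["a b", "c d d", "a a"]))
```

In practice each candidate's poems would be the training set and the poems of The Golden Lotus the disputed set; a library implementation such as scikit-learn's `MultinomialNB` would normally replace the hand-written class, which is kept dependency-free here only so the sketch runs on the standard library alone.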

Citation Key: tang_authorship_2019