A modified language modeling method for authorship attribution

Title: A modified language modeling method for authorship attribution
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Vazirian, Samane; Zahedi, Morteza
Publisher: IEEE
ISBN Number: 978-1-5090-4335-4
Keywords: attribution, composability, Human Behavior, Metrics, pubcrawl
Abstract

This paper presents an approach to the closed-class authorship attribution (AA) problem. The approach, called modified language modeling, is based on language modeling for classification. It addresses the AA problem by combining bigram word weighting with unigram word weighting. It makes the relation between an unseen text and the training documents clearer by giving an extra reward to training documents that contain bigram words as well as unigram words. Moreover, instead of removing the stop words provided by a stop word list, each word's probability is multiplied by its IDF value. We evaluate four approaches (unigram, bigram, trigram, and modified language modeling) experimentally on two Persian poem corpora, the WMPR-AA2016-A and WMPR-AA2016-B datasets. Results show that modified language modeling attributes authors better than the other approaches. For all approaches, the results on WMPR-AA2016-B, the larger dataset, are much better than those on the other dataset. This may indicate that, given adequate training data, modified language modeling can be a good solution to the AA problem.
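The abstract does not give the exact weighting formulas, so the following Python sketch shows only one plausible way to combine IDF-weighted unigram and bigram scores for closed-class attribution. It is an illustration under stated assumptions, not the authors' implementation. The functions `train`, `score`, and `attribute`, the `bigram_weight` parameter, the use of one model per author (rather than per training document), and the choice to sum IDF-weighted probabilities as a reward score are all hypothetical.

```python
import math
from collections import Counter


def train(corpus):
    """Build IDF values and per-author unigram/bigram counts.

    corpus: dict mapping author -> list of training documents,
    each document being a list of tokens (hypothetical input format).
    """
    n_docs = sum(len(docs) for docs in corpus.values())
    # Document frequency: in how many training documents each word occurs.
    df = Counter()
    for docs in corpus.values():
        for doc in docs:
            df.update(set(doc))
    # A word that occurs in every document gets IDF 0, which softly
    # discounts stop words without needing a stop word list.
    idf = {w: math.log(n_docs / c) for w, c in df.items()}

    models = {}
    for author, docs in corpus.items():
        uni, bi = Counter(), Counter()
        for doc in docs:
            uni.update(doc)
            bi.update(zip(doc, doc[1:]))
        models[author] = (uni, bi, sum(uni.values()))
    return idf, models


def score(tokens, idf, model, bigram_weight=1.0):
    """Hypothetical reward score: IDF-weighted unigram probabilities plus
    an extra IDF-weighted reward for bigrams seen in the author's training text."""
    uni, bi, total = model
    s = 0.0
    for w in tokens:
        # Unigram term: IDF(w) * P(w | author); unseen words contribute 0.
        s += idf.get(w, 0.0) * uni[w] / total
    for prev, w in zip(tokens, tokens[1:]):
        if bi[(prev, w)]:
            # Bigram reward: IDF(w) * P(w | prev, author).
            p_bi = bi[(prev, w)] / uni[prev]
            s += bigram_weight * idf.get(w, 0.0) * p_bi
    return s


def attribute(tokens, idf, models, bigram_weight=1.0):
    """Closed-class attribution: pick the candidate author with the highest score."""
    return max(models, key=lambda a: score(tokens, idf, models[a], bigram_weight))
```

For example, with a toy corpus where author "A" writes `["the", "rose", "is", "red"]` and `["the", "rose", "blooms"]`, and author "B" writes `["the", "moon", "is", "pale"]` and `["the", "moon", "rises"]`, the word "the" gets an IDF of 0 because it occurs in every document, and `attribute(["the", "moon", "rises"], *train(corpus))` returns `"B"`. Summing weighted probabilities, rather than multiplying them, keeps unseen words from zeroing out a score without requiring smoothing; a product of smoothed probabilities would be an equally valid alternative.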

URL: http://ieeexplore.ieee.org/document/7777783/
DOI: 10.1109/IKT.2016.7777783
Citation Key: vazirian_modified_2016