A modified language modeling method for authorship attribution
Title | A modified language modeling method for authorship attribution |
Publication Type | Conference Paper |
Year of Publication | 2016 |
Authors | Vazirian, Samane; Zahedi, Morteza |
Publisher | IEEE |
ISBN Number | 978-1-5090-4335-4 |
Keywords | attribution, composability, Human Behavior, Metrics, pubcrawl |
Abstract | This paper presents an approach to the closed-class authorship attribution (AA) problem. The approach, called modified language modeling, is based on language modeling for classification. It addresses the AA problem by combining bigram word weighting with unigram word weighting. It makes the relation between an unseen text and the training documents clearer by giving an extra reward to training documents that contain the bigram as well as the unigram words. Moreover, instead of removing stop words with a stop-word list, each word's probability is multiplied by its IDF value. We evaluate four approaches (unigram, bigram, trigram, and modified language modeling) on two Persian poem corpora, the WMPR-AA2016-A and WMPR-AA2016-B datasets. The results show that modified language modeling attributes authors better than the other approaches. For all approaches, the results on WMPR-AA2016-B, the larger dataset, are much better than those on WMPR-AA2016-A. This may indicate that, given adequate training data, modified language modeling can be a good solution to the AA problem. |
URL | http://ieeexplore.ieee.org/document/7777783/ |
DOI | 10.1109/IKT.2016.7777783 |
Citation Key | vazirian_modified_2016 |
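
The abstract names the ingredients of the method (unigram probabilities weighted by IDF, plus an extra reward when a bigram of the unseen text also occurs in an author's training documents) but does not give the scoring formula. The following is a minimal Python sketch of one way those ingredients could be combined. The function names, the add-one smoothing, the `bigram_weight` parameter, and the toy English data are all assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import Counter


def train_models(docs_by_author):
    """Build per-author unigram/bigram counts and corpus-wide IDF values.

    docs_by_author maps an author name to a list of tokenized documents.
    """
    all_docs = [doc for docs in docs_by_author.values() for doc in docs]
    doc_freq = Counter()
    for doc in all_docs:
        doc_freq.update(set(doc))
    # Plain IDF: a word found in every document gets weight 0, which
    # down-weights stop words without needing a stop-word list.
    idf = {w: math.log(len(all_docs) / df) for w, df in doc_freq.items()}

    models = {}
    for author, docs in docs_by_author.items():
        unigrams, bigrams = Counter(), Counter()
        for doc in docs:
            unigrams.update(doc)
            bigrams.update(zip(doc, doc[1:]))
        models[author] = (unigrams, bigrams)
    return models, idf


def score(tokens, model, idf, bigram_weight=1.0):
    """Score an unseen text against one author's model (assumed formula)."""
    unigrams, bigrams = model
    total = sum(unigrams.values())
    vocab_size = len(idf)
    result = 0.0
    # Unigram part: IDF multiplied by the add-one smoothed word probability.
    for w in tokens:
        prob = (unigrams[w] + 1) / (total + vocab_size)
        result += idf.get(w, 0.0) * prob
    # Bigram part: extra reward only when the author's training text
    # contains the same bigram as the unseen text.
    for w1, w2 in zip(tokens, tokens[1:]):
        if bigrams[(w1, w2)] > 0:
            cond_prob = bigrams[(w1, w2)] / unigrams[w1]
            result += bigram_weight * idf.get(w2, 0.0) * cond_prob
    return result


def attribute(tokens, models, idf, bigram_weight=1.0):
    """Return the author whose model gives the highest score."""
    return max(models, key=lambda a: score(tokens, models[a], idf, bigram_weight))


# Toy data, invented for illustration only.
toy_corpus = {
    "A": ["the moon rises over the sea".split(),
          "the sea sings to the moon".split()],
    "B": ["the market opens at dawn".split(),
          "the dawn trade fills the market".split()],
}
toy_models, toy_idf = train_models(toy_corpus)
print(attribute("moon over the sea".split(), toy_models, toy_idf))
```

In this sketch the word "the" occurs in every toy document, so its IDF is zero and it contributes nothing to any score, which mirrors the abstract's point about replacing stop-word removal with IDF weighting. Setting `bigram_weight=0.0` reduces the scorer to an IDF-weighted unigram model.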