A modified language modeling method for authorship attribution

Submitted by grigby1 on Mon, 03/20/2017 - 12:17pm

Title	A modified language modeling method for authorship attribution
Publication Type	Conference Paper
Year of Publication	2016
Authors	Vazirian, Samane; Zahedi, Morteza
Publisher	IEEE
ISBN Number	978-1-5090-4335-4
Keywords	attribution, composability, Human Behavior, Metrics, pubcrawl
Abstract	This paper presents an approach to the closed-class authorship attribution (AA) problem. The approach, called modified language modeling, is based on language modeling for classification. It addresses the AA problem by combining bigram word weighting with unigram word weighting. It makes the relation between an unseen text and the training documents clearer by giving an extra reward to training documents that contain the bigram as well as the unigram words. Moreover, instead of removing the stop words given by a stop-word list, the IDF value is multiplied by the related word probability. We evaluate four approaches experimentally (unigram, bigram, trigram, and modified language modeling) on two Persian poem corpora, the WMPR-AA2016-A and WMPR-AA2016-B datasets. The results show that modified language modeling attributes authors better than the other approaches. The results on WMPR-AA2016-B, the larger dataset, are much better than those on the other dataset for all approaches. This may indicate that, given adequate training data, modified language modeling can be a good solution to the AA problem.
URL	http://ieeexplore.ieee.org/document/7777783/
DOI	10.1109/IKT.2016.7777783
Citation Key	vazirian_modified_2016
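
Illustrative sketch (not from the paper): the abstract describes the scoring idea only in words, so the following Python is a minimal, assumed reading of it rather than the authors' implementation. It sums IDF-weighted unigram probabilities and adds an extra reward for matching bigrams. The function names, the unsmoothed probability estimates, the plain log(N/df) form of IDF, and the bigram_reward parameter are all assumptions made for illustration.

```python
import math
from collections import Counter


def bigrams(tokens):
    """Return the adjacent word pairs of a token list."""
    return list(zip(tokens, tokens[1:]))


def train_models(corpus):
    """corpus: {author: [token_list, ...]}. Returns (models, idf).

    Each model is (unigram counts, bigram counts, total token count).
    IDF is computed over all training documents (assumed form: log(N / df)).
    """
    models = {}
    doc_freq = Counter()
    n_docs = 0
    for author, docs in corpus.items():
        uni, bi = Counter(), Counter()
        for doc in docs:
            uni.update(doc)
            bi.update(bigrams(doc))
            doc_freq.update(set(doc))
            n_docs += 1
        models[author] = (uni, bi, sum(uni.values()))
    idf = {w: math.log(n_docs / df) for w, df in doc_freq.items()}
    return models, idf


def score(tokens, model, idf, bigram_reward=1.0):
    """Assumed scoring rule: IDF-weighted unigram probabilities plus an
    extra reward for bigrams that also occur in the author's documents."""
    uni, bi, total = model
    s = 0.0
    if total:
        for w in tokens:
            # Words occurring in every document get IDF 0, so they are
            # down-weighted instead of being removed via a stop-word list.
            s += idf.get(w, 0.0) * uni[w] / total
    for prev, w in bigrams(tokens):
        if uni[prev]:
            s += bigram_reward * idf.get(w, 0.0) * bi[(prev, w)] / uni[prev]
    return s


def attribute(tokens, models, idf):
    """Return the candidate author with the highest score (closed-class AA)."""
    return max(models, key=lambda a: score(tokens, models[a], idf))


# Toy usage with made-up English data (the paper uses Persian poem corpora).
toy_corpus = {
    "A": [["the", "moon", "is", "bright"], ["the", "moon", "rises"]],
    "B": [["the", "river", "is", "deep"], ["the", "river", "flows"]],
}
toy_models, toy_idf = train_models(toy_corpus)
print(attribute(["the", "moon", "rises", "bright"], toy_models, toy_idf))
```

In this sketch a word such as "the", which appears in every training document, receives an IDF of zero and therefore contributes nothing to any author's score, which mirrors the abstract's point about using IDF in place of stop-word removal.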

Groups:

Science of Security VO