Choosing a profile length in the SCAP method of source code authorship attribution
Title | Choosing a profile length in the SCAP method of source code authorship attribution |
Publication Type | Conference Paper |
Year of Publication | 2014 |
Authors | Tennyson, M.F., Mitropoulos, F.J. |
Conference Name | SOUTHEASTCON 2014, IEEE |
Date Published | March |
Keywords | authorship attribution, C++ language, data set, frequency control, frequency measurement, information retrieval, Java, Java language, plagiarism detection, profile length, RNA, SCAP method, software forensics, source code (software), source code authorship attribution |
Abstract | Source code authorship attribution is the task of determining the author of source code whose author is not explicitly known. One specific method of source code authorship attribution that has been shown to be extremely effective is the SCAP method. This method, however, relies on a parameter L that has heretofore been quite nebulous. In the SCAP method, each candidate author's known work is represented as a profile of that author, where the parameter L defines the profile's maximum length. In this study, alternative approaches for selecting a value for L were investigated. Several alternative approaches were found to perform better than the baseline approach used in the SCAP method. The approach that performed the best was empirically shown to improve performance from 91.0% to 97.2%, measured as the percentage of documents correctly attributed, on a data set consisting of 7,231 programs written in Java and C++. |
DOI | 10.1109/SECON.2014.6950705 |
Citation Key | 6950705 |
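
The abstract describes each candidate author's profile as capped at a maximum length L. The following is a minimal sketch of how such a cap could work, not the authors' implementation. It assumes (the record above does not say this) that a profile is the set of the L most frequent byte-level n-grams in an author's known code, and that an unknown program is attributed to the author whose profile shares the most n-grams with the program's own profile. The function names `build_profile` and `attribute`, the n-gram size, the tie-breaking rule, and the sample snippets are all hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Set


def build_profile(source: str, n: int = 3, L: Optional[int] = None) -> Set[bytes]:
    """Return the set of the L most frequent byte-level n-grams in `source`.

    Assumption: L=None means "no cap" (keep every distinct n-gram).
    """
    data = source.encode("utf-8")
    counts = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    # Rank by descending frequency; break ties by byte order so the
    # result is deterministic (the tie-breaking rule is an assumption).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if L is not None:
        ranked = ranked[:L]
    return {gram for gram, _ in ranked}


def attribute(unknown: str, known: Dict[str, str], n: int = 3,
              L: Optional[int] = None) -> str:
    """Pick the author whose profile overlaps most with the unknown program.

    `known` maps an author name to that author's concatenated known code.
    Similarity here is simply the size of the profile intersection.
    """
    target = build_profile(unknown, n, L)
    scores = {
        author: len(build_profile(code, n, L) & target)
        for author, code in known.items()
    }
    # On a tie, max() keeps the first author in dictionary order.
    return max(scores, key=scores.get)


# Hypothetical toy data, only to show the calling convention.
authors = {
    "alice": "for (int i = 0; i < n; i++) { sum += a[i]; }",
    "bob": "while(idx<len){total=total+arr[idx];idx=idx+1;}",
}
unknown_program = "for (int j = 0; j < m; j++) { sum += b[j]; }"
print(attribute(unknown_program, authors, n=3, L=50))
```

In this sketch a small L keeps only the most frequent n-grams of each author, while a large L (or `None`) keeps rarer ones as well. That trade-off is the reason the choice of L matters.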