Source code authorship attribution is the task of determining who wrote a piece of source code when its author is not explicitly known. One method that has been shown to be extremely effective for this task is the SCAP method. This method, however, relies on a parameter L whose proper value has so far been unclear. In the SCAP method, each candidate author's known work is represented as a profile of that author, and the parameter L defines the profile's maximum length. In this study, alternative approaches for selecting a value for L were investigated. Several of them were found to perform better than the baseline approach used in the SCAP method. The best-performing approach was empirically shown to improve accuracy from 91.0% to 97.2%, measured as the percentage of documents correctly attributed, on a data set of 7,231 programs written in Java and C++.
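To make the role of L concrete, the following is a minimal Python sketch of profile-based attribution. It is not the implementation evaluated in the study. It assumes, based on the commonly published description of SCAP and not on this abstract, that a profile is the set of the L most frequent byte-level n-grams in an author's known code and that an unknown program is attributed to the author whose profile shares the most n-grams with it. The function names (`ngrams`, `build_profile`, `attribute`) and the default values of `n` and `L` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def ngrams(text, n):
    """Return all overlapping byte-level n-grams of a source string."""
    data = text.encode("utf-8")
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def build_profile(sources, n, L):
    """Profile = set of at most L most frequent n-grams (assumed definition).

    L caps the profile length; a shorter corpus simply yields fewer n-grams.
    """
    counts = Counter()
    for src in sources:
        counts.update(ngrams(src, n))
    return {gram for gram, _ in counts.most_common(L)}


def attribute(unknown, author_sources, n=6, L=2000):
    """Pick the author whose profile overlaps most with the unknown program.

    The score is the size of the profile intersection (assumed similarity
    measure). The defaults for n and L are illustrative, not recommended.
    """
    query = build_profile([unknown], n, L)
    scores = {
        author: len(query & build_profile(srcs, n, L))
        for author, srcs in author_sources.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Toy corpus with two visibly different coding styles.
    known = {
        "alice": ["for (int i = 0; i < n; i++) { sum += a[i]; }"],
        "bob": ["while(idx<len){total=total+arr[idx];idx=idx+1;}"],
    }
    unknown = "for (int j = 0; j < m; j++) { acc += b[j]; }"
    best, scores = attribute(unknown, known, n=3, L=50)
    print(best, scores)
```

Under these assumptions, a small L keeps only the most frequent n-grams of each author, while a large L admits rarer ones, so the choice of L directly changes which n-grams can contribute to the overlap score.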