Deep Metric Learning for Code Authorship Attribution and Verification

Submitted by grigby1 on Fri, 09/09/2022 - 2:28pm

Title	Deep Metric Learning for Code Authorship Attribution and Verification
Publication Type	Conference Paper
Year of Publication	2021
Authors	White, Riley; Sprague, Nathan
Conference Name	2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA)
Date Published	December
Keywords	authorship identification, authorship verification, codes, copyright protection, Deep Learning, Estimation, face recognition, Human Behavior, machine learning, malware recognition, Measurement, metric learning, Metrics, Plagiarism, pubcrawl, stylometry
Abstract	Code authorship identification can assist in identifying creators of malware, identifying plagiarism, and giving insights in copyright infringement cases. Taking inspiration from facial recognition work, we apply recent advances in metric learning to the problem of authorship identification and verification. The metric learning approach makes it possible to measure similarity in the learned embedding space. Access to a discriminative similarity measure allows for the estimation of probability distributions that facilitate open-set classification and verification. We extend our analysis to verification based on sets of files, a previously unexplored problem domain in large-scale author identification. On closed-set tasks we achieve competitive accuracies, but do not improve on the state of the art.
DOI	10.1109/ICMLA52953.2021.00178
Citation Key	white_deep_2021
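
Illustrative note: the abstract describes measuring similarity in a learned embedding space and using it to verify authorship across sets of files. The following is a minimal sketch of that idea, not the authors' implementation. The embedding vectors, the mean pooling over a file set, the cosine similarity measure, the function names, and the 0.8 threshold are all assumptions chosen for illustration; in practice the embeddings would come from a trained network and the threshold from held-out data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_embedding(embeddings):
    """Pool a set of file embeddings into one vector (assumed pooling choice)."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def same_author(set_a, set_b, threshold=0.8):
    """Verify authorship: accept if the pooled embeddings are similar enough.

    The threshold is hypothetical; it is not a value from the paper.
    """
    score = cosine_similarity(mean_embedding(set_a), mean_embedding(set_b))
    return score >= threshold


# Hypothetical 3-dimensional embeddings standing in for model outputs.
author_1_files = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
author_1_more = [[0.85, 0.15, 0.05]]
author_2_files = [[0.0, 0.2, 0.9], [0.1, 0.1, 0.8]]

print(same_author(author_1_files, author_1_more))
print(same_author(author_1_files, author_2_files))
```

Rejecting a pair whose score falls below the threshold is also what makes an open-set decision possible in this sketch: a query that is not similar enough to any known author can be labeled unknown rather than forced into the closest class.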
