Source Code Authorship Approaches Natural Language Processing

Title: Source Code Authorship Approaches Natural Language Processing
Publication Type: Conference Paper
Year of Publication: 2018
Authors: Juraj Petrík, Daniela Chudá
Conference Name: Proceedings of the 19th International Conference on Computer Systems and Technologies
Date Published: September 2018
Publisher: ACM
ISBN Number: 978-1-4503-6425-6
Keywords: authorship attribution, Deep Learning, Human Behavior, machine learning, Metrics, natural language processing, NLP, pubcrawl, source code, stylometry
Abstract

This paper proposes a method for source code authorship attribution using modern natural language processing methods. Our method, based on text embedding with a convolutional recurrent neural network, reaches 94.5% accuracy among 500 authors in one dataset, outperforming many state-of-the-art models for authorship attribution. Our approach treats source code as natural language text, so it is potentially programming-language independent and leaves more room for future improvement.

URL: https://dl.acm.org/doi/10.1145/3274005.3274031
DOI: 10.1145/3274005.3274031
Citation Key: petrik_source_2018