Source Code Authorship Approaches Natural Language Processing
Title | Source Code Authorship Approaches Natural Language Processing |
Publication Type | Conference Paper |
Year of Publication | 2018 |
Authors | Petrík, Juraj; Chudá, Daniela |
Conference Name | Proceedings of the 19th International Conference on Computer Systems and Technologies |
Date Published | September 2018 |
Publisher | ACM |
ISBN Number | 978-1-4503-6425-6 |
Keywords | authorship attribution, Deep Learning, Human Behavior, machine learning, Metrics, natural language processing, NLP, source code, stylometry |
Abstract | This paper proposes a method for source code authorship attribution using modern natural language processing methods. Our method, based on text embedding with a convolutional recurrent neural network, reaches 94.5% accuracy among 500 authors in one dataset, outperforming many state-of-the-art models for authorship attribution. Because our approach treats source code as natural language text, it is potentially independent of the programming language and leaves more room for future improvement. |
URL | https://dl.acm.org/doi/10.1145/3274005.3274031 |
DOI | 10.1145/3274005.3274031 |
Citation Key | petrik_source_2018 |
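The abstract's central idea is to treat source code as plain natural-language text and feed it to an embedding layer followed by convolutional and recurrent layers. The abstract does not describe the authors' preprocessing, so the following is only a minimal sketch under stated assumptions: a hypothetical language-agnostic regex tokenizer, a hypothetical vocabulary with reserved padding (0) and unknown (1) ids, and fixed-length integer sequences of the kind an embedding layer would consume. The names `tokenize`, `build_vocab`, and `encode` are illustrative, not the paper's code, and the neural network itself is not shown.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical, language-agnostic tokenizer (an assumption, not the paper's):
# identifiers/keywords, digit runs, and single punctuation/operator characters
# each become one token; whitespace is discarded.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\w\s]")


def tokenize(source: str) -> List[str]:
    """Split source code into word-like tokens, as if it were plain text."""
    return TOKEN_RE.findall(source)


def build_vocab(corpus: List[str], min_count: int = 1) -> Dict[str, int]:
    """Map each token seen at least min_count times to an integer id.

    Id 0 is reserved for padding and id 1 for unknown tokens; remaining ids
    are assigned by descending frequency, ties broken alphabetically.
    """
    counts = Counter(tok for src in corpus for tok in tokenize(src))
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(source: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Convert one source file into a fixed-length id sequence.

    Out-of-vocabulary tokens map to <unk>; the sequence is truncated to
    max_len or right-padded with <pad>, ready for an embedding layer.
    """
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokenize(source)][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


if __name__ == "__main__":
    corpus = ["int main() { return 0; }", "def main():\n    return 0"]
    vocab = build_vocab(corpus)
    print(encode("int x = 0;", vocab, max_len=8))
```

Because the tokenizer relies only on character classes rather than a language-specific grammar, the same code handles C-like and Python-like snippets, which mirrors the abstract's claim of potential programming-language independence. In a full pipeline, each author would be one output class of the network trained on these sequences.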