A Neural Embedding for Source Code: Security Analysis and CWE Lists

Submitted by grigby1 on Thu, 06/24/2021 - 10:51am

Title	A Neural Embedding for Source Code: Security Analysis and CWE Lists
Publication Type	Conference Paper
Year of Publication	2020
Authors	Saletta, Martina; Ferretti, Claudio
Conference Name	2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech)
Date Published	Aug. 2020
Publisher	IEEE
ISBN Number	978-1-7281-6609-4
Keywords	Autonomic Security, composability, human factors, Java, Metrics, natural language processing, Pervasive Computing Security, pubcrawl, resilience, Resiliency, Scalability, security, Software, source code embedding, static analysis, supervised learning, Syntactics, Training, vulnerability classification
Abstract	In this paper, we design a technique for mapping source code into a vector space and show its application to recognizing security weaknesses. Applying ideas commonly used in Natural Language Processing, we train a model that produces an embedding of programs from their Abstract Syntax Trees. We then show that this embedding yields clusters that roughly separate different classes of software weaknesses. Although the embedding is trained without supervision on a generic Java dataset, we show that the model can be used for supervised learning of specific classes of vulnerabilities, helping to capture some of the features that distinguish them in code. Finally, we discuss how our model performs across the different types of vulnerabilities categorized by the CWE initiative.
URL	https://ieeexplore.ieee.org/document/9251115
DOI	10.1109/DASC-PICom-CBDCom-CyberSciTech49142.2020.00095
Citation Key	saletta_neural_2020
