Representation vs. Model: What Matters Most for Source Code Vulnerability Detection

Title: Representation vs. Model: What Matters Most for Source Code Vulnerability Detection
Publication Type: Conference Paper
Year of Publication: 2021
Authors: Zheng, Wei; Semasaba, Abubakar Omari Abdallah; Wu, Xiaoxue; Agyemang, Samuel Akwasi; Liu, Tao; Ge, Yuan
Conference Name: 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)
Keywords: Analytical models, compositionality, Conferences, Deep Learning, Human Behavior, Measurement, Metrics, Neural networks, pubcrawl, Resiliency, security, software vulnerability detection, Syntactics, transfer learning, vulnerability detection
Abstract: Vulnerabilities in the source code of software are critical issues in the realm of software engineering. Coping with vulnerabilities in software source code is becoming more challenging due to several aspects of complexity and volume. Deep learning has gained popularity throughout the years as a means of addressing such issues. In this paper, we propose an evaluation of vulnerability detection performance on source code representations and evaluate how Machine Learning (ML) strategies can improve them. Our experiment pairs three Deep Neural Networks (DNNs) with five different source code representations: Abstract Syntax Trees (ASTs), Code Gadgets (CGs), Semantics-based Vulnerability Candidates (SeVCs), Lexed Code Representations (LCRs), and Composite Code Representations (CCRs). Experimental results show that employing different ML strategies in conjunction with the base model structure influences the performance results to a varying degree. However, ML-based techniques suffer from poor performance on class imbalance handling when used in conjunction with source code representations for software vulnerability detection.
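To make the AST representation mentioned in the abstract concrete, the sketch below flattens a parsed syntax tree into a sequence of node-type names, a common preprocessing step before feeding code structure into a neural model. This is an illustrative assumption using Python's standard `ast` module, not the paper's actual pipeline (which evaluates its own representation-extraction tooling).

```python
import ast

def ast_node_sequence(source: str) -> list[str]:
    """Flatten a Python AST into a breadth-first sequence of node type
    names, e.g. as input tokens for an embedding layer."""
    tree = ast.parse(source)
    # ast.walk yields the root first, then descendants breadth-first.
    return [type(node).__name__ for node in ast.walk(tree)]

snippet = (
    "def copy_buf(dst, src, n):\n"
    "    for i in range(n):\n"
    "        dst[i] = src[i]\n"
)
print(ast_node_sequence(snippet)[:5])
```

A real detection pipeline would map such sequences to integer IDs and pad them to a fixed length before training a DNN classifier.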
DOI: 10.1109/SANER50967.2021.00082
Citation Key: zheng_representation_2021