Title | Representation vs. Model: What Matters Most for Source Code Vulnerability Detection |
Publication Type | Conference Paper |
Year of Publication | 2021 |
Authors | Zheng, Wei, Semasaba, Abubakar Omari Abdallah, Wu, Xiaoxue, Agyemang, Samuel Akwasi, Liu, Tao, Ge, Yuan
Conference Name | 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) |
Keywords | Analytical models, compositionality, Conferences, Deep Learning, Human Behavior, Measurement, Metrics, Neural networks, pubcrawl, Resiliency, security, software vulnerability detection, Syntactics, transfer learning, vulnerability detection |
Abstract | Vulnerabilities in the source code of software are critical issues in the realm of software engineering. Coping with vulnerabilities in software source code is becoming more challenging due to several aspects of complexity and volume. Deep learning has gained popularity over the years as a means of addressing such issues. In this paper, we evaluate vulnerability detection performance across source code representations and examine how Machine Learning (ML) strategies can improve it. Our experiment pairs three Deep Neural Networks (DNNs) with five different source code representations: Abstract Syntax Trees (ASTs), Code Gadgets (CGs), Semantics-based Vulnerability Candidates (SeVCs), Lexed Code Representations (LCRs), and Composite Code Representations (CCRs). Experimental results show that employing different ML strategies in conjunction with the base model structure influences performance to varying degrees. However, ML-based techniques handle class imbalance poorly when used in conjunction with source code representations for software vulnerability detection.
DOI | 10.1109/SANER50967.2021.00082 |
Citation Key | zheng_representation_2021 |