Title | Representation vs. Model: What Matters Most for Source Code Vulnerability Detection |
Publication Type | Conference Paper |
Year of Publication | 2021 |
Authors | Zheng, Wei, Semasaba, Abubakar Omari Abdallah, Wu, Xiaoxue, Agyemang, Samuel Akwasi, Liu, Tao, Ge, Yuan
Conference Name | 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) |
Keywords | Analytical models, compositionality, Conferences, Deep Learning, Human Behavior, Measurement, Metrics, Neural networks, pubcrawl, Resiliency, security, software vulnerability detection, Syntactics, transfer learning, vulnerability detection |
Abstract | Vulnerabilities in the source code of software are critical issues in the realm of software engineering. Coping with vulnerabilities in software source code is becoming more challenging due to several aspects of complexity and volume. Deep learning has gained popularity over the years as a means of addressing such issues. In this paper, we evaluate vulnerability detection performance across source code representations and examine how Machine Learning (ML) strategies can improve it. Our experiment pairs three Deep Neural Networks (DNNs) with five different source code representations: Abstract Syntax Trees (ASTs), Code Gadgets (CGs), Semantics-based Vulnerability Candidates (SeVCs), Lexed Code Representations (LCRs), and Composite Code Representations (CCRs). Experimental results show that employing different ML strategies in conjunction with the base model structure influences performance to varying degrees. However, ML-based techniques handle class imbalance poorly when used in conjunction with source code representations for software vulnerability detection.
DOI | 10.1109/SANER50967.2021.00082 |
Citation Key | zheng_representation_2021 |