Visible to the public Machine Learning to Combine Static Analysis Alerts with Software Metrics to Detect Security Vulnerabilities: An Empirical Study

TitleMachine Learning to Combine Static Analysis Alerts with Software Metrics to Detect Security Vulnerabilities: An Empirical Study
Publication TypeConference Paper
Year of Publication2021
AuthorsPereira, José D'Abruzzo, Campos, João R., Vieira, Marco
Conference Name2021 17th European Dependable Computing Conference (EDCC)
Keywordscodes, Computing Theory, machine learning, Measurement, Metrics, pubcrawl, security, security metrics, Software algorithms, software metrics, static analysis, static code analysis, Tools, vulnerability detection
Abstract

Software developers can use diverse techniques and tools to reduce the number of vulnerabilities, but the effectiveness of existing solutions in real projects is questionable. For example, Static Analysis Tools (SATs) report potential vulnerabilities by analyzing code patterns, and Software Metrics (SMs) can be used to predict vulnerabilities based on high-level characteristics of the code. In theory, both approaches can be applied from the early stages of the development process, but it is well known that they fail to detect critical vulnerabilities and raise a large number of false alarms. This paper studies the hypothesis of using Machine Learning (ML) to combine alerts from SATs with SMs to predict vulnerabilities in a large software project (under development for many years). In practice, we use four ML algorithms, alerts from two SATs, and a large number of SMs to predict whether a source code file is vulnerable or not (binary classification) and to predict the vulnerability category (multiclass classification). Results show that one can achieve either high precision or high recall, but not both at the same time. To understand the reason, we analyze and compare snippets of source code, demonstrating that vulnerable and non-vulnerable files share similar characteristics, making it hard to distinguish vulnerable from non-vulnerable code based on SAT alerts and SMs.

DOI10.1109/EDCC53658.2021.00008
Citation Keypereira_machine_2021