Discovering Software Vulnerabilities Using Data-flow Analysis and Machine Learning

Title: Discovering Software Vulnerabilities Using Data-flow Analysis and Machine Learning
Publication Type: Conference Paper
Year of Publication: 2018
Authors: Kronjee, Jorrit; Hommersom, Arjen; Vranken, Harald
Conference Name: Proceedings of the 13th International Conference on Availability, Reliability and Security
Date Published: August 2018
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-6448-5
Keywords: compositionality, Cross Site Scripting, data-flow analysis, Human Behavior, machine learning, Metrics, pubcrawl, Resiliency, Scalability, software security, static code analysis, vulnerability detection
Abstract

We present a novel method for static analysis in which we combine data-flow analysis with machine learning to detect SQL injection (SQLi) and Cross-Site Scripting (XSS) vulnerabilities in PHP applications. We assembled a dataset from the National Vulnerability Database and the SAMATE project, containing vulnerable PHP code samples and their patched versions in which the vulnerability is solved. We extracted features from the code samples by applying data-flow analysis techniques, including reaching definitions analysis, taint analysis, and reaching constants analysis. We used these features in machine learning to train various probabilistic classifiers. To demonstrate the effectiveness of our approach, we built a tool called WIRECAML, and compared our tool to other tools for vulnerability detection in PHP code. Our tool performed best for detecting both SQLi and XSS vulnerabilities. We also tried our approach on a number of open-source software applications, and found a previously unknown vulnerability in a photo-sharing web application.
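
As a rough illustration of one technique the abstract names, reaching definitions analysis, the following Python sketch runs a standard worklist algorithm over a toy control-flow graph. It is not WIRECAML's implementation: the function name `reaching_definitions`, the block labels `B1` to `B3`, and the definition ids `d1` and `d2` are all hypothetical and chosen only for this example. The toy graph models a PHP fragment in which `$id` is read from user input (`d1`), optionally sanitized in one branch (`d2`), and then used in a query.

```python
from collections import defaultdict


def reaching_definitions(blocks, successors):
    """Compute the definitions that reach the entry of each block.

    blocks: dict mapping block name -> ordered list of (def_id, variable).
    successors: dict mapping block name -> list of successor block names.
    Returns a dict mapping block name -> set of def_ids reaching its entry.
    """
    # Group all definitions by variable so KILL sets can be derived.
    defs_of_var = defaultdict(set)
    for stmts in blocks.values():
        for def_id, var in stmts:
            defs_of_var[var].add(def_id)

    gen, kill = {}, {}
    for node, stmts in blocks.items():
        g, k = set(), set()
        for def_id, var in stmts:
            # A later definition of a variable overrides earlier ones.
            g = {d for d in g if d not in defs_of_var[var]}
            g.add(def_id)
            k |= defs_of_var[var] - {def_id}
        gen[node], kill[node] = g, k - g

    # Invert the successor relation to get predecessors.
    preds = defaultdict(list)
    for node, succs in successors.items():
        for succ in succs:
            preds[succ].append(node)

    in_sets = {n: set() for n in blocks}
    out_sets = {n: set(gen[n]) for n in blocks}
    worklist = list(blocks)
    while worklist:
        node = worklist.pop()
        # IN is the union of the OUT sets of all predecessors.
        in_sets[node] = set().union(*(out_sets[p] for p in preds[node]))
        new_out = gen[node] | (in_sets[node] - kill[node])
        if new_out != out_sets[node]:
            out_sets[node] = new_out
            worklist.extend(successors.get(node, []))
    return in_sets


# Toy graph (hypothetical):
#   B1: d1: $id = $_GET['id'];
#   B2: d2: $id = intval($id);   (only executed on one branch)
#   B3: query built from $id
blocks = {"B1": [("d1", "id")], "B2": [("d2", "id")], "B3": []}
successors = {"B1": ["B2", "B3"], "B2": ["B3"], "B3": []}

reaching = reaching_definitions(blocks, successors)
# The unsanitized definition d1 still reaches the sink block B3.
print(sorted(reaching["B3"]))
```

In this sketch, the set of definitions reaching the block that builds the query is the kind of information that could be turned into a feature for a classifier; how the paper actually encodes its features is not stated in the abstract.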

URL: http://doi.acm.org/10.1145/3230833.3230856
DOI: 10.1145/3230833.3230856
Citation Key: kronjee_discovering_2018