Discovering Software Vulnerabilities Using Data-flow Analysis and Machine Learning
Title | Discovering Software Vulnerabilities Using Data-flow Analysis and Machine Learning |
Publication Type | Conference Paper |
Year of Publication | 2018 |
Authors | Kronjee, Jorrit; Hommersom, Arjen; Vranken, Harald |
Conference Name | Proceedings of the 13th International Conference on Availability, Reliability and Security |
Date Published | August 2018 |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-4503-6448-5 |
Keywords | compositionality, Cross Site Scripting, data-flow analysis, Human Behavior, machine learning, Metrics, pubcrawl, Resiliency, Scalability, software security, static code analysis, vulnerability detection |
Abstract | We present a novel method for static analysis in which we combine data-flow analysis with machine learning to detect SQL injection (SQLi) and Cross-Site Scripting (XSS) vulnerabilities in PHP applications. We assembled a dataset from the National Vulnerability Database and the SAMATE project, containing vulnerable PHP code samples and their patched versions in which the vulnerability is solved. We extracted features from the code samples by applying data-flow analysis techniques, including reaching definitions analysis, taint analysis, and reaching constants analysis. We used these features in machine learning to train various probabilistic classifiers. To demonstrate the effectiveness of our approach, we built a tool called WIRECAML, and compared our tool to other tools for vulnerability detection in PHP code. Our tool performed best for detecting both SQLi and XSS vulnerabilities. We also tried our approach on a number of open-source software applications, and found a previously unknown vulnerability in a photo-sharing web application. |
URL | http://doi.acm.org/10.1145/3230833.3230856 |
DOI | 10.1145/3230833.3230856 |
Citation Key | kronjee_discovering_2018 |
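
The abstract names taint analysis as one of the data-flow techniques used to extract features. As a rough illustration of that idea only, the sketch below propagates taint forward through a straight-line toy program and reports sink calls that receive tainted data. It is not WIRECAML's implementation: the `Stmt` representation, the `SOURCES`, `SANITIZERS`, and `SINKS` sets, and the function `find_tainted_sinks` are all assumptions made for this example, and the paper feeds analysis results into probabilistic classifiers rather than applying a fixed rule like this one. The sketch also treats every sanitizer as effective for every sink, which is a simplification.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical configuration for this sketch; not taken from the paper.
SOURCES = {"$_GET", "$_POST"}                              # user-controlled input
SANITIZERS = {"htmlspecialchars", "mysqli_real_escape_string"}
SINKS = {"echo", "mysqli_query"}                           # XSS and SQLi sinks


@dataclass(frozen=True)
class Stmt:
    """One statement of a toy, straight-line PHP-like program."""
    line: int
    target: Optional[str]    # variable being assigned, or None for a bare call
    func: Optional[str]      # function applied to args, or None for a plain copy
    args: Tuple[str, ...]    # variables (or sources) read by the statement


def find_tainted_sinks(program: List[Stmt]) -> List[Tuple[int, str]]:
    """Return (line, sink) pairs where a sink receives tainted data."""
    tainted = set()          # variables currently holding unsanitized input
    findings = []
    for stmt in program:
        arg_tainted = any(a in SOURCES or a in tainted for a in stmt.args)
        if stmt.func in SINKS:
            if arg_tainted:
                findings.append((stmt.line, stmt.func))
            continue
        if stmt.func in SANITIZERS:
            arg_tainted = False  # simplification: any sanitizer clears taint
        if stmt.target is not None:
            # A new assignment overwrites the previous taint state.
            if arg_tainted:
                tainted.add(stmt.target)
            else:
                tainted.discard(stmt.target)
    return findings


# Vulnerable sample: $id flows from $_GET into a query without escaping.
VULNERABLE = [
    Stmt(1, "$id", None, ("$_GET",)),
    Stmt(2, "$q", None, ("$id",)),
    Stmt(3, None, "mysqli_query", ("$q",)),
]

# Patched sample: the input is escaped before it reaches the query.
PATCHED = [
    Stmt(1, "$id", "mysqli_real_escape_string", ("$_GET",)),
    Stmt(2, "$q", None, ("$id",)),
    Stmt(3, None, "mysqli_query", ("$q",)),
]

if __name__ == "__main__":
    print(find_tainted_sinks(VULNERABLE))
    print(find_tainted_sinks(PATCHED))
```

The vulnerable and patched samples mirror the dataset structure described in the abstract, where each vulnerable code sample is paired with a version in which the vulnerability is fixed.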