Using Software Metrics for Predicting Vulnerable Code-Components: A Study on Java and Python Open Source Projects

Title: Using Software Metrics for Predicting Vulnerable Code-Components: A Study on Java and Python Open Source Projects
Publication Type: Conference Paper
Year of Publication: 2019
Authors: Chong, T., Anu, V., Sultana, K. Z.
Conference Name: 2019 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC)
Keywords: code component, code-component, Java, Java projects, Java vulnerable functions, learning (artificial intelligence), machine learning, Measurement, Metrics, metrics testing, pubcrawl, public domain software, python, Python open source projects, Python vulnerable function prediction, safety-critical software, security, security of data, security vulnerabilities, software metrics, software metrics-based vulnerability prediction, software projects, software reliability, software security, Testing, Vulnerability prediction, vulnerability prediction performance, vulnerability predictors
Abstract

Software vulnerabilities often remain hidden until an attacker exploits the weak/insecure code. Therefore, testing the software from a vulnerability discovery perspective becomes challenging for developers if they do not inspect their code thoroughly (which is time-consuming). We propose that vulnerability prediction using certain software metrics can support the testing process by identifying vulnerable code-components (e.g., functions, classes, etc.). Once a code-component is predicted as vulnerable, the developers can focus their testing efforts on it, thereby avoiding the time/effort required for testing the entire application. The current paper presents a study that compares how software metrics perform as vulnerability predictors for software projects developed in two different languages (Java vs Python). The goal of this research is to analyze the vulnerability prediction performance of software metrics for different programming languages. We designed and conducted experiments on security vulnerabilities reported for three Java projects (Apache Tomcat 6, Tomcat 7, Apache CXF) and two Python projects (Django and Keystone). In this paper, we focus on a specific type of code component: Functions. We apply Machine Learning models for predicting vulnerable functions. Overall results show that software metrics-based vulnerability prediction is more useful for Java projects than Python projects (i.e., software metrics when used as features were able to predict Java vulnerable functions with a higher recall and precision compared to Python vulnerable functions prediction).
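
The abstract compares predictors by recall and precision on vulnerable functions. As a minimal sketch of how those two measures are computed for a function-level predictor, the Python snippet below may help; it is not the authors' code, the helper name precision_recall is hypothetical, and the labels are made-up toy values rather than results from the study.

```python
def precision_recall(actual, predicted):
    """Return (precision, recall) for the positive class (True = vulnerable)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)        # vulnerable, flagged
    fp = sum(1 for a, p in pairs if not a and p)    # clean, flagged
    fn = sum(1 for a, p in pairs if a and not p)    # vulnerable, missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data (assumption, for illustration only): one entry per function.
actual = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]

precision, recall = precision_recall(actual, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Recall is the share of truly vulnerable functions that the predictor flags; precision is the share of flagged functions that are truly vulnerable, which determines how much testing effort is spent on false alarms.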

DOI: 10.1109/CSE/EUC.2019.00028
Citation Key: chong_using_2019