Biblio
Software vulnerabilities often remain hidden until an attacker exploits weak or insecure code. Testing software for vulnerabilities is therefore challenging for developers unless they inspect their code thoroughly, which is time-consuming. We propose that vulnerability prediction using certain software metrics can support the testing process by identifying vulnerable code components (e.g., functions and classes). Once a code component is predicted to be vulnerable, developers can focus their testing efforts on it, avoiding the time and effort required to test the entire application. This paper presents a study that compares how software metrics perform as vulnerability predictors for software projects developed in two different languages (Java vs. Python). The goal of this research is to analyze the vulnerability prediction performance of software metrics across programming languages. We designed and conducted experiments on security vulnerabilities reported for three Java projects (Apache Tomcat 6, Tomcat 7, Apache CXF) and two Python projects (Django and Keystone). In this paper, we focus on a specific type of code component: functions. We apply machine learning models to predict vulnerable functions. Overall, the results show that software metrics-based vulnerability prediction is more useful for Java projects than for Python projects (i.e., software metrics used as features predicted vulnerable Java functions with higher recall and precision than vulnerable Python functions).
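To make the two reported measures concrete, the following is a minimal sketch of how precision and recall could be computed for function-level predictions. The labels are hypothetical and the helper name `precision_recall` is an assumption for illustration; nothing here is taken from the study's data or tooling.

```python
def precision_recall(actual, predicted):
    """Return (precision, recall) for binary labels, where 1 means vulnerable."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # vulnerable, flagged
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # neutral, flagged
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # vulnerable, missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels for six functions (1 = vulnerable, 0 = not vulnerable).
actual = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0]
precision, recall = precision_recall(actual, predicted)
```

Precision here is the share of flagged functions that are truly vulnerable, and recall is the share of vulnerable functions that were flagged.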
Software security is a major concern for developers who intend to deliver reliable software. Although there is research that focuses on vulnerability prediction and discovery, there is still a need for security-specific metrics that measure software security and vulnerability-proneness quantitatively. Existing methods are based either on software metrics (defined on the physical characteristics of code, e.g., complexity or lines of code), which are not security-specific, or on generic patterns known as nano-patterns (Java method-level traceable patterns that characterize a Java method or function). Other methods predict vulnerabilities using text mining approaches or graph algorithms, which perform poorly in cross-project validation and fail to generalize as prediction models for any system. In this paper, we envision an automated framework that will help developers assess the security level of their code and guide them towards developing secure code. To accomplish this goal, we aim to refine and redefine the existing nano-patterns and software metrics to make them more security-centric, so that they can be used to measure the security level of source code (either a file or a function) with higher accuracy. We present our vision through a series of three consecutive studies in which we will (1) study the challenges of the current software metrics and nano-patterns in vulnerability prediction, (2) redefine and characterize the nano-patterns and software metrics so that they capture security-specific properties of code and measure the security level quantitatively, and finally (3) implement an automated framework that lets developers automatically extract the values of all the patterns and metrics for a given code segment and then flags the estimated security level as feedback based on our research results.
We conducted some preliminary experiments and present results indicating that our vision can be implemented in practice and will have valuable implications for the software security community.
Vulnerabilities need to be detected and removed from software. Although previous studies have demonstrated the usefulness of prediction techniques for deciding whether software components are vulnerable, improving the effectiveness of these techniques remains a major open research question. This paper employs a technique based on a deep neural network with rectified linear units, trained with stochastic gradient descent and batch normalization, to predict vulnerable software components. The features are defined as contiguous sequences of tokens in source code files. A statistical feature selection algorithm is then employed to reduce the feature and search space. We evaluated the proposed technique on a set of Java Android applications, and the results demonstrate that it can predict vulnerable classes, i.e., software components, with high precision, accuracy and recall.
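As an illustration of what "sequences of tokens" can look like as features, the sketch below extracts token n-grams from a source snippet. The regular-expression tokenizer, the helper names and the sample snippet are assumptions made for this example; the paper's actual tokenizer, n-gram length and feature selection step are not specified here.

```python
import re
from collections import Counter

# Simplified tokenizer (an assumption): identifiers, integer literals,
# and any other single non-whitespace character.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]")


def tokenize(source):
    """Split source text into a flat list of tokens."""
    return TOKEN_PATTERN.findall(source)


def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical one-line Java snippet.
snippet = "String q = a + b;"
tokens = tokenize(snippet)
bigrams = ngram_counts(tokens, 2)
```

Each distinct n-gram would become one feature column, which is why a feature selection step is needed to keep the feature space manageable.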
Vulnerability, a buzzword of modern times, is among the most important terms related to software and operating systems. Whenever software is developed, some loopholes and incomplete work remain from the development phase, so there is always a chance that a vulnerability will surface unexpectedly at any time. Detecting a vulnerability is one thing; predicting its occurrence over time is another. Knowing in advance how vulnerable a piece of software is likely to become acts as an early warning that prompts developers to build sounder, improved software the next time. This proposal discusses implementing the idea with an artificial neural network, which takes different data sets as input for further analysis. Models already exist for studying vulnerabilities in software and networks; in addition to that work, this proposal sheds light on the predictability of vulnerabilities over time.
Software security is an important concern in a world that increasingly relies on information technology. Detecting software vulnerabilities is a difficult and resource-consuming task. Automatic vulnerability prediction would therefore help development teams identify vulnerability-prone components and prioritize security inspection efforts. Software source code metrics and data mining techniques have recently been used to predict vulnerability-prone components. Some previous studies used a set of unit complexity and coupling metrics to predict vulnerabilities. In this study, we first compare the predictive power of these two groups of metrics in cross-project vulnerability prediction. In cross-project vulnerability prediction, we build the prediction model from datasets of completely different projects and try to detect vulnerabilities in another project. The experimental results show that unit complexity metrics are stronger vulnerability predictors than coupling metrics. We then propose a new set of coupling metrics called Included Vulnerable Header (IVH) metrics. These new coupling metrics, which consider the interaction of application modules with code outside the application, predict vulnerabilities considerably better than regular coupling metrics. Furthermore, adding IVH metrics to the set of complexity metrics improves the recall of the best predictor from 60.9% to 87.4% and yields the best set of metrics for cross-project vulnerability prediction.
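The cross-project setup described above can be sketched as a simple data split: every project except the target supplies training data, and the target project alone is used for testing. The record layout, project names and metric values below are hypothetical and serve only to show the shape of the split.

```python
def cross_project_split(samples, target_project):
    """Train on all other projects; test on the target project only."""
    train = [s for s in samples if s["project"] != target_project]
    test = [s for s in samples if s["project"] == target_project]
    return train, test


# Hypothetical records: one per component, with a metric value and a label.
samples = [
    {"project": "A", "complexity": 12, "vulnerable": 1},
    {"project": "A", "complexity": 3, "vulnerable": 0},
    {"project": "B", "complexity": 9, "vulnerable": 1},
    {"project": "C", "complexity": 4, "vulnerable": 0},
    {"project": "C", "complexity": 15, "vulnerable": 1},
]
train, test = cross_project_split(samples, target_project="C")
```

Because no record from the target project reaches the training set, the resulting scores reflect how well the metrics transfer between projects rather than how well a model fits a single code base.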
Due to limited time and resources, web software engineers need support in identifying vulnerable code. A practical approach to predicting vulnerable code would enable them to prioritize security auditing efforts. In this paper, we propose using a set of hybrid (static+dynamic) code attributes that characterize input validation and input sanitization code patterns and are expected to be significant indicators of web application vulnerabilities. Because static and dynamic program analyses complement each other, both techniques are used to extract the proposed attributes in an accurate and scalable way. Current vulnerability prediction techniques rely on the availability of data labeled with vulnerability information for training. For many real-world applications, past vulnerability data is often not available or at least not complete. Hence, to address both situations, where labeled past data is fully available or not, we apply both supervised and semi-supervised learning when building vulnerability predictors based on hybrid code attributes. Given that semi-supervised learning is entirely unexplored in this domain, we describe how to use this learning scheme effectively for vulnerability prediction. We performed empirical case studies on seven open-source projects where we built and evaluated supervised and semi-supervised models. When cross-validated with fully available labeled data, the supervised models achieve an average of 77 percent recall and 5 percent probability of false alarm for predicting SQL injection, cross-site scripting, remote code execution and file inclusion vulnerabilities. With a low amount of labeled data, the semi-supervised model showed, on average, 24 percent higher recall and 3 percent lower probability of false alarm than the supervised model, suggesting that semi-supervised learning may be a preferable solution for many real-world applications where vulnerability data is missing.
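For readers unfamiliar with the second measure, probability of false alarm is commonly defined as the fraction of non-vulnerable items that are wrongly flagged, FP / (FP + TN). The sketch below computes it alongside recall from confusion-matrix counts; the helper name and the counts are made up for illustration and are not results from the study.

```python
def recall_and_false_alarm(tp, fn, fp, tn):
    """Return (recall, probability of false alarm) from confusion counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, false_alarm


# Hypothetical counts: 10 vulnerable items, 20 non-vulnerable items.
recall, false_alarm = recall_and_false_alarm(tp=8, fn=2, fp=1, tn=19)
```

A good predictor pushes recall up while keeping the false alarm rate low, since every false alarm costs auditing time.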