Biblio

Filters: Keyword is Vulnerability prediction
2021-03-15
Brauckmann, A., Goens, A., Castrillon, J..  2020.  ComPy-Learn: A toolbox for exploring machine learning representations for compilers. 2020 Forum for Specification and Design Languages (FDL). :1–4.
Deep learning methods have been shown not only to improve software performance in compiler heuristics, but also, for example, to improve security in vulnerability prediction or to boost developer productivity in software engineering tools. A key to the success of such methods across these use cases is the expressiveness of the representation used to abstract from the program code. Recent work has shown that different representations of this kind have unique advantages in terms of performance. However, determining the best-performing one for a given task is often not obvious and requires empirical evaluation. Therefore, we present ComPy-Learn, a toolbox for conveniently defining, extracting, and exploring representations of program code. With syntax-level language information from the Clang compiler frontend and low-level information from the LLVM compiler backend, the tool supports the construction of linear and graph representations and enables an efficient search for the best-performing representation and model for tasks on program code.
2020-11-02
Chong, T., Anu, V., Sultana, K. Z..  2019.  Using Software Metrics for Predicting Vulnerable Code-Components: A Study on Java and Python Open Source Projects. 2019 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC). :98–103.

Software vulnerabilities often remain hidden until an attacker exploits the weak/insecure code. Therefore, testing the software from a vulnerability discovery perspective becomes challenging for developers if they do not inspect their code thoroughly (which is time-consuming). We propose that vulnerability prediction using certain software metrics can support the testing process by identifying vulnerable code-components (e.g., functions, classes, etc.). Once a code-component is predicted as vulnerable, the developers can focus their testing efforts on it, thereby avoiding the time/effort required for testing the entire application. The current paper presents a study that compares how software metrics perform as vulnerability predictors for software projects developed in two different languages (Java vs Python). The goal of this research is to analyze the vulnerability prediction performance of software metrics for different programming languages. We designed and conducted experiments on security vulnerabilities reported for three Java projects (Apache Tomcat 6, Tomcat 7, Apache CXF) and two Python projects (Django and Keystone). In this paper, we focus on a specific type of code component: Functions. We apply Machine Learning models for predicting vulnerable functions. Overall results show that software metrics-based vulnerability prediction is more useful for Java projects than Python projects (i.e., software metrics when used as features were able to predict Java vulnerable functions with a higher recall and precision compared to Python vulnerable functions prediction).

2020-04-24
Shuvro, Rezoan A., Das, Pankaz, Hayat, Majeed M., Talukder, Mitun.  2019.  Predicting Cascading Failures in Power Grids using Machine Learning Algorithms. 2019 North American Power Symposium (NAPS). :1–6.
Although there has been notable progress in modeling cascading failures in power grids, few works have used machine learning algorithms. In this paper, cascading failures that lead to massive blackouts in power grids are predicted and classified into no, small, and large cascades using machine learning algorithms. Cascading-failure data is generated using a cascading failure simulator framework developed earlier. The data set includes, as features, power grid operating parameters such as loading level, level of load shedding, and the capacity of the failed lines, and topological parameters such as edge betweenness centrality and the average shortest distance, for numerous combinations of two transmission line failures. Several machine learning algorithms are then used to classify cascading failures. Further, linear regression is used to predict the number of failed transmission lines and the amount of load shedding during a cascade based on initial feature values. This data-driven technique can be used to generate cascading-failure data sets for any real-world power grid; power-grid engineers can therefore use this approach to generate cascade data, predict vulnerabilities, and enhance the robustness of the grid.
2020-03-02
Sultana, Kazi Zakia, Chong, Tai-Yin.  2019.  A Proposed Approach to Build an Automated Software Security Assessment Framework using Mined Patterns and Metrics. 2019 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC). :176–181.

Software security is a major concern for developers who intend to deliver reliable software. Although there is research that focuses on vulnerability prediction and discovery, there is still a need for security-specific metrics to measure software security and vulnerability-proneness quantitatively. The existing methods are based either on software metrics (defined on the physical characteristics of code, e.g. complexity or lines of code), which are not security-specific, or on generic patterns known as nano-patterns (Java method-level traceable patterns that characterize a Java method or function). Other methods predict vulnerabilities using text mining approaches or graph algorithms, which perform poorly in cross-project validation and fail to provide a generalized prediction model for any system. In this paper, we envision constructing an automated framework that will assist developers in assessing the security level of their code and guide them towards developing secure code. To accomplish this goal, we aim to refine and redefine the existing nano-patterns and software metrics to make them more security-centric, so that they can be used to measure the security level of source code (either a file or a function) with higher accuracy. We present our envisioned approach through a series of three consecutive studies in which we (1) will study the challenges of the current software metrics and nano-patterns in vulnerability prediction, (2) will redefine and characterize the nano-patterns and software metrics so that they can capture security-specific properties of code and measure the security level quantitatively, and finally (3) will implement an automated framework for developers that automatically extracts the values of all the patterns and metrics for a given code segment and then flags the estimated security level as feedback based on our research results. We conducted some preliminary experiments and present the results, which indicate that our vision can be practically implemented and will have valuable implications for the software security community.

2019-03-04
Imtiaz, Sayem Mohammad, Bhowmik, Tanmay.  2018.  Towards Data-driven Vulnerability Prediction for Requirements. Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. :744–748.
Due to the abundance of security breaches we continue to see, the software development community has recently been paying attention to a more proactive approach towards security. This includes predicting vulnerabilities before exploitation by employing static code analysis and machine learning techniques. Such mechanisms, however, are designed to detect post-implementation vulnerabilities. As the root of a vulnerability can often be traced back to the requirement specification, and a vulnerability discovered later in the development life cycle is more expensive to fix, we need additional preventive mechanisms capable of predicting vulnerabilities at a much earlier stage. In this paper, we propose a novel framework providing automated support for predicting vulnerabilities for a requirement as early as requirements engineering. We further present a preliminary demonstration of our framework; the promising results we observe indicate the value of this new research idea.
2018-11-19
Pang, Yulei, Xue, Xiaozhen, Wang, Huaying.  2017.  Predicting Vulnerable Software Components Through Deep Neural Network. Proceedings of the 2017 International Conference on Deep Learning Technologies. :6–10.

Vulnerabilities need to be detected and removed from software. Although previous studies demonstrated the usefulness of employing prediction techniques in deciding about vulnerabilities of software components, improving the effectiveness of these prediction techniques remains a major open research question. This paper employs a technique based on a deep neural network with rectified linear units, trained with stochastic gradient descent and batch normalization, for predicting vulnerable software components. The features are defined as continuous sequences of tokens in source code files. A statistical feature selection algorithm is then employed to reduce the feature and search space. We evaluated the proposed technique on several Java Android applications, and the results demonstrate that it can predict vulnerable classes, i.e., software components, with high precision, accuracy and recall.
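
As an illustration of features defined as continuous sequences of tokens, the following is a minimal sketch that counts token n-grams in a source string. The tokenizer regex, the function name `token_ngrams`, and the sample snippet are assumptions made for illustration; the abstract does not specify the paper's tokenizer or n-gram length.

```python
import re
from collections import Counter

# Assumed, simplified tokenizer: identifiers, integer literals, or any
# single non-whitespace character. A real Java lexer is more involved.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def token_ngrams(source: str, n: int = 2) -> Counter:
    """Count contiguous token sequences of length n in a source string."""
    tokens = TOKEN_RE.findall(source)
    return Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


# Hypothetical snippet, used only to show the shape of the features.
features = token_ngrams("a = b ; a = c ;", n=2)
for ngram, count in features.most_common(3):
    print(ngram, count)
```

Each distinct n-gram would become one feature dimension, which is why a feature selection step is useful for shrinking the resulting space.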

2018-04-04
Majumder, R., Som, S., Gupta, R..  2017.  Vulnerability prediction through self-learning model. 2017 International Conference on Infocom Technologies and Unmanned Systems (Trends and Future Directions) (ICTUS). :400–402.

Vulnerability is a buzzword of modern times and one of the most important terms related to software and operating systems. Whenever software is developed, some loopholes and incompleteness remain from the development phase, so there is always a vulnerability that can surface at any time. Detecting a vulnerability is one thing; predicting its occurrence over time is another. If the vulnerability of a piece of software becomes known in time, it acts as an active alarm that prompts developers to produce sounder, improved software the next time. The proposal discusses implementing this idea using an artificial neural network, to which different data sets are given as input for further analysis. There are currently models for studying vulnerabilities in software and networks; in addition to that work, this paper sheds light on the predictability of vulnerabilities over time.

2017-05-19
Moshtari, Sara, Sami, Ashkan.  2016.  Evaluating and Comparing Complexity, Coupling and a New Proposed Set of Coupling Metrics in Cross-project Vulnerability Prediction. Proceedings of the 31st Annual ACM Symposium on Applied Computing. :1415–1421.

Software security is an important concern in a world moving towards Information Technology. Detecting software vulnerabilities is a difficult and resource-consuming task. Therefore, automatic vulnerability prediction would help development teams to predict vulnerability-prone components and prioritize security inspection efforts. Software source code metrics and data mining techniques have recently been used to predict vulnerability-prone components. Some previous studies used a set of unit complexity and coupling metrics to predict vulnerabilities. In this study, we first compare the predictive power of these two groups of metrics in cross-project vulnerability prediction. In cross-project vulnerability prediction, we create the prediction model from datasets of completely different projects and try to detect vulnerabilities in another project. The experimental results show that unit complexity metrics are stronger vulnerability predictors than coupling metrics. Then, we propose a new set of coupling metrics called Included Vulnerable Header (IVH) metrics. These new coupling metrics, which consider the interaction of application modules with the outside of the application, predict vulnerabilities considerably better than regular coupling metrics. Furthermore, adding IVH metrics to the set of complexity metrics improves the Recall of the best predictor from 60.9% to 87.4% and yields the best set of metrics for cross-project vulnerability prediction.
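
The cross-project setup described above can be sketched as a leave-one-project-out split: train on every project except one, then evaluate on the held-out project. This is one plausible reading of the abstract, not the paper's confirmed protocol; the function name `cross_project_splits`, the project names, and the sample values are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

# One sample = (metric values, label), where label 1 means vulnerable.
Sample = Tuple[List[float], int]


def cross_project_splits(
    projects: Dict[str, List[Sample]],
) -> Iterator[Tuple[str, List[Sample], List[Sample]]]:
    """Yield (target project, training samples, test samples).

    Training data comes only from projects other than the target,
    so the model never sees the project it is evaluated on.
    """
    for target, test in projects.items():
        train = [
            sample
            for name, samples in projects.items()
            if name != target
            for sample in samples
        ]
        yield target, train, test


# Hypothetical data: two metric values per component.
data = {
    "proj_a": [([3.0, 1.0], 0), ([9.0, 4.0], 1)],
    "proj_b": [([2.0, 0.0], 0)],
    "proj_c": [([7.0, 5.0], 1), ([1.0, 1.0], 0), ([8.0, 2.0], 1)],
}
for target, train, test in cross_project_splits(data):
    print(target, len(train), len(test))
```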

2015-05-05
Shar, L., Briand, L., Tan, H..  2014.  Web Application Vulnerability Prediction using Hybrid Program Analysis and Machine Learning. Dependable and Secure Computing, IEEE Transactions on. PP:1-1.

Due to limited time and resources, web software engineers need support in identifying vulnerable code. A practical approach to predicting vulnerable code would enable them to prioritize security auditing efforts. In this paper, we propose using a set of hybrid (static+dynamic) code attributes that characterize input validation and input sanitization code patterns and are expected to be significant indicators of web application vulnerabilities. Because static and dynamic program analyses complement each other, both techniques are used to extract the proposed attributes in an accurate and scalable way. Current vulnerability prediction techniques rely on the availability of data labeled with vulnerability information for training. For many real world applications, past vulnerability data is often not available or at least not complete. Hence, to address both situations where labeled past data is fully available or not, we apply both supervised and semi-supervised learning when building vulnerability predictors based on hybrid code attributes. Given that semi-supervised learning is entirely unexplored in this domain, we describe how to use this learning scheme effectively for vulnerability prediction. We performed empirical case studies on seven open source projects where we built and evaluated supervised and semi-supervised models. When cross validated with fully available labeled data, the supervised models achieve an average of 77 percent recall and 5 percent probability of false alarm for predicting SQL injection, cross site scripting, remote code execution and file inclusion vulnerabilities. With a low amount of labeled data, when compared to the supervised model, the semi-supervised model showed an average improvement of 24 percent higher recall and 3 percent lower probability of false alarm, thus suggesting semi-supervised learning may be a preferable solution for many real world applications where vulnerability data is missing.
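
The two measures reported above, recall and probability of false alarm, can be computed from confusion-matrix counts. The sketch below assumes the standard definitions, recall = TP / (TP + FN) and probability of false alarm = FP / (FP + TN); the function name and the counts are hypothetical and are not data from the paper.

```python
from typing import Tuple


def recall_and_pf(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Return (recall, probability of false alarm) from confusion counts.

    recall = TP / (TP + FN): share of vulnerable items that were flagged.
    pf     = FP / (FP + TN): share of clean items that were wrongly flagged.
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, pf


# Hypothetical counts, chosen only to illustrate the arithmetic.
recall, pf = recall_and_pf(tp=77, fp=5, tn=95, fn=23)
print(f"recall={recall:.2f}, pf={pf:.2f}")
```

A good predictor pushes recall up while keeping the false-alarm probability low, since every false alarm costs auditing effort on code that is not vulnerable.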