Biblio

Filters: Keyword is prediction models
2019-11-12
Ferenc, Rudolf, Hegedűs, Péter, Gyimesi, Péter, Antal, Gábor, Bán, Dénes, Gyimóthy, Tibor.  2019.  Challenging Machine Learning Algorithms in Predicting Vulnerable JavaScript Functions. 2019 IEEE/ACM 7th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE). :8-14.

The rapid rise of cyber-crime activities and the growing number of devices threatened by them place software security issues in the spotlight. As around 90% of all attacks exploit known types of security issues, finding vulnerable components and applying existing mitigation techniques is a viable practical approach to fighting cyber-crime. In this paper, we investigate how state-of-the-art machine learning techniques, including a popular deep learning algorithm, perform in predicting functions with possible security vulnerabilities in JavaScript programs. We applied eight machine learning algorithms to build prediction models using a new dataset constructed for this research from the vulnerability information in the public databases of the Node Security Project and the Snyk platform, and from code-fixing patches on GitHub. We used static source code metrics as predictors and an extensive grid search to find the best performing models. We also examined the effect of various re-sampling strategies for handling the imbalanced nature of the dataset. The best performing algorithm was KNN, which produced a model that predicts vulnerable functions with an F-measure of 0.76 (0.91 precision and 0.66 recall). Deep learning, tree- and forest-based classifiers, and SVM were also competitive, with F-measures over 0.70. Although the F-measures did not vary significantly across re-sampling strategies, the distribution of precision and recall did change: training without re-sampling produced models favoring high precision, while the re-sampling strategies balanced the two information retrieval measures.
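
The pipeline the abstract describes (static metrics as features, grid search over a classifier, re-sampling to counter class imbalance, precision/recall/F-measure evaluation) can be sketched as follows. This is a minimal illustration, not the authors' code: `load_function_metrics` is a hypothetical loader, and the KNN parameter grid and SMOTE re-sampling choice are assumptions standing in for the paper's broader 8-algorithm search.

```python
# Hypothetical sketch: grid-searched KNN with re-sampling on an
# imbalanced vulnerability dataset, scored by precision/recall/F-measure.
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import precision_recall_fscore_support
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline

# Hypothetical loader: one row of static source code metrics per JS
# function, with a binary vulnerable/not-vulnerable label.
X, y = load_function_metrics()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42)

# Re-sampling lives inside the pipeline so it is applied only to the
# training folds during cross-validation, never to the test data.
pipe = Pipeline([
    ("resample", SMOTE(random_state=42)),
    ("knn", KNeighborsClassifier()),
])
grid = GridSearchCV(
    pipe,
    param_grid={"knn__n_neighbors": [1, 3, 5, 9, 15],
                "knn__weights": ["uniform", "distance"]},
    scoring="f1", cv=10)
grid.fit(X_train, y_train)

prec, rec, f1, _ = precision_recall_fscore_support(
    y_test, grid.predict(X_test), average="binary")
print(f"precision={prec:.2f} recall={rec:.2f} F-measure={f1:.2f}")
```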

2017-09-06
Theisen, Christopher, Herzig, Kim, Murphy, Brendan, Williams, Laurie.  2017.  Risk-Based Attack Surface Approximation: How Much Data Is Enough? 2017 IEEE/ACM 39th International Conference on Software Engineering: Software Engineering in Practice Track (ICSE-SEIP). :273-282.

Proactive security reviews and test efforts are a necessary component of the software development lifecycle. Resource limitations often preclude reviewing the entire code base. Making informed decisions about what code to review can improve a team's ability to find and remove vulnerabilities. Risk-based attack surface approximation (RASA) is a technique that uses crash dump stack traces to predict what code may contain exploitable vulnerabilities. The goal of this research is to help software development teams prioritize security efforts through the efficient development of a risk-based attack surface approximation. We explore the use of RASA on Mozilla Firefox and Microsoft Windows stack traces from crash dumps. We construct RASA at the file level for Firefox, where the 15.8% of files included in the approximation contained 73.6% of the vulnerabilities seen for the product. We also explore the effect of randomly sampling crashes on the approximation, as it may be impractical for organizations to store and process every crash received. We find that 10-fold random sampling of crashes at a rate of 10% resulted in 3% fewer vulnerabilities identified than using the entire set of stack traces for Mozilla Firefox. Sampling crashes in Windows 8.1 at a rate of 40% resulted in insignificant differences in vulnerability and file coverage compared to a rate of 100%.
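
The core of the technique (treat the set of files that appear in crash stack traces as the approximated attack surface, then measure how many known-vulnerable files it captures and how much code it asks reviewers to inspect, optionally from a random sample of crashes) can be expressed compactly. The data structures below are assumptions for illustration, not the authors' Mozilla or Microsoft tooling.

```python
# Illustrative sketch of risk-based attack surface approximation (RASA)
# with random sampling of crash dumps.
import random

def rasa_files(crashes, sample_rate=1.0, seed=0):
    """Approximate the attack surface as the set of source files that
    appear in the stack traces of a random sample of crash dumps."""
    rng = random.Random(seed)
    sampled = [c for c in crashes if rng.random() < sample_rate]
    surface = set()
    for crash in sampled:
        # Each crash is assumed to carry the list of files on its stack.
        surface.update(crash["stack_files"])
    return surface

def coverage(surface, vulnerable_files, all_files):
    """Fraction of known vulnerable files caught by the approximation,
    and the fraction of the code base it asks reviewers to inspect."""
    vuln_coverage = len(surface & vulnerable_files) / len(vulnerable_files)
    file_ratio = len(surface & all_files) / len(all_files)
    return vuln_coverage, file_ratio

# e.g. compare full traces against a 10% sample, as in the study design:
# full = coverage(rasa_files(crashes, 1.0), vulnerable_files, all_files)
# tenth = coverage(rasa_files(crashes, 0.1), vulnerable_files, all_files)
```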

2017-06-05
Hovsepyan, Aram, Scandariato, Riccardo, Joosen, Wouter.  2016.  Is Newer Always Better?: The Case of Vulnerability Prediction Models. Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). :26:1–26:6.

Finding security vulnerabilities in source code as early as possible is becoming increasingly essential. In this respect, vulnerability prediction models have the potential to support security assurance activities by identifying the code locations that deserve the most attention. In this paper, we investigate whether prediction models behave like milk (i.e., they turn with time) or like wine (i.e., they improve with time) when used to predict future vulnerabilities. Our findings indicate that recall values are largely in favor of predictors based on older versions. However, the better recall comes at the price of much higher file inspection ratios.
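
The "milk or wine" comparison reduces to training the same kind of predictor on an older and on a newer release, then scoring both on the same future release by recall and by file inspection ratio (the share of files the model flags for review). A minimal sketch follows; the random forest model and the `load_release` loader are assumptions for illustration, not the paper's setup.

```python
# Hypothetical sketch: compare predictors trained on older vs. newer
# releases by recall and file inspection ratio on a future release.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score

def evaluate(train_X, train_y, future_X, future_y):
    model = RandomForestClassifier(random_state=0).fit(train_X, train_y)
    pred = model.predict(future_X)
    recall = recall_score(future_y, pred)
    # File inspection ratio: fraction of files flagged as vulnerable,
    # i.e., the review effort the model demands.
    inspection_ratio = pred.sum() / len(pred)
    return recall, inspection_ratio

# old_X, old_y = load_release("v1")   # hypothetical per-release loaders
# new_X, new_y = load_release("v5")
# fut_X, fut_y = load_release("v6")
# print("old-version model:", evaluate(old_X, old_y, fut_X, fut_y))
# print("new-version model:", evaluate(new_X, new_y, fut_X, fut_y))
```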