Bibliography
Given the growing sophistication of cyberattacks, designing a perfectly secure system is generally not possible. Quantitative security metrics are therefore needed to measure and compare the relative security of proposed security designs and policies. Since investigations of security breaches have shown a strong impact of human error, ignoring the human user when computing these metrics can lead to misleading results. Although security researchers have long observed the impact of human behavior on system security, few advances have been made in designing systems that are resilient to the uncertainties in how humans interact with a cyber system. In this work, we develop an approach for including models of user behavior, drawn from the social sciences and psychology, in the modeling of systems intended to be secure. We then illustrate how one of these models, general deterrence theory, can be used to study the effectiveness of a password security requirements policy and of the frequency of security audits in a typical organization. Finally, we discuss the many challenges that arise when adopting such a modeling approach and present our recommendations for future work.
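To make the deterrence-theory ingredient concrete, here is a minimal sketch of how such a user model could be expressed computationally. It assumes a simple logistic response in which the probability that a user violates the password policy falls as the expected sanction (audit frequency times penalty) rises; the functional form and all parameters are hypothetical and not taken from the work above.

```python
# A minimal sketch of a general-deterrence-theory user model, assuming a
# logistic response; parameters are hypothetical, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

def violation_probability(benefit, audit_rate, penalty, k=1.0):
    """P(user violates the password policy) under deterrence theory:
    expected sanction = certainty (audit rate) * severity (penalty)."""
    expected_sanction = audit_rate * penalty
    return 1.0 / (1.0 + np.exp(-k * (benefit - expected_sanction)))

# Compare audit frequencies over a population of users with varying
# perceived benefit of skipping the policy (e.g., convenience).
benefits = rng.normal(loc=2.0, scale=1.0, size=10_000)
for audit_rate in (0.1, 0.5, 0.9):
    p = violation_probability(benefits, audit_rate, penalty=5.0)
    print(f"audit rate {audit_rate:.1f}: mean violation prob = {p.mean():.2f}")
```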
Correct prediction of faulty modules or classes has a number of advantages, such as improving the quality of software and assigning capable development resources to fix faults. Many fault/defect prediction models have been proposed in the literature, but the great majority use static code metrics as independent variables for making predictions. Recently, process metrics have gained considerable attention as alternative metrics for making trustworthy predictions. The objective of this paper is to investigate different combinations of static code and process metrics for evaluating fault prediction performance. We used publicly available data sets, along with a frequently used classifier, Naive Bayes, to run our experiments. We analyzed our experimental results both statistically and visually. The statistical analysis showed no evidence of a significant difference in fault prediction performance across a variety of metric combinations, reinforcing earlier research results that process metrics are as good predictors of fault proneness as static code metrics. Furthermore, visual inspection of box plots revealed that the best set of metrics for fault prediction is a mix of both static code and process metrics. We also present evidence that some process metrics are more discriminating than others, making them better predictors to use.
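As an illustration of the experimental setup described above, the following sketch compares static, process, and combined metric sets with a Naive Bayes classifier in scikit-learn. The file name and column names are placeholders, not the paper's datasets.

```python
# A minimal sketch of the metric-set comparison, assuming a CSV with static
# code metrics, process metrics, and a binary `faulty` label; the column
# names below are illustrative.
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

STATIC  = ["loc", "cyclomatic_complexity", "coupling"]   # static code metrics
PROCESS = ["num_commits", "num_authors", "code_churn"]   # process metrics

def evaluate(df, features):
    # AUC is robust to the class imbalance typical of fault datasets.
    scores = cross_val_score(GaussianNB(), df[features], df["faulty"],
                             cv=10, scoring="roc_auc")
    return scores.mean()

df = pd.read_csv("fault_dataset.csv")  # placeholder for a public dataset
for name, feats in [("static", STATIC), ("process", PROCESS),
                    ("combined", STATIC + PROCESS)]:
    print(f"{name:>8}: mean AUC = {evaluate(df, feats):.3f}")
```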
Many fault-proneness prediction models have been proposed in the literature to identify fault-prone code in software systems. Most approaches use fault data history and supervised learning algorithms to build these models. However, since fault data history is not always available, some approaches suggest using semi-supervised or unsupervised fault-proneness prediction models instead. The HySOM model, proposed in the literature, uses function-level source code metrics to predict fault-prone functions in software systems without using any fault data. In this paper, we adapt the HySOM approach to object-oriented software systems to predict fault-prone code at class-level granularity using object-oriented source code metrics. This adaptation makes it easier to prioritize the efforts of the testing team, since in object-oriented software systems unit tests are usually written for classes rather than for methods. Our adaptation also generalizes one main element of the HySOM model: the calculation of the source code metric threshold values. We conducted an empirical study using 12 public datasets. Results show that the adaptation of the HySOM model to class-level fault-proneness prediction improves both the consistency and the performance of the model. We additionally compared the performance of the adapted model to supervised approaches based on the Naive Bayes Network, ANN, and Random Forest algorithms.
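The following sketch illustrates the general idea of unsupervised, threshold-based flagging at class level. The threshold rule used here (a distribution quantile plus a minimum number of violated metrics) is an assumption for illustration; the actual HySOM threshold calculation may differ.

```python
# A minimal sketch of unsupervised class-level flagging from metric
# thresholds; the 90th-percentile rule and file name are assumptions.
import pandas as pd

def metric_thresholds(metrics: pd.DataFrame, quantile: float = 0.90) -> pd.Series:
    """Derive one threshold per object-oriented metric from its distribution."""
    return metrics.quantile(quantile)

def flag_fault_prone(metrics: pd.DataFrame, thresholds: pd.Series,
                     min_violations: int = 2) -> pd.Series:
    """Label a class fault-prone if it exceeds the threshold for at
    least `min_violations` metrics -- no fault history required."""
    violations = (metrics > thresholds).sum(axis=1)
    return violations >= min_violations

classes = pd.read_csv("class_metrics.csv", index_col="class_name")  # WMC, CBO, ...
th = metric_thresholds(classes)
print(flag_fault_prone(classes, th).sum(), "classes flagged as fault-prone")
```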
Many proposed defect prediction models are effective when historical datasets are available, but defect prediction becomes difficult when no historical data exist. Cross-project defect prediction (CPDP), which uses projects from other sources or companies to predict defects in the target project, has shown promising results in recent studies. However, the performance of most CPDP approaches remains unsatisfactory, mainly due to the distribution mismatch between the source and target projects. In this study, a credibility-theory-based Naive Bayes (CNB) classifier is proposed to establish a novel reweighting mechanism between the source and target projects, so that the source data can simultaneously adapt to the target data distribution and retain its own pattern. Our experimental results show the feasibility of the novel algorithm design and demonstrate the significant improvement achieved by CNB over other CPDP approaches on the performance metrics considered.
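A minimal sketch of the reweighting idea is shown below: each source instance receives a weight reflecting how well it fits the target distribution before a Naive Bayes classifier is trained. The range-based similarity used here is a simple stand-in for the paper's credibility-theory weights.

```python
# A minimal sketch of reweighting source instances toward the target
# distribution before training Naive Bayes; the similarity measure (count
# of attributes inside the target's min-max range) is an assumption.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def range_similarity(X_src, X_tgt):
    lo, hi = X_tgt.min(axis=0), X_tgt.max(axis=0)
    inside = ((X_src >= lo) & (X_src <= hi)).sum(axis=1)  # attrs in target range
    return inside / X_src.shape[1]                        # 0..1 per instance

def train_weighted_nb(X_src, y_src, X_tgt, alpha=0.5):
    # alpha blends the target-driven weight with a uniform weight, so the
    # source data adapts to the target while retaining its own pattern.
    w = alpha * range_similarity(X_src, X_tgt) + (1 - alpha)
    return GaussianNB().fit(X_src, y_src, sample_weight=w)

# Usage (hypothetical arrays): train on labeled source data, predict on the
# unlabeled target project.
# model = train_weighted_nb(X_source, y_source, X_target)
# predictions = model.predict(X_target)
```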
Software security is an important aspect of ensuring software quality. Early detection of vulnerable code during development is essential for cost- and time-effective software testing. Traditional software metrics have been used for early detection of software vulnerabilities, but they are not directly related to code constructs and do not specify any particular granularity level. The goal of this study is to help developers evaluate software security using class-level traceable patterns, called micro patterns, to reduce security risks. The concept of micro patterns is similar to that of design patterns, but micro patterns can be automatically recognized and mined from source code. If micro patterns can predict vulnerable classes better than traditional software metrics, they can be used in developing a vulnerability prediction model. This study explores the performance of class-level patterns in vulnerability prediction and compares them with traditional class-level software metrics. We studied security vulnerabilities reported for one major release of Apache Tomcat, Apache Camel, and three stand-alone Java web applications. We used machine learning techniques to predict vulnerabilities using micro patterns and class-level metrics as features. We found that micro patterns have higher recall in detecting vulnerable classes than the software metrics.
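The sketch below illustrates how micro patterns can serve as binary class-level features for a vulnerability classifier. The pattern names are an excerpt from the standard micro-pattern catalog, and the data is synthetic; mining real patterns requires bytecode analysis.

```python
# A minimal sketch of vulnerability prediction from binary micro-pattern
# features; the catalog excerpt and synthetic data are illustrative
# stand-ins for patterns mined from real classes.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

CATALOG = ["Immutable", "Sink", "Stateless", "DataManager", "Pool"]  # excerpt

rng = np.random.default_rng(0)
# One row per class: 1 if the class exhibits the pattern, 0 otherwise.
X = rng.integers(0, 2, size=(500, len(CATALOG)))
# 1 if the class had a reported vulnerability (synthetic labels here).
y = rng.integers(0, 2, size=500)

clf = RandomForestClassifier(random_state=0).fit(X, y)
print(dict(zip(CATALOG, clf.feature_importances_.round(3))))
```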
Software security is an important aspect of ensuring software quality. The goal of this study is to help developers evaluate software security during development using traceable patterns and software metrics. The concept of traceable patterns is similar to that of design patterns, but traceable patterns can be automatically recognized and extracted from source code. If these patterns can predict vulnerable code better than traditional software metrics, they can be used in a vulnerability prediction model that classifies code as vulnerable or not. By analyzing and comparing the performance of traceable patterns with that of metrics, we propose such a vulnerability prediction model. This study explores the performance of several code patterns in vulnerability prediction, compares them with traditional software metrics, and uses the findings to build an effective vulnerability prediction model. We evaluate security vulnerabilities reported for Apache Tomcat, Apache CXF, and three stand-alone Java web applications, using machine learning and statistical techniques to predict vulnerabilities with traceable patterns and metrics as features. We found that patterns have a lower false negative rate and higher recall in detecting vulnerable code than traditional software metrics.
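To make the reported comparison concrete, the following sketch computes recall and false negative rate for a classifier trained on pattern features versus metric features. The data is synthetic and stands in for the labeled classes of the studied systems.

```python
# A minimal sketch of comparing two feature sets on recall and false
# negative rate; synthetic data replaces the real pattern/metric features.
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import recall_score, confusion_matrix

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=400)           # 1 = vulnerable class
feature_sets = {
    "patterns": rng.integers(0, 2, size=(400, 20)).astype(float),
    "metrics":  rng.normal(size=(400, 10)),
}

for name, X in feature_sets.items():
    pred = cross_val_predict(GaussianNB(), X, y, cv=10)
    tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
    print(f"{name}: recall={recall_score(y, pred):.2f}, "
          f"FNR={fn / (fn + tp):.2f}")     # FNR = 1 - recall
```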
In the last decade, numerous fake websites have been created on the World Wide Web to mimic trusted websites, with the aim of stealing financial assets from users and organizations. This form of online attack, called phishing, has cost the online community and its various stakeholders hundreds of millions of dollars. Effective countermeasures that can accurately detect phishing are therefore needed. Machine learning (ML) is a popular tool for data analysis and has recently shown promising results in combating phishing when contrasted with classic anti-phishing approaches, including awareness workshops, visualization, and legal solutions. This article investigates the applicability of ML techniques to detecting phishing attacks and describes their pros and cons. In particular, different types of ML techniques are investigated to reveal suitable options that can serve as anti-phishing tools. More importantly, we experimentally compare a large number of ML techniques on real phishing datasets and with respect to different metrics. The purpose of the comparison is to reveal the advantages and disadvantages of ML predictive models and to show their actual performance against phishing attacks. The experimental results show that models based on the covering approach are more appropriate as anti-phishing solutions, especially for novice users, because of their simple yet effective knowledge bases in addition to their good phishing detection rate.
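The benchmark idea can be sketched as follows: several ML families are compared by cross-validation on a phishing dataset of URL and page features. The file and column names are placeholders, and a shallow decision tree stands in for a rule-based covering learner, which scikit-learn does not provide.

```python
# A minimal sketch of a multi-classifier phishing benchmark; dataset columns
# are illustrative (UCI phishing-websites style), and the decision tree is a
# stand-in for a covering/rule-induction algorithm.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

df = pd.read_csv("phishing.csv")               # placeholder dataset
X, y = df.drop(columns="phishy"), df["phishy"]

models = {
    "rules (tree stand-in)": DecisionTreeClassifier(max_depth=5, random_state=0),
    "naive bayes": GaussianNB(),
    "kNN": KNeighborsClassifier(),
    "SVM": SVC(),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=10).mean()
    print(f"{name:>22}: mean accuracy = {acc:.3f}")
```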
Android applications pose security and privacy risks for end-users. These risks are often quantified by performing dynamic analysis and permission analysis of Android applications after release. Predicting the security and privacy risks associated with Android applications at early stages of development, e.g., while the developers are writing the application's code, might help Android application developers release applications that pose less security and privacy risk to end-users. The goal of this paper is to aid Android application developers in assessing the security and privacy risk of Android applications by using static code metrics as predictors. In our paper, we consider the security and privacy risk of an Android application to be how susceptible the application is to leaking private information of end-users and to exposing vulnerabilities. We investigate how effectively static code metrics extracted from the source code of Android applications can be used to predict their security and privacy risk. We collected 21 static code metrics from 1,407 Android applications and used them to predict the security and privacy risk of the applications. As the oracle of security and privacy risk, we used Androrisk, a tool that quantifies the security and privacy risk of an Android application using permission analysis and dynamic analysis. To accomplish our goal, we used statistical learners such as the radial-basis-function support vector machine (r-SVM), for which we observe a precision of 0.83. Findings from our paper suggest that, with proper selection of static code metrics, r-SVM can be used effectively to predict the security and privacy risk of Android applications.
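The prediction setup lends itself to a short sketch: an SVM with a radial-basis-function kernel trained on the static code metrics, with an Androrisk-derived binary risk label as the oracle. File and column names are placeholders.

```python
# A minimal sketch of the r-SVM prediction setup; `android_metrics.csv` and
# the `risky` label column are assumptions standing in for the paper's data.
import pandas as pd
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import precision_score

df = pd.read_csv("android_metrics.csv")   # 21 static code metrics per app
X = df.drop(columns="risky")
y = df["risky"]                           # 1 = high Androrisk score

# SVM with an RBF kernel (the r-SVM above); feature scaling matters for SVMs.
rsvm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
pred = cross_val_predict(rsvm, X, y, cv=10)
print(f"precision = {precision_score(y, pred):.2f}")  # paper reports 0.83
```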