Biblio
Web applications have become an essential resource to access the services of diverse subjects (e.g., financial, healthcare) available on the Internet. Despite the efforts that have been made on its security, namely on the investigation of better techniques to detect vulnerabilities on its source code, the number of vulnerabilities exploited has not decreased. Static analysis tools (SATs) are often used to test the security of applications since their outcomes can help developers in the correction of the bugs they found. The conducted investigation made over SATs stated they often generate errors (false positives (FP) and false negatives (FN)), whose cause is recurrently associated with very diverse coding styles, i.e., similar functionality is implemented in distinct manners, and programming practices that create ambiguity, such as the reuse and share of variables. Based on a common practice of using multiple forms in a same webpage and its processing in a single file, we defined a use case for user login and register with six coding styles scenarios for processing their data, and evaluated the behaviour of three SATs (phpSAFE, RIPS and WAP) with them to verify and understand why SATs produce FP and FN.
With the rapid increase of practical problem complexity and code scale, the threat of software security is increasingly serious. Consequently, it is crucial to pay attention to the analysis of software source code vulnerability in the development stage and take efficient measures to detect the vulnerability as soon as possible. Machine learning techniques have made remarkable achievements in various fields. However, the application of machine learning in the domain of vulnerability static analysis is still in its infancy and the characteristics and performance of diverse methods are quite different. In this survey, we focus on a source code-oriented static vulnerability analysis method using machine learning techniques. We review the studies on source code vulnerability analysis based on machine learning in the past decade. We systematically summarize the development trends and different technical characteristics in this field from the perspectives of the intermediate representation of source code and vulnerability prediction model and put forward several feasible research directions in the future according to the limitations of the current approaches.
Software vulnerabilities are weaknesses in software systems that can have serious consequences when exploited. Examples of side effects include unauthorized authentication, data breaches, and financial losses. Due to the nature of the software industry, companies are increasingly pressured to deploy software as quickly as possible, leading to a large number of undetected software vulnerabilities. Static code analysis, with the support of Static Analysis Tools (SATs), can generate security alerts that highlight potential vulnerabilities in an application's source code. Software Metrics (SMs) have also been used to predict software vulnerabilities, usually with the support of Machine Learning (ML) classification algorithms. Several datasets are available to support the development of improved software vulnerability detection techniques. However, they suffer from the same issues: they are either outdated or use a single type of information. In this paper, we present a methodology for collecting software vulnerabilities from known vulnerability databases and enhancing them with static information (namely SAT alerts and SMs). The proposed methodology aims to define a mechanism capable of more easily updating the collected data.
Development of quality object-oriented software contains security as an integral aspect of that process. During that process, a ceaseless burden on the developers was posed in order to maximize the development and at the same time to reduce the expense and time invested in security. In this paper, the authors analyzed metrics for object-oriented software in order to evaluate and identify the relation between metric value and security of the software. Identification of these relations was achieved by study of software vulnerabilities with code level metrics. By using OWASP classification of vulnerabilities and experimental results, we proved that there was relation between metric values and possible security issues in software. For experimental code analysis, we have developed special software called SOFTMET.
Context: Software security is an imperative aspect of software quality. Early detection of vulnerable code during development can better ensure the security of the codebase and minimize testing efforts. Although traditional software metrics are used for early detection of vulnerabilities, they do not clearly address the granularity level of the issue to precisely pinpoint vulnerabilities. The goal of this study is to employ method-level traceable patterns (nano-patterns) in vulnerability prediction and empirically compare their performance with traditional software metrics. The concept of nano-patterns is similar to design patterns, but these constructs can be automatically recognized and extracted from source code. If nano-patterns can better predict vulnerable methods compared to software metrics, they can be used in developing vulnerability prediction models with better accuracy. Aims: This study explores the performance of method-level patterns in vulnerability prediction. We also compare them with method-level software metrics. Method: We studied vulnerabilities reported for two major releases of Apache Tomcat (6 and 7), Apache CXF, and two stand-alone Java web applications. We used three machine learning techniques to predict vulnerabilities using nano-patterns as features. We applied the same techniques using method-level software metrics as features and compared their performance with nano-patterns. Results: We found that nano-patterns show lower false negative rates for classifying vulnerable methods (for Tomcat 6, 21% vs 34.7%) and therefore, have higher recall in predicting vulnerable code than the software metrics used. On the other hand, software metrics show higher precision than nano-patterns (79.4% vs 76.6%). Conclusion: In summary, we suggest developers use nano-patterns as features for vulnerability prediction to augment existing approaches as these code constructs outperform standard metrics in terms of prediction recall.
Confidentiality, Integrity, and Availability are principal keys to build any secure software. Considering the security principles during the different software development phases would reduce software vulnerabilities. This paper measures the impact of the different software quality metrics on Confidentiality, Integrity, or Availability for any given object-oriented PHP application, which has a list of reported vulnerabilities. The National Vulnerability Database was used to provide the impact score on confidentiality, integrity, and availability for the reported vulnerabilities on the selected applications. This paper includes a study for these scores and its correlation with 25 code metrics for the given vulnerable source code. The achieved results were able to correlate 23.7% of the variability in `Integrity' to four metrics: Vocabulary Used in Code, Card and Agresti, Intelligent Content, and Efferent Coupling metrics. The Length (Halstead metric) could alone predict about 24.2 % of the observed variability in ` Availability'. The results indicate no significant correlation of `Confidentiality' with the tested code metrics.
Software vulnerabilities often remain hidden until an attacker exploits the weak/insecure code. Therefore, testing the software from a vulnerability discovery perspective becomes challenging for developers if they do not inspect their code thoroughly (which is time-consuming). We propose that vulnerability prediction using certain software metrics can support the testing process by identifying vulnerable code-components (e.g., functions, classes, etc.). Once a code-component is predicted as vulnerable, the developers can focus their testing efforts on it, thereby avoiding the time/effort required for testing the entire application. The current paper presents a study that compares how software metrics perform as vulnerability predictors for software projects developed in two different languages (Java vs Python). The goal of this research is to analyze the vulnerability prediction performance of software metrics for different programming languages. We designed and conducted experiments on security vulnerabilities reported for three Java projects (Apache Tomcat 6, Tomcat 7, Apache CXF) and two Python projects (Django and Keystone). In this paper, we focus on a specific type of code component: Functions. We apply Machine Learning models for predicting vulnerable functions. Overall results show that software metrics-based vulnerability prediction is more useful for Java projects than Python projects (i.e., software metrics when used as features were able to predict Java vulnerable functions with a higher recall and precision compared to Python vulnerable functions prediction).