Biblio
Software vulnerabilities are weaknesses in software systems that can have serious consequences when exploited. Examples of side effects include unauthorized authentication, data breaches, and financial losses. Due to the nature of the software industry, companies are increasingly pressured to deploy software as quickly as possible, leading to a large number of undetected software vulnerabilities. Static code analysis, with the support of Static Analysis Tools (SATs), can generate security alerts that highlight potential vulnerabilities in an application's source code. Software Metrics (SMs) have also been used to predict software vulnerabilities, usually with the support of Machine Learning (ML) classification algorithms. Several datasets are available to support the development of improved software vulnerability detection techniques. However, they suffer from the same issues: they are either outdated or use a single type of information. In this paper, we present a methodology for collecting software vulnerabilities from known vulnerability databases and enhancing them with static information (namely SAT alerts and SMs). The proposed methodology aims to define a mechanism capable of more easily updating the collected data.
Web applications are typically built under time constraints and often contain security vulnerabilities. Automated vulnerability scanners are common tools among software developers for detecting such vulnerabilities. A scanner examines the application from an attacker's perspective by interacting with it extensively. SQL Injection and Cross-Site Scripting (XSS) are two of the most widespread and dangerous vulnerabilities in web applications, and they cause harm to users. It is therefore important to be able to trust the findings of vulnerability scanning software. Without a clear idea of the accuracy and coverage of the open-source tools, it is difficult to analyze the results that an automated vulnerability scanner provides. Comparing the key figures of automated vulnerability scanners is important because many scanners are on the market, and such a comparison can help decide which scanner performs better on SQL Injection and XSS vulnerabilities. In this paper, a method by Jose Fonseca et al. is used to compare open-source automated vulnerability scanners on detection coverage, and a method by Yuki Makino and Vitaly Klyuev is used for precision rate. The vulnerabilities used as criteria are injected into web applications, which are then scanned by the scanners. The results are then compared by analyzing the precision rate and detection coverage of vulnerability detection. Two leading open-source automated vulnerability scanners are evaluated: OWASP ZAP and Skipfish. The results show that OWASP ZAP achieves about twice the precision rate of Skipfish, while the two have almost the same detection coverage, with OWASP ZAP reporting a higher number of vulnerabilities rated high.
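The two comparison measures named in this abstract can be illustrated with a minimal sketch. It assumes the common definitions: precision rate as the share of reported findings that are real, and detection coverage as the share of injected vulnerabilities that a scanner finds. The exact formulas used by Fonseca et al. and by Makino and Klyuev may differ; the function names and sample identifiers below are hypothetical and are not data from the paper.

```python
def precision_rate(reported, injected):
    """Share of reported findings that match an injected vulnerability."""
    reported, injected = set(reported), set(injected)
    if not reported:
        return 0.0
    return len(reported & injected) / len(reported)


def detection_coverage(reported, injected):
    """Share of injected vulnerabilities that the scanner reported."""
    reported, injected = set(reported), set(injected)
    if not injected:
        return 0.0
    return len(reported & injected) / len(injected)


# Hypothetical identifiers for illustration only (not results from the paper).
injected = {"sqli-1", "sqli-2", "xss-1", "xss-2"}
reported = {"sqli-1", "xss-1", "xss-9"}  # "xss-9" is a false positive

print(precision_rate(reported, injected))      # 2 of the 3 reports are correct
print(detection_coverage(reported, injected))  # 2 of the 4 injected flaws found
```

A scanner can score well on one measure and poorly on the other, which is why the abstract reports both.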
Software developers can use diverse techniques and tools to reduce the number of vulnerabilities, but the effectiveness of existing solutions in real projects is questionable. For example, Static Analysis Tools (SATs) report potential vulnerabilities by analyzing code patterns, and Software Metrics (SMs) can be used to predict vulnerabilities based on high-level characteristics of the code. In theory, both approaches can be applied from the early stages of the development process, but it is well known that they fail to detect critical vulnerabilities and raise a large number of false alarms. This paper studies the hypothesis of using Machine Learning (ML) to combine alerts from SATs with SMs to predict vulnerabilities in a large software project (under development for many years). In practice, we use four ML algorithms, alerts from two SATs, and a large number of SMs to predict whether a source code file is vulnerable or not (binary classification) and to predict the vulnerability category (multiclass classification). Results show that one can achieve either high precision or high recall, but not both at the same time. To understand the reason, we analyze and compare snippets of source code, demonstrating that vulnerable and non-vulnerable files share similar characteristics, making it hard to distinguish vulnerable from non-vulnerable code based on SAT alerts and SMs.
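As a rough illustration of the setup this abstract describes, the sketch below merges per-file SAT alert counts and software metrics into a single feature vector and scores binary predictions with precision and recall. It is a sketch under stated assumptions, not the authors' pipeline: the alert categories, metric names, and helper functions are hypothetical, and the classifier itself is left out because the abstract does not name the four algorithms.

```python
def build_features(alert_counts, metrics, alert_keys, metric_keys):
    """Concatenate SAT alert counts and software metrics for one file.

    Missing alert categories count as 0; missing metrics default to 0.0.
    """
    alerts = [alert_counts.get(key, 0) for key in alert_keys]
    measures = [metrics.get(key, 0.0) for key in metric_keys]
    return alerts + measures


def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = vulnerable file)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical feature names, chosen only for illustration.
ALERT_KEYS = ["buffer_overflow", "null_deref"]
METRIC_KEYS = ["loc", "cyclomatic"]

features = build_features(
    {"buffer_overflow": 2},           # alerts raised by the SATs for this file
    {"loc": 120, "cyclomatic": 7},    # software metrics for the same file
    ALERT_KEYS,
    METRIC_KEYS,
)
print(features)

# Toy labels and predictions, not results from the paper.
print(precision_recall([1, 0, 1, 0], [1, 1, 0, 0]))
```

Vectors built this way could be passed to any binary or multiclass classifier; the precision and recall helper reflects the trade-off the abstract reports.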