Biblio
Static vulnerability detection has shown its effectiveness in detecting well-defined low-level memory errors. However, high-level control-flow related (CFR) vulnerabilities, such as insufficient control flow management (CWE-691), business logic errors (CWE-840), and program behavioral problems (CWE-438), which are often caused by a wide variety of bad programming practices, posing a great challenge for existing general static analysis solutions. This paper presents a new deep-learning-based graph embedding approach to accurate detection of CFR vulnerabilities. Our approach makes a new attempt by applying a recent graph convolutional network to embed code fragments in a compact and low-dimensional representation that preserves high-level control-flow information of a vulnerable program. We have conducted our experiments using 8,368 real-world vulnerable programs by comparing our approach with several traditional static vulnerability detectors and state-of-the-art machine-learning-based approaches. The experimental results show the effectiveness of our approach in terms of both accuracy and recall. Our research has shed light on the promising direction of combining program analysis with deep learning techniques to address the general static analysis challenges.
As a modern power transmission network, smart grid connects plenty of terminal devices. However, along with the growth of devices are the security threats. Different from the previous separated environment, an adversary nowadays can destroy the power system by attacking these devices. Therefore, it's critical to ensure the security and safety of terminal devices. To achieve this goal, detecting the pre-existing vulnerabilities of the device program and enhance the terminal security, are of great importance and necessity. In this paper, we propose a novel approach that detects existing buffer-overflow vulnerabilities of terminal devices via automatic static analysis (ASA). We utilize the static analysis to extract the device program information and build corresponding program models. By further matching the generated program model with pre-defined vulnerability patterns, we achieve vulnerability detection and error reporting. The evaluation results demonstrate that our method can effectively detect buffer-overflow vulnerabilities of smart terminals with a high accuracy and a low false positive rate.
Cross-site scripting (XSS) is a scripting attack targeting web applications by injecting malicious scripts into web pages. Blind XSS is a subset of stored XSS, where an attacker blindly deploys malicious payloads in web pages that are stored in a persistent manner on target servers. Most of the XSS detection techniques used to detect the XSS vulnerabilities are inadequate to detect blind XSS attacks. In this research, we present machine learning based approach to detect blind XSS attacks. Testing results help to identify malicious payloads that are likely to get stored in databases through web applications.
With the rapid development of network and communication technologies, everything is able to be connected to the Internet. IoT devices, which include home routers, IP cameras, wireless printers and so on, are crucial parts facilitating to build pervasive and ubiquitous networks. As the number of IoT devices around the world increases, the security issues become more and more serious. To handle with the security issues and protect the IoT devices from being compromised, the firmware of devices needs to be strengthened by discovering and repairing vulnerabilities. Current vulnerability detection tools can only help strengthening traditional software, nevertheless these tools are not practical enough for IoT device firmware, because of the peculiarity in firmware's structure and embedded device's architecture. Therefore, new vulnerability detection framework is required for analyzing IoT device firmware. This paper reviews related works on vulnerability detection in IoT firmware, proposes and implements a framework to automatically detect authentication-bypass flaws in a large scale of Linux-based firmware. The proposed framework is evaluated with a data set of 2351 firmware images from several target vendors, which is proved to be capable of performing large-scale and automated analysis on firmware, and 1 known and 10 unknown authentication-bypass flaws are found by the analysis.
We present a novel method for static analysis in which we combine data-flow analysis with machine learning to detect SQL injection (SQLi) and Cross-Site Scripting (XSS) vulnerabilities in PHP applications. We assembled a dataset from the National Vulnerability Database and the SAMATE project, containing vulnerable PHP code samples and their patched versions in which the vulnerability is solved. We extracted features from the code samples by applying data-flow analysis techniques, including reaching definitions analysis, taint analysis, and reaching constants analysis. We used these features in machine learning to train various probabilistic classifiers. To demonstrate the effectiveness of our approach, we built a tool called WIRECAML, and compared our tool to other tools for vulnerability detection in PHP code. Our tool performed best for detecting both SQLi and XSS vulnerabilities. We also tried our approach on a number of open-source software applications, and found a previously unknown vulnerability in a photo-sharing web application.
Software vulnerabilities are a primary concern in the IT security industry, as malicious hackers who discover these vulnerabilities can often exploit them for nefarious purposes. However, complex programs, particularly those written in a relatively low-level language like C, are difficult to fully scan for bugs, even when both manual and automated techniques are used. Since analyzing code and making sure it is securely written is proven to be a non-trivial task, both static analysis and dynamic analysis techniques have been heavily investigated, and this work focuses on the former. The contribution of this paper is a demonstration of how it is possible to catch a large percentage of bugs by extracting text features from functions in C source code and analyzing them with a machine learning classifier. Relatively simple features (character count, character diversity, entropy, maximum nesting depth, arrow count, "if" count, "if" complexity, "while" count, and "for" count) were extracted from these functions, and so were complex features (character n-grams, word n-grams, and suffix trees). The simple features performed unexpectedly better compared to the complex features (74% accuracy compared to 69% accuracy).
This paper 1 addresses a problem of vulnerability detection in software represented as assembly code. An extended approach to the vulnerability detection problem is proposed. This work concentrates on improvement of neural network-based approach described in previous works of authors. The authors propose to include the morphology of instructions in vector representations. The bidirectional recurrent neural network is used with access to the execution traces of the program. This has significantly improved the vulnerability detecting accuracy.
With the development of Internet technology, software vulnerabilities have become a major threat to current computer security. In this work, we propose the vulnerability detection for source code using Contextual LSTM. Compared with CNN and LSTM, we evaluated the CLSTM on 23185 programs, which are collected from SARD. We extracted the features through the program slicing. Based on the features, we used the natural language processing to analysis programs with source code. The experimental results demonstrate that CLSTM has the best performance for vulnerability detection, reaching the accuracy of 96.711% and the F1 score of 0.96984.