Visible to the public Biblio

Filters: Keyword is vulnerability detection  [Clear All Filters]
2021-05-18
Zheng, Wei, Gao, Jialiang, Wu, Xiaoxue, Xun, Yuxing, Liu, Guoliang, Chen, Xiang.  2020.  An Empirical Study of High-Impact Factors for Machine Learning-Based Vulnerability Detection. 2020 IEEE 2nd International Workshop on Intelligent Bug Fixing (IBF). :26–34.
Ahstract-Vulnerability detection is an important topic of software engineering. To improve the effectiveness and efficiency of vulnerability detection, many traditional machine learning-based and deep learning-based vulnerability detection methods have been proposed. However, the impact of different factors on vulnerability detection is unknown. For example, classification models and vectorization methods can directly affect the detection results and code replacement can affect the features of vulnerability detection. We conduct a comparative study to evaluate the impact of different classification algorithms, vectorization methods and user-defined variables and functions name replacement. In this paper, we collected three different vulnerability code datasets. These datasets correspond to different types of vulnerabilities and have different proportions of source code. Besides, we extract and analyze the features of vulnerability code datasets to explain some experimental results. Our findings from the experimental results can be summarized as follows: (i) the performance of using deep learning is better than using traditional machine learning and BLSTM can achieve the best performance. (ii) CountVectorizer can improve the performance of traditional machine learning. (iii) Different vulnerability types and different code sources will generate different features. We use the Random Forest algorithm to generate the features of vulnerability code datasets. These generated features include system-related functions, syntax keywords, and user-defined names. (iv) Datasets without user-defined variables and functions name replacement will achieve better vulnerability detection results.
Ogawa, Yuji, Kimura, Tomotaka, Cheng, Jun.  2020.  Vulnerability Assessment for Machine Learning Based Network Anomaly Detection System. 2020 IEEE International Conference on Consumer Electronics - Taiwan (ICCE-Taiwan). :1–2.
In this paper, we assess the vulnerability of network anomaly detection systems that use machine learning methods. Although the performance of these network anomaly detection systems is high in comparison to that of existing methods without machine learning methods, the use of machine learning methods for detecting vulnerabilities is a growing concern among researchers of image processing. If the vulnerabilities of machine learning used in the network anomaly detection method are exploited by attackers, large security threats are likely to emerge in the near future. Therefore, in this paper we clarify how vulnerability detection of machine learning network anomaly detection methods affects their performance.
Zhang, Chi, Chen, Jinfu, Cai, Saihua, Liu, Bo, Wu, Yiming, Geng, Ye.  2020.  iTES: Integrated Testing and Evaluation System for Software Vulnerability Detection Methods. 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :1455–1460.
To find software vulnerabilities using software vulnerability detection technology is an important way to ensure the system security. Existing software vulnerability detection methods have some limitations as they can only play a certain role in some specific situations. To accurately analyze and evaluate the existing vulnerability detection methods, an integrated testing and evaluation system (iTES) is designed and implemented in this paper. The main functions of the iTES are:(1) Vulnerability cases with source codes covering common vulnerability types are collected automatically to form a vulnerability cases library; (2) Fourteen methods including static and dynamic vulnerability detection are evaluated in iTES, involving the Windows and Linux platforms; (3) Furthermore, a set of evaluation metrics is designed, including accuracy, false positive rate, utilization efficiency, time cost and resource cost. The final evaluation and test results of iTES have a good guiding significance for the selection of appropriate software vulnerability detection methods or tools according to the actual situation in practice.
2021-01-15
Korshunov, P., Marcel, S..  2019.  Vulnerability assessment and detection of Deepfake videos. 2019 International Conference on Biometrics (ICB). :1—6.
It is becoming increasingly easy to automatically replace a face of one person in a video with the face of another person by using a pre-trained generative adversarial network (GAN). Recent public scandals, e.g., the faces of celebrities being swapped onto pornographic videos, call for automated ways to detect these Deepfake videos. To help developing such methods, in this paper, we present the first publicly available set of Deepfake videos generated from videos of VidTIMIT database. We used open source software based on GANs to create the Deepfakes, and we emphasize that training and blending parameters can significantly impact the quality of the resulted videos. To demonstrate this impact, we generated videos with low and high visual quality (320 videos each) using differently tuned parameter sets. We showed that the state of the art face recognition systems based on VGG and Facenet neural networks are vulnerable to Deepfake videos, with 85.62% and 95.00% false acceptance rates (on high quality versions) respectively, which means methods for detecting Deepfake videos are necessary. By considering several baseline approaches, we found the best performing method based on visual quality metrics, which is often used in presentation attack detection domain, to lead to 8.97% equal error rate on high quality Deep-fakes. Our experiments demonstrate that GAN-generated Deepfake videos are challenging for both face recognition systems and existing detection methods, and the further development of face swapping technology will make it even more so.
2020-09-28
Ibrahim, Ahmed, El-Ramly, Mohammad, Badr, Amr.  2019.  Beware of the Vulnerability! How Vulnerable are GitHub's Most Popular PHP Applications? 2019 IEEE/ACS 16th International Conference on Computer Systems and Applications (AICCSA). :1–7.
The presence of software vulnerabilities is a serious threat to any software project. Exploiting them can compromise system availability, data integrity, and confidentiality. Unfortunately, many open source projects go for years with undetected ready-to-exploit critical vulnerabilities. In this study, we investigate the presence of software vulnerabilities in open source projects and the factors that influence this presence. We analyzed the top 100 open source PHP applications in GitHub using a static analysis vulnerability scanner to examine how common software vulnerabilities are. We also discussed which vulnerabilities are most present and what factors contribute to their presence. We found that 27% of these projects are insecure, with a median number of 3 vulnerabilities per vulnerable project. We found that the most common type is injection vulnerabilities, which made 58% of all detected vulnerabilities. Out of these, cross-site scripting (XSS) was the most common and made 43.5% of all vulnerabilities found. Statistical analysis revealed that project activities like branching, pulling, and committing have a moderate positive correlation with the number of vulnerabilities in the project. Other factors like project popularity, number of releases, and number of issues had almost no influence on the number of vulnerabilities. We recommend that open source project owners should set secure code development guidelines for their project members and establish secure code reviews as part of the project's development process.
Piskachev, Goran, Nguyen Quang Do, Lisa, Johnson, Oshando, Bodden, Eric.  2019.  SWAN\_ASSIST: Semi-Automated Detection of Code-Specific, Security-Relevant Methods. 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). :1094–1097.
To detect specific types of bugs and vulnerabilities, static analysis tools must be correctly configured with security-relevant methods (SRM), e.g., sources, sinks, sanitizers and authentication methods-usually a very labour-intensive and error-prone process. This work presents the semi-automated tool SWAN\_ASSIST, which aids the configuration with an IntelliJ plugin based on active machine learning. It integrates our novel automated machine-learning approach SWAN, which identifies and classifies Java SRM. SWAN\_ASSIST further integrates user feedback through iterative learning. SWAN\_ASSIST aids developers by asking them to classify at each point in time exactly those methods whose classification best impact the classification result. Our experiments show that SWAN\_ASSIST classifies SRM with a high precision, and requires a relatively low effort from the user. A video demo of SWAN\_ASSIST can be found at https://youtu.be/fSyD3V6EQOY. The source code is available at https://github.com/secure-software-engineering/swan.
2020-09-21
Fang, Zheng, Fu, Hao, Gu, Tianbo, Qian, Zhiyun, Jaeger, Trent, Mohapatra, Prasant.  2019.  ForeSee: A Cross-Layer Vulnerability Detection Framework for the Internet of Things. 2019 IEEE 16th International Conference on Mobile Ad Hoc and Sensor Systems (MASS). :236–244.
The exponential growth of Internet-of-Things (IoT) devices not only brings convenience but also poses numerous challenging safety and security issues. IoT devices are distributed, highly heterogeneous, and more importantly, directly interact with the physical environment. In IoT systems, the bugs in device firmware, the defects in network protocols, and the design flaws in system configurations all may lead to catastrophic accidents, causing severe threats to people's lives and properties. The challenge gets even more escalated as the possible attacks may be chained together in a long sequence across multiple layers, rendering the current vulnerability analysis inapplicable. In this paper, we present ForeSee, a cross-layer formal framework to comprehensively unveil the vulnerabilities in IoT systems. ForeSee generates a novel attack graph that depicts all of the essential components in IoT, from low-level physical surroundings to high-level decision-making processes. The corresponding graph-based analysis then enables ForeSee to precisely capture potential attack paths. An optimization algorithm is further introduced to reduce the computational complexity of our analysis. The illustrative case studies show that our multilayer modeling can capture threats ignored by the previous approaches.
2020-07-27
Liu, Xianyu, Zheng, Min, Pan, Aimin, Lu, Quan.  2018.  Hardening the Core: Understanding and Detection of XNU Kernel Vulnerabilities. 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W). :10–13.
The occurrence of security vulnerabilities in kernel, especially for macOS/iOS kernel XNU, has increased rapidly in recent years. Naturally, concerns were raised due to the high risks they would lead to, which in general are much more serious than common application vulnerabilities. However, discovering XNU kernel vulnerabilities is always very challenging, and the main approach in practice is still manual analysis, which obviously is not a scalable method. In this paper, we perform an in-depth empirical study on the 406 published XNU kernel vulnerabilities to identify distinguishing characteristics of them and then leverage the features to guide our vulnerability detection, i.e., locating suspicious functions. To further improve the efficiency of vulnerability detection, we present KInspector, a new and lightweight framework to detect XNU kernel vulnerabilities by leveraging feedback-based fuzzing techniques. We thoroughly evaluate our approach on XNU with various versions, and the results turn out to be quite promising: 21 N/0-day vulnerabilities have been discovered in our experiments.
2020-03-09
Nilizadeh, Shirin, Noller, Yannic, Pasareanu, Corina S..  2019.  DifFuzz: Differential Fuzzing for Side-Channel Analysis. 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). :176–187.
Side-channel attacks allow an adversary to uncover secret program data by observing the behavior of a program with respect to a resource, such as execution time, consumed memory or response size. Side-channel vulnerabilities are difficult to reason about as they involve analyzing the correlations between resource usage over multiple program paths. We present DifFuzz, a fuzzing-based approach for detecting side-channel vulnerabilities related to time and space. DifFuzz automatically detects these vulnerabilities by analyzing two versions of the program and using resource-guided heuristics to find inputs that maximize the difference in resource consumption between secret-dependent paths. The methodology of DifFuzz is general and can be applied to programs written in any language. For this paper, we present an implementation that targets analysis of Java programs, and uses and extends the Kelinci and AFL fuzzers. We evaluate DifFuzz on a large number of Java programs and demonstrate that it can reveal unknown side-channel vulnerabilities in popular applications. We also show that DifFuzz compares favorably against Blazer and Themis, two state-of-the-art analysis tools for finding side-channels in Java programs.
2020-02-17
Wang, Xinda, Sun, Kun, Batcheller, Archer, Jajodia, Sushil.  2019.  Detecting "0-Day" Vulnerability: An Empirical Study of Secret Security Patch in OSS. 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). :485–492.
Security patches in open source software (OSS) not only provide security fixes to identified vulnerabilities, but also make the vulnerable code public to the attackers. Therefore, armored attackers may misuse this information to launch N-day attacks on unpatched OSS versions. The best practice for preventing this type of N-day attacks is to keep upgrading the software to the latest version in no time. However, due to the concerns on reputation and easy software development management, software vendors may choose to secretly patch their vulnerabilities in a new version without reporting them to CVE or even providing any explicit description in their change logs. When those secretly patched vulnerabilities are being identified by armored attackers, they can be turned into powerful "0-day" attacks, which can be exploited to compromise not only unpatched version of the same software, but also similar types of OSS (e.g., SSL libraries) that may contain the same vulnerability due to code clone or similar design/implementation logic. Therefore, it is critical to identify secret security patches and downgrade the risk of those "0-day" attacks to at least "n-day" attacks. In this paper, we develop a defense system and implement a toolset to automatically identify secret security patches in open source software. To distinguish security patches from other patches, we first build a security patch database that contains more than 4700 security patches mapping to the records in CVE list. Next, we identify a set of features to help distinguish security patches from non-security ones using machine learning approaches. Finally, we use code clone identification mechanisms to discover similar patches or vulnerabilities in similar types of OSS. The experimental results show our approach can achieve good detection performance. A case study on OpenSSL, LibreSSL, and BoringSSL discovers 12 secret security patches.
Ullah, Imtiaz, Mahmoud, Qusay H..  2019.  A Two-Level Hybrid Model for Anomalous Activity Detection in IoT Networks. 2019 16th IEEE Annual Consumer Communications Networking Conference (CCNC). :1–6.
In this paper we propose a two-level hybrid anomalous activity detection model for intrusion detection in IoT networks. The level-1 model uses flow-based anomaly detection, which is capable of classifying the network traffic as normal or anomalous. The flow-based features are extracted from the CICIDS2017 and UNSW-15 datasets. If an anomaly activity is detected then the flow is forwarded to the level-2 model to find the category of the anomaly by deeply examining the contents of the packet. The level-2 model uses Recursive Feature Elimination (RFE) to select significant features and Synthetic Minority Over-Sampling Technique (SMOTE) for oversampling and Edited Nearest Neighbors (ENN) for cleaning the CICIDS2017 and UNSW-15 datasets. Our proposed model precision, recall and F score for level-1 were measured 100% for the CICIDS2017 dataset and 99% for the UNSW-15 dataset, while the level-2 model precision, recall, and F score were measured at 100 % for the CICIDS2017 dataset and 97 % for the UNSW-15 dataset. The predictor we introduce in this paper provides a solid framework for the development of malicious activity detection in IoT networks.
Letychevskyi, Oleksandr, Peschanenko, Volodymyr, Radchenko, Viktor, Hryniuk, Yaroslav, Yakovlev, Viktor.  2019.  Algebraic Patterns of Vulnerabilities in Binary Code. 2019 10th International Conference on Dependable Systems, Services and Technologies (DESSERT). :70–73.
This paper presents an algebraic approach for formalizing and detecting vulnerabilities in binary code. It uses behaviour algebra equations for creating patterns of vulnerabilities and algebraic matching methods for vulnerability detection. Algebraic matching is based on symbolic modelling. This paper considers a known vulnerability, buffer overflow, as an example to demonstrate an algebraic approach for pattern creation.
Malik, Yasir, Campos, Carlos Renato Salim, Jaafar, Fehmi.  2019.  Detecting Android Security Vulnerabilities Using Machine Learning and System Calls Analysis. 2019 IEEE 19th International Conference on Software Quality, Reliability and Security Companion (QRS-C). :109–113.
Android operating systems have become a prime target for cyber attackers due to security vulnerabilities in the underlying operating system and application design. Recently, anomaly detection techniques are widely studied for security vulnerabilities detection and classification. However, the ability of the attackers to create new variants of existing malware using various masking techniques makes it harder to deploy these techniques effectively. In this research, we present a robust and effective vulnerabilities detection approach based on anomaly detection in a system calls of benign and malicious Android application. The anomaly in our study is type, frequency, and sequence of system calls that represent a vulnerability. Our system monitors the processes of benign and malicious application and detects security vulnerabilities based on the combination of parameters and metrics, i.e., type, frequency and sequence of system calls to classify the process behavior as benign or malign. The detection algorithm detects the anomaly based on the defined scoring function f and threshold ρ. The system refines the detection process by applying machine learning techniques to find a combination of system call metrics and explore the relationship between security bugs and the pattern of system calls detected. The experiment results show the detection rate of the proposed algorithm based on precision, recall, and f-score for different machine learning algorithms.
Chen, Lu, Ma, Yuanyuan, SHAO, Zhipeng, CHEN, Mu.  2019.  Research on Mobile Application Local Denial of Service Vulnerability Detection Technology Based on Rule Matching. 2019 IEEE International Conference on Energy Internet (ICEI). :585–590.
Aiming at malicious application flooding in mobile application market, this paper proposed a method based on rule matching for mobile application local denial of service vulnerability detection. By combining the advantages of static detection and dynamic detection, static detection adopts smali abstract syntax tree as rule matching object. This static detection method has higher code coverage and better guarantees the integrity of mobile application information. The dynamic detection performs targeted hook verification on the static detection result, which improves the accuracy of the detection result and saves the test workload at the same time. This dynamic detection method has good scalability, can be upgraded with discovery and variants of the vulnerability. Through experiments, it is verified that the mobile application with this vulnerability can be accurately found in a large number of mobile applications, and the effectiveness of the system is verified.
Tunde-Onadele, Olufogorehan, He, Jingzhu, Dai, Ting, Gu, Xiaohui.  2019.  A Study on Container Vulnerability Exploit Detection. 2019 IEEE International Conference on Cloud Engineering (IC2E). :121–127.
Containers have become increasingly popular for deploying applications in cloud computing infrastructures. However, recent studies have shown that containers are prone to various security attacks. In this paper, we conduct a study on the effectiveness of various vulnerability detection schemes for containers. Specifically, we implement and evaluate a set of static and dynamic vulnerability attack detection schemes using 28 real world vulnerability exploits that widely exist in docker images. Our results show that the static vulnerability scanning scheme only detects 3 out of 28 tested vulnerabilities and dynamic anomaly detection schemes detect 22 vulnerability exploits. Combining static and dynamic schemes can further improve the detection rate to 86% (i.e., 24 out of 28 exploits). We also observe that the dynamic anomaly detection scheme can achieve more than 20 seconds lead time (i.e., a time window before attacks succeed) for a group of commonly seen attacks in containers that try to gain a shell and execute arbitrary code.
Letychevskyi, Oleksandr.  2019.  Two-Level Algebraic Method for Detection of Vulnerabilities in Binary Code. 2019 10th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS). 2:1074–1077.
This study introduces formal methods for detection of vulnerabilities in binary code. It considers the transformation of binary code into behavior algebra expressions and formalization of vulnerabilities. The detection method has two levels: behavior matching and symbolic execution with vulnerability pattern matching. This enables more efficient performance.
Guo, Qingrui, Xie, Peng, Li, Feng, Guo, Xuerang, Li, Yutao, Ma, Lin.  2019.  Research on Linkage Model of Network Resource Survey and Vulnerability Detection in Power Information System. 2019 IEEE 3rd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC). :1068–1071.
this paper first analyses the new challenges of power information network management, difficulties of the power information network resource survey and vulnerability detection are proposed. Then, a linkage model of network resource survey and vulnerability detection is designed, and the framework of three modules in the model is described, meanwhile the process of network resources survey and vulnerability detection linkage is proposed. Finally, the implementation technologies are given corresponding to the main functions of each module.
2020-02-10
Cheng, Xiao, Wang, Haoyu, Hua, Jiayi, Zhang, Miao, Xu, Guoai, Yi, Li, Sui, Yulei.  2019.  Static Detection of Control-Flow-Related Vulnerabilities Using Graph Embedding. 2019 24th International Conference on Engineering of Complex Computer Systems (ICECCS). :41–50.

Static vulnerability detection has shown its effectiveness in detecting well-defined low-level memory errors. However, high-level control-flow related (CFR) vulnerabilities, such as insufficient control flow management (CWE-691), business logic errors (CWE-840), and program behavioral problems (CWE-438), which are often caused by a wide variety of bad programming practices, posing a great challenge for existing general static analysis solutions. This paper presents a new deep-learning-based graph embedding approach to accurate detection of CFR vulnerabilities. Our approach makes a new attempt by applying a recent graph convolutional network to embed code fragments in a compact and low-dimensional representation that preserves high-level control-flow information of a vulnerable program. We have conducted our experiments using 8,368 real-world vulnerable programs by comparing our approach with several traditional static vulnerability detectors and state-of-the-art machine-learning-based approaches. The experimental results show the effectiveness of our approach in terms of both accuracy and recall. Our research has shed light on the promising direction of combining program analysis with deep learning techniques to address the general static analysis challenges.

2019-11-19
Ying, Huan, Zhang, Yanmiao, Han, Lifang, Cheng, Yushi, Li, Jiyuan, Ji, Xiaoyu, Xu, Wenyuan.  2019.  Detecting Buffer-Overflow Vulnerabilities in Smart Grid Devices via Automatic Static Analysis. 2019 IEEE 3rd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC). :813-817.

As a modern power transmission network, smart grid connects plenty of terminal devices. However, along with the growth of devices are the security threats. Different from the previous separated environment, an adversary nowadays can destroy the power system by attacking these devices. Therefore, it's critical to ensure the security and safety of terminal devices. To achieve this goal, detecting the pre-existing vulnerabilities of the device program and enhance the terminal security, are of great importance and necessity. In this paper, we propose a novel approach that detects existing buffer-overflow vulnerabilities of terminal devices via automatic static analysis (ASA). We utilize the static analysis to extract the device program information and build corresponding program models. By further matching the generated program model with pre-defined vulnerability patterns, we achieve vulnerability detection and error reporting. The evaluation results demonstrate that our method can effectively detect buffer-overflow vulnerabilities of smart terminals with a high accuracy and a low false positive rate.

2019-08-05
Kaur, Gurpreet, Malik, Yasir, Samuel, Hamman, Jaafar, Fehmi.  2018.  Detecting Blind Cross-Site Scripting Attacks Using Machine Learning. Proceedings of the 2018 International Conference on Signal Processing and Machine Learning. :22–25.

Cross-site scripting (XSS) is a scripting attack targeting web applications by injecting malicious scripts into web pages. Blind XSS is a subset of stored XSS, where an attacker blindly deploys malicious payloads in web pages that are stored in a persistent manner on target servers. Most of the XSS detection techniques used to detect the XSS vulnerabilities are inadequate to detect blind XSS attacks. In this research, we present machine learning based approach to detect blind XSS attacks. Testing results help to identify malicious payloads that are likely to get stored in databases through web applications.

2019-05-01
Jiang, Yikun, Xie, Wei, Tang, Yong.  2018.  Detecting Authentication-Bypass Flaws in a Large Scale of IoT Embedded Web Servers. Proceedings of the 8th International Conference on Communication and Network Security. :56–63.

With the rapid development of network and communication technologies, everything is able to be connected to the Internet. IoT devices, which include home routers, IP cameras, wireless printers and so on, are crucial parts facilitating to build pervasive and ubiquitous networks. As the number of IoT devices around the world increases, the security issues become more and more serious. To handle with the security issues and protect the IoT devices from being compromised, the firmware of devices needs to be strengthened by discovering and repairing vulnerabilities. Current vulnerability detection tools can only help strengthening traditional software, nevertheless these tools are not practical enough for IoT device firmware, because of the peculiarity in firmware's structure and embedded device's architecture. Therefore, new vulnerability detection framework is required for analyzing IoT device firmware. This paper reviews related works on vulnerability detection in IoT firmware, proposes and implements a framework to automatically detect authentication-bypass flaws in a large scale of Linux-based firmware. The proposed framework is evaluated with a data set of 2351 firmware images from several target vendors, which is proved to be capable of performing large-scale and automated analysis on firmware, and 1 known and 10 unknown authentication-bypass flaws are found by the analysis.

2019-01-21
Kronjee, Jorrit, Hommersom, Arjen, Vranken, Harald.  2018.  Discovering Software Vulnerabilities Using Data-flow Analysis and Machine Learning. Proceedings of the 13th International Conference on Availability, Reliability and Security. :6:1–6:10.

We present a novel method for static analysis in which we combine data-flow analysis with machine learning to detect SQL injection (SQLi) and Cross-Site Scripting (XSS) vulnerabilities in PHP applications. We assembled a dataset from the National Vulnerability Database and the SAMATE project, containing vulnerable PHP code samples and their patched versions in which the vulnerability is solved. We extracted features from the code samples by applying data-flow analysis techniques, including reaching definitions analysis, taint analysis, and reaching constants analysis. We used these features in machine learning to train various probabilistic classifiers. To demonstrate the effectiveness of our approach, we built a tool called WIRECAML, and compared our tool to other tools for vulnerability detection in PHP code. Our tool performed best for detecting both SQLi and XSS vulnerabilities. We also tried our approach on a number of open-source software applications, and found a previously unknown vulnerability in a photo-sharing web application.

Chernis, Boris, Verma, Rakesh.  2018.  Machine Learning Methods for Software Vulnerability Detection. Proceedings of the Fourth ACM International Workshop on Security and Privacy Analytics. :31–39.

Software vulnerabilities are a primary concern in the IT security industry, as malicious hackers who discover these vulnerabilities can often exploit them for nefarious purposes. However, complex programs, particularly those written in a relatively low-level language like C, are difficult to fully scan for bugs, even when both manual and automated techniques are used. Since analyzing code and making sure it is securely written is proven to be a non-trivial task, both static analysis and dynamic analysis techniques have been heavily investigated, and this work focuses on the former. The contribution of this paper is a demonstration of how it is possible to catch a large percentage of bugs by extracting text features from functions in C source code and analyzing them with a machine learning classifier. Relatively simple features (character count, character diversity, entropy, maximum nesting depth, arrow count, "if" count, "if" complexity, "while" count, and "for" count) were extracted from these functions, and so were complex features (character n-grams, word n-grams, and suffix trees). The simple features performed unexpectedly better compared to the complex features (74% accuracy compared to 69% accuracy).

Pechenkin, Alexander, Demidov, Roman.  2018.  Applying Deep Learning and Vector Representation for Software Vulnerabilities Detection. Proceedings of the 11th International Conference on Security of Information and Networks. :13:1–13:6.

This paper 1 addresses a problem of vulnerability detection in software represented as assembly code. An extended approach to the vulnerability detection problem is proposed. This work concentrates on improvement of neural network-based approach described in previous works of authors. The authors propose to include the morphology of instructions in vector representations. The bidirectional recurrent neural network is used with access to the execution traces of the program. This has significantly improved the vulnerability detecting accuracy.

Xu, A., Dai, T., Chen, H., Ming, Z., Li, W..  2018.  Vulnerability Detection for Source Code Using Contextual LSTM. 2018 5th International Conference on Systems and Informatics (ICSAI). :1225–1230.

With the development of Internet technology, software vulnerabilities have become a major threat to current computer security. In this work, we propose the vulnerability detection for source code using Contextual LSTM. Compared with CNN and LSTM, we evaluated the CLSTM on 23185 programs, which are collected from SARD. We extracted the features through the program slicing. Based on the features, we used the natural language processing to analysis programs with source code. The experimental results demonstrate that CLSTM has the best performance for vulnerability detection, reaching the accuracy of 96.711% and the F1 score of 0.96984.