Biblio

Filters: Keyword is Syntactics
2022-03-14
Staniloiu, Eduard, Nitu, Razvan, Becerescu, Cristian, Rughiniş, Razvan.  2021.  Automatic Integration of D Code With the Linux Kernel. 2021 20th RoEduNet Conference: Networking in Education and Research (RoEduNet). :1–6.
The Linux kernel is implemented in C, an unsafe programming language that places the burden of memory management, type and bounds checking, and error handling on the developer. Hundreds of buffer overflow bugs have compromised Linux systems over the years, leading to endless layers of mitigations applied on top of C. In contrast, the D programming language offers automated memory safety checks and modern features such as OOP, templates and functional-style constructs. In addition, interoperability with C is supported out of the box. However, integrating a D module with the Linux kernel requires translating the needed C header files into D header files. This is a tedious, time-consuming, manual task. Although a tool called DPP exists to automate this process, it does not work with the complicated, sometimes convoluted, kernel code. In this paper, we improve DPP with the ability to translate any Linux kernel C header to D. Our work enables the development and integration of D code inside the Linux kernel, thus facilitating a method of making the kernel memory safe.
McQuistin, Stephen, Band, Vivian, Jacob, Dejice, Perkins, Colin.  2021.  Investigating Automatic Code Generation for Network Packet Parsing. 2021 IFIP Networking Conference (IFIP Networking). :1–9.
Use of formal protocol description techniques and code generation can reduce bugs in network packet parsing code. However, such techniques are themselves complex, and don't see wide adoption in the protocol standards development community, where the focus is on consensus building and human-readable specifications. We explore the utility and effectiveness of new techniques for describing protocol data, specifically designed to integrate with the standards development process, and discuss how they can be used to generate code that is safer and more trustworthy, while maintaining correctness and performance.
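The paper's description techniques are not reproduced here. The sketch below illustrates the general idea under stated assumptions. It uses a hypothetical fixed-layout toy header and derives a parser from a declarative field table, so the length check is generated from the description instead of being written by hand. The header layout and the names `TOY_HEADER` and `make_parser` are illustrative assumptions.

```python
import struct

# Hypothetical field description for a toy header: (field name, struct format code).
TOY_HEADER = [
    ("version", "B"),  # 1-byte unsigned
    ("flags", "B"),    # 1-byte unsigned
    ("length", "H"),   # 2-byte unsigned
    ("seq", "I"),      # 4-byte unsigned
]

def make_parser(fields):
    """Generate a parser from a field description; the bounds check precedes any read."""
    # "!" selects network byte order with standard sizes and no alignment padding.
    fmt = "!" + "".join(code for _, code in fields)
    size = struct.calcsize(fmt)
    names = [name for name, _ in fields]

    def parse(data: bytes) -> dict:
        if len(data) < size:
            raise ValueError(f"truncated header: need {size} bytes, got {len(data)}")
        return dict(zip(names, struct.unpack_from(fmt, data)))

    return parse

parse_toy = make_parser(TOY_HEADER)
```

Changing the table regenerates both the field offsets and the length check, which is the property a description-driven approach aims for.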
2022-03-10
Zhang, Zhongtang, Liu, Shengli, Yang, Qichao, Guo, Shichen.  2021.  Semantic Understanding of Source and Binary Code based on Natural Language Processing. 2021 IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC). 4:2010–2016.
With the growth of open-source projects, large amounts of open-source code are reused in binary software, and bugs in that source code are introduced into the binaries as well. To detect reused open-source code in binaries, it is sometimes necessary to compare and analyze the similarity between source code and binary code. One of the main challenges is that the compilation process can generate different binary representations of the same source code, depending on the compiler version, optimization options and target architecture. This greatly increases the difficulty of detecting semantic similarity between source code and binary code. To remove the influence of compilation on semantic similarity comparison, this paper transforms both source code and binary code into LLVM intermediate representation (LLVM IR), a universal intermediate representation independent of both source code and binary code. We carry out semantic feature extraction and embedding training on LLVM IR using a natural language processing model. Experimental results show that LLVM IR eliminates the syntactic differences that compilation introduces between source code and binary code, and that the semantic features of the code are well represented and preserved.
2022-02-24
Gondron, Sébastien, Mödersheim, Sebastian.  2021.  Vertical Composition and Sound Payload Abstraction for Stateful Protocols. 2021 IEEE 34th Computer Security Foundations Symposium (CSF). :1–16.
This paper deals with a problem that arises in vertical composition of protocols, i.e., when a channel protocol is used to encrypt and transport arbitrary data from an application protocol that uses the channel. Our work proves that we can verify that the channel protocol ensures its security goals independent of a particular application. In more detail, we build a general paradigm to express vertical composition of an application protocol and a channel protocol, and we give a transformation of the channel protocol where the application payload messages are replaced by abstract constants in a particular way that is feasible for standard automated verification tools. We prove that this transformation is sound for a large class of channel and application protocols. The requirements that channel and application have to satisfy for the vertical composition are all of an easy-to-check syntactic nature.
2021-11-29
Albó, Laia, Beardsley, Marc, Amarasinghe, Ishari, Hernández-Leo, Davinia.  2020.  Individual versus Computer-Supported Collaborative Self-Explanations: How Do Their Writing Analytics Differ? 2020 IEEE 20th International Conference on Advanced Learning Technologies (ICALT). :132–134.
Researchers have demonstrated the effectiveness of self-explanations (SE) as an instructional practice and study strategy. However, there is a lack of work studying the characteristics of SE responses prompted by collaborative activities. In this paper, we use writing analytics to investigate differences between SE text responses resulting from individual versus collaborative learning activities. A Coh-Metrix analysis suggests that students in the collaborative SE activity demonstrated a higher level of comprehension. Future research should explore how writing analytics can be incorporated into CSCL systems to support student performance of SE activities.
2021-10-04
Zhong, Chiyang, Sakis Meliopoulos, A. P., AlOwaifeer, Maad, Xie, Jiahao, Ilunga, Gad.  2020.  Object-Oriented Security Constrained Quadratic Optimal Power Flow. 2020 IEEE Power & Energy Society General Meeting (PESGM). :1–5.
Increased penetration of distributed energy resources (DERs) creates challenges in formulating the security constrained optimal power flow (SCOPF) problem as the number of models for these resources proliferates. Specifically, the number of devices with different mathematical models is large, and integrating them into the SCOPF becomes tedious. Hence, a process that seamlessly models and integrates such new devices into the SCOPF problem is needed. We propose an object-oriented modeling approach that leads to the autonomous formation of the SCOPF problem. All device models in the system are cast into a universal syntax. We also introduce a quadratization method that converts nonlinear models into models consisting only of linear and quadratic equations. We refer to this model as the State and Control Quadratized Device Model (SCQDM). The SCQDM includes a number of equations and a number of inequalities expressing the operating limits of the device. The SCOPF problem is then formed in a seamless manner by operating only on the SCQDM device objects. The SCOPF problem formed this way is also quadratic (i.e., it consists of linear and quadratic equations) and has the same form and syntax as the SCQDM of an individual device. For this reason, we name it the security constrained quadratic optimal power flow (SCQOPF). We solve the SCQOPF problem using a sequential linear programming (SLP) algorithm and compare the results with those obtained from the commercial solver Knitro on the IEEE 57-bus system.
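The following is a generic illustration of quadratization, not the SCQDM equations themselves. A cubic relation can be rewritten with one auxiliary variable $z$, introduced only for this example, so that every equation is at most quadratic in $(x, y, z)$:

```latex
y = x^{3}
\quad\Longrightarrow\quad
\begin{aligned}
  0 &= z - x^{2}, \\
  0 &= y - x\,z.
\end{aligned}
```

Substituting the first equation into the second recovers $y = x^{3}$. Higher-order terms can be reduced the same way by chaining further auxiliary variables.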
2021-06-24
Saletta, Martina, Ferretti, Claudio.  2020.  A Neural Embedding for Source Code: Security Analysis and CWE Lists. 2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). :523–530.
In this paper, we design a technique for mapping the source code into a vector space and we show its application in the recognition of security weaknesses. By applying ideas commonly used in Natural Language Processing, we train a model for producing an embedding of programs starting from their Abstract Syntax Trees. We then show how such embedding is able to infer clusters roughly separating different classes of software weaknesses. Even if the training of the embedding is unsupervised and made on a generic Java dataset, we show that the model can be used for supervised learning of specific classes of vulnerabilities, helping to capture some features distinguishing them in code. Finally, we discuss how our model performs over the different types of vulnerabilities categorized by the CWE initiative.
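The paper trains its embedding on Java ASTs. The sketch below uses Python's standard `ast` module purely as a stand-in to show the first step: turning a program into a sequence of AST node types that an NLP-style embedding model could then be trained on. No embedding is trained here, and the function names are hypothetical.

```python
import ast
from collections import Counter

def ast_node_types(source: str) -> list[str]:
    """Pre-order list of AST node type names for a Python source string."""
    tokens: list[str] = []

    def visit(node: ast.AST) -> None:
        tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens

def node_type_counts(source: str) -> Counter:
    """Bag-of-node-types view; a learned embedding would replace this count vector."""
    return Counter(ast_node_types(source))
```

The token sequence plays the role that a sentence plays in word-embedding training. The count vector is only the simplest fixed-size representation one could derive from it.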
2021-05-05
Zhang, Qiao-Jia, Ye, Qing, Yuan, Zhi-Min, Li, Liang.  2020.  Fast HEVC Selective Encryption Scheme Based on Improved CABAC Coding Algorithm. 2020 IEEE 6th International Conference on Computer and Communications (ICCC). :1022–1028.

Context-based adaptive binary arithmetic coding (CABAC) is the only entropy coding method in high efficiency video coding (HEVC). Statistics show that CABAC encoding accounts for more than 25% of HEVC coding time. Therefore, an improved CABAC algorithm can effectively improve the coding speed of HEVC. On this basis, a selective encryption scheme based on the improved CABAC algorithm is proposed. First, the improved CABAC algorithm is used to optimize regular-mode encoding. Then a cryptographic algorithm is used to selectively encrypt the syntax elements coded in bypass mode. The experimental results show that encoding time is reduced by nearly 10% while still causing substantial interference with the video information. The scheme is both secure and effective.
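The abstract does not specify the cipher, so the following is only a toy sketch. It assumes bins are given as hypothetical `(bit, mode)` pairs, XORs the bypass-mode bins with a keystream, and leaves regular-mode (context-coded) bins untouched. The SHA-256 counter keystream is an illustrative stand-in, not a vetted stream cipher. A common rationale for touching only bypass bins is that they do not update CABAC context models, so flipping them does not disturb the coder's context state.

```python
import hashlib

def keystream_bits(key: bytes, n: int) -> list[int]:
    """Toy keystream: SHA-256 over key || counter, unpacked MSB-first into bits."""
    bits: list[int] = []
    counter = 0
    while len(bits) < n:
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        for byte in block:
            for shift in range(7, -1, -1):
                bits.append((byte >> shift) & 1)
        counter += 1
    return bits[:n]

def encrypt_bypass_bins(bins, key: bytes):
    """XOR only bypass-mode bins with the keystream; regular-mode bins pass through.

    `bins` is a list of (bit, mode) pairs where mode is "bypass" or "regular".
    XOR is its own inverse, so calling this again with the same key decrypts.
    """
    n_bypass = sum(1 for _, mode in bins if mode == "bypass")
    ks = iter(keystream_bits(key, n_bypass))
    return [(bit ^ next(ks), mode) if mode == "bypass" else (bit, mode) for bit, mode in bins]
```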

Zhang, Yunan, Xu, Aidong, Jiang, Yixin.  2020.  Scalable and Accurate Binary Code Search Method Based on Simhash and Partial Trace. 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :818–826.

Binary code search has received much attention recently due to its impactful applications, e.g., plagiarism detection, malware detection and software vulnerability auditing. However, developing an effective binary code search tool is challenging because of the large syntactic and structural differences in binaries resulting from different compilers, compiler options and malware families. In this paper, we propose a scalable and accurate binary search engine that performs syntactic matching by combining a set of key techniques to address these challenges. The key contribution is a binary code search technique that combines function filtering with a partial trace method to match function code relatively quickly and accurately. In addition, we propose function filtering based on simhash and basic function information to dramatically reduce the number of irrelevant target functions. We also introduce a partial trace method to match the shortlisted functions accurately. The experimental results show that our method can find similar functions in a scalable manner, even in the presence of program structure distortion.
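The following is a minimal sketch of simhash-based filtering, not the paper's implementation. Each function is represented by a hypothetical list of features (for example, normalized instruction mnemonics). Candidates whose 64-bit simhash lies within a small Hamming distance of the query's are shortlisted for the more expensive matching step. The MD5-based feature hash and the `max_distance` default are assumptions.

```python
import hashlib

def simhash(features, bits: int = 64) -> int:
    """Charikar-style simhash: each feature votes +1/-1 on every bit of its hash.

    `bits` must be a multiple of 8 and at most 128 (the MD5 digest size).
    """
    votes = [0] * bits
    for feature in features:
        h = int.from_bytes(hashlib.md5(feature.encode()).digest()[: bits // 8], "big")
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the fingerprint when more features voted for it than against it.
    return sum(1 << i for i in range(bits) if votes[i] > 0)

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

def shortlist(query, candidates, max_distance: int = 3):
    """Keep candidate ids whose simhash is within max_distance bits of the query's."""
    q = simhash(query)
    return [fid for fid, feats in candidates.items() if hamming(q, simhash(feats)) <= max_distance]
```

Because similar feature sets produce fingerprints that differ in few bits, this cheap comparison can discard most irrelevant functions before a precise matcher runs.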

2021-04-09
Noiprasong, P., Khurat, A.  2020.  An IDS Rule Redundancy Verification. 2020 17th International Joint Conference on Computer Science and Software Engineering (JCSSE). :110–115.
An Intrusion Detection System (IDS) is network security software or hardware widely used to detect anomalous network traffic by comparing the traffic against rules specified beforehand. Snort is one of the best-known open-source IDSs. The Snort manual specifies the structure and values used to write a rule. This specification is expressive enough that the same meaning can be written in different ways, and redundant rules can degrade performance. We therefore propose a proof of semantic issues in Snort rules and found four pairs of Snort rule combinations that can cause redundancy. In addition, we created a tool to verify such redundancy between two rules and applied it to the public rulesets from the Snort community and Emerging Threats. Our tests found several redundancy issues in the public rulesets when the user enables commented-out rules.
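The four redundancy-causing rule combinations identified in the paper are not reproduced here. The sketch below shows the underlying idea on a heavily simplified, hypothetical rule model (action, protocol, destination ports, required content strings). Under that model, a rule is redundant when a more general rule with the same action already matches every packet it matches. Real Snort rules have many more options that this ignores.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class Rule:
    """Hypothetical, heavily simplified rule; not Snort's actual rule structure."""
    action: str                          # e.g. "alert"
    proto: str                           # e.g. "tcp"
    dst_ports: Optional[FrozenSet[int]]  # None means "any port"
    contents: FrozenSet[str] = field(default_factory=frozenset)  # all must appear in the payload

def subsumes(general: Rule, specific: Rule) -> bool:
    """True if every packet matching `specific` also matches `general` with the same action,
    i.e. `specific` is redundant under this simplified model."""
    ports_ok = general.dst_ports is None or (
        specific.dst_ports is not None and specific.dst_ports <= general.dst_ports
    )
    return (
        general.action == specific.action
        and general.proto == specific.proto
        and ports_ok
        # Requiring fewer content strings matches more payloads, so general's set must be a subset.
        and general.contents <= specific.contents
    )
```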
Ravikumar, G., Singh, A., Babu, J. R., A, A. Moataz, Govindarasu, M.  2020.  D-IDS for Cyber-Physical DER Modbus System - Architecture, Modeling, Testbed-based Evaluation. 2020 Resilience Week (RWS). :153–159.
Increasing penetration of distributed energy resources (DERs) in distribution networks expands the cyberattack surface. Moreover, widely used standard protocols for communicating with DER inverters, such as Modbus, are more vulnerable to data-integrity attacks and denial of service (DoS) attacks because of their native clear-text packet format. This paper proposes a distributed intrusion detection system (D-IDS) architecture and algorithms for detecting anomalies in DER Modbus communication. We devised a model-based approach to define physics-based threshold bands for analog data points and transaction-based threshold bands for both analog and discrete data points. The proposed IDS algorithm uses the model-based approach to develop Modbus-specific IDS rulesets. These rulesets can enhance the detection accuracy of anomalies caused either by data-integrity attacks or by maloperation of cyber-physical DER Modbus devices. Further, the IDS algorithm auto-generates the Modbus-specific IDS rulesets in compliance with the rule syntax formats of open-source IDSs such as Snort and Suricata. This allows seamless integration and mitigates semantic and syntax errors in the development and production environments. We considered the IEEE 13-bus distribution grid, including DERs, as a case study. We conducted various DoS attacks and data-integrity attacks on the hardware-in-the-loop (HIL) CPS DER testbed at ISU to evaluate the proposed D-IDS. We then computed performance metrics such as IDS detection accuracy, IDS detection rate, and end-to-end latency. The results demonstrated 100% detection accuracy, a 100% detection rate for 60k DoS packets, a 99.96% detection rate for 80k DoS packets, and 0.25 ms end-to-end latency between the DERs and the Control Center.
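The following is a minimal sketch of the two kinds of threshold band described above, assuming hypothetical limits for one analog data point. It applies a physics-based range check to each value and a transaction-based check to the change between consecutive readings. The band values and names are illustrative. The paper derives its bands from a system model and emits Snort/Suricata rules, which this sketch does not do.

```python
from dataclasses import dataclass

@dataclass
class Band:
    """Hypothetical threshold band for one analog Modbus data point."""
    low: float       # physics-based lower limit
    high: float      # physics-based upper limit
    max_step: float  # transaction-based limit on change between consecutive readings

def check_readings(readings, band: Band):
    """Return (index, reason) for each reading that violates the band."""
    alerts = []
    for i, value in enumerate(readings):
        if not band.low <= value <= band.high:
            alerts.append((i, "out of physical range"))
        if i > 0 and abs(value - readings[i - 1]) > band.max_step:
            alerts.append((i, "step change too large"))
    return alerts
```

For example, with a hypothetical voltage band `Band(0.0, 480.0, 20.0)`, a spoofed reading of 600 in the sequence `[230, 235, 600, 240]` triggers both checks. The return to 240 is also flagged as an abrupt step.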
Mir, N., Khan, M. A. U.  2020.  Copyright Protection for Online Text Information: Using Watermarking and Cryptography. 2020 3rd International Conference on Computer Applications & Information Security (ICCAIS). :1–4.
Information and security are interdependent elements. Information security has become a matter of global interest, and achieving it requires tools, policies and technologies that provide assurance against relevant security risks. The growth of the Internet, which provides a flexible and economical means of sharing information online, has rapidly attracted countless writers. Because text is an important constituent of online information sharing, there is a huge demand for intellectual copyright protection of text and of the web itself. Various visible watermarking techniques have been studied for text documents, but few for web-based text. In this paper, web page watermarking and cryptography for online content copyright protection are proposed. The approach uses the semantic and syntactic rules of HTML (Hypertext Markup Language) and is tested on English and Arabic text.
2021-03-04
Tang, R., Yang, Z., Li, Z., Meng, W., Wang, H., Li, Q., Sun, Y., Pei, D., Wei, T., Xu, Y. et al.  2020.  ZeroWall: Detecting Zero-Day Web Attacks through Encoder-Decoder Recurrent Neural Networks. IEEE INFOCOM 2020 - IEEE Conference on Computer Communications. :2479–2488.

Zero-day Web attacks are arguably the most serious threats to Web security, but they are very challenging to detect. Because they have not been seen or known before, they cannot be detected by widely deployed signature-based Web Application Firewalls (WAFs). This paper proposes ZeroWall, an unsupervised approach that works in a pipeline with an existing WAF to effectively detect zero-day Web attacks. ZeroWall uses historical Web requests allowed by an existing signature-based WAF, the vast majority of which are assumed to be benign. From these, it trains a self-translation machine using an encoder-decoder recurrent neural network to capture the syntactic and semantic patterns of benign requests. During real-time detection, a zero-day attack request that the WAF fails to detect is not well understood by the self-translation machine. It therefore cannot be translated back to its original form and is declared an attack. In our evaluation using 8 real-world traces of 1.4 billion Web requests, ZeroWall successfully detects real zero-day attacks missed by existing WAFs and achieves F1-scores above 0.98, significantly outperforming all baseline approaches.
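To illustrate only the detection decision, the sketch below replaces the encoder-decoder network with a hypothetical `ToyTranslator`. This toy can reproduce only tokens seen in the (assumed benign) training requests and emits `<unk>` for anything else. A request whose reconstruction agrees with the original on fewer than a threshold fraction of tokens is flagged. The tokenizer, score and threshold are assumptions; ZeroWall itself uses a learned model and its own measure of translation quality.

```python
def tokenize(request: str) -> list[str]:
    """Split a request line into coarse tokens on common URL delimiters."""
    for sep in "/?&=":
        request = request.replace(sep, " ")
    return request.split()

class ToyTranslator:
    """Stand-in for the self-translation model: reproduces only known tokens."""

    def __init__(self, benign_requests):
        self.vocab = {tok for req in benign_requests for tok in tokenize(req)}

    def translate(self, request: str) -> list[str]:
        return [tok if tok in self.vocab else "<unk>" for tok in tokenize(request)]

def reconstruction_score(model: ToyTranslator, request: str) -> float:
    """Fraction of tokens that survive the round trip unchanged."""
    original = tokenize(request)
    if not original:
        return 1.0
    restored = model.translate(request)
    return sum(a == b for a, b in zip(original, restored)) / len(original)

def is_attack(model: ToyTranslator, request: str, threshold: float = 0.8) -> bool:
    """Flag requests the model cannot translate back well enough."""
    return reconstruction_score(model, request) < threshold
```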

2021-03-01
Taylor, E., Shekhar, S., Taylor, G. W.  2020.  Response Time Analysis for Explainability of Visual Processing in CNNs. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). :1555–1558.
Explainable artificial intelligence (XAI) methods rely on access to model architecture and parameters, which is not always feasible for users, practitioners, and regulators. Inspired by cognitive psychology, we present a case for response times (RTs) as a technique for XAI. RTs are observable without access to the model. Moreover, dynamic inference models performing conditional computation generate variable RTs for visual learning tasks depending on hierarchical representations. We show that MSDNet, a conditional computation model with an early-exit architecture, exhibits slower RTs for images with more complex features in the ObjectNet test set. It also exhibits the human phenomenon of scene grammar, where object recognition depends on intrascene object-object relationships. These results cast light on MSDNet's feature space without opening the black box and illustrate the promise of RT methods for XAI.
2020-11-23
Tagliaferri, M., Aldini, A.  2018.  A Trust Logic for Pre-Trust Computations. 2018 21st International Conference on Information Fusion (FUSION). :2006–2012.
Computational trust is the digital counterpart of the human notion of trust as applied in social systems. Its main purpose is to improve the reliability of interactions in online communities and of knowledge transfer in information management systems. Trust models are formal frameworks in which the notion of computational trust is described rigorously and its dynamics are explained precisely. In this paper we consider and extend a computational trust model, Jøsang's Subjective Logic. We show that this model is well suited to describing the dynamics of computational trust, but lacks effective tools to compute the initial trust values to feed into the model. To overcome some of the issues with Subjective Logic, we introduce a logical language that can be employed to describe and reason about trust. The core ideas behind the logical language turn out to be useful in computing initial trust values to feed into Subjective Logic. The aim of the paper is therefore to provide an improvement on Subjective Logic.
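For context, a binomial opinion in Jøsang's Subjective Logic is commonly derived from evidence counts. The sketch below uses that standard mapping, with `r` positive and `s` negative observations and a non-informative prior weight of 2. It is not the trust logic proposed in the paper, which addresses where such initial values should come from. The class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Opinion:
    belief: float
    disbelief: float
    uncertainty: float
    base_rate: float

    def expected(self) -> float:
        """Projected probability E = b + a * u."""
        return self.belief + self.base_rate * self.uncertainty

def opinion_from_evidence(r: float, s: float, base_rate: float = 0.5,
                          prior_weight: float = 2.0) -> Opinion:
    """Map r positive and s negative observations to an opinion with b + d + u = 1."""
    total = r + s + prior_weight
    return Opinion(r / total, s / total, prior_weight / total, base_rate)
```

With no evidence, the opinion is fully uncertain and its expected value equals the base rate. For example, eight positive observations give belief 0.8, uncertainty 0.2 and an expected value of 0.9.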
2020-10-12
Rudd-Orthner, Richard N M, Mihaylova, Lyudmila.  2019.  An Algebraic Expert System with Neural Network Concepts for Cyber, Big Data and Data Migration. 2019 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT). :1–6.

This paper describes a machine-assistance approach to grading decisions for values that might be missing or need validation. Instead of the traditional textual or logic forms, it uses a mathematical algebraic form of an Expert System and builds a neural network computational graph structure. The Expert System is also structured in a neural-network-like format of input, hidden and output layers. This format provides a structured approach to knowledge-base organization and a useful abstraction for reuse in data migration applications in big data, cyber and relational databases. The approach is further enhanced with a Bayesian probability tree that grades the confidences of value probabilities, instead of the traditional grading of rule probabilities, and estimates the most probable value in light of all evidence presented. This is groundwork for a Machine Learning (ML) expert system approach in a form that is closer to a Neural Network node structure.

2020-08-28
Jafariakinabad, Fereshteh, Hua, Kien A.  2019.  Style-Aware Neural Model with Application in Authorship Attribution. 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA). :325–328.

Writing style is a combination of consistent decisions associated with a specific author at different levels of language production, including lexical, syntactic, and structural. In this paper, we introduce a style-aware neural model to encode document information from three stylistic levels and evaluate it in the domain of authorship attribution. First, we propose a simple way to jointly encode syntactic and lexical representations of sentences. Subsequently, we employ an attention-based hierarchical neural network to encode the syntactic and semantic structure of sentences in documents while rewarding the sentences which contribute more to capturing the writing style. Our experimental results, based on four benchmark datasets, reveal the benefits of encoding document information from all three stylistic levels when compared to the baseline methods in the literature.

2020-08-14
Gu, Zuxing, Zhou, Min, Wu, Jiecheng, Jiang, Yu, Liu, Jiaxiang, Gu, Ming.  2019.  IMSpec: An Extensible Approach to Exploring the Incorrect Usage of APIs. 2019 International Symposium on Theoretical Aspects of Software Engineering (TASE). :216–223.
Application Programming Interfaces (APIs) usually have usage constraints, such as call conditions or call orders. Violating these constraints, called API misuse, can result in system crashes, bugs, and even security problems. It is crucial to detect such misuses early in the development process. Although many approaches have been proposed, recent studies show that API misuses are still prevalent, especially those specific to individual projects. In this paper, we strive to improve current API-misuse detection capability for large-scale C programs. First, we propose IMSpec, a lightweight domain-specific language that enables developers to specify API usage constraints in three different aspects (i.e., parameter validation, error handling, and causal calling), which cover the majority of API-misuse bugs. Then, we tailor a constraint-guided static analysis engine to automatically parse IMSpec rules and detect API-misuse bugs with rich semantics. We evaluate our approach on widely used benchmarks and real-world projects. The results show that our easily extensible approach performs better than state-of-the-art tools. We also discovered 19 previously unknown bugs in real-world open-source projects, all of which have been confirmed by the corresponding developers.
2020-07-30
Zhang, Jin, Jin, Dahai, Gong, Yunzhan.  2018.  File Similarity Determination Based on Function Call Graph. 2018 IEEE International Conference on Electronics and Communication Engineering (ICECE). :55–59.
Similarity detection of programs is important for code reuse, plagiarism detection, intellectual property protection and information retrieval. Attribute-counting methods cannot take program semantics into account. Methods based on syntax trees or graph structures have very high construction costs and low space efficiency, which makes them difficult to apply to large-scale software systems. This paper uses different decision strategies at different levels and then puts forward a similarity detection method at the file level. This method makes full use of the features of the program while taking space-time efficiency into account. Using static analysis methods, we extract the function features and control flow features of files, and based on these we build the function call graph. The degree of similarity between two files can then be measured by comparing their two graphs. Experimental results show that the method can effectively detect similar files. Finally, this paper discusses the future development of this method.
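The abstract does not state how the two graphs are compared. As a hypothetical stand-in, the sketch below scores two files by the Jaccard similarity of their call-graph edge sets. It assumes function names have already been resolved consistently across both files, with each graph given as a `{caller: [callees]}` mapping.

```python
def call_edges(call_graph: dict[str, list[str]]) -> set[tuple[str, str]]:
    """Flatten {caller: [callees]} into a set of (caller, callee) edges."""
    return {(caller, callee) for caller, callees in call_graph.items() for callee in callees}

def call_graph_similarity(g1: dict[str, list[str]], g2: dict[str, list[str]]) -> float:
    """Jaccard similarity of the two edge sets: |E1 & E2| / |E1 | E2|."""
    e1, e2 = call_edges(g1), call_edges(g2)
    if not e1 and not e2:
        return 1.0  # two empty graphs are treated as identical
    return len(e1 & e2) / len(e1 | e2)
```

For example, two files whose `main` calls `parse` and `run` in both, but whose `run` calls `log` in one and `save` in the other, share 2 of 4 distinct edges and score 0.5.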
2020-07-06
Chai, Yadeng, Liu, Yong.  2019.  Natural Spoken Instructions Understanding for Robot with Dependency Parsing. 2019 IEEE 9th Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER). :866–871.
This paper presents a method based on syntactic information that can be used for intent determination and slot filling tasks in a spoken language understanding system, including the spoken instructions understanding module of a robot. Some recent studies attempt to solve the spoken language understanding problem using syntactic information. This research further extends these approaches using dependency parsing. In this model, the inputs to the neural network are vectors generated from a dependency parse tree, which we call window vectors. These vectors contain dependency features that improve the performance of the syntactic-based model. The model has been evaluated on the benchmark ATIS task, and the results show that it outperforms many other syntactic-based approaches. In slot filling in particular, its performance is on par with some recent state-of-the-art deep learning algorithms. The model has also been evaluated on FBM3, a dataset from the RoCKIn@Home competition. The overall rate of correctly understanding instructions for the robot is quite good but still not acceptable for practical use, which is caused by the small scale of FBM3.
2020-03-23
Naik, Nitin, Jenkins, Paul, Gillett, Jonathan, Mouratidis, Haralambos, Naik, Kshirasagar, Song, Jingping.  2019.  Lockout-Tagout Ransomware: A Detection Method for Ransomware using Fuzzy Hashing and Clustering. 2019 IEEE Symposium Series on Computational Intelligence (SSCI). :641–648.

Ransomware attacks are a prevalent cybersecurity threat to every user and enterprise today. This is attributed to their polymorphic behaviour and the dispersion of countless versions from the same ransomware family or threat actor. A given ransomware family or threat actor repeatedly uses nearly the same style or codebase to create a vast number of ransomware versions. Therefore, it is essential for users and enterprises to stay well informed about this threat landscape and adopt proactive prevention strategies to minimise its spread and effects. This requires a technique for detecting ransomware samples that determines their similarity to, and link with, a known ransomware family or threat actor. Accordingly, this paper presents a ransomware detection method that combines a similarity-preserving hashing method, called fuzzy hashing, with a clustering method. This detection method is applied to collected WannaCry/WannaCryptor ransomware samples using a range of fuzzy hashing and clustering methods. The results of the various clustering methods are evaluated using internal evaluation indexes to determine the accuracy and consistency of their clustering. This identifies the most effective combination of fuzzy hashing and clustering method for this particular ransomware corpus. The proposed detection method is a static analysis method that requires less computational overhead and performs faster comparative analysis than other static analysis methods.
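Real fuzzy hashes (for example, ssdeep) are not reimplemented here. As a minimal sketch, assuming a stand-in similarity score (Jaccard similarity over byte 4-grams), samples are grouped by single-linkage clustering: any two samples whose similarity reaches a threshold end up in the same cluster. The paper evaluates several fuzzy hashing and clustering methods. This is one simple combination, and the threshold is an assumption.

```python
def ngrams(data: bytes, n: int = 4) -> set[bytes]:
    """All distinct byte n-grams of a sample."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def similarity(a: bytes, b: bytes) -> float:
    """Stand-in for a fuzzy-hash comparison: Jaccard over byte 4-grams, in [0, 1]."""
    ga, gb = ngrams(a), ngrams(b)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)

def cluster(samples: list[bytes], threshold: float = 0.5) -> list[set[int]]:
    """Single-linkage clustering with union-find; returns sets of sample indices."""
    parent = list(range(len(samples)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            if similarity(samples[i], samples[j]) >= threshold:
                parent[find(i)] = find(j)

    groups: dict[int, set[int]] = {}
    for i in range(len(samples)):
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())
```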

Naik, Nitin, Jenkins, Paul, Savage, Nick, Yang, Longzhi.  2019.  Cyberthreat Hunting - Part 1: Triaging Ransomware using Fuzzy Hashing, Import Hashing and YARA Rules. 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). :1–6.

Ransomware is currently one of the most significant cyberthreats to both national infrastructure and the individual, often requiring severe treatment as an antidote. Triaging ransomware based on its similarity with well-known ransomware samples is an imperative preliminary step in preventing a ransomware pandemic. Selecting the most appropriate triaging method can improve the precision of further static and dynamic analysis in addition to saving significant time and effort. Currently, the most popular and proven triaging methods are fuzzy hashing, import hashing and YARA rules, which can ascertain whether, or to what degree, two ransomware samples are similar to each other. However, the mechanisms of these three methods are quite different, and comparing them is difficult. Therefore, this paper presents an evaluation of these three methods for triaging the four most pertinent ransomware categories: WannaCry, Locky, Cerber and CryptoWall. It evaluates their triaging performance and run-time system performance, highlighting the limitations of each method.
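The following is a minimal sketch of import hashing. It assumes the import table has already been extracted from the PE file (parsing is omitted). The normalization follows our understanding of the convention popularized by the `pefile` library. Names are lower-cased, a `.dll`, `.ocx` or `.sys` extension is dropped, `library.function` entries are joined with commas in table order, and the MD5 of the result is taken. Ordinal-only imports, which real tools resolve to names, are not handled.

```python
import hashlib

def normalize_import(dll: str, func: str) -> str:
    """Lower-case names and drop a .dll/.ocx/.sys extension, e.g. 'kernel32.createfilea'."""
    lib = dll.lower()
    for ext in (".dll", ".ocx", ".sys"):
        if lib.endswith(ext):
            lib = lib[: -len(ext)]
            break
    return f"{lib}.{func.lower()}"

def import_hash(imports: list[tuple[str, str]]) -> str:
    """MD5 over the comma-joined normalized import list, in import-table order."""
    joined = ",".join(normalize_import(dll, func) for dll, func in imports)
    return hashlib.md5(joined.encode()).hexdigest()
```

Because the hash depends on import order as well as names, samples built from the same codebase and toolchain tend to share it, while a reordered or modified import table produces a different value.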

2020-02-10
Marin, Mădălina Angelica, Carabas, Costin, Deaconescu, Răzvan, Tăpus, Nicolae.  2019.  Proactive Secure Coding for iOS Applications. 2019 18th RoEduNet Conference: Networking in Education and Research (RoEduNet). :1–5.

In this paper we propose a solution that supports iOS developers in creating better applications. It uses static analysis to investigate source code and detect secure coding issues, while simultaneously pointing out good practices and secure APIs that developers should use.

Ding, Steven H. H., Fung, Benjamin C. M., Charland, Philippe.  2019.  Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization. 2019 IEEE Symposium on Security and Privacy (SP). :472–489.

Reverse engineering is a manually intensive but necessary technique for understanding the inner workings of new malware, finding vulnerabilities in existing systems, and detecting patent infringements in released software. An assembly clone search engine facilitates the work of reverse engineers by identifying those duplicated or known parts. However, it is challenging to design a robust clone search engine, since there exist various compiler optimization options and code obfuscation techniques that make logically similar assembly functions appear to be very different. A practical clone search engine relies on a robust vector representation of assembly code. However, the existing clone search approaches, which rely on a manual feature engineering process to form a feature vector for an assembly function, fail to consider the relationships between features and identify those unique patterns that can statistically distinguish assembly functions. To address this problem, we propose to jointly learn the lexical semantic relationships and the vector representation of assembly functions based on assembly code. We have developed an assembly code representation learning model, Asm2Vec. It only needs assembly code as input and does not require any prior knowledge such as the correct mapping between assembly functions. It can find and incorporate rich semantic relationships among tokens appearing in assembly code. We conduct extensive experiments and benchmark the learning model with state-of-the-art static and dynamic clone search approaches. We show that the learned representation is more robust and significantly outperforms existing methods against changes introduced by obfuscation and optimizations.

Simos, Dimitris E., Zivanovic, Jovan, Leithner, Manuel.  2019.  Automated Combinatorial Testing for Detecting SQL Vulnerabilities in Web Applications. 2019 IEEE/ACM 14th International Workshop on Automation of Software Test (AST). :55–61.

In this paper, we present a combinatorial testing methodology for testing web applications with regard to SQL injection vulnerabilities. We describe three attack grammars that were developed and used to generate concrete attack vectors. Furthermore, we present and evaluate two different oracles used to observe the application's behavior when subjected to such attack vectors. We also present a prototype tool called SQLInjector, capable of automated SQL injection vulnerability testing for web applications. The developed methodology can be applied to any web application that uses server-side scripting and HTML for handling user input and has a SQL database backend. Our approach relies on the use of a database proxy, making this a gray-box testing method. We establish the effectiveness of the proposed tool with the WAVSEP verification framework and conduct a case study on real-world web applications, where we are able to discover both known vulnerabilities and additional previously undiscovered flaws.
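The paper's three attack grammars are not reproduced here. As a sketch under stated assumptions, the code below uses a tiny hypothetical grammar with three slots. It greedily selects a pairwise (strength-2) test suite, in which every combination of values from any two slots appears in at least one generated attack vector. Dedicated combinatorial testing tools compute smaller covering arrays; this greedy pass guarantees coverage but not minimality.

```python
from itertools import combinations, product

# Hypothetical attack grammar: each slot lists the alternatives for one part of a vector.
GRAMMAR = {
    "prefix": ["'", '"', ""],
    "body": [" OR 1=1", " UNION SELECT NULL"],
    "suffix": [" -- ", " #"],
}

def pairs_of(combo):
    """All (slot_i, value_i, slot_j, value_j) pairs covered by one combination."""
    return {(i, combo[i], j, combo[j]) for i, j in combinations(range(len(combo)), 2)}

def pairwise_suite(slots):
    """Greedily keep combinations until every value pair of every two slots is covered."""
    needed = {(i, a, j, b)
              for i, j in combinations(range(len(slots)), 2)
              for a in slots[i] for b in slots[j]}
    suite = []
    for combo in product(*slots):
        new = pairs_of(combo) & needed
        if new:  # keep only combinations that cover at least one uncovered pair
            suite.append(combo)
            needed -= new
        if not needed:
            break
    return suite

slots = list(GRAMMAR.values())
vectors = ["".join(combo) for combo in pairwise_suite(slots)]
```

Each string in `vectors` would then be submitted through an input field, and an oracle would judge the application's response.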