Visible to the public Biblio

Filters: Keyword is Program slicing  [Clear All Filters]
2023-02-02
Muske, Tukaram, Serebrenik, Alexander.  2022.  Classification and Ranking of Delta Static Analysis Alarms. 2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM). :197–207.

Static analysis tools help to detect common pro-gramming errors but generate a large number of false positives. Moreover, when applied to evolving software systems, around 95 % of alarms generated on a version are repeated, i.e., they have also been generated on the previous version. Version-aware static analysis techniques (VSATs) have been proposed to suppress the repeated alarms that are not impacted by the code changes between the two versions. The alarms reported by VSATs after the suppression, called delta alarms, still constitute 63% of the tool-generated alarms. We observe that delta alarms can be further postprocessed using their corresponding code changes: the code changes due to which VSATs identify them as delta alarms. However, none of the existing VSATs or alarms postprocessing techniques postprocesses delta alarms using the corresponding code changes. Based on this observation, we use the code changes to classify delta alarms into six classes that have different priorities assigned to them. The assignment of priorities is based on the type of code changes and their likelihood of actually impacting the delta alarms. The ranking of alarms, obtained by prioritizing the classes, can help suppress alarms that are ranked lower, when resources to inspect all the tool-generated alarms are limited. We performed an empirical evaluation using 9789 alarms generated on 59 versions of seven open source C applications. The evaluation results indicate that the proposed classification and ranking of delta alarms help to identify, on average, 53 % of delta alarms as more likely to be false positives than the others.

2022-05-10
Wang, Ben, Chu, Hanting, Zhang, Pengcheng, Dong, Hai.  2021.  Smart Contract Vulnerability Detection Using Code Representation Fusion. 2021 28th Asia-Pacific Software Engineering Conference (APSEC). :564–565.
At present, most smart contract vulnerability detection use manually-defined patterns, which is time-consuming and far from satisfactory. To address this issue, researchers attempt to deploy deep learning techniques for automatic vulnerability detection in smart contracts. Nevertheless, current work mostly relies on a single code representation such as AST (Abstract Syntax Tree) or code tokens to learn vulnerability characteristics, which might lead to incompleteness of learned semantics information. In addition, the number of available vulnerability datasets is also insufficient. To address these limitations, first, we construct a dataset covering most typical types of smart contract vulnerabilities, which can accurately indicate the specific row number where a vulnerability may exist. Second, for each single code representation, we propose a novel way called AFS (AST Fuse program Slicing) to fuse code characteristic information. AFS can fuse the structured information of AST with program slicing information and detect vulnerabilities by learning new vulnerability characteristic information.
2020-01-20
Zhu, Lipeng, Fu, Xiaotong, Yao, Yao, Zhang, Yuqing, Wang, He.  2019.  FIoT: Detecting the Memory Corruption in Lightweight IoT Device Firmware. 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :248–255.
The IoT industry has developed rapidly in recent years, which has attracted the attention of security researchers. However, the researchers are hampered by the wide variety of IoT device operating systems and their hardware architectures. Especially for the lightweight IoT devices, many manufacturers do not provide the device firmware images, embedded firmware source code or even the develop documents. As a result, it hinders traditional static analysis and dynamic analysis techniques. In this paper, we propose a novel dynamic analysis framework, called FIoT, which aims at finding memory corruption vulnerabilities in lightweight IoT device firmware images. The key idea is dynamically run the binary code snippets through symbolic execution with carrying out a fuzzing test. Specifically, we generate code snippets through traversing the control-flow graph (CFG) in a backward manner. We improved the CFG recovery approach and backward slice approach for better performance. To reduce the influence of the binary firmware, FIoT leverages loading address determination analysis and library function identification approach. We have implemented a prototype of FIoT and conducted experiments. Our results show that FIoT can complete the Fuzzing test within 40 seconds in average. Considering 170 seconds for static analysis, FIoT can load and analyze a lightweight IoT firmware within 210 seconds in total. Furthermore, we illustrate the effectiveness of FIoT by applying it over 115 firmware images from 17 manufacturers. We have found 35 images exist memory corruptions, which are all zero-day vulnerabilities.
2019-06-28
Chen, G., Wang, D., Li, T., Zhang, C., Gu, M., Sun, J..  2018.  Scalable Verification Framework for C Program. 2018 25th Asia-Pacific Software Engineering Conference (APSEC). :129-138.

Software verification has been well applied in safety critical areas and has shown the ability to provide better quality assurance for modern software. However, as lines of code and complexity of software systems increase, the scalability of verification becomes a challenge. In this paper, we present an automatic software verification framework TSV to address the scalability issues: (i) the extended structural abstraction and property-guided program slicing to solve large-scale program verification problem, saving time and memory without losing accuracy; (ii) automatically select different verification methods according to the program and property context to improve the verification efficiency. For evaluation, we compare TSV's different configurations with existing C program verifiers based on open benchmarks. We found that TSV with auto-selection performs better than with bounded model checking only or with extended structural abstraction only. Compared to existing tools such as CMBC and CPAChecker, it acquires 10%-20% improvement of accuracy and 50%-90% improvement of memory consumption.

2018-06-20
Aslanyan, H., Avetisyan, A., Arutunian, M., Keropyan, G., Kurmangaleev, S., Vardanyan, V..  2017.  Scalable Framework for Accurate Binary Code Comparison. 2017 Ivannikov ISPRAS Open Conference (ISPRAS). :34–38.
Comparison of two binary files has many practical applications: the ability to detect programmatic changes between two versions, the ability to find old versions of statically linked libraries to prevent the use of well-known bugs, malware analysis, etc. In this article, a framework for comparison of binary files is presented. Framework uses IdaPro [1] disassembler and Binnavi [2] platform to recover structure of the target program and represent it as a call graph (CG). A program dependence graph (PDG) corresponds to each vertex of the CG. The proposed comparison algorithm consists of two main stages. At the first stage, several heuristics are applied to find the exact matches. Two functions are matched if at least one of the calculated heuristics is the same and unique in both binaries. At the second stage, backward and forward slicing is applied on matched vertices of CG to find further matches. According to empiric results heuristic method is effective and has high matching quality for unchanged or slightly modified functions. As a contradiction, to match heavily modified functions, binary code clone detection is used and it is based on finding maximum common subgraph for pair of PDGs. To achieve high performance on extensive binaries, the whole matching process is parallelized. The framework is tested on the number of real world libraries, such as python, openssh, openssl, libxml2, rsync, php, etc. Results show that in most cases more than 95% functions are truly matched. The tool is scalable due to parallelization of functions matching process and generation of PDGs and CGs.
2018-06-07
Koc, Ugur, Saadatpanah, Parsa, Foster, Jeffrey S., Porter, Adam A..  2017.  Learning a Classifier for False Positive Error Reports Emitted by Static Code Analysis Tools. Proceedings of the 1st ACM SIGPLAN International Workshop on Machine Learning and Programming Languages. :35–42.
The large scale and high complexity of modern software systems make perfectly precise static code analysis (SCA) infeasible. Therefore SCA tools often over-approximate, so not to miss any real problems. This, however, comes at the expense of raising false alarms, which, in practice, reduces the usability of these tools. To partially address this problem, we propose a novel learning process whose goal is to discover program structures that cause a given SCA tool to emit false error reports, and then to use this information to predict whether a new error report is likely to be a false positive as well. To do this, we first preprocess code to isolate the locations that are related to the error report. Then, we apply machine learning techniques to the preprocessed code to discover correlations and to learn a classifier. We evaluated this approach in an initial case study of a widely-used SCA tool for Java. Our results showed that for our dataset we could accurately classify a large majority of false positive error reports. Moreover, we identified some common coding patterns that led to false positive errors. We believe that SCA developers may be able to redesign their methods to address these patterns and reduce false positive error reports.
2017-05-30
Srinivasan, Venkatesh, Reps, Thomas.  2016.  An Improved Algorithm for Slicing Machine Code. Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications. :378–393.

Machine-code slicing is an important primitive for building binary analysis and rewriting tools, such as taint trackers, fault localizers, and partial evaluators. However, it is not easy to create a machine-code slicer that exhibits a high level of precision. Moreover, the problem of creating such a tool is compounded by the fact that a small amount of local imprecision can be amplified via cascade effects. Most instructions in instruction sets such as Intel's IA-32 and ARM are multi-assignments: they have several inputs and several outputs (registers, flags, and memory locations). This aspect of the instruction set introduces a granularity issue during slicing: there are often instructions at which we would like the slice to include only a subset of the instruction's semantics, whereas the slice is forced to include the entire instruction. Consequently, the slice computed by state-of-the-art tools is very imprecise, often including essentially the entire program. This paper presents an algorithm to slice machine code more accurately. To counter the granularity issue, our algorithm performs slicing at the microcode level, instead of the instruction level, and obtains a more precise microcode slice. To reconstitute a machine-code program from a microcode slice, our algorithm uses machine-code synthesis. Our experiments on IA-32 binaries of FreeBSD utilities show that, in comparison to slices computed by a state-of-the-art tool, our algorithm reduces the size of backward slices by 33%, and forward slices by 70%.