Biblio
Filters: Author is Li, Lian [Clear All Filters]
Scaling Up the IFDS Algorithm with Efficient Disk-Assisted Computing. 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). :236–247.
.
2021. The IFDS algorithm can be memory-intensive, requiring a memory budget of more than 100 GB of RAM for some applications. The large memory requirements significantly restrict the deployment of IFDS-based tools in practise. To improve this, we propose a disk-assisted solution that drastically reduces the memory requirements of traditional IFDS solvers. Our solution saves memory by 1) recomputing instead of memorizing intermediate analysis data, and 2) swapping in-memory data to disk when memory usages reach a threshold. We implement sophisticated scheduling schemes to swap data between memory and disks efficiently. We have developed a new taint analysis tool, DiskDroid, based on our disk-assisted IFDS solver. Compared to FlowDroid, a state-of-the-art IFDS-based taint analysis tool, for a set of 19 apps which take from 10 to 128 GB of RAM by FlowDroid, DiskDroid can analyze them with less than 10GB of RAM at a slight performance improvement of 8.6%. In addition, for 21 apps requiring more than 128GB of RAM by FlowDroid, DiskDroid can analyze each app in 3 hours, under the same memory budget of 10GB. This makes the tool deployable to normal desktop environments. We make the tool publicly available at https://github.com/HaofLi/DiskDroid.
Performance-Boosting Sparsification of the IFDS Algorithm with Applications to Taint Analysis. 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). :267–279.
.
2019. The IFDS algorithm can be compute-and memoryintensive for some large programs, often running for a long time (more than expected) or terminating prematurely after some time and/or memory budgets have been exhausted. In the latter case, the corresponding IFDS data-flow analyses may suffer from false negatives and/or false positives. To improve this, we introduce a sparse alternative to the traditional IFDS algorithm. Instead of propagating the data-flow facts across all the program points along the program’s (interprocedural) control flow graph, we propagate every data-flow fact directly to its next possible use points along its own sparse control flow graph constructed on the fly, thus reducing significantly both the time and memory requirements incurred by the traditional IFDS algorithm. In our evaluation, we compare FLOWDROID, a taint analysis performed by using the traditional IFDS algorithm, with our sparse incarnation, SPARSEDROID, on a set of 40 Android apps selected. For the time budget (5 hours) and memory budget (220GB) allocated per app, SPARSEDROID can run every app to completion but FLOWDROID terminates prematurely for 9 apps, resulting in an average speedup of 22.0x. This implies that when used as a market-level vetting tool, SPARSEDROID can finish analyzing these 40 apps in 2.13 hours (by issuing 228 leak warnings) while FLOWDROID manages to analyze only 30 apps in the same time period (by issuing only 147 leak warnings).
Simplex Based Vector Mapping for Categorical Attributes Clustering. Proceedings of the 2018 International Conference on Computational Intelligence and Intelligent Systems. :56–60.
.
2018. When clustering unlabeled data, categorical attributes are usually treated differently from numerical attributes because of their unique characteristics, which introduces difficulties in clustering data with both types of attributes. In this paper, we propose a strategy to map categorical attributes to high dimensional vectors based on the Simplex Theory, hence categorical attributes could be handled the same as numeral attributes. To achieve identical distances between any two values under Euclidean distance, we theoretically prove a categorical attribute with n types of values should be mapped to at least n–1 dimensional vectors. Furthermore, numerical vector mapping solutions are provided on condition of 0 normalized constraint. Experimentally, we show that integrating our vector mapping strategy with K-means algorithm achieves better accuracy than integrating similarities for categorical attributes with K-modes algorithm on four datasets.
Dynamic Symbolic Execution for Polymorphism. Proceedings of the 26th International Conference on Compiler Construction. :120–130.
.
2017. Symbolic execution is an important program analysis technique that provides auxiliary execution semantics to execute programs with symbolic rather than concrete values. There has been much recent interest in symbolic execution for automatic test case generation and security vulnerability detection, resulting in various tools being deployed in academia and industry. Nevertheless, (subtype or dynamic) polymorphism of object-oriented programs has been neglected: existing symbolic execution techniques can explore different targets of conditional branches but not different targets of method invocations. We address the problem of how this polymorphism can be expressed in a symbolic execution framework. We propose the notion of symbolic types, which make object types symbolic. With symbolic types,[ various targets of a method invocation can be explored systematically by mutating the type of the receiver object of the method during automatic test case generation. To the best of our knowledge, this is the first attempt to address polymorphism in symbolic execution. Mutation of method invocation targets is critical for effectively testing object-oriented programs, especially libraries. Our experimental results show that symbolic types are significantly more effective than existing symbolic execution techniques in achieving test coverage and finding bugs and security vulnerabilities in OpenJDK.