Visible to the public Biblio

Filters: Author is Xu, Fengyuan  [Clear All Filters]
2023-04-28
Chen, Ligeng, He, Zhongling, Wu, Hao, Xu, Fengyuan, Qian, Yi, Mao, Bing.  2022.  DIComP: Lightweight Data-Driven Inference of Binary Compiler Provenance with High Accuracy. 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). :112–122.
Binary analysis is pervasively utilized to assess software security and test vulnerabilities without accessing source codes. The analysis validity is heavily influenced by the inferring ability of information related to the code compilation. Among the compilation information, compiler type and optimization level, as the key factors determining how binaries look like, are still difficult to be inferred efficiently with existing tools. In this paper, we conduct a thorough empirical study on the binary's appearance under various compilation settings and propose a lightweight binary analysis tool based on the simplest machine learning method, called DIComP to infer the compiler and optimization level via most relevant features according to the observation. Our comprehensive evaluations demonstrate that DIComP can fully recognize the compiler provenance, and it is effective in inferring the optimization levels with up to 90% accuracy. Also, it is efficient to infer thousands of binaries at a millisecond level with our lightweight machine learning model (1MB).
2019-01-21
Tang, Yutao, Li, Ding, Li, Zhichun, Zhang, Mu, Jee, Kangkook, Xiao, Xusheng, Wu, Zhenyu, Rhee, Junghwan, Xu, Fengyuan, Li, Qun.  2018.  NodeMerge: Template Based Efficient Data Reduction For Big-Data Causality Analysis. Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. :1324–1337.
Today's enterprises are exposed to sophisticated attacks, such as Advanced Persistent Threats\textbackslashtextasciitilde(APT) attacks, which usually consist of stealthy multiple steps. To counter these attacks, enterprises often rely on causality analysis on the system activity data collected from a ubiquitous system monitoring to discover the initial penetration point, and from there identify previously unknown attack steps. However, one major challenge for causality analysis is that the ubiquitous system monitoring generates a colossal amount of data and hosting such a huge amount of data is prohibitively expensive. Thus, there is a strong demand for techniques that reduce the storage of data for causality analysis and yet preserve the quality of the causality analysis. To address this problem, in this paper, we propose NodeMerge, a template based data reduction system for online system event storage. Specifically, our approach can directly work on the stream of system dependency data and achieve data reduction on the read-only file events based on their access patterns. It can either reduce the storage cost or improve the performance of causality analysis under the same budget. Only with a reasonable amount of resource for online data reduction, it nearly completely preserves the accuracy for causality analysis. The reduced form of data can be used directly with little overhead. To evaluate our approach, we conducted a set of comprehensive evaluations, which show that for different categories of workloads, our system can reduce the storage capacity of raw system dependency data by as high as 75.7 times, and the storage capacity of the state-of-the-art approach by as high as 32.6 times. Furthermore, the results also demonstrate that our approach keeps all the causality analysis information and has a reasonably small overhead in memory and hard disk.
2017-05-30
Xu, Zhang, Wu, Zhenyu, Li, Zhichun, Jee, Kangkook, Rhee, Junghwan, Xiao, Xusheng, Xu, Fengyuan, Wang, Haining, Jiang, Guofei.  2016.  High Fidelity Data Reduction for Big Data Security Dependency Analyses. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. :504–516.

Intrusive multi-step attacks, such as Advanced Persistent Threat (APT) attacks, have plagued enterprises with significant financial losses and are the top reason for enterprises to increase their security budgets. Since these attacks are sophisticated and stealthy, they can remain undetected for years if individual steps are buried in background "noise." Thus, enterprises are seeking solutions to "connect the suspicious dots" across multiple activities. This requires ubiquitous system auditing for long periods of time, which in turn causes overwhelmingly large amount of system audit events. Given a limited system budget, how to efficiently handle ever-increasing system audit logs is a great challenge. This paper proposes a new approach that exploits the dependency among system events to reduce the number of log entries while still supporting high-quality forensic analysis. In particular, we first propose an aggregation algorithm that preserves the dependency of events during data reduction to ensure the high quality of forensic analysis. Then we propose an aggressive reduction algorithm and exploit domain knowledge for further data reduction. To validate the efficacy of our proposed approach, we conduct a comprehensive evaluation on real-world auditing systems using log traces of more than one month. Our evaluation results demonstrate that our approach can significantly reduce the size of system logs and improve the efficiency of forensic analysis without losing accuracy.