Biblio | CPS-VO

Liu, Weijie, Wang, Wenhao, Chen, Hongbo, Wang, XiaoFeng, Lu, Yaosong, Chen, Kai, Wang, Xinyu, Shen, Qintao, Chen, Yi, Tang, Haixu. 2021. Practical and Efficient In-Enclave Verification of Privacy Compliance. 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). :413–425.

A trusted execution environment (TEE) such as Intel Software Guard Extensions (SGX) runs attestation to prove to a data owner the integrity of the initial state of an enclave, including the program to operate on her data. For this purpose, the data-processing program is supposed to be open to the owner or a trusted third party, so its functionality can be evaluated before trust is established. In the real world, however, increasingly there are application scenarios in which the program itself needs to be protected (e.g., a proprietary algorithm). So its compliance with privacy policies as expected by the data owner should be verified without exposing its code. To this end, this paper presents DEFLECTION, a new model for TEE-based delegated and flexible in-enclave code verification. Given that the conventional solutions do not work well under the resource-limited and TCB-frugal TEE, we come up with a new design inspired by Proof-Carrying Code. Our design strategically moves most of the workload to the code generator, which is responsible for producing easy-to-check code, while keeping the consumer simple. Also, the whole consumer can be made public and verified through a conventional attestation. We implemented this model on Intel SGX and demonstrate that it introduces a very small part of TCB. We also thoroughly evaluated its performance on micro- and macro-benchmarks and real-world applications, showing that the design only incurs a small overhead when enforcing several categories of security policies.

Zhu, Jianping, Hou, Rui, Wang, XiaoFeng, Wang, Wenhao, Cao, Jiangfeng, Zhao, Boyan, Wang, Zhongpu, Zhang, Yuhui, Ying, Jiameng, Zhang, Lixin et al. 2020. Enabling Rack-scale Confidential Computing using Heterogeneous Trusted Execution Environment. 2020 IEEE Symposium on Security and Privacy (SP). :1450–1465.

Despite its huge real-world demand, large-scale confidential computing still cannot be supported by today's Trusted Execution Environment (TEE), due to the lack of scalable and effective protection of high-throughput accelerators like GPUs, FPGAs, and TPUs. Although attempts have been made recently to extend the CPU-like enclave to GPUs, these solutions require changes to the CPU or GPU chips, may introduce new security risks due to the side-channel leaks in CPU-GPU communication, and are still under the resource constraint of today's CPU TEE. To address these problems, we present the first Heterogeneous TEE design that can truly support large-scale compute or data intensive (CDI) computing, without any chip-level change. Our approach, called HETEE, is a device for centralized management of all computing units (e.g., GPUs and other accelerators) of a server rack. It is uniquely designed to work with today's data centres and clouds, leveraging modern resource pooling technologies to dynamically compartmentalize computing tasks, enforce strong isolation, and reduce TCB through hardware support. More specifically, HETEE utilizes the PCIe ExpressFabric to allocate its accelerators to the server node on the same rack for a non-sensitive CDI task, and move them back into a secure enclave in response to the demand for confidential computing. Our design runs a thin TCB stack for security management on a security controller (SC), while leaving a large set of software (e.g., AI runtime, GPU driver, etc.) to the integrated microservers that operate enclaves. An enclave is physically isolated from others through hardware and verified by the SC at its inception. Its microserver and computing units are restored to a secure state upon termination. We implemented HETEE on a real hardware system, and evaluated it with popular neural network inference and training tasks.
Our evaluations show that HETEE can easily support the CDI tasks on the real-world scale and incurred a maximal throughput overhead of 2.17% for inference and 0.95% for training on ResNet152.

Mi, Xianghang, Feng, Xuan, Liao, Xiaojing, Liu, Baojun, Wang, XiaoFeng, Qian, Feng, Li, Zhou, Alrwais, Sumayah, Sun, Limin, Liu, Ying. 2019. Resident Evil: Understanding Residential IP Proxy as a Dark Service. 2019 IEEE Symposium on Security and Privacy (SP). :1185–1201.

An emerging Internet business is residential proxy (RESIP) as a service, in which a provider utilizes the hosts within residential networks (in contrast to those running in a datacenter) to relay their customers' traffic, in an attempt to avoid server-side blocking and detection. Despite the prominent roles the services could play in the underground business world, little has been done to understand whether they are indeed involved in cybercrimes and how they operate, due to the challenges in identifying their RESIPs, not to mention any in-depth analysis on them. In this paper, we report the first study on RESIPs, which sheds light on the behaviors and the ecosystem of these elusive gray services. Our research employed an infiltration framework, including our clients for RESIP services and the servers they visited, to detect 6 million RESIP IPs across 230+ countries and 52K+ ISPs. The observed addresses were analyzed and the hosts behind them were further fingerprinted using a new profiling system. Our effort led to several surprising findings about the RESIP services unknown before. Surprisingly, despite the providers' claim that the proxy hosts are willingly joined, many proxies run on likely compromised hosts including IoT devices. Through cross-matching the hosts we discovered and labeled PUP (potentially unwanted programs) logs provided by a leading IT company, we uncovered various illicit operations RESIP hosts performed, including illegal promotion, fast fluxing, phishing, malware hosting, and others. We also reverse engineered RESIP services' internal infrastructures and uncovered their potential rebranding and reselling behaviors. Our research takes the first step toward understanding this new Internet service, contributing to the effective control of their security risks.

Chen, Yi, You, Wei, Lee, Yeonjoon, Chen, Kai, Wang, XiaoFeng, Zou, Wei. 2017. Mass Discovery of Android Traffic Imprints Through Instantiated Partial Execution. Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. :815–828.

Monitoring network behaviors of mobile applications, controlling their resource access and detecting potentially harmful apps are becoming increasingly important for the security protection of today's organizations, ISPs and carriers. For this purpose, apps need to be identified from their communication, based upon their individual traffic signatures (called imprints in our research). Creating imprints for a large number of apps is nontrivial, due to the challenges in comprehensively analyzing their network activities at a large scale, for millions of apps on today's rapidly-growing app marketplaces. Prior research relies on automatic exploration of an app's user interfaces (UIs) to trigger its network activities, which is less likely to scale given the cost of the operation (at least 5 minutes per app) and its effectiveness (limited coverage of an app's behaviors). In this paper, we present Tiger (Traffic Imprint Generator), a novel technique that makes comprehensive app imprint generation possible at a massive scale. At the center of Tiger is a unique instantiated slicing technique, which aggressively prunes the program slice extracted from the app's network-related code by evaluating each variable's impact on possible network invariants, and removing those unlikely to contribute through assigning them concrete values. In this way, Tiger avoids exploring a large number of program paths unrelated to the app's identifiable traffic, thereby reducing the cost of the code analysis by more than one order of magnitude, in comparison with the conventional slicing and execution approach. Our experiments show that Tiger is capable of recovering an app's full network activities within 18 seconds, achieving over 98% coverage of its identifiable packets and 0.742% false detection rate on app identification.
Further running the technique on over 200,000 real-world Android apps (including 78.23% potentially harmful apps) leads to the discovery of surprising new types of traffic invariants, including fake device information, hardcoded time values, session IDs and credentials, as well as complicated trigger conditions for an app's network activities, such as human involvement, Intent trigger and server-side instructions. Our findings demonstrate that many network activities cannot easily be invoked through automatic UI exploration and code-analysis based approaches present a promising alternative.

Wang, Wenhao, Chen, Guoxing, Pan, Xiaorui, Zhang, Yinqian, Wang, XiaoFeng, Bindschaedler, Vincent, Tang, Haixu, Gunter, Carl A.. 2017. Leaky Cauldron on the Dark Land: Understanding Memory Side-Channel Hazards in SGX. Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. :2421–2434.

Side-channel risks of Intel SGX have recently attracted great attention. Under the spotlight is the newly discovered page-fault attack, in which an OS-level adversary induces page faults to observe the page-level access patterns of a protected process running in an SGX enclave. With almost all proposed defenses focusing on this attack, little is known about whether such efforts indeed raise the bar for the adversary, whether a simple variation of the attack renders all protection ineffective, not to mention an in-depth understanding of other attack surfaces in the SGX system. In the paper, we report the first step toward systematic analyses of side-channel threats that SGX faces, focusing on the risks associated with its memory management. Our research identifies 8 potential attack vectors, ranging from TLB to DRAM modules. More importantly, we highlight the common misunderstandings about SGX memory side channels, demonstrating that high-frequency AEXs can be avoided when recovering an EdDSA secret key through a new page channel, and that fine-grained monitoring of enclave programs (at the level of 64B) can be done through combining both cache and cross-enclave DRAM channels. Our findings reveal the gap between the ongoing security research on SGX and its side-channel weaknesses, redefine the side-channel threat model for secure enclaves, and can provoke a discussion on when to use such a system and how to use it securely.

You, Wei, Zong, Peiyuan, Chen, Kai, Wang, XiaoFeng, Liao, Xiaojing, Bian, Pan, Liang, Bin. 2017. SemFuzz: Semantics-Based Automatic Generation of Proof-of-Concept Exploits. Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. :2139–2154.

Patches and related information about software vulnerabilities are often made available to the public, aiming to facilitate timely fixes. Unfortunately, the slow pace of system updates (30 days on average) often gives attackers enough time to recover hidden bugs for attacking the unpatched systems. Making things worse is the potential to automatically generate exploits on input-validation flaws through reverse-engineering patches, even though such vulnerabilities are relatively rare (e.g., 5% among all Linux kernel vulnerabilities in the last few years). Less understood, however, are the implications of other bug-related information (e.g., bug descriptions in CVE), particularly whether utilization of such information can facilitate exploit generation, even on other vulnerability types that have never been automatically attacked. In this paper, we seek to use such information to generate proof-of-concept (PoC) exploits for the vulnerability types never automatically attacked. Unlike an input validation flaw that is often patched by adding missing sanitization checks, fixing other vulnerability types is more complicated, usually involving replacement of a whole chunk of code. Without an understanding of the code changed, automatic exploit generation becomes less likely. To address this challenge, we present SemFuzz, a novel technique leveraging vulnerability-related text (e.g., CVE reports and Linux git logs) to guide automatic generation of PoC exploits. Such an end-to-end approach is made possible by natural-language processing (NLP) based information extraction and a semantics-based fuzzing process guided by such information. Running over 112 Linux kernel flaws reported in the past five years, SemFuzz successfully triggered 18 of them, and further discovered one zero-day and one undisclosed vulnerability. These flaws include use-after-free, memory corruption, information leak, etc., indicating that more complicated flaws can also be automatically attacked.
This finding calls into question the way vulnerability-related information is shared today.

Demetriou, Soteris, Zhang, Nan, Lee, Yeonjoon, Wang, XiaoFeng, Gunter, Carl A., Zhou, Xiaoyong, Grace, Michael. 2017. HanGuard: SDN-driven Protection of Smart Home WiFi Devices from Malicious Mobile Apps. Proceedings of the 10th ACM Conference on Security and Privacy in Wireless and Mobile Networks. :122–133.

A new development of smart-home systems is to use mobile apps to control IoT devices across a Home Area Network (HAN). As verified in our study, those systems tend to rely on the Wi-Fi router to authenticate other devices. This treatment exposes them to the attack from malicious apps, particularly those running on authorized phones, which the router does not have information to control. Mitigating this threat cannot solely rely on IoT manufacturers, which may need to change the hardware on the devices to support encryption, increasing the cost of the device, or software developers who we need to trust to implement security correctly. In this work, we present a new technique to control the communication between the IoT devices and their apps in a unified, backward-compatible way. Our approach, called HanGuard, does not require any changes to the IoT devices themselves, the IoT apps or the OS of the participating phones. HanGuard uses an SDN-like approach to offer fine-grained protection: each phone runs a non-system userspace Monitor app to identify the party that attempts to access the protected IoT device and inform the router through a control plane of its access decision; the router enforces the decision on the data plane after verifying whether the phone should be allowed to talk to the device. We implemented our design over both Android and iOS (> 95% of mobile OS market share) and a popular router. Our study shows that HanGuard is both efficient and effective in practice.
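
The control-plane/data-plane split described in this abstract can be illustrated with a small sketch. This is not HanGuard's implementation: the class names (`Flow`, `Router`, `Monitor`), the in-memory policy tables, and the app and device identifiers are all hypothetical, chosen only to show a phone-side Monitor reporting a decision and a router enforcing it after its own check on the phone.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flow:
    phone: str   # hypothetical phone identifier on the HAN
    device: str  # hypothetical identifier of a protected IoT device


@dataclass
class Router:
    # Assumed policy store: device -> phones allowed to talk to it.
    authorized: dict[str, set[str]] = field(default_factory=dict)
    # Latest per-flow decisions received from Monitor apps (control plane).
    decisions: dict[Flow, bool] = field(default_factory=dict)

    def control_message(self, flow: Flow, allow: bool) -> None:
        """Control plane: record the Monitor's access decision for a flow."""
        self.decisions[flow] = allow

    def forward(self, flow: Flow) -> bool:
        """Data plane: forward only if the phone is authorized for the
        device and its Monitor has allowed the flow (default: deny)."""
        phone_ok = flow.phone in self.authorized.get(flow.device, set())
        return phone_ok and self.decisions.get(flow, False)


class Monitor:
    """Userspace app on a phone that decides based on the requesting app."""

    def __init__(self, phone: str, router: Router,
                 trusted_apps: dict[str, set[str]]) -> None:
        self.phone = phone
        self.router = router
        self.trusted_apps = trusted_apps  # device -> apps allowed to reach it

    def on_access_attempt(self, app: str, device: str) -> None:
        """Identify the requesting app and report the decision to the router."""
        allow = app in self.trusted_apps.get(device, set())
        self.router.control_message(Flow(self.phone, device), allow)
```

In this sketch a flow is denied until a Monitor explicitly allows it, and a decision from a phone the router does not authorize for the device has no effect, which mirrors the abstract's point that the router verifies the phone before enforcing the Monitor's decision.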

Wang, Shuai, Wang, Wenhao, Bao, Qinkun, Wang, Pei, Wang, XiaoFeng, Wu, Dinghao. 2017. Binary Code Retrofitting and Hardening Using SGX. Proceedings of the 2017 Workshop on Forming an Ecosystem Around Software Transformation. :43–49.

Trusted Execution Environment (TEE) is designed to deliver a safe execution environment for software systems. Intel Software Guard Extensions (SGX) provides isolated memory regions (i.e., SGX enclaves) to protect code and data from adversaries in the untrusted world. While existing research has proposed techniques to execute entire executable files inside enclave instances by providing rich sets of OS facilities, one notable limitation of these techniques is the unavoidably large size of Trusted Computing Base (TCB), which can potentially break the principle of least privilege. In this work, we describe techniques that provide practical and efficient protection of security sensitive code components in legacy binary code. Our technique dissects input binaries into multiple components which are further built into SGX enclave instances. We also leverage deliberately-designed binary editing techniques to retrofit the input binary code and preserve the original program semantics. Our tentative evaluations on hardening AES encryption and decryption procedures demonstrate the practicability and efficiency of the proposed technique.

Liao, Xiaojing, Alrwais, Sumayah, Yuan, Kan, Xing, Luyi, Wang, XiaoFeng, Hao, Shuang, Beyah, Raheem. 2016. Lurking Malice in the Cloud: Understanding and Detecting Cloud Repository As a Malicious Service. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. :1541–1552.

The popularity of cloud hosting services also brings in new security challenges: it has been reported that these services are increasingly utilized by miscreants for their malicious online activities. Mitigating this emerging threat, posed by such "bad repositories" (simply Bar), is challenging due to a hosting strategy that differs from that of traditional hosting services, the lack of direct observations of the repositories by those outside the cloud, the reluctance of the cloud provider to scan its customers' repositories without their consent, and the unique evasion strategies employed by the adversary. In this paper, we took the first step toward understanding and detecting this emerging threat. Using a small set of "seeds" (i.e., confirmed Bars), we identified a set of collective features from the websites they serve (e.g., attempts to hide Bars), which uniquely characterize the Bars. These features were utilized to build a scanner that detected over 600 Bars on leading cloud platforms like Amazon and Google, and 150K sites using them, including popular ones like groupon.com. Highlights of our study include the pivotal roles played by these repositories on malicious infrastructures and other important discoveries, such as how the adversary exploited legitimate cloud repositories and why the adversary uses Bars in the first place, which have never been reported. These findings bring such malicious services to the spotlight and contribute to a better understanding and ultimately the elimination of this new threat.

Liao, Xiaojing, Yuan, Kan, Wang, XiaoFeng, Li, Zhou, Xing, Luyi, Beyah, Raheem. 2016. Acing the IOC Game: Toward Automatic Discovery and Analysis of Open-Source Cyber Threat Intelligence. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. :755–766.

To adapt to the rapidly evolving landscape of cyber threats, security professionals are actively exchanging Indicators of Compromise (IOC) (e.g., malware signatures, botnet IPs) through public sources (e.g., blogs, forums, tweets, etc.). Such information, often presented in articles, posts, white papers etc., can be converted into a machine-readable OpenIOC format for automatic analysis and quick deployment to various security mechanisms like an intrusion detection system. With hundreds of thousands of sources in the wild, the IOC data are produced at a high volume and velocity today, which becomes increasingly hard to manage by humans. Efforts to automatically gather such information from unstructured text, however, are impeded by the limitations of today's Natural Language Processing (NLP) techniques, which cannot meet the high standard (in terms of accuracy and coverage) expected from the IOCs that could serve as direct input to a defense system. In this paper, we present iACE, an innovative solution for fully automated IOC extraction. Our approach is based upon the observation that the IOCs in technical articles are often described in a predictable way: being connected to a set of context terms (e.g., "download") through stable grammatical relations. Leveraging this observation, iACE is designed to automatically locate a putative IOC token (e.g., a zip file) and its context (e.g., "malware", "download") within the sentences in a technical article, and further analyze their relations through a novel application of graph mining techniques. Once the grammatical connection between the tokens is found to be in line with the way that the IOC is commonly presented, these tokens are extracted to generate an OpenIOC item that describes not only the indicator (e.g., a malicious zip file) but also its context (e.g., download from an external source).
Running on 71,000 articles collected from 45 leading technical blogs, this new approach demonstrates a remarkable performance: it generated 900K OpenIOC items with a precision of 95% and a coverage over 90%, which is way beyond what the state-of-the-art NLP technique and industry IOC tool can achieve, at a speed of thousands of articles per hour. Further, by correlating the IOCs mined from the articles published over a 13-year span, our study sheds new light on the links across hundreds of seemingly unrelated attack instances, particularly their shared infrastructure resources, as well as the impacts of such open-source threat intelligence on security protection and evolution of attack strategies.
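
The first stage described in this abstract, locating a putative IOC token together with context terms in the same sentence, can be sketched as follows. This is a deliberately simplified stand-in: it uses plain sentence-level co-occurrence rather than the grammatical-relation analysis and graph mining that iACE applies, and the term list, regular expressions, and function name `candidate_iocs` are assumptions made for illustration only.

```python
from __future__ import annotations

import re

# Hypothetical context terms and IOC patterns, chosen only for illustration.
CONTEXT_TERMS = {"download", "downloads", "downloaded", "malware", "drops"}
IOC_PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "file": re.compile(r"\b[\w-]+\.(?:zip|exe|dll)\b", re.IGNORECASE),
}


def candidate_iocs(article: str) -> list[dict]:
    """Return IOC-like tokens that share a sentence with a context term."""
    results = []
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", article):
        words = {w.lower() for w in re.findall(r"[A-Za-z]+", sentence)}
        context = sorted(words & CONTEXT_TERMS)
        if not context:
            continue  # no context term, so the sentence yields no candidates
        for kind, pattern in IOC_PATTERNS.items():
            for token in pattern.findall(sentence):
                results.append(
                    {"type": kind, "token": token, "context": context}
                )
    return results
```

A sentence that mentions an address without any context term produces no candidate here, which loosely reflects the abstract's observation that an indicator is recognized through its connection to context terms. Co-occurrence alone would be far less precise than the relation analysis the paper describes.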

Alrwais, Sumayah, Yuan, Kan, Alowaisheq, Eihal, Liao, Xiaojing, Oprea, Alina, Wang, XiaoFeng, Li, Zhou. 2016. Catching Predators at Watering Holes: Finding and Understanding Strategically Compromised Websites. Proceedings of the 32nd Annual Conference on Computer Security Applications. :153–166.

Unlike a random, run-of-the-mill website infection, in a strategic web attack, the adversary carefully chooses the target frequently visited by an organization or a group of individuals to compromise, for the purpose of gaining a step closer to the organization or collecting information from the group. This type of attack, called a "watering hole", has been increasingly utilized by APT actors to get into the internal networks of big companies and government agencies or monitor politically oriented groups. Despite its importance, little has been done so far to understand how the attack works, not to mention any concrete step to counter this threat. In this paper, we report our first step toward better understanding this emerging threat, through systematically discovering and analyzing new watering hole instances and attack campaigns. This was made possible by a carefully designed methodology, which repeatedly monitors a large number of potential watering hole targets to detect unusual changes that could be indicative of strategic compromises. Running this system on the HTTP traffic generated from visits to 61K websites for over 5 years, we are able to discover and confirm 17 watering holes and 6 campaigns never reported before. Given that so far there are merely 29 watering holes reported by blogs and technical reports, the findings we made contribute to the research on this attack vector, by adding 59% more attack instances and information about how they work to the public knowledge. Analyzing the new watering holes allows us to gain a deeper understanding of these attacks, such as repeated compromises of political websites, their long lifetimes, unique evasion strategy (leveraging other compromised sites to serve attack payloads) and new exploit techniques (no malware delivery, web only information gathering).
Also, our study brings to light interesting new observations, including the discovery of a recent JSONP attack on an NGO website that has been widely reported and apparently forced the attack to stop.