Biblio

Filters: Author is Meng, Dan
2023-01-13
Zhao, Lutan, Li, Peinan, Hou, Rui, Huang, Michael C., Qian, Xuehai, Zhang, Lixin, Meng, Dan.  2022.  HyBP: Hybrid Isolation-Randomization Secure Branch Predictor. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA). :346–359.
Recently exposed vulnerabilities reveal the necessity of improving the security of branch predictors. Branch predictors record history about the execution of different processes, and such information from different processes is stored in the same structure and is thus accessible across processes. This leaves attackers with opportunities for malicious training and malicious perception. Physical or logical isolation mechanisms, such as using dedicated tables and flushing during context switches, can provide security but incur non-trivial costs in space and/or execution time. Randomization mechanisms incur a performance cost in a different way: those with higher security add latency to the critical path of the pipeline, while the simpler alternatives leave vulnerabilities to more sophisticated attacks. This paper proposes HyBP, a practical and effective hybrid protection mechanism for building secure branch predictors. The design applies physical isolation and randomization to the right components to achieve the best of both worlds: we protect the smaller tables with physical isolation based on the (thread, privilege) combination, and protect the large tables with randomization. Surprisingly, the physical isolation also significantly enhances the security of the last-level tables by naturally filtering out accesses, reducing the information flow to these bigger tables. As a result, key changes can happen less frequently and be performed conveniently at context switches. Moreover, we propose a latency-hiding design for a strong cipher by precomputing the "code book" with a validated, cryptographically strong cipher. Overall, our design incurs a performance penalty of 0.5%, compared to 5.1% for physical isolation, under the default context-switching interval in Linux.
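A minimal Python sketch may help make the hybrid scheme concrete: small first-level tables are physically isolated per (thread, privilege) context, while a large shared table is index-randomized with a key re-drawn at context switches. The table sizes, the keyed-hash stand-in for the precomputed cipher "code book", and the counter-combination rule below are illustrative assumptions, not the authors' design.

```python
# Illustrative sketch of HyBP's hybrid scheme (not the paper's hardware).
import hashlib
import secrets

SMALL_TABLE_SIZE = 256   # hypothetical per-context first-level table
LARGE_TABLE_SIZE = 4096  # hypothetical shared last-level table

class HybridBranchPredictor:
    def __init__(self):
        # Physical isolation: one small table per (thread, privilege) pair.
        self.small_tables = {}
        # Randomization: one shared large table, indexed via a keyed hash.
        self.large_table = [0] * LARGE_TABLE_SIZE
        self.key = secrets.token_bytes(16)

    def on_context_switch(self):
        # Re-keying is affordable because the isolated first level filters
        # most accesses, reducing information flow into the shared table.
        self.key = secrets.token_bytes(16)

    def _small(self, thread_id, privileged):
        return self.small_tables.setdefault(
            (thread_id, privileged), [0] * SMALL_TABLE_SIZE)

    def _large_index(self, pc):
        # Stand-in for the precomputed "code book" of a strong cipher:
        # a keyed hash decouples the table index from the raw PC.
        digest = hashlib.blake2s(pc.to_bytes(8, "little"), key=self.key).digest()
        return int.from_bytes(digest[:4], "little") % LARGE_TABLE_SIZE

    def predict(self, thread_id, privileged, pc):
        small = self._small(thread_id, privileged)[pc % SMALL_TABLE_SIZE]
        large = self.large_table[self._large_index(pc)]
        # Toy combination rule: prefer the more saturated 2-bit counter.
        counter = small if abs(small - 1.5) >= abs(large - 1.5) else large
        return counter >= 2

    def update(self, thread_id, privileged, pc, taken):
        table = self._small(thread_id, privileged)
        i = pc % SMALL_TABLE_SIZE
        table[i] = min(3, table[i] + 1) if taken else max(0, table[i] - 1)
        j = self._large_index(pc)
        self.large_table[j] = (min(3, self.large_table[j] + 1)
                               if taken else max(0, self.large_table[j] - 1))
```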
2022-03-01
Wang, Xingbin, Zhao, Boyan, Hou, Rui, Awad, Amro, Tian, Zhihong, Meng, Dan.  2021.  NASGuard: A Novel Accelerator Architecture for Robust Neural Architecture Search (NAS) Networks. 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA). :776–789.
Due to the wide deployment of deep learning applications in safety-critical systems, robust and secure execution of deep learning workloads is imperative. Adversarial examples, where inputs are carefully designed to mislead the machine learning model, are among the most challenging attacks to detect and defeat. The dominant approach for defending against adversarial examples is to systematically create a network architecture that is sufficiently robust. Neural Architecture Search (NAS) has been heavily used as the de facto approach to design robust neural network models, using the accuracy of detecting adversarial examples as a key metric of the neural network's robustness. While NAS has proven effective in improving robustness (and accuracy in general), NAS-generated network models run noticeably slower on typical DNN accelerators than hand-crafted networks, mainly because DNN accelerators are not optimized for robust NAS-generated models. In particular, the inherent multi-branch nature of NAS-generated networks causes unacceptable performance and energy overheads. To bridge the gap between the robustness and performance efficiency of deep learning applications, we need to rethink the design of AI accelerators to enable efficient execution of robust (auto-generated) neural networks. In this paper, we propose a novel hardware architecture, NASGuard, which enables efficient inference of robust NAS networks. NASGuard leverages a heuristic multi-branch mapping model to improve the efficiency of the underlying computing resources. Moreover, NASGuard addresses the load-imbalance problem between the computation and memory-access tasks arising from multi-branch parallel computing. Finally, we propose a topology-aware performance prediction model for data prefetching, to fully exploit the temporal and spatial localities of robust NAS-generated architectures. We have implemented NASGuard in Verilog RTL. The evaluation results show that NASGuard achieves an average speedup of 1.74× over the baseline DNN accelerator.
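The multi-branch mapping idea lends itself to a small illustration. The sketch below greedily packs the parallel branches of a NAS cell onto a fixed pool of compute units, longest branch first, to keep per-unit load balanced; it is a conceptual stand-in, not NASGuard's actual heuristic, and all branch names and costs are hypothetical.

```python
# Greedy longest-first packing of NAS-cell branches onto compute units.
import heapq

def map_branches(branch_costs, num_units):
    """Assign each branch (cost = estimated MACs) to the least-loaded unit."""
    # Min-heap of (accumulated_load, unit_id).
    units = [(0.0, u) for u in range(num_units)]
    heapq.heapify(units)
    assignment = {}
    # Longest-processing-time-first reduces imbalance in greedy packing.
    for branch, cost in sorted(branch_costs.items(), key=lambda kv: -kv[1]):
        load, unit = heapq.heappop(units)
        assignment[branch] = unit
        heapq.heappush(units, (load + cost, unit))
    return assignment

# Example: a four-branch NAS cell mapped onto two units.
costs = {"3x3_conv": 9.0, "5x5_conv": 25.0, "1x1_conv": 1.0, "pool": 0.5}
print(map_branches(costs, num_units=2))
```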
2021-10-04
Wang, Kai, Yuan, Fengkai, Hou, Rui, Ji, Zhenzhou, Meng, Dan.  2020.  Capturing and Obscuring Ping-Pong Patterns to Mitigate Continuous Attacks. 2020 Design, Automation Test in Europe Conference Exhibition (DATE). :1408–1413.
In this paper, we observe that Continuous Attacks are a common class of side-channel attack scenario in which an adversary frequently probes the same target cache lines within a short time. Continuous Attacks cause target cache lines to go through multiple load-evict cycles, exhibiting Ping-Pong Patterns. Identifying and obscuring Ping-Pong Patterns effectively interferes with the attacker's probes and mitigates Continuous Attacks. Based on these observations, this paper proposes the Ping-Pong Regulator, which identifies multiple Ping-Pong Patterns and blocks them with different strategies (Preload or Lock). Preload proactively loads target lines into the cache, causing the attacker to mistakenly infer that the victim has accessed these lines; Lock pins the attacked lines' directory entries in the last-level cache directory until they are evicted from the caches, so that an attacker's observation of a locked line is always an L2 cache miss. The experimental evaluation demonstrates that the Ping-Pong Regulator efficiently identifies and secures attacked lines, incurs negligible performance impact and storage overhead, and does not require any software support.
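The detection idea can be sketched in a few lines of Python: count load/evict alternations per cache line over a sliding window and, past a threshold, choose Preload or Lock. The window size, threshold, and the rule for choosing between the two strategies below are hypothetical simplifications of the hardware mechanism.

```python
# Toy ping-pong detector (an illustration, not the paper's RTL).
from collections import defaultdict, deque

WINDOW = 64              # hypothetical number of recent events kept per line
PINGPONG_THRESHOLD = 8   # hypothetical alternation count that triggers action

class PingPongRegulator:
    def __init__(self):
        self.events = defaultdict(lambda: deque(maxlen=WINDOW))

    def record(self, line_addr, event):  # event is "load" or "evict"
        self.events[line_addr].append(event)

    def alternations(self, line_addr):
        # Count load->evict / evict->load transitions in the window.
        ev = list(self.events[line_addr])
        return sum(1 for a, b in zip(ev, ev[1:]) if a != b)

    def mitigation(self, line_addr, victim_owned):
        if self.alternations(line_addr) < PINGPONG_THRESHOLD:
            return None
        # Preload: refill the line so the attacker misreads victim activity.
        # Lock: pin the directory entry so probes always miss in L2.
        return "preload" if victim_owned else "lock"
```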
2021-05-05
Zhu, Jianping, Hou, Rui, Wang, XiaoFeng, Wang, Wenhao, Cao, Jiangfeng, Zhao, Boyan, Wang, Zhongpu, Zhang, Yuhui, Ying, Jiameng, Zhang, Lixin et al.  2020.  Enabling Rack-scale Confidential Computing using Heterogeneous Trusted Execution Environment. 2020 IEEE Symposium on Security and Privacy (SP). :1450–1465.
Despite its huge real-world demand, large-scale confidential computing still cannot be supported by today's Trusted Execution Environments (TEEs), due to the lack of scalable and effective protection for high-throughput accelerators such as GPUs, FPGAs, and TPUs. Although attempts have been made recently to extend CPU-like enclaves to GPUs, these solutions require changes to the CPU or GPU chips, may introduce new security risks due to side-channel leaks in CPU-GPU communication, and are still bound by the resource constraints of today's CPU TEEs. To address these problems, we present the first Heterogeneous TEE design that can truly support large-scale compute- or data-intensive (CDI) computing, without any chip-level change. Our approach, called HETEE, is a device for the centralized management of all computing units (e.g., GPUs and other accelerators) in a server rack. It is uniquely designed to work with today's data centers and clouds, leveraging modern resource-pooling technologies to dynamically compartmentalize computing tasks, enforce strong isolation, and reduce the TCB through hardware support. More specifically, HETEE utilizes the PCIe ExpressFabric to allocate its accelerators to a server node on the same rack for non-sensitive CDI tasks, and moves them back into a secure enclave in response to demand for confidential computing. Our design runs a thin TCB stack for security management on a security controller (SC), while leaving a large set of software (e.g., AI runtimes, GPU drivers) to the integrated microservers that operate the enclaves. An enclave is physically isolated from others through hardware and is verified by the SC at its inception; its microserver and computing units are restored to a secure state upon termination. We implemented HETEE on a real hardware system and evaluated it with popular neural network inference and training tasks. Our evaluations show that HETEE can easily support CDI tasks at real-world scale, incurring a maximum throughput overhead of 2.17% for inference and 0.95% for training on ResNet152.
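The control flow the abstract describes (pooled accelerators moved between ordinary server nodes and attested enclaves by a security controller) can be sketched as a small state machine. The states, the attestation check, and the scrub step below are schematic assumptions, not HETEE's actual implementation.

```python
# Schematic state machine for SC-managed accelerator pooling.
from enum import Enum, auto

class UnitState(Enum):
    POOLED = auto()    # available on the fabric for non-sensitive tasks
    ASSIGNED = auto()  # attached to a server node, outside any enclave
    ENCLAVE = auto()   # isolated inside a verified enclave

class SecurityController:
    def __init__(self, units):
        self.state = {u: UnitState.POOLED for u in units}

    def assign_plain(self, unit):
        # Non-sensitive CDI task: hand the unit to a rack server node.
        assert self.state[unit] is UnitState.POOLED
        self.state[unit] = UnitState.ASSIGNED

    def create_enclave(self, units, measurement, expected):
        # The SC verifies the enclave's software measurement at inception.
        if measurement != expected:
            raise RuntimeError("enclave attestation failed")
        for u in units:
            assert self.state[u] is UnitState.POOLED
            self.state[u] = UnitState.ENCLAVE
        return list(units)

    def destroy_enclave(self, units):
        for u in units:
            self._scrub(u)  # restore unit to a known-good state
            self.state[u] = UnitState.POOLED

    def _scrub(self, unit):
        pass  # stand-in for resetting accelerator and microserver state
```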

2019-05-08
Zhang, Dongxue, Zheng, Yang, Wen, Yu, Xu, Yujue, Wang, Jingchuo, Yu, Yang, Meng, Dan.  2018.  Role-based Log Analysis Applying Deep Learning for Insider Threat Detection. Proceedings of the 1st Workshop on Security-Oriented Designs of Computer Architectures and Processors. :18–20.
Insider threats have shown their great destructive power against information security and financial stability and have received widespread attention from governments and organizations. Traditional intrusion detection systems fail to be effective against insider attacks due to the lack of extensive knowledge of insider behavior patterns. Instead, a more sophisticated method is required to gain a deeper understanding of the activities insiders perform on the information system. In this paper, we design a classifier, a neural network model utilizing Long Short-Term Memory (LSTM), to model user logs as natural language sequences and achieve role-based classification. The LSTM model can learn the behavior patterns of different users by automatically extracting features, and it detects anomalies when log patterns deviate from the trained model. To illustrate the effectiveness of the classification model, we design two experiments based on the CMU dataset. Experimental evaluations show that our model can successfully distinguish different behavior patterns and detect malicious behavior.
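A minimal PyTorch sketch of the role-classification idea: log actions are tokenized into a sequence, an LSTM encodes it, and a linear head predicts the user's role; a session whose predicted probability for the user's true role is low is flagged as anomalous. The vocabulary size, dimensions, and anomaly threshold below are assumptions, not values from the paper.

```python
# Role-based log classification sketch (hypothetical hyperparameters).
import torch
import torch.nn as nn

class RoleLSTM(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, hidden=128, n_roles=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_roles)

    def forward(self, token_ids):          # (batch, seq_len) of action ids
        x = self.embed(token_ids)
        _, (h_n, _) = self.lstm(x)          # final hidden state summarizes the session
        return self.head(h_n[-1])           # role logits

def is_anomalous(model, token_ids, true_role, threshold=0.1):
    # Flag a session whose predicted probability for the user's actual
    # role falls below a (hypothetical) threshold.
    with torch.no_grad():
        probs = torch.softmax(model(token_ids), dim=-1)
    return probs[0, true_role].item() < threshold

model = RoleLSTM()
session = torch.randint(0, 1000, (1, 50))  # one fake 50-action session
print(is_anomalous(model, session, true_role=2))
```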
2019-02-08
Yang, Chun, Wen, Yu, Guo, Jianbin, Song, Haitao, Li, Linfeng, Che, Haoyang, Meng, Dan.  2018.  A Convolutional Neural Network Based Classifier for Uncompressed Malware Samples. Proceedings of the 1st Workshop on Security-Oriented Designs of Computer Architectures and Processors. :15–17.
This paper proposes a deep learning based method for efficient malware classification. Specifically, we convert the malware classification problem into an image classification problem, which can be addressed by leveraging convolutional neural networks (CNNs). For many malware families, the images belonging to the same family have similar contours and textures, so we convert the binary files of malware samples into uncompressed gray-scale images, which preserve the complete information of the original malware without artificial feature extraction. We then design a classifier based on Google's TensorFlow framework, combining deep learning (DL) and malware detection technology. Experimental results show that the uncompressed gray-scale images of the malware are relatively easy to distinguish and that the CNN-based classifier can achieve a high success rate of 98.2%.
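A hedged sketch of the pipeline the abstract outlines: raw bytes of a binary are reshaped into a gray-scale image and fed to a small Keras CNN, in the spirit of the paper's TensorFlow classifier. The image width, the resized input size, the layer shapes, and the number of families are illustrative assumptions.

```python
# Binary-to-image conversion plus a small CNN classifier (illustrative).
import numpy as np
import tensorflow as tf

def binary_to_grayscale(path, width=256):
    """Read a raw binary and reshape its bytes into an (H, width) image."""
    data = np.fromfile(path, dtype=np.uint8)
    height = len(data) // width
    return data[: height * width].reshape(height, width)

def build_cnn(num_families=25, size=64):
    # Inputs are malware images resized to size x size gray-scale.
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(size, size, 1)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_families, activation="softmax"),
    ])

model = build_cnn()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```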