Visible to the public Biblio

Filters: Author is Zhao, Boyan  [Clear All Filters]
2022-03-01
Wang, Xingbin, Zhao, Boyan, HOU, RUI, Awad, Amro, Tian, Zhihong, Meng, Dan.  2021.  NASGuard: A Novel Accelerator Architecture for Robust Neural Architecture Search (NAS) Networks. 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA). :776–789.
Due to the wide deployment of deep learning applications in safety-critical systems, robust and secure execution of deep learning workloads is imperative. Adversarial examples, where the inputs are carefully designed to mislead the machine learning model is among the most challenging attacks to detect and defeat. The most dominant approach for defending against adversarial examples is to systematically create a network architecture that is sufficiently robust. Neural Architecture Search (NAS) has been heavily used as the de facto approach to design robust neural network models, by using the accuracy of detecting adversarial examples as a key metric of the neural network's robustness. While NAS has been proven effective in improving the robustness (and accuracy in general), the NAS-generated network models run noticeably slower on typical DNN accelerators than the hand-crafted networks, mainly because DNN accelerators are not optimized for robust NAS-generated models. In particular, the inherent multi-branch nature of NAS-generated networks causes unacceptable performance and energy overheads.To bridge the gap between the robustness and performance efficiency of deep learning applications, we need to rethink the design of AI accelerators to enable efficient execution of robust (auto-generated) neural networks. In this paper, we propose a novel hardware architecture, NASGuard, which enables efficient inference of robust NAS networks. NASGuard leverages a heuristic multi-branch mapping model to improve the efficiency of the underlying computing resources. Moreover, NASGuard addresses the load imbalance problem between the computation and memory-access tasks from multi-branch parallel computing. Finally, we propose a topology-aware performance prediction model for data prefetching, to fully exploit the temporal and spatial localities of robust NAS-generated architectures. We have implemented NASGuard with Verilog RTL. The evaluation results show that NASGuard achieves an average speedup of 1.74× over the baseline DNN accelerator.
2021-05-05
Zhu, Jianping, HOU, RUI, Wang, XiaoFeng, Wang, Wenhao, Cao, Jiangfeng, Zhao, Boyan, Wang, Zhongpu, Zhang, Yuhui, Ying, Jiameng, Zhang, Lixin et al..  2020.  Enabling Rack-scale Confidential Computing using Heterogeneous Trusted Execution Environment. 2020 IEEE Symposium on Security and Privacy (SP). :1450—1465.

With its huge real-world demands, large-scale confidential computing still cannot be supported by today's Trusted Execution Environment (TEE), due to the lack of scalable and effective protection of high-throughput accelerators like GPUs, FPGAs, and TPUs etc. Although attempts have been made recently to extend the CPU-like enclave to GPUs, these solutions require change to the CPU or GPU chips, may introduce new security risks due to the side-channel leaks in CPU-GPU communication and are still under the resource constraint of today's CPU TEE.To address these problems, we present the first Heterogeneous TEE design that can truly support large-scale compute or data intensive (CDI) computing, without any chip-level change. Our approach, called HETEE, is a device for centralized management of all computing units (e.g., GPUs and other accelerators) of a server rack. It is uniquely designed to work with today's data centres and clouds, leveraging modern resource pooling technologies to dynamically compartmentalize computing tasks, and enforce strong isolation and reduce TCB through hardware support. More specifically, HETEE utilizes the PCIe ExpressFabric to allocate its accelerators to the server node on the same rack for a non-sensitive CDI task, and move them back into a secure enclave in response to the demand for confidential computing. Our design runs a thin TCB stack for security management on a security controller (SC), while leaving a large set of software (e.g., AI runtime, GPU driver, etc.) to the integrated microservers that operate enclaves. An enclaves is physically isolated from others through hardware and verified by the SC at its inception. Its microserver and computing units are restored to a secure state upon termination.We implemented HETEE on a real hardware system, and evaluated it with popular neural network inference and training tasks. Our evaluations show that HETEE can easily support the CDI tasks on the real-world scale and incurred a maximal throughput overhead of 2.17% for inference and 0.95% for training on ResNet152.