Visible to the public Biblio

Filters: Keyword is KU  [Clear All Filters]
2022-03-07
Vaidya, Ruturaj, Kulkarni, Prasad A., Jantz, Michael R..  2021.  Explore Capabilities and Effectiveness of Reverse Engineering Tools to Provide Memory Safety for Binary Programs. Information Security Practice and Experience. :11–31.
Any technique to ensure memory safety requires knowledge of (a) precise array bounds and (b) the data types accessed by memory load/store and pointer move instructions (called, owners) in the program. While this information can be effectively derived by compiler-level approaches much of this information may be lost during the compilation process and become unavailable to binary-level tools. In this work we conduct the first detailed study on how accurately can this information be extracted or reconstructed by current state-of-the-art static reverse engineering (RE) platforms for binaries compiled with and without debug symbol information. Furthermore, it is also unclear how the imprecision in array bounds and instruction owner information that is obtained by the RE tools impacts the ability of techniques to detect illegal memory accesses at run-time. We study this issue by designing, building, and deploying a novel binary-level technique to assess the properties and effectiveness of the information provided by the static RE algorithms in the first stage to guide the run-time instrumentation to detect illegal memory accesses in the decoupled second stage. Our work explores the limitations and challenges for static binary analysis tools to develop accurate binary-level techniques to detect memory errors.
2020-03-09
Waqar Ali, Heechul Yun.  2019.  RT-Gang: Real-Time Gang Scheduling Framework for Safety-Critical Systems. Real-Time and Embedded Technology and Applications Symposium (RTAS). :143-155.

In this paper, we present RT-Gang: a novel real-time gang scheduling framework that enforces a one-gang-at-a-time policy. We find that, in a multicore platform, co-scheduling multiple parallel real-time tasks would require highly pessimistic worst-case execution time (WCET) and schedulability analysis - even when there are enough cores - due to contention in shared hardware resources such as cache and DRAM controller. In RT-Gang, all threads of a parallel real-time task form a real-time gang and the scheduler globally enforces the one-gang-at-a-time scheduling policy to guarantee tight and accurate task WCET. To minimize under-utilization, we integrate a state-of-the-art memory bandwidth throttling framework to allow safe execution of best-effort tasks. Specifically, any idle cores, if exist, are used to schedule best-effort tasks but their maximum memory bandwidth usages are strictly throttled to tightly bound interference to real-time gang tasks. We implement RT-Gang in the Linux kernel and evaluate it on two representative embedded multicore platforms using both synthetic and real-world DNN workloads. The results show that RT-Gang dramatically improves system predictability and the overhead is negligible.

Farzad Farshchi, Qijing Huang, Heechul Yun.  2019.  Integrating NVIDIA Deep Learning Accelerator (NVDLA) with RISC-V SoC on FireSim. Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications.

NVDLA is an open-source deep neural network (DNN) accelerator which has received a lot of attention by the community since its introduction by Nvidia. It is a full-featured hardware IP and can serve as a good reference for conducting research and development of SoCs with integrated accelerators. However, an expensive FPGA board is required to do experiments with this IP in a real SoC. Moreover, since NVDLA is clocked at a lower frequency on an FPGA, it would be hard to do accurate performance analysis with such a setup. To overcome these limitations, we integrate NVDLA into a real RISC-V SoC on the Amazon could FPGA using FireSim, a cycle-exact FPGA-accelerated simulator. We then evaluate the performance of NVDLA by running YOLOv3 object-detection algorithm. Our results show that NVDLA can sustain 7.5 fps when running YOLOv3. We further analyze the performance by showing that sharing the last-level cache with NVDLA can result in up to 1.56x speedup. We then identify that sharing the memory system with the accelerator can result in unpredictable execution time for the real-time tasks running on this platform. We believe this is an important issue that must be addressed in order for on-chip DNN accelerators to be incorporated in real-time embedded systems.

Michael Bechtel, Heechul Yun.  2019.  Denial-of-Service Attacks on Shared Cache in Multicore: Analysis and Prevention. Real-Time and Embedded Technology and Applications Symposium (RTAS). :357-367.

In this paper we investigate the feasibility of denial-of-service (DoS) attacks on shared caches in multicore platforms. With carefully engineered attacker tasks, we are able to cause more than 300X execution time increases on a victim task running on a dedicated core on a popular embedded multicore platform, regardless of whether we partition its shared cache or not. Based on careful experimentation on real and simulated multicore platforms, we identify an internal hardware structure of a non-blocking cache, namely the cache writeback buffer, as a potential target of shared cache DoS attacks. We propose an OS-level solution to prevent such DoS attacks by extending a state-of-the-art memory bandwidth regulation mechanism. We implement the proposed mechanism in Linux on a real multicore platform and show its effectiveness in protecting against cache DoS attacks.

Jacob Fustos, Farzad Farshchi, Heechul Yun.  2019.  SpectreGuard: An Efficient Data-Centric Defense Mechanism against Spectre Attacks. Proceedings of the 56th Annual Design Automation Conference 2019.

Speculative execution is an essential performance enhancing technique in modern processors, but it has been shown to be insecure. In this paper, we propose SpectreGuard, a novel defense mechanism against Spectre attacks. In our approach, sensitive memory blocks (e.g., secret keys) are marked using simple OS/library API, which are then selectively protected by hardware from Spectre attacks via low-cost micro-architecture extension. This technique allows microprocessors to maintain high performance, while restoring the control to software developers to make security and performance trade-offs.

2018-10-12
Heechul Yun, Michael Bechtel, Elise McEllhiney, Minje Kim.  2018.  DeepPicar: A Low-cost Deep Neural Network-based Autonomous Car. IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA). :11-21.

We present DeepPicar, a low-cost deep neural network based autonomous car platform. DeepPicar is a small scale replication of a real self-driving car called DAVE-2 by NVIDIA. DAVE-2 uses a deep convolutional neural network (CNN), which takes images from a front-facing camera as input and produces car steering angles as output. DeepPicar uses the same network architecture—9 layers, 27 million connections and 250K parameters—and can drive itself in real-time using a web camera and a Raspberry Pi 3 quad-core platform. Using DeepPicar, we analyze the Pi 3’s computing capabilities to support end-to-end deep learning based real-time control of autonomous vehicles. We also systematically compare other contemporary embedded computing platforms using the DeepPicar’s CNN-based real-time control workload. We find that all tested platforms, including the Pi 3, are capable of supporting the CNN-based real-time control, from 20 Hz up to 100 Hz, depending on hardware platform. However, we find that shared resource contention remains an important issue that must be considered in applying CNN models on shared memory based embedded computing platforms; we observe up to 11.6X execution time increase in the CNN based control loop due to shared resource contention. To protect the CNN workload, we also evaluate state-of-the-art cache partitioning and memory bandwidth throttling techniques on the Pi 3. We find that cache partitioning is ineffective, while memory bandwidth throttling is an effective solution.

2018-07-16
Yang, Lei, Li, Fengjun.  2018.  Cloud-Assisted Privacy-Preserving Classification for IoT Applications. IEEE Conference on Communications and Network Security.

The explosive proliferation of Internet of Things (IoT) devices is generating an incomprehensible amount of data. Machine learning plays an imperative role in aggregating this data and extracting valuable information for improving operational and decision-making processes. In particular, emerging machine intelligence platforms that host pre-trained machine learning models are opening up new opportunities for IoT industries. While those platforms facilitate customers to analyze IoT data and deliver faster and accurate insights, end users and machine learning service providers (MLSPs) have raised concerns regarding security and privacy of IoT data as well as the pre-trained machine learning models for certain applications such as healthcare, smart energy, etc. In this paper, we propose a cloud-assisted, privacy-preserving machine learning classification scheme over encrypted data for IoT devices. Our scheme is based on a three-party model coupled with a two-stage decryption Paillier-based cryptosystem, which allows a cloud server to interact with MLSPs on behalf of the resource-constrained IoT devices in a privacy-preserving manner, and shift load of computation-intensive classification operations from them. The detailed security analysis and the extensive simulations with different key lengths and number of features and classes demonstrate that our scheme can effectively reduce the overhead for IoT devices in machine learning classification applications.