Biblio
Filters: Author is Li, Xiaowei [Clear All Filters]
Real-Time Meets Approximate Computing: An Elastic CNN Inference Accelerator with Adaptive Trade-off Between QoS and QoR. Proceedings of the 54th Annual Design Automation Conference 2017. :33:1–33:6.
.
2017. Due to the recent progress in deep learning and neural acceleration architectures, specialized deep neural network or convolutional neural network (CNNs) accelerators are expected to provide an energy-efficient solution for real-time vision/speech processing. recognition and a wide spectrum of approximate computing applications. In addition to their wide applicability scope, we also found that the fascinating feature of deterministic performance and high energy-efficiency, makes such deep learning (DL) accelerators ideal candidates as application-processor IPs in embedded SoCs concerned with real-time processing. However, unlike traditional accelerator designs, DL accelerators introduce a new aspect of design trade-off between real-time processing (QoS) and computation approximation (QoR) into embedded systems. This work proposes an elastic CNN acceleration architecture that automatically adapts to the hard QoS constraint by exploiting the error-resilience in typical approximate computing workloads For the first time, the proposed design, including network tuning-and-mapping software and reconfigurable accelerator hardware, aims to reconcile the design constraint of QoS and Quality of Result (QoR). which are respectively the key concerns in real-time and approximate computing. It is shown in experiments that the proposed architecture enables the embedded system to work flexibly in an expanded operating space, significantly enhances its real-time ability. and maximizes the energy-efficiency of system within the user-specified QoS-QoR constraint through self-reconfiguration.