Title | Real-Time Meets Approximate Computing: An Elastic CNN Inference Accelerator with Adaptive Trade-off Between QoS and QoR |
Publication Type | Conference Paper |
Year of Publication | 2017 |
Authors | WANG, YING, Li, Huawei, Li, Xiaowei |
Conference Name | Proceedings of the 54th Annual Design Automation Conference 2017 |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-4503-4927-7 |
Keywords | Neural Network Resilience, pubcrawl, resilience, Resiliency |
Abstract | Due to the recent progress in deep learning and neural acceleration architectures, specialized deep neural network or convolutional neural network (CNNs) accelerators are expected to provide an energy-efficient solution for real-time vision/speech processing. recognition and a wide spectrum of approximate computing applications. In addition to their wide applicability scope, we also found that the fascinating feature of deterministic performance and high energy-efficiency, makes such deep learning (DL) accelerators ideal candidates as application-processor IPs in embedded SoCs concerned with real-time processing. However, unlike traditional accelerator designs, DL accelerators introduce a new aspect of design trade-off between real-time processing (QoS) and computation approximation (QoR) into embedded systems. This work proposes an elastic CNN acceleration architecture that automatically adapts to the hard QoS constraint by exploiting the error-resilience in typical approximate computing workloads For the first time, the proposed design, including network tuning-and-mapping software and reconfigurable accelerator hardware, aims to reconcile the design constraint of QoS and Quality of Result (QoR). which are respectively the key concerns in real-time and approximate computing. It is shown in experiments that the proposed architecture enables the embedded system to work flexibly in an expanded operating space, significantly enhances its real-time ability. and maximizes the energy-efficiency of system within the user-specified QoS-QoR constraint through self-reconfiguration. |
URL | http://doi.acm.org/10.1145/3061639.3062307 |
DOI | 10.1145/3061639.3062307 |
Citation Key | wang_real-time_2017 |