A Lightweight Error-Resiliency Mechanism for Deep Neural Networks

Title: A Lightweight Error-Resiliency Mechanism for Deep Neural Networks
Publication Type: Conference Paper
Year of Publication: 2021
Authors: Goldstein, Brunno F.; Ferreira, Victor C.; Srinivasan, Sudarshan; Das, Dipankar; Nery, Alexandre S.; Kundu, Sandip; França, Felipe M. G.
Conference Name: 2021 22nd International Symposium on Quality Electronic Design (ISQED)
Keywords: Bit error rate, error resiliency, Hardware, Neural Network Accelerators, neural network resiliency, Neural networks, Power supplies, pubcrawl, Quantization (signal), Real-time Systems, reliability, Reliability engineering, resilience, Resiliency
Abstract: In recent years, Deep Neural Networks (DNNs) have made inroads into a number of applications involving pattern recognition, from facial recognition to self-driving cars. Some of these applications, such as self-driving cars, have real-time requirements, and specialized DNN hardware accelerators help meet them. Since DNN execution time is dominated by convolution, Multiply-and-Accumulate (MAC) units are at the heart of these accelerators. As hardware accelerators push performance limits under strict power constraints, reliability is often compromised. In particular, power-constrained DNN accelerators are more vulnerable to transient and intermittent hardware faults due to particle hits, manufacturing variations, and fluctuations in power supply voltage and temperature. Methods such as hardware replication have been used to deal with these reliability problems in the past. Unfortunately, the duplication approach is untenable in a power-constrained environment. This paper introduces a low-cost error-resiliency scheme that targets MAC units employed in conventional DNN accelerators. We evaluate the reliability improvements from the proposed architecture using a set of 6 CNNs over varying bit error rates (BER) and demonstrate that our proposed solution can achieve more than 99% fault coverage with a 5-bit arithmetic code, complying with the ASIL-D level of the ISO 26262 standard with negligible area and power overhead. Additionally, we evaluate the proposed detection mechanism coupled with a word masking correction scheme, demonstrating no loss of accuracy up to a BER of 10^-2.
DOI: 10.1109/ISQED51717.2021.9424287
Citation Key: goldstein_lightweight_2021
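
Illustrative note: the abstract describes protecting MAC units with a 5-bit arithmetic code and correcting detected errors by word masking, but this record does not give the construction. The Python sketch below is a hypothetical illustration of one common arithmetic code, a residue check modulo 31 (2^5 - 1, so each residue fits in 5 bits). The modulus, the function names `residue`, `checked_mac`, and `masked_mac`, the `fault_mask` fault model, and the reading of "word masking" as replacing a flagged word with zero are all assumptions made for illustration, not the authors' design.

```python
MODULUS = 31  # assumed: 2**5 - 1, so every residue fits in 5 bits


def residue(x: int) -> int:
    """Residue of x under the assumed check modulus."""
    return x % MODULUS


def checked_mac(acc: int, a: int, b: int, fault_mask: int = 0):
    """Compute acc + a*b and check it against a residue-only side path.

    fault_mask is a hypothetical fault model: its set bits are flipped
    in the result word to simulate a hardware fault in the main datapath.
    Returns (result, ok), where ok is False when a mismatch is detected.
    """
    # Main datapath, with the simulated bit flips applied to its output.
    result = (acc + a * b) ^ fault_mask
    # Checker datapath: operates on 5-bit residues only.
    expected = (residue(acc) + residue(a) * residue(b)) % MODULUS
    ok = residue(result) == expected
    return result, ok


def masked_mac(acc: int, a: int, b: int, fault_mask: int = 0) -> int:
    """Assumed word masking: a flagged result word is replaced with zero."""
    result, ok = checked_mac(acc, a, b, fault_mask)
    return result if ok else 0


if __name__ == "__main__":
    print(checked_mac(10, 3, 4))          # fault-free MAC passes the check
    print(checked_mac(10, 3, 4, 1 << 7))  # single bit flip is flagged
    print(masked_mac(10, 3, 4, 1 << 7))   # flagged word is masked to zero
```

In this sketch a single bit flip changes the result by plus or minus a power of two, which is never a multiple of 31, so every single-bit error is flagged. Errors whose arithmetic value is a multiple of 31 pass the check unnoticed, which is why a residue check of this kind gives high but not complete fault coverage.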