Biblio
Filters: Author is Kundu, Sandip [Clear All Filters]
A Lightweight Error-Resiliency Mechanism for Deep Neural Networks. 2021 22nd International Symposium on Quality Electronic Design (ISQED). :311–316.
.
2021. In recent years, Deep Neural Networks (DNNs) have made inroads into a number of applications involving pattern recognition - from facial recognition to self-driving cars. Some of these applications, such as self-driving cars, have real-time requirements, where specialized DNN hardware accelerators help meet those requirements. Since DNN execution time is dominated by convolution, Multiply-and-Accumulate (MAC) units are at the heart of these accelerators. As hardware accelerators push the performance limits with strict power constraints, reliability is often compromised. In particular, power-constrained DNN accelerators are more vulnerable to transient and intermittent hardware faults due to particle hits, manufacturing variations, and fluctuations in power supply voltage and temperature. Methods such as hardware replication have been used to deal with these reliability problems in the past. Unfortunately, the duplication approach is untenable in a power constrained environment. This paper introduces a low-cost error-resiliency scheme that targets MAC units employed in conventional DNN accelerators. We evaluate the reliability improvements from the proposed architecture using a set of 6 CNNs over varying bit error rates (BER) and demonstrate that our proposed solution can achieve more than 99% of fault coverage with a 5-bits arithmetic code, complying with the ASIL-D level of ISO26262 standards with a negligible area and power overhead. Additionally, we evaluate the proposed detection mechanism coupled with a word masking correction scheme, demonstrating no loss of accuracy up to a BER of 10-2.
MILR: Mathematically Induced Layer Recovery for Plaintext Space Error Correction of CNNs. 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). :75–87.
.
2021. The increased use of Convolutional Neural Networks (CNN) in mission-critical systems has increased the need for robust and resilient networks in the face of both naturally occurring faults as well as security attacks. The lack of robustness and resiliency can lead to unreliable inference results. Current methods that address CNN robustness require hardware modification, network modification, or network duplication. This paper proposes MILR a software-based CNN error detection and error correction system that enables recovery from single and multi-bit errors. The recovery capabilities are based on mathematical relationships between the inputs, outputs, and parameters(weights) of the layers; exploiting these relationships allows the recovery of erroneous parameters (iveights) throughout a layer and the network. MILR is suitable for plaintext-space error correction (PSEC) given its ability to correct whole-weight and even whole-layer errors in CNNs.