Unreliable memory operation on a convolutional neural network processor

Submitted by grigby1 on Thu, 06/07/2018 - 3:06pm

Title	Unreliable memory operation on a convolutional neural network processor
Publication Type	Conference Paper
Year of Publication	2017
Authors	Marques, J., Andrade, J., Falcao, G.
Conference Name	2017 IEEE International Workshop on Signal Processing Systems (SiPS)
Keywords	bit protection, bit-cells, classification challenges, CNN resilience, convolutional neural network processor, data elements, Degradation, detection challenges, DRAM chips, embedded dynamic RAM system, Embedded systems, error probability degradation, fault diagnosis, fault mitigation strategies, Fault tolerance, fault tolerant computing, feature maps memory space, hardware memories, inference capabilities, Kernel, Memory management, MNIST dataset, neural nets, Neural Network Resilience, power aware computing, pubcrawl, Random access memory, reliability, resilience, Resiliency, severe fault-injection rates, size 28.0 nm, software fault tolerance, storage management chips, Training, unreliable memory operation
Abstract	The evolution of convolutional neural networks (CNNs) into more complex forms of organization, with additional layers, larger convolutions and increasing connections, established the state-of-the-art in terms of accuracy errors for detection and classification challenges in images. Moreover, as they evolved to a point where gigabytes of memory are required for their operation, we have reached a stage where it becomes fundamental to understand how their inference capabilities can be impaired if data elements somehow become corrupted in memory. This paper introduces fault-injection in these systems by simulating failing bit-cells in hardware memories brought on by relaxing the 100% reliable operation assumption. We analyze the behavior of these networks calculating inference under severe fault-injection rates and apply fault mitigation strategies to improve the CNNs' resilience. For the MNIST dataset, we show that 8x less memory is required for the feature maps memory space, and that in sub-100% reliable operation, the networks withstand fault-injection rates up to 10^-1 (with most significant bit protection) with only a 1% degradation in error probability. Furthermore, considering the offload of the feature maps memory to an embedded dynamic RAM (eDRAM) system, using technology nodes from 65 down to 28 nm, power-efficiency improvements of up to 73-80% can be obtained.
URL	https://ieeexplore.ieee.org/document/8110024/
DOI	10.1109/SiPS.2017.8110024
Citation Key	marques_unreliable_2017
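The abstract describes injecting faults by simulating failing bit-cells in the feature-map memory and protecting the most significant bit (MSB). The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes that each feature-map element is stored as an unsigned word `bits` wide and that every unprotected bit-cell fails independently with probability `p`. The function name `inject_faults` and its parameters are hypothetical, and the paper's actual fault model and data format may differ.

```python
import random


def inject_faults(words, bits, p, protect_msb=False, rng=None):
    """Return a copy of `words` with each stored bit flipped independently with probability p.

    Assumptions (illustrative only): every word occupies `bits` bit-cells,
    bit-cell failures are independent, and when `protect_msb` is True the
    most significant bit-cell is treated as fully reliable.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if rng is None:
        rng = random.Random()
    msb = bits - 1
    faulty = []
    for w in words:
        if not 0 <= w < (1 << bits):
            raise ValueError(f"{w} does not fit in {bits} bits")
        for b in range(bits):
            if protect_msb and b == msb:
                continue  # protected bit-cell: assumed 100% reliable
            if rng.random() < p:
                w ^= 1 << b  # simulate a failing bit-cell by flipping bit b
        faulty.append(w)
    return faulty


# Usage: with p = 1.0 every unprotected bit flips, so 0b1010 -> 0b1101.
print(inject_faults([0b1010, 0b0000], bits=4, p=1.0, protect_msb=True))
```

The example prints `[13, 7]`, which is 0b1101 and 0b0111: the lower three bits flip while the MSB is preserved. Protecting only the MSB is a cheap choice. On a `bits`-wide unsigned word, a flip in the MSB shifts the value by half its range, whereas a flip in a low-order bit perturbs it by a small amount.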

Groups:

Science of Security VO