Unreliable memory operation on a convolutional neural network processor
Title | Unreliable memory operation on a convolutional neural network processor |
Publication Type | Conference Paper |
Year of Publication | 2017 |
Authors | Marques, J., Andrade, J., Falcao, G. |
Conference Name | 2017 IEEE International Workshop on Signal Processing Systems (SiPS) |
Keywords | bit protection, bit-cells, classification challenges, CNN resilience, convolutional neural network processor, data elements, Degradation, detection challenges, DRAM chips, embedded dynamic RAM system, Embedded systems, error probability degradation, fault diagnosis, fault mitigation strategies, Fault tolerance, fault tolerant computing, feature maps memory space, hardware memories, inference capabilities, Kernel, Memory management, MNIST dataset, neural nets, Neural Network Resilience, power aware computing, pubcrawl, Random access memory, reliability, resilience, Resiliency, severe fault-injection rates, size 28.0 nm, software fault tolerance, storage management chips, Training, unreliable memory operation |
Abstract | The evolution of convolutional neural networks (CNNs) into more complex forms of organization, with additional layers, larger convolutions and increasing connections, established the state-of-the-art in terms of accuracy errors for detection and classification challenges in images. Moreover, as they evolved to a point where gigabytes of memory are required for their operation, we have reached a stage where it becomes fundamental to understand how their inference capabilities can be impaired if data elements somehow become corrupted in memory. This paper introduces fault-injection in these systems by simulating failing bit-cells in hardware memories brought on by relaxing the 100% reliable operation assumption. We analyze the behavior of these networks performing inference under severe fault-injection rates and apply fault mitigation strategies to improve the CNNs' resilience. For the MNIST dataset, we show that 8x less memory is required for the feature maps memory space, and that in sub-100% reliable operation, fault-injection rates up to 10⁻¹ (with most significant bit protection) cause only a 1% error probability degradation. Furthermore, considering the offload of the feature maps memory to an embedded dynamic RAM (eDRAM) system, using technology nodes from 65 down to 28 nm, up to 73 to 80% improved power efficiency can be obtained. |
URL | https://ieeexplore.ieee.org/document/8110024/ |
DOI | 10.1109/SiPS.2017.8110024 |
Citation Key | marques_unreliable_2017 |
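The abstract describes simulating failing bit-cells in the feature-map memory and protecting the most significant bit (MSB). The snippet below is a minimal sketch of that idea, not the authors' simulator: it assumes each stored word is an unsigned integer of a chosen bit width (8 bits here, an assumption not stated in the abstract), that every unprotected bit fails independently with the same probability, and that the top bit(s) are kept in reliable storage. The function names `inject_bit_faults` and `flipped_bit_fraction` are hypothetical.

```python
import random


def inject_bit_faults(words, bit_width, p_fault, protected_msbs=1, rng=None):
    """Return a copy of `words` with random bit-cell failures applied.

    Each of the low (bit_width - protected_msbs) bits of every word is flipped
    independently with probability p_fault. The top `protected_msbs` bits are
    assumed to live in reliable storage and are never flipped.
    """
    if not 0 <= protected_msbs <= bit_width:
        raise ValueError("protected_msbs must be between 0 and bit_width")
    if rng is None:
        rng = random.Random()
    unprotected = bit_width - protected_msbs
    faulty = []
    for w in words:
        flip_mask = 0
        for b in range(unprotected):
            if rng.random() < p_fault:
                flip_mask |= 1 << b  # this bit-cell failed: invert the stored bit
        faulty.append(w ^ flip_mask)
    return faulty


def flipped_bit_fraction(original, faulty, bit_width, protected_msbs=1):
    """Observed fraction of unprotected bits that differ between two copies."""
    unprotected = bit_width - protected_msbs
    if unprotected == 0 or not original:
        return 0.0
    mask = (1 << unprotected) - 1
    flipped = sum(bin((a ^ b) & mask).count("1") for a, b in zip(original, faulty))
    return flipped / (unprotected * len(original))


# Hypothetical 8-bit feature map (bit width and values are illustrative assumptions).
rng = random.Random(0)
fmap = [rng.randrange(256) for _ in range(10_000)]
faulty = inject_bit_faults(fmap, bit_width=8, p_fault=0.1, protected_msbs=1, rng=rng)
print(f"observed unprotected bit-flip rate: {flipped_bit_fraction(fmap, faulty, 8):.3f}")
```

Protecting the MSB is worthwhile because a flip there shifts an unsigned n-bit value by 2^(n-1), half its full range (or inverts the sign in two's complement), whereas a flip in a low-order bit perturbs the value only slightly.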