Title | Exploiting Variable Precision Computation Array for Scalable Neural Network Accelerators |
Publication Type | Conference Paper |
Year of Publication | 2020 |
Authors | Yang, Shaofei, Liu, Longjun, Li, Baoting, Sun, Hongbin, Zheng, Nanning |
Conference Name | 2020 2nd IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS) |
Keywords | Accelerator, Computational efficiency, convolution, deep neural networks, Dynamic Quantization, encoding, Energy Efficiency Computing Array, neural network resiliency, Neural networks, parallel processing, Quantization (signal), resilience, Resiliency |
Abstract | In this paper, we present a flexible Variable Precision Computation Array (VPCA) component for different accelerators, which leverages a sparsification scheme for activations and a low-bit serial-parallel combination computation unit to improve the efficiency and resiliency of accelerators. The VPCA can dynamically decompose the width of activations/weights (from 32 bits down to 3 bits across different accelerators) into 2-bit serial computation units, while the 2-bit computing units can be combined for parallel computation to achieve high throughput. We propose an on-the-fly compressing and calculating strategy, SLE-CLC (single lane encoding, cross lane calculation), which further improves the performance of 2-bit parallel computing. Experimental results on image classification datasets show that VPCA outperforms DaDianNao, Stripes, and Loom-2bit by 4.67x, 2.42x, and 1.52x, respectively, without additional overhead on convolution layers. |
DOI | 10.1109/AICAS48895.2020.9073832 |
Citation Key | yang_exploiting_2020 |
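The abstract's core idea, decomposing a full-precision multiply into 2-bit slices that are computed serially and recombined with shifts, can be illustrated with a short sketch. This is not the paper's implementation; the function name and the fixed 8-bit activation width are illustrative assumptions.

```python
# Sketch of 2-bit serial multiplication (illustrative, not the paper's VPCA design):
# an n-bit multiply decomposed into 2-bit slices of the activation, processed
# serially and accumulated with shifts -- the principle behind variable-precision
# bit-serial compute units.

def two_bit_serial_mac(weight: int, activation: int, act_bits: int = 8) -> int:
    """Multiply weight by activation using 2-bit slices of the activation.

    Each 2-bit slice multiplies the full weight; the partial product is
    shifted into place and accumulated, so narrower activations simply
    need fewer serial steps.
    """
    acc = 0
    for i in range(0, act_bits, 2):
        slice2 = (activation >> i) & 0b11   # current 2-bit slice
        acc += (slice2 * weight) << i       # shift partial product into place
    return acc
```

The throughput gain of such schemes comes from precision scaling: a 3-bit activation needs only two serial steps instead of sixteen for 32 bits, and the freed 2-bit units can be combined to process other lanes in parallel.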