Exploiting Variable Precision Computation Array for Scalable Neural Network Accelerators

Title: Exploiting Variable Precision Computation Array for Scalable Neural Network Accelerators
Publication Type: Conference Paper
Year of Publication: 2020
Authors: Yang, Shaofei; Liu, Longjun; Li, Baoting; Sun, Hongbin; Zheng, Nanning
Conference Name: 2020 2nd IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)
Keywords: Accelerator, Computational efficiency, convolution, deep neural networks, Dynamic Quantization, encoding, Energy Efficiency Computing Array, Microsoft Windows, neural network resiliency, Neural networks, parallel processing, pubcrawl, Quantization (signal), resilience, Resiliency
Abstract: In this paper, we present a flexible Variable Precision Computation Array (VPCA) component for different accelerators, which leverages a sparsification scheme for activations and a low-bit serial-parallel combination computation unit to improve the efficiency and resiliency of accelerators. The VPCA can dynamically decompose the width of activations/weights (from 32-bit to 3-bit across different accelerators) into 2-bit serial computation units, while the 2-bit computing units can be combined for parallel computing to achieve high throughput. We propose an on-the-fly compressing and calculating strategy, SLE-CLC (single lane encoding, cross lane calculation), which further improves the performance of 2-bit parallel computing. Experimental results on image classification datasets show that VPCA outperforms DaDianNao, Stripes, and Loom-2bit by 4.67x, 2.42x, and 1.52x respectively on convolution layers, without additional overhead.
DOI: 10.1109/AICAS48895.2020.9073832
Citation Key: yang_exploiting_2020
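The core idea in the abstract, decomposing an operand into 2-bit slices, multiplying each slice serially, and accumulating the shifted partial products, can be sketched in a few lines. This is a minimal illustration of generic 2-bit bit-serial multiplication for unsigned operands, not the paper's actual VPCA hardware design or its SLE-CLC encoding; the function names are hypothetical.

```python
def to_2bit_slices(value, bits=8):
    """Split an unsigned integer into little-endian 2-bit slices."""
    return [(value >> s) & 0b11 for s in range(0, bits, 2)]

def serial_mac(activation, weight, bits=8):
    """Multiply by accumulating shifted 2-bit partial products.

    Each 2-bit slice of the activation is multiplied by the full weight,
    then shifted into its place value -- the serial half of the
    serial-parallel scheme the abstract describes. (Illustrative only.)
    """
    acc = 0
    for i, s in enumerate(to_2bit_slices(activation, bits)):
        acc += (s * weight) << (2 * i)  # shift partial product into position
    return acc

# The decomposition is exact: the serial result equals the direct product.
assert serial_mac(173, 57) == 173 * 57
```

In a hardware array, each 2-bit slice would occupy one cycle of a serial lane, and multiple lanes would run in parallel across different activations for throughput; skipping all-zero slices is what makes a variable effective precision (e.g. 3-bit operands finishing in fewer cycles than 32-bit ones) pay off.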