Exploiting Variable Precision Computation Array for Scalable Neural Network Accelerators

Title: Exploiting Variable Precision Computation Array for Scalable Neural Network Accelerators
Publication Type: Conference Paper
Year of Publication: 2020
Authors: Yang, Shaofei; Liu, Longjun; Li, Baoting; Sun, Hongbin; Zheng, Nanning
Conference Name: 2020 2nd IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)
Keywords: Accelerator, Computational efficiency, convolution, deep neural networks, Dynamic Quantization, encoding, Energy Efficiency Computing Array, Microsoft Windows, neural network resiliency, Neural networks, parallel processing, pubcrawl, Quantization (signal), resilience, Resiliency
Abstract: In this paper, we present a flexible Variable Precision Computation Array (VPCA) component for different accelerators, which leverages a sparsification scheme for activations and a low-bit serial-parallel combination computation unit to improve the efficiency and resiliency of accelerators. The VPCA can dynamically decompose the width of activations/weights (from 32-bit to 3-bit across different accelerators) into 2-bit serial computation units, while the 2-bit computing units can be combined for parallel computing to achieve high throughput. We propose an on-the-fly compressing and calculating strategy, SLE-CLC (single lane encoding, cross lane calculation), which further improves the performance of 2-bit parallel computing. Experimental results on image classification datasets show that VPCA outperforms DaDianNao, Stripes, and Loom-2bit by 4.67x, 2.42x, and 1.52x respectively on convolution layers, without additional overhead.
DOI: 10.1109/AICAS48895.2020.9073832
Citation Key: yang_exploiting_2020
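The core idea in the abstract, decomposing an operand into 2-bit slices, multiplying each slice serially, and accumulating the shifted partial products, can be sketched in a few lines. This is a minimal illustration of generic 2-bit bit-serial multiplication for unsigned operands, not the paper's actual VPCA hardware design or its SLE-CLC encoding; the function names are hypothetical.

```python
def to_2bit_slices(value, bits=8):
    """Split an unsigned integer into little-endian 2-bit slices."""
    return [(value >> s) & 0b11 for s in range(0, bits, 2)]

def serial_mac(activation, weight, bits=8):
    """Multiply by accumulating shifted 2-bit partial products.

    Each 2-bit slice of the activation is multiplied by the full weight,
    then shifted into its place value -- the serial half of the
    serial-parallel scheme the abstract describes. (Illustrative only.)
    """
    acc = 0
    for i, s in enumerate(to_2bit_slices(activation, bits)):
        acc += (s * weight) << (2 * i)  # shift partial product into position
    return acc

# The decomposition is exact: the serial result equals the direct product.
assert serial_mac(173, 57) == 173 * 57
```

In a hardware array, each 2-bit slice would occupy one cycle of a serial lane, and multiple lanes would run in parallel across different activations for throughput; skipping all-zero slices is what makes a variable effective precision (e.g. 3-bit operands finishing in fewer cycles than 32-bit ones) pay off.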