Energy-Efficient Deep Neural Networks Implementation on a Scalable Heterogeneous FPGA Cluster

Title: Energy-Efficient Deep Neural Networks Implementation on a Scalable Heterogeneous FPGA Cluster
Publication Type: Conference Paper
Year of Publication: 2021
Authors: Hu, Yanbu; Shao, Cuiping; Li, Huiyun
Conference Name: 2021 IEEE 15th International Conference on Anti-counterfeiting, Security, and Identification (ASID)
Keywords: deep neural networks implementation, Energy efficiency, Energy-efficiency, graphics processing units, high throughput, pubcrawl, Resource management, Scalability, scalable heterogeneous FPGA cluster, Scalable Security, security, Task Analysis, Throughput, Time complexity
Abstract: In recent years, with the rapid development of deep neural networks (DNNs), algorithm complexity in fields such as computer vision and natural language processing has increased rapidly. FPGA-based DNN accelerators have demonstrated superior flexibility and performance, with higher energy efficiency than high-performance devices such as GPUs. However, the computing resources of a single FPGA are limited, which makes it difficult to flexibly meet the throughput and energy-efficiency requirements of workloads at different computing scales. This paper therefore proposes a DNN implementation method based on a scalable heterogeneous FPGA cluster that adapts to different tasks and achieves high throughput and energy efficiency. First, the method divides a single large task into multiple modules and runs each module on a different FPGA, so that the boards form a pipeline. Second, a task deployment method based on dichotomy (bisection) is proposed to balance the execution time of the pipeline stages as evenly as possible, improving throughput and energy efficiency. Third, the DNN computing module is optimized according to the relationship between computing power and bandwidth, which improves energy efficiency by reducing wasted resources and raising resource utilization. Experimental results on AlexNet and VGG-16 show that a Zynq 7035 cluster achieves up to 25.23x the energy efficiency of an optimized AMD AIO processor. Compared with previous work on a single FPGA and on an FPGA cluster, energy efficiency is improved by 59.5% and 18.8%, respectively.
DOI: 10.1109/ASID52932.2021.9651719
Citation Key: hu_energy-efficient_2021
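
Illustrative note: the abstract does not spell out the dichotomy-based deployment step, so the following is only a minimal sketch of one common way such a balancing problem can be solved, not the authors' method. It assumes the layers must stay in order, that all boards are equally fast (the paper's cluster is heterogeneous, so this is a simplification), and that per-layer latencies are known as integers in microseconds. The names `split_with_limit`, `balance_pipeline`, and `layer_us` are hypothetical. The idea is to bisect on the allowed time per pipeline stage and keep the smallest limit for which the layers still fit on the available boards.

```python
from typing import List


def split_with_limit(layer_us: List[int], limit: int) -> List[List[int]]:
    """Greedily pack consecutive layers into stages whose total time is <= limit.

    Returns a list of stages, each a list of layer indices.
    Assumes limit >= max(layer_us), so every layer fits in some stage.
    """
    stages: List[List[int]] = [[]]
    used = 0
    for idx, t in enumerate(layer_us):
        # Open a new stage when adding this layer would exceed the limit.
        if stages[-1] and used + t > limit:
            stages.append([])
            used = 0
        stages[-1].append(idx)
        used += t
    return stages


def balance_pipeline(layer_us: List[int], num_boards: int) -> List[List[int]]:
    """Bisect on the bottleneck stage time (assumed: identical boards).

    Finds the smallest per-stage time limit for which the layers fit on
    at most `num_boards` boards, then returns the matching partition.
    """
    if not layer_us or num_boards < 1:
        raise ValueError("need at least one layer and one board")
    lo, hi = max(layer_us), sum(layer_us)  # hi is always feasible (one stage)
    while lo < hi:
        mid = (lo + hi) // 2
        if len(split_with_limit(layer_us, mid)) <= num_boards:
            hi = mid       # feasible: try a tighter limit
        else:
            lo = mid + 1   # infeasible: the limit must grow
    return split_with_limit(layer_us, lo)
```

For example, with hypothetical layer latencies `[40, 10, 30, 20, 50, 10]` and three boards, `balance_pipeline` groups the layers as `[[0, 1], [2, 3], [4, 5]]`, giving stage times of 50, 50, and 60 microseconds. In a pipeline the slowest stage sets the throughput, which is why the search minimizes the largest stage time rather than the total.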