Abstract | Neural networks (NNs) are currently a central topic in machine learning research and practice. GPUs are the dominant computing platform for NN research and are gaining popularity as a deployment platform for applications such as autonomous vehicles. As a result, GPU vendors such as NVIDIA have invested enormous effort in writing special-purpose NN libraries. On other hardware targets, especially mobile GPUs, such vendor libraries are generally not available. Thus, the development of portable, open, high-performance, energy-efficient GPU code for NN operations would enable broader deployment of NN-based algorithms. A root problem is that high-efficiency GPU programming suffers from high complexity, low productivity, and low portability. To address this, we present a framework that enables productive, high-efficiency GPU programming for NN computations across hardware platforms and programming models. In particular, the framework provides specific support for metaprogramming and autotuning of operations over ND-Arrays. To demonstrate the correctness and value of our framework and approach, we implement a selection of NN operations covering the core operations needed to deploy three common image-processing neural networks. We target three hardware platforms: NVIDIA, AMD, and Qualcomm GPUs. On NVIDIA GPUs, we show both portability between OpenCL and CUDA and performance competitive with the vendor library. On Qualcomm GPUs, we show that our framework enables productive development of target-specific optimizations and achieves reasonable absolute performance. Finally, on AMD GPUs, we present initial results indicating that our framework can yield reasonable performance on a new platform with minimal effort. |