Programming and Runtime Support to Blaze FPGA Accelerator Deployment at Datacenter Scale

Title: Programming and Runtime Support to Blaze FPGA Accelerator Deployment at Datacenter Scale
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Huang, Muhuan, Wu, Di, Yu, Cody Hao, Fang, Zhenman, Interlandi, Matteo, Condie, Tyson, Cong, Jason
Conference Name: Proceedings of the Seventh ACM Symposium on Cloud Computing
Date Published: October 2016
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-4525-5
Keywords: FPGA-as-a-service, heterogeneous datacenter, pubcrawl, pubcrawl170201, science of security
Abstract

With the end of CPU core scaling due to dark silicon limitations, customized accelerators on FPGAs have gained increased attention in modern datacenters because of their low power, high performance, and high energy efficiency. As evidenced by Microsoft's FPGA deployment in its Bing search engine and Intel's $16.7 billion acquisition of Altera, integrating FPGAs into datacenters is considered one of the most promising approaches to sustaining future datacenter growth. However, it is quite challenging for existing big data computing systems, such as Apache Spark and Hadoop, to access the performance and energy benefits of FPGA accelerators. In this paper, we design and implement Blaze to provide programming and runtime support for easy and efficient deployment of FPGA accelerators in datacenters. In particular, Blaze abstracts FPGA accelerators as a service (FaaS) and provides a set of clean programming APIs that big data processing applications can use to easily utilize those accelerators. The Blaze runtime implements an FaaS framework to efficiently share FPGA accelerators among multiple heterogeneous threads on a single node, and extends Hadoop YARN with accelerator-centric scheduling to efficiently share them among multiple computing tasks in the cluster. Experimental results using four representative big data applications demonstrate that Blaze greatly reduces the programming effort needed to access FPGA accelerators in systems like Apache Spark and YARN, and improves system throughput by 1.7x to 3x (and energy efficiency by 1.5x to 2.7x) compared to a conventional CPU-only cluster.
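The abstract says the Blaze runtime shares FPGA accelerators among multiple threads on a single node. The sketch below is not Blaze's implementation or API; it is a minimal illustration, assuming a hypothetical `NodeAcceleratorManager` class with hypothetical `register` and `submit` methods, of one way node-level "accelerator as a service" sharing could work. Each accelerator ID gets a single-worker queue, so requests from any number of client threads run one at a time on that device, and IDs with no accelerator on the node fall back to a CPU implementation. The "device" kernel is simulated on the CPU, and Python is used only for brevity (the paper targets Spark and YARN).

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical kernel type: maps an input batch to an output batch.
Kernel = Callable[[List[float]], List[float]]


class NodeAcceleratorManager:
    """Hypothetical node-level FaaS broker (not Blaze's real API).

    Each registered accelerator ID gets a single-worker executor, so
    requests from many client threads are queued and executed one at a
    time on that device. Unregistered IDs fall back to a CPU pool.
    """

    def __init__(self, cpu_workers: int = 4) -> None:
        self._lock = threading.Lock()
        self._devices: Dict[str, Tuple[Kernel, ThreadPoolExecutor]] = {}
        self._cpu_pool = ThreadPoolExecutor(max_workers=cpu_workers)

    def register(self, acc_id: str, device_kernel: Kernel) -> None:
        with self._lock:
            if acc_id in self._devices:
                raise ValueError(f"accelerator {acc_id!r} already registered")
            # max_workers=1: at most one task in flight per device.
            self._devices[acc_id] = (device_kernel, ThreadPoolExecutor(max_workers=1))

    def submit(self, acc_id: str, data: List[float], cpu_fallback: Kernel) -> Future:
        with self._lock:
            entry = self._devices.get(acc_id)
        if entry is None:
            # No accelerator for acc_id on this node: run the CPU version.
            return self._cpu_pool.submit(cpu_fallback, data)
        kernel, executor = entry
        return executor.submit(kernel, data)

    def shutdown(self) -> None:
        with self._lock:
            executors = [ex for _, ex in self._devices.values()]
        for ex in executors:
            ex.shutdown(wait=True)
        self._cpu_pool.shutdown(wait=True)


if __name__ == "__main__":
    def scale_on_device(xs: List[float]) -> List[float]:
        # Stand-in for an FPGA call; it actually runs on the CPU here.
        return [2.0 * x for x in xs]

    def scale_on_cpu(xs: List[float]) -> List[float]:
        return [2.0 * x for x in xs]

    nam = NodeAcceleratorManager()
    nam.register("scale", scale_on_device)
    # Four client threads share the single "scale" device.
    with ThreadPoolExecutor(max_workers=4) as clients:
        futures = [
            clients.submit(lambda i=i: nam.submit("scale", [float(i)], scale_on_cpu).result())
            for i in range(4)
        ]
        print([f.result() for f in futures])  # [[0.0], [2.0], [4.0], [6.0]]
    # An ID with no registered accelerator uses the CPU fallback.
    print(nam.submit("missing", [1.0], scale_on_cpu).result())  # [2.0]
    nam.shutdown()
```

Serializing each device behind one worker is the simplest way to keep concurrent callers from contending for a single FPGA. A production runtime would likely also need batching, device memory management, and the cluster-level, accelerator-centric scheduling the abstract attributes to the YARN extension, none of which is modeled here.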

URL: https://dl.acm.org/doi/10.1145/2987550.2987569
DOI: 10.1145/2987550.2987569
Citation Key: huang_programming_2016