Zwift: A Programming Framework for High Performance Text Analytics on Compressed Data

Title: Zwift: A Programming Framework for High Performance Text Analytics on Compressed Data
Publication Type: Conference Paper
Year of Publication: 2018
Authors: Feng Zhang, Jidong Zhai, Xipeng Shen, Onur Mutlu, Wenguang Chen
Conference Name: Proceedings of the 2018 International Conference on Supercomputing
Publisher: ACM
ISBN Number: 978-1-4503-5783-8
Keywords: compilers, composability, Domain Specific Languages, human factors, Metrics, pubcrawl, Scalability, text analytics
Abstract: Today's rapidly growing document volumes pose pressing challenges to modern document analytics frameworks, in both space usage and processing time. Recently, a promising method, called text analytics directly on compressed data (TADOC), was proposed for improving both the time and space efficiency of text analytics. The main idea of the technique is to enable direct document analytics on compressed data. This paper focuses on the programming challenges for developing efficient TADOC programs. It presents Zwift, the first programming framework for TADOC, which consists of a Domain Specific Language, a compiler and runtime, and a utility library. Experiments show that Zwift significantly improves programming productivity, while effectively unleashing the power of TADOC, producing code that reduces storage usage by 90.8% and execution time by 41.0% on six text analytics problems.
DOI: 10.1145/3205289.3205325
Citation Key: zhang_zwift:_2018
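
The abstract's central idea, running analytics directly on compressed data, can be illustrated with a small sketch. The abstract does not describe the compressed format or the Zwift language, so everything below is an assumption made for illustration: the text is taken to be stored as a set of grammar-like rules in which repeated phrases are factored into shared rules, and the names RULES, word_counts, and decompress are hypothetical. This is not Zwift code or the authors' implementation. The sketch counts words by processing each rule once and reusing its result wherever the rule is referenced, instead of expanding the text first.

```python
from collections import Counter
from functools import lru_cache

# Hypothetical compressed text (an assumption, not the paper's format):
# each rule is a sequence of plain words and references to other rules.
RULES = {
    "R0": ["R1", "R2", "R1", "end"],
    "R1": ["R2", "the", "cat"],
    "R2": ["big", "data"],
}


def word_counts(rules, root="R0"):
    """Count words without decompressing: each rule is counted only once."""

    @lru_cache(maxsize=None)
    def counts_for(rule):
        total = Counter()
        for symbol in rules[rule]:
            if symbol in rules:
                # Reference to another rule: reuse its cached counts.
                total.update(counts_for(symbol))
            else:
                # Plain word.
                total[symbol] += 1
        return total

    # Return a copy so callers cannot modify the cached result.
    return Counter(counts_for(root))


def decompress(rules, root="R0"):
    """Baseline for comparison: fully expand the rules into a word list."""
    words = []
    for symbol in rules[root]:
        if symbol in rules:
            words.extend(decompress(rules, symbol))
        else:
            words.append(symbol)
    return words


if __name__ == "__main__":
    print(word_counts(RULES))
    print(Counter(decompress(RULES)))
```

Both calls yield the same counts, but word_counts visits the body of each rule once, however many times that rule is referenced, which is where the time saving in this toy setting comes from.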