Title | Zwift: A Programming Framework for High Performance Text Analytics on Compressed Data |
Publication Type | Conference Paper |
Year of Publication | 2018 |
Authors | Zhang, Feng; Zhai, Jidong; Shen, Xipeng; Mutlu, Onur; Chen, Wenguang |
Conference Name | Proceedings of the 2018 International Conference on Supercomputing |
Publisher | ACM |
ISBN Number | 978-1-4503-5783-8 |
Keywords | compilers, composability, Domain Specific Languages, human factors, Metrics, pubcrawl, Scalability, text analytics |
Abstract | Today's rapidly growing document volumes pose pressing challenges to modern document analytics frameworks, in both space usage and processing time. Recently, a promising method, called text analytics directly on compressed data (TADOC), was proposed for improving both the time and space efficiency of text analytics. The main idea of the technique is to enable direct document analytics on compressed data. This paper focuses on the programming challenges for developing efficient TADOC programs. It presents Zwift, the first programming framework for TADOC, which consists of a Domain Specific Language, a compiler and runtime, and a utility library. Experiments show that Zwift significantly improves programming productivity, while effectively unleashing the power of TADOC, producing code that reduces storage usage by 90.8% and execution time by 41.0% on six text analytics problems. |
DOI | 10.1145/3205289.3205325 |
Citation Key | zhang_zwift:_2018 |