Towards Reliable Interactive Data Cleaning: A User Survey and Recommendations
Title | Towards Reliable Interactive Data Cleaning: A User Survey and Recommendations |
Publication Type | Conference Paper |
Year of Publication | 2016 |
Authors | Krishnan, Sanjay; Haas, Daniel; Franklin, Michael J.; Wu, Eugene |
Conference Name | Proceedings of the Workshop on Human-In-the-Loop Data Analytics |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-4503-4207-0 |
Keywords | pubcrawl170201 |
Abstract | Data cleaning is frequently an iterative process tailored to the requirements of a specific analysis task. The design and implementation of iterative data cleaning tools presents novel challenges, both technical and organizational, to the community. In this paper, we present results from a user survey (N = 29) of data analysts and infrastructure engineers from industry and academia. We highlight three important themes: (1) the iterative nature of data cleaning, (2) the lack of rigor in evaluating the correctness of data cleaning, and (3) the disconnect between the analysts who query the data and the infrastructure engineers who design the cleaning pipelines. We conclude by presenting a number of recommendations for future work in which we envision an interactive data cleaning system that accounts for the observed challenges. |
URL | http://doi.acm.org/10.1145/2939502.2939511 |
DOI | 10.1145/2939502.2939511 |
Citation Key | krishnan_towards_2016 |