Towards Reliable Interactive Data Cleaning: A User Survey and Recommendations

Title: Towards Reliable Interactive Data Cleaning: A User Survey and Recommendations
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Krishnan, Sanjay; Haas, Daniel; Franklin, Michael J.; Wu, Eugene
Conference Name: Proceedings of the Workshop on Human-In-the-Loop Data Analytics
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-4207-0
Keywords: pubcrawl170201
Abstract

Data cleaning is frequently an iterative process tailored to the requirements of a specific analysis task. The design and implementation of iterative data cleaning tools present novel challenges, both technical and organizational, to the community. In this paper, we present results from a user survey (N = 29) of data analysts and infrastructure engineers from industry and academia. We highlight three important themes: (1) the iterative nature of data cleaning, (2) the lack of rigor in evaluating the correctness of data cleaning, and (3) the disconnect between the analysts who query the data and the infrastructure engineers who design the cleaning pipelines. We conclude by presenting a number of recommendations for future work in which we envision an interactive data cleaning system that accounts for the observed challenges.

URL: http://doi.acm.org/10.1145/2939502.2939511
DOI: 10.1145/2939502.2939511
Citation Key: krishnan_towards_2016