Visible to the public Biblio

Filters: Author is Haas, Daniel  [Clear All Filters]
2017-03-07
Krishnan, Sanjay, Haas, Daniel, Franklin, Michael J., Wu, Eugene.  2016.  Towards Reliable Interactive Data Cleaning: A User Survey and Recommendations. Proceedings of the Workshop on Human-In-the-Loop Data Analytics. :9:1–9:5.

Data cleaning is frequently an iterative process tailored to the requirements of a specific analysis task. The design and implementation of iterative data cleaning tools presents novel challenges, both technical and organizational, to the community. In this paper, we present results from a user survey (N = 29) of data analysts and infrastructure engineers from industry and academia. We highlight three important themes: (1) the iterative nature of data cleaning, (2) the lack of rigor in evaluating the correctness of data cleaning, and (3) the disconnect between the analysts who query the data and the infrastructure engineers who design the cleaning pipelines. We conclude by presenting a number of recommendations for future work in which we envision an interactive data cleaning system that accounts for the observed challenges.