Constraint-Variance Tolerant Data Repairing
Title | Constraint-Variance Tolerant Data Repairing |
Publication Type | Conference Paper |
Year of Publication | 2016 |
Authors | Song, Shaoxu; Zhu, Han; Wang, Jianmin |
Conference Name | Proceedings of the 2016 International Conference on Management of Data |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-4503-3531-7 |
Keywords | data repairing, denial constraints, pubcrawl170201 |
Abstract | Integrity constraints, guiding the cleaning of dirty data, are often found to be imprecise as well. Existing studies consider the inaccurate constraints that are oversimplified, and thus refine the constraints via inserting more predicates (attributes). We note that imprecise constraints may not only be oversimplified so that correct data are erroneously identified as violations, but also could be overrefined that the constraints overfit the data and fail to identify true violations. In the latter case, deleting excessive predicates applies. To address the oversimplified and overrefined constraint inaccuracies, in this paper, we propose to repair data by allowing a small variation (with both predicate insertion and deletion) on the constraints. A novel θ-tolerant repair model is introduced, which returns a (minimum) data repair that satisfies at least one variant of the constraints (with constraint variation no greater than θ compared to the given constraints). To efficiently repair data among various constraint variants, we propose a single round, sharing enabled approach. Results on real data sets demonstrate that our proposal can capture more accurate data repairs compared to the existing methods with/without constraint repairs. |
URL | http://doi.acm.org/10.1145/2882903.2882955 |
DOI | 10.1145/2882903.2882955 |
Citation Key | song_constraint-variance_2016 |