Visible to the public Qualitative Cleaning of Uncertain Data

TitleQualitative Cleaning of Uncertain Data
Publication TypeConference Paper
Year of Publication2016
AuthorsKoehler, Henning, Link, Sebastian
Conference NameProceedings of the 25th ACM International on Conference on Information and Knowledge Management
PublisherACM
Conference LocationNew York, NY, USA
ISBN Number978-1-4503-4073-1
Keywordsdatabase repair, possibility theory, pubcrawl170201, vertex cover
Abstract

We propose a new view on data cleaning: Not data itself but the degrees of uncertainty attributed to data are dirty. Applying possibility theory, tuples are assigned degrees of possibility with which they occur, and constraints are assigned degrees of certainty that say to which tuples they apply. Classical data cleaning modifies some minimal set of tuples. Instead, we marginally reduce their degrees of possibility. This reduction leads to a new qualitative version of the vertex cover problem. Qualitative vertex cover can be mapped to a linear-weighted constraint satisfaction problem. However, any off-the-shelf solver cannot solve the problem more efficiently than classical vertex cover. Instead, we utilize the degrees of possibility and certainty to develop a dedicated algorithm that is fixed parameter tractable in the size of the qualitative vertex cover. Experiments show that our algorithm is faster than solvers for the classical vertex cover problem by several orders of magnitude, and performance improves with higher numbers of uncertainty degrees.

URLhttp://doi.acm.org/10.1145/2983323.2983679
DOI10.1145/2983323.2983679
Citation Keykoehler_qualitative_2016