We propose a new view on data cleaning: it is not the data itself that is dirty, but the degrees of uncertainty attributed to the data. Applying possibility theory, we assign each tuple a degree of possibility with which it occurs, and each constraint a degree of certainty that determines the tuples to which it applies. Classical data cleaning modifies some minimal set of tuples. Instead of modifying tuples, we marginally reduce their degrees of possibility. This reduction leads to a new qualitative version of the vertex cover problem. Qualitative vertex cover can be mapped to a linear-weighted constraint satisfaction problem. However, no off-the-shelf solver can solve this problem more efficiently than classical vertex cover. We therefore exploit the degrees of possibility and certainty to develop a dedicated algorithm that is fixed-parameter tractable in the size of the qualitative vertex cover. Experiments show that our algorithm is faster than solvers for the classical vertex cover problem by several orders of magnitude, and that its performance improves as the number of uncertainty degrees increases.
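For orientation, the classical baseline mentioned above can be sketched as follows. This is the textbook bounded-search-tree procedure for classical vertex cover, not the dedicated qualitative algorithm of the paper. The conflict-graph encoding (tuples as vertices, an edge for each pair of tuples that jointly violate a constraint), the function name `vertex_cover`, and the tuple labels are assumptions made purely for illustration.

```python
from typing import Iterable, Optional, Set, Tuple

Edge = Tuple[str, str]


def vertex_cover(edges: Iterable[Edge], k: int) -> Optional[Set[str]]:
    """Return a vertex cover with at most k vertices, or None if none exists.

    Classical bounded search tree: every edge needs at least one endpoint
    in the cover, so branch on the two endpoints of an uncovered edge.
    The recursion depth is at most k, giving O(2^k * |edges|) time, which
    is fixed-parameter tractable in the cover size k.
    """
    edges = list(edges)
    if not edges:
        return set()      # nothing left to cover
    if k == 0:
        return None       # edges remain but the budget is exhausted
    u, v = edges[0]
    for pick in (u, v):
        # Selecting `pick` covers every edge incident to it.
        remaining = [e for e in edges if pick not in e]
        sub = vertex_cover(remaining, k - 1)
        if sub is not None:
            return sub | {pick}
    return None


# Hypothetical conflict graph: t1 conflicts with both t2 and t3.
conflicts = [("t1", "t2"), ("t1", "t3")]
print(vertex_cover(conflicts, 1))  # {'t1'}
```

In classical data cleaning, the returned set corresponds to the tuples that would be modified. The approach described above instead lowers degrees of possibility, which this sketch does not model.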