Visible to the public Deletion Propagation for Multiple Key Preserving Conjunctive Queries: Approximations and Complexity

TitleDeletion Propagation for Multiple Key Preserving Conjunctive Queries: Approximations and Complexity
Publication TypeConference Paper
Year of Publication2019
AuthorsCai, Zhipeng, Miao, Dongjing, Li, Yingshu
Conference Name2019 IEEE 35th International Conference on Data Engineering (ICDE)
Date PublishedApril 2019
PublisherIEEE
ISBN Number978-1-5386-7474-1
Keywordsapproximation, Approximation algorithms, approximations, Complexity theory, computational complexity, Computer science, conjunctive queries, conjunctive query, data cleaning, data deletion, data lineage, Databases, Debugging, deletion propagation, dichotomy, dynamic programming, Heuristic algorithms, investigated problem, key preserving, multiple key preserving conjunctive queries, multiple project-free, multiple queries, nontrivial set, polynomials, privacy, problem funda-mental, pubcrawl, quality management, query feedback, query processing, realistic case, repairing data, Scalability, single query case, standard deletion propagation problem, tuples, view propagation, view side-effect
Abstract

This paper studies the deletion propagation problem in terms of minimizing view side-effect. It is a problem funda-mental to data lineage and quality management which could be a key step in analyzing view propagation and repairing data. The investigated problem is a variant of the standard deletion propagation problem, where given a source database D, a set of key preserving conjunctive queries Q, and the set of views V obtained by the queries in Q, we try to identify a set T of tuples from D whose elimination prevents all the tuples in a given set of deletions on views ^V while preserving any other results. The complexity of this problem has been well studied for the case with only a single query. Dichotomies, even trichotomies, for different settings are developed. However, no results on multiple queries are given which is a more realistic case. We study the complexity and approximations of optimizing the side-effect on the views, i.e., find T to minimize the additional damage on V after removing all the tuples of ^V. We focus on the class of key-preserving conjunctive queries which is a dichotomy for the single query case. It is surprising to find that except the single query case, this problem is NP-hard to approximate within any constant even for a non-trivial set of multiple project-free conjunctive queries in terms of view side-effect. The proposed algorithm shows that it can be approximated within a bound depending on the number of tuples of both V and ^V. We identify a class of polynomial tractable inputs, and provide a dynamic programming algorithm to solve the problem. Besides data lineage, study on this problem could also provide important foundations for the computational issues in data repairing. Furthermore, we introduce some related applications of this problem, especially for query feedback based data cleaning.

URLhttps://ieeexplore.ieee.org/document/8731474
DOI10.1109/ICDE.2019.00052
Citation Keycai_deletion_2019