Deletion Propagation for Multiple Key Preserving Conjunctive Queries: Approximations and Complexity
Title | Deletion Propagation for Multiple Key Preserving Conjunctive Queries: Approximations and Complexity |
Publication Type | Conference Paper |
Year of Publication | 2019 |
Authors | Cai, Zhipeng, Miao, Dongjing, Li, Yingshu |
Conference Name | 2019 IEEE 35th International Conference on Data Engineering (ICDE) |
Date Published | April 2019 |
Publisher | IEEE |
ISBN Number | 978-1-5386-7474-1 |
Keywords | approximation, Approximation algorithms, approximations, Complexity theory, computational complexity, Computer science, conjunctive queries, conjunctive query, data cleaning, data deletion, data lineage, Databases, Debugging, deletion propagation, dichotomy, dynamic programming, Heuristic algorithms, investigated problem, key preserving, multiple key preserving conjunctive queries, multiple project-free, multiple queries, nontrivial set, polynomials, privacy, problem funda-mental, pubcrawl, quality management, query feedback, query processing, realistic case, repairing data, Scalability, single query case, standard deletion propagation problem, tuples, view propagation, view side-effect |
Abstract | This paper studies the deletion propagation problem in terms of minimizing view side-effect. It is a problem funda-mental to data lineage and quality management which could be a key step in analyzing view propagation and repairing data. The investigated problem is a variant of the standard deletion propagation problem, where given a source database D, a set of key preserving conjunctive queries Q, and the set of views V obtained by the queries in Q, we try to identify a set T of tuples from D whose elimination prevents all the tuples in a given set of deletions on views ^V while preserving any other results. The complexity of this problem has been well studied for the case with only a single query. Dichotomies, even trichotomies, for different settings are developed. However, no results on multiple queries are given which is a more realistic case. We study the complexity and approximations of optimizing the side-effect on the views, i.e., find T to minimize the additional damage on V after removing all the tuples of ^V. We focus on the class of key-preserving conjunctive queries which is a dichotomy for the single query case. It is surprising to find that except the single query case, this problem is NP-hard to approximate within any constant even for a non-trivial set of multiple project-free conjunctive queries in terms of view side-effect. The proposed algorithm shows that it can be approximated within a bound depending on the number of tuples of both V and ^V. We identify a class of polynomial tractable inputs, and provide a dynamic programming algorithm to solve the problem. Besides data lineage, study on this problem could also provide important foundations for the computational issues in data repairing. Furthermore, we introduce some related applications of this problem, especially for query feedback based data cleaning. |
URL | https://ieeexplore.ieee.org/document/8731474 |
DOI | 10.1109/ICDE.2019.00052 |
Citation Key | cai_deletion_2019 |
- query processing
- multiple project-free
- multiple queries
- nontrivial set
- polynomials
- privacy
- problem funda-mental
- pubcrawl
- quality management
- query feedback
- multiple key preserving conjunctive queries
- realistic case
- repairing data
- Scalability
- single query case
- standard deletion propagation problem
- tuples
- view propagation
- view side-effect
- data lineage
- Approximation algorithms
- approximations
- Complexity theory
- computational complexity
- computer science
- conjunctive queries
- conjunctive query
- data cleaning
- data deletion
- approximation
- Databases
- debugging
- deletion propagation
- dichotomy
- dynamic programming
- Heuristic algorithms
- investigated problem
- key preserving