Biblio
This paper studies the deletion propagation problem in terms of minimizing view side-effect. It is a problem funda-mental to data lineage and quality management which could be a key step in analyzing view propagation and repairing data. The investigated problem is a variant of the standard deletion propagation problem, where given a source database D, a set of key preserving conjunctive queries Q, and the set of views V obtained by the queries in Q, we try to identify a set T of tuples from D whose elimination prevents all the tuples in a given set of deletions on views △V while preserving any other results. The complexity of this problem has been well studied for the case with only a single query. Dichotomies, even trichotomies, for different settings are developed. However, no results on multiple queries are given which is a more realistic case. We study the complexity and approximations of optimizing the side-effect on the views, i.e., find T to minimize the additional damage on V after removing all the tuples of △V. We focus on the class of key-preserving conjunctive queries which is a dichotomy for the single query case. It is surprising to find that except the single query case, this problem is NP-hard to approximate within any constant even for a non-trivial set of multiple project-free conjunctive queries in terms of view side-effect. The proposed algorithm shows that it can be approximated within a bound depending on the number of tuples of both V and △V. We identify a class of polynomial tractable inputs, and provide a dynamic programming algorithm to solve the problem. Besides data lineage, study on this problem could also provide important foundations for the computational issues in data repairing. Furthermore, we introduce some related applications of this problem, especially for query feedback based data cleaning.
The rise of sensor-equipped smart phones has enabled a variety of classification based applications that provide personalized services based on user data extracted from sensor readings. However, malicious applications aggressively collect sensitive information from inherent user data without permissions. Furthermore, they can mine sensitive information from user data just in the classification process. These privacy threats raise serious privacy concerns. In this paper, we introduce two new privacy concerns which are inherent-data privacy and latent-data privacy. We propose a framework that enables a data-obfuscation mechanism to be developed easily. It preserves latent-data privacy while guaranteeing satisfactory service quality. The proposed framework preserves privacy against powerful adversaries who have knowledge of users' access pattern and the data-obfuscation mechanism. We validate our framework towards a real classification-orientated dataset. The experiment results confirm that our framework is superior to the basic obfuscation mechanism.