Visible to the public Biblio

Filters: Author is Li, Yingshu  [Clear All Filters]
2020-07-10
Cai, Zhipeng, Miao, Dongjing, Li, Yingshu.  2019.  Deletion Propagation for Multiple Key Preserving Conjunctive Queries: Approximations and Complexity. 2019 IEEE 35th International Conference on Data Engineering (ICDE). :506—517.

This paper studies the deletion propagation problem in terms of minimizing view side-effect. It is a problem funda-mental to data lineage and quality management which could be a key step in analyzing view propagation and repairing data. The investigated problem is a variant of the standard deletion propagation problem, where given a source database D, a set of key preserving conjunctive queries Q, and the set of views V obtained by the queries in Q, we try to identify a set T of tuples from D whose elimination prevents all the tuples in a given set of deletions on views △V while preserving any other results. The complexity of this problem has been well studied for the case with only a single query. Dichotomies, even trichotomies, for different settings are developed. However, no results on multiple queries are given which is a more realistic case. We study the complexity and approximations of optimizing the side-effect on the views, i.e., find T to minimize the additional damage on V after removing all the tuples of △V. We focus on the class of key-preserving conjunctive queries which is a dichotomy for the single query case. It is surprising to find that except the single query case, this problem is NP-hard to approximate within any constant even for a non-trivial set of multiple project-free conjunctive queries in terms of view side-effect. The proposed algorithm shows that it can be approximated within a bound depending on the number of tuples of both V and △V. We identify a class of polynomial tractable inputs, and provide a dynamic programming algorithm to solve the problem. Besides data lineage, study on this problem could also provide important foundations for the computational issues in data repairing. Furthermore, we introduce some related applications of this problem, especially for query feedback based data cleaning.

2018-01-10
He, Zaobo, Cai, Zhipeng, Sun, Yunchuan, Li, Yingshu, Cheng, Xiuzhen.  2017.  Customized Privacy Preserving for Inherent Data and Latent Data. Personal Ubiquitous Comput.. 21:43–54.
The huge amount of sensory data collected from mobile devices has offered great potentials to promote more significant services based on user data extracted from sensor readings. However, releasing user data could also seriously threaten user privacy. It is possible to directly collect sensitive information from released user data without user permissions. Furthermore, third party users can also infer sensitive information contained in released data in a latent manner by utilizing data mining techniques. In this paper, we formally define these two types of threats as inherent data privacy and latent data privacy and construct a data-sanitization strategy that can optimize the tradeoff between data utility and customized two types of privacy. The key novel idea lies that the developed strategy can combat against powerful third party users with broad knowledge about users and launching optimal inference attacks. We show that our strategy does not reduce the benefit brought by user data much, while sensitive information can still be protected. To the best of our knowledge, this is the first work that preserves both inherent data privacy and latent data privacy.
2017-06-05
He, Zaobo, Cai, Zhipeng, Li, Yingshu.  2016.  Customized Privacy Preserving for Classification Based Applications. Proceedings of the 1st ACM Workshop on Privacy-Aware Mobile Computing. :37–42.

The rise of sensor-equipped smart phones has enabled a variety of classification based applications that provide personalized services based on user data extracted from sensor readings. However, malicious applications aggressively collect sensitive information from inherent user data without permissions. Furthermore, they can mine sensitive information from user data just in the classification process. These privacy threats raise serious privacy concerns. In this paper, we introduce two new privacy concerns which are inherent-data privacy and latent-data privacy. We propose a framework that enables a data-obfuscation mechanism to be developed easily. It preserves latent-data privacy while guaranteeing satisfactory service quality. The proposed framework preserves privacy against powerful adversaries who have knowledge of users' access pattern and the data-obfuscation mechanism. We validate our framework towards a real classification-orientated dataset. The experiment results confirm that our framework is superior to the basic obfuscation mechanism.