A Utility Maximization Framework for Privacy Preservation of User Generated Content
Title | A Utility Maximization Framework for Privacy Preservation of User Generated Content |
Publication Type | Conference Paper |
Year of Publication | 2016 |
Authors | Fang, Yi; Godavarthy, Archana; Lu, Haibing |
Conference Name | Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-4503-4497-5 |
Keywords | Collaboration, data deletion, Human Behavior, privacy preservation, pubcrawl, Scalability, user generated content |
Abstract | The prodigious amount of user-generated content continues to grow at an enormous rate. While it greatly facilitates the flow of information and ideas among people and communities, it may pose a great threat to individual privacy. In this paper, we demonstrate that the private traits of individuals can be inferred from user-generated content by using text classification techniques. Specifically, we study three private attributes of Twitter users: religion, political leaning, and marital status. The ground truth labels of the private traits can be readily collected from the Twitter bio field. Based on the tweets posted by the users and their corresponding bios, we show that text classification identifies these personal attributes with high accuracy, which means that user-generated content poses a great privacy risk. We further propose a constrained utility maximization framework for preserving user privacy. The goal is to maximize the utility of the data when modifying the user-generated content, while degrading the prediction performance of the adversary. To this end, the KL divergence between the prior knowledge about the private attribute and the posterior probability after seeing the user-generated data is minimized. Based on this proposed framework, we investigate several specific data sanitization operations for privacy preservation: adding, deleting, or replacing words in the tweets. We derive the exact transformation of the data under each operation. The experiments demonstrate the effectiveness of the proposed framework. |
URL | http://doi.acm.org/10.1145/2970398.2970417 |
DOI | 10.1145/2970398.2970417 |
Citation Key | fang_utility_2016 |
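
The sanitization idea in the abstract can be illustrated with a minimal sketch. This is not the paper's method: it assumes a toy naive Bayes adversary, takes the KL divergence in the direction KL(prior || posterior), stands in for the utility constraint with a simple deletion budget, and covers only the delete operation. All names (`posterior`, `kl_divergence`, `greedy_delete`) and the toy probabilities are hypothetical.

```python
import math

def posterior(words, prior, word_probs):
    """Naive Bayes posterior P(class | words) for a toy adversary (assumed model)."""
    log_scores = {
        c: math.log(p) + sum(math.log(word_probs[c][w]) for w in words)
        for c, p in prior.items()
    }
    m = max(log_scores.values())  # subtract the max for numerical stability
    unnorm = {c: math.exp(s - m) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

def kl_divergence(p, q):
    """KL(p || q) over a shared set of classes."""
    return sum(p[c] * math.log(p[c] / q[c]) for c in p if p[c] > 0)

def greedy_delete(words, prior, word_probs, budget):
    """Delete at most `budget` words, each time removing the word whose
    deletion most reduces KL(prior || posterior). The budget is an assumed
    stand-in for a utility constraint."""
    words = list(words)
    for _ in range(budget):
        best_i = None
        best_kl = kl_divergence(prior, posterior(words, prior, word_probs))
        for i in range(len(words)):
            candidate = words[:i] + words[i + 1:]
            kl = kl_divergence(prior, posterior(candidate, prior, word_probs))
            if kl < best_kl:
                best_i, best_kl = i, kl
        if best_i is None:  # no single deletion helps any further
            break
        del words[best_i]
    return words

# Toy data (hypothetical): two classes of a private attribute, three words.
PRIOR = {"a": 0.5, "b": 0.5}
WORD_PROBS = {
    "a": {"church": 0.6, "coffee": 0.2, "game": 0.2},
    "b": {"church": 0.1, "coffee": 0.45, "game": 0.45},
}
TWEET = ["church", "church", "coffee"]

if __name__ == "__main__":
    sanitized = greedy_delete(TWEET, PRIOR, WORD_PROBS, budget=1)
    print(sanitized, posterior(sanitized, PRIOR, WORD_PROBS))
```

In this toy setting, deleting a word that is strongly associated with one class pulls the adversary's posterior back toward the prior, which is the effect the abstract describes. Add and replace operations would follow the same pattern with a different candidate set.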