A Utility Maximization Framework for Privacy Preservation of User Generated Content

Title: A Utility Maximization Framework for Privacy Preservation of User Generated Content
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Fang, Yi; Godavarthy, Archana; Lu, Haibing
Conference Name: Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-4497-5
Keywords: Collaboration, data deletion, Human Behavior, privacy preservation, pubcrawl, Scalability, user generated content
Abstract

The prodigious amount of user-generated content continues to grow at an enormous rate. While it greatly facilitates the flow of information and ideas among people and communities, it may pose a great threat to individual privacy. In this paper, we demonstrate that the private traits of individuals can be inferred from user-generated content using text classification techniques. Specifically, we study three private attributes of Twitter users: religion, political leaning, and marital status. The ground truth labels of the private traits can be readily collected from the Twitter bio field. Based on the tweets posted by the users and their corresponding bios, we show that text classification identifies these personal attributes with high accuracy, which means user-generated content poses a great privacy risk. We further propose a constrained utility maximization framework for preserving user privacy. The goal is to maximize the utility of the data when modifying the user-generated content while degrading the prediction performance of the adversary. To that end, the KL divergence between the prior knowledge about the private attribute and the posterior probability after seeing the user-generated data is minimized. Based on this framework, we investigate several specific data sanitization operations for privacy preservation: adding, deleting, or replacing words in the tweets. We derive the exact transformation of the data under each operation. The experiments demonstrate the effectiveness of the proposed framework.
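The idea of shrinking the gap between an adversary's prior and posterior can be illustrated with a small sketch. This is not the authors' algorithm. It assumes a naive Bayes adversary, uses KL(posterior || prior) as the privacy measure (the abstract does not state the direction), and stands in for the utility constraint with a simple cap on the number of deleted words. All function names, labels, and probabilities below are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as dicts over the same labels."""
    return sum(p[c] * math.log(p[c] / q[c]) for c in p if p[c] > 0)


def posterior(words, prior, word_probs):
    """Naive Bayes posterior P(label | words).

    word_probs[label][word] is the assumed likelihood P(word | label).
    """
    log_scores = {
        c: math.log(prior[c]) + sum(math.log(word_probs[c][w]) for w in words)
        for c in prior
    }
    top = max(log_scores.values())  # subtract the max for numerical stability
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


def greedy_delete(words, prior, word_probs, budget):
    """Delete at most `budget` words, each time removing the word whose
    deletion most reduces KL(posterior || prior). The budget is an assumed
    stand-in for a utility constraint: fewer deletions keep more of the text.
    """
    words = list(words)
    for _ in range(budget):
        best_i = None
        best_kl = kl_divergence(posterior(words, prior, word_probs), prior)
        for i in range(len(words)):
            candidate = words[:i] + words[i + 1:]
            kl = kl_divergence(posterior(candidate, prior, word_probs), prior)
            if kl < best_kl:
                best_i, best_kl = i, kl
        if best_i is None:  # no single deletion helps any further
            break
        del words[best_i]
    return words


# Toy data (hypothetical): two labels and a four-word vocabulary.
prior = {"a": 0.5, "b": 0.5}
word_probs = {
    "a": {"church": 0.4, "coffee": 0.25, "today": 0.25, "game": 0.1},
    "b": {"church": 0.1, "coffee": 0.25, "today": 0.25, "game": 0.4},
}
tweet = ["church", "coffee", "today"]
sanitized = greedy_delete(tweet, prior, word_probs, budget=1)
print(sanitized)
```

In this toy setting, "church" is the only word that shifts the posterior away from the prior, so it is the word the greedy step removes. The add and replace operations mentioned in the abstract would need their own candidate generation and are not covered by this sketch.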

URL: http://doi.acm.org/10.1145/2970398.2970417
DOI: 10.1145/2970398.2970417
Citation Key: fang_utility_2016