A Utility Maximization Framework for Privacy Preservation of User Generated Content

Submitted by grigby1 on Mon, 06/05/2017 - 12:29pm

Title	A Utility Maximization Framework for Privacy Preservation of User Generated Content
Publication Type	Conference Paper
Year of Publication	2016
Authors	Fang, Yi; Godavarthy, Archana; Lu, Haibing
Conference Name	Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval
Publisher	ACM
Conference Location	New York, NY, USA
ISBN Number	978-1-4503-4497-5
Keywords	Collaboration, data deletion, Human Behavior, privacy preservation, pubcrawl, Scalability, user generated content
Abstract	The prodigious amount of user-generated content continues to grow at an enormous rate. While it greatly facilitates the flow of information and ideas among people and communities, it may pose a great threat to individual privacy. In this paper, we demonstrate that the private traits of individuals can be inferred from user-generated content by using text classification techniques. Specifically, we study three private attributes of Twitter users: religion, political leaning, and marital status. The ground truth labels of the private traits can be readily collected from the Twitter bio field. Based on the tweets posted by the users and their corresponding bios, we show that text classification identifies these personal attributes with high accuracy, which means that user-generated content poses a great privacy risk. We further propose a constrained utility maximization framework for preserving user privacy. The goal is to maximize the utility of the data when modifying the user-generated content, while degrading the prediction performance of the adversary. The KL divergence between the prior knowledge about the private attribute and the posterior probability after seeing the user-generated data is minimized. Based on this proposed framework, we investigate several specific data sanitization operations for privacy preservation: adding, deleting, or replacing words in the tweets. We derive the exact transformation of the data under each operation. The experiments demonstrate the effectiveness of the proposed framework.
URL	http://doi.acm.org/10.1145/2970398.2970417
DOI	10.1145/2970398.2970417
Citation Key	fang_utility_2016
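
The abstract describes sanitizing text so that an adversary's posterior over a private attribute stays close to the prior, as measured by KL divergence. The Python sketch below illustrates that idea for the delete operation only. It is not the authors' method: the naive Bayes adversary, the direction KL(posterior || prior), the greedy search, the deletion budget standing in for the utility constraint, and all names and toy probabilities (`posterior`, `kl_divergence`, `greedy_delete`, `PRIOR`, `WORD_PROBS`) are assumptions made for illustration.

```python
import math

def posterior(words, prior, word_probs):
    """Naive Bayes posterior P(class | words), computed in log space."""
    log_scores = {
        c: math.log(prior[c]) + sum(math.log(word_probs[c][w]) for w in words)
        for c in prior
    }
    m = max(log_scores.values())
    unnorm = {c: math.exp(s - m) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same classes."""
    return sum(p[c] * math.log(p[c] / q[c]) for c in p if p[c] > 0)

def greedy_delete(words, prior, word_probs, max_deletions):
    """Greedily delete the word whose removal most reduces KL(posterior || prior).

    max_deletions is a hypothetical stand-in for a utility constraint:
    fewer deletions means the text stays closer to the original.
    """
    words = list(words)
    for _ in range(max_deletions):
        best_idx = None
        best_kl = kl_divergence(posterior(words, prior, word_probs), prior)
        for i in range(len(words)):
            candidate = words[:i] + words[i + 1:]
            kl = kl_divergence(posterior(candidate, prior, word_probs), prior)
            if kl < best_kl:
                best_idx, best_kl = i, kl
        if best_idx is None:  # no deletion lowers the divergence further
            break
        del words[best_idx]
    return words

# Toy model (made-up numbers): two attribute values with a uniform prior.
PRIOR = {"A": 0.5, "B": 0.5}
WORD_PROBS = {
    "A": {"church": 0.4, "temple": 0.1, "coffee": 0.25, "today": 0.25},
    "B": {"church": 0.1, "temple": 0.4, "coffee": 0.25, "today": 0.25},
}

tweet = ["church", "coffee", "today"]
sanitized = greedy_delete(tweet, PRIOR, WORD_PROBS, max_deletions=2)
leak_before = kl_divergence(posterior(tweet, PRIOR, WORD_PROBS), PRIOR)
leak_after = kl_divergence(posterior(sanitized, PRIOR, WORD_PROBS), PRIOR)
print(sanitized, leak_before, leak_after)
```

In this toy setup only the attribute-revealing word is removed, and the search stops early once no further deletion lowers the divergence, so the second unit of the deletion budget goes unused.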
