Biblio
The main goal of this work is to create a trust model that can serve as a reference for developing applications oriented toward collaborative annotation. The model includes design parameters inferred from online communities built around collaborative content. This study aims at a static model, although depending on the application context it could be dynamic or comprise more than one model. An analysis of Genius as a peer-production community was carried out to understand user behavior. The study characterizes user interactions based on the distinction between Lightweight Peer Production (LWPP) and Heavyweight Peer Production (HWPP). It was found that LWPP interactions dominate at the lower levels of this system, and that HWPP interactions become more frequent as the level in the role system increases. This can be explained by the fact that LWPP interactions are straightforward, whereas HWPP interactions demand more skill from the user; the latter create more opportunities and therefore attract other users for further interaction.
The prodigious amount of user-generated content continues to grow at an enormous rate. While it greatly facilitates the flow of information and ideas among people and communities, it may pose a great threat to individual privacy. In this paper, we demonstrate that the private traits of individuals can be inferred from user-generated content by using text classification techniques. Specifically, we study three private attributes of Twitter users: religion, political leaning, and marital status. The ground-truth labels for these private traits can be readily collected from the Twitter bio field. Based on the tweets posted by the users and their corresponding bios, we show that text classification identifies these personal attributes with high accuracy, which poses a serious privacy risk for user-generated content. We further propose a constrained utility maximization framework for preserving user privacy. The goal is to maximize the utility of the data when modifying the user-generated content, while degrading the prediction performance of the adversary. The KL divergence between the prior knowledge about the private attribute and the posterior probability after observing the user-generated data is minimized. Based on this framework, we investigate several specific data sanitization operations for privacy preservation: adding, deleting, or replacing words in the tweets. We derive the exact transformation of the data under each operation. Experiments demonstrate the effectiveness of the proposed framework.