Biblio

List
Filter

Found 160 results

Filters: Keyword is Differential privacy [Clear All Filters]

2018-02-15

Vu, Xuan-Son, Jiang, Lili, Brändström, Anders, Elmroth, Erik. 2017. Personality-based Knowledge Extraction for Privacy-preserving Data Analysis. Proceedings of the Knowledge Capture Conference. :44:1–44:4.

In this paper, we present a differential privacy preserving approach, which extracts personality-based knowledge to serve privacy guarantee data analysis on personal sensitive data. Based on the approach, we further implement an end-to-end privacy guarantee system, KaPPA, to provide researchers iterative data analysis on sensitive data. The key challenge for differential privacy is determining a reasonable amount of privacy budget to balance privacy preserving and data utility. Most of the previous work applies unified privacy budget to all individual data, which leads to insufficient privacy protection for some individuals while over-protecting others. In KaPPA, the proposed personality-based privacy preserving approach automatically calculates privacy budget for each individual. Our experimental evaluations show a significant trade-off of sufficient privacy protection and data utility.

Phan, N., Wu, X., Hu, H., Dou, D.. 2017. Adaptive Laplace Mechanism: Differential Privacy Preservation in Deep Learning. 2017 IEEE International Conference on Data Mining (ICDM). :385–394.

In this paper, we focus on developing a novel mechanism to preserve differential privacy in deep neural networks, such that: (1) The privacy budget consumption is totally independent of the number of training steps; (2) It has the ability to adaptively inject noise into features based on the contribution of each to the output; and (3) It could be applied in a variety of different deep neural networks. To achieve this, we figure out a way to perturb affine transformations of neurons, and loss functions used in deep neural networks. In addition, our mechanism intentionally adds "more noise" into features which are "less relevant" to the model output, and vice-versa. Our theoretical analysis further derives the sensitivities and error bounds of our mechanism. Rigorous experiments conducted on MNIST and CIFAR-10 datasets show that our mechanism is highly effective and outperforms existing solutions.

2018-02-06

Heifetz, A., Mugunthan, V., Kagal, L.. 2017. Shade: A Differentially-Private Wrapper for Enterprise Big Data. 2017 IEEE International Conference on Big Data (Big Data). :1033–1042.

Enterprises usually provide strong controls to prevent cyberattacks and inadvertent leakage of data to external entities. However, in the case where employees and data scientists have legitimate access to analyze and derive insights from the data, there are insufficient controls and employees are usually permitted access to all information about the customers of the enterprise including sensitive and private information. Though it is important to be able to identify useful patterns of one's customers for better customization and service, customers' privacy must not be sacrificed to do so. We propose an alternative — a framework that will allow privacy preserving data analytics over big data. In this paper, we present an efficient and scalable framework for Apache Spark, a cluster computing framework, that provides strong privacy guarantees for users even in the presence of an informed adversary, while still providing high utility for analysts. The framework, titled Shade, includes two mechanisms — SparkLAP, which provides Laplacian perturbation based on a user's query and SparkSAM, which uses the contents of the database itself in order to calculate the perturbation. We show that the performance of Shade is substantially better than earlier differential privacy systems without loss of accuracy, particularly when run on datasets small enough to fit in memory, and find that SparkSAM can even exceed performance of an identical nonprivate Spark query.

Shi, Y., Piao, C., Zheng, L.. 2017. Differential-Privacy-Based Correlation Analysis in Railway Freight Service Applications. 2017 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC). :35–39.

With the development of modern logistics industry railway freight enterprises as the main traditional logistics enterprises, the service mode is facing many problems. In the era of big data, for railway freight enterprises, coordinated development and sharing of information resources have become the requirements of the times, while how to protect the privacy of citizens has become one of the focus issues of the public. To prevent the disclosure or abuse of the citizens' privacy information, the citizens' privacy needs to be preserved in the process of information opening and sharing. However, most of the existing privacy preserving models cannot to be used to resist attacks with continuously growing background knowledge. This paper presents the method of applying differential privacy to protect associated data, which can be shared in railway freight service association information. First, the original service data need to slice by optimal shard length, then differential method and apriori algorithm is used to add Laplace noise in the Candidate sets. Thus the citizen's privacy information can be protected even if the attacker gets strong background knowledge. Last, sharing associated data to railway information resource partners. The steps and usefulness of the discussed privacy preservation method is illustrated by an example.

Hassoon, I. A., Tapus, N., Jasim, A. C.. 2017. Enhance Privacy in Big Data and Cloud via Diff-Anonym Algorithm. 2017 16th RoEduNet Conference: Networking in Education and Research (RoEduNet). :1–5.

The main issue with big data in cloud is the processed or used always need to be by third party. It is very important for the owners of data or clients to trust and to have the guarantee of privacy for the information stored in cloud or analyzed as big data. The privacy models studied in previous research showed that privacy infringement for big data happened because of limitation, privacy guarantee rate or dissemination of accurate data which is obtainable in the data set. In addition, there are various privacy models. In order to determine the best and the most appropriate model to be applied in the future, which also guarantees big data privacy, it is necessary to invest in research and study. In the next part, we surfed some of the privacy models in order to determine the advantages and disadvantages of each model in privacy assurance for big data in cloud. The present study also proposes combined Diff-Anonym algorithm (K-anonymity and differential models) to provide data anonymity with guarantee to keep balance between ambiguity of private data and clarity of general data.

2018-01-23

Acar, A., Celik, Z. B., Aksu, H., Uluagac, A. S., McDaniel, P.. 2017. Achieving Secure and Differentially Private Computations in Multiparty Settings. 2017 IEEE Symposium on Privacy-Aware Computing (PAC). :49–59.

Sharing and working on sensitive data in distributed settings from healthcare to finance is a major challenge due to security and privacy concerns. Secure multiparty computation (SMC) is a viable panacea for this, allowing distributed parties to make computations while the parties learn nothing about their data, but the final result. Although SMC is instrumental in such distributed settings, it does not provide any guarantees not to leak any information about individuals to adversaries. Differential privacy (DP) can be utilized to address this; however, achieving SMC with DP is not a trivial task, either. In this paper, we propose a novel Secure Multiparty Distributed Differentially Private (SM-DDP) protocol to achieve secure and private computations in a multiparty environment. Specifically, with our protocol, we simultaneously achieve SMC and DP in distributed settings focusing on linear regression on horizontally distributed data. That is, parties do not see each others’ data and further, can not infer information about individuals from the final constructed statistical model. Any statistical model function that allows independent calculation of local statistics can be computed through our protocol. The protocol implements homomorphic encryption for SMC and functional mechanism for DP to achieve the desired security and privacy guarantees. In this work, we first introduce the theoretical foundation for the SM-DDP protocol and then evaluate its efficacy and performance on two different datasets. Our results show that one can achieve individual-level privacy through the proposed protocol with distributed DP, which is independently applied by each party in a distributed fashion. Moreover, our results also show that the SM-DDP protocol incurs minimal computational overhead, is scalable, and provides security and privacy guarantees.

2018-01-16

Bindschaedler, Vincent, Rane, Shantanu, Brito, Alejandro E., Rao, Vanishree, Uzun, Ersin. 2017. Achieving Differential Privacy in Secure Multiparty Data Aggregation Protocols on Star Networks. Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy. :115–125.

We consider the problem of privacy-preserving data aggregation in a star network topology, i.e., several untrusting participants connected to a single aggregator. We require that the participants do not discover each other's data, and the service provider remains oblivious to each participant's individual contribution. Furthermore, the final result is to be published in a differentially private manner, i.e., the result should not reveal the contribution of any single participant to a (possibly external) adversary who knows the contributions of all other participants. In other words, we require a secure multiparty computation protocol that also incorporates a differentially private mechanism. Previous solutions have resorted to caveats such as postulating a trusted dealer to distribute keys to the participants, or introducing additional entities to withhold the decryption key from the aggregator, or relaxing the star topology by allowing pairwise communication amongst the participants. In this paper, we show how to obtain a noisy (differentially private) aggregation result using Shamir secret sharing and additively homomorphic encryption without these mitigating assumptions. More importantly, while we assume semi-honest participants, we allow the aggregator to be stronger than semi-honest, specifically in the sense that he can try to reduce the noise in the differentially private result. To respect the differential privacy requirement, collusions of mutually untrusting entities need to be analyzed differently from traditional secure multiparty computation: It is not sufficient that such collusions do not reveal the data of honest participants; we must also ensure that the colluding entities cannot undermine differential privacy by reducing the amount of noise in the final result. Our protocols avoid this by requiring that no entity – neither the aggregator nor any participant – knows how much noise a participant contributes to the final result. We also ensure that if a cheating aggregator tries to influence the noise term in the differentially private output, he can be detected with overwhelming probability.

2018-01-10

Wu, Xiaotong, Dou, Wanchun, Ni, Qiang. 2017. Game Theory Based Privacy Preserving Analysis in Correlated Data Publication. Proceedings of the Australasian Computer Science Week Multiconference. :73:1–73:10.

Privacy preserving on data publication has been an important research field over the past few decades. One of the fundamental challenges in privacy preserving data publication is the trade-off problem between privacy and utility of the single and independent data set. However, recent research works have shown that the advanced privacy mechanism, i.e., differential privacy, is vulnerable when multiple data sets are correlated. In this case, the trade-off problem between privacy and utility is evolved into a game problem, in which the payoff of each player is dependent not only on his privacy parameter, but also on his neighbors' privacy parameters. In this paper, we firstly present the definition of correlated differential privacy to evaluate the real privacy level of a single data set influenced by the other data sets. Then, we construct a game model of multiple players, who each publishes the data set sanitized by differential privacy. Next, we analyze the existence and uniqueness of the pure Nash Equilibrium and demonstrate the sufficient conditions in the game. Finally, we refer to a notion, i.e., the price of anarchy, to evaluate efficiency of the pure Nash Equilibrium.

Ping, Haoyue, Stoyanovich, Julia, Howe, Bill. 2017. DataSynthesizer: Privacy-Preserving Synthetic Datasets. Proceedings of the 29th International Conference on Scientific and Statistical Database Management. :42:1–42:5.

To facilitate collaboration over sensitive data, we present DataSynthesizer, a tool that takes a sensitive dataset as input and generates a structurally and statistically similar synthetic dataset with strong privacy guarantees. The data owners need not release their data, while potential collaborators can begin developing models and methods with some confidence that their results will work similarly on the real dataset. The distinguishing feature of DataSynthesizer is its usability — the data owner does not have to specify any parameters to start generating and sharing data safely and effectively. DataSynthesizer consists of three high-level modules — DataDescriber, DataGenerator and ModelInspector. The first, DataDescriber, investigates the data types, correlations and distributions of the attributes in the private dataset, and produces a data summary, adding noise to the distributions to preserve privacy. DataGenerator samples from the summary computed by DataDescriber and outputs synthetic data. ModelInspector shows an intuitive description of the data summary that was computed by DataDescriber, allowing the data owner to evaluate the accuracy of the summarization process and adjust any parameters, if desired. We describe DataSynthesizer and illustrate its use in an urban science context, where sharing sensitive, legally encumbered data between agencies and with outside collaborators is reported as the primary obstacle to data-driven governance. The code implementing all parts of this work is publicly available at https://github.com/DataResponsibly/DataSynthesizer.

Zhang, Jun, Cormode, Graham, Procopiuc, Cecilia M., Srivastava, Divesh, Xiao, Xiaokui. 2017. PrivBayes: Private Data Release via Bayesian Networks. ACM Trans. Database Syst.. 42:25:1–25:41.

Privacy-preserving data publishing is an important problem that has been the focus of extensive study. The state-of-the-art solution for this problem is differential privacy, which offers a strong degree of privacy protection without making restrictive assumptions about the adversary. Existing techniques using differential privacy, however, cannot effectively handle the publication of high-dimensional data. In particular, when the input dataset contains a large number of attributes, existing methods require injecting a prohibitive amount of noise compared to the signal in the data, which renders the published data next to useless. To address the deficiency of the existing methods, this paper presents PrivBayes, a differentially private method for releasing high-dimensional data. Given a dataset D, PrivBayes first constructs a Bayesian network N, which (i) provides a succinct model of the correlations among the attributes in D and (ii) allows us to approximate the distribution of data in D using a set P of low-dimensional marginals of D. After that, PrivBayes injects noise into each marginal in P to ensure differential privacy and then uses the noisy marginals and the Bayesian network to construct an approximation of the data distribution in D. Finally, PrivBayes samples tuples from the approximate distribution to construct a synthetic dataset, and then releases the synthetic data. Intuitively, PrivBayes circumvents the curse of dimensionality, as it injects noise into the low-dimensional marginals in P instead of the high-dimensional dataset D. Private construction of Bayesian networks turns out to be significantly challenging, and we introduce a novel approach that uses a surrogate function for mutual information to build the model more accurately. We experimentally evaluate PrivBayes on real data and demonstrate that it significantly outperforms existing solutions in terms of accuracy.

He, Zaobo, Cai, Zhipeng, Sun, Yunchuan, Li, Yingshu, Cheng, Xiuzhen. 2017. Customized Privacy Preserving for Inherent Data and Latent Data. Personal Ubiquitous Comput.. 21:43–54.

The huge amount of sensory data collected from mobile devices has offered great potentials to promote more significant services based on user data extracted from sensor readings. However, releasing user data could also seriously threaten user privacy. It is possible to directly collect sensitive information from released user data without user permissions. Furthermore, third party users can also infer sensitive information contained in released data in a latent manner by utilizing data mining techniques. In this paper, we formally define these two types of threats as inherent data privacy and latent data privacy and construct a data-sanitization strategy that can optimize the tradeoff between data utility and customized two types of privacy. The key novel idea lies that the developed strategy can combat against powerful third party users with broad knowledge about users and launching optimal inference attacks. We show that our strategy does not reduce the benefit brought by user data much, while sensitive information can still be protected. To the best of our knowledge, this is the first work that preserves both inherent data privacy and latent data privacy.

2017-10-13

Barthe, Gilles, Farina, Gian Pietro, Gaboardi, Marco, Arias, Emilio Jesus Gallego, Gordon, Andy, Hsu, Justin, Strub, Pierre-Yves. 2016. Differentially Private Bayesian Programming. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. :68–79.

We present PrivInfer, an expressive framework for writing and verifying differentially private Bayesian machine learning algorithms. Programs in PrivInfer are written in a rich functional probabilistic programming language with constructs for performing Bayesian inference. Then, differential privacy of programs is established using a relational refinement type system, in which refinements on probability types are indexed by a metric on distributions. Our framework leverages recent developments in Bayesian inference, probabilistic programming languages, and in relational refinement types. We demonstrate the expressiveness of PrivInfer by verifying privacy for several examples of private Bayesian inference.

2017-10-10

Yuan, Ganzhao, Yang, Yin, Zhang, Zhenjie, Hao, Zhifeng. 2016. Convex Optimization for Linear Query Processing Under Approximate Differential Privacy. Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. :2005–2014.

Differential privacy enables organizations to collect accurate aggregates over sensitive data with strong, rigorous guarantees on individuals' privacy. Previous work has found that under differential privacy, computing multiple correlated aggregates as a batch, using an appropriate strategy, may yield higher accuracy than computing each of them independently. However, finding the best strategy that maximizes result accuracy is non-trivial, as it involves solving a complex constrained optimization program that appears to be non-convex. Hence, in the past much effort has been devoted in solving this non-convex optimization program. Existing approaches include various sophisticated heuristics and expensive numerical solutions. None of them, however, guarantees to find the optimal solution of this optimization problem. This paper points out that under (ε, ཬ)-differential privacy, the optimal solution of the above constrained optimization problem in search of a suitable strategy can be found, rather surprisingly, by solving a simple and elegant convex optimization program. Then, we propose an efficient algorithm based on Newton's method, which we prove to always converge to the optimal solution with linear global convergence rate and quadratic local convergence rate. Empirical evaluations demonstrate the accuracy and efficiency of the proposed solution.

Bassily, Raef, Nissim, Kobbi, Smith, Adam, Steinke, Thomas, Stemmer, Uri, Ullman, Jonathan. 2016. Algorithmic Stability for Adaptive Data Analysis. Proceedings of the Forty-eighth Annual ACM Symposium on Theory of Computing. :1046–1059.

Adaptivity is an important feature of data analysis - the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC, 2015) and Hardt and Ullman (FOCS, 2014) initiated a general formal study of this problem, and gave the first upper and lower bounds on the achievable generalization error for adaptive data analysis. Specifically, suppose there is an unknown distribution P and a set of n independent samples x is drawn from P. We seek an algorithm that, given x as input, accurately answers a sequence of adaptively chosen ``queries'' about the unknown distribution P. How many samples n must we draw from the distribution, as a function of the type of queries, the number of queries, and the desired level of accuracy? In this work we make two new contributions towards resolving this question: We give upper bounds on the number of samples n that are needed to answer statistical queries. The bounds improve and simplify the work of Dwork et al. (STOC, 2015), and have been applied in subsequent work by those authors (Science, 2015; NIPS, 2015). We prove the first upper bounds on the number of samples required to answer more general families of queries. These include arbitrary low-sensitivity queries and an important class of optimization queries (alternatively, risk minimization queries). As in Dwork et al., our algorithms are based on a connection with algorithmic stability in the form of differential privacy. We extend their work by giving a quantitatively optimal, more general, and simpler proof of their main theorem that the stability notion guaranteed by differential privacy implies low generalization error. We also show that weaker stability guarantees such as bounded KL divergence and total variation distance lead to correspondingly weaker generalization guarantees.

2017-09-15

Ghassemi, Mohsen, Sarwate, Anand D., Wright, Rebecca N.. 2016. Differentially Private Online Active Learning with Applications to Anomaly Detection. Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security. :117–128.

In settings where data instances are generated sequentially or in streaming fashion, online learning algorithms can learn predictors using incremental training algorithms such as stochastic gradient descent. In some security applications such as training anomaly detectors, the data streams may consist of private information or transactions and the output of the learning algorithms may reveal information about the training data. Differential privacy is a framework for quantifying the privacy risk in such settings. This paper proposes two differentially private strategies to mitigate privacy risk when training a classifier for anomaly detection in an online setting. The first is to use a randomized active learning heuristic to screen out uninformative data points in the stream. The second is to use mini-batching to improve classifier performance. Experimental results show how these two strategies can trade off privacy, label complexity, and generalization performance.

2017-08-22

Jansen, Rob, Johnson, Aaron. 2016. Safely Measuring Tor. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. :1553–1567.

Tor is a popular network for anonymous communication. The usage and operation of Tor is not well-understood, however, because its privacy goals make common measurement approaches ineffective or risky. We present PrivCount, a system for measuring the Tor network designed with user privacy as a primary goal. PrivCount securely aggregates measurements across Tor relays and over time to produce differentially private outputs. PrivCount improves on prior approaches by enabling flexible exploration of many diverse kinds of Tor measurements while maintaining accuracy and privacy for each. We use PrivCount to perform a measurement study of Tor of sufficient breadth and depth to inform accurate models of Tor users and traffic. Our results indicate that Tor has 710,000 users connected but only 550,000 active at a given time, that Web traffic now constitutes 91% of data bytes on Tor, and that the strictness of relays' connection policies significantly affects the type of application data they forward.

2017-07-24

Su, Dong, Cao, Jianneng, Li, Ninghui, Bertino, Elisa, Jin, Hongxia. 2016. Differentially Private K-Means Clustering. Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy. :26–37.

There are two broad approaches for differentially private data analysis. The interactive approach aims at developing customized differentially private algorithms for various data mining tasks. The non-interactive approach aims at developing differentially private algorithms that can output a synopsis of the input dataset, which can then be used to support various data mining tasks. In this paper we study the effectiveness of the two approaches on differentially private k-means clustering. We develop techniques to analyze the empirical error behaviors of the existing interactive and non-interactive approaches. Based on the analysis, we propose an improvement of DPLloyd which is a differentially private version of the Lloyd algorithm. We also propose a non-interactive approach EUGkM which publishes a differentially private synopsis for k-means clustering. Results from extensive and systematic experiments support our analysis and demonstrate the effectiveness of our improvement on DPLloyd and the proposed EUGkM algorithm.

2017-05-22

Zhu, Xue, Sun, Yuqing. 2016. Differential Privacy for Collaborative Filtering Recommender Algorithm. Proceedings of the 2016 ACM on International Workshop on Security And Privacy Analytics. :9–16.

Collaborative filtering plays an essential role in a recommender system, which recommends a list of items to a user by learning behavior patterns from user rating matrix. However, if an attacker has some auxiliary knowledge about a user purchase history, he/she can infer more information about this user. This brings great threats to user privacy. Some methods adopt differential privacy algorithms in collaborative filtering by adding noises to a rating matrix. Although they provide theoretically private results, the influence on recommendation accuracy are not discussed. In this paper, we solve the privacy problem in recommender system in a different way by applying the differential privacy method into the procedure of recommendation. We design two differentially private recommender algorithms with sampling, named Differentially Private Item Based Recommendation with sampling (DP-IR for short) and Differentially Private User Based Recommendation with sampling(DP-UR for short). Both algorithms are based on the exponential mechanism with a carefully designed quality function. Theoretical analyses on privacy of these algorithms are presented. We also investigate the accuracy of the proposed method and give theoretical results. Experiments are performed on real datasets to verify our methods.

Barthe, Gilles, Fong, Noémie, Gaboardi, Marco, Grégoire, Benjamin, Hsu, Justin, Strub, Pierre-Yves. 2016. Advanced Probabilistic Couplings for Differential Privacy. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. :55–67.

Differential privacy is a promising formal approach to data privacy, which provides a quantitative bound on the privacy cost of an algorithm that operates on sensitive information. Several tools have been developed for the formal verification of differentially private algorithms, including program logics and type systems. However, these tools do not capture fundamental techniques that have emerged in recent years, and cannot be used for reasoning about cutting-edge differentially private algorithms. Existing techniques fail to handle three broad classes of algorithms: 1) algorithms where privacy depends on accuracy guarantees, 2) algorithms that are analyzed with the advanced composition theorem, which shows slower growth in the privacy cost, 3) algorithms that interactively accept adaptive inputs. We address these limitations with a new formalism extending apRHL, a relational program logic that has been used for proving differential privacy of non-interactive algorithms, and incorporating aHL, a (non-relational) program logic for accuracy properties. We illustrate our approach through a single running example, which exemplifies the three classes of algorithms and explores new variants of the Sparse Vector technique, a well-studied algorithm from the privacy literature. We implement our logic in EasyCrypt, and formally verify privacy. We also introduce a novel coupling technique called optimal subset coupling that may be of independent interest.

Day, Wei-Yen, Li, Ninghui, Lyu, Min. 2016. Publishing Graph Degree Distribution with Node Differential Privacy. Proceedings of the 2016 International Conference on Management of Data. :123–138.

Graph data publishing under node-differential privacy (node-DP) is challenging due to the huge sensitivity of queries. However, since a node in graph data oftentimes represents a person, node-DP is necessary to achieve personal data protection. In this paper, we investigate the problem of publishing the degree distribution of a graph under node-DP by exploring the projection approach to reduce the sensitivity. We propose two approaches based on aggregation and cumulative histogram to publish the degree distribution. The experiments demonstrate that our approaches greatly reduce the error of approximating the true degree distribution and have significant improvement over existing works. We also present the introspective analysis for understanding the factors of publishing the degree distribution with node-DP.

Nguyen, Hiep H., Imine, Abdessamad, Rusinowitch, Michaël. 2016. Detecting Communities Under Differential Privacy. Proceedings of the 2016 ACM on Workshop on Privacy in the Electronic Society. :83–93.

Complex networks usually expose community structure with groups of nodes sharing many links with the other nodes in the same group and relatively few with the nodes of the rest. This feature captures valuable information about the organization and even the evolution of the network. Over the last decade, a great number of algorithms for community detection have been proposed to deal with the increasingly complex networks. However, the problem of doing this in a private manner is rarely considered. In this paper, we solve this problem under differential privacy, a prominent privacy concept for releasing private data. We analyze the major challenges behind the problem and propose several schemes to tackle them from two perspectives: input perturbation and algorithm perturbation. We choose Louvain method as the back-end community detection for input perturbation schemes and propose the method LouvainDP which runs Louvain algorithm on a noisy super-graph. For algorithm perturbation, we design ModDivisive using exponential mechanism with the modularity as the score. We have thoroughly evaluated our techniques on real graphs of different sizes and verified that ModDivisive steadily gives the best modularity and avg.F1Score on large graphs while LouvainDP outperforms the remaining input perturbation competitors in certain settings.

Qin, Zhan, Yang, Yin, Yu, Ting, Khalil, Issa, Xiao, Xiaokui, Ren, Kui. 2016. Heavy Hitter Estimation over Set-Valued Data with Local Differential Privacy. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. :192–203.

In local differential privacy (LDP), each user perturbs her data locally before sending the noisy data to a data collector. The latter then analyzes the data to obtain useful statistics. Unlike the setting of centralized differential privacy, in LDP the data collector never gains access to the exact values of sensitive data, which protects not only the privacy of data contributors but also the collector itself against the risk of potential data leakage. Existing LDP solutions in the literature are mostly limited to the case that each user possesses a tuple of numeric or categorical values, and the data collector computes basic statistics such as counts or mean values. To the best of our knowledge, no existing work tackles more complex data mining tasks such as heavy hitter discovery over set-valued data. In this paper, we present a systematic study of heavy hitter mining under LDP. We first review existing solutions, extend them to the heavy hitter estimation, and explain why their effectiveness is limited. We then propose LDPMiner, a two-phase mechanism for obtaining accurate heavy hitters with LDP. The main idea is to first gather a candidate set of heavy hitters using a portion of the privacy budget, and focus the remaining budget on refining the candidate set in a second phase, which is much more efficient budget-wise than obtaining the heavy hitters directly from the whole dataset. We provide both in-depth theoretical analysis and extensive experiments to compare LDPMiner against adaptations of previous solutions. The results show that LDPMiner significantly improves over existing methods. More importantly, LDPMiner successfully identifies the majority true heavy hitters in practical settings.

To, Hien, Nguyen, Kien, Shahabi, Cyrus. 2016. Differentially Private Publication of Location Entropy. Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. :35:1–35:10.

Location entropy (LE) is a popular metric for measuring the popularity of various locations (e.g., points-of-interest). Unlike other metrics computed from only the number of (unique) visits to a location, namely frequency, LE also captures the diversity of the users' visits, and is thus more accurate than other metrics. Current solutions for computing LE require full access to the past visits of users to locations, which poses privacy threats. This paper discusses, for the first time, the problem of perturbing location entropy for a set of locations according to differential privacy. The problem is challenging because removing a single user from the dataset will impact multiple records of the database; i.e., all the visits made by that user to various locations. Towards this end, we first derive non-trivial, tight bounds for both local and global sensitivity of LE, and show that to satisfy ε-differential privacy, a large amount of noise must be introduced, rendering the published results useless. Hence, we propose a thresholding technique to limit the number of users' visits, which significantly reduces the perturbation error but introduces an approximation error. To achieve better utility, we extend the technique by adopting two weaker notions of privacy: smooth sensitivity (slightly weaker) and crowd-blending (strictly weaker). Extensive experiments on synthetic and real-world datasets show that our proposed techniques preserve original data distribution without compromising location privacy.

Cuff, Paul, Yu, Lanqing. 2016. Differential Privacy As a Mutual Information Constraint. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. :43–54.

Differential privacy is a precise mathematical constraint meant to ensure privacy of individual pieces of information in a database even while queries are being answered about the aggregate. Intuitively, one must come to terms with what differential privacy does and does not guarantee. For example, the definition prevents a strong adversary who knows all but one entry in the database from further inferring about the last one. This strong adversary assumption can be overlooked, resulting in misinterpretation of the privacy guarantee of differential privacy. Herein we give an equivalent definition of privacy using mutual information that makes plain some of the subtleties of differential privacy. The mutual-information differential privacy is in fact sandwiched between ε-differential privacy and (ε,δ)-differential privacy in terms of its strength. In contrast to previous works using unconditional mutual information, differential privacy is fundamentally related to conditional mutual information, accompanied by a maximization over the database distribution. The conceptual advantage of using mutual information, aside from yielding a simpler and more intuitive definition of differential privacy, is that its properties are well understood. Several properties of differential privacy are easily verified for the mutual information alternative, such as composition theorems.