Bibliography
We explore a new security model for secure computation on large datasets. We assume that two servers have been employed to compute on private data collected from many users, and, in order to improve the efficiency of their computation, we establish a new tradeoff with privacy. Specifically, instead of claiming that the servers learn nothing about the input values, we claim that what they do learn from the computation preserves the differential privacy of the input. Leveraging this relaxation of the security model allows us to build a protocol that leaks some information in the form of access patterns to memory, while also providing a formal bound on what is learned from the leakage. We then demonstrate that this leakage is useful in a broad class of computations. We show that computations such as histograms, PageRank, and matrix factorization, which can be performed in common graph-parallel frameworks such as MapReduce or Pregel, benefit from our relaxation. We implement a protocol for securely executing graph-parallel computations, evaluate its performance on the three examples above, and demonstrate marked improvement over prior implementations.
Nowadays, many applications involve big data, and big data analysis methods appear in many fields. As a preliminary attempt to address this challenge, this paper presents a distributed online learning algorithm based on differential privacy. Since online learning can effectively process sensitive data, we introduce differential privacy into distributed online learning algorithms, with the aim of ensuring data privacy during online learning and preventing adversarial nodes from inferring any important data information. In particular, for different adversary models, we consider different types of graphs that tolerate either a limited number of adversaries near each regular node or a globally limited number of adversaries.
The sensitivity of a function is the maximum change of its output for a unit change of its input. In this paper we present a method for determining the sensitivity of SQL queries, seen as functions from databases to datasets, where the change is measured in the number of rows that differ. Given a query, a database schema and a number, our method constructs a formula that is satisfiable only if the sensitivity of the query is bigger than this number. Our method is composable, and can also be applied to SQL workflows. Our results can be used to calibrate the amount of noise that has to be added to the output of the query to obtain a certain level of differential privacy.
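To make the last sentence concrete, here is a minimal sketch of the noise-calibration step, assuming the sensitivity bound has already been determined by an analysis like the one above; the function name and values are illustrative, not the paper's tooling.

```python
import numpy as np

# Minimal sketch: once a sensitivity analysis certifies that a query's
# row-level sensitivity is at most `sensitivity`, eps-differential privacy
# follows from Laplace noise with scale sensitivity/eps. Names and values
# are illustrative assumptions.

def laplace_release(true_answer, sensitivity, eps):
    return true_answer + np.random.laplace(scale=sensitivity / eps)

# e.g. a COUNT(*) query has sensitivity 1 in the rows-that-differ metric:
print(laplace_release(true_answer=1024, sensitivity=1, eps=0.5))
```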
In this paper, we propose a new randomized response algorithm that achieves differential privacy and utility guarantees for consumers' behaviors, and processes a batch of data at each time step. First, differing from traditional differentially private approaches, we add randomized response noise to the behavior signatures matrix to achieve an acceptable utility-privacy tradeoff. Second, a behavior signature modeling method based on sparse coding is proposed: after some lightweight training using the energy consumption data, the dictionary becomes associated with the behavior characteristics of the electric appliances. Finally, experimental results verify that our algorithm can preserve consumers' privacy without compromising utility.
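For orientation, a minimal sketch of classical binary randomized response, the primitive behind the abstract's matrix-valued mechanism; the parameters and debiasing step are textbook assumptions, not the paper's batch algorithm.

```python
import numpy as np

# Minimal sketch of binary randomized response: report the true bit with
# probability e^eps / (1 + e^eps), else flip it. This satisfies eps-local
# differential privacy; the paper's matrix/batch mechanics are not shown.

def randomized_response(bit, eps):
    p_truth = np.exp(eps) / (1.0 + np.exp(eps))
    return bit if np.random.random() < p_truth else 1 - bit

def debiased_mean(reports, eps):
    """Unbiased estimate of the true mean from perturbed bits."""
    p = np.exp(eps) / (1.0 + np.exp(eps))
    return (np.mean(reports) - (1 - p)) / (2 * p - 1)

truth = np.random.binomial(1, 0.3, size=10000)
reports = [randomized_response(b, eps=1.0) for b in truth]
print(truth.mean(), debiased_mean(reports, eps=1.0))
```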
Smart meters migrate the conventional electricity grid into a digitally enabled Smart Grid (SG), which is more reliable and efficient. Fine-grained energy consumption data collected by smart meters helps utility providers accurately predict users' demands and significantly reduce power generation cost, while it imposes severe privacy risks on consumers and may discourage them from using those “espionage meters”. To enjoy the benefits of smart meter data without compromising users' privacy, in this paper we integrate distributed differential privacy (DDP) techniques into data-driven optimization and propose a novel scheme that not only minimizes the cost for utility providers but also preserves the DDP of users' energy profiles. Briefly, we add differentially private noise to the users' energy consumption data before the smart meters send it to the utility provider. Due to the uncertainty of the users' demand distribution, the utility provider aggregates a given set of historical users' differentially private data, estimates the users' demands, and formulates the data-driven cost minimization based on the collected noisy data. We also develop algorithms for feasible solutions, and verify the effectiveness of the proposed scheme through simulations using energy consumption data generated from the utility company's real data analysis.
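One standard way to realize the "add noise before sending" step in a distributed fashion, sketched below under assumed parameters, uses the infinite divisibility of the Laplace distribution: each meter adds a small gamma-difference share, so the aggregate carries exactly Laplace noise while no single meter's report is noise-free. This is a common DDP construction, not necessarily the paper's exact scheme.

```python
import numpy as np

# Minimal sketch of distributed Laplace noise via infinite divisibility:
# the difference of two Gamma(1/n, b) draws, summed over n meters, is
# exactly Laplace(b). Sensitivity cap and budget are assumptions.

def meter_report(reading, n_meters, eps, sensitivity=1.0):
    b = sensitivity / eps                       # Laplace scale for the SUM
    share = (np.random.gamma(1.0 / n_meters, b)
             - np.random.gamma(1.0 / n_meters, b))
    return reading + share

n = 50
readings = np.random.uniform(0.0, 1.0, size=n)  # kWh in one interval
reports = [meter_report(r, n, eps=0.5) for r in readings]
print(sum(readings), sum(reports))              # aggregate = true sum + Laplace noise
```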
In rural/remote areas, resource-constrained smart micro-grid (RCSMG) architectures can offer a cost-effective power management and supply alternative to national power grid connections. RCSMG architectures handle communications over distributed lossy networks to minimize operation costs. However, the unreliable nature of lossy networks makes privacy an important consideration. Existing anonymisation work on data perturbation relies mainly on distortion with additive noise. Applying these solutions to RCSMGs is problematic, because deliberate noise additions must be distinguishable both from system-generated and adversary-generated noise. In this paper, we present a brief survey of privacy risks in RCSMGs centered on inference, and propose a method of mitigating these risks. The lesson here is that while RCSMGs give users more control over power management and distribution, good anonymisation is essential to protecting personal information on RCSMGs.
Many applications, such as anonymous communication systems, privacy-enhancing database queries, or privacy-enhancing machine-learning methods, require robust guarantees under thousands and sometimes millions of observations. The notion of r-fold approximate differential privacy (ADP) offers a well-established framework with a precise characterization of the degree of privacy after r observations by an attacker. However, existing bounds for r-fold ADP are loose and, if used for estimating the required degree of noise for an application, can lead to over-cautious choices for perturbation randomness and thus to suboptimal utility or overly high costs. We present a numerical and widely applicable method for capturing the privacy loss of differentially private mechanisms under composition, which we call privacy buckets. With privacy buckets we compute provable upper and lower bounds on ADP for a given number of observations. We compare our bounds with state-of-the-art bounds for r-fold ADP, including Kairouz, Oh, and Viswanath's composition theorem (KOV), concentrated differential privacy, and the moments accountant. While KOV proved optimal bounds for heterogeneous adaptive k-fold composition, we show that for concrete sequences of mechanisms tighter bounds can be derived by taking the mechanisms' structure into account. We compare previous bounds for the Laplace mechanism, the Gaussian mechanism, a timing leakage reduction mechanism, and stochastic gradient descent, and we significantly improve over their results (except that we match the KOV bound for the Laplace mechanism, for which it seems tight). Our lower bounds almost meet our upper bounds, showing that no significantly tighter bounds are possible.
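To illustrate the flavor of privacy-loss accounting under composition, here is a simplified sketch in the spirit of privacy buckets: discretize the privacy loss distribution (PLD) of one Laplace-mechanism run, convolve it r times, and read off delta. The grid, the closed-form PLD, and the rounding are my assumptions; the paper's construction computes provable upper and lower bounds rather than this plain approximation.

```python
import numpy as np

# Simplified PLD accounting for r-fold composition of a sensitivity-1
# Laplace mechanism with per-run budget eps. Not the paper's algorithm.

def laplace_pld(eps, grid):
    """Discretized privacy loss distribution; losses live in [-eps, eps]."""
    n = int(np.ceil(eps / grid))
    losses = np.arange(-n, n + 1) * grid
    mass = np.zeros_like(losses)
    # Continuous part: f(l) = (1/4) exp(-(eps - l)/2) on (-eps, eps).
    mass[1:-1] = 0.25 * np.exp(-(eps - losses[1:-1]) / 2.0) * grid
    mass[-1] += 0.5                       # point mass at loss = +eps
    mass[0] += 0.5 * np.exp(-eps)         # point mass at loss = -eps
    return losses, mass / mass.sum()      # absorb discretization error

def compose(mass, r):
    """r-fold convolution of the discretized PLD (losses add up)."""
    out = np.array([1.0])
    for _ in range(r):
        out = np.convolve(out, mass)
    return out

def adp_delta(eps_total, per_run_eps, r, grid=1e-3):
    """Approximate delta such that r runs are (eps_total, delta)-ADP."""
    _, mass = laplace_pld(per_run_eps, grid)
    composed = compose(mass, r)
    n = int(np.ceil(per_run_eps / grid))
    total_losses = np.arange(-r * n, r * n + 1) * grid
    tail = total_losses > eps_total
    return float(np.sum(composed[tail]
                        * (1 - np.exp(eps_total - total_losses[tail]))))

print(adp_delta(eps_total=1.0, per_run_eps=0.1, r=100))
```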
Private communication over the Internet remains a challenging problem. Even if messages are encrypted, it is hard to deliver them without revealing metadata about which pairs of users are communicating. Scalable anonymity systems, such as Tor, are susceptible to traffic analysis attacks that leak metadata. In contrast, the largest-scale systems with metadata privacy require passing all messages through a small number of providers, requiring a high operational cost for each provider and limiting their deployability in practice. This paper presents Stadium, a point-to-point messaging system that provides metadata and data privacy while scaling its work efficiently across hundreds of low-cost providers operated by different organizations. Much like Vuvuzela, the current largest-scale metadata-private system, Stadium achieves its provable guarantees through differential privacy and the addition of noisy cover traffic. The key challenge in Stadium is limiting the information revealed from the many observable traffic links of a highly distributed system, without requiring an overwhelming amount of noise. To solve this challenge, Stadium introduces techniques for distributed noise generation and differentially private routing as well as a verifiable parallel mixnet design where the servers collaboratively check that others follow the protocol. We show that Stadium can scale to support 4x more users than Vuvuzela using servers that cost an order of magnitude less to operate than Vuvuzela nodes.
Nowadays, data has become more important as the core resource of the information society. However, with the development of data analysis techniques, privacy violations such as leakage of sensitive data and exposure of personal identities are also increasing. Differential privacy is a technique that satisfies the requirement that no additional information be disclosed beyond what comes from the database itself, and it is well known for protecting privacy against arbitrary attacks. However, recent research argues that there are several ways to infer sensitive information from data even when differential privacy is applied. One such inference method uses correlations between the data. In this paper, we investigate new privacy threats exploiting attribute correlation that are not covered by traditional studies, and propose a privacy-preserving technique that configures the differential privacy noise parameter to address this new threat. In experiments, we show the weaknesses of the traditional differential privacy method and validate that the proposed noise parameter configuration provides sufficient privacy protection while maintaining data utility.
As more and more technologies to store and analyze massive amounts of data become available, it is extremely important to de-identify privacy-sensitive data so that further analysis can be conducted by different parties. For example, data needs to go through a de-identification process before being transferred to institutes for further value-added analysis. As such, privacy protection issues associated with the release of data and data mining have become a popular field of study in the domain of big data. As a strict and verifiable definition of privacy, differential privacy has attracted noteworthy attention and widespread research in recent years. Nevertheless, differential privacy is not practical for most applications due to the performance cost of synthetic dataset generation for data queries. Moreover, the definition of data protection by randomized noise in native differential privacy is abstract to users. Therefore, we design a pragmatic DP-based data de-identification protection and data disclosure risk estimation system, in which a DP-based noise addition mechanism is applied to generate synthetic datasets. Furthermore, the risk of data disclosure for these synthetic datasets can be evaluated before releasing them to buyers/consumers.
Differential privacy is an approach that preserves patient privacy while permitting researchers access to medical data. This paper presents mechanisms proposed to satisfy differential privacy while answering a given workload of range queries. Representing the input data as a vector of counts, these methods partition the vector according to relationships between the data and the ranges of the given queries. After partitioning the vector into buckets, the count of each bucket is estimated privately and split among the bucket's positions to answer the given query set. The performance of the proposed method was evaluated using different workloads over several attributes. The results show that partitioning the vector based on the data can produce more accurate answers, while partitioning the vector based on the given workload improves privacy. This paper's two main contributions are: (1) improving earlier work on partitioning mechanisms by building a greedy algorithm that partitions the counts vector efficiently, and (2) providing an adaptive algorithm that considers the sensitivity of the given queries before producing results.
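A minimal sketch of the bucket-then-perturb idea just described, with a fixed-width placeholder partition standing in for the paper's greedy and adaptive algorithms:

```python
import numpy as np

# Minimal sketch: partition a vector of counts into buckets, add Laplace
# noise to each bucket total, and split the noisy total uniformly across
# the bucket's positions. One record affects exactly one bucket sum by 1,
# so per-bucket Laplace(1/eps) gives eps-DP by parallel composition.
# Fixed-width buckets are a placeholder, not the paper's partitioning.

def bucketized_counts(counts, bucket_width, eps):
    counts = np.asarray(counts, dtype=float)
    out = np.empty_like(counts)
    for start in range(0, len(counts), bucket_width):
        bucket = counts[start:start + bucket_width]
        noisy_total = bucket.sum() + np.random.laplace(scale=1.0 / eps)
        out[start:start + bucket_width] = noisy_total / len(bucket)
    return out

hist = [12, 15, 11, 40, 43, 2, 1, 0]
print(bucketized_counts(hist, bucket_width=2, eps=0.5))
```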
It is a challenging problem to preserve the friendly correlations between individuals when publishing social-network data. To alleviate this problem, the uncertain graph has recently been proposed. The main idea of the uncertain graph is to convert an original graph into an uncertain form, where each correlation between individuals becomes an associated probability. However, existing uncertain-graph methods lack rigorous privacy guarantees and rely on assumptions about the adversary's knowledge. In this paper, we first introduce a general model for constructing uncertain graphs. Then, we propose an algorithm under this model based on differential privacy and analyze its privacy. Our algorithm provides rigorous privacy guarantees and resists background knowledge attacks. Experiments show that the proposed algorithm satisfies differential privacy and is feasible in practice. We also compare our algorithm with the (k, ε)-obfuscation algorithm in terms of data utility: the importance of nodes for the network under our algorithm is similar to that under (k, ε)-obfuscation.
To enhance privacy protection and improve data availability, a differentially private data protection method, ICMD-DP, is proposed. Based on an insensitive clustering algorithm, ICMD-DP applies differential privacy to the results of ICMD (insensitive clustering method for mixed data). The combination of clustering and differential privacy shifts query sensitivity from single records to group records, and at the same time reduces the risk of information loss and information disclosure. In addition, to satisfy the requirement of maintaining differential privacy for mixed data, ICMD-DP uses different methods to calculate the distance and centroid of categorical and numerical attributes. Finally, experiments illustrate the availability of the method.
Compared with the traditional grid, the energy internet collects data more widely and connects more broadly. Analysis of electrical data using Non-intrusive Load Monitoring (NILM) can infer private user behavior, so considering both data security and availability is a problem that must be addressed. Due to its rigid and provable privacy guarantee, differential privacy has been widely applied to privacy-preserving data release and data mining. However, because of the high sensitivity in this setting, adding noise directly renders the data unusable. In this paper, we propose a differentially private mechanism to protect energy internet privacy. Our focus is on aggregated data released by the data owner after noise has been added to the disaggregated data. Theoretical proofs and experiments show that our scheme achieves both privacy preservation and data availability.
To address the risk of data privacy disclosure in the classification process, a Random Forest algorithm under differential privacy, named DPRF-gini, is proposed in this paper. In the process of building each decision tree, the algorithm first perturbs feature selection and attribute partitioning using the exponential mechanism, and then meets the requirement of differential privacy by adding Laplace noise to the leaf nodes. Compared with the original algorithm, empirical results show that data privacy protection is further enhanced while the accuracy of the algorithm is only slightly reduced.
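A minimal sketch of the two primitives the abstract names, the exponential mechanism for choosing a split and Laplace noise on leaf counts; the scores, sensitivities, and budgets are illustrative placeholders, not DPRF-gini itself:

```python
import numpy as np

# Minimal sketch of the two mechanisms named above. The quality scores
# and sensitivity values below are assumed for illustration.

def exponential_mechanism(scores, eps, sensitivity):
    """Sample an index with probability proportional to exp(eps*q/(2*dq))."""
    scores = np.asarray(scores, dtype=float)
    weights = np.exp(eps * scores / (2.0 * sensitivity))
    return int(np.random.choice(len(scores), p=weights / weights.sum()))

def noisy_leaf_count(count, eps):
    """Laplace-perturbed class count for a leaf node (sensitivity 1)."""
    return count + np.random.laplace(scale=1.0 / eps)

gini_gains = [0.12, 0.40, 0.05]          # per-feature quality scores
feature = exponential_mechanism(gini_gains, eps=0.5, sensitivity=0.5)
print(feature, noisy_leaf_count(30, eps=0.5))
```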
Distributed data aggregation via summation (counting) helps us learn the insights behind raw data. However, such computation suffers from a high privacy risk of malicious collusion attacks, in which colluding adversaries infer a victim's private data from the gaps between the aggregation outputs and their own source data. Among the solutions against such collusion attacks, Distributed Differential Privacy (DDP) shows a significant privacy-preservation effect. Specifically, a DDP scheme guarantees global differential privacy (the presence or absence of any data curator barely impacts the aggregation outputs) by ensuring local differential privacy at the end of each data curator. To guarantee the overall privacy performance of a distributed data aggregation system against malicious collusion attacks, part of the existing work on such DDP schemes aims to provide an estimated lower bound on the privacy budget for global differential privacy. However, there are two main problems: low data utility from using a large global function sensitivity, and an unknown privacy guarantee when the aggregation sensitivity of the whole system is less than the sum of the data curators' aggregation sensitivities. To address these problems while ensuring distributed differential privacy, we provide a new lower bound on the privacy budget, which works with an unconditional aggregation sensitivity of the whole distributed system. Moreover, we study the performance of our privacy bound under different scenarios of data updates. Both theoretical and experimental evaluations show that our privacy bound offers better global privacy performance than the existing work.
Ubiquitous deployment of low-cost mobile positioning devices and the widespread use of high-speed wireless networks enable massive collection of large-scale trajectory data of individuals moving on road networks. Trajectory data mining finds numerous applications including understanding users' historical travel preferences and recommending places of interest to new visitors. Privacy-preserving trajectory mining is an important and challenging problem as exposure of sensitive location information in the trajectories can directly invade the location privacy of the users associated with the trajectories. In this paper, we propose a differentially private trajectory analysis algorithm for points-of-interest recommendation to users that aims at maximizing the accuracy of the recommendation results while protecting the privacy of the exposed trajectories with differential privacy guarantees. Our algorithm first transforms the raw trajectory dataset into a bipartite graph with nodes representing the users and the points-of-interest and the edges representing the visits made by the users to the locations, and then extracts the association matrix representing the bipartite graph to inject carefully calibrated noise to meet ε-differential privacy guarantees. A post-processing of the perturbed association matrix is performed to suppress noise prior to performing a Hyperlink-Induced Topic Search (HITS) on the transformed data that generates an ordered list of recommended points-of-interest. Extensive experiments on a real trajectory dataset show that our algorithm is efficient, scalable and demonstrates high recommendation accuracy while meeting the required differential privacy guarantees.
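A minimal sketch of the pipeline described above, perturbing a user-by-POI visit matrix with Laplace noise, clipping negatives as a simple post-processing step, and ranking POIs with HITS power iteration; the budget, matrix, and clipping rule are assumptions for illustration:

```python
import numpy as np

# Minimal sketch: one visit changes one matrix cell by 1, so per-cell
# Laplace(1/eps) noise covers unit L1 sensitivity; clipping negatives is
# a simple stand-in for the paper's post-processing.

def private_poi_ranking(visits, eps, iters=50):
    noisy = visits + np.random.laplace(scale=1.0 / eps, size=visits.shape)
    noisy = np.clip(noisy, 0.0, None)       # suppress negative counts
    hubs = np.ones(noisy.shape[0])          # user hub scores
    for _ in range(iters):                  # HITS power iteration
        auth = noisy.T @ hubs               # POI authority scores
        auth /= np.linalg.norm(auth)
        hubs = noisy @ auth
        hubs /= np.linalg.norm(hubs)
    return np.argsort(-auth)                # POI indices, best first

visits = np.random.poisson(2.0, size=(100, 20))  # 100 users, 20 POIs
print(private_poi_ranking(visits, eps=1.0)[:5])
```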
While significant progress has been made separately on analytics systems for scalable stochastic gradient descent (SGD) and private SGD, none of the major scalable analytics frameworks have incorporated differentially private SGD. There are two inter-related issues for this disconnect between research and practice: (1) low model accuracy due to added noise to guarantee privacy, and (2) high development and runtime overhead of the private algorithms. This paper takes a first step to remedy this disconnect and proposes a private SGD algorithm to address both issues in an integrated manner. In contrast to the white-box approach adopted by previous work, we revisit and use the classical technique of output perturbation to devise a novel “bolt-on” approach to private SGD. While our approach trivially addresses (2), it makes (1) even more challenging. We address this challenge by providing a novel analysis of the L2-sensitivity of SGD, which allows, under the same privacy guarantees, better convergence of SGD when only a constant number of passes can be made over the data. We integrate our algorithm, as well as other state-of-the-art differentially private SGD, into Bismarck, a popular scalable SGD-based analytics system on top of an RDBMS. Extensive experiments show that our algorithm can be easily integrated, incurs virtually no overhead, scales well, and most importantly, yields substantially better (up to 4X) test accuracy than the state-of-the-art algorithms on many real datasets.
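A minimal sketch of the bolt-on output-perturbation idea: run plain SGD, then add high-dimensional Laplace noise scaled to an assumed L2-sensitivity of the final model. The sensitivity constant is a placeholder; the paper derives the actual bound from the loss's smoothness and the number of passes.

```python
import numpy as np

# Minimal sketch of output perturbation for SGD. Noise with density
# proportional to exp(-eps * ||b|| / sensitivity) (uniform direction,
# Gamma-distributed norm) yields eps-DP for an L2-sensitivity bound.
# `l2_sensitivity` is an assumed placeholder, not the paper's bound.

def bolt_on_sgd(X, y, eps, l2_sensitivity, lr=0.1, passes=3):
    w = np.zeros(X.shape[1])
    for _ in range(passes):
        for xi, yi in zip(X, y):            # plain logistic-loss SGD step
            w -= lr * (1.0 / (1.0 + np.exp(-xi @ w)) - yi) * xi
    direction = np.random.normal(size=w.shape)
    direction /= np.linalg.norm(direction)
    norm = np.random.gamma(shape=w.size, scale=l2_sensitivity / eps)
    return w + norm * direction

X = np.random.normal(size=(500, 5))
y = (X[:, 0] > 0).astype(float)
print(bolt_on_sgd(X, y, eps=1.0, l2_sensitivity=0.05))
```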
Differential privacy is a rigorous privacy standard that has been applied to a range of data analysis tasks. To broaden the application scenarios of differential privacy when data records have dependencies, the notion of Bayesian differential privacy has recently been proposed. However, it is unknown whether Bayesian differential privacy preserves three nice properties of differential privacy: sequential composability, parallel composability, and post-processing. In this paper, we provide an affirmative answer to this question; i.e., Bayesian differential privacy still has these properties. The idea behind sequential composability is that if we have $m$ algorithms $Y_1, Y_2, \ldots, Y_m$, where $Y_\ell$ is independently $\varepsilon_\ell$-Bayesian differentially private for $\ell = 1, 2, \ldots, m$, then by feeding the result of $Y_1$ into $Y_2$, the result of $Y_2$ into $Y_3$, and so on, we will finally have a $\left(\sum_{\ell=1}^{m} \varepsilon_\ell\right)$-Bayesian differentially private algorithm. For parallel composability, we consider the situation where a database is partitioned into $m$ disjoint subsets. The $\ell$-th subset is input to a Bayesian differentially private algorithm $Y_\ell$, for $\ell = 1, 2, \ldots, m$. Then the parallel composition of $Y_1, Y_2, \ldots, Y_m$ will be $\left(\max_{\ell=1}^{m} \varepsilon_\ell\right)$-Bayesian differentially private. The post-processing property means that a data analyst, without additional knowledge about the private database, cannot compute a function of the output of a Bayesian differentially private algorithm and reduce its privacy guarantee.
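A compact restatement of the two composition claims, as a hedged sketch in the abstract's notation (with $D_1, \ldots, D_m$ the disjoint parts):

```latex
% Hedged restatement of the two composition properties described above;
% Y_ell and the disjoint parts D_ell follow the abstract's notation.
\begin{align*}
  \text{Sequential: } & Y_m \circ \cdots \circ Y_1 \text{ is }
    \Big(\sum_{\ell=1}^{m} \varepsilon_\ell\Big)\text{-Bayesian differentially private.}\\
  \text{Parallel: }   & \big(Y_1(D_1), \ldots, Y_m(D_m)\big) \text{ is }
    \Big(\max_{1 \le \ell \le m} \varepsilon_\ell\Big)\text{-Bayesian differentially private.}
\end{align*}
```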
Private record linkage (PRL) is the problem of identifying pairs of records that are similar as per an input matching rule from databases held by two parties that do not trust one another. We identify three key desiderata that a PRL solution must ensure: (1) perfect precision and high recall of matching pairs, (2) a proof of end-to-end privacy, and (3) communication and computational costs that scale subquadratically in the number of input records. We show that all of the existing solutions for PRL – including secure 2-party computation (S2PC) and its variants that use non-private or differentially private (DP) blocking to ensure subquadratic cost – violate at least one of the three desiderata. In particular, S2PC techniques guarantee end-to-end privacy but have either low recall or quadratic cost. In contrast, no end-to-end privacy guarantee has been formalized for solutions that achieve subquadratic cost. This is true even for solutions that compose DP and S2PC: DP does not permit the release of any exact information about the databases, while S2PC algorithms for PRL allow the release of matching records. In light of this deficiency, we propose a novel privacy model, called output constrained differential privacy, that shares the strong privacy protection of DP but allows for the truthful release of the output of a certain function applied to the data. We apply this to PRL, and show that protocols satisfying this privacy model permit the disclosure of the true matching records, but their execution is insensitive to the presence or absence of a single non-matching record. We find that prior work combining DP and S2PC techniques fails to satisfy even this end-to-end privacy model. Hence, we develop novel protocols that provably achieve this end-to-end privacy guarantee, together with the other two desiderata of PRL. Our empirical evaluation also shows that our protocols obtain high recall, scale near-linearly in the size of the input databases and the output set of matching pairs, and have communication and computational costs at least two orders of magnitude smaller than those of S2PC baselines.