CAREER: An Integrated Approach For Efficient Privacy Preserving Distributed Data Analytics

Submitted by Murat Kantarcioglu on Thu, 04/26/2018 - 3:36pm

Project Details

Lead PI

Murat Kantarcioglu

Performance Period

Feb 01, 2009 - Jan 31, 2016

Institution(s)

University of Texas at Dallas

Award Number

0845803

Outcomes Report URL

https://www.research.gov/research-portal/appmanager/base/desktop?_nfpb=true&_win...

Increasingly, different organizations need to share their private data securely to carry out many critical tasks. Several approaches based on secure multi-party computation (SMC) and data sanitization techniques have recently emerged to enable privacy-preserving distributed data analytics. SMC-based privacy-preserving protocols allow the participating parties to learn only the final (accurate) result, but they do not scale well to large amounts of data. Sanitization-based techniques, on the other hand, allow organizations to reveal privacy-sensitive data under certain privacy guarantees by distorting the data. In many cases, however, the significant distortion needed to preserve privacy can lead to inaccurate results. Because of these limitations, efficient and accurate privacy-preserving solutions are needed for handling large distributed data sets.

To address this challenge, we design and develop a novel framework that integrates sanitization and SMC techniques to build efficient privacy-preserving solutions under resource constraints. We use data sanitization techniques to obtain initial approximate results and then carry out SMC operations selectively to increase accuracy. Since we use existing techniques in a black-box fashion, our approach is orthogonal to the choice of sanitization or SMC technique, so new techniques of either kind can be plugged in directly.

Our new techniques will substantially decrease the cost of executing privacy-preserving distributed data analytics protocols. This will have a direct economic impact by opening the way for new applications (e.g., e-health and e-government applications) that are currently considered infeasible because no privacy-preserving solutions yet work efficiently on large data sets.
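The selective strategy described above can be illustrated with a small simulation. The sketch below is hypothetical and is not the project's implementation. It assumes the analytics task is finding items whose total count across all parties reaches a threshold. It uses Laplace noise (the standard differential-privacy mechanism for counts) as the sanitization step and additive secret sharing, simulated in a single process, as the SMC step. All function names and the `margin` and `smc_budget` parameters are illustrative. Items whose noisy total is far from the threshold are decided from the sanitized data alone. Only ambiguous items, up to the budget, are resolved with the exact secure sum.

```python
import random

# Hypothetical sketch: Laplace noise stands in for "sanitization" and simulated
# additive secret sharing stands in for "SMC". Neither is the project's method.
PRIME = 2**61 - 1  # any prime larger than the largest possible true total works


def laplace(epsilon, rng):
    """Laplace(0, 1/epsilon) noise: difference of two Exp(epsilon) draws."""
    return rng.expovariate(epsilon) - rng.expovariate(epsilon)


def sanitize(counts, epsilon, rng):
    """Sanitization step: a party releases its local counts with Laplace noise."""
    return {item: c + laplace(epsilon, rng) for item, c in counts.items()}


def share(value, n_parties, rng):
    """Split value into n additive shares that sum to value mod PRIME."""
    shares = [rng.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_values, rng):
    """Simulated SMC step: exact sum that reveals only the total (semi-honest model)."""
    n = len(private_values)
    # Party i sends share j of its private value to party j.
    all_shares = [share(v, n, rng) for v in private_values]
    # Party j publishes only the sum of the shares it received.
    published = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    return sum(published) % PRIME


def hybrid_frequent_items(party_counts, threshold, epsilon, margin, smc_budget, seed=0):
    """Return (items whose total count >= threshold, number of SMC calls used)."""
    rng = random.Random(seed)
    items = sorted(set().union(*party_counts))

    # 1. Approximate answer from the sanitized (noisy) local counts.
    noisy_total = {item: 0.0 for item in items}
    for counts in party_counts:
        released = sanitize({item: counts.get(item, 0) for item in items}, epsilon, rng)
        for item, value in released.items():
            noisy_total[item] += value

    # 2. Accept clear cases; flag items whose noisy total is near the threshold.
    frequent, uncertain = set(), []
    for item in items:
        if noisy_total[item] >= threshold + margin:
            frequent.add(item)
        elif noisy_total[item] > threshold - margin:
            uncertain.append(item)

    # 3. Spend the limited SMC budget on the most ambiguous items first.
    uncertain.sort(key=lambda item: abs(noisy_total[item] - threshold))
    checked = uncertain[:smc_budget]
    for item in checked:
        if secure_sum([c.get(item, 0) for c in party_counts], rng) >= threshold:
            frequent.add(item)
    # Budget exhausted: fall back to the noisy estimate for the remaining items.
    for item in uncertain[smc_budget:]:
        if noisy_total[item] >= threshold:
            frequent.add(item)
    return frequent, len(checked)
```

Widening `margin` or raising `smc_budget` sends more items to the exact protocol, trading cost for accuracy. With `smc_budget=0` the sketch reduces to pure sanitization, and with a very large margin and budget it reduces to pure SMC.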
