The Value of Collaboration in Convex Machine Learning with Differential Privacy

Submitted by aekwall on Mon, 01/11/2021 - 1:41pm

Title	The Value of Collaboration in Convex Machine Learning with Differential Privacy
Publication Type	Conference Paper
Year of Publication	2020
Authors	Wu, N., Farokhi, F., Smith, D., Kaafar, M. A.
Conference Name	2020 IEEE Symposium on Security and Privacy (SP)
Date Published	May
Keywords	Collaboration, composability, Computational modeling, convex machine learning, credit card fraud detection, Data models, data privacy, Differential privacy, differentially-private gradient queries, differentially-private gradients, distributed datasets, distributed private data, financial data processing, financial datasets, fitness cost, fraud, gradient methods, Human Behavior, learning (artificial intelligence), loan interest rates, machine learning, multiple data owners, nonoverlapping training datasets, Prediction algorithms, privacy, privacy budget, privacy-aware data owners, privacy-aware learning algorithms, pubcrawl, regression, regression analysis, Resiliency, Scalability, Stochastic gradient algorithm, stochastic gradient descent, Support vector machines, Training
Abstract	In this paper, we apply machine learning to distributed private data held by multiple data owners, that is, entities with access to non-overlapping training datasets. We use noisy, differentially-private gradients to minimize the fitness cost of the machine learning model using stochastic gradient descent. We quantify the quality of the trained model, measured by the fitness cost, as a function of the privacy budget and the size of the distributed datasets, capturing the trade-off between privacy and utility in machine learning. This lets us predict the outcome of collaboration among privacy-aware data owners before running potentially computationally expensive machine learning algorithms. In particular, we show that the difference between the fitness of the model trained with differentially-private gradient queries and the fitness of the model trained in the absence of any privacy concerns is inversely proportional to the square of the training dataset size and the square of the privacy budget. We validate this performance prediction against the actual performance of the proposed privacy-aware learning algorithms on financial datasets for two tasks: determining loan interest rates using regression, and detecting credit card fraud using support vector machines.
DOI	10.1109/SP40000.2020.00025
Citation Key	wu_value_2020
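
The setup described in the abstract (several data owners answering gradient queries with differentially-private noise, and a learner running gradient descent on the combined answers) can be illustrated with a minimal sketch. This is not the paper's algorithm. Everything here is an assumption made for illustration: the one-parameter least-squares model, the Laplace mechanism, the gradient clipping bound `clip`, the even split of the privacy budget across steps (basic composition), and the names `laplace`, `private_gradient`, and `train`.

```python
import random

def laplace(scale, rng):
    """Sample Laplace(0, scale) noise; returns 0.0 when scale is 0."""
    if scale == 0.0:
        return 0.0
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_gradient(w, data, eps_step, clip, rng):
    """One owner's noisy answer to a gradient query for the loss 0.5*(w*x - y)**2."""
    # Clip each per-record gradient so a single record has bounded influence.
    grads = [max(-clip, min(clip, (w * x - y) * x)) for x, y in data]
    mean = sum(grads) / len(grads)
    # Replacing one record changes the clipped mean by at most 2*clip/n.
    sensitivity = 2.0 * clip / len(data)
    return mean + laplace(sensitivity / eps_step, rng)

def train(owners, eps, steps=50, lr=0.5, clip=1.0, seed=0):
    """Gradient descent on size-weighted noisy gradients from all owners."""
    rng = random.Random(seed)
    eps_step = eps / steps  # assumption: budget split evenly across queries
    total = sum(len(d) for d in owners)
    w = 0.0
    for _ in range(steps):
        g = sum(len(d) * private_gradient(w, d, eps_step, clip, rng)
                for d in owners) / total
        w -= lr * g
    return w
```

In this sketch the noise added by an owner with n records has scale 2·clip·steps/(n·eps), so its variance shrinks in proportion to 1/(n²·eps²). That is in line with the scaling stated in the abstract, although the sketch does not reproduce the paper's analysis. Passing `eps=float("inf")` disables the noise and gives the non-private baseline for comparison.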