Towards Making Systems Forget with Machine Unlearning

Submitted by Katie Dey on Mon, 08/06/2018 - 1:24pm

Title	Towards Making Systems Forget with Machine Unlearning
Publication Type	Conference Paper
Year of Publication	2015
Authors	Y. Cao, J. Yang
Conference Name	2015 IEEE Symposium on Security and Privacy
Date Published	May
Keywords	Adversarial Machine Learning, Articles of Interest, C3E 2019, Cognitive Security, complex data propagation network, Computational modeling, data lineage, Data models, data privacy, Extraction, feature extraction, feature modeling, feature selection, Forgetting System, forgetting systems, learning (artificial intelligence), Learning systems, machine learning algorithms, machine unlearning, privacy risks, recommendation engine, recommender systems, security of data, security perspective, statistical query learning, summation form, Training data, usability perspective
Abstract	Today's systems produce a rapidly exploding amount of data, and this data in turn derives more data, forming a complex data propagation network that we call the data's lineage. There are many reasons that users want systems to forget certain data including its lineage. From a privacy perspective, users who become concerned with new privacy risks of a system often want the system to forget their data and lineage. From a security perspective, if an attacker pollutes an anomaly detector by injecting manually crafted data into the training data set, the detector must forget the injected data to regain security. From a usability perspective, a user can remove noise and incorrect entries so that a recommendation engine gives useful recommendations. Therefore, we envision forgetting systems, capable of forgetting certain data and their lineages, completely and quickly. This paper focuses on making learning systems forget, the process of which we call machine unlearning, or simply unlearning. We present a general, efficient unlearning approach by transforming learning algorithms used by a system into a summation form. To forget a training data sample, our approach simply updates a small number of summations - asymptotically faster than retraining from scratch. Our approach is general, because the summation form comes from statistical query learning, in which many machine learning algorithms can be implemented. Our approach also applies to all stages of machine learning, including feature selection and modeling. Our evaluation, on four diverse learning systems and real-world workloads, shows that our approach is general, effective, fast, and easy to use.
DOI	10.1109/SP.2015.35
Citation Key	7163042
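
Illustrative sketch (not from the paper): the abstract says that a learning algorithm in summation form can forget a training sample by updating a small number of summations instead of retraining. The Python below shows that idea on a toy count-based naive Bayes classifier, chosen here only for illustration. The class name CountingNaiveBayes, its methods, the binary-feature assumption, and the tiny data set are all hypothetical and are not taken from the authors' systems or evaluation.

```python
from collections import Counter


class CountingNaiveBayes:
    """Toy naive Bayes whose whole model is a set of running sums (counts)."""

    def __init__(self):
        # Sum over training samples of [label == y].
        self.class_counts = Counter()
        # Sum over training samples of [label == y and features[i] == v].
        self.feature_counts = Counter()

    def _update(self, features, label, sign):
        # Each sample touches one class sum and one sum per feature.
        self.class_counts[label] += sign
        for i, value in enumerate(features):
            self.feature_counts[(label, i, value)] += sign

    def learn(self, features, label):
        self._update(features, label, +1)

    def unlearn(self, features, label):
        # Assumption: this exact sample was learned earlier.
        self._update(features, label, -1)

    def state(self):
        # Non-zero sums only, so two models can be compared directly.
        return (
            {k: v for k, v in self.class_counts.items() if v},
            {k: v for k, v in self.feature_counts.items() if v},
        )

    def predict(self, features):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -1.0
        for label, n in self.class_counts.items():
            if n <= 0:
                continue
            score = n / total
            for i, value in enumerate(features):
                # Laplace smoothing; assumes every feature is binary (0 or 1).
                count = self.feature_counts[(label, i, value)]
                score *= (count + 1) / (n + 2)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical data: (binary feature tuple, label).
data = [
    ((1, 0), "spam"),
    ((1, 1), "spam"),
    ((0, 0), "ham"),
    ((0, 1), "ham"),
    ((1, 1), "ham"),  # the sample to be forgotten
]

# Train on everything, then forget the last sample by updating the sums.
model = CountingNaiveBayes()
for features, label in data:
    model.learn(features, label)
model.unlearn(*data[-1])

# Baseline: retrain from scratch without the forgotten sample.
retrained = CountingNaiveBayes()
for features, label in data[:-1]:
    retrained.learn(features, label)

print(model.state() == retrained.state())
```

In this sketch, forgetting one sample costs one update per feature plus one for the class, regardless of how many samples were learned, while the retrained baseline has to revisit every remaining sample. The final comparison checks that both routes end in the same sums, which is the sense in which the forgotten sample leaves no trace in this toy model.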

Groups:

Computational Cybersecurity in Compromised Environments (C3E)