"Towards Making Systems Forget with Machine Unlearning"
Title | "Towards Making Systems Forget with Machine Unlearning" |
Publication Type | Conference Paper |
Year of Publication | 2015 |
Authors | Y. Cao, J. Yang |
Conference Name | 2015 IEEE Symposium on Security and Privacy |
Date Published | May |
Publisher | IEEE |
ISBN Number | 978-1-4673-6949-7 |
Accession Number | 15308281 |
Keywords | Adversarial Machine Learning, complex data propagation network, Computational modeling, data lineage, Data models, data privacy, feature extraction, feature modeling, feature selection, Forgetting System, forgetting systems, learning (artificial intelligence), Learning systems, machine learning algorithms, machine unlearning, privacy risks, pubcrawl, pubcrawl170105, recommendation engine, recommender systems, security of data, security perspective, statistical query learning, summation form, Training data, usability perspective |
Abstract | Today's systems produce a rapidly exploding amount of data, and the data further derives more data, forming a complex data propagation network that we call the data's lineage. There are many reasons that users want systems to forget certain data including its lineage. From a privacy perspective, users who become concerned with new privacy risks of a system often want the system to forget their data and lineage. From a security perspective, if an attacker pollutes an anomaly detector by injecting manually crafted data into the training data set, the detector must forget the injected data to regain security. From a usability perspective, a user can remove noise and incorrect entries so that a recommendation engine gives useful recommendations. Therefore, we envision forgetting systems, capable of forgetting certain data and their lineages, completely and quickly. This paper focuses on making learning systems forget, the process of which we call machine unlearning, or simply unlearning. We present a general, efficient unlearning approach by transforming learning algorithms used by a system into a summation form. To forget a training data sample, our approach simply updates a small number of summations - asymptotically faster than retraining from scratch. Our approach is general, because the summation form is from the statistical query learning in which many machine learning algorithms can be implemented. Our approach also applies to all stages of machine learning, including feature selection and modeling. Our evaluation, on four diverse learning systems and real-world workloads, shows that our approach is general, effective, fast, and easy to use. |
URL | http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7163042&isnumber=7163005 |
DOI | 10.1109/SP.2015.35 |
Citation Key | 7163042 |
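As a rough illustration of the summation-form idea described in the abstract above, the sketch below keeps a multinomial naive Bayes model as a set of per-class and per-feature counts; forgetting a training sample subtracts its contribution from those counts rather than retraining from scratch. This is a minimal sketch, not the authors' implementation, and the class and method names (`SummationNaiveBayes`, `learn`, `unlearn`) are hypothetical.

```python
# Minimal sketch (assumed, not from the paper): a naive Bayes classifier
# kept in "summation form". Training accumulates counts; unlearning a
# sample subtracts its contribution from the same counts.
from collections import defaultdict
import math


class SummationNaiveBayes:
    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # Laplace smoothing constant
        self.class_counts = defaultdict(float)  # number of samples per class
        self.feature_counts = defaultdict(lambda: defaultdict(float))
        self.total = 0.0                        # total number of samples

    def learn(self, features, label):
        """Add one sample's contribution to the summations."""
        self.total += 1
        self.class_counts[label] += 1
        for f, v in features.items():
            self.feature_counts[label][f] += v

    def unlearn(self, features, label):
        """Forget one sample by subtracting the same contribution."""
        self.total -= 1
        self.class_counts[label] -= 1
        for f, v in features.items():
            self.feature_counts[label][f] -= v

    def predict(self, features, vocab_size):
        """Score each class from the current summations."""
        best, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            if count <= 0:
                continue
            score = math.log(count / self.total)
            denom = sum(self.feature_counts[label].values()) + self.alpha * vocab_size
            for f, v in features.items():
                score += v * math.log((self.feature_counts[label][f] + self.alpha) / denom)
            if score > best_score:
                best, best_score = label, score
        return best
```

In this sketch, unlearning touches only the summations the forgotten sample contributed to, so its cost grows with the size of that one sample rather than with the size of the training set, which is the asymptotic advantage over retraining that the abstract describes.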