Measuring data privacy preserving and machine learning
Title | Measuring data privacy preserving and machine learning |
Publication Type | Conference Paper |
Year of Publication | 2018 |
Authors | Esquivel-Quiros, Luis Gustavo, Barrantes, Elena Gabriela, Esponda Darlington, Fernando |
Conference Name | 2018 7th International Conference On Software Process Improvement (CIMPS) |
Keywords | Computational modeling, data owners, data privacy, data privacy-preserving, data publisher, data publishing, learning (artificial intelligence), machine learning, machine learning models, machine learning techniques, Measurement, Metrics, Organizations, privacy, privacy levels, privacy measurement, privacy models and measurement, Privacy Preferences, privacy preservation metric, privacy violations, Privacy-preserving, pubcrawl, sensitive data, Software |
Abstract | The increasing publication of large, theoretically anonymous datasets can lead to a number of attacks on people's privacy. Publishing sensitive data without exposing the data owners is generally not among software developers' concerns. Data privacy-preserving regulations create an appropriate scenario for focusing on privacy from the perspective of the data use or exploration that takes place in an organization. The growing number of sanctions for privacy violations motivates a systematic comparison of three well-known machine learning algorithms in order to measure the utility of privacy-preserved data. The scope of the evaluation is extended by comparing them against a known privacy preservation metric, using different parameter scenarios and privacy levels. The use of publicly available implementations, together with the presentation of the methodology and the explanation and analysis of the experiments, provides a working framework for the problem of privacy preservation. Problems are shown in measuring the usefulness of the data and its relationship to privacy preservation. The findings motivate the need for metrics optimized for the privacy preferences of the data owners, since the risk of predicting sensitive attributes by means of machine learning techniques is not usually eliminated. In addition, it is shown that full privacy preservation may exist yet cannot be measured, while the machine learning models of interest to the organization publishing the data still perform adequately. |
DOI | 10.1109/CIMPS.2018.8625613 |
Citation Key | esquivel-quiros_measuring_2018 |