Differential privacy-based data de-identification protection and risk evaluation system
Title | Differential privacy-based data de-identification protection and risk evaluation system
Publication Type | Conference Paper
Year of Publication | 2017
Authors | Tsou, Y., Chen, H., Chen, J., Huang, Y., Wang, P.
Conference Name | 2017 International Conference on Information and Communication Technology Convergence (ICTC)
Date Published | October
ISBN | 978-1-5090-4032-2
Keywords | Big Data, composability, data de-identification process, data disclosure estimation system, data mining, data privacy, data protection, data query, de-identification, Differential privacy, differential privacy-based data de-identification protection, Human Behavior, native differential privacy, privacy protection issues, privacy-sensitive data, pubcrawl, query processing, Resiliency, risk evaluation system, Scalability, Synthetic Dataset, value added analysis
Abstract | As more and more technologies to store and analyze massive amounts of data become available, it is extremely important to de-identify privacy-sensitive data so that further analysis can be conducted by different parties. For example, data must go through a de-identification process before being transferred to institutes for further value-added analysis. As such, privacy protection issues associated with the release of data and with data mining have become a popular field of study in the domain of big data. As a strict and verifiable definition of privacy, differential privacy has attracted noteworthy attention and widespread research in recent years. Nevertheless, differential privacy is not practical for most applications because of the performance cost of generating synthetic datasets for data queries. Moreover, the definition of data protection by randomized noise in native differential privacy is abstract to users. Therefore, we design a pragmatic DP-based data de-identification protection and risk-of-data-disclosure estimation system, in which a DP-based noise addition mechanism is applied to generate synthetic datasets. Furthermore, the risk of data disclosure for these synthetic datasets can be evaluated before they are released to buyers/consumers.
URL | https://ieeexplore.ieee.org/document/8191015 |
DOI | 10.1109/ICTC.2017.8191015 |
Citation Key | tsou_differential_2017 |
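
The abstract refers to a DP-based noise addition mechanism used to generate synthetic datasets. As a rough illustration only (not the authors' system), the sketch below perturbs a histogram of a sensitive attribute with the standard Laplace mechanism and resamples a synthetic dataset from the noisy counts; the epsilon value, the attribute domain, and the resampling step are assumptions made for this example.

```python
# Illustrative sketch of Laplace-mechanism synthetic data generation.
# Assumptions: a small categorical domain, epsilon = 1.0, and resampling
# from the noisy histogram; none of this is taken from the cited paper.
import numpy as np

def laplace_noisy_histogram(values, domain, epsilon):
    """Add Laplace(1/epsilon) noise to each bin. Adding or removing one
    record changes the histogram's L1 norm by at most 1, so this scale
    gives epsilon-differential privacy for the histogram release."""
    counts = np.array([np.sum(values == v) for v in domain], dtype=float)
    noisy = counts + np.random.laplace(0.0, 1.0 / epsilon, size=len(domain))
    return np.clip(noisy, 0, None)  # negative counts are not meaningful

def synthesize(noisy_counts, domain, n):
    """Draw a synthetic dataset of size n from the normalized noisy histogram."""
    probs = noisy_counts / noisy_counts.sum()
    return np.random.choice(domain, size=n, p=probs)

if __name__ == "__main__":
    sensitive = np.random.randint(0, 5, size=1000)  # toy "sensitive" attribute
    domain = np.arange(5)
    noisy = laplace_noisy_histogram(sensitive, domain, epsilon=1.0)
    synthetic = synthesize(noisy, domain, n=1000)
    print("noisy counts:", np.round(noisy, 1))
    print("synthetic counts:", [int(np.sum(synthetic == v)) for v in domain])
```

Only the noisy histogram (not the raw data) is needed to produce the synthetic records, which is the property a de-identification pipeline of this kind relies on; the paper's own mechanism and its disclosure-risk evaluation are described in the publication above.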