Biblio

Keyword: de-identification
2020-04-20
Liu, Kai-Cheng, Kuo, Chuan-Wei, Liao, Wen-Chiuan, Wang, Pang-Chieh.  2018.  Optimized Data de-Identification Using Multidimensional k-Anonymity. 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :1610–1614.

In the globalized knowledge economy, big data analytics has been widely applied in diverse areas. A critical issue in big data analysis of personal information is the possible leakage of personal privacy. It is therefore necessary to have an anonymization-based de-identification method to avoid undesirable privacy leaks; such a method prevents published data from being traced back to individuals. Prior empirical research has provided approaches to reduce privacy leakage risk, e.g., Maximum Distance to Average Vector (MDAV), the Condensation Approach, and Differential Privacy. However, these methods inevitably generate synthetic data of varying sizes and are thus unsuitable for general use. To satisfy the need for general use, k-anonymity can be chosen as the privacy protection mechanism in the de-identification process to ensure the data are not distorted, because k-anonymity is strong at both protecting privacy and preserving data authenticity. Accordingly, this study proposes an optimized multidimensional method for anonymizing data based on both a priority weight-adjusted method and a mean difference recommending tree method (MDR tree method). The results of this study reveal that the new method generates more reliable anonymous data and reduces the information loss rate.
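The core guarantee here, k-anonymity, requires every released record to be indistinguishable from at least k-1 others on its quasi-identifiers. As a rough illustration of the multidimensional approach (not the paper's priority weight-adjusted or MDR-tree method), here is a minimal sketch of Mondrian-style greedy median partitioning over numeric quasi-identifiers:

```python
# Minimal sketch of multidimensional k-anonymity via greedy median
# partitioning (in the spirit of Mondrian); NOT the paper's exact
# priority weight-adjusted / MDR-tree method. Assumes len(records) >= k
# and numeric quasi-identifier values.

def multidimensional_k_anonymize(records, quasi_ids, k):
    """Partition records so every partition holds >= k rows, then
    generalize each quasi-identifier to its (min, max) range."""
    def span(part, attr):
        vals = [r[attr] for r in part]
        return max(vals) - min(vals)

    def anonymize(part):
        if len(part) < 2 * k:  # splitting further would break the k guarantee
            for attr in quasi_ids:  # generalize values to ranges
                lo = min(r[attr] for r in part)
                hi = max(r[attr] for r in part)
                for r in part:
                    r[attr] = (lo, hi)
            return [part]
        # split on the quasi-identifier with the widest span, at the median
        attr = max(quasi_ids, key=lambda a: span(part, a))
        part = sorted(part, key=lambda r: r[attr])
        mid = len(part) // 2
        return anonymize(part[:mid]) + anonymize(part[mid:])

    return anonymize([dict(r) for r in records])
```

Each returned partition is an equivalence class of at least k records sharing identical generalized quasi-identifiers.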
2019-01-31
Simmons, Andrew J., Curumsing, Maheswaree Kissoon, Vasa, Rajesh.  2018.  An Interaction Model for De-Identification of Human Data Held by External Custodians. Proceedings of the 30th Australian Conference on Computer-Human Interaction. :23–26.

Reuse of pre-existing industry datasets for research purposes requires a multi-stakeholder solution that balances the researcher's analysis objectives with the need to engage the industry data custodian, whilst respecting the privacy rights of human data subjects. Current methods place the burden on the data custodian, who may not be sufficiently trained to fully appreciate the nuances of data de-identification. Through modelling of functional, quality, and emotional goals, we propose a de-identification-in-the-cloud approach whereby the researcher proposes analyses along with the extraction and de-identification operations, while the industry data custodian retains secure control over authorising the proposed analyses. We demonstrate our approach through the implementation of a de-identification portal for sports club data.
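As a purely hypothetical sketch of the interaction the abstract describes (the names AnalysisProposal, CustodianPortal, and apply_ops are invented for illustration, not the authors' implementation), the key property is that a proposed analysis is inert until the custodian authorises it:

```python
# Hypothetical sketch of the custodian-in-the-loop flow: the researcher
# proposes an analysis plus de-identification operations, and nothing
# runs against the data until the custodian approves.

def apply_ops(rows, ops):
    """Apply simple de-identification operations; this sketch only
    supports "drop:<field>" (remove a column entirely)."""
    drop = {op.split(":", 1)[1] for op in ops if op.startswith("drop:")}
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]

class AnalysisProposal:
    def __init__(self, query, deid_ops):
        self.query = query        # callable the researcher wants to run
        self.deid_ops = deid_ops  # e.g. ["drop:name", "drop:address"]
        self.approved = False

class CustodianPortal:
    def __init__(self, dataset):
        self._dataset = dataset   # never leaves the custodian's control
        self.pending = []

    def submit(self, proposal):   # researcher side
        self.pending.append(proposal)

    def approve(self, proposal):  # custodian side: the authorisation gate
        proposal.approved = True

    def run(self, proposal):
        if not proposal.approved:
            raise PermissionError("analysis not authorised by custodian")
        return proposal.query(apply_ops(self._dataset, proposal.deid_ops))
```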

2018-09-28
Tsou, Y., Chen, H., Chen, J., Huang, Y., Wang, P..  2017.  Differential privacy-based data de-identification protection and risk evaluation system. 2017 International Conference on Information and Communication Technology Convergence (ICTC). :416–421.

As more and more technologies to store and analyze massive amounts of data become available, it is extremely important to de-identify privacy-sensitive data so that further analysis can be conducted by different parties. For example, data needs to go through a de-identification process before being transferred to institutes for further value-added analysis. As such, privacy protection issues associated with the release of data and with data mining have become a popular field of study in the domain of big data. As a strict and verifiable definition of privacy, differential privacy has attracted noteworthy attention and widespread research in recent years. Nevertheless, differential privacy is not practical for most applications due to the performance cost of generating synthetic datasets for data queries. Moreover, the definition of data protection by randomized noise in native differential privacy is abstract to users. Therefore, we design a pragmatic DP-based data de-identification and disclosure-risk evaluation system, in which a DP-based noise-addition mechanism is applied to generate synthetic datasets. Furthermore, the disclosure risk of these synthetic datasets can be evaluated before release to buyers/consumers.
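For context, the basic noise-addition building block in differential privacy is the standard Laplace mechanism; a minimal sketch for a single numeric query follows (the authors' system layers synthetic-dataset generation and risk evaluation on top of primitives like this):

```python
# Minimal sketch of the standard Laplace mechanism, the basic DP
# noise-addition primitive; not the paper's full synthetic-dataset
# and disclosure-risk pipeline.
import random

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Return true_value plus Laplace(0, sensitivity/epsilon) noise,
    which satisfies epsilon-differential privacy for this query."""
    scale = sensitivity / epsilon
    # the difference of two iid exponentials is Laplace-distributed
    noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return true_value + noise

# Example: a counting query changes by at most 1 when one record is
# added or removed, so its sensitivity is 1:
# private_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
```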

2018-01-23
Joo, Moon-Ho, Yoon, Sang-Pil, Kim, Sahng-Yoon, Kwon, Hun-Yeong.  2017.  Research on Distribution of Responsibility for De-Identification Policy of Personal Information. Proceedings of the 18th Annual International Conference on Digital Government Research. :74–83.

With the coming of the age of big data, efforts to institutionalize the de-identification of personal information, protecting privacy while at the same time allowing the use of personal information, have been actively carried out, and many countries are already in the stage of implementing and establishing de-identification policies. Even with such efforts to protect and use personal information simultaneously, however, the danger posed by re-identification of de-identified information is real enough to warrant serious consideration of a mechanism for managing such risks, as well as a mechanism for distributing the responsibilities and liabilities that follow in the event of accidents and incidents involving the invasion of privacy. So far, most countries implementing de-identification policies have focused on defining what de-identification is and on the exemption requirements that allow free use of de-identified personal information; there has been little discussion of how to distribute responsibility for the risks and liabilities involved in the de-identification process. This study examines various de-identification policies worldwide and considers them from the perspective of risk-liability theory. It also identifies the constituencies of de-identification policies in order to analyze the roles and responsibilities of each, thereby providing a theoretical basis for discussions on the distribution of burdens and responsibilities arising from de-identification policies.
2017-03-08
Prinosil, J., Krupka, A., Riha, K., Dutta, M. K., Singh, A..  2015.  Automatic hair color de-identification. 2015 International Conference on Green Computing and Internet of Things (ICGCIoT). :732–736.

A process of de-identification used for privacy protection in multimedia content should be applied not only to primary biometric traits (face, voice) but to soft biometric traits as well. This paper proposes an automatic hair color de-identification method that works with video records. The method involves hair area segmentation, basic hair color recognition, and modification of hair color to produce natural-looking de-identified images.
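As an illustration of the final modification step only (segmentation and color recognition are separate stages in the paper), a plausible hue shift on a pre-computed hair mask might look like the following sketch, assuming OpenCV and NumPy:

```python
# Sketch of the hair-color modification step alone, assuming a binary
# hair mask has already been produced by a segmentation stage (which
# the paper covers but is omitted here). Requires OpenCV and NumPy.
import cv2
import numpy as np

def shift_hair_color(frame_bgr, hair_mask, hue_shift):
    """Shift the hue of masked hair pixels. hair_mask is uint8 with 255
    on hair pixels; hue_shift is in OpenCV hue units (0-179)."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)
    shifted = ((h.astype(np.int32) + hue_shift) % 180).astype(np.uint8)
    h = np.where(hair_mask == 255, shifted, h)  # recolor only hair pixels
    return cv2.cvtColor(cv2.merge([h, s, v]), cv2.COLOR_HSV2BGR)
```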

2017-02-23
Ruan, H. M., Tsai, M. H., Huang, Y. N., Liao, Y. H., Lei, C. L..  2015.  Discovery of De-identification Policies Considering Re-identification Risks and Information Loss. 2015 10th Asia Joint Conference on Information Security. :69–76.

In data analysis, it is always a tough task to strike a balance between the privacy and the applicability of the data. Due to the demand for individual privacy, data are more or less obscured before being released or outsourced to avoid possible privacy leakage; this process is called de-identification. In discussing a de-identification policy, the two most important aspects are the re-identification risk and the information loss. In this paper, we introduce a novel policy searching method to efficiently find proper de-identification policies that satisfy an acceptable re-identification risk while retaining the information residing in the data. Using real-world datasets from the UCI Machine Learning Repository, the computed re-identification risk reflects the true risk of the de-identified data under the de-identification policies. Moreover, using the proposed algorithm, one can efficiently acquire policies with higher information entropy.
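The two quantities such a policy search trades off can be made concrete; the sketch below (a generic scoring, not the authors' exact formulation) measures worst-case re-identification risk via equivalence-class sizes and retained information via Shannon entropy of the generalized records:

```python
# Generic sketch of the two quantities a de-identification policy search
# trades off; not the paper's exact scoring. A "policy" here maps each
# attribute name to a generalization function.
import math
from collections import Counter

def generalize(records, policy):
    attrs = sorted(policy)
    return [tuple(policy[a](r[a]) for a in attrs) for r in records]

def reidentification_risk(records, policy):
    """Worst-case risk: 1 / size of the smallest equivalence class."""
    classes = Counter(generalize(records, policy))
    return 1.0 / min(classes.values())

def information_entropy(records, policy):
    """Shannon entropy of the generalized records; higher entropy means
    more of the original information survives de-identification."""
    classes = Counter(generalize(records, policy))
    n = len(records)
    return -sum((c / n) * math.log2(c / n) for c in classes.values())

# A search would keep, among candidate policies whose risk stays below
# an acceptable threshold, the one with the highest entropy.
```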

Rahmani, A., Amine, A., Hamou, M. R..  2015.  De-identification of Textual Data Using Immune System for Privacy Preserving in Big Data. 2015 IEEE International Conference on Computational Intelligence Communication Technology. :112–116.

With the growing success of big data applications, many challenges have appeared. Timeliness, scalability, and privacy are the main problems that researchers attempt to address. Privacy preservation is now a highly active domain of research, and many works and concepts have emerged within this theme. One of these concepts is de-identification. De-identification is a specific area that consists of finding and removing sensitive information, either by replacing it, encrypting it, or adding noise to it, using techniques such as cryptography and data mining. In this report, we present a new model for the de-identification of textual data using a specific immune system algorithm known as CLONALG.
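For orientation, CLONALG's core loop clones the highest-affinity candidate solutions and mutates the clones inversely to their rank; a generic skeleton (with caller-supplied affinity and mutation functions standing in for the paper's text-specific operators) might look like:

```python
# Generic CLONALG skeleton of the kind the paper adapts to textual
# de-identification; affinity() and mutate() are caller-supplied
# stand-ins, not the authors' operators.

def clonalg(population, affinity, mutate, generations=50,
            n_select=5, clone_factor=3):
    """Clonal selection: clone the best antibodies, mutate the clones
    (worse-ranked antibodies mutate more), and re-select the population."""
    size = len(population)
    for _ in range(generations):
        ranked = sorted(population, key=affinity, reverse=True)
        clones = []
        for rank, antibody in enumerate(ranked[:n_select]):
            n_clones = clone_factor * (n_select - rank)  # best gets most clones
            rate = (rank + 1) / n_select                 # worst mutates hardest
            clones.extend(mutate(antibody, rate) for _ in range(n_clones))
        population = sorted(ranked + clones, key=affinity, reverse=True)[:size]
    return population[0]
```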

2015-05-05
Gaff, Brian M., Sussman, Heather Egan, Geetter, Jennifer.  2014.  Privacy and Big Data. Computer. 47:7-9.

Big data's explosive growth has prompted the US government to release new reports that address the resulting issues, particularly those related to privacy. The Web extra at http://youtu.be/j49eoe5g8-c is an audio recording from the Computing and the Law column, in which authors Brian M. Gaff, Heather Egan Sussman, and Jennifer Geetter discuss these reports.