Bibliography
Reuse of pre-existing industry datasets for research purposes requires a multi-stakeholder solution that balances the researcher's analysis objectives with the need to engage the industry data custodian, whilst respecting the privacy rights of human data subjects. Current methods place the burden on the data custodian, who may not be sufficiently trained to appreciate the nuances of data de-identification. Through modelling of functional, quality, and emotional goals, we propose a de-identification-in-the-cloud approach in which the researcher proposes analyses together with the extraction and de-identification operations, while the industry data custodian retains secure control over authorising the proposed analyses. We demonstrate our approach through the implementation of a de-identification portal for sports club data.
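The abstract describes a request-and-authorise workflow between researcher and custodian. The following is a minimal sketch of what such a workflow could look like as a data model; the class, field names, and operation strings are hypothetical illustrations, not the portal's actual API.

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    REJECTED = "rejected"

@dataclass
class AnalysisRequest:
    """One researcher-proposed analysis awaiting custodian sign-off."""
    analysis: str              # e.g. "weekly attendance trends"
    extract_fields: list[str]  # columns to pull from the club database
    deid_ops: list[str]        # e.g. ["drop:name", "generalize:dob->year"]
    status: Status = Status.PROPOSED

def authorise(request: AnalysisRequest, custodian_approves: bool) -> AnalysisRequest:
    """Custodian's decision gate: extraction runs only after approval."""
    request.status = Status.APPROVED if custodian_approves else Status.REJECTED
    return request
```

The point of the structure is that the de-identification operations travel with the analysis proposal, so the custodian authorises both at once rather than performing the de-identification personally.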
As more and more technologies to store and analyze massive amounts of data become available, it is extremely important to de-identify privacy-sensitive data so that further analysis can be conducted by different parties. For example, data must pass through a de-identification process before being transferred to institutes for further value-added analysis. As such, privacy protection issues associated with the release of data and with data mining have become a popular field of study in the domain of big data. As a strict and verifiable definition of privacy, differential privacy has attracted noteworthy attention and widespread research in recent years. Nevertheless, differential privacy is impractical for most applications because of the performance cost of generating synthetic datasets for data queries. Moreover, the definition of data protection by randomized noise in native differential privacy is abstract to users. We therefore design a pragmatic DP-based data de-identification and disclosure-risk estimation system, in which a DP-based noise addition mechanism generates synthetic datasets. Furthermore, the risk of data disclosure from these synthetic datasets can be evaluated before release to buyers/consumers.
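The paper does not specify its noise mechanism in the abstract; the standard starting point for DP noise addition is the Laplace mechanism. Below is a minimal sketch of that mechanism applied to a histogram that could seed a synthetic dataset; the data, budget, and function names are illustrative assumptions, not the authors' system.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Return an epsilon-DP estimate of a numeric query by adding
    Laplace noise with scale sensitivity/epsilon."""
    rng = rng or np.random.default_rng()
    return true_value + rng.laplace(0.0, sensitivity / epsilon)

# Toy example: a private histogram over ages.
ages = np.array([23, 35, 41, 29, 52, 37, 44, 31])
true_counts, _ = np.histogram(ages, bins=[20, 30, 40, 50, 60])

epsilon = 1.0    # privacy budget (assumed value)
sensitivity = 1  # one person changes each count by at most 1
noisy_counts = [max(0.0, laplace_mechanism(c, sensitivity, epsilon))
                for c in true_counts]
print(noisy_counts)
```

A synthetic dataset can then be sampled in proportion to the noisy counts, which is one way the abstract's "synthetic dataset generation" step could be realised.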
A de-identification process used for privacy protection in multimedia content should be applied not only to primary biometric traits (face, voice) but to soft biometric traits as well. This paper proposes an automatic hair color de-identification method that works with video records. The method involves hair area segmentation, basic hair color recognition, and modification of the hair color to produce real-looking de-identified images.
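The color-modification step can be illustrated with a simple hue rotation in HSV space over a segmented hair mask. This is a minimal sketch assuming a binary mask already produced by the segmentation stage; it is not the paper's method, and the function and parameter names are hypothetical.

```python
import cv2
import numpy as np

def shift_hair_color(frame_bgr, hair_mask, hue_shift=30):
    """Recolor the masked hair region by rotating its hue in HSV space.

    frame_bgr: H x W x 3 uint8 video frame.
    hair_mask: H x W uint8 mask, 255 where hair was segmented.
    hue_shift: shift in OpenCV's 0-179 hue range.
    """
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)
    h = h.astype(np.int16)
    h[hair_mask > 0] = (h[hair_mask > 0] + hue_shift) % 180
    recolored = cv2.merge([h.astype(np.uint8), s, v])
    out = cv2.cvtColor(recolored, cv2.COLOR_HSV2BGR)
    # Leave non-hair pixels untouched.
    return np.where(hair_mask[..., None] > 0, out, frame_bgr)
```

Rotating only the hue while keeping saturation and value preserves lighting and texture, which is what makes the de-identified hair look natural.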
In data analysis, it is always difficult to strike a balance between the privacy and the applicability of the data. To meet the demand for individual privacy, data are more or less obscured before being released or outsourced, to avoid possible privacy leakage; this process is called de-identification. The two most important aspects of any de-identification policy are the re-identification risk and the information loss. In this paper, we introduce a novel policy-search method that efficiently finds proper de-identification policies for an acceptable re-identification risk while retaining the information residing in the data. Using datasets from the UCI Machine Learning Repository as real-world data, the estimated re-identification risk reflects the true risk of the de-identified data under the de-identification policies. Moreover, with the proposed algorithm, one can efficiently obtain policies with higher information entropy.
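The trade-off the abstract describes can be made concrete with a small policy search: enumerate generalization policies, discard those above an acceptable re-identification risk, and rank the rest by retained entropy. The sketch below assumes a table with hypothetical "age" and "zip" quasi-identifier columns and uses uniqueness as a simple risk proxy; it is an illustration, not the paper's algorithm.

```python
import numpy as np
import pandas as pd
from itertools import product

def reidentification_risk(df, quasi_ids):
    """Fraction of records unique on the quasi-identifiers: unique
    rows are the ones most easily linked back to a person."""
    counts = df.groupby(quasi_ids).size()
    return float((counts == 1).sum() / len(df))

def column_entropy(series):
    """Shannon entropy (bits) of a column, a proxy for retained information."""
    p = series.value_counts(normalize=True)
    return float(-(p * np.log2(p)).sum())

def search_policies(df, age_widths, zip_digits, max_risk=0.05):
    """Enumerate simple generalization policies, keep those under the
    acceptable risk, and rank the survivors by retained entropy."""
    kept = []
    for w, d in product(age_widths, zip_digits):
        g = df.copy()
        g["age"] = (g["age"] // w) * w  # coarsen age into w-year bins
        g["zip"] = g["zip"].str[:d]     # truncate ZIP to d digits
        risk = reidentification_risk(g, ["age", "zip"])
        if risk <= max_risk:
            ent = column_entropy(g["age"]) + column_entropy(g["zip"])
            kept.append({"age_width": w, "zip_digits": d,
                         "risk": risk, "entropy": ent})
    return sorted(kept, key=lambda r: -r["entropy"])
```

The first policy returned is the one that obscures the least information while still meeting the risk threshold, which is exactly the balance the abstract targets.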
With the growing success of big data, many challenges have appeared. Timeliness, scalability, and privacy are the main problems that researchers attempt to solve. Privacy preservation is now a highly active research domain, and many works and concepts have emerged within this theme. One such concept is de-identification: a specific area concerned with finding and removing sensitive information, whether by replacing it, encrypting it, or adding noise to it, using techniques drawn from cryptography and data mining. In this report, we present a new model for de-identification of textual data using a specific immune-system algorithm known as CLONALG.
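CLONALG (clonal selection) evolves a population of candidate solutions by cloning the highest-affinity ones and hypermutating the clones. The sketch below shows the generic loop on a toy problem (evolving a bit pattern toward a hypothetical "sensitive token" mask); the fitness function, parameters, and target are assumptions for illustration, not the report's model.

```python
import random

def clonalg(fitness, genome_len, pop_size=20, n_select=5,
            clone_factor=3, generations=50, rng=None):
    """Generic CLONALG loop: select high-affinity antibodies, clone them
    in proportion to rank, hypermutate clones, and keep the best."""
    rng = rng or random.Random(0)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        clones = []
        for rank, antibody in enumerate(pop[:n_select]):
            for _ in range(clone_factor * (n_select - rank)):
                # Mutation rate grows as affinity rank drops.
                rate = (rank + 1) / genome_len
                clones.append([b ^ 1 if rng.random() < rate else b
                               for b in antibody])
        pop = sorted(pop + clones, key=fitness, reverse=True)[:pop_size]
    return pop[0]

# Toy target: a hypothetical mask marking sensitive tokens in a sentence.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]
best = clonalg(lambda ab: sum(a == t for a, t in zip(ab, TARGET)),
               genome_len=len(TARGET))
print(best)
```

In a textual de-identification setting, the affinity function would instead score how well an antibody's pattern matches known sensitive strings, but the selection-clone-mutate skeleton is the same.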
Big data's explosive growth has prompted the US government to release new reports that address the resulting issues, particularly those related to privacy. The Web extra at http://youtu.be/j49eoe5g8-c is an audio recording from the Computing and the Law column, in which authors Brian M. Gaff, Heather Egan Sussman, and Jennifer Geetter discuss these developments.