Biblio | CPS-VO

Kotra, A., Eldosouky, A., Sengupta, S.. 2020. Every Anonymization Begins with k: A Game-Theoretic Approach for Optimized k Selection in k-Anonymization. 2020 International Conference on Advances in Computing and Communication Engineering (ICACCE). :1–6.

Privacy preservation is one of the greatest concerns when data is shared between different organizations. On the one hand, releasing data for research purposes is inevitable. On the other hand, sharing this data can jeopardize users' privacy. An effective solution, for the sharing organizations, is to use anonymization techniques to hide the users' sensitive information. One of the most popular anonymization techniques is k-Anonymization in which any data record is indistinguishable from at least k-1 other records. However, one of the fundamental challenges in choosing the value of k is the trade-off between achieving a higher privacy and the information loss associated with the anonymization. In this paper, the problem of choosing the optimal anonymization level for k-anonymization, under possible attacks, is studied when multiple organizations share their data to a common platform. In particular, two common types of attacks are considered that can target the k-anonymization technique. To this end, a novel game-theoretic framework is proposed to model the interactions between the sharing organizations and the attacker. The problem is formulated as a static game and its different Nash equilibria solutions are analytically derived. Simulation results show that the proposed framework can significantly improve the utility of the sharing organizations through optimizing the choice of k value.

Lin, G., Zhao, H., Zhao, L., Gan, X., Yao, Z.. 2020. Differential Privacy Information Publishing Algorithm based on Cluster Anonymity. 2020 International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE). :226—233.

With the development of Internet technology, the attacker gets more and more complex background knowledge, which makes the anonymous model susceptible to background attack. Although the differential privacy model can resist the background attack, it reduces the versatility of the data. In this paper, this paper proposes a differential privacy information publishing algorithm based on clustering anonymity. The algorithm uses the cluster anonymous algorithm based on KD tree to cluster the original data sets and gets anonymous tables by anonymous operation. Finally, the algorithm adds noise to the anonymous table to satisfy the definition of differential privacy. The algorithm is compared with the DCMDP (Density-Based Clustering Mechanism with Differential Privacy, DCMDP) algorithm under different privacy budgets. The experiments show that as the privacy budget increases, the algorithm reduces the information loss by about 80% of the published data.

Esmeel, T. K., Hasan, M. M., Kabir, M. N., Firdaus, A.. 2020. Balancing Data Utility versus Information Loss in Data-Privacy Protection using k-Anonymity. 2020 IEEE 8th Conference on Systems, Process and Control (ICSPC). :158—161.

Data privacy has been an important area of research in recent years. Dataset often consists of sensitive data fields, exposure of which may jeopardize interests of individuals associated with the data. In order to resolve this issue, privacy techniques can be used to hinder the identification of a person through anonymization of the sensitive data in the dataset to protect sensitive information, while the anonymized dataset can be used by the third parties for analysis purposes without obstruction. In this research, we investigated a privacy technique, k-anonymity for different values of on different number columns of the dataset. Next, the information loss due to k-anonymity is computed. The anonymized files go through the classification process by some machine-learning algorithms i.e., Naive Bayes, J48 and neural network in order to check a balance between data anonymity and data utility. Based on the classification accuracy, the optimal values of and are obtained, and thus, the optimal and can be used for k-anonymity algorithm to anonymize optimal number of columns of the dataset.

Podlesny, Nikolai J., Kayem, Anne V.D.M., Meinel, Christoph. 2019. Identifying Data Exposure Across Distributed High-Dimensional Health Data Silos through Bayesian Networks Optimised by Multigrid and Manifold. 2019 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). :556—563.

We present a novel, and use case agnostic method of identifying and circumventing private data exposure across distributed and high-dimensional data repositories. Examples of distributed high-dimensional data repositories include medical research and treatment data, where oftentimes more than 300 describing attributes appear. As such, providing strong guarantees of data anonymity in these repositories is a hard constraint in adhering to privacy legislation. Yet, when applied to distributed high-dimensional data, existing anonymisation algorithms incur high levels of information loss and do not guarantee privacy defeating the purpose of anonymisation. In this paper, we address this issue by using Bayesian networks to handle data transformation for anonymisation. By evaluating every attribute combination to determine the privacy exposure risk, the conditional probability linking attribute pairs is computed. Pairs with a high conditional probability expose the risk of deanonymisation similar to quasi-identifiers and can be separated instead of deleted, as in previous algorithms. Attribute separation removes the risk of privacy exposure, and deletion avoidance results in a significant reduction in information loss. In other words, assimilating the conditional probability of outliers directly in the adjacency matrix in a greedy fashion is quick and thwarts de-anonymisation. Since identifying every privacy violating attribute combination is a W[2]-complete problem, we optimise the procedure with a multigrid solver method by evaluating the conditional probabilities between attribute pairs, and aggregating state space explosion of attribute pairs through manifold learning. Finally, incremental processing of new data is achieved through inexpensive, continuous (delta) learning.

Yuan, Jing, Ou, Yuyi, Gu, Guosheng. 2019. An Improved Privacy Protection Method Based on k-degree Anonymity in Social Network. 2019 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA). :416–420.

To preserve the privacy of social networks, most existing methods are applied to satisfy different anonymity models, but there are some serious problems such as huge large information losses and great structural modifications of original social network. Therefore, an improved privacy protection method called k-subgraph is proposed, which is based on k-degree anonymous graph derived from k-anonymity to keep the network structure stable. The method firstly divides network nodes into several clusters by label propagation algorithm, and then reconstructs the sub-graph by means of moving edges to achieve k-degree anonymity. Experimental results show that our k-subgraph method can not only effectively improve the defense capability against malicious attacks based on node degrees, but also maintain stability of network structure. In addition, the cost of information losses due to anonymity is minimized ideally.

Liu, Kai-Cheng, Kuo, Chuan-Wei, Liao, Wen-Chiuan, Wang, Pang-Chieh. 2018. Optimized Data de-Identification Using Multidimensional k-Anonymity. 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :1610–1614.

In the globalized knowledge economy, big data analytics have been widely applied in diverse areas. A critical issue in big data analysis on personal information is the possible leak of personal privacy. Therefore, it is necessary to have an anonymization-based de-identification method to avoid undesirable privacy leak. Such method can prevent published data form being traced back to personal privacy. Prior empirical researches have provided approaches to reduce privacy leak risk, e.g. Maximum Distance to Average Vector (MDAV), Condensation Approach and Differential Privacy. However, previous methods inevitably generate synthetic data of different sizes and is thus unsuitable for general use. To satisfy the need of general use, k-anonymity can be chosen as a privacy protection mechanism in the de-identification process to ensure the data not to be distorted, because k-anonymity is strong in both protecting privacy and preserving data authenticity. Accordingly, this study proposes an optimized multidimensional method for anonymizing data based on both the priority weight-adjusted method and the mean difference recommending tree method (MDR tree method). The results of this study reveal that this new method generate more reliable anonymous data and reduce the information loss rate.

Liu, Kai-Cheng, Kuo, Chuan-Wei, Liao, Wen-Chiuan, Wang, Pang-Chieh. 2018. Optimized Data de-Identification Using Multidimensional k-Anonymity. 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :1610–1614.

In the globalized knowledge economy, big data analytics have been widely applied in diverse areas. A critical issue in big data analysis on personal information is the possible leak of personal privacy. Therefore, it is necessary to have an anonymization-based de-identification method to avoid undesirable privacy leak. Such method can prevent published data form being traced back to personal privacy. Prior empirical researches have provided approaches to reduce privacy leak risk, e.g. Maximum Distance to Average Vector (MDAV), Condensation Approach and Differential Privacy. However, previous methods inevitably generate synthetic data of different sizes and is thus unsuitable for general use. To satisfy the need of general use, k-anonymity can be chosen as a privacy protection mechanism in the de-identification process to ensure the data not to be distorted, because k-anonymity is strong in both protecting privacy and preserving data authenticity. Accordingly, this study proposes an optimized multidimensional method for anonymizing data based on both the priority weight-adjusted method and the mean difference recommending tree method (MDR tree method). The results of this study reveal that this new method generate more reliable anonymous data and reduce the information loss rate.

Li, Jian, Zhang, Zelin, Li, Shengyu, Benton, Ryan, Huang, Yulong, Kasukurthi, Mohan Vamsi, Li, Dongqi, Lin, Jingwei, Borchert, Glen M., Tan, Shaobo et al.. 2019. Reversible Data Hiding Based Key Region Protection Method in Medical Images. 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). :1526–1530.

The transmission of medical image data in an open network environment is subject to privacy issues including patient privacy and data leakage. In the past, image encryption and information-hiding technology have been used to solve such security problems. But these methodologies, in general, suffered from difficulties in retrieving original images. We present in this paper an algorithm to protect key regions in medical images. First, coefficient of variation is used to locate the key regions, a.k.a. the lesion areas, of an image; other areas are then processed in blocks and analyzed for texture complexity. Next, our reversible data-hiding algorithm is used to embed the contents from the lesion areas into a high-texture area, and the Arnold transformation is performed to protect the original lesion information. In addition to this, we use the ciphertext of the basic information about the image and the decryption parameter to generate the Quick Response (QR) Code to replace the original key regions. Consequently, only authorized customers can obtain the encryption key to extract information from encrypted images. Experimental results show that our algorithm can not only restore the original image without information loss, but also safely transfer the medical image copyright and patient-sensitive information.

Li-Xin, L., Yong-Shan, D., Jia-Yan, W.. 2017. Differential Privacy Data Protection Method Based on Clustering. 2017 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC). :11–16.

To enhance privacy protection and improve data availability, a differential privacy data protection method ICMD-DP is proposed. Based on insensitive clustering algorithm, ICMD-DP performs differential privacy on the results of ICMD (insensitive clustering method for mixed data). The combination of clustering and differential privacy realizes the differentiation of query sensitivity from single record to group record. At the meanwhile, it reduces the risk of information loss and information disclosure. In addition, to satisfy the requirement of maintaining differential privacy for mixed data, ICMD-DP uses different methods to calculate the distance and centroid of categorical and numerical attributes. Finally, experiments are given to illustrate the availability of the method.

Gao, Y., Luo, T., Li, J., Wang, C.. 2017. Research on K Anonymity Algorithm Based on Association Analysis of Data Utility. 2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC). :426–432.

More and more medical data are shared, which leads to disclosure of personal privacy information. Therefore, the construction of medical data privacy preserving publishing model is of great value: not only to make a non-correspondence between the released information and personal identity, but also to maintain the data utility after anonymity. However, there is an inherent contradiction between the anonymity and the data utility. In this paper, a Principal Component Analysis-Grey Relational Analysis (PCA-GRA) K anonymous algorithm is proposed to improve the data utility effectively under the premise of anonymity, in which the association between quasi-identifiers and the sensitive information is reckoned as a criterion to control the generalization hierarchy. Compared with the previous anonymity algorithms, results show that the proposed PCA-GRA K anonymous algorithm has achieved significant improvement in data utility from three aspects, namely information loss, feature maintenance and classification evaluation performance.

H. M. Ruan, M. H. Tsai, Y. N. Huang, Y. H. Liao, C. L. Lei. 2015. "Discovery of De-identification Policies Considering Re-identification Risks and Information Loss". 2015 10th Asia Joint Conference on Information Security. :69-76.

In data analysis, it is always a tough task to strike the balance between the privacy and the applicability of the data. Due to the demand for individual privacy, the data are being more or less obscured before being released or outsourced to avoid possible privacy leakage. This process is so called de-identification. To discuss a de-identification policy, the most important two aspects should be the re-identification risk and the information loss. In this paper, we introduce a novel policy searching method to efficiently find out proper de-identification policies according to acceptable re-identification risk while retaining the information resided in the data. With the UCI Machine Learning Repository as our real world dataset, the re-identification risk can therefore be able to reflect the true risk of the de-identified data under the de-identification policies. Moreover, using the proposed algorithm, one can then efficiently acquire policies with higher information entropy.