Bibliography
These days the digitization process is everywhere, spreading across central governments and local authorities. It is hoped that, by using open government data for scientific research, the public good and social justice might be enhanced. Taking into account the recently adopted European General Data Protection Regulation, the big challenge in Portugal and other European countries is how to strike the right balance between personal data privacy and data value for research. This work presents a sensitivity study of a data anonymization procedure applied to real open government data available from the Brazilian higher education evaluation system. The ARX k-anonymization algorithm was applied with and without generalization of some variables of research value. The analysis of the amount of data/information lost and of the risk of re-identification suggests that the anonymization process may lead to the under-representation of minorities and sociodemographically disadvantaged groups. The findings should enable scientists to improve the balance among risk, data usability, and contributions to public-good policies and practices.
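The study above relies on the ARX tool; purely as an illustration of the quantities it discusses, the following minimal pandas sketch checks k-anonymity over a set of quasi-identifiers and estimates a simple average re-identification risk on a toy table (column names and values are hypothetical, not the Brazilian dataset).

import pandas as pd

def equivalence_classes(df, quasi_identifiers):
    # Group records that share the same quasi-identifier values.
    return df.groupby(quasi_identifiers, dropna=False).size()

def is_k_anonymous(df, quasi_identifiers, k):
    # k-anonymity holds when every equivalence class has at least k records.
    return equivalence_classes(df, quasi_identifiers).min() >= k

def avg_reidentification_risk(df, quasi_identifiers):
    # Each record's prosecutor-style risk is 1/|its class|;
    # averaging over all records gives #classes / #records.
    sizes = equivalence_classes(df, quasi_identifiers)
    return len(sizes) / int(sizes.sum())

df = pd.DataFrame({
    "age_band": ["18-25", "18-25", "26-35", "26-35", "26-35"],
    "region":   ["North", "North", "South", "South", "South"],
    "score":    [610, 640, 580, 700, 655],
})
qi = ["age_band", "region"]
print(is_k_anonymous(df, qi, k=2))        # True: every class holds >= 2 records
print(avg_reidentification_risk(df, qi))  # 2 classes / 5 records = 0.4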
Data privacy has been an important area of research in recent years. Datasets often contain sensitive data fields whose exposure may jeopardize the interests of the individuals associated with the data. To resolve this issue, privacy techniques can be used to hinder the identification of a person by anonymizing the sensitive data in the dataset, while the anonymized dataset can still be used by third parties for analysis without obstruction. In this research, we investigated a privacy technique, k-anonymity, for different values of k applied to different numbers of columns of the dataset. Next, the information loss due to k-anonymity is computed. The anonymized files then go through a classification process with several machine-learning algorithms, i.e., Naive Bayes, J48, and a neural network, in order to check the balance between data anonymity and data utility. Based on the classification accuracy, the optimal value of k and the optimal number of anonymized columns are obtained, and these can then be used by the k-anonymity algorithm to anonymize the dataset.
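A hedged sketch of the utility check described above: generalize the quasi-identifiers and compare classifier accuracy before and after. The paper uses Naive Bayes, J48 and a neural network on its own data; here GaussianNB on the Iris dataset stands in, and the "anonymization" is a crude equal-width binning rather than a full k-anonymity algorithm.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

def generalize(X, n_bins=3):
    # Coarsen every column into equal-width bins, a stand-in for
    # hierarchy-based generalization.
    Xg = np.empty_like(X)
    for j in range(X.shape[1]):
        edges = np.linspace(X[:, j].min(), X[:, j].max(), n_bins + 1)
        Xg[:, j] = np.digitize(X[:, j], edges[1:-1])
    return Xg

def accuracy(X, y):
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
    return GaussianNB().fit(Xtr, ytr).score(Xte, yte)

print("original   :", accuracy(X, y))
print("generalized:", accuracy(generalize(X), y))   # utility loss shows up as lower accuracy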
In the current world, data volumes and the analysis performed on them grow day by day due to the pervasiveness of computing devices, but people are reluctant to share their information on online portals or in surveys, fearing for their safety, because sensitive information such as credit card details, medical conditions, and other personal data can endanger individuals if it falls into the wrong hands. Privacy preservation has therefore become a prerequisite for storing data in a repository: the data should be made indistinguishable, encrypted while stored and decrypted later when needed for analysis in data mining. When storing the raw data of individuals, it is important to remove person-identifiable information such as name and employee ID; the remaining attributes pertaining to the person should be encrypted, and appropriate methodologies are used to implement this. These methodologies can make the data in the repository secure and make the privacy-preserving data mining (PPDM) task easier.
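A minimal sketch of the preprocessing described above, under the assumption that direct identifiers are dropped and remaining person-related attributes are pseudonymized; salted hashing stands in here for the encryption step the abstract mentions, and the column names are hypothetical.

import hashlib
import pandas as pd

SALT = b"replace-with-a-secret-salt"   # assumed secret, kept outside the repository

def pseudonymize(value):
    # Salted hash as a stand-in for the attribute encryption the abstract describes.
    return hashlib.sha256(SALT + str(value).encode()).hexdigest()[:16]

records = pd.DataFrame({
    "name":        ["Alice", "Bob"],          # direct identifier -> removed
    "employee_id": [1001, 1002],              # direct identifier -> removed
    "department":  ["Radiology", "Oncology"], # quasi-identifier  -> pseudonymized
    "diagnosis":   ["A", "B"],                # sensitive value   -> kept for mining
})
safe = records.drop(columns=["name", "employee_id"])
safe["department"] = safe["department"].map(pseudonymize)
print(safe)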
Recently, a large number of research studies on privacy-preserving data publishing have been conducted. We find that most k-anonymity algorithms fail to consider the characteristics of the attribute-value distribution in the data and the differences in the contribution values of quasi-identifier attributes when the anonymization is service-oriented. In this paper, the importance of the distribution characteristics of attribute values and of the differences in the contribution values of quasi-identifier attributes to the anonymization results is illustrated. In order to maximize the utility of the released data, a service-oriented adaptive anonymity algorithm is proposed. We establish a model of reaction dispersion degree to quantify the characteristics of the attribute-value distribution and introduce the concept of a utility weight related to the contribution value of each quasi-identifier attribute. The priority coefficient and the characterization coefficient of partition quality are defined to adaptively optimize the selection of the splitting dimension and splitting value during anonymity-group partitioning, which reduces unnecessary information loss and thus further improves the utility of the anonymized data. The rationality and validity of the algorithm are verified by theoretical analysis and multiple experiments.
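The paper's exact coefficients are not reproduced here; the following is only a hedged sketch of the general idea of utility-weighted dimension selection: when choosing which quasi-identifier to split next in an anonymity-group partition, combine each attribute's value dispersion with a service-supplied utility weight so that high-value attributes end up with narrower, less generalized ranges.

import numpy as np

def choose_split_attribute(partition, utility_weights):
    # partition: dict attribute -> 1-D numpy array of values in the current group
    # utility_weights: dict attribute -> importance of keeping this attribute precise
    scores = {}
    for attr, values in partition.items():
        spread = np.ptp(values)                          # range as a crude dispersion measure
        normalized = spread / (np.abs(values).mean() + 1e-9)
        # Prefer attributes that are both widely spread and valuable to the service:
        # splitting on them narrows their ranges and so preserves their precision.
        scores[attr] = normalized * utility_weights[attr]
    return max(scores, key=scores.get)

group = {"age": np.array([23, 31, 44, 52]), "zip3": np.array([100, 101, 102, 103])}
weights = {"age": 2.0, "zip3": 1.0}   # hypothetical: age matters more to the service
print(choose_split_attribute(group, weights))   # "age"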
As the amount of data in various industries and government sectors grows exponentially, the '7V' concept of big data aims to create new value by indiscriminately collecting and analyzing information from various fields. At the same time, as the ICT industry ecosystem matures, big data utilization is threatened by privacy attacks such as infringement, due to the sheer volume of data involved. To manage and sustain a controllable privacy level, recommended de-identification techniques are needed. This paper examines these de-identification processes and three commonly used privacy models. Furthermore, it presents use cases in which these technologies can be adopted, as well as future development directions.
With the development of location technology, location-based services (LBS) greatly facilitate people's lives. However, because location information contains a large amount of sensitive user information, the location data published by LBS providers are also subject to the risk of privacy disclosure. In particular, privacy leaks are more likely when sparse location data are published without considering the attacker's semantic background knowledge. Therefore, in this paper we propose a semantic k-anonymity privacy protection method to counter this problem. In this method, we first propose a multi-user compressive sensing method to reconstruct the missing location data. To balance the availability and the privacy requirements of the anonymity set, we use semantic translation and multi-view fusion to select non-sensitive data to join the anonymity set. Experimental results on two real-world datasets demonstrate that our solution improves the quality of privacy protection against semantic attacks.
k-anonymity is a popular model in privacy-preserving data publishing. It provides a privacy guarantee when a microdata table is released. In microdata, sensitive attributes contain both high-sensitivity and low-sensitivity values. Unfortunately, studies on anonymity that address how sensitive values are distributed are still rare. This study aims to distribute high-sensitivity values evenly across quasi-identifier groups. We propose an approach called Simple Distribution of Sensitive Value. We compare our method with systematic clustering, which is considered a very effective method for grouping quasi-identifiers. Information entropy is used to measure the diversity within each quasi-identifier group and in the microdata table as a whole. Experimental results show that our method outperforms systematic clustering when high-sensitivity values are distributed.
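A hedged sketch of the idea described above, not the paper's exact Simple Distribution of Sensitive Value algorithm: deal records carrying high-sensitivity values round-robin across the quasi-identifier groups, then use Shannon entropy to check how evenly the sensitive values are spread. The records and sensitivity rule are invented for illustration.

from collections import Counter
from math import log2

def distribute(records, n_groups, is_high_sensitive):
    groups = [[] for _ in range(n_groups)]
    high = [r for r in records if is_high_sensitive(r)]
    low = [r for r in records if not is_high_sensitive(r)]
    for i, r in enumerate(high):          # spread high-sensitivity records first
        groups[i % n_groups].append(r)
    for i, r in enumerate(low):           # then fill with the remaining records
        groups[i % n_groups].append(r)
    return groups

def entropy(values):
    counts = Counter(values)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

records = [("p1", "HIV"), ("p2", "flu"), ("p3", "cancer"), ("p4", "flu"),
           ("p5", "flu"), ("p6", "cold")]
groups = distribute(records, n_groups=2,
                    is_high_sensitive=lambda r: r[1] in {"HIV", "cancer"})
for g in groups:
    print(g, "entropy:", round(entropy([v for _, v in g]), 3))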
k-anonymity is a popular model used in microdata publishing to protect individual privacy. This paper introduces the ideas of the ball tree and projection area density partitioning into the k-anonymity algorithm. The traditional kd-tree implements the division by forming hyper-rectangles, but a hyper-rectangle has corners, so it cannot guarantee that the records at the corners are most similar to the other records in that region. In this paper, the hypersphere formed by the ball tree is used to address this problem. We adopt projection area density partitioning to increase the density of the resulting record points. We evaluate our algorithm on the Gotrack and Adult datasets from the UCI repository. The experiments show that the k-anonymity algorithm based on the ball tree and projection area density partitioning obtains more anonymity groups with a lower generalization rate; the smaller k is, the more pronounced the advantage. The results indicate that our algorithm achieves higher data usability.
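Only a hedged illustration of grouping records with a ball tree (scikit-learn's BallTree), not the projection area density algorithm from the paper: repeatedly take an unassigned record and its k-1 nearest unassigned neighbours as one anonymity group. The random 2-D points are a stand-in for projected location records.

import numpy as np
from sklearn.neighbors import BallTree

def ball_tree_groups(X, k):
    # Greedily form groups of size k from nearest neighbours found with a ball tree.
    remaining = list(range(len(X)))
    groups = []
    while len(remaining) >= k:
        tree = BallTree(X[remaining])
        _, idx = tree.query(X[remaining][:1], k=k)   # neighbours of the first remaining point
        group = [remaining[i] for i in idx[0]]
        groups.append(group)
        remaining = [i for i in remaining if i not in group]
    if remaining and groups:
        groups[-1].extend(remaining)                 # leftovers join the last group
    return groups

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))        # e.g. 20 projected GPS points
for g in ball_tree_groups(X, k=4):
    print(sorted(g))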
In recent trends, privacy preservation is the predominant concern in big data analytics and cloud computing. Every organization collects personal data from users, actively or passively. Publishing these data for research and other analytics without removing Personally Identifiable Information (PII) will lead to privacy breaches. Existing anonymization techniques fail to maintain the balance between data privacy and data utility. In order to provide a trade-off between user privacy and data utility, a Mondrian-based k-anonymity approach is proposed. To protect the privacy of high-dimensional data, a Deep Neural Network (DNN) based framework is proposed. The experimental results show that the proposed approach mitigates the information loss of the data without compromising privacy.
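For reference, a minimal Mondrian-style sketch over numeric quasi-identifiers, assuming strict partitioning: recursively split on the widest attribute at its median as long as both halves keep at least k records, then generalize each partition to its attribute ranges. The toy data are hypothetical; the paper's DNN component is not sketched here.

import numpy as np

def mondrian(X, k):
    def split(indices):
        data = X[indices]
        order = np.argsort(np.ptp(data, axis=0))[::-1]   # widest attribute first
        for dim in order:
            median = np.median(data[:, dim])
            left = indices[data[:, dim] <= median]
            right = indices[data[:, dim] > median]
            if len(left) >= k and len(right) >= k:       # allowable cut
                return split(left) + split(right)
        return [indices]                                 # no allowable split remains
    return split(np.arange(len(X)))

def generalize(X, partitions):
    # Replace each record's quasi-identifiers by the (min, max) box of its partition.
    out = []
    for part in partitions:
        lo, hi = X[part].min(axis=0), X[part].max(axis=0)
        out.extend([(tuple(map(float, lo)), tuple(map(float, hi)))] * len(part))
    return out

X = np.array([[25, 47000], [27, 51000], [34, 60000], [36, 62000],
              [52, 80000], [55, 83000], [58, 90000], [61, 95000]], float)
parts = mondrian(X, k=2)
print([p.tolist() for p in parts])
print(generalize(X, parts)[:3])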
In order to protect individuals' privacy, data have to be "well-sanitized" before they are shared, i.e., any personal information has to be removed. However, it is not always clear when data can be deemed well-sanitized. In this paper, we argue that the evaluation of sanitized data should be based on whether the data allow the inference of sensitive information that is specific to an individual, instead of being centered on the concept of re-identification. We propose a framework to evaluate the effectiveness of different sanitization techniques on a given dataset by measuring how much an individual's record in the sanitized dataset influences the inference of his or her own sensitive attribute. Our intent is not to accurately predict any sensitive attribute but rather to measure the impact of a single record on the inference of sensitive information. We demonstrate our approach by sanitizing two real datasets under different privacy models and evaluating and comparing each sanitized dataset in our framework.
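A hedged, leave-one-out style sketch of the evaluation idea described above: measure how much including an individual's sanitized record changes the inferred probability of his or her own sensitive attribute. The synthetic data and the choice of logistic regression are illustrative assumptions, not the paper's setup.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))                                  # sanitized quasi-identifiers
y = (X[:, 0] + 0.5 * rng.normal(size=60) > 0).astype(int)     # sensitive attribute

def influence(i):
    # Probability assigned to record i's true sensitive value, with and without record i.
    with_i = LogisticRegression().fit(X, y).predict_proba(X[i:i+1])[0, y[i]]
    mask = np.arange(len(X)) != i
    without_i = LogisticRegression().fit(X[mask], y[mask]).predict_proba(X[i:i+1])[0, y[i]]
    return with_i - without_i        # > 0: the record leaks information about itself

scores = [influence(i) for i in range(len(X))]
print("mean influence:", float(np.mean(scores)), "max:", float(np.max(scores)))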
To preserve the privacy of social networks, most existing methods are designed to satisfy different anonymity models, but they suffer from serious problems such as large information losses and substantial structural modifications of the original social network. Therefore, an improved privacy protection method called k-subgraph is proposed, which is based on k-degree anonymous graphs derived from k-anonymity and keeps the network structure stable. The method first divides the network nodes into several clusters with a label propagation algorithm, and then reconstructs the subgraph by moving edges to achieve k-degree anonymity. Experimental results show that our k-subgraph method can not only effectively improve the defense capability against malicious attacks based on node degrees, but also maintain the stability of the network structure, while keeping the information loss caused by anonymization close to minimal.
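This is not the paper's k-subgraph algorithm; it is only a hedged sketch of the two ingredients it builds on: label-propagation clustering of the nodes and a k-degree anonymity check (every degree value must be shared by at least k nodes). The karate club graph is a stand-in example.

import networkx as nx
from collections import Counter

def is_k_degree_anonymous(G, k):
    # k-degree anonymity: each distinct degree appears at least k times.
    degree_counts = Counter(dict(G.degree()).values())
    return all(count >= k for count in degree_counts.values())

G = nx.karate_club_graph()
clusters = list(nx.algorithms.community.label_propagation_communities(G))
print("clusters:", [sorted(c) for c in clusters][:2], "...")
print("2-degree anonymous?", is_k_degree_anonymous(G, k=2))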
The recent emergence of smartphones, cloud computing, and the Internet of Things has brought about an explosion of data creation. By collating and merging these enormous volumes of data with other information, information-based services become more sophisticated and advanced. At the same time, however, consideration of the privacy violations caused by such merging is indispensable. Various anonymization methods have been proposed to preserve privacy. Conventional perturbation-based anonymization of location data adds comparatively large noise, which makes it difficult to utilize the data effectively for secondary use. In this research, to solve these problems, we first clarify the definition of privacy preservation and then propose TMk-anonymity according to that definition.
Location-Based Services (LBS) are becoming increasingly important in our daily lives. However, the localization information sent over the air is vulnerable to various attacks, which results in serious privacy concerns. To overcome this problem, we formulate a multi-objective optimization problem that considers both the query probability and the practical dummy location region. A low-complexity dummy location selection scheme is proposed. We first find several candidate dummy locations with similar query probabilities. Among these candidates, a cloaking-area-based algorithm is then used to find K - 1 dummy locations to achieve K-anonymity. The intersection area between two dummy locations is also derived to help determine the total cloaking area. A security analysis verifies the effectiveness of our scheme against passive and active adversaries. Compared with other methods, simulation results show that the proposed dummy location scheme can improve the privacy level and enlarge the cloaking area simultaneously.
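A hedged sketch of the two-stage idea described above, candidate filtering by query probability followed by greedy selection for a large cloaking area; the scoring details below (bounding-box area, candidate count) are simplified assumptions, not the paper's exact formulation.

import numpy as np

def select_dummies(cells, probs, real_idx, K, n_candidates=10):
    # Stage 1: cells whose historical query probability is closest to the real cell's.
    diff = np.abs(probs - probs[real_idx])
    diff[real_idx] = np.inf
    candidates = list(np.argsort(diff)[:n_candidates])
    # Stage 2: greedily add the candidate that most enlarges the bounding-box
    # cloaking area of the chosen set, until K locations are selected.
    chosen = [real_idx]
    while len(chosen) < K and candidates:
        def area_with(c):
            pts = cells[chosen + [c]]
            spans = pts.max(axis=0) - pts.min(axis=0)
            return float(spans[0] * spans[1])
        best = max(candidates, key=area_with)
        chosen.append(best)
        candidates.remove(best)
    return [int(c) for c in chosen]

rng = np.random.default_rng(2)
cells = rng.uniform(0, 100, size=(50, 2))     # cell centre coordinates
probs = rng.dirichlet(np.ones(50))            # historical query probabilities
print(select_dummies(cells, probs, real_idx=7, K=5))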
Analytics on big data is maturing and moving towards mass adoption. The emergence of analytics increases the need for innovative tools and methodologies to protect data against privacy violations. Many data anonymization methods have been proposed to provide some degree of privacy protection by applying data suppression and other distortion techniques. However, currently available methods suffer from poor scalability and performance and from a lack of framework standardization, and they are unable to cope with massive data-processing workloads. Some of these methods were proposed specifically for the MapReduce framework in order to operate on big data, but they still rely on conventional data management approaches, so no remarkable performance gains were achieved. We introduce a framework that operates in the MapReduce environment to benefit from its advantages, as well as from those of the Hadoop ecosystem. Our framework provides granular user access that can be tuned to different authorization levels. The proposed solution provides fine-grained alteration based on the user's authorization level for accessing the MapReduce domain for analytics. Using well-developed role-based access control approaches, this framework is capable of assigning roles to users and mapping them to the relevant data attributes.
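The framework above targets MapReduce/Hadoop; the following is only a plain-Python sketch of its core idea of role-based, fine-grained alteration, where each role sees attributes at a different level of detail. The role names, fields, and masking rules are hypothetical.

ROLE_POLICY = {
    "data_scientist": {"age": "band", "zip": "prefix", "salary": "keep"},
    "marketing":      {"age": "band", "zip": "drop",   "salary": "drop"},
}

def alter(record, role):
    # Apply the masking rule assigned to each attribute for the given role.
    out = {}
    for field, rule in ROLE_POLICY[role].items():
        value = record[field]
        if rule == "keep":
            out[field] = value
        elif rule == "band" and field == "age":
            out[field] = f"{(value // 10) * 10}-{(value // 10) * 10 + 9}"
        elif rule == "prefix":
            out[field] = str(value)[:3] + "**"
        # rule == "drop": the field is omitted entirely
    return out

record = {"age": 37, "zip": "94103", "salary": 91000}
print(alter(record, "data_scientist"))   # {'age': '30-39', 'zip': '941**', 'salary': 91000}
print(alter(record, "marketing"))        # {'age': '30-39'}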
Tactical wireless sensor networks (WSNs) are deployed over a region of interest for mission centric operations. The sink node in a tactical WSN is the aggregation point of data processing. Due to its essential role in the network, the sink node is a high priority target for an attacker who wishes to disable a tactical WSN. This paper focuses on the mitigation of sink-node vulnerability in a tactical WSN. Specifically, we study the issue of protecting the sink node through a technique known as k-anonymity. To achieve k-anonymity, we use a specific routing protocol designed to work within the constraints of WSN communication protocols, specifically IEEE 802.15.4. We use and modify the Lightweight Ad hoc On-Demand Next Generation (LOADng) reactive-routing protocol to achieve anonymity. This modified LOADng protocol prevents an attacker from identifying the sink node without adding significant complexity to the regular sensor nodes. We simulate the modified LOADng protocol using a custom-designed simulator in MATLAB. We demonstrate the effectiveness of our protocol and also show some of the performance tradeoffs that come with this method.