Biblio
These days the digitization process is everywhere, spreading also across central governments and local authorities. It is hoped that, using open government data for scientific research purposes, the public good and social justice might be enhanced. Taking into account the European General Data Protection Regulation recently adopted, the big challenge in Portugal and other European countries, is how to provide the right balance between personal data privacy and data value for research. This work presents a sensitivity study of data anonymization procedure applied to a real open government data available from the Brazilian higher education evaluation system. The ARX k-anonymization algorithm, with and without generalization of some research value variables, was performed. The analysis of the amount of data / information lost and the risk of re-identification suggest that the anonymization process may lead to the under-representation of minorities and sociodemographic disadvantaged groups. It will enable scientists to improve the balance among risk, data usability, and contributions for the public good policies and practices.
Data privacy has been an important area of research in recent years. Dataset often consists of sensitive data fields, exposure of which may jeopardize interests of individuals associated with the data. In order to resolve this issue, privacy techniques can be used to hinder the identification of a person through anonymization of the sensitive data in the dataset to protect sensitive information, while the anonymized dataset can be used by the third parties for analysis purposes without obstruction. In this research, we investigated a privacy technique, k-anonymity for different values of on different number columns of the dataset. Next, the information loss due to k-anonymity is computed. The anonymized files go through the classification process by some machine-learning algorithms i.e., Naive Bayes, J48 and neural network in order to check a balance between data anonymity and data utility. Based on the classification accuracy, the optimal values of and are obtained, and thus, the optimal and can be used for k-anonymity algorithm to anonymize optimal number of columns of the dataset.
In the current world, day by day the data growth and the investigation about that information increased due to the pervasiveness of computing devices, but people are reluctant to share their information on online portals or surveys fearing safety because sensitive information such as credit card information, medical conditions and other personal information in the wrong hands can mean danger to the society. These days privacy preserving has become a setback for storing data in data repository so for that reason data in the repository should be made undistinguishable, data is encrypted while storing and later decrypted when needed for analysis purpose in data mining. While storing the raw data of the individuals it is important to remove person-identifiable information such as name, employee id. However, the other attributes pertaining to the person should be encrypted so the methodologies used to implement. These methodologies can make data in the repository secure and PPDM task can made easier.
Advertisement sharing in vehicular network through vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication is a fascinating in-vehicle service for advertisers and the users due to multiple reasons. It enable advertisers to promote their product or services in the region of their interest. Also the users get to receive more relevant ads. Usually, users tend to contribute in dissemination of ads if their privacy is preserved and if some incentive is provided. Recent researches have focused on enabling both of the parameters for the users by developing fair incentive mechanism which preserves privacy by using Zero-Knowledge Proof of Knowledge (ZKPoK) (Ming et al., 2019). However, the anonymity provided by ZKPoK can introduce internal attacker scenarios in the network due to which authenticated users can disseminate fake ads in the network without payment. As the existing scheme uses certificate-less cryptography, due to which malicious users cannot be removed from the network. In order to resolve these challenges, we employed conditional anonymity and introduced Monitoring Authority (MA) in the system. In our proposed scheme, the pseudonyms are assigned to the vehicles while their real identities are stored in Certification Authority (CA) in encrypted form. The pseudonyms are updated after a pre-defined time threshold to prevent behavioural privacy leakage. We performed security and performance analysis to show the efficiency of our proposed system.
To our best knowledge, the p-sensitive k-anonymity model is a sophisticated model to resist linking attacks and homogeneous attacks in data publishing. However, if the distribution of sensitive values is skew, the model is difficult to defend against skew attacks and even faces sensitive attacks. In practice, the privacy requirements of different sensitive values are not always identical. The “one size fits all” unified privacy protection level may cause unnecessary information loss. To address these problems, the paper quantifies privacy requirements with the concept of IDF and concerns more about sensitive groups. Two enhanced anonymous models with personalized protection characteristic, that is, (p,αisg) -sensitive k-anonymity model and (pi,αisg)-sensitive k-anonymity model, are then proposed to resist skew attacks and sensitive attacks. Furthermore, two clustering algorithms with global search and local search are designed to implement our models. Experimental results show that the two enhanced models have outstanding advantages in better privacy at the expense of a little data utility.
Recently, a large amount of research studies aiming at the privacy-preserving data publishing have been conducted. We find that most K-anonymity algorithms fail to consider the characteristics of attribute values distribution in data and the contribution value differences in quasi-identifier attributes when service-oriented. In this paper, the importance of distribution characteristics of attribute values and the differences in contribution value of quasi-identifier attributes to anonymous results are illustrated. In order to maximize the utility of released data, a service-oriented adaptive anonymity algorithm is proposed. We establish a model of reaction dispersion degree to quantify the characteristics of attribute value distribution and introduce the concept of utility weight related to the contribution value of quasi-identifier attributes. The priority coefficient and the characterization coefficient of partition quality are defined to optimize selection strategies of dimension and splitting value in anonymity group partition process adaptively, which can reduce unnecessary information loss so as to further improve the utility of anonymized data. The rationality and validity of the algorithm are verified by theoretical analysis and multiple experiments.
The existing anonymized differential privacy model adopts a unified anonymity method, ignoring the difference of personal privacy, which may lead to the problem of excessive or insufficient protection of the original data [1]. Therefore, this paper proposes a personalized k-anonymity model for tuples (PKA) and proposes a differential privacy data publishing algorithm (DPPA) based on personalized anonymity, firstly based on the tuple personality factor set by the user in the original data set. The values are classified and the corresponding privacy protection relevance is calculated. Then according to the tuple personality factor classification value, the data set is clustered by clustering method with different anonymity, and the quasi-identifier attribute of each cluster is aggregated and noise-added to realize anonymized differential privacy; finally merge the subset to get the data set that meets the release requirements. In this paper, the correctness of the algorithm is analyzed theoretically, and the feasibility and effectiveness of the proposed algorithm are verified by comparison with similar algorithms.
With the development of location technology, location-based services greatly facilitate people's life . However, due to the location information contains a large amount of user sensitive informations, the servicer in location-based services published location data also be subject to the risk of privacy disclosure. In particular, it is more easy to lead to privacy leaks without considering the attacker's semantic background knowledge while the publish sparse location data. So, we proposed semantic k-anonymity privacy protection method to against above problem in this paper. In this method, we first proposed multi-user compressing sensing method to reconstruct the missing location data . To balance the availability and privacy requirment of anonymity set, We use semantic translation and multi-view fusion to selected non-sensitive data to join anonymous set. Experiment results on two real world datasets demonstrate that our solution improve the quality of privacy protection to against semantic attacks.
Expected and unexpected risks in cloud computing, which included data security, data segregation, and the lack of control and knowledge, have led to some dilemmas in several fields. Among all of these dilemmas, the privacy problem is even more paramount, which has largely constrained the prevalence and development of cloud computing. There are several privacy protection algorithms proposed nowadays, which generally include two categories, Anonymity algorithm, and differential privacy mechanism. Since many types of research have already focused on the efficiency of the algorithms, few of them emphasized the different orientation and demerits between the two algorithms. Motivated by this emerging research challenge, we have conducted a comprehensive survey on the two popular privacy protection algorithms, namely K-Anonymity Algorithm and Differential Privacy Algorithm. Based on their principles, implementations, and algorithm orientations, we have done the evaluations of these two algorithms. Several expectations and comparisons are also conducted based on the current cloud computing privacy environment and its future requirements.
Bitcoin is a decentralized, pseudonymous cryptocurrency that is one of the most used digital assets to date. Its unregulated nature and inherent anonymity of users have led to a dramatic increase in its use for illicit activities. This calls for the development of novel methods capable of characterizing different entities in the Bitcoin network. In this paper, a method to attack Bitcoin anonymity is presented, leveraging a novel cascading machine learning approach that requires only a few features directly extracted from Bitcoin blockchain data. Cascading, used to enrich entities information with data from previous classifications, led to considerably improved multi-class classification performance with excellent values of Precision close to 1.0 for each considered class. Final models were implemented and compared using different machine learning models and showed significantly higher accuracy compared to their baseline implementation. Our approach can contribute to the development of effective tools for Bitcoin entity characterization, which may assist in uncovering illegal activities.
k-anonymity is a popular model in privacy preserving data publishing. It provides privacy guarantee when a microdata table is released. In microdata, sensitive attributes contain high-sensitive and low sensitive values. Unfortunately, study in anonymity for distributing sensitive value is still rare. This study aims to distribute evenly high-sensitive value to quasi identifier group. We proposed an approach called Simple Distribution of Sensitive Value. We compared our method with systematic clustering which is considered as very effective method to group quasi identifier. Information entropy is used to measure the diversity in each quasi identifier group and in a microdata table. Experiment result show our method outperformed systematic clustering when high-sensitive value is distributed.
K-anonymity is a popular model used in microdata publishing to protect individual privacy. This paper introduces the idea of ball tree and projection area density partition into k-anonymity algorithm.The traditional kd-tree implements the division by forming a super-rectangular, but the super-rectangular has the area angle, so it cannot guarantee that the records on the corner are most similar to the records in this area. In this paper, the super-sphere formed by the ball-tree is used to address this problem. We adopt projection area density partition to increase the density of the resulting recorded points. We implement our algorithm with the Gotrack dataset and the Adult dataset in UCI. The experimentation shows that the k-anonymity algorithm based on ball-tree and projection area density partition, obtains more anonymous groups, and the generalization rate is lower. The smaller the K is, the more obvious the result advantage is. The result indicates that our algorithm can make data usability even higher.
Ciphertext Policy Attribute Based Encryption techniques provide fine grained access control to securely share the data in the organizations where access rights of users vary according to their roles. We have noticed that various key delegation mechanisms are provided for CP-ABE schemes but no key delegation mechanism exists for CP-ABE with hidden access policy. In practical, users' identity may be revealed from access policy in the organizations and unlimited further delegations may results in unauthorized data access. For maintaining the users' anonymity, the access structure should be hidden and every user must be restricted for specified further delegations. In this work, we have presented a flexible secure key delegation mechanism for CP-ABE with hidden access structure. The proposed scheme enhances the capability of existing CP-ABE schemes by supporting flexible delegation, attribute revocation and user revocation with negligible enhancement in computational cost.
An emerging Internet business is residential proxy (RESIP) as a service, in which a provider utilizes the hosts within residential networks (in contrast to those running in a datacenter) to relay their customers' traffic, in an attempt to avoid server- side blocking and detection. With the prominent roles the services could play in the underground business world, little has been done to understand whether they are indeed involved in Cybercrimes and how they operate, due to the challenges in identifying their RESIPs, not to mention any in-depth analysis on them. In this paper, we report the first study on RESIPs, which sheds light on the behaviors and the ecosystem of these elusive gray services. Our research employed an infiltration framework, including our clients for RESIP services and the servers they visited, to detect 6 million RESIP IPs across 230+ countries and 52K+ ISPs. The observed addresses were analyzed and the hosts behind them were further fingerprinted using a new profiling system. Our effort led to several surprising findings about the RESIP services unknown before. Surprisingly, despite the providers' claim that the proxy hosts are willingly joined, many proxies run on likely compromised hosts including IoT devices. Through cross-matching the hosts we discovered and labeled PUP (potentially unwanted programs) logs provided by a leading IT company, we uncovered various illicit operations RESIP hosts performed, including illegal promotion, Fast fluxing, phishing, malware hosting, and others. We also reverse engi- neered RESIP services' internal infrastructures, uncovered their potential rebranding and reselling behaviors. Our research takes the first step toward understanding this new Internet service, contributing to the effective control of their security risks.
A multitude of leaked data can be purchased through the Dark Web nowadays. Recent reports highlight that the largest footprints of leaked data, which range from employee passwords to intellectual property, are linked to governmental institutions. According to OWL Cybersecurity, the US Navy is most affected. Thinking of leaked data like personal files, this can have a severe impact. For example, it can be the cornerstone for the start of sophisticated social engineering attacks, for getting credentials for illegal system access or installing malicious code in the target network. If personally identifiable information or sensitive data, access plans, strategies or intellectual property are traded on the Dark Web, this could pose a threat to the armed forces. The actual impact, role, and dimension of information treated in the Dark Web are rarely analysed. Is the available data authentic and useful? Can it endanger the capabilities of armed forces? These questions are even more challenging, as several well-known cases of deanonymization have been published over recent years, raising the question whether somebody really would use the Dark Web to sell highly sensitive information. In contrast, fake offers from scammers can be found regularly, only set up to cheat possible buyers. A victim of illegal offers on the Dark Web will typically not go to the police. The paper analyses the technical base of the Dark Web and examines possibilities of deanonymization. After an analysis of Dark Web marketplaces and the articles traded there, a discussion of the potential risks to military operations will be used to identify recommendations on how to minimize the risk. The analysis concludes that surveillance of the Dark Web is necessary to increase the chance of identifying sensitive information early; but actually the `open' internet, the surface web and the Deep Web, poses the more important risk factor, as it is - in practice - more difficult to surveil than the Dark Web, and only a small share of breached information is traded on the latter.
The real-time map updating enables vehicles to obtain accurate and timely traffic information. Especially for driverless cars, real-time map updating can provide high-precision map service to assist the navigation, which requires vehicles to actively upload the latest road conditions. However, due to the untrusted network environment, it is difficult for the real-time map updating server to evaluate the authenticity of the road information from the vehicles. In order to prevent malicious vehicles from deliberately spreading false information and protect the privacy of vehicles from tracking attacks, this paper proposes a trust-based real-time map updating scheme. In this scheme, the public key is used as the identifier of the vehicle for anonymous communication with conditional anonymity. In addition, the blockchain is applied to provide the existence proof for the public key certificate of the vehicle. At the same time, to avoid the spread of false messages, a trust evaluation algorithm is designed. The fog node can validate the received massages from vehicles using Bayesian Inference Model. Based on the verification results, the road condition information is sent to the real-time map updating server so that the server can update the map in time and prevent the secondary traffic accident. In order to calculate the trust value offset for the vehicle, the fog node generates a rating for each message source vehicle, and finally adds the relevant data to the blockchain. According to the result of security analysis, this scheme can guarantee the anonymity and prevent the Sybil attack. Simulation results show that the proposed scheme is effective and accurate in terms of real-time map updating and trust values calculating.