Bibliography
Suppose we are given a large number of sequences on a given alphabet, and an adversary is interested in identifying (de-anonymizing) a specific target sequence based on its patterns. Our goal is to thwart such an adversary by obfuscating the target sequence, applying artificial (but small) distortions to its values. A key point here is that we make no assumptions about the statistical model of such sequences. This is in contrast to the existing literature, where assumptions (e.g., Markov chains) are made about such sequences to obtain privacy guarantees. We relate this problem to a set of combinatorial questions on sequence construction, from which we obtain provable guarantees. This problem is relevant to important privacy applications: from fingerprinting webpages visited by users of anonymous communication systems, to linking communicating parties on messaging applications, to inferring the activities of IoT device users.
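To make the obfuscation idea concrete, here is a minimal Python sketch (not the paper's actual construction): each symbol of the target sequence is independently replaced, with small probability, by a random other symbol of the alphabet, giving a small, model-free distortion. The names and parameter values are illustrative.

```python
import random

def obfuscate(seq, alphabet, eps=0.05, rng=None):
    """Replace each symbol independently with probability eps by a
    uniformly random other symbol: a small, model-free distortion."""
    rng = rng or random.Random()
    return [rng.choice([a for a in alphabet if a != s])
            if rng.random() < eps else s
            for s in seq]

# Example: lightly distort a binary activity trace before release.
trace = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
print(obfuscate(trace, alphabet=[0, 1], eps=0.1, rng=random.Random(7)))
```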
Internet Service Providers (ISPs) have an economic and operational interest in detecting malicious network activity relating to their subscribers. However, it is unclear what kind of traffic data an ISP has available for cyber-security research, and under which legal conditions it can be used. This paper gives an overview of the challenges posed by legislation and of the data sources available to a European ISP. DNS and NetFlow logs are identified as relevant data sources and the state of the art in anonymization and fingerprinting techniques is discussed. Based on legislation, data availability and privacy considerations, a practically applicable anonymization policy is presented.
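As an illustration of two common building blocks such an anonymization policy might use (the function names and the secret below are hypothetical), this Python sketch pseudonymizes an IP address with a keyed hash, or alternatively coarsens it to its /24 prefix:

```python
import hmac, hashlib, ipaddress

SECRET = b"rotate-me-regularly"  # hypothetical site secret

def pseudonymize_ip(ip: str) -> str:
    """Keyed pseudonym: consistent within a dataset, not reversible
    without the secret."""
    digest = hmac.new(SECRET, ip.encode(), hashlib.sha256).hexdigest()
    return f"ip-{digest[:12]}"

def truncate_ip(ip: str) -> str:
    """Coarsen an IPv4 address to its /24 network."""
    net = ipaddress.ip_network(ip + "/24", strict=False)
    return str(net.network_address) + "/24"

print(pseudonymize_ip("203.0.113.42"))  # ip-<hash prefix>
print(truncate_ip("203.0.113.42"))      # 203.0.113.0/24
```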
Reuse of pre-existing industry datasets for research purposes requires a multi-stakeholder solution that balances the researcher's analysis objectives with the need to engage the industry data custodian, whilst respecting the privacy rights of human data subjects. Current methods place the burden on the data custodian, who may not be sufficiently trained to fully appreciate the nuances of data de-identification. Through modelling of functional, quality, and emotional goals, we propose a de-identification-in-the-cloud approach whereby the researcher proposes analyses along with the extraction and de-identification operations, while the industry data custodian retains secure control over authorising the proposed analyses. We demonstrate our approach through the implementation of a de-identification portal for sports club data.
The recent emergence of smartphones, cloud computing, and the Internet of Things has brought about an explosion of data creation. Collating and merging these enormous data with other information makes information services more sophisticated and advanced. At the same time, however, the privacy violations that such merging can cause must be considered. Various anonymization methods have been proposed to preserve privacy. Conventional perturbation-based anonymization of location data adds comparatively large noise, which makes it difficult to utilize the data effectively for secondary use. In this research, to solve these problems, we first clarify the definition of privacy preservation and then propose TMk-anonymity according to that definition.
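The abstract does not define TMk-anonymity itself, so the sketch below only illustrates the conventional perturbation baseline it criticizes: adding zero-mean noise to location coordinates, where a larger sigma_m means stronger privacy but lower secondary-use utility. All names and parameter values are illustrative.

```python
import math, random

def perturb_location(lat, lon, sigma_m=200.0, rng=random):
    """Conventional perturbation: add zero-mean Gaussian noise with
    standard deviation sigma_m (metres) to each coordinate."""
    dlat = rng.gauss(0, sigma_m) / 111_320          # ~metres per degree latitude
    dlon = rng.gauss(0, sigma_m) / (111_320 * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon

# Larger sigma_m -> stronger privacy but lower secondary-use utility.
print(perturb_location(35.6762, 139.6503, sigma_m=200.0))
```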
Collecting and analyzing personal data is important in modern information applications. Though the privacy of data providers should be protected, some adversarial users may behave badly under circumstances where they are not identified. However, the privacy of honest users should not be infringed. Thus, detecting anomalies without revealing normal users' identities is quite important for operating information systems that use personal data. Though various statistical and machine learning methods have been developed for detecting anomalies, it is difficult to know in advance what anomalies will arise. Thus, it would be useful to provide a "general" framework that can employ any anomaly detection method, regardless of the type of data and the nature of the anomaly. In this paper, we propose a privacy-preserving anomaly detection framework that allows an authority to detect adversarial users while other honest users are kept anonymous. Using two cryptographic techniques, group signatures with message-dependent opening (GS-MDO) and public-key encryption with non-interactive opening (PKENO), we provide a correspondence table that links a user and data in a secure way, and we can employ any anonymization technique and any anomaly detection method. It is particularly worth noting that no big brother exists, meaning that no single entity can identify users, while bad behavior is always traceable. We also show the results of implementing our framework: briefly, its overhead is on the order of dozens of milliseconds.
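A greatly simplified stand-in for the correspondence-table idea is sketched below. It substitutes a single symmetric key (Fernet, from the Python cryptography package) for the paper's GS-MDO/PKENO constructions, so the "no big brother" property is deliberately lost here; the real scheme splits the opening power so that no single entity can identify users. All names and values are hypothetical.

```python
from cryptography.fernet import Fernet
import uuid

opener = Fernet(Fernet.generate_key())   # stands in for the opening authority

table = {}                               # pseudonym -> (data, encrypted identity)

def submit(identity: str, data: dict) -> str:
    pseudo = uuid.uuid4().hex
    table[pseudo] = (data, opener.encrypt(identity.encode()))
    return pseudo

p = submit("alice@example.com", {"reading": 73})

# Any anomaly detector runs on the pseudonymous data alone;
# only flagged entries are ever opened.
if table[p][0]["reading"] > 70:
    print("flagged user:", opener.decrypt(table[p][1]).decode())
```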
The prevalence of mobile devices and location-based services (LBS) has generated great concerns regarding the LBS users' privacy, which can be compromised by statistical analysis of their movement patterns. A number of algorithms have been proposed to protect the privacy of users in such systems, but the fundamental underpinnings of such systems remain unexplored. Recently, the concept of perfect location privacy was introduced and its achievability was studied for anonymization-based LBS systems, where user identifiers are permuted at regular intervals to prevent identification based on statistical analysis of long time sequences. In this paper, we significantly extend that investigation by incorporating the other major tool commonly employed to obtain location privacy: obfuscation, where user locations are purposely obscured to protect their privacy. Since anonymization and obfuscation reduce user utility in LBS systems, we investigate how location privacy varies with the degree to which each of these two methods is employed. We provide: (1) achievability results for the case where the location of each user is governed by an i.i.d. process; (2) converse results for the i.i.d. case as well as the more general Markov chain model. We show that, as the number of users in the network grows, the obfuscation-anonymization plane can be divided into two regions: in the first region, all users have perfect location privacy; in the second region, no user has location privacy.
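The two tools the paper combines can be illustrated with a toy Python sketch (not the paper's model): within one interval, user identifiers are randomly permuted (anonymization) and each reported point is perturbed with bounded noise (obfuscation). The trace values below are fabricated.

```python
import random

def anonymize_and_obfuscate(traces, noise, rng):
    """traces: dict user_id -> list of (x, y) points for one interval.
    Returns pseudonymized, noise-obfuscated traces."""
    users = list(traces)
    pseudonyms = users[:]
    rng.shuffle(pseudonyms)              # anonymization: permute identifiers
    out = {}
    for user, pseudo in zip(users, pseudonyms):
        out[pseudo] = [(x + rng.uniform(-noise, noise),
                        y + rng.uniform(-noise, noise))   # obfuscation
                       for (x, y) in traces[user]]
    return out

rng = random.Random(0)
traces = {"alice": [(1.0, 2.0), (1.1, 2.1)], "bob": [(5.0, 5.0)]}
print(anonymize_and_obfuscate(traces, noise=0.5, rng=rng))
```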
User privacy is an important issue in blockchain-based transaction systems. Bitcoin, one of the most widely used blockchain-based transaction systems, fails to provide enough protection of users' privacy. Many subsequent studies, such as Zerocoin and Zerocash, focus on establishing a system that hides the linkage between the identities (pseudonyms) of users and the transactions they carry out, in order to provide a high level of anonymity. It thus becomes an interesting question whether such new transaction systems do provide enough protection of users' privacy. In this paper, we propose a novel and effective approach for de-anonymizing these transaction systems by leveraging information in the system that is not directly identity-related, including the number of transactions made by each identity and the timestamps of sending and receiving. Combining probabilistic analysis with optimization tools, we establish a model that allows us to determine, among all possible ways of linking transactions and identities, the one that is most likely to be true. Subsequent transaction graph analysis can then be carried out, leading to the de-anonymization of the system. To solve the model, we provide exact algorithms based on mixed integer linear programming. Our research also establishes interesting relationships between the de-anonymization problem and other problems studied in theoretical computer science, e.g., the graph matching problem and scheduling problems.
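As a toy stand-in for the paper's mixed integer linear program, the sketch below casts the linking problem as a minimum-cost assignment between identity profiles and transaction clusters, using side-channel features such as transaction counts and mean timestamps. It assumes NumPy and SciPy are available, and all feature values are fabricated for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Feature profiles: [number of transactions, mean timestamp] per identity
# and per observed transaction cluster (values fabricated for illustration).
identities   = np.array([[12.0, 3600.0], [4.0, 7200.0], [30.0, 1800.0]])
transactions = np.array([[29.0, 1750.0], [11.0, 3700.0], [5.0, 7100.0]])

# cost[i, j]: how poorly identity i's profile matches cluster j's.
cost = np.linalg.norm(identities[:, None, :] - transactions[None, :, :], axis=2)

# Most likely one-to-one linking = minimum-cost assignment.
rows, cols = linear_sum_assignment(cost)
for i, j in zip(rows, cols):
    print(f"identity {i} <-> transaction cluster {j} (cost {cost[i, j]:.1f})")
```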
Big data analytics is maturing and moving towards mass adoption. The emergence of analytics increases the need for innovative tools and methodologies to protect data against privacy violation. Many data anonymization methods have been proposed to provide some degree of privacy protection by applying data suppression and other distortion techniques. However, currently available methods suffer from poor scalability and performance and a lack of framework standardization, and they are unable to cope with massive data volumes. Some of these methods were specifically proposed for the MapReduce framework to operate on big data, yet they still follow conventional data management approaches and therefore yield no remarkable performance gains. We introduce a framework that operates in a MapReduce environment to benefit from its advantages, as well as from those of the Hadoop ecosystem. Our framework provides granular user access that can be tuned to different authorization levels. The proposed solution provides fine-grained data alteration based on the user's authorization level for accessing the MapReduce domain for analytics. Using well-developed role-based access control approaches, this framework is capable of assigning roles to users and mapping them to relevant data attributes.
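One way such fine-grained, role-based alteration could look is sketched below; all role names, rules, and attributes are hypothetical, and in the proposed setting this logic would be applied inside the MapReduce map phase.

```python
# Hypothetical role-to-attribute disclosure policy; in the proposed
# framework this would be applied inside the MapReduce map phase.
ROLE_POLICY = {
    "researcher": {"name": "suppress", "zip": "truncate", "age": "keep"},
    "auditor":    {"name": "keep",     "zip": "keep",     "age": "keep"},
}

def apply_policy(record: dict, role: str) -> dict:
    out = {}
    for attr, value in record.items():
        rule = ROLE_POLICY[role].get(attr, "suppress")
        if rule == "keep":
            out[attr] = value
        elif rule == "truncate":
            out[attr] = str(value)[:3] + "**"    # generalize, e.g. ZIP -> prefix
        else:
            out[attr] = "*"                      # suppress entirely
    return out

record = {"name": "Dana", "zip": "90210", "age": 34}
print(apply_policy(record, "researcher"))   # {'name': '*', 'zip': '902**', 'age': 34}
```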
The anonymizing network Tor is examined as one method of anonymizing port scanning tools and avoiding identification and retaliation. Performing anonymized port scans through Tor is possible using Nmap, but parallelization of the scanning processes is required to accelerate the scan rate.
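A sketch of how such parallelization might be arranged, assuming a running Tor client and a proxychains configuration that routes through its SOCKS port; only full TCP connect scans (-sT) can be carried over Tor, and targets must be hosts one is authorized to scan. The port split and worker count below are illustrative.

```python
import shlex, subprocess
from concurrent.futures import ThreadPoolExecutor

TARGET = "scanme.nmap.org"          # scan only hosts you are authorized to test
PORT_RANGES = ["1-1024", "1025-2048", "2049-3072"]

def scan(ports: str) -> str:
    # -sT: TCP connect scan (the only scan type Tor can carry);
    # -Pn/-n: skip host discovery and DNS, which would leak outside Tor.
    cmd = f"proxychains -q nmap -sT -Pn -n -p {ports} {TARGET}"
    return subprocess.run(shlex.split(cmd), capture_output=True, text=True).stdout

# Run the ranges concurrently to offset Tor's per-circuit latency.
with ThreadPoolExecutor(max_workers=3) as pool:
    for report in pool.map(scan, PORT_RANGES):
        print(report)
```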
Privacy preservation is essential in various real-life applications such as medical science and financial analysis. This paper focuses on the implementation of an asymmetric secure multi-party computation protocol using anonymization and public-key encryption, where all parties have access to a trusted third party (TTP) that (1) contributes no input of its own to the computation, (2) does not know who owns each input it receives, (3) has a large number of resources, and (4) holds the decryption key needed to recover the actual inputs for computing the final result. In this environment, the concern is to design a protocol that deploys the TTP for computation. We argue that the protocol serves the parties better (in terms of secure computation and individual privacy) than other available protocols. The solution uses an asymmetric encryption scheme in which any party can encrypt a message with the public key, but decryption can be done only by the holder of the private key. Because the protocol is based on asymmetric encryption and packetization, it ensures (1) confidentiality (anonymity), (2) security, and (3) data privacy.
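A minimal sketch of this TTP pattern, using RSA-OAEP from the Python cryptography package as the asymmetric scheme; the aggregate computed and the inputs are illustrative, and packetization is omitted.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Only the TTP holds the private key; every party knows the public key.
ttp_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = ttp_key.public_key()

# Parties encrypt their private inputs; submissions arrive anonymized,
# so the TTP cannot tell which party owns which input.
inputs = [42, 7, 19]
ciphertexts = [public_key.encrypt(str(x).encode(), oaep) for x in inputs]

# TTP decrypts the anonymous inputs and publishes only the final result.
total = sum(int(ttp_key.decrypt(ct, oaep).decode()) for ct in ciphertexts)
print("sum of private inputs:", total)
```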
The growing popularity and development of data mining technologies bring serious threats to the security of individuals' sensitive information. An emerging research topic in data mining, known as privacy-preserving data mining (PPDM), has been extensively studied in recent years. The basic idea of PPDM is to modify the data in such a way that data mining algorithms can be performed effectively without compromising the security of the sensitive information contained in the data. Current studies of PPDM mainly focus on how to reduce the privacy risk brought by data mining operations, while in fact, unwanted disclosure of sensitive information may also happen during data collection, data publishing, and the delivery of information (i.e., the data mining results). In this paper, we view the privacy issues related to data mining from a wider perspective and investigate various approaches that can help to protect sensitive information. In particular, we identify four different types of users involved in data mining applications, namely the data provider, data collector, data miner, and decision maker. For each type of user, we discuss its privacy concerns and the methods that can be adopted to protect sensitive information. We briefly introduce the basics of related research topics, review state-of-the-art approaches, and present some preliminary thoughts on future research directions. Besides exploring the privacy-preserving approaches for each type of user, we also review game-theoretical approaches proposed for analyzing the interactions among the different users in a data mining scenario, each of whom has their own valuation of the sensitive information. By differentiating the responsibilities of different users with respect to the security of sensitive information, we would like to provide some useful insights into the study of PPDM.
Big data's explosive growth has prompted the US government to release new reports that address the resulting issues, particularly those related to privacy. In the Web extra at http://youtu.be/j49eoe5g8-c, an audio recording from the Computing and the Law column, authors Brian M. Gaff, Heather Egan Sussman, and Jennifer Geetter discuss these developments.
The availability of sophisticated source attribution techniques raises new concerns about the privacy and anonymity of photographers, activists, and human rights defenders who need to stay anonymous while spreading their images and videos. Recently, the use of seam carving, a content-aware resizing method, has been proposed to anonymize the source camera of images against the well-known photoresponse nonuniformity (PRNU)-based source attribution technique. In this paper, we analyze the seam-carving-based source camera anonymization method by introducing two adversarial models to determine the limits of its performance. Our analysis shows that the effectiveness of the de-anonymization attacks depends on various factors, including the parameters of the seam-carving method, the strength of the camera's PRNU noise pattern, and the adversary's ability to identify uncarved image blocks in a seam-carved image. Our results show that, in the general case, successful anonymization of the source camera requires that few uncarved blocks larger than 50×50 pixels remain.
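The role of uncarved blocks can be illustrated with a toy Python/NumPy sketch: blockwise normalized correlation between an image's noise residual and a candidate camera's PRNU fingerprint, where only blocks that survived carving stay aligned with the fingerprint and therefore score high. The fingerprint and residual below are synthetic, not derived from real images.

```python
import numpy as np

def block_corr(residual, fingerprint, block=50):
    """Normalized correlation between a noise residual and a PRNU
    fingerprint over non-overlapping block x block tiles."""
    h, w = residual.shape
    scores = {}
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            r = residual[i:i+block, j:j+block].ravel()
            f = fingerprint[i:i+block, j:j+block].ravel()
            r, f = r - r.mean(), f - f.mean()
            scores[(i, j)] = float(r @ f) / (np.linalg.norm(r) * np.linalg.norm(f))
    return scores

rng = np.random.default_rng(1)
K = rng.normal(size=(100, 100))           # synthetic camera fingerprint
res = rng.normal(size=(100, 100))         # carved regions: fingerprint lost
res[0:50, 0:50] += 0.2 * K[0:50, 0:50]    # one surviving uncarved block
scores = block_corr(res, K)
print(max(scores, key=scores.get), round(max(scores.values()), 3))  # (0, 0) highest
```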