Biblio
In recent trends, privacy preservation is the most predominant factor, on big data analytics and cloud computing. Every organization collects personal data from the users actively or passively. Publishing this data for research and other analytics without removing Personally Identifiable Information (PII) will lead to the privacy breach. Existing anonymization techniques are failing to maintain the balance between data privacy and data utility. In order to provide a trade-off between the privacy of the users and data utility, a Mondrian based k-anonymity approach is proposed. To protect the privacy of high-dimensional data Deep Neural Network (DNN) based framework is proposed. The experimental result shows that the proposed approach mitigates the information loss of the data without compromising privacy.
Research on keystroke dynamics has the good potential to offer continuous authentication that complements conventional authentication methods in combating insider threats and identity theft before more harm can be done to the genuine users. Unfortunately, the large amount of data required by free-text keystroke authentication often contain personally identifiable information, or PII, and personally sensitive information, such as a user's first name and last name, username and password for an account, bank card numbers, and social security numbers. As a result, there are privacy risks associated with keystroke data that must be mitigated before they are shared with other researchers. We conduct a systematic study to remove PII's from a recent large keystroke dataset. We find substantial amounts of PII's from the dataset, including names, usernames and passwords, social security numbers, and bank card numbers, which, if leaked, may lead to various harms to the user, including personal embarrassment, blackmails, financial loss, and identity theft. We thoroughly evaluate the effectiveness of our detection program for each kind of PII. We demonstrate that our PII detection program can achieve near perfect recall at the expense of losing some useful information (lower precision). Finally, we demonstrate that the removal of PII's from the original dataset has only negligible impact on the detection error tradeoff of the free-text authentication algorithm by Gunetti and Picardi. We hope that this experience report will be useful in informing the design of privacy removal in future keystroke dynamics based user authentication systems.
Data is one of the most valuable assets for organization. It can facilitate users or organizations to meet their diverse goals, ranging from scientific advances to business intelligence. Due to the tremendous growth of data, the notion of big data has certainly gained momentum in recent years. Cloud computing is a key technology for storing, managing and analyzing big data. However, such large, complex, and growing data, typically collected from various data sources, such as sensors and social media, can often contain personally identifiable information (PII) and thus the organizations collecting the big data may want to protect their outsourced data from the cloud. In this paper, we survey our research towards development of efficient and effective privacy-enhancing (PE) techniques for management and analysis of big data in cloud computing.We propose our initial approaches to address two important PE applications: (i) privacy-preserving data management and (ii) privacy-preserving data analysis under the cloud environment. Additionally, we point out research issues that still need to be addressed to develop comprehensive solutions to the problem of effective and efficient privacy-preserving use of data.