Biblio
Expected and unexpected risks in cloud computing, which included data security, data segregation, and the lack of control and knowledge, have led to some dilemmas in several fields. Among all of these dilemmas, the privacy problem is even more paramount, which has largely constrained the prevalence and development of cloud computing. There are several privacy protection algorithms proposed nowadays, which generally include two categories, Anonymity algorithm, and differential privacy mechanism. Since many types of research have already focused on the efficiency of the algorithms, few of them emphasized the different orientation and demerits between the two algorithms. Motivated by this emerging research challenge, we have conducted a comprehensive survey on the two popular privacy protection algorithms, namely K-Anonymity Algorithm and Differential Privacy Algorithm. Based on their principles, implementations, and algorithm orientations, we have done the evaluations of these two algorithms. Several expectations and comparisons are also conducted based on the current cloud computing privacy environment and its future requirements.
K-anonymity is a popular model used in microdata publishing to protect individual privacy. This paper introduces the idea of ball tree and projection area density partition into k-anonymity algorithm.The traditional kd-tree implements the division by forming a super-rectangular, but the super-rectangular has the area angle, so it cannot guarantee that the records on the corner are most similar to the records in this area. In this paper, the super-sphere formed by the ball-tree is used to address this problem. We adopt projection area density partition to increase the density of the resulting recorded points. We implement our algorithm with the Gotrack dataset and the Adult dataset in UCI. The experimentation shows that the k-anonymity algorithm based on ball-tree and projection area density partition, obtains more anonymous groups, and the generalization rate is lower. The smaller the K is, the more obvious the result advantage is. The result indicates that our algorithm can make data usability even higher.