Biblio
To improve the buffering performance of data encrypted with CP-ABE (ciphertext-policy attribute-based encryption), this paper proposes a Markov prefetching model based on attribute classification. The prefetching model incorporates the access policy of CP-ABE encrypted files, builds a user relationship network from the users' attribute values, classifies users with a modularity-based community partitioning algorithm, and establishes a Markov prefetching model based on this attribute classification. Compared with the traditional Markov prefetching model and the classification-based Markov prefetching model, the attribute-based Markov prefetching model proposed in this paper achieves higher prefetch accuracy and coverage.
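As a rough illustration of the idea of keeping a Markov prefetching model per user class, the sketch below maintains first-order transition counts separately for each class and predicts the most likely next file. It is a minimal sketch under stated assumptions, not the paper's model: the class labels are assumed to come from some earlier community-partitioning step, and the names `MarkovPrefetcher`, `train`, and `predict` are hypothetical.

```python
from collections import Counter, defaultdict


class MarkovPrefetcher:
    """First-order Markov model kept separately for each user class (hypothetical sketch)."""

    def __init__(self):
        # transitions[user_class][current_file] -> Counter of files accessed next
        self.transitions = defaultdict(lambda: defaultdict(Counter))

    def train(self, user_class, access_sequence):
        """Count consecutive file-to-file transitions for one user class."""
        for current, following in zip(access_sequence, access_sequence[1:]):
            self.transitions[user_class][current][following] += 1

    def predict(self, user_class, current_file, top_n=1):
        """Return up to top_n files most often accessed after current_file."""
        counts = self.transitions.get(user_class, {}).get(current_file, Counter())
        return [name for name, _ in counts.most_common(top_n)]


# Usage: the class label "doctors" is an assumed output of the classification step.
model = MarkovPrefetcher()
model.train("doctors", ["a", "b", "a", "b", "a", "c"])
print(model.predict("doctors", "a"))  # "b" follows "a" twice, "c" once
```

Keeping one transition table per class means a prediction only uses the access history of users with similar attributes, which is the intuition behind classifying users before prefetching.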
As adoption of Cellular IoT solutions (smart home, e-health, among others) increases, several security aspects can still be improved, since these devices are exposed to various types of attacks that can have a high impact on our daily lives. To address this, we present a multi-front security solution that consists of a federated cross-layered authentication mechanism and a machine learning platform that applies anomaly detection techniques to data traffic analysis, studying devices' behavior so that attacks can be detected preemptively and their impact minimized. In this paper, we also present a proof of concept that illustrates the proposed solution and showcases its feasibility, and we discuss the future iterations planned for this work.
To the best of our knowledge, the p-sensitive k-anonymity model is a sophisticated model for resisting linking attacks and homogeneity attacks in data publishing. However, if the distribution of sensitive values is skewed, the model has difficulty defending against skew attacks and is even exposed to sensitive attacks. In practice, the privacy requirements of different sensitive values are not always identical, and a "one size fits all" privacy protection level may cause unnecessary information loss. To address these problems, the paper quantifies privacy requirements with the concept of IDF and pays more attention to sensitive groups. Two enhanced anonymity models with personalized protection, namely the (p,αisg)-sensitive k-anonymity model and the (pi,αisg)-sensitive k-anonymity model, are then proposed to resist skew attacks and sensitive attacks. Furthermore, two clustering algorithms, one with global search and one with local search, are designed to implement the models. Experimental results show that the two enhanced models provide better privacy at the cost of a small loss in data utility.
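For readers unfamiliar with the baseline, the sketch below checks the basic p-sensitive k-anonymity condition: every group of records sharing the same quasi-identifier values must contain at least k records and at least p distinct sensitive values. It covers only this baseline, not the enhanced (p,αisg) or (pi,αisg) models, and the function and field names are hypothetical.

```python
from collections import defaultdict


def is_p_sensitive_k_anonymous(records, quasi_ids, sensitive, k, p):
    """Check the baseline condition on a list of dict records (hypothetical helper)."""
    groups = defaultdict(list)
    for record in records:
        # Records with identical quasi-identifier values fall into one group.
        key = tuple(record[attr] for attr in quasi_ids)
        groups[key].append(record[sensitive])
    # Each group needs >= k records and >= p distinct sensitive values.
    return all(len(vals) >= k and len(set(vals)) >= p for vals in groups.values())


# Usage with made-up, already generalized records.
table = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "cancer"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
]
print(is_p_sensitive_k_anonymous(table, ["age", "zip"], "disease", k=2, p=2))
```

The second group in the usage example has two records but only one distinct disease, which is exactly the homogeneity weakness that the p-sensitivity requirement is meant to rule out.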
Recently, a large number of research studies on privacy-preserving data publishing have been conducted. We find that most k-anonymity algorithms, when applied in a service-oriented setting, fail to consider the distribution characteristics of attribute values in the data and the differences in contribution among quasi-identifier attributes. In this paper, we illustrate how the distribution characteristics of attribute values and the differing contributions of quasi-identifier attributes affect anonymization results. To maximize the utility of released data, a service-oriented adaptive anonymity algorithm is proposed. We establish a model of reaction dispersion degree to quantify the characteristics of the attribute value distribution and introduce the concept of utility weight, which is related to the contribution value of quasi-identifier attributes. A priority coefficient and a characterization coefficient of partition quality are defined to adaptively optimize the selection of the dimension and the splitting value during anonymity group partitioning, which reduces unnecessary information loss and further improves the utility of the anonymized data. The rationality and validity of the algorithm are verified by theoretical analysis and multiple experiments.
The main idea of the proposed work is to secure big data by applying a Partitional Clustering Algorithm (PCA) together with a perturbation technique on the Hadoop Distributed File System. Numerous clustering methods are available for categorizing information in big data. PCA discovers clusters based on an initial partition of the data. With this approach, it is possible to safeguard data in an impoverished form that still allows computation and communication. Performance was analyzed on a health care database using measures such as precision, accuracy, and F-score. The results demonstrate that this method reduces the complexity of preserving privacy and achieves better accuracy than existing techniques.
K-anonymity is a popular model used in microdata publishing to protect individual privacy. This paper introduces the ideas of the ball tree and projection area density partitioning into the k-anonymity algorithm. The traditional kd-tree divides the space into hyper-rectangles, but a hyper-rectangle has corners, so it cannot guarantee that the records in a corner are the most similar to the other records in the same region. In this paper, the hypersphere formed by the ball tree is used to address this problem. We adopt projection area density partitioning to increase the density of the resulting record points. We implement our algorithm with the Gotrack dataset and the Adult dataset from UCI. The experiments show that the k-anonymity algorithm based on the ball tree and projection area density partitioning obtains more anonymous groups and a lower generalization rate. The smaller the value of k, the more pronounced the advantage. The results indicate that our algorithm further improves data usability.
Differential privacy is an approach that preserves patient privacy while permitting researchers access to medical data. This paper presents mechanisms proposed to satisfy differential privacy while answering a given workload of range queries. Representing input data as a vector of counts, these methods partition the vector according to relationships between the data and the ranges of the given queries. After the vector is partitioned into buckets, the count of each bucket is estimated privately and split among the bucket's positions to answer the given query set. The performance of the proposed methods was evaluated using different workloads over several attributes. The results show that partitioning the vector based on the data can produce more accurate answers, while partitioning the vector based on the given workload improves privacy. This paper's two main contributions are: (1) improving earlier work on partitioning mechanisms by building a greedy algorithm that partitions the count vector efficiently, and (2) providing an adaptive algorithm that considers the sensitivity of the given queries before returning results.
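The bucket-then-estimate step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's mechanism: the bucket boundaries are taken as given (the greedy partitioning is not reproduced), each bucket total is perturbed with standard Laplace noise of scale 1/ε, and the names `private_estimate` and `answer_range` are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: difference of two exponentials with rate 1/scale."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_estimate(counts, buckets, epsilon, rng=None):
    """Estimate each position from a noisy bucket total split uniformly.

    buckets is a list of half-open index ranges (start, end) that are assumed
    to be disjoint and to cover the whole count vector.
    """
    rng = rng or random.Random()
    estimate = [0.0] * len(counts)
    for start, end in buckets:
        # One record changes exactly one bucket total by 1, so scale 1/epsilon.
        noisy_total = sum(counts[start:end]) + laplace_noise(1.0 / epsilon, rng)
        share = noisy_total / (end - start)
        for i in range(start, end):
            estimate[i] = share
    return estimate


def answer_range(estimate, lo, hi):
    """Answer an inclusive range query [lo, hi] from the private estimate."""
    return sum(estimate[lo:hi + 1])


# Usage: two buckets chosen by hand so that counts inside each are similar.
counts = [2, 2, 2, 2, 10, 10]
est = private_estimate(counts, [(0, 4), (4, 6)], epsilon=1.0, rng=random.Random(0))
print(answer_range(est, 0, 3))
```

Grouping positions with similar counts keeps the error from uniform splitting small, while adding noise once per bucket rather than once per position reduces the total noise in long range queries.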
The rapid development of the Internet has recently resulted in massive information overload. This information is usually represented by high-dimensional feature vectors in many related applications such as recognition, classification, and retrieval. These applications usually need efficient indexing and search methods for such large-scale, high-dimensional databases, which is typically a challenging task. Some efforts have been made and have solved this problem to some extent. However, most of them are implemented on a single machine, which is not suitable for handling large-scale databases. In this paper, we present a novel data index structure and nearest neighbor search algorithm implemented on Apache Spark. We impose a grid on the database and index data by non-empty grid cells. This grid-based index structure is simple and easy to implement in parallel. Moreover, we propose to build a scalable KNN graph on the grids, which increases the efficiency of this index structure at a low cost in the parallel implementation. Finally, experiments conducted on both public and synthetic databases show that the proposed methods achieve high overall performance in both efficiency and accuracy.
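The idea of indexing data by non-empty grid cells can be illustrated on a single machine as below. This is a minimal sketch under stated assumptions and not the paper's Spark implementation: it uses a uniform cell size, searches only the query's cell and its adjacent cells (so the result is approximate), and omits the KNN graph over cells. The function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def cell_of(point, cell_size):
    """Map a point to the integer coordinates of its grid cell."""
    return tuple(math.floor(x / cell_size) for x in point)


def build_grid_index(points, cell_size):
    """Index point ids by grid cell; only non-empty cells are stored."""
    grid = defaultdict(list)
    for idx, point in enumerate(points):
        grid[cell_of(point, cell_size)].append(idx)
    return dict(grid)


def nearest(points, grid, query, cell_size):
    """Approximate nearest neighbor: scan the query's cell and adjacent cells."""
    center = cell_of(query, cell_size)
    candidates = []
    # 3^d neighboring cells: practical only for low dimension d in this sketch.
    for offset in product((-1, 0, 1), repeat=len(center)):
        neighbor = tuple(c + o for c, o in zip(center, offset))
        candidates.extend(grid.get(neighbor, []))
    if not candidates:
        return None
    return min(candidates, key=lambda i: math.dist(points[i], query))


# Usage on a toy 2-D dataset.
pts = [(0.1, 0.1), (0.9, 0.9), (5.0, 5.0)]
index = build_grid_index(pts, cell_size=1.0)
print(nearest(pts, index, (0.8, 0.8), cell_size=1.0))  # index of (0.9, 0.9)
```

Because each point is assigned to its cell independently, the index construction maps naturally onto a parallel group-by-key step, which is presumably what makes the structure easy to build in a distributed setting.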
Collaborative Filtering (CF) is a successful technique that has been widely implemented in recommender systems, and Privacy Preserving Collaborative Filtering (PPCF) has attracted growing public concern. Current solutions mainly focus on cryptographic methods, obfuscation methods, perturbation methods, and differential privacy methods. However, these methods have shortcomings, such as unnecessary computational cost, lower data quality, and difficulty in calibrating the magnitude of noise. This paper proposes a (k, p, I)-anonymity method that improves the existing k-anonymity method in PPCF. The method works as follows. First, it applies a Latent Factor Model (LFM) to reduce matrix sparsity. Then it improves the Maximum Distance to Average Vector (MDAV) microaggregation algorithm with importance partitioning, which increases homogeneity among the records in each group and thus retains better data quality, and it applies a (p, I)-diversity model, where p is the attacker's prior knowledge about users' ratings and I is the diversity among users in each group, to raise the level of privacy preservation. Theoretical and experimental analyses show that our approach ensures a higher level of privacy preservation with lower information loss.
Differential privacy is a rigorous privacy standard that has been applied to a range of data analysis tasks. To broaden the application scenarios of differential privacy when data records have dependencies, the notion of Bayesian differential privacy has recently been proposed. However, it is unknown whether Bayesian differential privacy preserves three nice properties of differential privacy: sequential composability, parallel composability, and post-processing. In this paper, we provide an affirmative answer to this question; i.e., Bayesian differential privacy still has these properties. The idea behind sequential composability is that if we have $m$ algorithms $Y_1, Y_2, \ldots, Y_m$, where $Y_\ell$ is independently $\epsilon_\ell$-Bayesian differentially private for $\ell = 1, 2, \ldots, m$, then by feeding the result of $Y_1$ into $Y_2$, the result of $Y_2$ into $Y_3$, and so on, we finally obtain a $\sum_{\ell=1}^{m} \epsilon_\ell$-Bayesian differentially private algorithm. For parallel composability, we consider the situation where a database is partitioned into $m$ disjoint subsets. The $\ell$-th subset is input to a Bayesian differentially private algorithm $Y_\ell$, for $\ell = 1, 2, \ldots, m$. Then the parallel composition of $Y_1, Y_2, \ldots, Y_m$ is $\max_{\ell=1}^{m} \epsilon_\ell$-Bayesian differentially private. The post-processing property means that a data analyst, without additional knowledge about the private database, cannot compute a function of the output of a Bayesian differentially private algorithm and reduce its privacy guarantee.
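The two composition rules stated above reduce to simple arithmetic on the per-algorithm privacy parameters. The sketch below only illustrates that bookkeeping, assuming each parameter is already a valid Bayesian differential privacy guarantee; the function names are hypothetical.

```python
def sequential_budget(epsilons):
    """Total privacy parameter when algorithms are chained on the same data."""
    return sum(epsilons)


def parallel_budget(epsilons):
    """Total privacy parameter when algorithms run on disjoint subsets."""
    return max(epsilons)


# Usage with three made-up privacy parameters.
eps = [0.1, 0.2, 0.5]
print(sequential_budget(eps))  # sum of the three parameters
print(parallel_budget(eps))    # largest of the three parameters
```

The contrast shows why partitioning a database into disjoint subsets is attractive: the overall guarantee is governed by the weakest single algorithm rather than by the accumulated sum.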