Biblio
In the current world, day by day the data growth and the investigation about that information increased due to the pervasiveness of computing devices, but people are reluctant to share their information on online portals or surveys fearing safety because sensitive information such as credit card information, medical conditions and other personal information in the wrong hands can mean danger to the society. These days privacy preserving has become a setback for storing data in data repository so for that reason data in the repository should be made undistinguishable, data is encrypted while storing and later decrypted when needed for analysis purpose in data mining. While storing the raw data of the individuals it is important to remove person-identifiable information such as name, employee id. However, the other attributes pertaining to the person should be encrypted so the methodologies used to implement. These methodologies can make data in the repository secure and PPDM task can made easier.
With the advent of the big data era, information systems have exhibited some new features, including boundary obfuscation, system virtualization, unstructured and diversification of data types, and low coupling among function and data. These features not only lead to a big difference between big data technology (DT) and information technology (IT), but also promote the upgrading and evolution of network security technology. In response to these changes, in this paper we compare the characteristics between IT era and DT era, and then propose four DT security principles: privacy, integrity, traceability, and controllability, as well as active and dynamic defense strategy based on "propagation prediction, audit prediction, dynamic management and control". We further discuss the security challenges faced by DT and the corresponding assurance strategies. On this basis, the big data security technologies can be divided into four levels: elimination, continuation, improvement, and innovation. These technologies are analyzed, combed and explained according to six categories: access control, identification and authentication, data encryption, data privacy, intrusion prevention, security audit and disaster recovery. The results will support the evolution of security technologies in the DT era, the construction of big data platforms, the designation of security assurance strategies, and security technology choices suitable for big data.
Data outsourcing to cloud has been a common IT practice nowadays due to its significant benefits. Meanwhile, security and privacy concerns are critical obstacles to hinder the further adoption of cloud. Although data encryption can mitigate the problem, it reduces the functionality of query processing, e.g., disabling SQL queries. Several schemes have been proposed to enable one-dimensional query on encrypted data, but multi-dimensional range query has not been well addressed. In this paper, we propose a secure and scalable scheme that can support multi-dimensional range queries over encrypted data. The proposed scheme has three salient features: (1) Privacy: the server cannot learn the contents of queries and data records during query processing. (2) Efficiency: we utilize hierarchical cubes to encode multi-dimensional data records and construct a secure tree index on top of such encoding to achieve sublinear query time. (3) Verifiability: our scheme allows users to verify the correctness and completeness of the query results to address server's malicious behaviors. We perform formal security analysis and comprehensive experimental evaluations. The results on real datasets demonstrate that our scheme achieves practical performance while guaranteeing data privacy and result integrity.