Biblio
With the ubiquitous computing of providing services and applications at anywhere and anytime, cloud computing is the best option as it offers flexible and pay-per-use based services to its customers. Nevertheless, security and privacy are the main challenges to its success due to its dynamic and distributed architecture, resulting in generating big data that should be carefully analysed for detecting network's vulnerabilities. In this paper, we propose a Collaborative Anomaly Detection Framework (CADF) for detecting cyber attacks from cloud computing environments. We provide the technical functions and deployment of the framework to illustrate its methodology of implementation and installation. The framework is evaluated on the UNSW-NB15 dataset to check its credibility while deploying it in cloud computing environments. The experimental results showed that this framework can easily handle large-scale systems as its implementation requires only estimating statistical measures from network observations. Moreover, the evaluation performance of the framework outperforms three state-of-the-art techniques in terms of false positive rate and detection rate.
The increasing complexity and ubiquity in user connectivity, computing environments, information content, and software, mobile, and web applications transfers the responsibility of privacy management to the individuals. Hence, making it extremely difficult for users to maintain the intelligent and targeted level of privacy protection that they need and desire, while simultaneously maintaining their ability to optimally function. Thus, there is a critical need to develop intelligent, automated, and adaptable privacy management systems that can assist users in managing and protecting their sensitive data in the increasingly complex situations and environments that they find themselves in. This work is a first step in exploring the development of such a system, specifically how user personality traits and other characteristics can be used to help automate determination of user sharing preferences for a variety of user data and situations. The Big-Five personality traits of openness, conscientiousness, extroversion, agreeableness, and neuroticism are examined and used as inputs into several popular machine learning algorithms in order to assess their ability to elicit and predict user privacy preferences. Our results show that the Big-Five personality traits can be used to significantly improve the prediction of user privacy preferences in a number of contexts and situations, and so using machine learning approaches to automate the setting of user privacy preferences has the potential to greatly reduce the burden on users while simultaneously improving the accuracy of their privacy preferences and security.
In this paper, we review big data characteristics and security challenges in the cloud and visit different cloud domains and security regulations. We propose using integrated auditing for secure data storage and transaction logs, real-time compliance and security monitoring, regulatory compliance, data environment, identity and access management, infrastructure auditing, availability, privacy, legality, cyber threats, and granular auditing to achieve big data security. We apply a stochastic process model to conduct security analyses in availability and mean time to security failure. Potential future works are also discussed.
Enterprises usually provide strong controls to prevent cyberattacks and inadvertent leakage of data to external entities. However, in the case where employees and data scientists have legitimate access to analyze and derive insights from the data, there are insufficient controls and employees are usually permitted access to all information about the customers of the enterprise including sensitive and private information. Though it is important to be able to identify useful patterns of one's customers for better customization and service, customers' privacy must not be sacrificed to do so. We propose an alternative — a framework that will allow privacy preserving data analytics over big data. In this paper, we present an efficient and scalable framework for Apache Spark, a cluster computing framework, that provides strong privacy guarantees for users even in the presence of an informed adversary, while still providing high utility for analysts. The framework, titled Shade, includes two mechanisms — SparkLAP, which provides Laplacian perturbation based on a user's query and SparkSAM, which uses the contents of the database itself in order to calculate the perturbation. We show that the performance of Shade is substantially better than earlier differential privacy systems without loss of accuracy, particularly when run on datasets small enough to fit in memory, and find that SparkSAM can even exceed performance of an identical nonprivate Spark query.
In the age of Big Data, we are witnessing a huge proliferation of digital data capturing our lives and our surroundings. Data privacy is a critical barrier to data analytics and privacy-preserving data disclosure becomes a key aspect to leveraging large-scale data analytics due to serious privacy risks. Traditional privacy-preserving data publishing solutions have focused on protecting individual's private information while considering all aggregate information about individuals as safe for disclosure. This paper presents a new privacy-aware data disclosure scheme that considers group privacy requirements of individuals in bipartite association graph datasets (e.g., graphs that represent associations between entities such as customers and products bought from a pharmacy store) where even aggregate information about groups of individuals may be sensitive and need protection. We propose the notion of $ε$g-Group Differential Privacy that protects sensitive information of groups of individuals at various defined group protection levels, enabling data users to obtain the level of information entitled to them. Based on the notion of group privacy, we develop a suite of differentially private mechanisms that protect group privacy in bipartite association graphs at different group privacy levels based on specialization hierarchies. We evaluate our proposed techniques through extensive experiments on three real-world association graph datasets and our results demonstrate that the proposed techniques are effective, efficient and provide the required guarantees on group privacy.
With the development of modern logistics industry railway freight enterprises as the main traditional logistics enterprises, the service mode is facing many problems. In the era of big data, for railway freight enterprises, coordinated development and sharing of information resources have become the requirements of the times, while how to protect the privacy of citizens has become one of the focus issues of the public. To prevent the disclosure or abuse of the citizens' privacy information, the citizens' privacy needs to be preserved in the process of information opening and sharing. However, most of the existing privacy preserving models cannot to be used to resist attacks with continuously growing background knowledge. This paper presents the method of applying differential privacy to protect associated data, which can be shared in railway freight service association information. First, the original service data need to slice by optimal shard length, then differential method and apriori algorithm is used to add Laplace noise in the Candidate sets. Thus the citizen's privacy information can be protected even if the attacker gets strong background knowledge. Last, sharing associated data to railway information resource partners. The steps and usefulness of the discussed privacy preservation method is illustrated by an example.
In smart grid, large quantities of data is collected from various applications, such as smart metering substation state monitoring, electric energy data acquisition, and smart home. Big data acquired in smart grid applications is usually sensitive. For instance, in order to dispatch accurately and support the dynamic price, lots of smart meters are installed at user's house to collect the real-time data, but all these collected data are related to user privacy. In this paper, we propose a data aggregation scheme based on secret sharing with fault tolerance in smart grid, which ensures that control center gets the integrated data without revealing user's privacy. Meanwhile, we also consider fault tolerance during the data aggregation. At last, we analyze the security of our scheme and carry out experiments to validate the results.
In the recent years, we have observed the development of several connected and mobile devices intended for daily use. This development has come with many risks that might not be perceived by the users. These threats are compromising when an unauthorized entity has access to private big data generated through the user objects in the Internet of Things. In the literature, many solutions have been proposed in order to protect the big data, but the security remains a challenging issue. This work is carried out with the aim to provide a solution to the access control to the big data and securing the localization of their generator objects. The proposed models are based on Attribute Based Encryption, CHORD protocol and $μ$TESLA. Through simulations, we compare our solutions to concurrent protocols and we show its efficiency in terms of relevant criteria.
Location privacy has become a significant challenge of big data. Particularly, by the advantage of big data handling tools availability, huge location data can be managed and processed easily by an adversary to obtain user private information from Location-Based Services (LBS). So far, many methods have been proposed to preserve user location privacy for these services. Among them, dummy-based methods have various advantages in terms of implementation and low computation costs. However, they suffer from the spatiotemporal correlation issue when users submit consecutive requests. To solve this problem, a practical hybrid location privacy protection scheme is presented in this paper. The proposed method filters out the correlated fake location data (dummies) before submissions. Therefore, the adversary can not identify the user's real location. Evaluations and experiments show that our proposed filtering technique significantly improves the performance of existing dummy-based methods and enables them to effectively protect the user's location privacy in the environment of big data.
The main issue with big data in cloud is the processed or used always need to be by third party. It is very important for the owners of data or clients to trust and to have the guarantee of privacy for the information stored in cloud or analyzed as big data. The privacy models studied in previous research showed that privacy infringement for big data happened because of limitation, privacy guarantee rate or dissemination of accurate data which is obtainable in the data set. In addition, there are various privacy models. In order to determine the best and the most appropriate model to be applied in the future, which also guarantees big data privacy, it is necessary to invest in research and study. In the next part, we surfed some of the privacy models in order to determine the advantages and disadvantages of each model in privacy assurance for big data in cloud. The present study also proposes combined Diff-Anonym algorithm (K-anonymity and differential models) to provide data anonymity with guarantee to keep balance between ambiguity of private data and clarity of general data.
In a world where traditional notions of privacy are increasingly challenged by the myriad companies that collect and analyze our data, it is important that decision-making entities are held accountable for unfair treatments arising from irresponsible data usage. Unfortunately, a lack of appropriate methodologies and tools means that even identifying unfair or discriminatory effects can be a challenge in practice. We introduce the unwarranted associations (UA) framework, a principled methodology for the discovery of unfair, discriminatory, or offensive user treatment in data-driven applications. The UA framework unifies and rationalizes a number of prior attempts at formalizing algorithmic fairness. It uniquely combines multiple investigative primitives and fairness metrics with broad applicability, granular exploration of unfair treatment in user subgroups, and incorporation of natural notions of utility that may account for observed disparities. We instantiate the UA framework in FairTest, the first comprehensive tool that helps developers check data-driven applications for unfair user treatment. It enables scalable and statistically rigorous investigation of associations between application outcomes (such as prices or premiums) and sensitive user attributes (such as race or gender). Furthermore, FairTest provides debugging capabilities that let programmers rule out potential confounders for observed unfair effects. We report on use of FairTest to investigate and in some cases address disparate impact, offensive labeling, and uneven rates of algorithmic error in four data-driven applications. As examples, our results reveal subtle biases against older populations in the distribution of error in a predictive health application and offensive racial labeling in an image tagger.
Sharing and working on sensitive data in distributed settings from healthcare to finance is a major challenge due to security and privacy concerns. Secure multiparty computation (SMC) is a viable panacea for this, allowing distributed parties to make computations while the parties learn nothing about their data, but the final result. Although SMC is instrumental in such distributed settings, it does not provide any guarantees not to leak any information about individuals to adversaries. Differential privacy (DP) can be utilized to address this; however, achieving SMC with DP is not a trivial task, either. In this paper, we propose a novel Secure Multiparty Distributed Differentially Private (SM-DDP) protocol to achieve secure and private computations in a multiparty environment. Specifically, with our protocol, we simultaneously achieve SMC and DP in distributed settings focusing on linear regression on horizontally distributed data. That is, parties do not see each others’ data and further, can not infer information about individuals from the final constructed statistical model. Any statistical model function that allows independent calculation of local statistics can be computed through our protocol. The protocol implements homomorphic encryption for SMC and functional mechanism for DP to achieve the desired security and privacy guarantees. In this work, we first introduce the theoretical foundation for the SM-DDP protocol and then evaluate its efficacy and performance on two different datasets. Our results show that one can achieve individual-level privacy through the proposed protocol with distributed DP, which is independently applied by each party in a distributed fashion. Moreover, our results also show that the SM-DDP protocol incurs minimal computational overhead, is scalable, and provides security and privacy guarantees.
The Internet of Things (IoT) devices perform security-critical operations and deal with sensitive information in the IoT-based systems. Therefore, the increased deployment of smart devices will make them targets for cyber attacks. Adversaries can perform malicious actions, leak private information, and track devices' and their owners' location by gaining unauthorized access to IoT devices and networks. However, conventional security protocols are not primarily designed for resource constrained devices and therefore cannot be applied directly to IoT systems. In this paper, we propose Boot-IoT - a privacy-preserving, lightweight, and scalable security scheme for limited resource devices. Boot-IoT prevents a malicious device from joining an IoT network. Boot-IoT enables a device to compute a unique identity for authentication each time the device enters a network. Moreover, during device to device communication, Boot-IoT provides a lightweight mutual authentication scheme that ensures privacy-preserving identity usages. We present a detailed analysis of the security strength of BootIoT. We implemented a prototype of Boot-IoT on IoT devices powered by Contiki OS and provided an extensive comparative analysis of Boot-IoT with contemporary authentication methods. Our results show that Boot-IoT is resource efficient and provides better scalability compared to current solutions.
Cloud computing is significantly reshaping the computing industry built around core concepts such as virtualization, processing power, connectivity and elasticity to store and share IT resources via a broad network. It has emerged as the key technology that unleashes the potency of Big Data, Internet of Things, Mobile and Web Applications, and other related technologies; but it also comes with its challenges - such as governance, security, and privacy. This paper is focused on the security and privacy challenges of cloud computing with specific reference to user authentication and access management for cloud SaaS applications. The suggested model uses a framework that harnesses the stateless and secure nature of JWT for client authentication and session management. Furthermore, authorized access to protected cloud SaaS resources have been efficiently managed. Accordingly, a Policy Match Gate (PMG) component and a Policy Activity Monitor (PAM) component have been introduced. In addition, other subcomponents such as a Policy Validation Unit (PVU) and a Policy Proxy DB (PPDB) have also been established for optimized service delivery. A theoretical analysis of the proposed model portrays a system that is secure, lightweight and highly scalable for improved cloud resource security and management.
Smart Card has complications with validation and transmission process. Therefore, by using peeping attack, the secret code was stolen and secret filming while entering Personal Identification Number at the ATM machine. We intend to develop an authentication system to banks that protects the asset of user's. The data of a user is to be ensured that secure and isolated from the data leakage and other attacks Therefore, we propose a system, where ATM machine will have a QR code in which the information's are encrypted corresponding to the ATM machine and a mobile application in the customer's mobile which will decrypt the encoded QR information and sends the information to the server and user's details are displayed in the ATM machine and transaction can be done. Now, the user securely enters information to transfer money without risk of peeping attack in Automated Teller Machine by just scanning the QR code at the ATM by mobile application. Here, both the encryption and decryption technique are carried out by using Triple DES Algorithm (Data Encryption Standard).
Personalized medicine performs diagnoses and treatments according to the DNA information of the patients. The new paradigm will change the health care model in the future. A doctor will perform the DNA sequence matching instead of the regular clinical laboratory tests to diagnose and medicate the diseases. Additionally, with the help of the affordable personal genomics services such as 23andMe, personalized medicine will be applied to a great population. Cloud computing will be the perfect computing model as the volume of the DNA data and the computation over it are often immense. However, due to the sensitivity, the DNA data should be encrypted before being outsourced into the cloud. In this paper, we start from a practical system model of the personalize medicine and present a solution for the secure DNA sequence matching problem in cloud computing. Comparing with the existing solutions, our scheme protects the DNA data privacy as well as the search pattern to provide a better privacy guarantee. We have proved that our scheme is secure under the well-defined cryptographic assumption, i.e., the sub-group decision assumption over a bilinear group. Unlike the existing interactive schemes, our scheme requires only one round of communication, which is critical in practical application scenarios. We also carry out a simulation study using the real-world DNA data to evaluate the performance of our scheme. The simulation results show that the computation overhead for real world problems is practical, and the communication cost is small. Furthermore, our scheme is not limited to the genome matching problem but it applies to general privacy preserving pattern matching problems which is widely used in real world.
Since the first whole-genome sequencing, the biomedical research community has made significant steps towards a more precise, predictive and personalized medicine. Genomic data is nowadays widely considered privacy-sensitive and consequently protected by strict regulations and released only after careful consideration. Various additional types of biomedical data, however, are not shielded by any dedicated legal means and consequently disseminated much less thoughtfully. This in particular holds true for DNA methylation data as one of the most important and well-understood epigenetic element influencing human health. In this paper, we show that, in contrast to the aforementioned belief, releasing one's DNA methylation data causes privacy issues akin to releasing one's actual genome. We show that already a small subset of methylation regions influenced by genomic variants are sufficient to infer parts of someone's genome, and to further map this DNA methylation profile to the corresponding genome. Notably, we show that such re-identification is possible with 97.5% accuracy, relying on a dataset of more than 2500 genomes, and that we can reject all wrongly matched genomes using an appropriate statistical test. We provide means for countering this threat by proposing a novel cryptographic scheme for privately classifying tumors that enables a privacy-respecting medical diagnosis in a common clinical setting. The scheme relies on a combination of random forests and homomorphic encryption, and it is proven secure in the honest-but-curious model. We evaluate this scheme on real DNA methylation data, and show that we can keep the computational overhead to acceptable values for our application scenario.
Homomorphic encryption technology can settle a dispute of data privacy security in cloud environment, but there are many problems in the process of access the data which is encrypted by a homomorphic algorithm in the cloud. In this paper, on the premise of attribute encryption, we propose a fully homomorphic encrypt scheme which based on attribute encryption with LSSS matrix. This scheme supports fine-grained cum flexible access control along with "Query-Response" mechanism to enable users to efficiently retrieve desired data from cloud servers. In addition, the scheme should support considerable flexibility to revoke system privileges from users without updating the key client, it reduces the pressure of the client greatly. Finally, security analysis illustrates that the scheme can resist collusion attack. A comparison of the performance from existing CP-ABE scheme, indicates that our scheme reduces the computation cost greatly for users.
Genetic data are important dataset utilised in genetic epidemiology to investigate biologically coded information within the human genome. Enormous research has been delved into in recent years in order to fully sequence and understand the genome. Personalised medicine, patient response to treatments and relationships between specific genes and certain characteristics such as phenotypes and diseases, are positive impacts of studying the genome, just to mention a few. The sensitivity, longevity and non-modifiable nature of genetic data make it even more interesting, consequently, the security and privacy for the storage and processing of genomic data beg for attention. A common activity carried out by geneticists is the association analysis between allele-allele, or even a genetic locus and a disease. We demonstrate the use of cryptographic techniques such as homomorphic encryption schemes and multiparty computations, how such analysis can be carried out in a privacy friendly manner. We compute a 3 × 3 contingency table, and then, genome analyses algorithms such as linkage disequilibrium (LD) measures, all on the encrypted domain. Our computation guarantees privacy of the genome data under our security settings, and provides up to 98.4% improvement, compared to an existing solution.
As the amount of spatial data gets bigger, organizations realized that it is cheaper and more flexible to keep their data on the Cloud rather than to establish and maintain in-house huge data centers. Though this saves a lot for IT costs, organizations are still concerned about the privacy and security of their data. Encrypting the whole database before uploading it to the Cloud solves the security issue. But querying the database requires downloading and decrypting the data set, which is impractical. In this paper, we propose a new scheme for protecting the privacy and integrity of spatial data stored in the Cloud while being able to execute range queries efficiently. The proposed technique suggests a new index structure to support answering range query over encrypted data set. The proposed indexing scheme is based on the Z-curve. The paper describes a distributed algorithm for answering range queries over spatial data stored on the Cloud. We carried many simulation experiments to measure the performance of the proposed scheme. The experimental results show that the proposed scheme outperforms the most recent schemes by Kim et al. in terms of data redundancy.
With increasing popularity of cloud computing, the data owners are motivated to outsource their sensitive data to cloud servers for flexibility and reduced cost in data management. However, privacy is a big concern for outsourcing data to the cloud. The data owners typically encrypt documents before outsourcing for privacy-preserving. As the volume of data is increasing at a dramatic rate, it is essential to develop an efficient and reliable ciphertext search techniques, so that data owners can easily access and update cloud data. In this paper, we propose a privacy preserving multi-keyword ranked search scheme over encrypted data in cloud along with data integrity using a new authenticated data structure MIR-tree. The MIR-tree based index with including the combination of widely used vector space model and TF×IDF model in the index construction and query generation. We use inverted file index for storing word-digest, which provides efficient and fast relevance between the query and cloud data. Design an authentication set(AS) for authenticating the queries, for verifying top-k search results. Because of tree based index, our scheme achieves optimal search efficiency and reduces communication overhead for verifying the search results. The analysis shows security and efficiency of our scheme.
Cloud computing, often referred to as simply “the cloud,” is the delivery of on-demand computing resources; everything from applications to data centers over the Internet. Cloud is used not only for storing data, but also the stored data can be shared by multiple users. Due to this, the integrity of cloud data is subject to doubt. Every time it is not possible for user to download all data and verify integrity, so proposed system contain Third Party Auditor (TPA) to verify the integrity of shared data. During auditing, the shared data is kept private from public verifiers, who are able to verify shared data integrity without downloading or retrieving the entire data file. Group signature is used to preserve identity privacy of group members from third party auditor. Privacy preserving is done to ensure that the TPA cannot derive user's data content from the information collected during the auditing process.
In the present time, there has been a huge increase in large data repositories by corporations, governments, and healthcare organizations. These repositories provide opportunities to design/improve decision-making systems by mining trends and patterns from the data set (that can provide credible information) to improve customer service (e.g., in healthcare). As a result, while data sharing is essential, it is an obligation to maintaining the privacy of the data donors as data custodians have legal and ethical responsibilities to secure confidentiality. This research proposes a 2-layer privacy preserving (2-LPP) data sanitization algorithm that satisfies ε-differential privacy for publishing sanitized data. The proposed algorithm also reduces the re-identification risk of the sanitized data. The proposed algorithm has been implemented, and tested with two different data sets. Compared to other existing works, the results obtained from the proposed algorithm show promising performance.
Privacy and security have been discussed in many occasions and in most cases, the importance that these two aspects play on the information system domain are mentioned often. Many times, research is carried out on the individual information security or privacy measures where it is commonly regarded with the focus on the particular measure or both privacy and security are regarded as a whole subject. However, there have been no attempts at establishing a proper method in categorizing any form of objects of protection. Through the review done on this paper, we would like to investigate the relationship between privacy and security and form a break down the aspects of privacy and security in order to provide better understanding through determining if a measure or methodology is security, privacy oriented or both. We would recommend that in further research, a further refined formulation should be formed in order to carry out this determination process. As a result, we propose a Privacy-Security Tree (PST) in this paper that distinguishes the privacy from security measures.
After a brief introduction on optical chaotic cryptography, we compare the standard short cavity, close-loop, two-laser and three-laser schemes for secure transmission, showing that both are suitable for secure data exchange, the three-laser scheme offering a slightly better level of privacy, due to its symmetrical topology.