Biblio
Hadoop is developed as a distributed data processing platform for analyzing big data. Enterprises can analyze big data containing users' sensitive information by using Hadoop and utilize them for their marketing. Therefore, researches on data encryption have been widely done to protect the leakage of sensitive data stored in Hadoop. However, the existing researches support only the AES international standard data encryption algorithm. Meanwhile, the Korean government selected ARIA algorithm as a standard data encryption scheme for domestic usages. In this paper, we propose a HDFS data encryption scheme which supports both ARIA and AES algorithms on Hadoop. First, the proposed scheme provides a HDFS block-splitting component that performs ARIA/AES encryption and decryption under the Hadoop distributed computing environment. Second, the proposed scheme provides a variable-length data processing component that can perform encryption and decryption by adding dummy data, in case when the last data block does not contains 128-bit data. Finally, we show from performance analysis that our proposed scheme is efficient for various applications, such as word counting, sorting, k-Means, and hierarchical clustering.
Distributed data aggregation via summation (counting) helped us to learn the insights behind the raw data. However, such computing suffered from a high privacy risk of malicious collusion attacks. That is, the colluding adversaries infer a victim's privacy from the gaps between the aggregation outputs and their source data. Among the solutions against such collusion attacks, Distributed Differential Privacy (DDP) shows a significant effect of privacy preservation. Specifically, a DDP scheme guarantees the global differential privacy (the presence or absence of any data curator barely impacts the aggregation outputs) by ensuring local differential privacy at the end of each data curator. To guarantee an overall privacy performance of a distributed data aggregation system against malicious collusion attacks, part of the existing work on such DDP scheme aim to provide an estimated lower bound of privacy budget for the global differential privacy. However, there are two main problems: low data utility from using a large global function sensitivity; unknown privacy guarantee when the aggregation sensitivity of the whole system is less than the sum of the data curator's aggregation sensitivity. To address these problems while ensuring distributed differential privacy, we provide a new lower bound of privacy budget, which works with an unconditional aggregation sensitivity of the whole distributed system. Moreover, we study the performance of our privacy bound in different scenarios of data updates. Both theoretical and experimental evaluations show that our privacy bound offers better global privacy performance than the existing work.
The blockchain technology has emerged as an attractive solution to address performance and security issues in distributed systems. Blockchain's public and distributed peer-to-peer ledger capability benefits cloud computing services which require functions such as, assured data provenance, auditing, management of digital assets, and distributed consensus. Blockchain's underlying consensus mechanism allows to build a tamper-proof environment, where transactions on any digital assets are verified by set of authentic participants or miners. With use of strong cryptographic methods, blocks of transactions are chained together to enable immutability on the records. However, achieving consensus demands computational power from the miners in exchange of handsome reward. Therefore, greedy miners always try to exploit the system by augmenting their mining power. In this paper, we first discuss blockchain's capability in providing assured data provenance in cloud and present vulnerabilities in blockchain cloud. We model the block withholding (BWH) attack in a blockchain cloud considering distinct pool reward mechanisms. BWH attack provides rogue miner ample resources in the blockchain cloud for disrupting honest miners' mining efforts, which was verified through simulations.
Data security has become an issue of increasing importance, especially for Web applications and distributed databases. One solution is using cryptographic algorithms whose improvement has become a constant concern. The increasing complexity of these algorithms involves higher execution times, leading to an application performance decrease. This paper presents a comparison of execution times for three algorithms using asymmetric keys, depending on the size of the encryption/decryption keys: RSA, ElGamal, and ECIES. For this algorithms comparison, a benchmark using Java APIs and an application for testing them on a test database was created.
Service composition is currently done by (hierarchical) orchestration and choreography. However, these approaches do not support explicit control flow and total compositionality, which are crucial for the scalability of service-oriented systems. In this paper, we propose exogenous connectors for service composition. These connectors support both explicit control flow and total compositionality in hierarchical service composition. To validate and evaluate our proposal, we present a case study based on the popular MusicCorp.
This paper proposes a novel privacy-preserving smart metering system for aggregating distributed smart meter data. It addresses two important challenges: (i) individual users wish to publish sensitive smart metering data for specific purposes, and (ii) an untrusted aggregator aims to make queries on the aggregate data. We handle these challenges using two main techniques. First, we propose Fourier Perturbation Algorithm (FPA) and Wavelet Perturbation Algorithm (WPA) which utilize Fourier/Wavelet transformation and distributed differential privacy (DDP) to provide privacy for the released statistic with provable sensitivity and error bounds. Second, we leverage an exponential ElGamal encryption mechanism to enable secure communications between the users and the untrusted aggregator. Standard differential privacy techniques perform poorly for time-series data as it results in a Θ(n) noise to answer n queries, rendering the answers practically useless if n is large. Our proposed distributed differential privacy mechanism relies on Gaussian principles to generate distributed noise, which guarantees differential privacy for each user with O(1) error, and provides computational simplicity and scalability. Compared with Gaussian Perturbation Algorithm (GPA) which adds distributed Gaussian noise to the original data, the experimental results demonstrate the superiority of the proposed FPA and WPA by adding noise to the transformed coefficients.
We propose a privacy-preserving framework for learning visual classifiers by leveraging distributed private image data. This framework is designed to aggregate multiple classifiers updated locally using private data and to ensure that no private information about the data is exposed during and after its learning procedure. We utilize a homomorphic cryptosystem that can aggregate the local classifiers while they are encrypted and thus kept secret. To overcome the high computational cost of homomorphic encryption of high-dimensional classifiers, we (1) impose sparsity constraints on local classifier updates and (2) propose a novel efficient encryption scheme named doublypermuted homomorphic encryption (DPHE) which is tailored to sparse high-dimensional data. DPHE (i) decomposes sparse data into its constituent non-zero values and their corresponding support indices, (ii) applies homomorphic encryption only to the non-zero values, and (iii) employs double permutations on the support indices to make them secret. Our experimental evaluation on several public datasets shows that the proposed approach achieves comparable performance against state-of-the-art visual recognition methods while preserving privacy and significantly outperforms other privacy-preserving methods.
With the development of modern logistics industry railway freight enterprises as the main traditional logistics enterprises, the service mode is facing many problems. In the era of big data, for railway freight enterprises, coordinated development and sharing of information resources have become the requirements of the times, while how to protect the privacy of citizens has become one of the focus issues of the public. To prevent the disclosure or abuse of the citizens' privacy information, the citizens' privacy needs to be preserved in the process of information opening and sharing. However, most of the existing privacy preserving models cannot to be used to resist attacks with continuously growing background knowledge. This paper presents the method of applying differential privacy to protect associated data, which can be shared in railway freight service association information. First, the original service data need to slice by optimal shard length, then differential method and apriori algorithm is used to add Laplace noise in the Candidate sets. Thus the citizen's privacy information can be protected even if the attacker gets strong background knowledge. Last, sharing associated data to railway information resource partners. The steps and usefulness of the discussed privacy preservation method is illustrated by an example.
Currently, the networking of everyday objects, socalled Internet of Things (IoT), such as vehicles and home automation environments is progressing rapidly. Formerly deployed as domain-specific solutions, the development is continuing to link different domains together to form a large heterogeneous IoT ecosystem. This development raises challenges in different fields such as scalability of billions of devices, interoperability across different IoT domains and the need of mobility support. The Information-Centric Networking (ICN) paradigm is a promising candidate to form a unified platform to connect different IoT domains together including infrastructure, wireless, and ad-hoc environments. This paper describes a vision of a harmonized architectural design providing dynamic access of data and services based on an ICN. Within the context of connected vehicles, the paper introduces requirements and challenges of the vision and contributes in open research directions in Information-Centric Networking.
We propose secure RAID, i.e., low-complexity schemes to store information in a distributed manner that is resilient to node failures and resistant to node eavesdropping. We generalize the concept of systematic encoding to secure RAID and show that systematic schemes have significant advantages in the efficiencies of encoding, decoding and random access. For the practical high rate regime, we construct three XOR-based systematic secure RAID schemes with optimal encoding and decoding complexities, from the EVENODD codes and B codes, which are array codes widely used in the RAID architecture. These schemes optimally tolerate two node failures and two eavesdropping nodes. For more general parameters, we construct efficient systematic secure RAID schemes from Reed-Solomon codes. Our results suggest that building “keyless”, information-theoretic security into the RAID architecture is practical.
In this paper, we introduce the use of adaptive controllers into software-defined networking (SDN) and propose the use of adaptive consistency models in the context of distributed SDN controllers. These adaptive controllers can tune their own configurations in real-time in order to enhance the performance of the applications running on top of them. We expect that the use of such controllers could alleviate some of the emerging challenges in SDN that could have an impact on the performance, security, or scalability of the network. Further, we propose extending the SDN controller architecture to support adaptive consistency based on tunable consistency models. Finally, we compare the performance of a proof-of-concept distributed load-balancing application when it runs on-top of: (1) an adaptive and (2) a non-adaptive controller. Our results indicate that adaptive controllers were more resilient to sudden changes in the network conditions than the non-adaptive ones.
Sharing and working on sensitive data in distributed settings from healthcare to finance is a major challenge due to security and privacy concerns. Secure multiparty computation (SMC) is a viable panacea for this, allowing distributed parties to make computations while the parties learn nothing about their data, but the final result. Although SMC is instrumental in such distributed settings, it does not provide any guarantees not to leak any information about individuals to adversaries. Differential privacy (DP) can be utilized to address this; however, achieving SMC with DP is not a trivial task, either. In this paper, we propose a novel Secure Multiparty Distributed Differentially Private (SM-DDP) protocol to achieve secure and private computations in a multiparty environment. Specifically, with our protocol, we simultaneously achieve SMC and DP in distributed settings focusing on linear regression on horizontally distributed data. That is, parties do not see each others’ data and further, can not infer information about individuals from the final constructed statistical model. Any statistical model function that allows independent calculation of local statistics can be computed through our protocol. The protocol implements homomorphic encryption for SMC and functional mechanism for DP to achieve the desired security and privacy guarantees. In this work, we first introduce the theoretical foundation for the SM-DDP protocol and then evaluate its efficacy and performance on two different datasets. Our results show that one can achieve individual-level privacy through the proposed protocol with distributed DP, which is independently applied by each party in a distributed fashion. Moreover, our results also show that the SM-DDP protocol incurs minimal computational overhead, is scalable, and provides security and privacy guarantees.
The Internet of Things (IoT) era envisions billions of interconnected devices capable of providing new interactions between the physical and digital worlds, offering new range of content and services. At the fundamental level, IoT nodes are physical devices that exist in the real world, consisting of networking, sensor, and processing components. Some application examples include mobile and pervasive computing or sensor nets, and require distributed device deployment that feed information into databases for exploitation. While the data can be centralized, there are advantages, such as system resiliency and security to adopting a decentralized architecture that pushes the computation and storage to the network edge and onto IoT devices. However, these devices tend to be much more limited in computation power than traditional racked servers. This research explores using the Cassandra distributed database on IoT-representative device specifications. Experiments conducted on both virtual machines and Raspberry Pi's to simulate IoT devices, examined latency issues with network compression, processing workloads, and various memory and node configurations in laboratory settings. We demonstrate that distributed databases are feasible on Raspberry Pi's as IoT representative devices and show findings that may help in application design.
In this paper, the correctness of the routing algorithm for the distributed key-value store based on order preserving linear hashing and Skip Graph is proved. In this system, data are divided by linear hashing and Skip Graph is used for overlay network. The routing table of this system is very uniform. Then, short detours can exist in the route of forwarding. By using these detours, the number of hops for the query forwarding is reduced.
Data assurance and resilience are crucial security issues in cloud-based IoT applications. With the widespread adoption of drones in IoT scenarios such as warfare, agriculture and delivery, effective solutions to protect data integrity and communications between drones and the control system have been in urgent demand to prevent potential vulnerabilities that may cause heavy losses. To secure drone communication during data collection and transmission, as well as preserve the integrity of collected data, we propose a distributed solution by utilizing blockchain technology along with the traditional cloud server. Instead of registering the drone itself to the blockchain, we anchor the hashed data records collected from drones to the blockchain network and generate a blockchain receipt for each data record stored in the cloud, reducing the burden of moving drones with the limit of battery and process capability while gaining enhanced security guarantee of the data. This paper presents the idea of securing drone data collection and communication in combination with a public blockchain for provisioning data integrity and cloud auditing. The evaluation shows that our system is a reliable and distributed system for drone data assurance and resilience with acceptable overhead and scalability for a large number of drones.
The Internet of Things(IoT) has become a popular technology, and various middleware has been proposed and developed for IoT systems. However, there have been few studies on the data management of IoT systems. In this paper, we consider graph database models for the data management of IoT systems because these models can specify relationships in a straightforward manner among entities such as devices, users, and information that constructs IoT systems. However, applying a graph database to the data management of IoT systems raises issues regarding distribution and security. For the former issue, we propose graph database operations integrated with REST APIs. For the latter, we extend a graph edge property by adding access protocol permissions and checking permissions using the APIs with authentication. We present the requirements for a use case scenario in addition to the features of a distributed graph database for IoT data management to solve the aforementioned issues, and implement a prototype of the graph database.
Now-a-days for most of the organizations across the globe, two important IT initiatives are: Big Data Analytics and Cloud Computing. Big Data Analytics can provide valuables insight that can create competitiveness and generate increased revenues. Cloud Computing can enhance productivity and efficiencies thus reducing cost. Cloud Computing offers groups of servers, storages and various networking resources. It enables environment of Big Data to processes voluminous, high velocity and varied formats of Big Data.
Sharing cyber security data across organizational boundaries brings both privacy risks in the exposure of personal information and data, and organizational risk in disclosing internal information. These risks occur as information leaks in network traffic or logs, and also in queries made across organizations. They are also complicated by the trade-offs in privacy preservation and utility present in anonymization to manage disclosure. In this paper, we define three principles that guide sharing security information across organizations: Least Disclosure, Qualitative Evaluation, and Forward Progress. We then discuss engineering approaches that apply these principles to a distributed security system. Application of these principles can reduce the risk of data exposure and help manage trust requirements for data sharing, helping to meet our goal of balancing privacy, organizational risk, and the ability to better respond to security with shared information.
Now a day's cloud computing is power station to run multiple businesses. It is cumulating more and more users every day. Database-as-a-service is service model provided by cloud computing to store, manage and process data on a cloud platform. Database-as-a-service has key characteristics such as availability, scalability, elasticity. A customer does not have to worry about database installation and management. As a replacement, the cloud database service provider takes responsibility for installing and maintaining the database. The real problem occurs when it comes to storing confidential or private information in the cloud database, we cannot rely on the cloud data vendor. A curious cloud database vendor may capture and leak the secret information. For that purpose, Protected Database-as-a-service is a novel solution to this problem that provides provable and pragmatic privacy in the face of a compromised cloud database service provider. Protected Database-as-a-service defines various encryption schemes to choose encryption algorithm and encryption key to encrypt and decrypt data. It also provides "Master key" to users, so that a metadata storage table can be decrypted only by using the master key of the users. As a result, a cloud service vendor never gets access to decrypted data, and even if all servers are jeopardized, in such inauspicious circumstances a cloud service vendor will not be able to decrypt the data. Proposed Protected Database-as-a-service system allows multiple geographically distributed clients to execute concurrent and independent operation on encrypted data and also conserve data confidentiality and consistency at cloud level, to eradicate any intermediate server between the client and the cloud database.
The data processing capabilities of MapReduce systems pioneered with the on-demand scalability of cloud computing have enabled the Big Data revolution. However, the data controllers/owners worried about the privacy and accountability impact of storing their data in the cloud infrastructures as the existing cloud computing solutions provide very limited control on the underlying systems. The intuitive approach - encrypting data before uploading to the cloud - is not applicable to MapReduce computation as the data analytics tasks are ad-hoc defined in the MapReduce environment using general programming languages (e.g, Java) and homomorphic encryption methods that can scale to big data do not exist. In this paper, we address the challenges of determining and detecting unauthorized access to data stored in MapReduce based cloud environments. To this end, we introduce alarm raising honeypots distributed over the data that are not accessed by the authorized MapReduce jobs, but only by the attackers and/or unauthorized users. Our analysis shows that unauthorized data accesses can be detected with reasonable performance in MapReduce based cloud environments.
The new era of information communication and technology (ICT), everyone wants to store/share their Data or information in online media, like in cloud database, mobile database, grid database, drives etc. When the data is stored in online media the main problem is arises related to data is privacy because different types of hacker, attacker or crackers wants to disclose their private information as publically. Security is a continuous process of protecting the data or information from attacks. For securing that information from those kinds of unauthorized people we proposed and implement of one the technique based on the data modification concept with taking the iris database on weka tool. And this paper provides the high privacy in distributed clustered database environments.
In this paper, we present the design, architecture, and implementation of a novel analysis engine, called Feature Collection and Correlation Engine (FCCE), that finds correlations across a diverse set of data types spanning over large time windows with very small latency and with minimal access to raw data. FCCE scales well to collecting, extracting, and querying features from geographically distributed large data sets. FCCE has been deployed in a large production network with over 450,000 workstations for 3 years, ingesting more than 2 billion events per day and providing low latency query responses for various analytics. We explore two security analytics use cases to demonstrate how we utilize the deployment of FCCE on large diverse data sets in the cyber security domain: 1) detecting fluxing domain names of potential botnet activity and identifying all the devices in the production network querying these names, and 2) detecting advanced persistent threat infection. Both evaluation results and our experience with real-world applications show that FCCE yields superior performance over existing approaches, and excels in the challenging cyber security domain by correlating multiple features and deriving security intelligence.