Biblio
Big data provides a way to handle and analyze large volumes of data or complex data sets, and it also supports the systematic extraction of information. This paper discusses a hybrid security analysis based on intelligent adaptive learning in big data, in light of current trends. It also explores the possibility of combining cloud computing with big data. The advantages and the impact on overall platform evaluation are discussed against traditional trends, which is useful for analysis and for exploring future research. The discussion also covers computational variability and its implications for data reliability, availability, and management in big data, together with data security aspects.
Scientific experiments and observations store massive amounts of data in various scientific file formats. Metadata, which describes the characteristics of the data, is commonly used to sift through massive datasets in order to locate data of interest to scientists. Several indexing data structures (such as hash tables, tries, self-balancing search trees, and sparse arrays) have been developed as part of efforts to provide an efficient method for locating target data. However, how to choose an efficient indexing data structure remains unclear in the context of scientific data management, due to the lack of investigation of metadata, metadata queries, and corresponding data structures. In this study, we perform a systematic study of the metadata search essentials in the context of scientific data management. We study a real-world astronomy observation dataset and explore the characteristics of the metadata in the dataset. We also study possible metadata queries based on the discovered metadata characteristics and evaluate different data structures for various types of metadata attributes. Our evaluation on a real-world dataset suggests that a trie is a suitable data structure when prefix/suffix queries are required; otherwise a hash table should be used. We conclude our study with a summary of our findings. These findings provide a guideline and offer insights for developing metadata indexing methodologies for scientific applications.
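The trade-off reported in this abstract can be illustrated with a minimal sketch. It assumes a handful of made-up object names as metadata values and is not the authors' implementation: a plain dictionary stands in for the hash table (exact-match lookup), and a small trie answers prefix queries. Suffix queries could be served by a second trie built over reversed strings.

```python
from typing import Dict, List


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.record_ids: List[int] = []  # records whose key ends at this node


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, key: str, record_id: int) -> None:
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.record_ids.append(record_id)

    def prefix_search(self, prefix: str) -> List[int]:
        # Walk down to the node that represents the prefix.
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        # Collect every record id stored in the subtree below that node.
        found: List[int] = []
        stack = [node]
        while stack:
            current = stack.pop()
            found.extend(current.record_ids)
            stack.extend(current.children.values())
        return sorted(found)


# Hypothetical metadata attribute values (illustration only).
records = {0: "NGC1300", 1: "NGC1365", 2: "M31", 3: "NGC224"}

hash_index: Dict[str, List[int]] = {}
trie_index = Trie()
for record_id, name in records.items():
    hash_index.setdefault(name, []).append(record_id)
    trie_index.insert(name, record_id)

exact = hash_index.get("M31", [])             # exact match via hash table
by_prefix = trie_index.prefix_search("NGC1")  # prefix match via trie
```

The dictionary answers the exact-match query in expected constant time but cannot answer the prefix query without scanning every key, which is the situation where the trie is preferable.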
For the future Internet, information-centric networking (ICN) is considered a potential solution to many current problems, such as content distribution, mobility, and security. Named Data Networking (NDN) is one of the more popular ICN projects. However, concern regarding the protection of user data persists. Information caching in NDN decouples content from content publishers, which leads to content security threats due to a lack of secure controls. Therefore, this paper presents a CP-ABE (ciphertext-policy attribute-based encryption) access control scheme based on a hash table and data segmentation (CHTDS). For data segmentation, CHTDS linearly splits data into fixed-size blocks, which effectively improves data management. CHTDS also introduces the CP-ABE mechanism and a hash table data structure to ensure secure access control, so that privilege revocation does not require re-encrypting the published content. The analysis results show that CHTDS can effectively realize security and fine-grained access control in the NDN environment, and reduce communication overhead for content access.
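The segmentation step mentioned in this abstract can be sketched as follows. This is only an illustration of linearly splitting data into fixed-size blocks and indexing them in a hash table. The segment naming scheme, the use of SHA-256 digests as table values, and the function names are assumptions made for the example; the CP-ABE encryption and revocation logic of CHTDS are not modelled.

```python
import hashlib
from typing import Dict, List


def split_fixed(data: bytes, block_size: int) -> List[bytes]:
    """Split data linearly into blocks of block_size bytes (last one may be shorter)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def build_segment_table(name: str, data: bytes, block_size: int) -> Dict[str, str]:
    """Map a hypothetical per-segment name to the SHA-256 digest of that segment."""
    table: Dict[str, str] = {}
    for index, block in enumerate(split_fixed(data, block_size)):
        table[f"{name}/{index}"] = hashlib.sha256(block).hexdigest()
    return table


content = b"0123456789"
blocks = split_fixed(content, 4)
table = build_segment_table("/example/video", content, 4)
```

With a block size of 4, the ten-byte input yields three segments, and each segment can be looked up by name in the table in expected constant time.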
In recent years, privacy preservation has become a predominant concern in big data analytics and cloud computing. Every organization collects personal data from its users, actively or passively. Publishing this data for research and other analytics without removing Personally Identifiable Information (PII) leads to privacy breaches. Existing anonymization techniques fail to maintain the balance between data privacy and data utility. In order to provide a trade-off between user privacy and data utility, a Mondrian-based k-anonymity approach is proposed. To protect the privacy of high-dimensional data, a Deep Neural Network (DNN) based framework is proposed. The experimental results show that the proposed approach mitigates the information loss of the data without compromising privacy.
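For readers unfamiliar with Mondrian partitioning, the following is a minimal sketch of the general technique on numeric quasi-identifiers, not the approach proposed in the paper. It assumes strict median splits, chooses the split dimension by raw (unnormalized) range, and uses invented (age, ZIP code) records. The DNN-based framework is not modelled.

```python
from typing import List, Tuple

Record = Tuple[int, ...]
Range = Tuple[int, int]


def mondrian(records: List[Record], k: int) -> List[List[Record]]:
    """Recursively split records so that every partition keeps at least k records."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(records) < 2 * k:
        return [records]  # too small to split into two partitions of size >= k
    dims = range(len(records[0]))

    def spread(d: int) -> int:
        return max(r[d] for r in records) - min(r[d] for r in records)

    # Try the dimension with the widest range first (assumption: no normalization).
    for dim in sorted(dims, key=spread, reverse=True):
        ordered = sorted(records, key=lambda r: r[dim])
        median = ordered[len(ordered) // 2][dim]
        left = [r for r in ordered if r[dim] < median]
        right = [r for r in ordered if r[dim] >= median]
        if len(left) >= k and len(right) >= k:
            return mondrian(left, k) + mondrian(right, k)
    return [records]  # no allowable cut exists


def generalize(partition: List[Record]) -> Tuple[Range, ...]:
    """Replace each attribute of a partition by its (min, max) range."""
    dims = range(len(partition[0]))
    return tuple((min(r[d] for r in partition), max(r[d] for r in partition)) for d in dims)


# Hypothetical (age, zip) quasi-identifiers.
data = [(25, 47677), (27, 47602), (29, 47678), (43, 47905), (47, 47909), (52, 47906)]
partitions = mondrian(data, k=2)
released = [generalize(p) for p in partitions]
```

Each released tuple of ranges is shared by at least k records, which is the k-anonymity guarantee; wider ranges mean more information loss, which is the utility side of the trade-off the abstract describes.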
Supply chain management (SCM) is fundamental for gaining financial, environmental, and social benefits in the supply chain industry. However, traditional SCM mechanisms usually suffer from a wide range of issues such as lack of information sharing, long delays for data retrieval, and unreliability in product tracing. Recent advances in blockchain technology show great potential to tackle these issues due to its salient features, including immutability, transparency, and decentralization. Although there are some proof-of-concept studies and surveys on blockchain-based SCM from the perspective of logistics, the underlying technical challenges are not clearly identified. In this paper, we provide a comprehensive analysis of potential opportunities, new requirements, and principles of designing blockchain-based SCM systems. We summarize and discuss four crucial technical challenges, namely scalability, throughput, access control, and data retrieval, and review the promising solutions. Finally, a case study of designing a blockchain-based food traceability system is reported to provide more insight into how to tackle these technical challenges in practice.
The paper presents a conceptual framework for security-embedded task offloading requirements for IoT-Fog based future communication networks. The focus of the paper is to enumerate the embedded security requirements in this IoT-Fog paradigm, including the middleware technologies in the overall architecture. Task offloading plays a significant role in load balancing, energy and data management, security, and reducing information processing and propagation latencies. The motivation behind introducing embedded security is to meet the challenges of future smart networks, for two main reasons: to improve data protection and to minimize Internet disturbance and intrusiveness. We further discuss middleware technologies such as cloudlets, mobile edge computing, micro datacenters, self-healing infrastructures, and delay-tolerant networks for security provision, optimized energy consumption, and reduced latency. The paper introduces concepts of system virtualization and parallelism in IoT-Fog based systems and highlights the security features of the system. Some research opportunities and challenges are discussed to improve secure offloading from IoT into fog.
The Internet of Things is stepping out of its infancy into full maturity, requiring massive data processing and storage. Unfortunately, because of the resource constraints, short-range communication, and self-organization that characterize IoT, it always resorts to cloud or fog nodes for outsourced computation and storage, which has brought about a series of novel and challenging security and privacy threats. For this reason, one of the critical challenges of having numerous IoT devices is the capacity to manage them and their data. A specific concern is from which devices or Edge clouds to accept join requests or interaction requests. This paper discusses a design concept for developing an IoT data management platform, along with a data management and lineage traceability implementation of the platform based on blockchain and smart contracts. The design addresses two major challenges: how to implement effective data management and enrich rational interoperability for trusted groups of linked Things, and how to settle conflicts between untrusted IoT devices and their requests while taking security and privacy preservation into account. Experimental results show that the system scales well, with the loss of computing and communication performance remaining within an acceptable range, and that it effectively defends against unauthorized access and supports data provenance and transparency. This verifies the feasibility and efficiency of the design concept for providing privacy-preserving, fine-grained, and integrity-assured data management over IoT devices by introducing the blockchain-based data management platform.
The Agave Platform first appeared in 2011 as a pilot project for the iPlant Collaborative [11]. In its first two years, Foundation saw over 40% growth per month, supporting 1000+ clients, 600+ applications, and 4 HPC systems at 3 centers across the US. It also gained users outside of plant biology. To better serve the needs of the general open science community, we rewrote Foundation as a scalable, cloud-native application and named it the Agave Platform. In this paper we present the Agave Platform, a Science-as-a-Service (ScaaS) platform for reproducible science. We provide a brief history and technical overview of the project, and highlight three case studies leveraging the platform to create synergistic value for their users.
In blockchain-based systems, malicious behaviour can be detected using auditable information in transactions managed by distributed ledgers. Besides cryptocurrency, blockchain technology has recently been used for other applications, such as file storage. However, most existing blockchain-based file storage systems cannot revoke a user efficiently when multiple users have access to the same encrypted file. Instead, they need to update file encryption keys and distribute new keys to the remaining users, which significantly increases computation and bandwidth overheads. In this work, we propose a blockchain and proxy re-encryption based design for encrypted file sharing that provides distributed access control and data management. By combining blockchain with proxy re-encryption, our approach not only ensures confidentiality and integrity of files, but also provides a scalable key management mechanism for file sharing among multiple users. Moreover, by storing encrypted files and related keys in a distributed way, our method can resist collusion attacks between revoked users and distributed proxies.
Consent is a key measure for privacy protection and needs to be 'meaningful' to give people informational power. It is increasingly important that individuals are provided with real choices and are empowered to negotiate for meaningful consent. Meaningful consent is an important area for consideration in IoT systems, since privacy is a significant factor affecting the adoption of IoT. Obtaining meaningful consent is becoming increasingly challenging in IoT environments. It is proposed that an "apparency, pragmatic/semantic transparency model" adopted for data management could make consent more meaningful, that is, visible, controllable, and understandable. The model has illustrated the why and what issues regarding data management for potential meaningful consent [1]. In this paper, we focus on the 'how' issue, i.e. how to implement the model in IoT systems. We discuss apparency by focusing on the interactions and data actions in the IoT system; pragmatic transparency by centring on the privacy risks and threats of data actions; and semantic transparency by focusing on the terms and language used by individuals and experts. We believe that our discussion will elicit more research on the 'apparency model' in IoT for meaningful consent.
Existing data management and search systems for the Internet of Things use centralized databases. As a result, these server-based systems have security vulnerabilities such as IP spoofing, single points of failure, and Sybil attacks. This paper proposes a blockchain-based data management system that ensures security by using ECDSA digital signatures and the SHA-256 hash function. The location of the data owner, expressed as an IP address, and the data name are recorded in a block that is included in the blockchain. Furthermore, we devise a data management and search method that analyzes block hash values. By using security properties of blockchain such as authentication, non-repudiation, and data integrity, this system offers stronger security than previous data management and search systems that use centralized databases or P2P networks.
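The block layout described in this abstract can be sketched roughly as follows, under stated assumptions. Each block stores a data name and the data owner's IP address, and blocks are chained by SHA-256 hashes. The field names, the JSON serialization, and the linear search are choices made for this illustration. ECDSA signing is omitted because it needs a third-party library, and the paper's hash-based search method is not reproduced.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Optional

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first block


@dataclass(frozen=True)
class Block:
    index: int
    prev_hash: str
    data_name: str
    location: str  # IP address of the data owner

    def hash(self) -> str:
        # Serialize the fields deterministically, then hash with SHA-256.
        payload = json.dumps(
            {
                "index": self.index,
                "prev_hash": self.prev_hash,
                "data_name": self.data_name,
                "location": self.location,
            },
            sort_keys=True,
        ).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def append_block(chain: List[Block], data_name: str, location: str) -> Block:
    prev_hash = chain[-1].hash() if chain else GENESIS_HASH
    block = Block(len(chain), prev_hash, data_name, location)
    chain.append(block)
    return block


def is_valid(chain: List[Block]) -> bool:
    """Check that every block points at the hash of its predecessor."""
    expected = GENESIS_HASH
    for position, block in enumerate(chain):
        if block.index != position or block.prev_hash != expected:
            return False
        expected = block.hash()
    return True


def find_location(chain: List[Block], data_name: str) -> Optional[str]:
    """Return the most recently recorded location for a data name, if any."""
    for block in reversed(chain):
        if block.data_name == data_name:
            return block.location
    return None


chain: List[Block] = []
append_block(chain, "temperature", "192.0.2.10")  # documentation-range addresses
append_block(chain, "humidity", "192.0.2.11")
location = find_location(chain, "humidity")
```

Because each block's hash covers the previous block's hash, altering a recorded location invalidates every later link, which is the data-integrity property the abstract relies on.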
Analytics in big data is maturing and moving towards mass adoption. The emergence of analytics increases the need for innovative tools and methodologies to protect data against privacy violations. Many data anonymization methods have been proposed to provide some degree of privacy protection by applying data suppression and other distortion techniques. However, currently available methods suffer from poor scalability and performance and from a lack of framework standardization. Current anonymization methods are unable to cope with the massive size of the data to be processed. Some of these methods were proposed specifically for the MapReduce framework to operate on big data. However, they still follow conventional data management approaches, so they achieved no remarkable gains in performance. We introduce a framework that can operate in a MapReduce environment to benefit from its advantages, as well as from those of the Hadoop ecosystem. Our framework provides granular user access that can be tuned to different authorization levels. The proposed solution provides fine-grained alteration of the data, based on the user's authorization level, for access to the MapReduce domain for analytics. Using well-developed role-based access control approaches, this framework is capable of assigning roles to users and mapping them to relevant data attributes.
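The role-to-attribute mapping described at the end of this abstract can be illustrated with a small sketch. It is a generic role-based access control example, not the proposed framework. The role names, users, attributes, and the choice of suppressing hidden values with "*" are assumptions, and no MapReduce or Hadoop component is involved.

```python
from typing import Dict, List, Set

# Hypothetical role and user assignments (illustration only).
ROLE_ATTRIBUTES: Dict[str, Set[str]] = {
    "analyst": {"age", "zip"},
    "auditor": {"age", "zip", "diagnosis"},
}
USER_ROLES: Dict[str, Set[str]] = {
    "alice": {"analyst"},
    "bob": {"auditor"},
}


def allowed_attributes(user: str) -> Set[str]:
    """Union of the attributes granted by every role assigned to the user."""
    allowed: Set[str] = set()
    for role in USER_ROLES.get(user, set()):
        allowed |= ROLE_ATTRIBUTES.get(role, set())
    return allowed


def view_for(user: str, records: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Return a copy of the records with non-authorized attributes suppressed."""
    allowed = allowed_attributes(user)
    return [
        {key: (value if key in allowed else "*") for key, value in record.items()}
        for record in records
    ]


records = [{"age": 34, "zip": "47906", "diagnosis": "flu"}]
analyst_view = view_for("alice", records)
auditor_view = view_for("bob", records)
```

A user with no assigned role receives a fully suppressed view, so the default is to reveal nothing.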
Cloud computing enables the outsourcing of big data analytics, where a third-party server is responsible for data management and processing. In this paper, we consider the outsourcing model in which a third-party server provides record matching as a service. In particular, given a target record, the service provider returns all records from the outsourced dataset that match the target according to specific distance metrics. Identifying matching records in databases plays an important role in information integration and entity resolution. A major security concern of this outsourcing paradigm is whether the service provider returns the correct record matching results. To solve the problem, we design EARRING, an Efficient Authentication of outsouRced Record matchING framework. EARRING requires the service provider to construct the verification object (VO) of the record matching results. From the VO, the client is able to catch any incorrect result at low computational cost. Experimental results on real-world datasets demonstrate the efficiency of EARRING.
Provenance describes detailed information about the history of a piece of data, containing the relationships among elements such as users, processes, jobs, and workflows that contribute to the existence of data. Provenance is key to supporting many data management functionalities that are increasingly important in operations such as identifying data sources, parameters, or assumptions behind a given result; auditing data usage; or understanding details about how inputs are transformed into outputs. Despite its importance, however, provenance support is largely underdeveloped in highly parallel architectures and systems. One major challenge is the demanding requirements of providing provenance service in situ. The need to remain lightweight and to be always on often conflicts with the need to be transparent and offer an accurate catalog of details regarding the applications and systems. To tackle this challenge, we introduce a lightweight provenance service, called LPS, for high-performance computing (HPC) systems. LPS leverages a kernel instrument mechanism to achieve transparency and introduces representative execution and flexible granularity to capture comprehensive provenance with controllable overhead. Extensive evaluations and use cases have confirmed its efficiency and usability. We believe that LPS can be integrated into current and future HPC systems to support a variety of data management needs.
The Internet of Things (IoT) has become a popular technology, and various middleware has been proposed and developed for IoT systems. However, there have been few studies on the data management of IoT systems. In this paper, we consider graph database models for the data management of IoT systems because these models can specify, in a straightforward manner, relationships among the entities that make up IoT systems, such as devices, users, and information. However, applying a graph database to the data management of IoT systems raises issues regarding distribution and security. For the former issue, we propose graph database operations integrated with REST APIs. For the latter, we extend a graph edge property by adding access protocol permissions and checking permissions using the APIs with authentication. We present the requirements for a use case scenario in addition to the features of a distributed graph database for IoT data management to solve the aforementioned issues, and implement a prototype of the graph database.
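The idea of extending an edge property with permissions can be sketched in memory as follows. This is a hypothetical illustration, not the prototype from the paper. The class names, the `permissions` layout (principal mapped to allowed operations), and the example entities are assumptions, and the REST API and authentication layers are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Edge:
    source: str
    target: str
    label: str
    # Assumed edge property: principal -> operations allowed across this edge.
    permissions: Dict[str, Set[str]] = field(default_factory=dict)


class Graph:
    def __init__(self) -> None:
        self.edges: List[Edge] = []

    def add_edge(
        self,
        source: str,
        target: str,
        label: str,
        permissions: Optional[Dict[str, Set[str]]] = None,
    ) -> None:
        self.edges.append(Edge(source, target, label, permissions or {}))

    def neighbors(self, node: str, principal: str, operation: str = "read") -> List[str]:
        """Follow only the edges on which the principal holds the operation."""
        return [
            edge.target
            for edge in self.edges
            if edge.source == node
            and operation in edge.permissions.get(principal, set())
        ]


graph = Graph()
graph.add_edge("alice", "thermostat-1", "owns", {"alice": {"read", "write"}})
graph.add_edge("thermostat-1", "reading-42", "produced", {"alice": {"read"}, "bob": {"read"}})

alice_devices = graph.neighbors("alice", "alice")
bob_devices = graph.neighbors("alice", "bob")
bob_readings = graph.neighbors("thermostat-1", "bob")
```

Keeping the permission on the edge rather than on the node lets the same device be reachable for one principal and hidden from another, depending on the relationship being traversed.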
The evolution of the Internet of Things (IoT) requires a well-defined infrastructure of systems that provides services for device abstraction and data management, and also supports the development of applications. Middleware for IoT has been recognized as the system that can provide these services and has become increasingly important for IoT in recent years. The large amount of data that flows into a middleware system demands a security architecture that ensures the protection of all layers of the system, including the communication channels and border APIs used to integrate the applications and IoT devices. However, this security architecture should be based on lightweight approaches since middleware systems are widely applied in constrained environments. Some works have already defined new solutions and adaptations to existing approaches in order to mitigate IoT middleware security problems. In this sense, this article discusses the role of lightweight approaches to the standardization of a security architecture for IoT middleware systems. This article also analyzes concepts and existing works, and presents some important IoT middleware challenges that may be addressed by emerging lightweight security approaches in order to achieve the consolidation of a standard security architecture and the mitigation of the security problems found in IoT middleware systems.