Biblio
Cloud Storage Brokers (CSB) provide seamless and concurrent access to multiple Cloud Storage Services (CSS) while abstracting cloud complexities from end-users. However, this multi-cloud strategy faces several security challenges including enlarged attack surfaces, malicious insider threats, security complexities due to integration of disparate components and API interoperability issues. Novel security approaches are imperative to tackle these security issues. Therefore, this paper proposes CS-BAuditor, a novel cloud security system that continuously audits CSB resources, to detect malicious activities and unauthorized changes e.g. bucket policy misconfigurations, and remediates these anomalies. The cloud state is maintained via a continuous snapshotting mechanism thereby ensuring fault tolerance. We adopt the principles of chaos engineering by integrating BrokerMonkey, a component that continuously injects failure into our reference CSB system, CloudRAID. Hence, CSBAuditor is continuously tested for efficiency i.e. its ability to detect the changes injected by BrokerMonkey. CSBAuditor employs security metrics for risk analysis by computing severity scores for detected vulnerabilities using the Common Configuration Scoring System, thereby overcoming the limitation of insufficient security metrics in existing cloud auditing schemes. CSBAuditor has been tested using various strategies including chaos engineering failure injection strategies. Our experimental evaluation validates the efficiency of our approach against the aforementioned security issues with a detection and recovery rate of over 96 %.
Cyber physical system (CPS) is often deployed at safety-critical key infrastructures and fields, fault tolerance policies are extensively applied in CPS systems to improve its credibility; the same physical backup of hardware redundancy (SPB) technology is frequently used for its simple and reliable implementation. To resolve challenges faced with in simulation test of SPB-CPS, this paper dynamically determines the test resources matched with the CPS scale by using the adaptive allocation policies, establishes the hierarchical models and inter-layer message transmission mechanism. Meanwhile, the collaborative simulation time sequence push strategy and the node activity test mechanism based on the sliding window are designed in this paper to improve execution efficiency of the simulation test. In order to validate effectiveness of the method proposed in this paper, we successfully built up a fault-tolerant CPS simulation platform. Experiments showed that it can improve the SPB-CPS simulation test efficiency.
Delivery service via ridesharing is a promising service to share travel costs and improve vehicle occupancy. Existing ridesharing systems require participating vehicles to periodically report individual private information (e.g., identity and location) to a central controller, which is a potential central point of failure, resulting in possible data leakage or tampering in case of controller break down or under attack. In this paper, we propose a Blockchain secured ridesharing delivery system, where the immutability and distributed architecture of the Blockchain can effectively prevent data tampering. However, such tamper-resistance property comes at the cost of a long confirmation delay caused by the consensus process. A Hash-oriented Practical Byzantine Fault Tolerance (PBFT) based consensus algorithm is proposed to improve the Blockchain efficiency and reduce the transaction confirmation delay from 10 minutes to 15 seconds. The Hash-oriented PBFT effectively avoids the double-spending attack and Sybil attack. Security analysis and simulation results demonstrate that the proposed Blockchain secured ridesharing delivery system offers strong security guarantees and satisfies the quality of delivery service in terms of confirmation delay and transaction throughput.
Model compression is considered to be an effective way to reduce the implementation cost of deep neural networks (DNNs) while maintaining the inference accuracy. Many recent studies have developed efficient model compression algorithms and implementations in accelerators on various devices. Protecting integrity of DNN inference against fault attacks is important for diverse deep learning enabled applications. However, there has been little research investigating the fault resilience of DNNs and the impact of model compression on fault tolerance. In this work, we consider faults on different data types and develop a simulation framework for understanding the fault resiliency of compressed DNN models as compared to uncompressed models. We perform our experiments on two common DNNs, LeNet-5 and VGG16, and evaluate their fault resiliency with different types of compression. The results show that binary quantization can effectively increase the fault resilience of DNN models by 10000x for both LeNet5 and VGG16. Finally, we propose software and hardware mitigation techniques to increase the fault resiliency of DNN models.
Data analytics and telemetry have become paramount to monitoring and maintaining quality-of-service in addition to business analytics. Stream processing-a model where a network of operators receives and processes continuously arriving discrete elements-is well-suited for these needs. Current and previous studies and frameworks have focused on continuity of operations and aggregate performance metrics. However, real-time performance and tail latency are also important. Timing errors caused by either performance or failed communication faults also affect real-time performance more drastically than aggregate metrics. In this paper, we introduce redundancy in the stream data to improve the real-time performance and resiliency to timing errors caused by either performance or failed communication faults. We also address limitations in previous solutions using a fine-grained acknowledgment tracking scheme to both increase the effectiveness for resiliency to performance faults and enable effectiveness for failed communication faults. Our results show that fine-grained acknowledgment schemes can improve the tail and mean latencies by approximately 30%. We also show that these schemes can improve resiliency to performance faults compared to existing work. Our improvements result in 47.4% to 92.9% fewer missed deadlines compared to 17.3% to 50.6% for comparable topologies and redundancy levels in the state of the art. Finally, we show that redundancies of 25% to 100% can reduce the number of data elements that miss their deadline constraints by 0.76% to 14.04% for applications with high fan-out and by 7.45% up to 50% for applications with no fan-out.
The evolution of convolutional neural networks (CNNs) into more complex forms of organization, with additional layers, larger convolutions and increasing connections, established the state-of-the-art in terms of accuracy errors for detection and classification challenges in images. Moreover, as they evolved to a point where Gigabytes of memory are required for their operation, we have reached a stage where it becomes fundamental to understand how their inference capabilities can be impaired if data elements somehow become corrupted in memory. This paper introduces fault-injection in these systems by simulating failing bit-cells in hardware memories brought on by relaxing the 100% reliable operation assumption. We analyze the behavior of these networks calculating inference under severe fault-injection rates and apply fault mitigation strategies to improve on the CNNs resilience. For the MNIST dataset, we show that 8x less memory is required for the feature maps memory space, and that in sub-100% reliable operation, fault-injection rates up to 10-1 (with most significant bit protection) can withstand only a 1% error probability degradation. Furthermore, considering the offload of the feature maps memory to an embedded dynamic RAM (eDRAM) system, using technology nodes from 65 down to 28 nm, up to 73 80% improved power efficiency can be obtained.
Root cause analysis (RCA) is a common and recurring task performed by operators of cellular networks. It is done mainly to keep customers satisfied with the quality of offered services and to maximize return on investment (ROI) by minimizing and where possible eliminating the root causes of faults in cellular networks. Currently, the actual detection and diagnosis of faults or potential faults is still a manual and slow process often carried out by network experts who manually analyze and correlate various pieces of network data such as, alarms, call traces, configuration management (CM) and key performance indicator (KPI) data in order to come up with the most probable root cause of a given network fault. In this paper, we propose an automated fault detection and diagnosis solution called adaptive root cause analysis (ARCA). The solution uses measurements and other network data together with Bayesian network theory to perform automated evidence based RCA. Compared to the current common practice, our solution is faster due to automation of the entire RCA process. The solution is also cheaper because it needs fewer or no personnel in order to operate and it improves efficiency through domain knowledge reuse during adaptive learning. As it uses a probabilistic Bayesian classifier, it can work with incomplete data and it can handle large datasets with complex probability combinations. Experimental results from stratified synthesized data affirmatively validate the feasibility of using such a solution as a key part of self-healing (SH) especially in emerging self-organizing network (SON) based solutions in LTE Advanced (LTE-A) and 5G.
We will focused the concept of serializability in order to ensure the correct processing of transactions. However, both serializability and relevant properties within transaction-based applications might be affected. Ensure transaction serialization in corrupt systems is one of the demands that can handle properly interrelated transactions, which prevents blocking situations that involve the inability to commit either transaction or related sub-transactions. In addition some transactions has been marked as malicious and they compromise the serialization of running system. In such context, this paper proposes an approach for the processing of transactions in a cloud of databases environment able to secure serializability in running transactions whether the system is compromised or not. We propose also an intrusion tolerant scheme to ensure the continuity of the running transactions. A case study and a simulation result are shown to illustrate the capabilities of the suggested system.
Cloud computation has become prominent with seemingly unlimited amount of storage and computation available to users. Yet, security is a major issue that hampers the growth of cloud. In this research we investigate a collaborative Intrusion Detection System (IDS) based on the ensemble learning method. It uses weak classifiers, and allows the use of untapped resources of cloud to detect various types of attacks on the cloud system. In the proposed system, tasks are distributed among available virtual machines (VM), individual results are then merged for the final adaptation of the learning model. Performance evaluation is carried out using decision trees and using fuzzy classifiers, on KDD99, one of the largest datasets for IDS. Segmentation of the dataset is done in order to mimic the behavior of real-time data traffic occurred in a real cloud environment. The experimental results show that the proposed approach reduces the execution time with improved accuracy, and is fault-tolerant when handling VM failures. The system is a proof-of-concept model for a scalable, cloud-based distributed system that is able to explore untapped resources, and may be used as a base model for a real-time hierarchical IDS.
Modern storage systems stripe redundant data across multiple nodes to provide availability guarantees against node failures. One form of data redundancy is based on XOR-based erasure codes, which use only XOR operations for encoding and decoding. In addition to tolerating failures, a storage system must also provide fast failure recovery to reduce the window of vulnerability. This work addresses the problem of speeding up the recovery of a single-node failure for general XOR-based erasure codes. We propose a replace recovery algorithm, which uses a hill-climbing technique to search for a fast recovery solution, such that the solution search can be completed within a short time period. We further extend the algorithm to adapt to the scenario where nodes have heterogeneous capabilities (e.g., processing power and transmission bandwidth). We implement our replace recovery algorithm atop a parallelized architecture to demonstrate its feasibility. We conduct experiments on a networked storage system testbed, and show that our replace recovery algorithm uses less recovery time than the conventional recovery approach.