Biblio
In the smart grid, large quantities of data are collected from various applications, such as smart metering, substation state monitoring, electric energy data acquisition, and smart homes. Big data acquired in smart grid applications is usually sensitive. For instance, in order to dispatch power accurately and support dynamic pricing, many smart meters are installed at users' houses to collect real-time data, but all of these collected data are related to user privacy. In this paper, we propose a data aggregation scheme for the smart grid based on secret sharing with fault tolerance, which ensures that the control center obtains the integrated data without learning any individual user's private information. Meanwhile, we also consider fault tolerance during data aggregation. Finally, we analyze the security of our scheme and carry out experiments to validate the results.
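As a minimal illustration of the additive secret-sharing idea that underlies such aggregation schemes, the sketch below splits each meter reading into random shares so that the control center can recover only the sum. All names and parameters are hypothetical, and the fault-tolerance machinery of the actual scheme is omitted.

```python
# Minimal sketch of additive secret sharing for privacy-preserving
# aggregation. PRIME, NUM_AGGREGATORS, and all function names are
# hypothetical; the real scheme's fault-tolerance handling is omitted.
import random

PRIME = 2**61 - 1          # public modulus, assumed large enough
NUM_AGGREGATORS = 3        # assumed number of share holders

def split_reading(reading: int) -> list[int]:
    """Split one meter reading into additive shares mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(NUM_AGGREGATORS - 1)]
    shares.append((reading - sum(shares)) % PRIME)
    return shares

def aggregate(all_shares: list[list[int]]) -> int:
    """Each aggregator sums its column of shares; the control center
    sums those partial sums and learns only the total consumption,
    never any individual reading."""
    partials = [sum(column) % PRIME for column in zip(*all_shares)]
    return sum(partials) % PRIME

readings = [23, 17, 40]                        # three meters
shares = [split_reading(r) for r in readings]
assert aggregate(shares) == sum(readings)      # total recovered exactly
```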
Reliable and scalable storage systems are key to cloud-based applications. In cloud storage, users store their data on remote servers rather than on their local computers. Secure storage is used to ensure the safety of data in clouds. As more and more users rely on third-party cloud vendors to store their data, concerns have arisen among users and cloud providers. Encryption-based approaches are commonly used in secure storage systems: data are encrypted and stored on persistent storage such as disks and flash memories, and decrypted when users need to access them. This way of managing data hurts the scalability and throughput of cloud systems. In the meantime, cloud systems have to apply fault-tolerance strategies to the data, which also degrades performance. The combination of these issues leads to a high price for data security in cloud systems. Aware of such issues, we propose methods to reduce the overhead of secure storage while guaranteeing the safety of the data.
We focus on the concept of serializability in order to ensure the correct processing of transactions. However, both serializability and related properties of transaction-based applications can be affected by attacks. Ensuring transaction serializability in compromised systems is one of the requirements for properly handling interrelated transactions, since it prevents blocking situations in which neither a transaction nor its related sub-transactions can commit. In addition, some transactions may be marked as malicious, and these compromise the serializability of the running system. In this context, this paper proposes an approach for processing transactions in a cloud database environment that is able to preserve the serializability of running transactions whether or not the system is compromised. We also propose an intrusion-tolerant scheme to ensure the continuity of the running transactions. A case study and simulation results are presented to illustrate the capabilities of the suggested system.
The performance, dependability, and security of cloud service systems are vital for their ongoing operation, control, and support. Thus, controlled improvement of a service requires a comprehensive analysis and systematic identification of the fundamental underlying constituents of the cloud using a rigorous discipline. In this paper, we introduce a framework that helps identify areas for potential cloud service enhancements. A cloud service cannot be completed if there is a failure in any of its underlying resources; in addition, resources are sometimes kept offline for scheduled maintenance. We use redundant resources to mitigate the impact of failures and maintenance in order to ensure performance and dependability, which helps enhance security as well. For example, at least four replicas are required to defend against the intrusion of a single instance or a single malicious attack/fault, as defined by Byzantine Fault Tolerance (BFT). Data centers with high performance, dependability, and security are outsourced to the cloud computing environment with greater flexibility in the cost of owning the computing infrastructure. In this paper, we analyze the effectiveness of redundant resource usage in terms of a dependability metric and the cost of service deployment based on the priority of service requests. The trade-off among dependability, cost, and security under different redundancy schemes is characterized through comprehensive analytical models.
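The four-replica figure follows from the standard Byzantine fault tolerance bound, sketched below: tolerating f simultaneous Byzantine faults requires at least 3f + 1 replicas, with quorums of 2f + 1.

```python
# Standard Byzantine fault tolerance resource bound: tolerating f
# simultaneous Byzantine faults requires at least 3f + 1 replicas,
# with quorums of 2f + 1 so any two quorums share a correct replica.
def bft_replicas(f: int) -> int:
    """Minimum replica count to tolerate f Byzantine faults."""
    return 3 * f + 1

def bft_quorum(f: int) -> int:
    """Minimum quorum size for intersection in a correct replica."""
    return 2 * f + 1

assert bft_replicas(1) == 4   # the 'at least 4 replicas' figure above
```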
Barrier coverage has been widely adopted to prevent unauthorized invasion of important areas in sensor networks. As sensors are typically placed outdoors, they are susceptible to becoming faulty. Previous works assumed that faulty sensors are easy to recognize, e.g., that they stop functioning or output apparently deviant sensory data. In practice, however, it is extremely difficult to recognize faulty sensors and their invalid output. In this paper, we propose a novel fault-tolerant intrusion detection algorithm (TrusDet) based on trust management to address this challenging issue. TrusDet comprises three steps: i) sensor-level detection, ii) sink-level decision by collective voting, and iii) trust management and fault determination. In Steps i) and ii), TrusDet divides the surveillance area into a set of fine-grained subareas and exploits the temporal and spatial correlation of sensory output among sensors in different subareas to yield more accurate and robust barrier coverage. In Step iii), TrusDet builds a trust-management-based framework to determine the confidence level of sensors being faulty. We implement TrusDet on HC-SR501 infrared sensors and demonstrate that it achieves the desired performance.
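A hypothetical simplification of the sink-level voting and trust-update steps might look like the following; the majority threshold and learning rate are illustrative, not TrusDet's actual parameters.

```python
# Hypothetical simplification of TrusDet's steps ii) and iii):
# trust-weighted voting at the sink, then a trust update that rewards
# agreement with the collective decision.
def weighted_vote(detections: dict[str, bool],
                  trust: dict[str, float]) -> bool:
    """Declare an intrusion when the trust-weighted 'yes' votes of the
    sensors covering a subarea exceed half the total trust."""
    yes = sum(trust[s] for s, fired in detections.items() if fired)
    return yes > sum(trust[s] for s in detections) / 2

def update_trust(trust: dict[str, float],
                 detections: dict[str, bool],
                 decision: bool, lr: float = 0.1) -> dict[str, float]:
    """Sensors that disagree with the collective decision lose trust;
    persistently low trust marks a sensor as likely faulty."""
    for s, fired in detections.items():
        delta = lr if fired == decision else -lr
        trust[s] = min(1.0, max(0.0, trust[s] + delta))
    return trust
```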
Consensus is a fundamental approach to implementing fault-tolerant services through replication. It is well known that there exists a tradeoff between cost and resilience. For instance, Crash Fault Tolerant (CFT) protocols have a low cost but can only handle crash failures, while Byzantine Fault Tolerant (BFT) protocols handle arbitrary failures but have a higher cost. Hybrid protocols enjoy the benefits of both high performance in the absence of failures and high resiliency under failures by switching among different subprotocols. However, it is challenging to determine which subprotocol should be used. We propose a moving target approach that switches among protocols according to the current system and network vulnerability. At the core of our approach is a formalized cost model that evaluates the vulnerability and performance of consensus protocols based on real-time Intrusion Detection System (IDS) signals. Based on the evaluation results, we demonstrate that a safe, cheap, and unpredictable protocol is always used and that a high IDS error rate can be tolerated.
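A bare-bones sketch of the switching idea, under an assumed two-subprotocol cost model (the paper's formalized model is considerably richer):

```python
# Bare-bones sketch of a moving-target protocol switch driven by IDS
# alerts; protocol costs and the threshold are assumed, not the
# paper's formulation.
PROTOCOLS = {
    "CFT": {"cost": 1.0, "tolerates_byzantine": False},
    "BFT": {"cost": 3.0, "tolerates_byzantine": True},
}

def choose_protocol(ids_alert_level: float, threshold: float = 0.5) -> str:
    """Run the cheap CFT subprotocol while the perceived Byzantine
    threat is low; switch to BFT once IDS alerts cross a threshold."""
    return "BFT" if ids_alert_level >= threshold else "CFT"

assert choose_protocol(0.1) == "CFT" and choose_protocol(0.9) == "BFT"
```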
Computer security has become an increasingly important topic in the computer and communication industry, since it is essential for supporting critical business processes and protecting personal and sensitive information. Computer security aims to preserve the security attributes (confidentiality, integrity, and availability) of computer systems, which face threats such as denial-of-service (DoS) attacks, viruses, and intrusions. To ensure high computer security, intrusion tolerance techniques based on fault-tolerance schemes have been widely applied. This paper presents a quantitative performance evaluation of a virtual machine (VM) based intrusion-tolerant system. Concretely, two security measures are derived: the MTTSF (mean time to security failure) and the effective traffic intensity. The mathematical analysis is carried out using Laplace-Stieltjes transforms in the analysis of an M/G/1 queueing system.
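For background, the standard M/G/1 quantities such an analysis builds on are shown below; the paper's own MTTSF derivation is not reproduced here.

```latex
% Standard M/G/1 background quantities (illustrative; the paper's own
% MTTSF derivation is not reproduced). With arrival rate $\lambda$ and
% service time $S$, the effective traffic intensity is
\[
  \rho = \lambda\,\mathbb{E}[S], \qquad \rho < 1 \text{ for stability},
\]
% and the Pollaczek--Khinchine formula gives the mean waiting time
\[
  \mathbb{E}[W] = \frac{\lambda\,\mathbb{E}[S^{2}]}{2\,(1-\rho)}.
\]
% The Laplace--Stieltjes transform used in such derivations is
% $X^{*}(s) = \mathbb{E}\!\left[e^{-sX}\right]$ for a nonnegative
% random variable $X$.
```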
Security patterns are generic solutions that can be applied from the early stages of the software life cycle to overcome recurrent security weaknesses. Their generic nature and growing number make choosing among them difficult, even for experts in system design. To help with pattern choice, this paper proposes a semi-automatic classification methodology, and the classification itself, which exposes relationships among software weaknesses, security principles, and security patterns. It expresses which patterns remove a given weakness with respect to the security principles that have to be addressed to fix the weakness. The methodology is based on seven steps, which anatomize patterns and weaknesses into sets of more precise sub-properties that are associated through a hierarchical organization of security principles. These steps provide detailed justifications of the resulting classification and allow it to be upgraded. Without loss of generality, this classification has been established for Web applications and covers 185 software weaknesses, 26 security patterns, and 66 security principles. Research supported by the industrial chair on Digital Confidence (http://confiance-numerique.clermont-universite.fr/index-en.html).
This paper focuses on the issues of secure key management for the smart grid. Existing key management schemes do not provide adequate security for smart grid deployment. We propose a novel key management scheme that merges elliptic curve public-key techniques with symmetric-key techniques; the symmetric-key part is based on the Needham-Schroeder authentication protocol. Well-known threats such as replay and man-in-the-middle attacks can be successfully prevented by the proposed scheme. The benefits of the proposed system are fault tolerance, accessibility, strong security, scalability, and efficiency.
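As a generic illustration of combining elliptic-curve public-key and symmetric-key techniques (not the paper's exact protocol), the sketch below derives a shared symmetric session key via ECDH using the Python cryptography package, which is assumed to be installed.

```python
# Generic illustration of merging elliptic-curve public-key and
# symmetric-key techniques: an ECDH exchange followed by a KDF yields
# a symmetric session key. Uses the third-party 'cryptography' package
# (assumed installed); this is not the paper's exact protocol.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

# Each party generates an EC key pair (P-256 curve).
meter_priv = ec.generate_private_key(ec.SECP256R1())
utility_priv = ec.generate_private_key(ec.SECP256R1())

# ECDH: both sides derive the same shared secret from the peer's
# public key, without the secret ever crossing the network.
secret_m = meter_priv.exchange(ec.ECDH(), utility_priv.public_key())
secret_u = utility_priv.exchange(ec.ECDH(), meter_priv.public_key())
assert secret_m == secret_u

# A KDF turns the raw secret into a symmetric session key, which would
# then protect Needham-Schroeder-style authenticated exchanges.
session_key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                   info=b"smart-grid-session").derive(secret_m)
```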
Tree structures such as breadth-first search (BFS) trees and minimum spanning trees (MST) are among the most fundamental graph structures in distributed network algorithms. However, by definition, these structures are not robust against failures, and even a single edge's removal can disrupt their functionality. A well-studied concept which attempts to circumvent this issue is Fault-Tolerant Tree Structures, where the tree gets augmented with additional edges from the network so that the functionality of the structure is maintained even when an edge fails. These structures, or other equivalent formulations, have been studied extensively from a centralized viewpoint. However, despite the fact that the main motivations come from distributed networks, their distributed construction has not been addressed before. In this paper, we present distributed algorithms for constructing fault-tolerant BFS and MST structures. The time complexity of our algorithms is nearly optimal in the following strong sense: they almost match even the lower bounds for constructing (basic) BFS and MST trees.
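A toy, centralized illustration of the fault-tolerant tree idea: keep the BFS tree and, when a tree edge fails, look for a non-tree "swap" edge that reconnects the detached subtree. This sketch preserves connectivity only; real FT-BFS structures also preserve distances, and the paper's constructions are distributed.

```python
# Toy, centralized illustration of augmenting a BFS tree with a swap
# edge after a tree-edge failure. Connectivity only; the actual FT-BFS
# structures also preserve distances.
from collections import deque

def bfs_tree(adj, root):
    parent = {root: None}
    q = deque([root])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                q.append(v)
    return parent

def subtree(parent, v):
    """All vertices in v's subtree of the BFS tree."""
    kids = {}
    for child, p in parent.items():
        kids.setdefault(p, []).append(child)
    out, stack = set(), [v]
    while stack:
        x = stack.pop()
        out.add(x)
        stack.extend(kids.get(x, []))
    return out

def swap_edge(adj, parent, v):
    """Find one edge reconnecting v's subtree after the tree edge
    (parent[v], v) fails, or None if the failure disconnects it."""
    below = subtree(parent, v)
    for x in below:
        for y in adj[x]:
            if y not in below and not (x == v and y == parent[v]):
                return (x, y)
    return None
```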
The popularity of Cloud computing has increased considerably in recent years. The growth in Cloud users and their interactions with the Cloud infrastructure raises the risk of resource faults. Such problems can give the Cloud environment a bad reputation, which slows down the evolution of this technology. To address this issue, the dynamic and complex architecture of the Cloud should be taken into account. Indeed, this architecture requires that resource protection and healing be transparent and without external intervention. Unlike previous work, we suggest integrating the fundamental aspects of autonomic computing into the Cloud to deal with the self-healing of Cloud resources. Starting from the high degree of match between autonomic computing systems and multi-agent systems, we propose to take advantage of the autonomous behaviour of agent technology to create an intelligent Cloud that supports autonomic aspects. Our proposed solution is a multi-agent system that interacts with the Cloud infrastructure to analyze the state of resources and execute a checkpoint/replication strategy or a migration technique to solve the problem of failed resources.
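A hypothetical agent-side self-healing pass in the monitor-analyze-plan-execute spirit of autonomic computing might look as follows; names and the load threshold are illustrative, not the paper's implementation.

```python
# Hypothetical agent-side self-healing pass in the MAPE (monitor,
# analyze, plan, execute) spirit; names and thresholds are assumed.
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    healthy: bool
    load: float          # utilization in [0.0, 1.0]

def heal_step(resources, checkpoint, migrate):
    """One monitoring pass: migrate workloads off failed resources and
    proactively checkpoint heavily loaded ones."""
    for r in resources:
        if not r.healthy:
            migrate(r)        # restart elsewhere from the last checkpoint
        elif r.load > 0.9:
            checkpoint(r)     # save state before a likely failure
```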
State machine replication (SMR) is a well-established technique for building fault-tolerant systems. In part, this is explained by the simplicity of the approach and its strong consistency guarantees. Recently, several proposals have suggested parallelizing the execution of state machine replicas to achieve high throughput. Concurrent execution of commands has many implications, including for the recovery of replicas from failures. Conventional checkpointing techniques, for example, must be revisited in parallelized models. In this paper, we review parallel variations of state machine replication and discuss how checkpointing procedures apply to these models. Moreover, we evaluate the impact of checkpointing techniques on recovery through simulations.
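A simplified sketch of checkpointing in a sequential replica is shown below; the parallel variants the paper reviews must additionally quiesce or coordinate worker threads so the snapshot reflects a consistent prefix of the command log. Class and method names are illustrative.

```python
# Simplified checkpointing in a sequential SMR replica (illustrative).
import pickle

class Replica:
    def __init__(self):
        self.state = {}
        self.last_applied = 0          # index of last executed command

    def apply(self, index: int, cmd):
        key, value = cmd
        self.state[key] = value
        self.last_applied = index

    def checkpoint(self, path: str):
        """Persist the state together with the log index it reflects,
        so the log prefix up to last_applied can later be truncated."""
        with open(path, "wb") as f:
            pickle.dump((self.last_applied, self.state), f)

    def restore(self, path: str):
        with open(path, "rb") as f:
            self.last_applied, self.state = pickle.load(f)
```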
Fault tolerance is a key challenge to building the first exascale system. To understand the potential impacts of failures on next-generation systems, significant effort has been devoted to collecting, characterizing and analyzing failures on current systems. These studies require large volumes of data and complex analysis. Because the occurrence of failures in large-scale systems is unpredictable, failures are commonly modeled as a stochastic process. Failure data from current systems is examined in an attempt to identify the underlying probability distribution and its statistical properties. In this paper, we use modeling to examine the impact of failure distributions on the time-to-solution and the optimal checkpoint interval of applications that use coordinated checkpoint/restart. Using this approach, we show that as failures become more frequent, the failure distribution has a larger influence on application performance. We also show that as failure times are less tightly grouped (i.e., as the standard deviation increases) the underlying probability distribution has a greater impact on application performance. Finally, we show that computing the checkpoint interval based on the assumption that failures are exponentially distributed has a modest impact on application performance even when failures are drawn from a different distribution. Our work provides critical analysis and guidance to the process of analyzing failure data in the context of coordinated checkpoint/restart. Specifically, the data presented in this paper helps to distinguish cases where the failure distribution has a strong influence on application performance from those cases when the failure distribution has relatively little impact.
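For reference, the classic first-order estimate of the optimal checkpoint interval under exponentially distributed failures is Young's approximation, sketched below; the paper studies how such estimates behave when failures follow other distributions.

```python
# Young's classic first-order approximation of the optimal checkpoint
# interval, derived assuming exponentially distributed failures.
import math

def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Optimal compute time between checkpoints (same units as inputs)."""
    return math.sqrt(2 * checkpoint_cost * mtbf)

# Example: 5-minute checkpoints, 24-hour MTBF -> roughly 120 minutes.
print(young_interval(checkpoint_cost=300, mtbf=24 * 3600) / 60)
```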
Standard routing protocols for IPv6 over Low power Wireless Personal Area Networks (6LoWPAN) are mainly designed for data collection applications and work by establishing a tree-based network topology, which enables packets to be sent upwards, from the leaves to the root, adapting to dynamics of low-power communication links. The routing tables in such unidirectional networks are very simple and small since each node just needs to maintain the address of its parent in the tree, providing the best-quality route at every moment. In this work, we propose Matrix, a platform-independent routing protocol that utilizes the existing tree structure of the network to enable reliable and efficient any-to-any data traffic. Matrix uses hierarchical IPv6 address assignment in order to optimize routing table size, while preserving bidirectional routing. Moreover, it uses a local broadcast mechanism to forward messages to the right subtree when persistent node or link failures occur. We implemented Matrix on TinyOS and evaluated its performance both analytically and through simulations on TOSSIM. Our results show that the proposed protocol is superior to available protocols for 6LoWPAN, when it comes to any-to-any data communication, in terms of reliability, message efficiency, and memory footprint.
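A toy stand-in for the hierarchical address assignment: give each node a contiguous range sized to its subtree, so a parent routes any destination by checking which child's range contains it. Plain integers abstract away the actual IPv6 prefix encoding, and the function names are illustrative.

```python
# Toy stand-in for hierarchical address assignment on a routing tree;
# integer ranges abstract away the actual IPv6 prefix encoding.
def subtree_size(children, v):
    return 1 + sum(subtree_size(children, c) for c in children.get(v, []))

def assign(children, v, start, ranges):
    """Node v owns [start, start + size); its children receive
    disjoint consecutive subranges following v's own address."""
    ranges[v] = (start, start + subtree_size(children, v))
    nxt = start + 1
    for c in children.get(v, []):
        assign(children, c, nxt, ranges)
        nxt += subtree_size(children, c)
    return ranges

children = {"root": ["a", "b"], "a": ["a1", "a2"]}
print(assign(children, "root", 0, {}))
# {'root': (0, 5), 'a': (1, 4), 'a1': (2, 3), 'a2': (3, 4), 'b': (4, 5)}
# Routing: forward toward the child whose range contains the destination.
```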
Mutex locks have traditionally been the most common mechanism for protecting shared data structures in parallel programs. However, the robustness of such locks against process failures has not been studied thoroughly. Most (user-level) mutex algorithms are designed around the assumption that processes are reliable, meaning that a process may not fail while executing the lock acquisition and release code, or while inside the critical section. If such a failure does occur, then the liveness properties of a conventional mutex lock may cease to hold until the application or operating system intervenes by cleaning up the internal structure of the lock. For example, a process that is attempting to acquire an otherwise starvation-free mutex may be blocked forever waiting for a failed process to release the critical section. Adding to the difficulty, if the failed process recovers and attempts to acquire the same mutex again without appropriate cleanup, then the mutex may become corrupted to the point where it loses safety, notably the mutual exclusion property. We address this challenge by formalizing the problem of recoverable mutual exclusion, and proposing several solutions that vary both in their assumptions regarding hardware support for synchronization, and in their time complexity. Compared to known solutions, our algorithms are more robust as they do not restrict where or when a process may crash, and provide stricter guarantees in terms of time complexity, which we define in terms of remote memory references.
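A minimal sketch of the recoverable mutual exclusion idea, in a simplified single-machine model: the lock records its owner's id, so a process that crashes and restarts can detect that it already holds the lock and recover instead of blocking on itself. This is an illustration of the problem setting, not one of the paper's algorithms.

```python
# Minimal single-machine model of recoverable mutual exclusion: the
# lock word records its owner, so a process that crashed inside the
# critical section can re-enter and recover rather than deadlock.
import threading

class RecoverableLock:
    def __init__(self):
        self._owner = None              # survives a process "crash" here
        self._guard = threading.Lock()  # models an atomic lock word

    def acquire(self, pid: int) -> str:
        while True:                     # spin until acquired or recovered
            with self._guard:
                if self._owner is None:
                    self._owner = pid
                    return "acquired"
                if self._owner == pid:
                    return "recovered"  # we already held it before crashing

    def release(self, pid: int):
        with self._guard:
            if self._owner == pid:
                self._owner = None
```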
Controller Area Network (CAN) is the main bus network connecting electronic control units in automobiles. Although CAN protocols have been revised to improve vehicle safety, the security weaknesses of CAN have not been fully addressed. Security threats on automobiles may come from external wireless communication or from malicious CAN nodes mounted on the CAN bus. Whatever the threat source, the security weakness of CAN itself is the root of these problems. Due to the limited computational power and storage capacity of each CAN node, there is a lack of hardware-efficient protection methods for the CAN system that do not lose compatibility with CAN protocols. To save cost and maintain compatibility, we propose to exploit the built-in CAN fault confinement mechanism to detect masquerade attacks originating from malicious CAN devices on the bus. Simulation results show that our method achieves an attack misdetection rate on the order of 10^-5 and reduces encryption latency by up to 68% compared with the complete frame encryption method.
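For context, the built-in fault confinement mechanism the method exploits can be sketched as the following per-node error-counter state machine, using the standard CAN thresholds; the masquerade-detection logic built on top of it is not shown.

```python
# Sketch of CAN's built-in fault confinement: each node keeps a
# transmit error counter (TEC) that rises by 8 per transmit error and
# falls by 1 per success; above 127 the node turns error-passive,
# above 255 it goes bus-off and leaves the bus.
class CanNode:
    def __init__(self):
        self.tec = 0                    # transmit error counter

    def on_tx_error(self):
        self.tec += 8

    def on_tx_success(self):
        self.tec = max(0, self.tec - 1)

    @property
    def state(self) -> str:
        if self.tec > 255:
            return "bus-off"            # node disconnects from the bus
        if self.tec > 127:
            return "error-passive"
        return "error-active"
```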
Fault tolerance has a huge impact on embedded safety-critical systems, and the Safe Node Sequence Protocol (SNSP) is a technology designed to help deliver that impact. In this paper, we present a mechanism for fault tolerance and recovery based on the SNSP to strengthen system robustness, and we analyze and verify the correctness of a fault-tolerant prototype system built on it. In order to verify the correctness of more than thirty failure modes, we partitioned the complete protocol state machine into several subsystems and then injected the corresponding fault classes into dedicated independent models. Experiments demonstrate that this method effectively reduces the size of the overall state space, and the verification results indicate that the protocol is able to recover from the modeled faults in a fault-tolerant system and continue to operate as errors occur.
Mobile ad-hoc networks are dynamic, wireless, self-organizing networks in which many mobile nodes are weakly connected to one another. Compared with traditional networks, they suffer failures that prevent the system from working properly, and we must also cope with many security issues, such as unauthorized access attempts, security threats, and reliability. Using mobile agents in ad-hoc networks with low-level fault tolerance provides fault masking that users never notice. Mobile agent migration among nodes, autonomous selection of alternative paths, and high-level fault tolerance make networks with low bandwidth and high failure ratios more reliable. In this paper, we describe the fault-tolerance characteristics of mobile agents and existing mobile-agent-based fault-tolerance methods. For ad-hoc networks that need security precautions beyond fault tolerance, we also introduce a new model: the Secure Mobile Agent Based Fault Tolerance Model.
The Variable Precision Rough Set (VPRS) model is one of the most important extensions of classical Rough Set (RS) theory. It employs a majority inclusion relation mechanism to make the classical RS model more fault tolerant, thereby improving the generalization of the model. This paper can be viewed as an extension of previous investigations of the attribute reduction problem in the VPRS model. In our investigation, we illustrate with examples that previously proposed reduct definitions may spoil the hidden classification ability of a knowledge system by ignoring certain essential attributes in some circumstances. Consequently, by proposing a new β-consistent notion, we analyze the relationship between the structure of a Decision Table (DT) and different definitions of reduct in the VPRS model. We then give a new notion of β-complement reduct that avoids the defects of the reduct notions defined in the previous literature. We also supply a method to obtain the β-complement reduct using a decision table splitting algorithm, and finally demonstrate the feasibility of our approach with sample instances.
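For background, one common formulation of the VPRS majority inclusion mechanism is given below; the paper's own notation may differ.

```latex
% One common formulation of the VPRS majority inclusion mechanism
% (background only; the paper's notation may differ). For an
% equivalence class $[x]_R$ and a concept $X$,
\[
  P\bigl(X \mid [x]_R\bigr)
    = \frac{\lvert X \cap [x]_R \rvert}{\lvert [x]_R \rvert},
\]
% and for a precision threshold $\beta \in (0.5, 1]$ the
% $\beta$-approximations are
\[
  \underline{R}_{\beta}(X) = \{\, x : P(X \mid [x]_R) \ge \beta \,\}, \qquad
  \overline{R}_{\beta}(X)  = \{\, x : P(X \mid [x]_R) > 1 - \beta \,\}.
\]
% Setting $\beta = 1$ recovers the classical rough set approximations.
```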
In this paper, we address the problem of designing a fault-tolerant control scheme for an HVAC system in which sensing and actuation data are exchanged with a centralized controller via a wireless sensor and actuator network whose communication nodes are subject to permanent failures and malicious intrusions.
Providers of critical infrastructure services strive to maintain the high availability of their SCADA systems. This paper reports on our experience designing, architecting, and evaluating the first survivable SCADA system: one that is able to ensure correct behavior with minimal performance degradation even during cyber attacks that compromise part of the system. We describe the challenges we faced when integrating modern intrusion-tolerant protocols with a conventional SCADA architecture and present the techniques we developed to overcome them. The results illustrate that our survivable SCADA system not only functions correctly in the face of a cyber attack, but also processes in excess of 20,000 messages per second with a latency of less than 30 ms, making it suitable for even large-scale deployments managing thousands of remote terminal units.
Complex event processing has become an important technology for big data and intelligent computing because it facilitates the creation of actionable, situational knowledge from a potentially large number of events in soft real time. Complex event processing can be instrumental in many mission-critical applications, such as business intelligence, algorithmic stock trading, and intrusion detection. Hence, the servers that carry out complex event processing must be made trustworthy. In this paper, we present a threat analysis of complex event processing systems and describe a set of mechanisms that can be used to control various threats. By exploiting the application semantics of typical event processing operations, we are able to design lightweight mechanisms that incur minimal runtime overhead, appropriate for soft real-time computing.
Byzantine fault tolerance has been intensively studied over the past decade as a way to enhance the intrusion resilience of computer systems. However, state-machine-based Byzantine fault tolerance algorithms require deterministic application processing and sequential execution of totally ordered requests. One way of increasing the practicality of Byzantine fault tolerance is to exploit the application semantics, which we refer to as application-aware Byzantine fault tolerance. Application-aware Byzantine fault tolerance makes it possible to facilitate concurrent processing of requests, to minimize the use of Byzantine agreement, and to identify and control replica nondeterminism. In this paper, we provide an overview of recent works on application-aware Byzantine fault tolerance techniques. We elaborate the need for exploiting application semantics for Byzantine fault tolerance and the benefits of doing so, provide a classification of various approaches to application-aware Byzantine fault tolerance, and outline the mechanisms used in achieving application-aware Byzantine fault tolerance according to our classification.
In recent years, there has been a strong trend towards running network-intensive applications, such as Internet servers and cloud-based services, in virtualized environments, where multiple virtual machines (VMs) running on the same machine share its physical and network resources. In such an environment, the virtual machine monitor (VMM) virtualizes the machine's resources in terms of CPU, memory, storage, network, and I/O devices to allow multiple operating systems running in different VMs to operate and access the network concurrently. A key feature of virtualization is live migration (LM), which allows a virtual machine to be transferred from one physical server to another without interrupting the services running in it. Live migration facilitates workload balancing, fault tolerance, online system maintenance, consolidation of virtual machines, and so on. However, live migration is still at an early stage of implementation, and its security has yet to be evaluated. The security concerns around live migration are a major factor in its adoption by the IT industry. Therefore, this paper uses the X.805 security standard to investigate attacks on live virtual machine migration. The analysis highlights the main sources of threats and suggests approaches to tackle them. The paper also surveys and compares different proposals in the literature for securing live migration.