Biblio
The blockchain technology revolution and the use of blockchains in various applications have resulted in many companies and programmers developing and customizing specific fit-for-purpose consensus algorithms. Security and performance are determined by the chosen consensus algorithm; hence, the reliability and security of these algorithms must be assured and tested, which requires an understanding of all the security assumptions that make such algorithms correct and byzantine fault-tolerant.This paper studies the "security ingredients" that enable a given consensus algorithm to achieve safety, liveness, and byzantine fault tolerance (BFT) in both permissioned and permissionless blockchain systems. The key contributions of this paper are the organization of these requirements and a new taxonomy that describes the requirements for security. The CAP Theorem is utilized to explain important tradeoffs between consistency and availability in consensus algorithm design, which are crucial depending on the specific application of a given algorithm. This topic has also been explored previously by De Angelis. However, this paper expands that prior explanation and dilemma of consistency vs. availability and then combines this with Buterin's Trilemma to complete the overall exposition of tradeoffs.
Cyber physical system (CPS) is often deployed at safety-critical key infrastructures and fields, fault tolerance policies are extensively applied in CPS systems to improve its credibility; the same physical backup of hardware redundancy (SPB) technology is frequently used for its simple and reliable implementation. To resolve challenges faced with in simulation test of SPB-CPS, this paper dynamically determines the test resources matched with the CPS scale by using the adaptive allocation policies, establishes the hierarchical models and inter-layer message transmission mechanism. Meanwhile, the collaborative simulation time sequence push strategy and the node activity test mechanism based on the sliding window are designed in this paper to improve execution efficiency of the simulation test. In order to validate effectiveness of the method proposed in this paper, we successfully built up a fault-tolerant CPS simulation platform. Experiments showed that it can improve the SPB-CPS simulation test efficiency.
Functionally safe control logic design without full duplication is difficult due to the complexity of random control logic. The Reorder buffer (ROB) is a control logic function commonly used in high performance computing systems. In this study, we focus on a safe ROB design used in an industry quality Network-on-Chip (NoC) Advanced eXtensible Interface (AXI) Network Interface (NI) block. We developed and applied area efficient safe design techniques including partial duplication, Error Detection Code (EDC) and invariance checking with formal proofs and showed that we can achieve a desired safe Diagnostic Coverage (DC) requirement with small area and power overheads and no performance degradation.
Atomic multicast is a communication primitive that delivers messages to multiple groups of processes according to some total order, with each group receiving the projection of the total order onto messages addressed to it. To be scalable, atomic multicast needs to be genuine, meaning that only the destination processes of a message should participate in ordering it. In this paper we propose a novel genuine atomic multicast protocol that in the absence of failures takes as low as 3 message delays to deliver a message when no other messages are multicast concurrently to its destination groups, and 5 message delays in the presence of concurrency. This improves the latencies of both the fault-tolerant version of classical Skeen's multicast protocol (6 or 12 message delays, depending on concurrency) and its recent improvement by Coelho et al. (4 or 8 message delays). To achieve such low latencies, we depart from the typical way of guaranteeing fault-tolerance by replicating each group with Paxos. Instead, we weave Paxos and Skeen's protocol together into a single coherent protocol, exploiting opportunities for white-box optimisations. We experimentally demonstrate that the superior theoretical characteristics of our protocol are reflected in practical performance pay-offs.
This paper offers a new approach to modelling the effect of cyber-attacks on reliability of software used in industrial control applications. The model is based on the view that successful cyber-attacks introduce failure regions, which are not present in non-compromised software. The model is then extended to cover a fault tolerant architecture, such as the 1-out-of-2 software, popular for building industrial protection systems. The model is used to study the effectiveness of software maintenance policies such as patching and "cleansing" ("proactive recovery") under different adversary models ranging from independent attacks to sophisticated synchronized attacks on the channels. We demonstrate that the effect of attacks on reliability of diverse software significantly depends on the adversary model. Under synchronized attacks system reliability may be more than an order of magnitude worse than under independent attacks on the channels. These findings, although not surprising, highlight the importance of using an adequate adversary model in the assessment of how effective various cyber-security controls are.
Creating and implementing fault-tolerant distributed algorithms is a challenging task in highly safety-critical industries. Using formal methods supports design and development of complex algorithms. However, formal methods are often perceived as an unjustifiable overhead. This paper presents the experience and insights when using TLA+ and PlusCal to model and develop fault-tolerant and safety-critical modules for TAS Control Platform, a platform for railway control applications up to safety integrity level (SIL) 4. We show how formal methods helped us improve the correctness of the algorithms, improved development efficiency and how part of the gap between model and implementation has been closed by translation to C code. Additionally, we describe how we gained trust in the formal model and tools by following a specific design process called property-driven design, which also implicitly addresses software quality metrics such as code coverage metrics.
In smart grid, large quantities of data is collected from various applications, such as smart metering substation state monitoring, electric energy data acquisition, and smart home. Big data acquired in smart grid applications is usually sensitive. For instance, in order to dispatch accurately and support the dynamic price, lots of smart meters are installed at user's house to collect the real-time data, but all these collected data are related to user privacy. In this paper, we propose a data aggregation scheme based on secret sharing with fault tolerance in smart grid, which ensures that control center gets the integrated data without revealing user's privacy. Meanwhile, we also consider fault tolerance during the data aggregation. At last, we analyze the security of our scheme and carry out experiments to validate the results.
We will focused the concept of serializability in order to ensure the correct processing of transactions. However, both serializability and relevant properties within transaction-based applications might be affected. Ensure transaction serialization in corrupt systems is one of the demands that can handle properly interrelated transactions, which prevents blocking situations that involve the inability to commit either transaction or related sub-transactions. In addition some transactions has been marked as malicious and they compromise the serialization of running system. In such context, this paper proposes an approach for the processing of transactions in a cloud of databases environment able to secure serializability in running transactions whether the system is compromised or not. We propose also an intrusion tolerant scheme to ensure the continuity of the running transactions. A case study and a simulation result are shown to illustrate the capabilities of the suggested system.
The performance, dependability, and security of cloud service systems are vital for the ongoing operation, control, and support. Thus, controlled improvement in service requires a comprehensive analysis and systematic identification of the fundamental underlying constituents of cloud using a rigorous discipline. In this paper, we introduce a framework which helps identifying areas for potential cloud service enhancements. A cloud service cannot be completed if there is a failure in any of its underlying resources. In addition, resources are kept offline for scheduled maintenance. We use redundant resources to mitigate the impact of failures/maintenance for ensuring performance and dependability; which helps enhancing security as well. For example, at least 4 replicas are required to defend the intrusion of a single instance or a single malicious attack/fault as defined by Byzantine Fault Tolerance (BFT). Data centers with high performance, dependability, and security are outsourced to the cloud computing environment with greater flexibility of cost of owing the computing infrastructure. In this paper, we analyze the effectiveness of redundant resource usage in terms of dependability metric and cost of service deployment based on the priority of service requests. The trade-off among dependability, cost, and security under different redundancy schemes are characterized through the comprehensive analytical models.
Barrier coverage has been widely adopted to prevent unauthorized invasion of important areas in sensor networks. As sensors are typically placed outdoors, they are susceptible to getting faulty. Previous works assumed that faulty sensors are easy to recognize, e.g., they may stop functioning or output apparently deviant sensory data. In practice, it is, however, extremely difficult to recognize faulty sensors as well as their invalid output. We, in this paper, propose a novel fault-tolerant intrusion detection algorithm (TrusDet) based on trust management to address this challenging issue. TrusDet comprises of three steps: i) sensor-level detection, ii) sink-level decision by collective voting, and iii) trust management and fault determination. In the Step i) and ii), TrusDet divides the surveillance area into a set of fine- grained subareas and exploits temporal and spatial correlation of sensory output among sensors in different subareas to yield a more accurate and robust performance of barrier coverage. In the Step iii), TrusDet builds a trust management based framework to determine the confidence level of sensors being faulty. We implement TrusDet on HC- SR501 infrared sensors and demonstrate that TrusDet has a desired performance.
Consensus is a fundamental approach to implementing fault-tolerant services through replication. It is well known that there exists a tradeoff between the cost and the resilience. For instance, Crash Fault Tolerant (CFT) protocols have a low cost but can only handle crash failures while Byzantine Fault Tolerant (BFT) protocols handle arbitrary failures but have a higher cost. Hybrid protocols enjoy the benefits of both high performance without failures and high resiliency under failures by switching among different subprotocols. However, it is challenging to determine which subprotocols should be used. We propose a moving target approach to switch among protocols according to the existing system and network vulnerability. At the core of our approach is a formalized cost model that evaluates the vulnerability and performance of consensus protocols based on real-time Intrusion Detection System (IDS) signals. Based on the evaluation results, we demonstrate that a safe, cheap, and unpredictable protocol is always used and a high IDS error rate can be tolerated.
Computer security has become an increasingly important hot topic in computer and communication industry, since it is important to support critical business process and to protect personal and sensitive information. Computer security is to keep security attributes (confidentiality, integrity and availability) of computer systems, which face the threats such as deny-of-service (DoS), virus and intrusion. To ensure high computer security, the intrusion tolerance technique based on fault-tolerant scheme has been widely applied. This paper presents the quantitative performance evaluation of a virtual machine (VM) based intrusion tolerant system. Concretely, two security measures are derived; MTTSF (mean time to security failure) and the effective traffic intensity. The mathematical analysis is achieved by using Laplace-Stieltjes transforms according to the analysis of M/G/1 queueing system.
Security patterns are generic solutions that can be applied since early stages of software life to overcome recurrent security weaknesses. Their generic nature and growing number make their choice difficult, even for experts in system design. To help them on the pattern choice, this paper proposes a semi-automatic methodology of classification and the classification itself, which exposes relationships among software weaknesses, security principles and security patterns. It expresses which patterns remove a given weakness with respect to the security principles that have to be addressed to fix the weakness. The methodology is based on seven steps, which anatomize patterns and weaknesses into set of more precise sub-properties that are associated through a hierarchical organization of security principles. These steps provide the detailed justifications of the resulting classification and allow its upgrade. Without loss of generality, this classification has been established for Web applications and covers 185 software weaknesses, 26 security patterns and 66 security principles. Research supported by the industrial chair on Digital Confidence (http://confiance-numerique.clermont-universite.fr/index-en.html).
Controller Area Network (CAN) is the main bus network that connects electronic control units in automobiles. Although CAN protocols have been revised to improve the vehicle safety, the security weaknesses of CAN have not been fully addressed. Security threats on automobiles might be from external wireless communication or from internal malicious CAN nodes mounted on the CAN bus. Despite of various threat sources, the security weakness of CAN is the root of security problems. Due to the limited computation power and storage capacity on each CAN node, there is a lack of hardware-efficient protection methods for the CAN system without losing the compatibility to CAN protocols. To save the cost and maintain the compatibility, we propose to exploit the built-in CAN fault confinement mechanism to detect the masquerade attacks originated from the malicious CAN devices on the CAN bus. Simulation results show that our method achieves the attack misdetection rate at the order of 10-5 and reduces the encryption latency by up to 68% over the complete frame encryption method.
Fault-tolerance has huge impact on embedded safety-critical systems. As technology that assists to the development of such improvement, Safe Node Sequence Protocol (SNSP) is designed to make part of such impact. In this paper, we present a mechanism for fault-tolerance and recovery based on the Safe Node Sequence Protocol (SNSP) to strengthen the system robustness, from which the correctness of a fault-tolerant prototype system is analyzed and verified. In order to verify the correctness of more than thirty failure modes, we have partitioned the complete protocol state machine into several subsystems, followed to the injection of corresponding fault classes into dedicated independent models. Experiments demonstrate that this method effectively reduces the size of overall state space, and verification results indicate that the protocol is able to recover from the fault model in a fault-tolerant system and continue to operate as errors occur.
Mobile Ad-Hoc Networks are dynamic and wireless self-organization networks that many mobile nodes connect to each other weakly. To compare with traditional networks, they suffer failures that prevent the system from working properly. Nevertheless, we have to cope with many security issues such as unauthorized attempts, security threats and reliability. Using mobile agents in having low level fault tolerance ad-hoc networks provides fault masking that the users never notice. Mobile agent migration among nodes, choosing an alternative paths autonomous and, having high level fault tolerance provide networks that have low bandwidth and high failure ratio, more reliable. In this paper we declare that mobile agents fault tolerance peculiarity and existing fault tolerance method based on mobile agents. Also in ad-hoc networks that need security precautions behind fault tolerance, we express the new model: Secure Mobil Agent Based Fault Tolerance Model.
Variable Precision Rough Set (VPRS) model is one of the most important extensions of the Classical Rough Set (RS) theory. It employs a majority inclusion relation mechanism in order to make the Classical RS model become more fault tolerant, and therefore the generalization of the model is improved. This paper can be viewed as an extension of previous investigations on attribution reduction problem in VPRS model. In our investigation, we illustrated with examples that the previously proposed reduct definitions may spoil the hidden classification ability of a knowledge system by ignoring certian essential attributes in some circumstances. Consequently, by proposing a new β-consistent notion, we analyze the relationship between the structures of Decision Table (DT) and different definitions of reduct in VPRS model. Then we give a new notion of β-complement reduct that can avoid the defects of reduct notions defined in previous literatures. We also supply the method to obtain the β- complement reduct using a decision table splitting algorithm, and finally demonstrate the feasibility of our approach with sample instances.