Biblio

List
Filter

Found 11 results

Filters: Keyword is fault-tolerance [Clear All Filters]

2022-07-01

Owoade, Ayoade Akeem, Osunmakinde, Isaac Olusegun. 2021. Fault-tolerance to Cascaded Link Failures of Video Traffic on Attacked Wireless Networks. 2021 IST-Africa Conference (IST-Africa). :1–11.

Research has been conducted on wireless network single link failures. However, cascaded link failures due to fraudulent attacks have not received enough attention, whereas this requires solutions. This research developed an enhanced genetic algorithm (EGA) focused on capacity efficiency and fast restoration to rapidly resolve link-link failures. On complex nodes network, this fault-tolerant model was tested for such failures. Optimal alternative routes and the bandwidth required for quick rerouting of video traffic were generated by the proposed model. Increasing cascaded link failures increases bandwidth usage and causes transmission delay, which slows down video traffic routing. The proposed model outperformed popular Dijkstra models, in terms of time complexity. The survived solution paths demonstrate that the proposed model works well in maintaining connectivity despite cascaded link failures and would therefore be extremely useful in pandemic periods on emergency matters. The proposed technology is feasible for current business applications that require high-speed broadband networks.

2019-08-26

Hasircioglu, Burak, Pignolet, Yvonne-Anne, Sivanthi, Thanikesavan. 2018. Transparent Fault Tolerance for Real-Time Automation Systems. Proceedings of the 1st International Workshop on Internet of People, Assistive Robots and Things. :7-12.

Developing software is hard. Developing software that is resilient and does not crash at the occurrence of unexpected inputs or events is even harder, especially with IoT devices and real-time requirements, e.g., due to interactions with human beings. Therefore, there is a need for a software architecture that helps software developers to build fault-tolerant software with as little pain and effort as possible. To this end, we have designed a fault tolerance framework for automation systems that lets developers be mostly oblivious to fault tolerance issues. Thus they can focus on the application logic encapsulated in (micro)services. That is, the developer only needs to specify the required fault tolerance level by description, not implementation. The fault tolerance aspects are transparent to the developer, as the framework takes care of them. This approach is particularly suited for the development for mixed-criticality systems, where different parts have very different and demanding functional and non-functional requirements. For such systems highly specialized developers are needed and removing the burden of fault tolerance results in faster time to market and safer and more dependable systems.

2018-09-12

Park, Junkil, Ivanov, Radoslav, Weimer, James, Pajic, Miroslav, Son, Sang Hyuk, Lee, Insup. 2017. Security of Cyber-Physical Systems in the Presence of Transient Sensor Faults. ACM Trans. Cyber-Phys. Syst.. 1:15:1–15:23.

This article is concerned with the security of modern Cyber-Physical Systems in the presence of transient sensor faults. We consider a system with multiple sensors measuring the same physical variable, where each sensor provides an interval with all possible values of the true state. We note that some sensors might output faulty readings and others may be controlled by a malicious attacker. Differing from previous works, in this article, we aim to distinguish between faults and attacks and develop an attack detection algorithm for the latter only. To do this, we note that there are two kinds of faults—transient and permanent; the former are benign and short-lived, whereas the latter may have dangerous consequences on system performance. We argue that sensors have an underlying transient fault model that quantifies the amount of time in which transient faults can occur. In addition, we provide a framework for developing such a model if it is not provided by manufacturers. Attacks can manifest as either transient or permanent faults depending on the attacker’s goal. We provide different techniques for handling each kind. For the former, we analyze the worst-case performance of sensor fusion over time given each sensor’s transient fault model and develop a filtered fusion interval that is guaranteed to contain the true value and is bounded in size. To deal with attacks that do not comply with sensors’ transient fault models, we propose a sound attack detection algorithm based on pairwise inconsistencies between sensor measurements. Finally, we provide a real-data case study on an unmanned ground vehicle to evaluate the various aspects of this article.

2017-12-28

Mehetrey, P., Shahriari, B., Moh, M.. 2016. Collaborative Ensemble-Learning Based Intrusion Detection Systems for Clouds. 2016 International Conference on Collaboration Technologies and Systems (CTS). :404–411.

Cloud computation has become prominent with seemingly unlimited amount of storage and computation available to users. Yet, security is a major issue that hampers the growth of cloud. In this research we investigate a collaborative Intrusion Detection System (IDS) based on the ensemble learning method. It uses weak classifiers, and allows the use of untapped resources of cloud to detect various types of attacks on the cloud system. In the proposed system, tasks are distributed among available virtual machines (VM), individual results are then merged for the final adaptation of the learning model. Performance evaluation is carried out using decision trees and using fuzzy classifiers, on KDD99, one of the largest datasets for IDS. Segmentation of the dataset is done in order to mimic the behavior of real-time data traffic occurred in a real cloud environment. The experimental results show that the proposed approach reduces the execution time with improved accuracy, and is fault-tolerant when handling VM failures. The system is a proof-of-concept model for a scalable, cloud-based distributed system that is able to explore untapped resources, and may be used as a base model for a real-time hierarchical IDS.

2017-12-12

Abdi, Fardin, Tabish, Rohan, Rungger, Matthias, Zamani, Majid, Caccamo, Marco. 2017. Application and System-level Software Fault Tolerance Through Full System Restarts. Proceedings of the 8th International Conference on Cyber-Physical Systems. :197–206.

Due to the growing performance requirements, embedded systems are increasingly more complex. Meanwhile, they are also expected to be reliable. Guaranteeing reliability on complex systems is very challenging. Consequently, there is a substantial need for designs that enable the use of unverified components such as real-time operating system (RTOS) without requiring their correctness to guarantee safety. In this work, we propose a novel approach to design a controller that enables the system to restart and remain safe during and after the restart. Complementing this controller with a switching logic allows the system to use complex, unverified controller to drive the system as long as it does not jeopardize safety. Such a design also tolerates faults that occur in the underlying software layers such as RTOS and middleware and recovers from them through system-level restarts that reinitialize the software (middleware, RTOS, and applications) from a read-only storage. Our approach is implementable using one commercial off-the-shelf (COTS) processing unit. To demonstrate the efficacy of our solution, we fully implement a controller for a 3 degree of freedom (3DOF) helicopter. We test the system by injecting various types of faults into the applications and RTOS and verify that the system remains safe.

2017-11-03

Beevi, L. S., Merlin, G., MoganaPriya, G.. 2016. Security and privacy for smart grid using scalable key management. 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT). :4716–4721.

This paper focuses on the issues of secure key management for smart grid. With the present key management schemes, it will not yield security for deployment in smart grid. A novel key management scheme is proposed in this paper which merges elliptic curve public key technique and symmetric key technique. Based on the Needham-Schroeder authentication protocol, symmetric key scheme works. Well known threats like replay attack and man-in-the-middle attack can be successfully abolished using Smart Grid. The benefits of the proposed system are fault-tolerance, accessibility, Strong security, scalability and Efficiency.

2017-06-05

Korupolu, Madhukar, Rajaraman, Rajmohan. 2016. Robust and Probabilistic Failure-Aware Placement. Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures. :213–224.

Motivated by the growing complexity and heterogeneity of modern data centers, and the prevalence of commodity component failures, this paper studies the failure-aware placement problem of placing tasks of a parallel job on machines in the data center with the goal of increasing availability. We consider two models of failures: adversarial and probabilistic. In the adversarial model, each node has a weight (higher weight implying higher reliability) and the adversary can remove any subset of nodes of total weight at most a given bound W and our goal is to find a placement that incurs the least disruption against such an adversary. In the probabilistic model, each node has a probability of failure and we need to find a placement that maximizes the probability that at least K out of N tasks survive at any time. For adversarial failures, we first show that (i) the problems are in Σ2, the second level of the polynomial hierarchy, (ii) a basic variant, that we call RobustFAP, is co-NP-hard, and (iii) an all-or-nothing version of RobustFAP is Σ2-complete. We then give a PTAS for RobustFAP, a key ingredient of which is a solution that we design for a fractional version of RobustFAP. We then study fractional RobustFAP over hierarchies, denoted HierRobustFAP, and introduce a notion of hierarchical max-min fairness/ and a novel Generalized Spreading/ algorithm which is simultaneously optimal for all W. These generalize the classical notion of max-min fairness to work with nodes of differing capacities, differing reliability weights and hierarchical structures. Using randomized rounding, we extend this to give an algorithm for integral HierRobustFAP. For the probabilistic version, we first give an algorithm that achieves an additive ε approximation in the failure probability for the single level version, called ProbFAP, while giving up a (1 + ε) multiplicative factor in the number of failures. We then extend the result to the hierarchical version, HierProbFAP, achieving an ε additive approximation in failure probability while giving up an (L + ε) multiplicative factor in the number of failures, where \$L\$ is the number of levels in the hierarchy.

2017-05-30

Angarita, Rafael, Rukoz, Marta, Manouvrier, Maude, Cardinale, Yudith. 2016. A Knowledge-based Approach for Self-healing Service-oriented Applications. Proceedings of the 8th International Conference on Management of Digital EcoSystems. :1–8.

In the context of service-oriented applications, the self-healing property provides reliable execution in order to support failures and assist automatic recovery techniques. This paper presents a knowledge-based approach for self-healing Composite Service (CS) applications. A CS is an application composed by a set of services interacting each other and invoked on the Web. Our approach is supported by Service Agents, which are in charge of the CS fault-tolerance execution control, making decisions about the selection of recovery and proactive strategies. Service Agents decisions are based on the information they have about the whole application, about themselves, and about what it is expected and what it is really happening at run-time. Hence, application knowledge for decision making comprises off-line precomputed global and local information, user QoS preferences, and propagated actual run-time information. Our approach is evaluated experimentally using a case study.

2017-05-19

Ivanov, Radoslav, Pajic, Miroslav, Lee, Insup. 2016. Attack-Resilient Sensor Fusion for Safety-Critical Cyber-Physical Systems. ACM Trans. Embed. Comput. Syst.. 15:21:1–21:24.

This article focuses on the design of safe and attack-resilient Cyber-Physical Systems (CPS) equipped with multiple sensors measuring the same physical variable. A malicious attacker may be able to disrupt system performance through compromising a subset of these sensors. Consequently, we develop a precise and resilient sensor fusion algorithm that combines the data received from all sensors by taking into account their specified precisions. In particular, we note that in the presence of a shared bus, in which messages are broadcast to all nodes in the network, the attacker’s impact depends on what sensors he has seen before sending the corrupted measurements. Therefore, we explore the effects of communication schedules on the performance of sensor fusion and provide theoretical and experimental results advocating for the use of the Ascending schedule, which orders sensor transmissions according to their precision starting from the most precise. In addition, to improve the accuracy of the sensor fusion algorithm, we consider the dynamics of the system in order to incorporate past measurements at the current time. Possible ways of mapping sensor measurement history are investigated in the article and are compared in terms of the confidence in the final output of the sensor fusion. We show that the precision of the algorithm using history is never worse than the no-history one, while the benefits may be significant. Furthermore, we utilize the complementary properties of the two methods and show that their combination results in a more precise and resilient algorithm. Finally, we validate our approach in simulation and experiments on a real unmanned ground robot.

2017-03-29

Rajabi, Arezoo, Bobba, Rakesh B.. 2016. A Resilient Algorithm for Power System Mode Estimation Using Synchrophasors. Proceedings of the 2Nd Annual Industrial Control System Security Workshop. :23–29.

Bulk electric systems include hundreds of synchronous generators. Faults in such systems can induce oscillations in the generators which if not detected and controlled can destabilize the system. Mode estimation is a popular method for oscillation detection. In this paper, we propose a resilient algorithm to estimate electro-mechanical oscillation modes in large scale power system in the presence of false data. In particular, we add a fault tolerance mechanism to a variant of alternating direction method of multipliers (ADMM) called S-ADMM. We evaluate our method on an IEEE 68-bus test system under different attack scenarios and show that in all the scenarios our algorithm converges well.

2015-05-06

Rui Zhou, Rong Min, Qi Yu, Chanjuan Li, Yong Sheng, Qingguo Zhou, Xuan Wang, Kuan-Ching Li. 2014. Formal Verification of Fault-Tolerant and Recovery Mechanisms for Safe Node Sequence Protocol. Advanced Information Networking and Applications (AINA), 2014 IEEE 28th International Conference on. :813-820.

Fault-tolerance has huge impact on embedded safety-critical systems. As technology that assists to the development of such improvement, Safe Node Sequence Protocol (SNSP) is designed to make part of such impact. In this paper, we present a mechanism for fault-tolerance and recovery based on the Safe Node Sequence Protocol (SNSP) to strengthen the system robustness, from which the correctness of a fault-tolerant prototype system is analyzed and verified. In order to verify the correctness of more than thirty failure modes, we have partitioned the complete protocol state machine into several subsystems, followed to the injection of corresponding fault classes into dedicated independent models. Experiments demonstrate that this method effectively reduces the size of overall state space, and verification results indicate that the protocol is able to recover from the fault model in a fault-tolerant system and continue to operate as errors occur.