Bibliography
Model compression is considered an effective way to reduce the implementation cost of deep neural networks (DNNs) while maintaining inference accuracy. Many recent studies have developed efficient model compression algorithms and accelerator implementations on various devices. Protecting the integrity of DNN inference against fault attacks is important for diverse deep-learning-enabled applications. However, there has been little research investigating the fault resilience of DNNs and the impact of model compression on fault tolerance. In this work, we consider faults on different data types and develop a simulation framework for understanding the fault resilience of compressed DNN models compared to uncompressed models. We perform our experiments on two common DNNs, LeNet-5 and VGG16, and evaluate their fault resilience under different types of compression. The results show that binary quantization can effectively increase the fault resilience of DNN models by 10000x for both LeNet-5 and VGG16. Finally, we propose software and hardware mitigation techniques to further increase the fault resilience of DNN models.
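As a concrete illustration of the kind of fault-injection experiment this abstract describes, the following is a minimal Python sketch that flips randomly chosen bits in an integer-quantized weight tensor. The function name, bit width, and fault rate are illustrative assumptions, not the paper's actual framework.

```python
import numpy as np

def inject_bit_flips(weights_q, bit_width=8, fault_rate=1e-4, seed=0):
    """Flip random bits in an integer-quantized weight tensor.

    weights_q : np.ndarray of uint8 (quantized weights)
    fault_rate: probability that any given stored bit is flipped
    Repeated hits on the same bit cancel out, so the realized flip
    count is approximate.
    """
    rng = np.random.default_rng(seed)
    faulty = weights_q.copy()
    total_bits = faulty.size * bit_width
    n_faults = rng.binomial(total_bits, fault_rate)
    flat = faulty.reshape(-1)
    for _ in range(n_faults):
        idx = rng.integers(flat.size)          # which weight
        bit = rng.integers(bit_width)          # which bit position
        flat[idx] ^= np.uint8(1 << bit)        # XOR flips the chosen bit
    return faulty

# Example: 8-bit weights of a hypothetical layer
w = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
w_faulty = inject_bit_flips(w, bit_width=8, fault_rate=1e-3)
print("bits flipped:", np.unpackbits(w ^ w_faulty).sum())
```

Running the corrupted weights through the model and comparing accuracy against the clean baseline, across fault rates and data types, is the essence of such a resilience study.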
Data analytics and telemetry have become paramount to monitoring and maintaining quality of service, in addition to business analytics. Stream processing, a model in which a network of operators receives and processes continuously arriving discrete elements, is well suited to these needs. Current and previous studies and frameworks have focused on continuity of operations and aggregate performance metrics. However, real-time performance and tail latency are also important, and timing errors caused by performance faults or failed communication affect real-time performance more drastically than aggregate metrics. In this paper, we introduce redundancy in the stream data to improve real-time performance and resilience to timing errors caused by performance or failed-communication faults. We also address limitations of previous solutions with a fine-grained acknowledgment tracking scheme that both increases resilience to performance faults and enables resilience to failed-communication faults. Our results show that fine-grained acknowledgment schemes can improve tail and mean latencies by approximately 30%. We also show that these schemes improve resilience to performance faults compared to existing work: our improvements result in 47.4% to 92.9% fewer missed deadlines, compared to 17.3% to 50.6% for comparable topologies and redundancy levels in the state of the art. Finally, we show that redundancies of 25% to 100% can reduce the number of data elements that miss their deadline constraints by 0.76% to 14.04% for applications with high fan-out, and by 7.45% up to 50% for applications with no fan-out.
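A minimal sketch of the fine-grained acknowledgment idea, assuming an element counts as delivered as soon as any one of its redundant copies is acknowledged; the class and method names are hypothetical, not taken from the paper.

```python
from collections import defaultdict
import time

class AckTracker:
    """Track per-copy acknowledgments for redundantly emitted stream elements.

    An element is considered delivered as soon as ANY of its redundant
    copies is acknowledged; remaining copies are then dropped. This is
    what lets redundancy mask slow (performance-faulty) paths.
    """
    def __init__(self):
        self.pending = defaultdict(set)   # element_id -> outstanding copy ids
        self.deadline = {}                # element_id -> absolute deadline

    def emit(self, element_id, copy_ids, deadline):
        self.pending[element_id] = set(copy_ids)
        self.deadline[element_id] = deadline

    def ack(self, element_id, copy_id):
        """True the first time any copy of the element is acked in time."""
        if element_id not in self.pending:
            return False                  # duplicate: element already completed
        self.pending.pop(element_id)      # first ack wins; cancel other copies
        return time.monotonic() <= self.deadline.pop(element_id)

tracker = AckTracker()
tracker.emit("e1", ["e1-a", "e1-b"], deadline=time.monotonic() + 0.05)
print(tracker.ack("e1", "e1-b"))   # True if within the 50 ms deadline
print(tracker.ack("e1", "e1-a"))   # False: duplicate of a completed element
```

Tracking acknowledgments per copy rather than per batch is what makes it possible to cancel, retry, or re-route individual stragglers before a deadline is blown.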
Emerging intelligent systems have stringent constraints, including cost and power consumption. When they are used in critical applications, resiliency becomes another key requirement. Much research into techniques for fault tolerance and dependability has been successfully applied to highly critical systems, such as those used in space, where cost is not an overriding constraint. Further, most resiliency techniques have focused on dealing with failures in the hardware and bugs in the software. The next generation of systems used in critical applications will also have to tolerate test escapes after manufacturing, soft errors and transients in the electronics, hardware bugs, hardware and software Trojans and viruses, as well as intrusions and other security attacks during operation. This paper assesses the impact of these threats on the results produced by a critical system and proposes solutions to each of them. It is argued that run-time checks at the application level are necessary to deal with errors in the results.
Atomic multicast is a communication primitive that delivers messages to multiple groups of processes according to some total order, with each group receiving the projection of the total order onto the messages addressed to it. To be scalable, atomic multicast needs to be genuine, meaning that only the destination processes of a message should participate in ordering it. In this paper we propose a novel genuine atomic multicast protocol that, in the absence of failures, takes as few as 3 message delays to deliver a message when no other messages are multicast concurrently to its destination groups, and 5 message delays in the presence of concurrency. This improves the latencies of both the fault-tolerant version of Skeen's classical multicast protocol (6 or 12 message delays, depending on concurrency) and its recent improvement by Coelho et al. (4 or 8 message delays). To achieve such low latencies, we depart from the typical way of guaranteeing fault tolerance by replicating each group with Paxos. Instead, we weave Paxos and Skeen's protocol together into a single coherent protocol, exploiting opportunities for white-box optimisations. We experimentally demonstrate that the superior theoretical characteristics of our protocol are reflected in practical performance pay-offs.
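For orientation, here is a toy single-process simulation of the propose/commit timestamp core of Skeen's protocol, which the paper above builds on and optimizes. It omits fault tolerance and the paper's Paxos weaving entirely; all names are illustrative.

```python
class Group:
    """A destination group: logical clock plus a buffer of pending messages."""
    def __init__(self, name):
        self.name = name
        self.clock = 0
        self.pending = {}        # msg -> (timestamp, committed?)

    def propose(self, msg):
        """First message delay: each group proposes a local timestamp."""
        self.clock += 1
        self.pending[msg] = (self.clock, False)
        return self.clock

    def commit(self, msg, final_ts):
        """Second delay: the max of all proposals becomes the final timestamp."""
        self.clock = max(self.clock, final_ts)
        self.pending[msg] = (final_ts, True)

    def deliverable(self):
        """Deliver committed messages that no pending message can precede."""
        out = []
        while self.pending:
            # ties broken by message id, as in full implementations
            msg, (ts, done) = min(self.pending.items(),
                                  key=lambda kv: (kv[1][0], kv[0]))
            if not done:
                break  # an uncommitted message could still end up earlier
            out.append(msg)
            del self.pending[msg]
        return out

def multicast(msg, dest_groups):
    """Only destination groups participate: the genuineness property."""
    final_ts = max(g.propose(msg) for g in dest_groups)
    for g in dest_groups:
        g.commit(msg, final_ts)

g1, g2 = Group("g1"), Group("g2")
multicast("m1", [g1, g2])
multicast("m2", [g2])
print(g1.deliverable(), g2.deliverable())   # ['m1'] ['m1', 'm2']
```

Each propose/commit exchange costs message delays; making groups fault tolerant by running each as a Paxos replica group multiplies those delays, which is the overhead the paper's white-box weaving attacks.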
This paper presents a novel technique to quantify the operational resilience of power electronic-based components affected by High-Impact Low-Frequency (HILF) weather-related events such as high-speed winds. In this study, resilience quantification is used to investigate how promptly the system returns to the pre-disturbance state or to another stable operational state. A complexity quantification metric is used to assess system resilience. The test system is a Solid-State Transformer (SST), representing a complex, nonlinear interconnected system. Results show the effectiveness of the proposed technique for quantifying operational resilience in systems affected by weather-related disturbances.
Peer-to-peer (P2P) computing refers to the well-known technology that lets peers collaborate spontaneously as equals in a network, using appropriate information and communication systems without central server coordination. Today, interconnecting several P2P networks has become a genuine solution for increasing system reliability, fault tolerance and resource availability. However, the existence of security threats in such networks leads us to investigate the safety of users against P2P threats by studying the effects of competition between these interconnected networks. In this paper, we present an e-epidemic model to characterize worm propagation in an interconnected peer-to-peer network. We address this issue by introducing a model of network competition in which an unprotected network is willing to partially weaken its own safety in order to more severely damage a more protected network. The unprotected network can infect all peers in the competing networks if they fail to react against the passive worm propagation. Our model also evaluates the effect of immunization strategies adopted by the protected network to resist attacking networks. The launch time of immunization strategies in the protected network, the number of synapse peers connected to both networks, and other relevant parameters are also investigated in this paper.
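The abstract's e-epidemic model is not specified here; the following is a toy SIR-style sketch, under assumed contact and cure rates, of worm spread from an unprotected network into a protected one through shared (synapse) peers. All parameter names and values are hypothetical.

```python
def simulate(beta_uu=0.4, beta_up=0.25, gamma_p=0.15, synapse=0.1,
             s_u=0.99, i_u=0.01, s_p=1.0, i_p=0.0,
             dt=0.01, steps=5000):
    """Euler integration of a toy two-network worm propagation model.

    i_u: infected fraction in the unprotected network
    i_p: infected fraction in the protected network
    `synapse` scales cross-network contact via peers joined to both
    networks; `gamma_p` is the cure rate from the protected network's
    immunization strategy.
    """
    history = []
    for k in range(steps):
        new_u = beta_uu * s_u * i_u              # spread inside unprotected net
        new_p = beta_up * synapse * s_p * i_u    # spillover via synapse peers
        cured = gamma_p * i_p                    # immunization in protected net
        s_u, i_u = s_u - new_u * dt, i_u + new_u * dt
        s_p, i_p = s_p - new_p * dt, i_p + (new_p - cured) * dt
        history.append((k * dt, i_u, i_p))
    return history

peak_p = max(ip for _, _, ip in simulate())
print(f"peak infected fraction in protected network: {peak_p:.3f}")
```

Sweeping `gamma_p` and the immunization launch time (e.g., holding `gamma_p = 0` until some threshold step) reproduces the kind of sensitivity analysis the abstract describes.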
Fog computing provides computing, storage and communication resources at the edge of the network, near the physical world. End devices close to the physical world can thus enjoy interesting properties such as short delays, responsiveness, optimized communications and privacy. However, these end devices have low stability and are prone to failures. There is consequently a need for failure management protocols for IoT applications in the Fog. The design of such solutions is complex due to the specificities of the environment: (i) a dynamic infrastructure, where entities join and leave without synchronization; (ii) high heterogeneity in terms of functions, communication models, network, processing and storage capabilities; and (iii) cyber-physical interactions, which introduce non-deterministic events that depend on physical-world space and time. This paper presents a fault tolerance approach taking these three characteristics of the Fog-IoT environment into account. Fault tolerance is achieved by saving the state of the application in an uncoordinated way. When a failure is detected, notifications are propagated to limit the impact of the failure and dynamically reconfigure the application. Data stored during the state-saving process are used for recovery, taking consistency with respect to the physical world into account. The approach was validated through practical experiments on a smart home platform.
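A minimal sketch of uncoordinated state saving with physical-world consistency approximated by a freshness window; the validity window and the API are assumptions for illustration, not the paper's protocol.

```python
import time

class Checkpointer:
    """Uncoordinated state saving: each device checkpoints on its own schedule.

    On recovery, a checkpoint is only reused if it is still consistent
    with the physical world, approximated here by a freshness window:
    a stale checkpoint of, say, a room temperature is worse than none.
    """
    def __init__(self, validity_s):
        self.validity_s = validity_s
        self.checkpoints = []             # list of (timestamp, state)

    def save(self, state):
        self.checkpoints.append((time.monotonic(), state))

    def recover(self):
        """Latest checkpoint still within the validity window, else None."""
        now = time.monotonic()
        for ts, state in reversed(self.checkpoints):
            if now - ts <= self.validity_s:
                return state
        return None                       # too stale: re-sense the environment

cp = Checkpointer(validity_s=30.0)
cp.save({"lamp": "on", "target_temp": 21.0})
print(cp.recover())                       # recent enough, so state is reused
```

Because saving is uncoordinated, no global synchronization is required when devices join and leave, which matches the dynamic-infrastructure constraint above.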
As an information hub for various trades and professions in the era of big data, the cloud data center bears the responsibility of providing uninterrupted service. To cope with the impact of failures and interruptions on Quality of Service (QoS) during operation, it is important to guarantee the resilience of the cloud data center. Thus, different resilience actions are conducted over its life cycle, together constituting a resilience strategy. To measure the effect of a resilience strategy on system resilience, this paper proposes a new approach to model and evaluate resilience strategies for cloud data centers, focusing on the core of service provision: the IT architecture. A comprehensive resilience metric based on resilience loss is put forward, considering the characteristics of the cloud data center, and a mapping model between system resilience and resilience strategy is built. Then, based on a hierarchical colored generalized stochastic Petri net (HCGSPN) model depicting how the system processes service requests, simulation is conducted to evaluate the resilience strategy through the metric calculation. A case study of a company's cloud data center demonstrates the applicability and correctness of the approach.
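The abstract does not spell out its resilience-loss metric, but a common formulation in the resilience literature, which such a metric plausibly adapts, integrates the gap between the target QoS and the QoS actually delivered over the disruption window:

$$\mathrm{RL} = \int_{t_0}^{t_r} \left[ Q_{\text{target}} - Q(t) \right] \, dt$$

where $Q(t)$ is the delivered QoS at time $t$, $t_0$ is the onset of the disturbance, and $t_r$ is the time of full recovery. A smaller resilience loss $\mathrm{RL}$, i.e., a shallower and shorter degradation, indicates a more effective resilience strategy; the paper itself should be consulted for the exact definition used.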
As robotic capabilities improve and robots become more capable as team members, a better understanding of effective human-robot teaming is needed. In this paper, we investigate failures by robots in various team configurations in space EVA operations. The paper describes how we extend Work Models that Compute (WMC), a computational simulation framework, and apply it to model robot failures, interruptions, and the resolutions they require. Using these models, we investigate how different team configurations respond to a robot's failure to correctly complete the task and the overall mission. We also identify key factors that impact the teamwork metrics, for team designers to keep in mind while assembling teams and assigning taskwork to the agents. We highlight the team-performance metrics that these failures affect through the varying components of teaming and interaction involved. Finally, we discuss the implications of this work and the future work needed to investigate function allocation in human-robot teams.
Wireless sensor networks have attracted substantial research interest in recent years because of unique features such as fault tolerance and autonomous operation. Maximizing coverage under resource scarcity is a crucial problem in wireless sensor networks, and approaches that address it while maximizing network lifetime are considered prominent. Node scheduling is one such mechanism. A scheduling strategy that addresses the target coverage problem based on coverage probability and trust values was proposed in the Energy Efficient Coverage Protocol (EECP). In this paper, optimized decision rules for determining the number of active nodes are obtained using rough set theory. The results show that the proposed extension yields fewer decision rules to consider when determining node states in the network; it therefore improves network efficiency by reducing the number of packets transmitted and the associated overhead.
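To make the rough-set step concrete, this hypothetical sketch brute-forces a reduct: a minimal subset of condition attributes that preserves the node-state decision of a decision table. The table, its attributes (coverage, trust, energy), and its values are invented for illustration and are not from the paper.

```python
from itertools import combinations

# Hypothetical decision table: (coverage, trust, energy) -> node state
table = [
    (("high", "high", "high"), "active"),
    (("high", "low",  "high"), "sleep"),
    (("low",  "high", "high"), "sleep"),
    (("high", "high", "low"),  "active"),
    (("low",  "low",  "low"),  "sleep"),
]
attrs = ["coverage", "trust", "energy"]

def consistent(subset):
    """True if rows that agree on `subset` never disagree on the decision."""
    seen = {}
    for cond, decision in table:
        key = tuple(cond[i] for i in subset)
        if seen.setdefault(key, decision) != decision:
            return False
    return True

# The smallest attribute subsets preserving the decision are the reducts.
for r in range(1, len(attrs) + 1):
    reducts = [c for c in combinations(range(len(attrs)), r) if consistent(c)]
    if reducts:
        print([[attrs[i] for i in c] for c in reducts])  # [['coverage', 'trust']]
        break
```

Here the reduct shows that the energy attribute is redundant for this table, so fewer rules (over fewer attributes) suffice to decide node states, which is the kind of rule reduction the abstract credits for the efficiency gain.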
The evolution of convolutional neural networks (CNNs) into more complex forms of organization, with additional layers, larger convolutions and increasing connections, has established the state of the art in error rates for detection and classification challenges on images. Moreover, as they have evolved to a point where Gigabytes of memory are required for their operation, we have reached a stage where it becomes fundamental to understand how their inference capabilities can be impaired if data elements somehow become corrupted in memory. This paper introduces fault injection in these systems by simulating failing bit-cells in hardware memories, brought on by relaxing the assumption of 100% reliable operation. We analyze the behavior of these networks under severe fault-injection rates and apply fault mitigation strategies to improve the CNNs' resilience. For the MNIST dataset, we show that 8x less memory is required for the feature-map memory space, and that in sub-100% reliable operation, fault-injection rates up to 10^-1 can be withstood (with most-significant-bit protection) at the cost of only a 1% degradation in error probability. Furthermore, considering the offload of the feature-map memory to an embedded dynamic RAM (eDRAM) system, using technology nodes from 65 down to 28 nm, improvements in power efficiency of 73% to 80% can be obtained.
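A minimal sketch of the kind of feature-map fault injection with most-significant-bit protection described above; the bit width, shapes, and fault rates are assumptions, and the paper's actual simulator is not reproduced here.

```python
import numpy as np

def inject_faults(fmap_q, fault_rate, protect_msb=True, bit_width=8, seed=0):
    """Flip each stored bit independently with probability `fault_rate`,
    optionally sparing the MSB (bit 7), mirroring MSB-protection schemes."""
    rng = np.random.default_rng(seed)
    faulty = fmap_q.copy()
    usable_bits = range(bit_width - 1) if protect_msb else range(bit_width)
    for bit in usable_bits:
        flips = rng.random(faulty.shape) < fault_rate   # cells flipping this bit
        faulty[flips] ^= np.uint8(1 << bit)
    return faulty

fmap = np.random.randint(0, 256, size=(32, 28, 28), dtype=np.uint8)  # toy maps
for rate in (1e-3, 1e-2, 1e-1):
    bad = inject_faults(fmap, rate)
    mean_err = np.abs(fmap.astype(int) - bad.astype(int)).mean()
    print(f"rate {rate:.0e}: mean absolute activation error {mean_err:.2f}")
```

Protecting only the MSB is cheap (one guarded bit per value) yet removes the largest-magnitude corruptions, which is why tolerable fault rates rise so sharply with it.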
In this paper, we propose principles of information control and sharing that support ORCON (ORiginator COntrolled access control) models while simultaneously improving the components of confidentiality, availability, and integrity needed to inherently support, when needed, responsibility-to-share policies, rapid information dissemination, data provenance, and data redaction. This new paradigm, which provides unfettered and unimpeded access to information by authorized users while making access by unauthorized users impossible, contrasts with historical approaches to information sharing that have focused on the need to know rather than the need (or responsibility) to share.
Covert operations involving clandestine dealings and communication through cryptic and hidden messages have existed since time immemorial. While these have a negative connotation, they have had their fair share of use in situations and applications beneficial to society in general. A "Dead Drop" is one such method of espionage tradecraft, used to physically exchange items or information between two individuals via a secret rendezvous point. With a "Dead Drop", to maintain operational security, the exchange itself is asynchronous. Hiding information in slack space is one modern technique that has been used extensively. Slack space is the unused space within the last block allocated to a stored file. However, hiding in slack space operates under significant constraints, with little resilience and fault tolerance. In this paper, we propose FROST, a novel asynchronous "Digital Dead Drop" robust to detection and data loss, with tunable fault tolerance. Fault tolerance is a critical attribute of a secure and robust system design. Through extensive validation of a FROST prototype implementation on Ubuntu Linux, we confirm the performance and robustness of the proposed digital dead drop against detection and data loss. We verify the recoverability of the secret message under various operating conditions, ranging from block corruption and drive defragmentation to growing existing files on the target drive.
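For background, a short sketch of the slack-space capacity calculation that such hiding schemes work within; the 4 KiB block size is an assumption (real filesystems vary), and this is not FROST's mechanism, only the constraint it improves on.

```python
import os

BLOCK_SIZE = 4096   # assumed filesystem block size

def slack_bytes(path, block_size=BLOCK_SIZE):
    """Unused bytes between end-of-file and the end of its last block.

    This is the classic slack-space capacity. Note that it shrinks to
    zero whenever the file grows to a block boundary, which is exactly
    the fragility (no fault tolerance) the abstract points out.
    """
    size = os.path.getsize(path)
    if size == 0:
        return 0
    return (block_size - size % block_size) % block_size

# Example: a 10-byte file leaves 4086 hideable bytes in a 4096-byte block
with open("demo.bin", "wb") as f:
    f.write(b"0123456789")
print(slack_bytes("demo.bin"))   # -> 4086
```

Anything hidden there is silently destroyed if the host file grows or the drive is defragmented, which motivates FROST's redundancy and tunable fault tolerance.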
This paper offers a new approach to modelling the effect of cyber-attacks on the reliability of software used in industrial control applications. The model is based on the view that successful cyber-attacks introduce failure regions that are not present in non-compromised software. The model is then extended to cover a fault-tolerant architecture, such as the 1-out-of-2 software popular for building industrial protection systems. The model is used to study the effectiveness of software maintenance policies such as patching and "cleansing" ("proactive recovery") under different adversary models, ranging from independent attacks to sophisticated synchronized attacks on the channels. We demonstrate that the effect of attacks on the reliability of diverse software depends significantly on the adversary model. Under synchronized attacks, system reliability may be more than an order of magnitude worse than under independent attacks on the channels. These findings, although not surprising, highlight the importance of using an adequate adversary model when assessing how effective various cyber-security controls are.
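To see why the adversary model dominates the result, consider an illustrative calculation (not the paper's model) for a 1-out-of-2 architecture, which fails only when both channels fail. If attacks inflate the per-channel failure probabilities to $p_A$ and $p_B$ but strike the channels independently, then roughly

$$P_{\text{sys}} \approx p_A \, p_B,$$

whereas a synchronized attack correlates the attack-induced failure regions of the two channels, pushing the joint failure probability toward the single-channel level:

$$P_{\text{sys}} \approx \min(p_A, p_B) \gg p_A \, p_B.$$

For example, with $p_A = p_B = 10^{-2}$, independence gives about $10^{-4}$ while full synchronization gives about $10^{-2}$, consistent in spirit with the order-of-magnitude gap the abstract reports.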
This paper addresses the problem of state estimation of a linear time-invariant system when some of the sensors and/or actuators are under adversarial attack. In our setup, the adversarial agent attacks a sensor (actuator) by manipulating its measurement (input), and we impose no constraint on how the measurements (inputs) are corrupted. We introduce the notion of "sparse strong observability" to characterize systems for which state estimation is possible, given bounds on the number of attacked sensors and actuators. Furthermore, we develop a secure state estimator based on Satisfiability Modulo Theory (SMT) solvers.
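The paper's estimator is SMT-based; as a conceptual stand-in, this sketch brute-forces subsets of presumed-healthy sensors for a static estimation problem and accepts a state that explains them exactly. It conveys the combinatorial search that SMT solvers make efficient; uniqueness of the estimate in general requires the sparse-strong-observability condition the paper characterizes.

```python
from itertools import combinations
import numpy as np

def secure_estimate(C, y, max_attacked, tol=1e-6):
    """Brute-force analogue of SMT-based secure state estimation.

    C: (p x n) output matrix; y: (p,) measurements; at most `max_attacked`
    sensors are corrupted arbitrarily. Try every candidate set of healthy
    sensors and keep a state explaining those rows exactly (up to tol).
    """
    p, n = C.shape
    for healthy in combinations(range(p), p - max_attacked):
        Ch, yh = C[list(healthy)], y[list(healthy)]
        x, *_ = np.linalg.lstsq(Ch, yh, rcond=None)
        if np.linalg.norm(Ch @ x - yh) < tol:      # rows mutually consistent
            return x, set(range(p)) - set(healthy)  # estimate + suspects
    return None, None

# 4 sensors observing a 2-dim state; sensor 2 is attacked
x_true = np.array([1.0, -2.0])
C = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
y = C @ x_true
y[2] += 5.0                                        # arbitrary corruption
x_hat, suspects = secure_estimate(C, y, max_attacked=1)
print(x_hat, suspects)                             # ~[1, -2], {2}
```

The exponential number of candidate subsets is precisely why encoding the search as an SMT problem, as the paper does, pays off.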
Mobility and multihoming have become the norm in Internet access, e.g., smartphones with Wi-Fi and LTE, and connected vehicles with LTE and DSRC links that change rapidly. Mobility creates challenges for active session continuity when provider-aggregatable locators are used, while multihoming brings opportunities for improving resiliency and allocative efficiency. This paper proposes a novel migration protocol in the context of the eXpressive Internet Architecture (XIA): the XIA Migration Protocol. We compare it with Mobile IPv6 with respect to handoff latency and overhead, flow migration support, and defense against spoofing and replay of protocol messages. Handoff latencies of the XIA Migration Protocol and Mobile IPv6 Enhanced Route Optimization are comparable, and neither protocol opens up avenues for spoofing or replay attacks. However, XIA requires no mobility anchor point to support client mobility, while Mobile IPv6 always depends on a home agent. We show that XIA has a significant advantage over IPv6 for multihomed hosts and networks in terms of resiliency, scalability, load balancing and allocative efficiency. IPv6 multihoming solutions either forgo scalability (BGP-based) or sacrifice resiliency (NAT-based), while XIA's fallback-based multihoming provides fault tolerance without a heavyweight protocol. XIA also allows fine-grained incoming load balancing and QoS matching by supporting flow migration. Flow migration is not possible with Mobile IPv6 when a single IPv6 address is associated with multiple flows. From a protocol design and architectural perspective, the key enablers of these benefits are flow-level migration, XIA's DAG-based locators and self-certifying identifiers.
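A minimal sketch of the fallback idea behind DAG-based locators: an address is a directed acyclic graph whose out-edges are tried in priority order, so a dead primary path degrades to the fallback instead of breaking the session. The graph, node names, and reachability model here are invented for illustration and greatly simplify XIA's actual forwarding.

```python
def next_hop(dag, node, reachable):
    """Pick the first usable out-edge of `node`; later edges are fallbacks."""
    for neighbor in dag[node]:           # out-edges listed in priority order
        if neighbor in reachable:
            return neighbor
    return None                          # no route at all

# A toy DAG locator: prefer the LTE provider, fall back to the DSRC roadside unit
dag = {
    "src":      ["lte_gw", "dsrc_rsu"],  # primary first, fallback second
    "lte_gw":   ["dst"],
    "dsrc_rsu": ["dst"],
    "dst":      [],
}
print(next_hop(dag, "src", reachable={"lte_gw", "dsrc_rsu", "dst"}))  # lte_gw
print(next_hop(dag, "src", reachable={"dsrc_rsu", "dst"}))            # dsrc_rsu
```

Because the fallback lives inside the locator itself, failover needs no BGP-scale routing state and no NAT middlebox, which is the resiliency/scalability trade-off the abstract contrasts with IPv6 multihoming.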
A novel method, consisting of fault detection, rough set generation, element isolation and parameter estimation, is presented for multiple-fault diagnosis on analog circuits with tolerance. Firstly, a linear-programming concept is developed to transform fault detection of a circuit with limited accessible terminals into checking whether a feasible solution exists under tolerance constraints. Secondly, a fault characteristic equation is deduced to generate a fault rough set. It is proved that the node voltages of the nominal circuit can be used in the fault characteristic equation under fault tolerance. Lastly, fault detection of the circuit with a revised deviation restriction for suspected fault elements is performed to locate faulty elements and estimate their parameters. The diagnosis accuracy and parameter identification precision of the method are verified by simulation results.
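A minimal sketch of the feasibility-check idea in the first step: fault detection reduces to asking whether the measured node-voltage deviations are explainable by component deviations within tolerance, posed as a linear program with a zero objective. The sensitivity matrix, tolerance value, and measurements below are hypothetical; scipy is used for the solve.

```python
import numpy as np
from scipy.optimize import linprog

def fault_free_feasible(A, v_meas, tol=0.05):
    """Is there a parameter deviation d with |d_i| <= tol such that
    A @ d matches the measured node-voltage deviations? If the LP is
    infeasible, the deviations exceed tolerance: a fault is detected."""
    n = A.shape[1]
    # Feasibility only: zero objective, equality constraints, box bounds.
    res = linprog(c=np.zeros(n), A_eq=A, b_eq=v_meas,
                  bounds=[(-tol, tol)] * n, method="highs")
    return res.status == 0               # 0 = solved, i.e., feasible

# Toy sensitivity matrix mapping 3 component deviations to 2 node voltages
A = np.array([[0.8, 0.1, 0.2],
              [0.3, 0.9, 0.1]])
print(fault_free_feasible(A, np.array([0.02, -0.01])))  # True: within tolerance
print(fault_free_feasible(A, np.array([0.50,  0.40])))  # False: fault flagged
```

An infeasible LP means no in-tolerance assignment of component deviations explains the measurements, so at least one element must lie outside its tolerance band, triggering the rough-set isolation step.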