Biblio

Filters: Keyword is self-healing
2022-04-20
Ratasich, Denise, Khalid, Faiq, Geissler, Florian, Grosu, Radu, Shafique, Muhammad, Bartocci, Ezio.  2019.  A Roadmap Toward the Resilient Internet of Things for Cyber-Physical Systems. IEEE Access. 7:13260–13283.
The Internet of Things (IoT) is a ubiquitous system connecting many different devices - the things - which can be accessed from a distance. The cyber-physical systems (CPSs) monitor and control the things from a distance. As a result, the concepts of dependability and security get deeply intertwined. The increasing level of dynamicity, heterogeneity, and complexity adds to the system's vulnerability, and challenges its ability to react to faults. This paper summarizes the state of the art of existing work on anomaly detection, fault-tolerance, and self-healing, and adds a number of other methods applicable to achieve resilience in an IoT. We particularly focus on non-intrusive methods ensuring data integrity in the network. Furthermore, this paper presents the main challenges in building a resilient IoT for the CPS, which is crucial in the era of smart CPS with enhanced connectivity (an excellent example of such a system is connected autonomous vehicles). It further summarizes our solutions, work-in-progress and future work on this topic to enable "Trustworthy IoT for CPS". Finally, this framework is illustrated on a selected use case: a smart sensor infrastructure in the transport domain.
Conference Name: IEEE Access
2021-11-29
Lyons, D., Zahra, S..  2020.  Using Taint Analysis and Reinforcement Learning (TARL) to Repair Autonomous Robot Software. 2020 IEEE Security and Privacy Workshops (SPW). :181–184.
It is important to be able to establish formal performance bounds for autonomous systems. However, formal verification techniques require a model of the environment in which the system operates; a challenge for autonomous systems, especially those expected to operate over longer timescales. This paper describes work in progress to automate the monitoring and repair of ROS-based autonomous robot software written for an a priori partially known and possibly incorrect environment model. A taint analysis method is used to automatically extract the dataflow sequence from input topic to publish topic, and instrument that code. A unique reinforcement learning approximation of MDP utility is calculated, an empirical and non-invasive characterization of the inherent objectives of the software designers. By comparing design (a priori) utility with deploy (deployed system) utility, we show, using a small but real ROS example, that it is possible to monitor a performance criterion and relate violations of the criterion to parts of the software. The software is then patched using automated software repair techniques and evaluated against the original off-line utility.
2021-03-22
Li, C.-Y., Chang, C.-H., Lu, D.-Y..  2020.  Full-Duplex Self-Recovery Optical Fibre Transport System Based on a Passive Single-Line Bidirectional Optical Add/Drop Multiplexer. IEEE Photonics Journal. 12:1–10.
A full-duplex self-recovery optical fibre transport system is proposed on the basis of a novel passive single-line bidirectional optical add/drop multiplexer (SBOADM). This system aims to achieve an access network with low complexity and network protection capability. Polarisation division multiplexing technique, optical double-frequency application and wavelength reuse method are also employed in the transport system to improve wavelength utilisation efficiency and achieve colourless optical network unit. When the network comprises a hybrid tree-ring topology, the downstream signals can be bidirectionally transmitted and the upstream signals can continuously be sent back to the central office in the reverse pathways due to the remarkable routing function of the SBOADM. Thus, no complicated optical multiplexer/de-multiplexer components or massive optical switches are required in the transport system. If a fibre link failure occurs in the ring topology, then the blocked network connections can be recovered by switching only a single optical switch preinstalled in the remote node. Simulation results show that the proposed architecture can recover the network function effectively and provide identical transmission performance to overcome the impact of a breakpoint in the network. The proposed transport system presents remarkable flexibility and convenience in expandability and breakpoint self-recovery.
2020-02-17
Zhao, Guowei, Zhao, Rui, Wang, Qiang, Xue, Hui, Luo, Fang.  2019.  Virtual Network Mapping Algorithm for Self-Healing of Distribution Network. 2019 IEEE 3rd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC). :1442–1445.
This paper focuses on how to provide virtual networks (VNs) with survivability against node failure. In survivable virtual network embedding (SVNE) that responds to node failures, the backup mechanism provided by the VN initial mapping method should be as flexible as possible, so that backup resources can be shared among the VNs, thereby providing survivability support for the most VNs with the least backup overhead. This can improve the utilization of backup resources and also improve the survivability of VNs in dealing with multi-node failures. The remapping method for virtual networks needs a higher remapping efficiency, because it involves remapping both virtual nodes and the related virtual links, so as to restore the affected VN to a normal state as soon as possible and avoid affecting the user's business experience. Considering that the SVNE method that actively responds to node failures always has a certain degree of backup resource-specific phenomenon, this section provides an SVNE method that passively responds to node failures. This paper mainly introduces the survivability virtual network initial mapping method based on physical node recoverability in this method.
Maykot, Arthur S., Aranha Neto, Edison A. C., Oliva, Neimar A..  2019.  Automation of Manual Switches in Distribution Networks Focused on Self-Healing: A Step toward Smart Grids. 2019 IEEE PES Innovative Smart Grid Technologies Conference - Latin America (ISGT Latin America). :1–4.
This work describes self-healing systems and their benefits in power distribution networks, with the objective of indicating which manual switches should, as a matter of priority, become automatic. The computational tool used is based on graph theory, genetic algorithms and multicriteria evaluation. There are benefits for consumers, who will benefit from a more reliable and stable system, and for the utility, which can reduce field team costs and the financial compensation paid to consumers in case of continuity index violations. Data from a real distribution network in the state of Sao Paulo will be used as a case study for the application of the methodology.
Leite, Leonardo H. M., do Couto Boaventura, Wallace, de Errico, Luciano, Machado Alessi, Pedro.  2019.  Self-Healing in Distribution Grids Supported by Photovoltaic Dispersed Generation in a Voltage Regulation Perspective. 2019 IEEE PES Innovative Smart Grid Technologies Conference - Latin America (ISGT Latin America). :1–6.
Distributed Generation Photovoltaic (DGPV) systems connected to the power distribution grid through electronic inverters can contribute, in an aggregate scenario, to the performance of several power system control functions, notably in self-healing and voltage regulation along a distribution feeder. This paper proposes the use of an optimization method for voltage regulation, focused on reactive power injection control, based on a comprehensive architecture model that coordinates multiple photovoltaic distributed sources to support grid reconfiguration after self-healing action. A sensitivity analysis regarding the performance of voltage regulation, based on a co-simulation of PSCAD and MATLAB, shows the effectiveness of using dispersed generation sources to assist grid reconfiguration after disturbances caused by severe faults.
Khalil, Kasem, Eldash, Omar, Kumar, Ashok, Bayoumi, Magdy.  2019.  Self-Healing Approach for Hardware Neural Network Architecture. 2019 IEEE 62nd International Midwest Symposium on Circuits and Systems (MWSCAS). :622–625.
Neural networks are used in many applications and guarding their performance against faults is a research challenge. A self-healing neural network is a promising concept for achieving reliability, which is the ability to detect and fix a fault in the system automatically. Most of the current self-healing neural networks are based on replication of hardware nodes, which causes significant area overhead. The proposed self-healing approach results in a modest area overhead and is suitable for complex neural networks. The proposed method is based on a shared operation and a spare node in each layer which compensates for any faulty node in the layer. Each faulty node will be compensated by its neighbor node, and the neighbor node performs the faulty node's operations as well as its own sequentially. In case the neighbor is faulty, the spare node will compensate for it. The proposed method is implemented using VHDL and the simulation results are obtained using Altira 10 GX FPGA for a different number of nodes. The area overhead is very small for a complex network. The reliability of the proposed method is studied and compared with the traditional neural network.
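The compensation rule summarized in this abstract can be illustrated with a minimal Python sketch. It is not the authors' VHDL design. The function name `assign_operations`, the choice of the next index (wrapping around) as the "neighbor", and the limit of one compensated node per spare are all assumptions made only for illustration.

```python
from typing import List, Optional, Union

SPARE = "spare"  # hypothetical label for the single spare node of a layer


def assign_operations(faulty: List[bool],
                      spare_ok: bool = True) -> List[Optional[Union[int, str]]]:
    """For each node in a layer, return which unit performs its operation.

    Assumptions (not taken from the paper):
    - the neighbor of node i is node (i + 1) % n;
    - the spare can stand in for at most one faulty node.

    A healthy node runs its own operation. A faulty node is covered by its
    neighbor, which runs both operations one after the other. If the neighbor
    is faulty too, the spare node takes over. None means "not covered".
    """
    n = len(faulty)
    executor: List[Optional[Union[int, str]]] = []
    spare_used = False
    for i in range(n):
        if not faulty[i]:
            executor.append(i)          # healthy: computes its own operation
            continue
        neighbor = (i + 1) % n
        if n > 1 and not faulty[neighbor]:
            executor.append(neighbor)   # neighbor shares its hardware
        elif spare_ok and not spare_used:
            executor.append(SPARE)      # fall back to the layer's spare node
            spare_used = True
        else:
            executor.append(None)       # no remaining resource
    return executor


if __name__ == "__main__":
    # Node 1 is faulty, so node 2 performs both operations.
    print(assign_operations([False, True, False, False]))
```

In this sketch a single fault costs only extra computation time on the neighbor, which mirrors the abstract's point that sharing operations avoids replicating every hardware node.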
Firdaus, Muhammad, Haryadi, Sigit, Shalannanda, Wervyan.  2019.  Sleeping Cell Analysis in LTE Network with Self-Healing Approach. 2019 IEEE 13th International Conference on Telecommunication Systems, Services, and Applications (TSSA). :261–266.
In cellular network systems, it is commonly found that many errors or failures are caused by non-functioning components or human errors. Most failures are detected by a centralized Operation and Maintenance (OAM) software which will trigger an alarm as a form of warning. In fact, there are conditions when a failure or error occurs, but it cannot be detected by OAM software, which in turn will result in many complaints coming from customers. An event like this is called a sleeping cell, which is a condition where the network has a poor performance but does not generate alarm notifications in the Operation and Maintenance Center. In this paper, sleeping cell analysis was carried out on the LTE network using a self-healing approach to speed up the cell outage detection process. The process of sleeping cell analysis was based on the database of daily cell performance for all eNodeBs located in West Java, referring to the uplink and downlink values as the main parameters. The acquired database would then be processed and analyzed by a measurement method based on inferential statistics, where this method would process a portion of the research data (sample) to draw conclusions regarding the characteristics of the overall data population. Furthermore, data analysis was performed with a signaling ladder diagram (SLD) approach to observe the signaling flow on the network, specifically in the uplink and downlink process, which is the initial indication of a sleeping cell.
Broomandi, Fateme, Ghasemi, Abdorasoul.  2019.  An Improved Cooperative Cell Outage Detection in Self-Healing Het Nets Using Optimal Cooperative Range. 2019 27th Iranian Conference on Electrical Engineering (ICEE). :1956–1960.
Heterogeneous Networks (Het Nets) are introduced to fulfill the increasing demands of wireless communications. To be manageable, it is expected that these networks are self-organized and, in particular, self-healing to detect and relieve faults autonomously. In the Cooperative Cell Outage Detection (COD), the Macro-Base Station (MBS) and a group of Femto-Base Stations (FBSs) in a specific range are cooperatively communicating to find out if each FBS is working properly or not. In this paper, we discuss the impacts of the cooperation range on the detection delay and accuracy and then conclude that there is an optimal value of the cooperation range which maximizes detection accuracy. We then derive the optimal cooperative range that improves the detection accuracy by using network parameters such as the FBS's transmission power, noise power, shadowing fading factor, and path-loss exponent, and investigate the impacts of these parameters on the optimal cooperative range. The simulation results show that the optimal cooperative range that we proposed maximizes the detection accuracy.
2019-12-09
Tomić, Ivana, Chen, Po-Yu, Breza, Michael J., McCann, Julie A..  2018.  Antilizer: Run Time Self-Healing Security for Wireless Sensor Networks. Proceedings of the 15th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services. :107–116.
Wireless Sensor Network (WSN) applications range from domestic Internet of Things systems like temperature monitoring of homes to the monitoring and control of large-scale critical infrastructures. The greatest risk with the use of WSNs in critical infrastructure is their vulnerability to malicious network level attacks. Their radio communication network can be disrupted, causing them to lose or delay data which will compromise system functionality. This paper presents Antilizer, a lightweight, fully-distributed solution to enable WSNs to detect and recover from common network level attack scenarios. In Antilizer each sensor node builds a self-referenced trust model of its neighbourhood using network overhearing. The node uses the trust model to autonomously adapt its communication decisions. In the case of a network attack, a node can make neighbour collaboration routing decisions to avoid affected regions of the network. Mobile agents further bound the damage caused by attacks. These agents enable a simple notification scheme which propagates collaborative decisions from the nodes to the base station. A filtering mechanism at the base station further validates the authenticity of the information shared by mobile agents. We evaluate Antilizer in simulation against several routing attacks. Our results show that Antilizer reduces data loss down to 1% (4% on average), with operational overheads of less than 1% and provides fast network-wide convergence.
2019-03-25
Refaat, S. S., Mohamed, A., Kakosimos, P..  2018.  Self-Healing control strategy; Challenges and opportunities for distribution systems in smart grid. 2018 IEEE 12th International Conference on Compatibility, Power Electronics and Power Engineering (CPE-POWERENG 2018). :1–6.
Implementation of a self-healing control system in the smart grid is a persisting challenge. A self-healing control strategy is an important guarantee for implementing the smart grid. In addition, it supports secure operation, improves the reliability and security of the distribution grid, and helps realize the smart distribution grid. Although the self-healing control system concept has been presented in the smart grid context, the complexity of the distribution network structure calls for an advanced control and protection system with self-healing capability. This system must be able to heal any disturbance in the distribution system of the smart grid to improve its efficiency, resiliency, continuity, and reliability. This review focuses mostly on the key technology of self-healing control, gives an insight into the role of self-healing in distribution systems and its advantages, and studies the challenges and opportunities for utilities. The main contribution of this paper is demonstrating a proposed architecture and control strategy for a self-healing control system that includes fault detection, fault localization, faulted area isolation, and power restoration in the electrical distribution system.
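The four stages named at the end of this abstract (fault detection, fault localization, faulted area isolation, and power restoration) can be sketched on a toy radial feeder. This is a simplified illustration and not the architecture proposed in the paper. The function `self_heal`, the single-line feeder with the substation at section 0, the per-section fault indicators, and the normally open tie switch at the far end are all hypothetical assumptions.

```python
from typing import Dict, List, Optional, Union

Result = Dict[str, Union[Optional[int], List[int]]]


def self_heal(indicators: List[bool], has_tie: bool = True) -> Result:
    """Toy self-healing sequence on a radial feeder (illustrative only).

    indicators[i] is True when the fault indicator at the head of section i
    saw fault current, i.e. the fault lies in section i or downstream of it.
    Section 0 is next to the substation. A normally open tie switch at the
    far end (assumed) can feed the tail of the feeder from another source.
    """
    n = len(indicators)
    # 1. Detection: any indicator that saw fault current signals a fault.
    if not any(indicators):
        return {"fault_section": None, "served_by_main": list(range(n)),
                "served_by_tie": [], "out_of_service": []}
    # 2. Localization: the farthest-downstream indicator that tripped.
    fault = max(i for i, seen in enumerate(indicators) if seen)
    # 3. Isolation: open the switches on both sides of the faulted section.
    upstream = list(range(fault))
    downstream = list(range(fault + 1, n))
    # 4. Restoration: upstream stays on the main source; downstream is
    #    picked up through the tie switch when one is available.
    if has_tie:
        return {"fault_section": fault, "served_by_main": upstream,
                "served_by_tie": downstream, "out_of_service": [fault]}
    return {"fault_section": fault, "served_by_main": upstream,
            "served_by_tie": [], "out_of_service": [fault] + downstream}


if __name__ == "__main__":
    # Indicators 0 and 1 tripped, so the fault is located in section 1.
    print(self_heal([True, True, False, False]))
```

A real distribution system would also have to check voltage and loading limits before closing the tie switch; the sketch leaves those checks out on purpose.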
Ali-Tolppa, J., Kocsis, S., Schultz, B., Bodrog, L., Kajo, M..  2018.  SELF-HEALING AND RESILIENCE IN FUTURE 5G COGNITIVE AUTONOMOUS NETWORKS. 2018 ITU Kaleidoscope: Machine Learning for a 5G Future (ITU K). :1–8.
In the Self-Organizing Networks (SON) concept, self-healing functions are used to detect, diagnose and correct degraded states in the managed network functions or other resources. Such methods are increasingly important in future network deployments, since ultra-high reliability is one of the key requirements for the future 5G mobile networks, e.g. in critical machine-type communication. In this paper, we discuss the considerations for improving the resiliency of future cognitive autonomous mobile networks. In particular, we present an automated anomaly detection and diagnosis function for SON self-healing based on multi-dimensional statistical methods, case-based reasoning and active learning techniques. Insights from both the human expert and sophisticated machine learning methods are combined in an iterative way. Additionally, we present how a more holistic view on mobile network self-healing can improve its performance.
Pournaras, E., Ballandies, M., Acharya, D., Thapa, M., Brandt, B..  2018.  Prototyping Self-Managed Interdependent Networks - Self-Healing Synergies against Cascading Failures. 2018 IEEE/ACM 13th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS). :119–129.
The interconnection of networks between several techno-socio-economic sectors such as energy, transport, and communication, questions the manageability and resilience of the digital society. System interdependencies alter the fundamental dynamics that govern isolated systems, which can unexpectedly trigger catastrophic instabilities such as cascading failures. This paper envisions a general-purpose, yet simple prototyping of self-management software systems that can turn system interdependencies from a cause of instability to an opportunity for higher resilience. Such prototyping proves to be challenging given the highly interdisciplinary scope of interdependent networks. Different system dynamics and organizational constraints such as the distributed nature of interdependent networks or the autonomy and authority of system operators over their controlled infrastructure perplex the design for a general prototyping approach, which earlier work has not yet addressed. This paper contributes such a modular design solution implemented as an open source software extension of SFINA, the Simulation Framework for Intelligent Network Adaptations. The applicability of the software artifact is demonstrated with the introduction of a novel self-healing mechanism for interdependent power networks, which optimizes power flow exchanges between a damaged and a healer network to mitigate power cascading failures. Results show a significant decrease in the damage spread by self-healing synergies, while the degree of interconnectivity between the power networks indicates a tradeoff between links survivability and load served. The contributions of this paper aspire to bring closer several research communities working on modeling and simulation of different domains with an economic and societal impact on the resilience of real-world interdependent networks.
2018-03-05
Mfula, H., Nurminen, J. K..  2017.  Adaptive Root Cause Analysis for Self-Healing in 5G Networks. 2017 International Conference on High Performance Computing Simulation (HPCS). :136–143.

Root cause analysis (RCA) is a common and recurring task performed by operators of cellular networks. It is done mainly to keep customers satisfied with the quality of offered services and to maximize return on investment (ROI) by minimizing and, where possible, eliminating the root causes of faults in cellular networks. Currently, the actual detection and diagnosis of faults or potential faults is still a manual and slow process, often carried out by network experts who manually analyze and correlate various pieces of network data such as alarms, call traces, configuration management (CM) and key performance indicator (KPI) data in order to come up with the most probable root cause of a given network fault. In this paper, we propose an automated fault detection and diagnosis solution called adaptive root cause analysis (ARCA). The solution uses measurements and other network data together with Bayesian network theory to perform automated evidence-based RCA. Compared to the current common practice, our solution is faster due to automation of the entire RCA process. The solution is also cheaper because it needs fewer or no personnel in order to operate, and it improves efficiency through domain knowledge reuse during adaptive learning. As it uses a probabilistic Bayesian classifier, it can work with incomplete data and it can handle large datasets with complex probability combinations. Experimental results from stratified synthesized data affirmatively validate the feasibility of using such a solution as a key part of self-healing (SH), especially in emerging self-organizing network (SON) based solutions in LTE Advanced (LTE-A) and 5G.
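The claim that a probabilistic Bayesian classifier can work with incomplete data can be illustrated with a small naive Bayes sketch. Naive Bayes is a simplification swapped in here for brevity; it is not the ARCA model from the paper. The function `posterior`, the cause names, the symptom names, and all probabilities are hypothetical.

```python
from typing import Dict, Optional


def posterior(priors: Dict[str, float],
              likelihoods: Dict[str, Dict[str, float]],
              evidence: Dict[str, Optional[bool]]) -> Dict[str, float]:
    """Posterior probability of each root cause under a naive Bayes model.

    likelihoods[cause][symptom] is P(symptom present | cause).
    evidence[symptom] is True, False, or None when the measurement is
    missing. Missing symptoms are skipped, which is exact marginalization
    under the naive Bayes independence assumption.
    """
    scores: Dict[str, float] = {}
    for cause, prior in priors.items():
        p = prior
        for symptom, observed in evidence.items():
            if observed is None:
                continue  # incomplete data: leave this symptom out
            p_present = likelihoods[cause][symptom]
            p *= p_present if observed else (1.0 - p_present)
        scores[cause] = p
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("evidence is impossible under every candidate cause")
    return {cause: score / total for cause, score in scores.items()}


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    priors = {"antenna_fault": 0.5, "congestion": 0.5}
    likelihoods = {
        "antenna_fault": {"alarm_raised": 0.8, "kpi_drop_rate_high": 0.5},
        "congestion": {"alarm_raised": 0.2, "kpi_drop_rate_high": 0.5},
    }
    # The KPI measurement is missing; the alarm alone drives the result.
    evidence = {"alarm_raised": True, "kpi_drop_rate_high": None}
    print(posterior(priors, likelihoods, evidence))
```

With the alarm observed and the KPI missing, the antenna fault hypothesis receives a posterior of about 0.8 in this toy setup, since the alarm is four times more likely under it and the priors are equal.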

Khalil, K., Eldash, O., Bayoumi, M..  2017.  Self-Healing Router Architecture for Reliable Network-on-Chips. 2017 24th IEEE International Conference on Electronics, Circuits and Systems (ICECS). :330–333.

NoCs are a well-established research topic and several implementations have been proposed for self-healing. Self-healing refers to the ability of a system to detect faults or failures and fix them through healing or repairing. The main problems in current self-healing approaches are area overhead and scalability for complex structures, since they are based on redundancy and spare blocks. Also, a faulty router can isolate its PE from other router nodes, which can reduce the overall performance of the system. This paper presents a self-healing approach for a router that prevents a fault from disabling the PE function and isolating the PE from other nodes. In the proposed design, the neighbor routers receive a signal from a faulty router, which makes them send to the faulty router only those data packets destined for it. The control unit turns on switches to connect the four input ports to the local ports successively to send incoming packets to the PE. The reliability of the proposed technique is studied and compared to a conventional system with different failure rates. This approach is capable of healing 50% of the router. The area overhead is 14% for the proposed approach, which is much lower compared to other approaches using redundancy.

Nogueira, Carlos E. R., Boaventura, Wallace C., Takahashi, Ricardo H. C., Carrano, Eduardo G..  2017.  Restoration of Power Distribution Networks: A Fast Evolutionary Approach Based on Practical Perspectives. Proceedings of the Genetic and Evolutionary Computation Conference Companion. :1295–1302.

The restoration of power distribution systems has a crucial role in the electric utility environment, taking into account both the pressure experienced by the operators that must choose the corrective actions to be followed in emergency restoration plans and the goals imposed by the regulatory agencies. In this sense, decision-aiding systems and self-healing networks may be good alternatives since they either perform an automated analysis of the situation, providing consistent and high-quality restoration plans, or even directly perform the restoration fast and automatically, in both cases reducing the impacts caused by network disturbances. This work proposes a new restoration strategy which is novel in the sense that it deals with the problem from the operator viewpoint, without the simplifications that are used in most literature works. In this proposal, a permutation-based genetic algorithm is employed to restore the maximum amount of loads, in real time, without depending on a priori knowledge of the location of the fault. To validate the proposed methodology two large real systems were tested: one with 2 substations, 5 feeders, 703 buses, and 132 switches, and the other with 3 substations, 7 feeders, 21,633 buses, and 2,808 switches. These networks were tested considering situations of single and multiple failures. The results obtained were achieved with very low processing time (of the order of ten seconds), while compliance with all operational requirements was ensured.

2017-05-30
Castañeda, Armando, Dolev, Danny, Trehan, Amitabh.  2016.  Compact Routing Messages in Self-healing Trees. Proceedings of the 17th International Conference on Distributed Computing and Networking. :23:1–23:10.

Existing compact routing schemes, e.g., Thorup and Zwick [SPAA 2001] and Chechik [PODC 2013], often have no means to tolerate failures once the system has been set up and started. This paper presents, to our knowledge, the first self-healing compact routing scheme. Besides, our schemes are developed for low memory nodes, i.e., nodes need only O(log² n) memory, and are thus compact schemes. We introduce two algorithms of independent interest: The first is CompactFT, a novel compact version (using only O(log n) local memory) of the self-healing algorithm Forgiving Tree of Hayes et al. [PODC 2008]. The second algorithm (CompactFTZ) combines CompactFT with Thorup-Zwick's tree-based compact routing scheme [SPAA 2001] to produce a fully compact self-healing routing scheme. In the self-healing model, the adversary deletes nodes one at a time with the affected nodes self-healing locally by adding few edges. CompactFT recovers from each attack in only O(1) time and Δ messages, with only +3 degree increase and O(log Δ) graph diameter increase, over any sequence of deletions (Δ is the initial maximum degree). Additionally, CompactFTZ guarantees delivery of a packet sent from sender s as long as the receiver t has not been deleted, with only an additional O(y log Δ) latency, where y is the number of nodes that have been deleted on the path between s and t. If t has been deleted, s gets informed and the packet is removed from the network.

Angarita, Rafael, Rukoz, Marta, Manouvrier, Maude, Cardinale, Yudith.  2016.  A Knowledge-based Approach for Self-healing Service-oriented Applications. Proceedings of the 8th International Conference on Management of Digital EcoSystems. :1–8.

In the context of service-oriented applications, the self-healing property provides reliable execution in order to support failures and assist automatic recovery techniques. This paper presents a knowledge-based approach for self-healing Composite Service (CS) applications. A CS is an application composed of a set of services interacting with each other and invoked on the Web. Our approach is supported by Service Agents, which are in charge of the CS fault-tolerance execution control, making decisions about the selection of recovery and proactive strategies. Service Agents' decisions are based on the information they have about the whole application, about themselves, and about what is expected and what is really happening at run-time. Hence, application knowledge for decision making comprises off-line precomputed global and local information, user QoS preferences, and propagated actual run-time information. Our approach is evaluated experimentally using a case study.

Azaiez, Meriem, Chainbi, Walid.  2016.  A Multi-agent System Architecture for Self-Healing Cloud Infrastructure. Proceedings of the International Conference on Internet of Things and Cloud Computing. :7:1–7:6.

The popularity of Cloud computing has considerably increased during the last years. The increase in Cloud users and their interactions with the Cloud infrastructure raises the risk of resource faults. Such a problem can lead to a bad reputation for the Cloud environment, which slows down the evolution of this technology. To address this issue, the dynamic and complex architecture of the Cloud should be taken into account. Indeed, this architecture requires that resource protection and healing be transparent and without external intervention. Unlike previous work, we suggest integrating the fundamental aspects of autonomic computing in the Cloud to deal with the self-healing of Cloud resources. Starting from the high degree of match between autonomic computing systems and multiagent systems, we propose to take advantage of the autonomous behaviour of agent technology to create an intelligent Cloud that supports autonomic aspects. Our proposed solution is a multi-agent system which interacts with the Cloud infrastructure to analyze the resource state and execute a Checkpoint/Replication strategy or migration technique to solve the problem of failed resources.

2015-05-04
Jantsch, A., Tammemae, K..  2014.  A framework of awareness for artificial subjects. Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2014 International Conference on. :1-3.

A small battery-driven bio-patch, attached to the human body and monitoring various vital signals such as temperature, humidity, heart activity, muscle and brain activity, is an example of a highly resource-constrained system that has the demanding task of correctly assessing the state of the monitored subject (healthy, normal, weak, ill, improving, worsening, etc.) and its own capabilities (attached to subject, working sensors, sufficient energy supply, etc.). These systems and many other systems would benefit from a sense of themselves and their environment to improve the robustness and sensibility of their behavior. Although we can get inspiration from fields like neuroscience, robotics, AI, and control theory, the tight resource and energy constraints imply that we have to understand accurately what technique leads to a particular feature of awareness, how it contributes to improved behavior, and how it can be implemented cost-efficiently in hardware or software. We review the concepts of environment- and self-models, semantic interpretation, semantic attribution, history, goals and expectations, prediction, and self-inspection, how they contribute to awareness and self-awareness, and how they contribute to improved robustness and sensibility of behavior.