Biblio

Filters: Author is Rui Abreu  [Clear All Filters]
2016-12-05
Paulo Casanova, David Garlan, Bradley Schmerl, Rui Abreu.  2014.  Diagnosing Unobserved Components in Self-Adaptive Systems. SEAMS 2014 Proceedings of the 9th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. :75-84.

Availability is an increasingly important quality for today's software-based systems and it has been successfully addressed by the use of closed-loop control systems in self-adaptive systems. Probes are inserted into a running system to obtain information and the information is fed to a controller that, through provided interfaces, acts on the system to alter its behavior. When a failure is detected, pinpointing the source of the failure is a critical step for a repair action. However, information obtained from a running system is commonly incomplete due to probing costs or unavailability of probes. In this paper we address the problem of fault localization in the presence of incomplete system monitoring. We may not be able to directly observe a component but we may be able to infer its health state. We provide formal criteria to determine when health states of unobservable components can be inferred and establish formal theoretical bounds for accuracy when using any spectrum-based fault localization algorithm.

Paulo Casanova, David Garlan, Bradley Schmerl, Rui Abreu.  2013.  Diagnosing architectural run-time failures. SEAMS '13 Proceedings of the 8th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. :103-112.

Self-diagnosis is a fundamental capability of self-adaptive systems. In order to recover from faults, systems need to know which part is responsible for the incorrect behavior. In previous work we showed how to apply a design-time diagnosis technique at run time to identify faults at the architectural level of a system. Our contributions address three major shortcomings of our previous work: 1) we present an expressive, hierarchical language to describe system behavior that can be used to diagnose when a system is behaving different to expectation; the hierarchical language facilitates mapping low level system events to architecture level events; 2) we provide an automatic way to determine how much data to collect before an accurate diagnosis can be produced; and 3) we develop a technique that allows the detection of correlated faults between components. Our results are validated experimentally by injecting several failures in a system and accurately diagnosing them using our algorithm.

2016-12-06
Paulo Casanova, Bradley Schmerl, David Garlan, Rui Abreu.  2011.  Architecture-Based Run-Time Fault Diagnosis. Proceedings of the 5th European Conference on Software Architecture.

An important step in achieving robustness to run-time faults is the ability to detect and repair problems when they arise in a running system. Effective fault detection and repair could be greatly enhanced by run-time fault diagnosis and localization, since it would allow the repair mechanisms to focus adaptation effort on the parts most in need of attention. In this paper we describe an approach to run-time fault diagnosis that combines architectural models with spectrum-based reasoning for multiple fault localization. Spectrum-based reasoning is a lightweight technique that takes a form of trace abstraction and produces a list (ordered by probability) of likely fault candidates. We show how this technique can be combined with architectural models to support run-time diagnosis that can (a) scale to modern distributed software systems; (b) accommodate the use of black-box components and proprietary infrastructure for which one has neither a specification nor source code; and (c) handle inherent uncertainty about the probable cause of a problem even in the face of transient faults and faults that arise only when certain combinations of system components interact.