Bibliography
The modular multilevel converter with series and parallel connectivity has been shown to provide advantages in several industrial applications. Its reliability largely depends on the absence of failures in the power semiconductors. We propose and analyze a fault-diagnosis technique to identify shorted switches based on features generated through a wavelet transform of the converter output and subsequent classification with support vector machines. The multi-class support vector machine is trained with multiple recordings of the output for each fault condition as well as for the converter under normal operation. Simulation results reveal that the proposed method achieves low classification latency and high robustness. Apart from monitoring of the output, which is required for converter control in any case, the method does not require additional module sensors.
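A minimal sketch of this kind of feature-extraction-plus-classification pipeline, assuming PyWavelets and scikit-learn are available and using wavelet subband energies of the measured output as features; the wavelet family, decomposition level, data, and fault labels are placeholders, not the paper's settings:

```python
import numpy as np
import pywt
from sklearn.svm import SVC

def wavelet_features(signal, wavelet="db4", level=4):
    """Energy of each wavelet subband of one output recording."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return np.array([np.sum(c ** 2) for c in coeffs])

# X_train: recordings of the converter output, y_train: condition labels
# (0 = healthy, 1..N = shorted-switch index); random placeholders here.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 1024))
y_train = rng.integers(0, 4, size=60)

F_train = np.vstack([wavelet_features(x) for x in X_train])
clf = SVC(kernel="rbf", C=10.0)        # multi-class SVM (one-vs-one)
clf.fit(F_train, y_train)

# Classify a new output recording
x_new = rng.normal(size=1024)
print("predicted condition:", clf.predict(wavelet_features(x_new)[None, :])[0])
```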
This paper presents a novel sensor parameter fault diagnosis method for generally multiple-input multiple-output (MIMO) affine nonlinear systems based on adaptive observer. Firstly, the affine nonlinear systems are transformed into the particular systems via diffeomorphic transformation using Lie derivative. Then, based on the techniques of high-gain observer and adaptive estimation, an adaptive observer structure is designed with simple method for jointly estimating the states and the unknown parameters in the output equation of the nonlinear systems. And an algorithm of the fault estimation is derived. The global exponential convergence of the proposed observer is proved succinctly. Also the proposed method can be applied to the fault diagnosis of generally affine nonlinear systems directly by the reversibility of aforementioned coordinate transformation. Finally, a numerical example is presented to illustrate the efficiency of the proposed fault diagnosis scheme.
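As a rough illustration of jointly estimating states and an unknown output-equation parameter, the sketch below builds an observer for a toy linear system with an unknown constant sensor bias by augmenting the state. This is a simplified linear analogue, not the paper's high-gain/Lie-derivative construction; the system matrices, observer poles, and bias value are assumptions:

```python
import numpy as np
from scipy.signal import place_poles

# Toy plant: x' = A x + B u, measured output y = C x + theta (unknown sensor bias)
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
theta_true = 0.7                       # sensor parameter fault (constant bias)

# Augmented model with z = [x; theta] and theta' = 0
Aa = np.block([[A, np.zeros((2, 1))], [np.zeros((1, 3))]])
Ca = np.hstack([C, [[1.0]]])
L = place_poles(Aa.T, Ca.T, [-3.0, -4.0, -5.0]).gain_matrix.T   # observer gain

dt, z_hat, x = 1e-3, np.zeros((3, 1)), np.array([[1.0], [0.0]])
for k in range(20000):
    u = np.array([[np.sin(1e-3 * k)]])
    y = C @ x + theta_true
    # Euler steps for plant and augmented observer
    x = x + dt * (A @ x + B @ u)
    z_hat = z_hat + dt * (Aa @ z_hat + np.vstack([B, [[0.0]]]) @ u
                          + L @ (y - Ca @ z_hat))

print("estimated sensor bias:", float(z_hat[2, 0]))   # should approach 0.7
```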
Complex systems are prevalent in many fields such as finance, security, and industry. A fundamental problem in system management is to perform diagnosis in case of system failure such that the causal anomalies, i.e., root causes, can be identified for system debugging and repair. Recently, the invariant network has proven to be a powerful tool for characterizing complex system behaviors. In an invariant network, a node represents a system component, and an edge indicates a stable interaction between two components. Recent approaches have shown that by modeling fault propagation in the invariant network, causal anomalies can be effectively discovered. Despite their success, the existing methods have a major limitation: they typically assume there is only a single, global fault propagation in the entire network. However, in real-world large-scale complex systems, it is more common for multiple fault propagations to grow simultaneously and locally within different node clusters and to jointly define the system failure status. Inspired by this key observation, we propose a two-phase framework to identify and rank causal anomalies. In the first phase, probabilistic clustering is performed to uncover impaired node clusters in the invariant network. Then, in the second phase, a low-rank network diffusion model is designed to backtrack causal anomalies in the different impaired clusters. Extensive experimental results on real-life datasets demonstrate the effectiveness of our method.
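The idea of back-tracking anomalies by propagation over the network can be illustrated with a generic random-walk-with-restart smoothing of per-node broken-invariant scores. This is a simplified, single-cluster sketch with an invented adjacency matrix and scores, not the paper's low-rank, multi-cluster formulation:

```python
import numpy as np

# Invariant network over 5 components (symmetric adjacency, illustrative)
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0],
              [1, 1, 0, 0, 0],
              [0, 1, 0, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
e = np.array([0.1, 0.9, 0.2, 0.8, 0.7])     # broken-invariant score per node

# Row-normalized transition matrix and random-walk-with-restart propagation
W = A / A.sum(axis=1, keepdims=True)
c = 0.5                                      # propagation / restart trade-off
r = (1 - c) * np.linalg.solve(np.eye(5) - c * W.T, e)

ranking = np.argsort(-r)
print("causal-anomaly ranking (most to least suspicious):", ranking)
```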
As a consequence of the recent development of situational awareness technologies for smart grids, the gathering and analysis of data from multiple sources offer a significant opportunity for enhanced fault diagnosis. In order to achieve improved accuracy for both fault detection and classification, a novel combined data analytics technique is presented and demonstrated in this paper. The proposed technique is based on a segmented approach to Bayesian modelling that provides probabilistic graphical representations of both the electrical power and data communication networks. In this manner, the reliability of both the data communication and electrical power networks is considered in order to improve overall power system transmission line fault diagnosis.
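A toy illustration of combining evidence from the electrical and communication sides in one probabilistic model: a two-variable Bayes-rule calculation in plain Python, with all probabilities invented for the example (the paper's segmented Bayesian model is far richer):

```python
# Hypothesis: line fault F in {True, False}. Evidence: a protection alarm was
# received (electrical side), weighted by how reliable the reporting comm link is.
p_fault = 0.02                               # prior probability of a line fault
p_alarm_given = {True: 0.95, False: 0.03}    # P(alarm | F), includes false alarms
p_link_ok = 0.90                             # P(comm link healthy), from monitoring

def posterior_fault(alarm_received: bool) -> float:
    """P(F | alarm evidence), discounting the alarm by comm-link reliability."""
    post = {}
    for f in (True, False):
        prior = p_fault if f else 1 - p_fault
        like = p_alarm_given[f] if alarm_received else 1 - p_alarm_given[f]
        # If the link may be down, the alarm carries less information:
        like = p_link_ok * like + (1 - p_link_ok) * 0.5
        post[f] = prior * like
    return post[True] / (post[True] + post[False])

print("P(fault | alarm) =", round(posterior_fault(True), 3))
```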
In the Energy Internet context, a large volume of alarm information is generated when equipment exceptions and multiple faults occur in a large power grid, which seriously hampers information collection and fault analysis and delays accident handling by monitoring operators. To address this, this paper proposes a method for power grid monitoring that monitors and diagnoses faults in real time. It constructs an equipment fault logical model based on five-section alarm information, builds a standard fault information set, and uses a matching algorithm to realize fault information optimization, fault equipment location, fault type diagnosis, and analysis of false-report and missing-report messages. The validity and practicality of the proposed method are verified with an actual case; the method can shorten the time needed to obtain and analyze fault information, accelerate accident handling, and help ensure the safe and stable operation of the power grid.
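The matching step can be pictured as comparing the received alarm set against standard alarm templates per fault hypothesis, ranking templates by overlap, and reporting missing-report and false-report alarms. A minimal set-based sketch with invented alarm and fault names:

```python
# Standard fault information set: expected alarms per fault hypothesis (illustrative)
templates = {
    "bus fault at S1":      {"breaker_S1_trip", "bus_diff_protection", "SOE_S1"},
    "line fault S1-S2":     {"breaker_S1_trip", "breaker_S2_trip", "line_diff_protection"},
    "transformer fault T1": {"breaker_T1_trip", "gas_relay_T1", "transformer_diff_T1"},
}

received = {"breaker_S1_trip", "line_diff_protection", "SOE_S1"}

def match(received_alarms):
    scored = []
    for fault, expected in templates.items():
        overlap = len(received_alarms & expected) / len(received_alarms | expected)
        scored.append((overlap, fault, expected))
    overlap, fault, expected = max(scored)              # best-matching template
    return fault, expected - received_alarms, received_alarms - expected

fault, missing, false_report = match(received)
print("diagnosed fault:", fault)
print("missing-report alarms:", missing)
print("false-report alarms:", false_report)
```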
Transmission line monitoring systems produce a large amount of data that hinders fault diagnosis. For this reason, approaches that can acquire and automatically interpret the information coming from line monitoring are needed. Furthermore, human errors stemming from operator-dependent real-time decisions need to be reduced. In this paper, a multiple-fault diagnosis method to determine transmission lines' operating conditions is proposed. Different scenarios, including insulator chain contamination with different types and concentrations of pollutants, were modeled by equivalent circuits. Their performance was characterized by leakage current (LC) measurements and related to specific fault modes. A feature extraction algorithm relying on the differences between normal and faulty conditions was used to define qualitative trends for the diagnosis of the various fault modes.
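A sketch of extracting simple leakage-current features (RMS, peak, and a third-harmonic ratio via FFT) and applying qualitative thresholds; the waveform, sampling setup, thresholds, and condition labels are placeholders, not the paper's values:

```python
import numpy as np

fs = 5000                                   # sampling frequency (Hz), assumed
t = np.arange(0, 1.0, 1 / fs)
# Placeholder leakage-current waveform: 50 Hz component plus a distorted harmonic
lc = 0.8e-3 * np.sin(2 * np.pi * 50 * t) + 0.3e-3 * np.sin(2 * np.pi * 150 * t)

def lc_features(x):
    spectrum = np.abs(np.fft.rfft(x)) / len(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    fund = spectrum[np.argmin(np.abs(freqs - 50))]
    third = spectrum[np.argmin(np.abs(freqs - 150))]
    return {"rms": np.sqrt(np.mean(x ** 2)),
            "peak": np.max(np.abs(x)),
            "third_harmonic_ratio": third / fund}

def qualitative_trend(f):
    if f["rms"] > 1e-3 and f["third_harmonic_ratio"] > 0.3:
        return "heavy contamination / pre-flashover"
    if f["rms"] > 0.3e-3:
        return "moderate contamination"
    return "normal"

feats = lc_features(lc)
print(feats, "->", qualitative_trend(feats))
```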
This paper presents a solution to a multiple-model based stochastic active fault diagnosis problem over the infinite-time horizon. A general additive detection cost criterion is considered to reflect the objectives. Since the system state is unknown, the design consists of a perfect state information reformulation and an optimization problem solution by approximate dynamic programming. An adaptive particle filter state estimation algorithm based on the effective sample size is proposed to maintain the estimate quality while reducing computational costs. A reduction of the information statistics of the state is carried out using non-resampled particles to make the solution feasible. Simulation results illustrate the effectiveness of the proposed design.
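The adaptive particle filtering step can be sketched as a bootstrap filter that resamples only when the effective sample size drops below a threshold. The scalar model, noise levels, and threshold below are assumptions for illustration, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 500, 50
q, r = 0.1, 0.5                     # process / measurement noise std (assumed)

# Toy scalar model: x_k = 0.9 x_{k-1} + w,  y_k = x_k + v
x_true, particles = 0.0, rng.normal(0.0, 1.0, N)
weights = np.full(N, 1.0 / N)

for k in range(T):
    x_true = 0.9 * x_true + rng.normal(0, q)
    y = x_true + rng.normal(0, r)

    particles = 0.9 * particles + rng.normal(0, q, N)        # propagate
    weights *= np.exp(-0.5 * ((y - particles) / r) ** 2)     # reweight
    weights /= weights.sum()

    ess = 1.0 / np.sum(weights ** 2)                         # effective sample size
    if ess < N / 2:                                          # adaptive resampling
        idx = rng.choice(N, size=N, p=weights)
        particles, weights = particles[idx], np.full(N, 1.0 / N)

print("true state:", round(x_true, 3), " estimate:", round(float(np.sum(weights * particles)), 3))
```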
The electro-hydraulic servo actuation system is a complex system combining mechanical, electrical, and hydraulic components. If it cannot be repaired for a long time, the possibility of multiple faults occurring must be considered. With this possibility in mind, this paper presents an extended Kalman filter (EKF) based method for multiple-fault diagnosis. By analysing the failure modes and mechanisms of the electro-hydraulic servo actuation system and modelling selected typical failure modes, the relationship between the key parameters of the system and the faults is obtained. The extended Kalman filter, a commonly used algorithm for parameter estimation, is then applied to online diagnosis of potential faults. The simulation results show that the EKF-based method is effective for multiple-fault diagnosis of the electro-hydraulic servo actuation system.
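A minimal sketch of the underlying idea, estimating a fault-related parameter online by augmenting it into the state vector of an extended Kalman filter; the first-order model, gain value, and noise settings are invented stand-ins for the actuation-system model:

```python
import numpy as np

rng = np.random.default_rng(2)
dt, T = 0.01, 2000
a_true = 0.6                        # fault-related gain; healthy value would be 1.0

# Augmented state z = [x, a]; x_{k+1} = x_k + dt*(-x_k + a*u_k), a constant
z = np.array([0.0, 1.0])            # initial estimate assumes the healthy gain
P = np.diag([1.0, 1.0])
Q = np.diag([1e-4, 1e-6]); R = np.array([[1e-2]])
H = np.array([[1.0, 0.0]])

x = 0.0
for k in range(T):
    u = np.sin(0.05 * k)
    x = x + dt * (-x + a_true * u) + rng.normal(0, 1e-3)      # plant
    y = x + rng.normal(0, 0.1)

    # EKF predict
    x_hat, a_hat = z
    z_pred = np.array([x_hat + dt * (-x_hat + a_hat * u), a_hat])
    F = np.array([[1 - dt, dt * u], [0.0, 1.0]])              # Jacobian
    P = F @ P @ F.T + Q
    # EKF update
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    z = z_pred + (K @ (np.array([[y]]) - H @ z_pred[:, None])).ravel()
    P = (np.eye(2) - K @ H) @ P

print("estimated gain:", round(z[1], 3), "(true value:", a_true, ")")
```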
This paper proposes a design method for a support tool for the detection and diagnosis of failures in discrete event systems (DES). The design of this diagnoser proceeds in three phases: an identification phase, which finds the paths and temporal parameters of the model describing both the normal and faulty modes of operation; a detection phase, based on comparing and monitoring operation times; and a localization phase, based on combining the temporal evolution of the parameters with an exceeded-threshold technique. Our contribution lies in applying this technique in the presence of sensor and actuator faults arising simultaneously. The proposed approach is validated on a filling system through simulation.
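The timed detection and localization idea (comparing observed event-to-event durations against the identified model's temporal parameters and flagging the component whose window is violated) can be sketched as a small table-driven check; the event names, time windows, and suspected components are invented for a filling-system-like example:

```python
# Nominal temporal parameters: (from_event, to_event) -> (min_s, max_s)
nominal = {
    ("start_fill", "level_high"): (4.0, 6.0),      # filling duration
    ("level_high", "valve_closed"): (0.2, 0.8),    # actuator response
}
# Component suspected when a given transition's timing is violated
suspects = {
    ("start_fill", "level_high"): "level sensor or filling actuator",
    ("level_high", "valve_closed"): "outlet valve actuator",
}

def diagnose(observed_events):
    """observed_events: list of (event_name, timestamp_s) in occurrence order."""
    times = dict(observed_events)
    faults = []
    for (e1, e2), (tmin, tmax) in nominal.items():
        if e2 not in times:
            faults.append((f"{e2} never observed after {e1}", suspects[(e1, e2)]))
        elif not (tmin <= times[e2] - times[e1] <= tmax):
            faults.append((f"{e1}->{e2} outside [{tmin},{tmax}] s", suspects[(e1, e2)]))
    return faults

trace = [("start_fill", 0.0), ("level_high", 9.2), ("valve_closed", 9.5)]
for symptom, component in diagnose(trace):
    print("fault:", symptom, "| suspected:", component)
```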
A novel method, consisting of fault detection, rough set generation, element isolation, and parameter estimation, is presented for multiple-fault diagnosis on analog circuits with tolerance. First, a linear-programming formulation is developed to transform fault detection in a circuit with limited accessible terminals into checking, from the available measurements, the existence of a feasible solution under tolerance constraints. Second, a fault characteristic equation is deduced to generate a fault rough set. It is proved that the node voltages of the nominal circuit can be used in the fault characteristic equation under fault tolerance. Lastly, fault detection with revised deviation restrictions for the suspected fault elements is carried out to locate the faulty elements and estimate their parameters. The diagnosis accuracy and parameter identification precision of the method are verified by simulation results.
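The detection step can be illustrated with a tiny feasibility check: given measured node voltages, Kirchhoff's current law is linear in the branch conductances, so a linear program can ask whether any conductances within their tolerance bands satisfy it; an infeasible program signals a fault. The divider circuit, tolerance, and measurement below are invented:

```python
from scipy.optimize import linprog

# Voltage divider: Vs --R1-- node --R2-- ground; nominal R1 = R2 = 1 kOhm, 5 % tolerance
Vs, V_meas = 10.0, 3.0          # measured node voltage (nominal value would be 5.0 V)
G_nom, tol = 1e-3, 0.05         # nominal conductances and tolerance

# KCL at the node, linear in (G1, G2): (Vs - V)*G1 - V*G2 = 0
A_eq = [[Vs - V_meas, -V_meas]]
b_eq = [0.0]
bounds = [(G_nom * (1 - tol), G_nom * (1 + tol))] * 2

res = linprog(c=[0, 0], A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
print("fault detected" if not res.success else "consistent with tolerances")
```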
A method for multiple-fault diagnosis in linear analog circuits is presented in this paper. The proposed approach is based on the indirect compensation theorem, which reduces the fault diagnosis procedure in an analog circuit to a symbolic analysis process. An extension of the indirect compensation theorem to linear subcircuits is proposed. The indirect compensation provides an equivalent replacement of an n-port subcircuit by n norators and n fixators of voltages and currents. The proposed multiple-fault diagnosis technique can be used to evaluate any kind of terminal characteristic of a two-port network. For calculation of the circuit determinant expressions, the Generalized Parameter Extraction Method is implemented. The main advantage of this analysis method is that it is cancellation-free; it requires neither a matrix nor an ordinary graph description of the circuit. The process of symbolic circuit analysis is automated by the freeware computer program Cirsym, which can be used online. Experimental results are presented to show the efficiency and reliability of the proposed technique.
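The flavor of reducing diagnosis to symbolic analysis can be shown on a trivial circuit: derive the terminal characteristic symbolically, then solve it for a suspected element's value given a measurement. This uses SymPy on a voltage divider and is only a toy; the paper's method handles general linear multiport subcircuits and uses Cirsym:

```python
import sympy as sp

R1, R2, Vs, Vout = sp.symbols("R1 R2 Vs Vout", positive=True)

# Symbolic terminal characteristic of a voltage divider
divider = sp.Eq(Vout, Vs * R2 / (R1 + R2))

# Suppose R1 = 1 kOhm is trusted, Vs = 10 V, and we measure Vout = 2 V;
# solve the symbolic expression for the suspected element R2.
sol = sp.solve(divider.subs({R1: 1000, Vs: 10, Vout: 2}), R2)
print("estimated R2:", sol[0], "Ohm")   # 250 Ohm, far from the nominal 1 kOhm -> faulty
```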
When supporting commercial or defense systems, a perennial challenge is providing effective test and diagnosis strategies to minimize downtime, thereby maximizing system availability. Potentially one of the most effective ways to minimize downtime is to be able to detect and isolate as many faults in a system at one time as possible. This is referred to as the "multiple-fault diagnosis" problem. While several tools have been developed over the years to assist in performing multiple-fault diagnosis, considerable work remains to provide the best diagnosis possible. Recently, a new model for evolutionary computation has been developed called the "Factored Evolutionary Algorithm" (FEA). In this paper, we combine our prior work on deriving diagnostic Bayesian networks from static fault isolation manuals (FIMs) and fault trees with the FEA strategy to perform abductive inference as a way of addressing the multiple-fault diagnosis problem. We demonstrate the effectiveness of this approach on several networks derived from existing, real-world FIMs.
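The abductive-inference step can be pictured as searching over multi-fault hypotheses (bit vectors over components) for the one that best explains the observed symptoms under a simple noisy-OR model. The sketch below uses a plain generational genetic algorithm rather than the factored FEA, and the components, priors, causal strengths, and symptoms are all invented:

```python
import random
random.seed(0)

components = ["pump", "valve", "sensor", "controller"]
priors = [0.05, 0.10, 0.08, 0.02]                    # P(component faulty)
# Noisy-OR strengths: symptom -> {component: P(symptom | only that fault)}
causes = {"low_flow": {"pump": 0.9, "valve": 0.7},
          "bad_reading": {"sensor": 0.95, "controller": 0.4}}
observed = {"low_flow": True, "bad_reading": True}

def score(hyp):
    """Unnormalized posterior of a fault hypothesis (tuple of 0/1 per component)."""
    p = 1.0
    for h, pr in zip(hyp, priors):
        p *= pr if h else 1 - pr
    for symptom, present in observed.items():
        p_not = 1.0
        for comp, strength in causes[symptom].items():
            if hyp[components.index(comp)]:
                p_not *= 1 - strength
        p *= (1 - p_not) if present else p_not
    return p

def evolve(pop_size=30, gens=40, pmut=0.1):
    pop = [tuple(random.randint(0, 1) for _ in components) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]               # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(components))
            child = list(a[:cut] + b[cut:])          # one-point crossover
            for i in range(len(child)):
                if random.random() < pmut:
                    child[i] ^= 1                    # bit-flip mutation
            children.append(tuple(child))
        pop = parents + children
    return max(pop, key=score)

best = evolve()
print("most plausible fault set:", [c for c, h in zip(components, best) if h])
```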
Call traces, i.e., sequences of function calls and returns, are fundamental to a wide range of program analyses such as bug reproduction, fault diagnosis, performance analysis, and many others. The conventional approach to collecting call traces, which instruments each function call and return site, incurs large space and time overhead. Our approach aims at reducing the recording overhead by instrumenting only a small number of call sites while keeping the capability of recovering the full trace. We propose a call trace model and a logged call trace model based on an LL(1) grammar, which enables us to define the criteria for a feasible solution to call trace collection. Based on the two models, we prove that collecting call traces with minimal instrumentation is an NP-hard problem. We then propose an efficient approach to obtaining a suboptimal solution. We implemented our approach as a tool, Casper, and evaluated it using the DaCapo benchmark suite. The experimental results show that our approach incurs significantly lower runtime (and space) overhead than two state-of-the-art approaches.
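For contrast with the paper's selective instrumentation, the conventional approach it improves on (recording every call and return) can be written in a few lines with Python's profiling hook. This is only to make the notion of a call trace concrete; it is not a model of Casper, which targets JVM programs:

```python
import sys

trace = []

def record(frame, event, arg):
    # The conventional scheme: log every function call and return site.
    if event in ("call", "return"):
        trace.append((event, frame.f_code.co_name))

def g():
    return 1

def f():
    return g() + g()

sys.setprofile(record)
f()
sys.setprofile(None)

print(trace)   # e.g. [('call', 'f'), ('call', 'g'), ('return', 'g'), ...]
```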
At the core of the "Big Data" revolution lie frameworks and systems that allow for the massively parallel processing of large amounts of data. Ironically, while they have been designed for processing large amounts of data, these systems are at the same time major producers of data: to support the administration and management of these huge-scale systems, they are configured to generate detailed log and monitoring data, periodically capturing the system state across all nodes, components and jobs in the system. While such logging information is used routinely by sysadmins for ad-hoc trouble-shooting and problem diagnosis, we point out that there is a tremendous value in analyzing such data from a research point of view. In this talk, we will go over several case studies that demonstrate how measuring and analyzing measurement data from production systems can provide new insights into how systems work and fail, and how these new insights can help in designing better systems.
After a program has crashed and terminated abnormally, it typically leaves behind a snapshot of its crashing state in the form of a core dump. While a core dump carries a large amount of information, which has long been used for software debugging, it barely serves as an informative debugging aid in locating software faults, particularly memory corruption vulnerabilities. A memory corruption vulnerability is a special type of software fault that an attacker can exploit to manipulate the content at a certain memory location. As such, a core dump may contain a certain amount of corrupted data, which increases the difficulty of identifying useful debugging information (e.g., a crash point and stack traces). Without a proper mechanism to deal with this problem, a core dump can be practically useless for software failure diagnosis. In this work, we develop CREDAL, an automatic tool that employs the source code of a crashing program to enhance core dump analysis and turn a core dump into an informative aid for tracking down memory corruption vulnerabilities. Specifically, CREDAL systematically analyzes a potentially corrupted core dump and identifies the crash point and stack frames. For a core dump carrying corrupted data, it goes beyond the crash point and stack trace. In particular, CREDAL further pinpoints the variables holding corrupted data using the source code of the crashing program along with the stack frames. To assist software developers (or security analysts) in tracking down a memory corruption vulnerability, CREDAL also performs analysis and highlights the code fragments corresponding to data corruption. To demonstrate the utility of CREDAL, we use it to analyze 80 crashes corresponding to 73 memory corruption vulnerabilities archived in the Offensive Security Exploit Database. We show that CREDAL can accurately pinpoint the crash point and (fully or partially) restore a stack trace even when the crashing program's stack carries corrupted data. In addition, we demonstrate that CREDAL can potentially reduce the manual effort of finding the code fragment that is likely to contain memory corruption vulnerabilities.
The modern world has witnessed a dramatic increase in our ability to collect, transmit, and distribute real-time monitoring and surveillance data from large-scale information systems and cyber-physical systems. Detecting system anomalies thus attracts a significant amount of interest in many fields such as security, fault management, and industrial optimization. Recently, the invariant network has been shown to be a powerful way of characterizing complex system behaviours. In the invariant network, a node represents a system component and an edge indicates a stable, significant interaction between two components. Structures and evolutions of the invariant network, in particular the vanishing correlations, can shed important light on locating causal anomalies and performing diagnosis. However, existing approaches to detecting causal anomalies with the invariant network often use the percentage of vanishing correlations to rank possible causal components, which has several limitations: 1) fault propagation in the network is ignored; 2) the root causal anomalies may not always be the nodes with a high percentage of vanishing correlations; 3) temporal patterns of vanishing correlations are not exploited for robust detection. To address these limitations, in this paper we propose a network diffusion based framework to identify significant causal anomalies and rank them. Our approach can effectively model fault propagation over the entire invariant network, and can perform joint inference on both the structural and the time-evolving broken invariance patterns. As a result, it can locate high-confidence anomalies that are truly responsible for the vanishing correlations, and can compensate for unstructured measurement noise in the system. Extensive experiments on synthetic datasets, bank information system datasets, and coal plant cyber-physical system datasets demonstrate the effectiveness of our approach.
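The input to such frameworks, the broken (vanishing) invariants, can be illustrated by comparing pairwise correlations of component metrics between a reference window and the current window and flagging edges whose correlation collapses; the per-node vanishing-correlation ratio printed at the end is essentially the naive percentage-based ranking the paper argues is insufficient. The synthetic data and thresholds below are invented:

```python
import numpy as np

rng = np.random.default_rng(3)
n_comp, win = 4, 200
base = rng.normal(size=(win, 1))
ref = np.hstack([base + 0.1 * rng.normal(size=(win, 1)) for _ in range(n_comp)])
cur = ref + 0.1 * rng.normal(size=ref.shape)
cur[:, 2] = rng.normal(size=win)              # component 2 decouples (anomalous)

C_ref = np.corrcoef(ref, rowvar=False)
C_cur = np.corrcoef(cur, rowvar=False)

invariants = [(i, j) for i in range(n_comp) for j in range(i + 1, n_comp)
              if abs(C_ref[i, j]) > 0.9]      # stable, significant interactions
broken = [(i, j) for i, j in invariants if abs(C_cur[i, j]) < 0.5]   # vanishing

ratio = [sum(k in e for e in broken) / max(1, sum(k in e for e in invariants))
         for k in range(n_comp)]
print("broken invariants:", broken)
print("per-node vanishing-correlation ratio:", [round(v, 2) for v in ratio])
```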
Workflow-centric tracing captures the workflow of causally-related events (e.g., work done to process a request) within and among the components of a distributed system. As distributed systems grow in scale and complexity, such tracing is becoming a critical tool for understanding distributed system behavior. Yet, there is a fundamental lack of clarity about how such infrastructures should be designed to provide maximum benefit for important management tasks, such as resource accounting and diagnosis. Without research into this important issue, there is a danger that workflow-centric tracing will not reach its full potential. To help, this paper distills the design space of workflow-centric tracing and describes key design choices that can help or hinder a tracing infrastructure's utility for important tasks. Our design space and the design choices we suggest are based on our experiences developing several previous workflow-centric tracing infrastructures.
We propose an exact algorithm for model-free diagnosis with an application to fault localization in digital circuits. We assume that a faulty circuit and a correctness specification, e.g., in terms of an un-optimized reference circuit, are available. Our algorithm computes the exact set of all minimal diagnoses up to cardinality k, considering all possible assignments to the primary inputs of the circuit. This exact diagnosis problem can be naturally formulated and solved using an oracle for Quantified Boolean Satisfiability (QSAT). Our algorithm instead uses Boolean Satisfiability (SAT) to compute the exact result more efficiently. We implemented the approach and present experimental results for determining fault candidates of digital circuits with seeded faults at the gate level. The experiments show that the presented SAT-based approach outperforms state-of-the-art techniques based on solving instances of the QSAT problem by several orders of magnitude while having the same accuracy. Moreover, in contrast to QSAT, the SAT-based algorithm has any-time behavior, i.e., at any time during the computation, an approximation of the exact result is available that can be used as a starting point for debugging. The result improves as time progresses until eventually the exact result is obtained.
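The notion of "all minimal diagnoses up to cardinality k over all primary-input assignments" can be made concrete by brute force on a two-gate toy circuit: a gate set is a diagnosis if, for every input vector, some choice of values for the freed gates makes the implementation match the specification. The paper encodes this question to SAT instead of enumerating; the circuit and fault here are invented:

```python
from itertools import combinations, product

gates = ["g1", "g2"]

def spec(a, b, c):                       # reference behavior
    return (a and b) or c

def impl(a, b, c, freed_vals):
    """Faulty implementation; gates listed in freed_vals take the given value."""
    g1 = freed_vals.get("g1", a or b)    # seeded fault: OR instead of AND
    g2 = freed_vals.get("g2", g1 or c)
    return g2

def is_diagnosis(freed):
    for a, b, c in product([0, 1], repeat=3):            # all primary inputs
        ok = any(impl(a, b, c, dict(zip(freed, vals))) == spec(a, b, c)
                 for vals in product([0, 1], repeat=len(freed)))
        if not ok:
            return False
    return True

k = 2
diagnoses = [set(d) for r in range(k + 1)
             for d in combinations(gates, r) if is_diagnosis(d)]
minimal = [d for d in diagnoses if not any(other < d for other in diagnoses)]
print("minimal diagnoses up to cardinality", k, ":", minimal)
```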
Energy use of buildings represents roughly 40% of overall energy consumption. Most national agendas contain goals related to reducing energy consumption and the carbon footprint. Timely and accurate fault detection and diagnosis (FDD) in building management systems (BMS) has the potential to reduce energy consumption costs by approximately 15-30%. Most FDD methods are data-based, meaning that their performance is tightly linked to the quality and availability of relevant data. Based on our experience, fault and relevant event data are very sparse and inadequate, mostly because of the lack of will and incentive on the part of those who would need to keep track of faults. In this paper we introduce the idea of using crowdsourcing to support FDD data collection processes, and illustrate our idea through a mobile application that has been implemented for this purpose. Furthermore, we propose a strategy for how to successfully deploy this building occupants' crowdsourcing application.