Predicting gray fault based on context graph in container-based cloud

Title: Predicting gray fault based on context graph in container-based cloud
Publication Type: Conference Paper
Year of Publication: 2021
Authors: Yu, Siyu; Chen, Ningjiang; Liang, Birui
Conference Name: 2021 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)
Date Published: October
Keywords: cloud computing, container-based cloud, Containers, context graph, fault prediction, Fault tolerance, Fault tolerant systems, gray fault, human factors, intrusion tolerance, Loss measurement, Prediction methods, pubcrawl, resilience, Resiliency, Scalability
Abstract: Distributed container-based cloud systems offer rapid deployment, efficient virtualization, simplified configuration, and good scalability. However, good scalability may slow down a container-based cloud because such a system is more vulnerable to gray faults. As a new fault model similar to fail-slow and limping, gray fault has so many root causes that current studies, which focus on only one type of fault, are insufficient. Unlike in a traditional cloud, a container is a black box provided by the service provider, which makes traditional diagnosis methods based on API intrusion difficult to apply. A better approach should shield low-level causes from high-level processing. A gray fault prediction strategy based on a context graph is proposed, built on the correlation between gray faults and application scenarios. From historical data, the performance metrics that describe how such contexts evolve into fault scenarios are established, and the scenarios represented by the corresponding data are stored in a graph. A scenario is predicted to be a fault scenario if an isomorphic scenario is found in the graph. The experimental results show that the prediction success rate is stable at more than 90%, and the overhead is verified to be well optimized.
DOI: 10.1109/ISSREW53611.2021.00067
Citation Key: yu_predicting_2021
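
Illustrative note: the abstract says a scenario is predicted as a fault scenario when an isomorphic scenario is found in the stored graph. The sketch below shows one way such a lookup could work. It is not the authors' implementation. The graph encoding (nodes labeled with discretized metric states, directed edges for "evolves into"), the label strings, and the function names `is_isomorphic` and `predict_fault` are all assumptions made for illustration. The brute-force search is practical only for very small graphs.

```python
from itertools import permutations


def is_isomorphic(nodes_a, edges_a, nodes_b, edges_b):
    """Label-preserving isomorphism test for small directed graphs.

    nodes_*: dict mapping node id -> label (hypothetical metric state).
    edges_*: set of (src, dst) node-id pairs.
    """
    if len(nodes_a) != len(nodes_b) or len(edges_a) != len(edges_b):
        return False
    if sorted(nodes_a.values()) != sorted(nodes_b.values()):
        return False
    a_ids = list(nodes_a)
    # Try every bijection from A's nodes to B's nodes (factorial cost).
    for perm in permutations(nodes_b):
        mapping = dict(zip(a_ids, perm))
        if any(nodes_a[n] != nodes_b[mapping[n]] for n in a_ids):
            continue  # labels must match under the mapping
        if {(mapping[u], mapping[v]) for u, v in edges_a} == set(edges_b):
            return True
    return False


def predict_fault(scenario, fault_scenarios):
    """Return True if the scenario matches any stored fault scenario."""
    nodes, edges = scenario
    return any(is_isomorphic(nodes, edges, fn, fe) for fn, fe in fault_scenarios)


# Hypothetical stored fault scenario: high CPU -> high I/O wait -> rising latency.
known_faults = [
    (
        {"a": "cpu:high", "b": "io_wait:high", "c": "latency:rising"},
        {("a", "b"), ("b", "c")},
    )
]

# Same structure with different node ids: predicted as a fault scenario.
observed = (
    {1: "cpu:high", 2: "io_wait:high", 3: "latency:rising"},
    {(1, 2), (2, 3)},
)

# Same labels but a different evolution structure: no match.
other = (
    {1: "cpu:high", 2: "io_wait:high", 3: "latency:rising"},
    {(1, 2), (1, 3)},
)

print(predict_fault(observed, known_faults))
print(predict_fault(other, known_faults))
```

For larger scenario graphs, a dedicated matcher such as a VF2-style algorithm would be a more realistic choice than enumerating permutations.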