Biblio
Fault-tolerance has huge impact on embedded safety-critical systems. As technology that assists to the development of such improvement, Safe Node Sequence Protocol (SNSP) is designed to make part of such impact. In this paper, we present a mechanism for fault-tolerance and recovery based on the Safe Node Sequence Protocol (SNSP) to strengthen the system robustness, from which the correctness of a fault-tolerant prototype system is analyzed and verified. In order to verify the correctness of more than thirty failure modes, we have partitioned the complete protocol state machine into several subsystems, followed to the injection of corresponding fault classes into dedicated independent models. Experiments demonstrate that this method effectively reduces the size of overall state space, and verification results indicate that the protocol is able to recover from the fault model in a fault-tolerant system and continue to operate as errors occur.