Visible to the public Biblio

Found 133 results

Filters: Keyword is formal verification  [Clear All Filters]
2018-02-06
Resch, S., Paulitsch, M..  2017.  Using TLA+ in the Development of a Safety-Critical Fault-Tolerant Middleware. 2017 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW). :146–152.

Creating and implementing fault-tolerant distributed algorithms is a challenging task in highly safety-critical industries. Using formal methods supports design and development of complex algorithms. However, formal methods are often perceived as an unjustifiable overhead. This paper presents the experience and insights when using TLA+ and PlusCal to model and develop fault-tolerant and safety-critical modules for TAS Control Platform, a platform for railway control applications up to safety integrity level (SIL) 4. We show how formal methods helped us improve the correctness of the algorithms, improved development efficiency and how part of the gap between model and implementation has been closed by translation to C code. Additionally, we describe how we gained trust in the formal model and tools by following a specific design process called property-driven design, which also implicitly addresses software quality metrics such as code coverage metrics.

2018-02-02
Brunner, M., Huber, M., Sauerwein, C., Breu, R..  2017.  Towards an Integrated Model for Safety and Security Requirements of Cyber-Physical Systems. 2017 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C). :334–340.

Increasing interest in cyber-physical systems with integrated computational and physical capabilities that can interact with humans can be identified in research and practice. Since these systems can be classified as safety- and security-critical systems the need for safety and security assurance and certification will grow. Moreover, these systems are typically characterized by fragmentation, interconnectedness, heterogeneity, short release cycles, cross organizational nature and high interference between safety and security requirements. These properties combined with the assurance of compliance to multiple standards, carrying out certification and re-certification, and the lack of an approach to model, document and integrate safety and security requirements represent a major challenge. In order to address this gap we developed a domain agnostic approach to model security and safety requirements in an integrated view to support certification processes during design and run-time phases of cyber-physical systems.

2018-01-10
Garcia, R., Modesti, P..  2017.  An IDE for the Design, Verification and Implementation of Security Protocols. 2017 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW). :157–163.

Security protocols are critical components for the construction of secure and dependable distributed applications, but their implementation is challenging and error prone. Therefore, tools for formal modelling and analysis of security protocols can be potentially very useful to support software engineers. However, despite such tools have been available for a long time, their adoption outside the research community has been very limited. In fact, most practitioners find such applications too complex and hardly usable for their daily work. In this paper, we present an Integrated Development Environment for the design, verification and implementation of security protocols, aimed at lowering the adoption barrier of formal methods tools for security. In the spirit of Model Driven Development, the environment supports the user in the specification of the model using the simple and intuitive language AnB (and its extension AnBx). Moreover, it provides a push-button solution for the formal verification of the abstract and concrete models, and for the automatic generation of Java implementation. This Eclipse-based IDE leverages on existing languages and tools for modelling and verification of security protocols, such as the AnBx Compiler and Code Generator, the model checker OFMC and the protocol verifier ProVerif.

2017-12-28
Kwiatkowska, M..  2016.  Advances and challenges of quantitative verification and synthesis for cyber-physical systems. 2016 Science of Security for Cyber-Physical Systems Workshop (SOSCYPS). :1–5.

We are witnessing a huge growth of cyber-physical systems, which are autonomous, mobile, endowed with sensing, controlled by software, and often wirelessly connected and Internet-enabled. They include factory automation systems, robotic assistants, self-driving cars, and wearable and implantable devices. Since they are increasingly often used in safety- or business-critical contexts, to mention invasive treatment or biometric authentication, there is an urgent need for modelling and verification technologies to support the design process, and hence improve the reliability and reduce production costs. This paper gives an overview of quantitative verification and synthesis techniques developed for cyber-physical systems, summarising recent achievements and future challenges in this important field.

Ouffoué, G., Zaidi, F., Cavalli, A. R., Lallali, M..  2017.  Model-Based Attack Tolerance. 2017 31st International Conference on Advanced Information Networking and Applications Workshops (WAINA). :68–73.

Software-based systems are nowadays complex and highly distributed. In contrast, existing intrusion detection mechanisms are not always suitable for protecting these systems against new and sophisticated attacks that increasingly appear. In this paper, we present a new generic approach that combines monitoring and formal methods in order to ensure attack-tolerance at a high level of abstraction. Our experiments on an authentication Web application show that this method is effective and realistic to tolerate a variety of attacks.

2017-11-03
Mercaldo, F., Nardone, V., Santone, A..  2016.  Ransomware Inside Out. 2016 11th International Conference on Availability, Reliability and Security (ARES). :628–637.

Android is currently the most widely used mobile environment. This trend encourages malware writers to develop specific attacks targeting this platform with threats designed to covertly collect data or financially extort victims, the so-called ransomware. In this paper we use formal methods, in particular model checking, to automatically dissect ransomware samples. Starting from manual inspection of few samples, we define a set of rule in order to check whether the behaviours we find are representative of ransomware functionalities.

2017-10-13
Schäfer, Steven, Schneider, Sigurd, Smolka, Gert.  2016.  Axiomatic Semantics for Compiler Verification. Proceedings of the 5th ACM SIGPLAN Conference on Certified Programs and Proofs. :188–196.

Based on constructive type theory, we study two idealized imperative languages GC and IC and verify the correctness of a compiler from GC to IC. GC is a guarded command language with underspecified execution order defined with an axiomatic semantics. IC is a deterministic low-level language with linear sequential composition and lexically scoped gotos defined with a small-step semantics. We characterize IC with an axiomatic semantics and prove that the compiler from GC to IC preserves specifications. The axiomatic semantics we consider model total correctness and map programs to continuous predicate transformers. We define the axiomatic semantics of GC and IC with elementary inductive predicates and show that the predicate transformer described by a program can be obtained compositionally by recursion on the syntax of the program using a fixed point operator for loops and continuations. We also show that two IC programs are contextually equivalent if and only if their predicate transformers are equivalent.

2017-09-26
St-Martin, Michel, Felty, Amy P..  2016.  A Verified Algorithm for Detecting Conflicts in XACML Access Control Rules. Proceedings of the 5th ACM SIGPLAN Conference on Certified Programs and Proofs. :166–175.

We describe the formalization of a correctness proof for a conflict detection algorithm for XACML (eXtensible Access Control Markup Language). XACML is a standardized declarative access control policy language that is increasingly used in industry. In practice it is common for rule sets to grow large, and contain unintended errors, often due to conflicting rules. A conflict occurs in a policy when one rule permits a request and another denies that same request. Such errors can lead to serious risks involving both allowing access to an unauthorized user as well as denying access to someone who needs it. Removing conflicts is thus an important aspect of debugging policies, and the use of a verified algorithm provides the highest assurance in a domain where security is important. In this paper, we focus on several complex XACML constructs, including time ranges and integer intervals, as well as ways to combine any number of functions using the boolean operators and, or, and not. The latter are the most complex, and add significant expressive power to the language. We propose an algorithm to find conflicts and then use the Coq Proof Assistant to prove the algorithm correct. We develop a library of tactics to help automate the proof.

Madi, Taous, Majumdar, Suryadipta, Wang, Yushun, Jarraya, Yosr, Pourzandi, Makan, Wang, Lingyu.  2016.  Auditing Security Compliance of the Virtualized Infrastructure in the Cloud: Application to OpenStack. Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy. :195–206.

Cloud service providers typically adopt the multi-tenancy model to optimize resources usage and achieve the promised cost-effectiveness. Sharing resources between different tenants and the underlying complex technology increase the necessity of transparency and accountability. In this regard, auditing security compliance of the provider's infrastructure against standards, regulations and customers' policies takes on an increasing importance in the cloud to boost the trust between the stakeholders. However, virtualization and scalability make compliance verification challenging. In this work, we propose an automated framework that allows auditing the cloud infrastructure from the structural point of view while focusing on virtualization-related security properties and consistency between multiple control layers. Furthermore, to show the feasibility of our approach, we integrate our auditing system into OpenStack, one of the most used cloud infrastructure management systems. To show the scalability and validity of our framework, we present our experimental results on assessing several properties related to auditing inter-layer consistency, virtual machines co-residence, and virtual resources isolation.

Woos, Doug, Wilcox, James R., Anton, Steve, Tatlock, Zachary, Ernst, Michael D., Anderson, Thomas.  2016.  Planning for Change in a Formal Verification of the Raft Consensus Protocol. Proceedings of the 5th ACM SIGPLAN Conference on Certified Programs and Proofs. :154–165.

We present the first formal verification of state machine safety for the Raft consensus protocol, a critical component of many distributed systems. We connected our proof to previous work to establish an end-to-end guarantee that our implementation provides linearizable state machine replication. This proof required iteratively discovering and proving 90 system invariants. Our verified implementation is extracted to OCaml and runs on real networks. The primary challenge we faced during the verification process was proof maintenance, since proving one invariant often required strengthening and updating other parts of our proof. To address this challenge, we propose a methodology of planning for change during verification. Our methodology adapts classical information hiding techniques to the context of proof assistants, factors out common invariant-strengthening patterns into custom induction principles, proves higher-order lemmas that show any property proved about a particular component implies analogous properties about related components, and makes proofs robust to change using structural tactics. We also discuss how our methodology may be applied to systems verification more broadly.

Kim, Woobin, Jin, Jungha, Kim, Keecheon.  2016.  A Routing Protocol Method That Sets Up Multi-hops in the Ad-hoc Network. Proceedings of the 10th International Conference on Ubiquitous Information Management and Communication. :70:1–70:6.

In infrastructure wireless network technology, communication between users is provided within a certain area supported by access points (APs) or base station communication networks, but in ad-hoc networks, communication between users is provided only through direct connections between nodes. Ad-hoc network technology supports mobility directly through routing algorithms. However, when a connected node is lost owing to the node's movement, the routing protocol transfers this traffic to another node. The routing table in the node that is receiving the traffic detects any changes that occur and manages them. This paper proposes a routing protocol method that sets up multi-hops in the ad-hoc network and verifies the performance, which provides more effective connection persistence than existing methods.

2017-08-02
Madi, Taous, Majumdar, Suryadipta, Wang, Yushun, Jarraya, Yosr, Pourzandi, Makan, Wang, Lingyu.  2016.  Auditing Security Compliance of the Virtualized Infrastructure in the Cloud: Application to OpenStack. Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy. :195–206.

Cloud service providers typically adopt the multi-tenancy model to optimize resources usage and achieve the promised cost-effectiveness. Sharing resources between different tenants and the underlying complex technology increase the necessity of transparency and accountability. In this regard, auditing security compliance of the provider's infrastructure against standards, regulations and customers' policies takes on an increasing importance in the cloud to boost the trust between the stakeholders. However, virtualization and scalability make compliance verification challenging. In this work, we propose an automated framework that allows auditing the cloud infrastructure from the structural point of view while focusing on virtualization-related security properties and consistency between multiple control layers. Furthermore, to show the feasibility of our approach, we integrate our auditing system into OpenStack, one of the most used cloud infrastructure management systems. To show the scalability and validity of our framework, we present our experimental results on assessing several properties related to auditing inter-layer consistency, virtual machines co-residence, and virtual resources isolation.

2017-06-05
Abdulla, Parosh Aziz, Aiswarya, C., Atig, Mohamed Faouzi, Montali, Marco, Rezine, Othmane.  2016.  Recency-Bounded Verification of Dynamic Database-Driven Systems. Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. :195–210.

We propose a formalism to model database-driven systems, called database manipulating systems (DMS). The actions of a (DMS) modify the current instance of a relational database by adding new elements into the database, deleting tuples from the relations and adding tuples to the relations. The elements which are modified by an action are chosen by (full) first-order queries. (DMS) is a highly expressive model and can be thought of as a succinct representation of an infinite state relational transition system, in line with similar models proposed in the literature. We propose monadic second order logic (MSO-FO) to reason about sequences of database instances appearing along a run. Unsurprisingly, the linear-time model checking problem of (DMS) against (MSO-FO) is undecidable. Towards decidability, we propose under-approximate model checking of (DMS), where the under-approximation parameter is the "bound on recency". In a k-recency-bounded run, only the most recent k elements in the current active domain may be modified by an action. More runs can be verified by increasing the bound on recency. Our main result shows that recency-bounded model checking of (DMS) against (MSO-FO) is decidable, by a reduction to the satisfiability problem of MSO over nested words.

2017-05-22
Sinha, Rohit, Costa, Manuel, Lal, Akash, Lopes, Nuno P., Rajamani, Sriram, Seshia, Sanjit A., Vaswani, Kapil.  2016.  A Design and Verification Methodology for Secure Isolated Regions. Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation. :665–681.

Hardware support for isolated execution (such as Intel SGX) enables development of applications that keep their code and data confidential even while running in a hostile or compromised host. However, automatically verifying that such applications satisfy confidentiality remains challenging. We present a methodology for designing such applications in a way that enables certifying their confidentiality. Our methodology consists of forcing the application to communicate with the external world through a narrow interface, compiling it with runtime checks that aid verification, and linking it with a small runtime that implements the narrow interface. The runtime includes services such as secure communication channels and memory management. We formalize this restriction on the application as Information Release Confinement (IRC), and we show that it allows us to decompose the task of proving confidentiality into (a) one-time, human-assisted functional verification of the runtime to ensure that it does not leak secrets, (b) automatic verification of the application's machine code to ensure that it satisfies IRC and does not directly read or corrupt the runtime's internal state. We present /CONFIDENTIAL: a verifier for IRC that is modular, automatic, and keeps our compiler out of the trusted computing base. Our evaluation suggests that the methodology scales to real-world applications.

2017-05-17
Wang, Timothy E., Garoche, Pierre-Loïc, Roux, Pierre, Jobredeaux, Romain, Féron, Éric.  2016.  Formal Analysis of Robustness at Model and Code Level. Proceedings of the 19th International Conference on Hybrid Systems: Computation and Control. :125–134.

Robustness analyses play a major role in the synthesis and analysis of controllers. For control systems, robustness is a measure of the maximum tolerable model inaccuracies or perturbations that do not destabilize the system. Analyzing the robustness of a closed-loop system can be performed with multiple approaches: gain and phase margin computation for single-input single-output (SISO) linear systems, mu analysis, IQC computations, etc. However, none of these techniques consider the actual code in their analyses. The approach presented here relies on an invariant computation on the discrete system dynamics. Using semi-definite programming (SDP) solvers, a Lyapunov-based function is synthesized that captures the vector margins of the closed-loop linear system considered. This numerical invariant expressed over the state variables of the system is compatible with code analysis and enables its validation on the code artifact. This automatic analysis extends verification techniques focused on controller implementation, addressing validation of robustness at model and code level. It has been implemented in a tool analyzing discrete SISO systems and generating over-approximations of phase and gain margins. The analysis will be integrated in our toolchain for Simulink and Lustre models autocoding and formal analysis.

2017-05-16
Koskinen, Eric, Yang, Junfeng.  2016.  Reducing Crash Recoverability to Reachability. Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. :97–108.

Software applications run on a variety of platforms (filesystems, virtual slices, mobile hardware, etc.) that do not provide 100% uptime. As such, these applications may crash at any unfortunate moment losing volatile data and, when re-launched, they must be able to correctly recover from potentially inconsistent states left on persistent storage. From a verification perspective, crash recovery bugs can be particularly frustrating because, even when it has been formally proved for a program that it satisfies a property, the proof is foiled by these external events that crash and restart the program. In this paper we first provide a hierarchical formal model of what it means for a program to be crash recoverable. Our model captures the recoverability of many real world programs, including those in our evaluation which use sophisticated recovery algorithms such as shadow paging and write-ahead logging. Next, we introduce a novel technique capable of automatically proving that a program correctly recovers from a crash via a reduction to reachability. Our technique takes an input control-flow automaton and transforms it into an encoding that blends the capture of snapshots of pre-crash states into a symbolic search for a proof that recovery terminates and every recovered execution simulates some crash-free execution. Our encoding is designed to enable one to apply existing abstraction techniques in order to do the work that is necessary to prove recoverability. We have implemented our technique in a tool called Eleven82, capable of analyzing C programs to detect recoverability bugs or prove their absence. We have applied our tool to benchmark examples drawn from industrial file systems and databases, including GDBM, LevelDB, LMDB, PostgreSQL, SQLite, VMware and ZooKeeper. Within minutes, our tool is able to discover bugs or prove that these fragments are crash recoverable.

2017-02-21
J. Qadir, O. Hasan.  2015.  "Applying Formal Methods to Networking: Theory, Techniques, and Applications". IEEE Communications Surveys Tutorials. 17:256-291.

Despite its great importance, modern network infrastructure is remarkable for the lack of rigor in its engineering. The Internet, which began as a research experiment, was never designed to handle the users and applications it hosts today. The lack of formalization of the Internet architecture meant limited abstractions and modularity, particularly for the control and management planes, thus requiring for every new need a new protocol built from scratch. This led to an unwieldy ossified Internet architecture resistant to any attempts at formal verification and to an Internet culture where expediency and pragmatism are favored over formal correctness. Fortunately, recent work in the space of clean slate Internet design-in particular, the software defined networking (SDN) paradigm-offers the Internet community another chance to develop the right kind of architecture and abstractions. This has also led to a great resurgence in interest of applying formal methods to specification, verification, and synthesis of networking protocols and applications. In this paper, we present a self-contained tutorial of the formidable amount of work that has been done in formal methods and present a survey of its applications to networking.

2015-05-06
Castro Marquez, C.I., Strum, M., Wang Jiang Chau.  2014.  A unified sequential equivalence checking approach to verify high-level functionality and protocol specification implementations in RTL designs. Test Workshop - LATW, 2014 15th Latin American. :1-6.

Formal techniques provide exhaustive design verification, but computational margins have an important negative impact on its efficiency. Sequential equivalence checking is an effective approach, but traditionally it has been only applied between circuit descriptions with one-to-one correspondence for states. Applying it between RTL descriptions and high-level reference models requires removing signals, variables and states exclusive of the RTL description so as to comply with the state correspondence restriction. In this paper, we extend a previous formal methodology for RTL verification with high-level models, to check also the signals and protocol implemented in the RTL design. This protocol implementation is compared formally to a description captured from the specification. Thus, we can prove thoroughly the sequential behavior of a design under verification.
 

Alrabaee, S., Bataineh, A., Khasawneh, F.A., Dssouli, R..  2014.  Using model checking for Trivial File Transfer Protocol validation. Communications and Networking (ComNet), 2014 International Conference on. :1-7.

This paper presents verification and model based checking of the Trivial File Transfer Protocol (TFTP). Model checking is a technique for software verification that can detect concurrency defects within appropriate constraints by performing an exhaustive state space search on a software design or implementation and alert the implementing organization to potential design deficiencies that are otherwise difficult to be discovered. The TFTP is implemented on top of the Internet User Datagram Protocol (UDP) or any other datagram protocol. We aim to create a design model of TFTP protocol, with adding window size, using Promela to simulate it and validate some specified properties using spin. The verification has been done by using the model based checking tool SPIN which accepts design specification written in the verification language PROMELA. The results show that TFTP is free of live locks.
 

Huaqun Wang.  2015.  Identity-Based Distributed Provable Data Possession in Multicloud Storage. Services Computing, IEEE Transactions on. 8:328-340.

Remote data integrity checking is of crucial importance in cloud storage. It can make the clients verify whether their outsourced data is kept intact without downloading the whole data. In some application scenarios, the clients have to store their data on multicloud servers. At the same time, the integrity checking protocol must be efficient in order to save the verifier's cost. From the two points, we propose a novel remote data integrity checking model: ID-DPDP (identity-based distributed provable data possession) in multicloud storage. The formal system model and security model are given. Based on the bilinear pairings, a concrete ID-DPDP protocol is designed. The proposed ID-DPDP protocol is provably secure under the hardness assumption of the standard CDH (computational Diffie-Hellman) problem. In addition to the structural advantage of elimination of certificate management, our ID-DPDP protocol is also efficient and flexible. Based on the client's authorization, the proposed ID-DPDP protocol can realize private verification, delegated verification, and public verification.
 

Voskuilen, G., Vijaykumar, T.N..  2014.  Fractal++: Closing the performance gap between fractal and conventional coherence. Computer Architecture (ISCA), 2014 ACM/IEEE 41st International Symposium on. :409-420.

Cache coherence protocol bugs can cause multicores to fail. Existing coherence verification approaches incur state explosion at small scales or require considerable human effort. As protocols' complexity and multicores' core counts increase, verification continues to be a challenge. Recently, researchers proposed fractal coherence which achieves scalable verification by enforcing observational equivalence between sub-systems in the coherence protocol. A larger sub-system is verified implicitly if a smaller sub-system has been verified. Unfortunately, fractal protocols suffer from two fundamental limitations: (1) indirect-communication: sub-systems cannot directly communicate and (2) partially-serial-invalidations: cores must be invalidated in a specific, serial order. These limitations disallow common performance optimizations used by conventional directory protocols: reply-forwarding where caches communicate directly and parallel invalidations. Therefore, fractal protocols lack performance scalability while directory protocols lack verification scalability. To enable both performance and verification scalability, we propose Fractal++ which employs a new class of protocol optimizations for verification-constrained architectures: decoupled-replies, contention-hints, and fully-parallel-fractal-invalidations. The first two optimizations allow reply-forwarding-like performance while the third optimization enables parallel invalidations in fractal protocols. Unlike conventional protocols, Fractal++ preserves observational equivalence and hence is scalably verifiable. In 32-core simulations of single- and four-socket systems, Fractal++ performs nearly as well as a directory protocol while providing scalable verifiability whereas the best-performing previous fractal protocol performs 8% on average and up to 26% worse with a single-socket and 12% on average and up to 34% worse with a longer-latency multi-socket system.
 

Kammuller, F..  2014.  Verification of DNSsec Delegation Signatures. Telecommunications (ICT), 2014 21st International Conference on. :298-392.

In this paper, we present a formal model for the verification of the DNSsec Protocol in the interactive theorem prover Isabelle/HOL. Relying on the inductive approach to security protocol verification, this formal analysis provides a more expressive representation than the widely accepted model checking analysis. Our mechanized model allows to represent the protocol, all its possible traces and the attacker and his knowledge. The fine grained model allows to show origin authentication, and replay attack prevention. Most prominently, we succeed in expressing Delegation Signatures and proving their authenticity formally.

Kasraoui, M., Cabani, A., Chafouk, H..  2014.  Formal Verification of Wireless Sensor Key Exchange Protocol Using AVISPA. Computer, Consumer and Control (IS3C), 2014 International Symposium on. :387-390.

For efficient deployment of sensor nodes required in many logistic applications, it's necessary to build security mechanisms for a secure wireless communication. End-to-end security plays a crucial role for the communication in these networks. This provides the confidentiality, the authentication and mostly the prevention from many attacks at high level. In this paper, we propose a lightweight key exchange protocol WSKE (Wireless Sensor Key Exchange) for IP-based wireless sensor networks. This protocol proposes techniques that allows to adapt IKEv2 (Internet Key Exchange version 2) mechanisms of IPSEC/6LoWPAN networks. In order to check these security properties, we have used a formal verification tools called AVISPA.
 

Meng Zhang, Bingham, J.D., Erickson, J., Sorin, D.J..  2014.  PVCoherence: Designing flat coherence protocols for scalable verification. High Performance Computer Architecture (HPCA), 2014 IEEE 20th International Symposium on. :392-403.

The goal of this work is to design cache coherence protocols with many cores that can be verified with state-of-the-art automated verification methodologies. In particular, we focus on flat (non-hierarchical) coherence protocols, and we use a mostly-automated methodology based on parametric verification (PV). We propose several design guidelines that architects should follow if they want to design protocols that can be parametrically verified. We experimentally evaluate performance, storage overhead, and scalability of a protocol verified with PV compared to a highly optimized protocol that cannot be verified with PV.

Rui Zhou, Rong Min, Qi Yu, Chanjuan Li, Yong Sheng, Qingguo Zhou, Xuan Wang, Kuan-Ching Li.  2014.  Formal Verification of Fault-Tolerant and Recovery Mechanisms for Safe Node Sequence Protocol. Advanced Information Networking and Applications (AINA), 2014 IEEE 28th International Conference on. :813-820.

Fault-tolerance has huge impact on embedded safety-critical systems. As technology that assists to the development of such improvement, Safe Node Sequence Protocol (SNSP) is designed to make part of such impact. In this paper, we present a mechanism for fault-tolerance and recovery based on the Safe Node Sequence Protocol (SNSP) to strengthen the system robustness, from which the correctness of a fault-tolerant prototype system is analyzed and verified. In order to verify the correctness of more than thirty failure modes, we have partitioned the complete protocol state machine into several subsystems, followed to the injection of corresponding fault classes into dedicated independent models. Experiments demonstrate that this method effectively reduces the size of overall state space, and verification results indicate that the protocol is able to recover from the fault model in a fault-tolerant system and continue to operate as errors occur.