Analysis of Checkpointing Overhead in Parallel State Machine Replication

Title: Analysis of Checkpointing Overhead in Parallel State Machine Replication
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Mendizabal, Odorico M.; Dotti, Fernando Luís; Pedone, Fernando
Conference Name: Proceedings of the 31st Annual ACM Symposium on Applied Computing
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-3739-7
Keywords: checkpointing, Distributed Systems, Fault tolerance, pubcrawl, Resiliency, System recovery
Abstract

State machine replication (SMR) is a well-established technique for building fault-tolerant systems. In part, this is explained by the simplicity of the approach and its strong consistency guarantees. Recently, several proposals have suggested parallelizing the execution of state machine replicas to achieve high throughput. Concurrent execution of commands has many implications, including for the recovery of replicas from failures. Conventional checkpointing techniques, for example, must be revisited in parallelized models. In this paper, we review parallel variations of state machine replication and discuss how checkpointing procedures apply to these models. Moreover, we use simulations to evaluate the impact of checkpointing techniques on recovery.

URL: http://doi.acm.org/10.1145/2851613.2851879
DOI: 10.1145/2851613.2851879
Citation Key: mendizabal_analysis_2016