Leveraging Compiler Optimizations to Reduce Runtime Fault Recovery Overhead

Title: Leveraging Compiler Optimizations to Reduce Runtime Fault Recovery Overhead
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Hosseini, Fateme S.; Fotouhi, Pouya; Yang, Chengmo; Gao, Guang R.
Conference Name: Proceedings of the 54th Annual Design Automation Conference 2017
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-4927-7
Keywords: pubcrawl, Resiliency, System recovery
Abstract

Smaller feature size, lower supply voltage, and faster clock rates have made modern computer systems more susceptible to faults. Although previous fault tolerance techniques usually target a relatively low fault rate and consider error recovery less critical, with the advent of higher fault rates, recovery overhead is no longer negligible. In this paper, we propose a scheme that leverages and revises a set of compiler optimizations to design, for each application hotspot, a smart recovery plan that identifies the minimal set of instructions to be re-executed in different fault scenarios. Such fault scenario and recovery plan information is efficiently delivered to the processor for runtime fault recovery. The proposed optimizations are implemented in LLVM and GEM5. The results show that the proposed scheme can significantly reduce runtime recovery overhead by 72%.

URL: http://doi.acm.org/10.1145/3061639.3062273
DOI: 10.1145/3061639.3062273
Citation Key: hosseini_leveraging_2017