Leveraging Compiler Optimizations to Reduce Runtime Fault Recovery Overhead
Title | Leveraging Compiler Optimizations to Reduce Runtime Fault Recovery Overhead |
Publication Type | Conference Paper |
Year of Publication | 2017 |
Authors | Hosseini, Fateme S., Fotouhi, Pouya, Yang, Chengmo, Gao, Guang R. |
Conference Name | Proceedings of the 54th Annual Design Automation Conference 2017 |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-4503-4927-7 |
Keywords | pubcrawl, Resiliency, System recovery |
Abstract | Smaller feature sizes, lower supply voltages, and faster clock rates have made modern computer systems more susceptible to faults. Previous fault tolerance techniques usually target relatively low fault rates and treat error recovery as less critical; at higher fault rates, however, recovery overhead is no longer negligible. In this paper, we propose a scheme that leverages and revises a set of compiler optimizations to design, for each application hotspot, a smart recovery plan that identifies the minimal set of instructions to re-execute in each fault scenario. This fault scenario and recovery plan information is then delivered efficiently to the processor for use during runtime fault recovery. The proposed optimizations are implemented in LLVM and gem5. The results show that the proposed scheme can reduce runtime recovery overhead by 72%. |
URL | http://doi.acm.org/10.1145/3061639.3062273 |
DOI | 10.1145/3061639.3062273 |
Citation Key | hosseini_leveraging_2017 |
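The abstract's central idea is to re-execute only the instructions affected by a given fault rather than a whole region. It can be illustrated with a forward slice over a data-dependence graph. The sketch below is hypothetical: the instruction names, the `DEPS` encoding, the helper functions, and the one-faulty-instruction-per-scenario fault model are assumptions for illustration. It is not the paper's LLVM implementation or its plan format.

```python
from collections import deque
from typing import Dict, List

# Hypothetical encoding of a hotspot: each instruction maps to the
# instructions whose results it reads (its data-dependence sources).
Deps = Dict[str, List[str]]


def users_of(deps: Deps) -> Dict[str, List[str]]:
    """Invert 'reads-from' edges into 'is-read-by' edges."""
    users: Dict[str, List[str]] = {inst: [] for inst in deps}
    for inst, srcs in deps.items():
        for src in srcs:
            users[src].append(inst)
    return users


def recovery_plan(deps: Deps, order: List[str], faulty: str) -> List[str]:
    """Instructions to re-execute if `faulty` produced a corrupted result:
    the faulty instruction plus everything that transitively reads its value
    (its forward slice), returned in original program order."""
    users = users_of(deps)
    tainted = {faulty}
    work = deque([faulty])
    while work:
        cur = work.popleft()
        for user in users[cur]:
            if user not in tainted:
                tainted.add(user)
                work.append(user)
    return [inst for inst in order if inst in tainted]


def build_plans(deps: Deps, order: List[str]) -> Dict[str, List[str]]:
    """Precompute one plan per fault scenario (here: one faulty instruction each)."""
    return {inst: recovery_plan(deps, order, inst) for inst in order}


# Toy hotspot in program order (illustrative only).
ORDER = ["i0", "i1", "i2", "i3", "i4"]
DEPS: Deps = {
    "i0": [],            # r1 = load a
    "i1": [],            # r2 = load b
    "i2": ["i0"],        # r3 = r1 * 2
    "i3": ["i1"],        # r4 = r2 + 1
    "i4": ["i2", "i3"],  # r5 = r3 + r4
}

plans = build_plans(DEPS, ORDER)
for scenario, plan in plans.items():
    print(f"fault in {scenario}: re-execute {plan}")
```

In this toy example, each precomputed plan re-executes between one and three of the five instructions. Full re-execution would always run all five. The sketch assumes that the inputs of re-executed instructions are still intact, meaning they have not been overwritten since they were produced. It also ignores stores, control dependences, and how plans are encoded and delivered to the processor. A real scheme would have to handle all of these.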