IPAS: Intelligent Protection Against Silent Output Corruption in Scientific Applications

Submitted by grigby1 on Mon, 04/24/2017 - 11:49am

Title	IPAS: Intelligent Protection Against Silent Output Corruption in Scientific Applications
Publication Type	Conference Paper
Year of Publication	2016
Authors	Laguna, Ignacio; Schulz, Martin; Richards, David F.; Calhoun, Jon; Olson, Luke
Conference Name	Proceedings of the 2016 International Symposium on Code Generation and Optimization
Publisher	ACM
Conference Location	New York, NY, USA
ISBN Number	978-1-4503-3778-6
Keywords	compiler analysis, high-performance computing, machine learning, pubcrawl, resilience, Resiliency
Abstract	This paper presents IPAS, an instruction duplication technique that protects scientific applications from silent data corruption (SDC) in their output. The motivation for IPAS is that, due to natural error masking, only a subset of SDC errors actually affects the output of scientific codes; we call these errors silent output corruption (SOC) errors. Thus, applications require duplication only on code that, when affected by a fault, yields SOC. We use machine learning to learn which code instructions must be protected to avoid SOC, and, using a compiler, we protect only those vulnerable instructions by duplication, thus significantly reducing the overhead that is introduced by instruction duplication. In our experiments with five workloads, IPAS reduces the percentage of SOC by up to 90% with a slowdown that ranges between 1.04x and 1.35x, which corresponds to as much as 47% less slowdown than state-of-the-art instruction duplication techniques.
URL	http://doi.acm.org/10.1145/2854038.2854059
DOI	10.1145/2854038.2854059
Citation Key	laguna_ipas:_2016
