Hiding the Long Latency of Persist Barriers Using Speculative Execution

Title: Hiding the Long Latency of Persist Barriers Using Speculative Execution
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Shin, S., Tuck, J., Solihin, Y.
Conference Name: 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA)
ISBN Number: 978-1-4503-4892-8
Keywords: cache storage, checkpoint-based processing, checkpointing, clflushopt, clwb, Collaboration, common data structures, consistent state, data structures, DRAM, DRAM chips, expensive fence operations, fail-safe code, Failure Safety, file system, Force, human factors, logging based transactions, long latency persistency operations, Metrics, modern systems reorder memory operations, non-volatile main memory, nonpersistent implementations, NonVolatile Main Memory, Nonvolatile memory, nonvolatile memory technology, NVMM, pcommit, performance bottleneck, performance overhead, persist barriers, persistence instructions, policy-based governance, Policy-Governed Secure Collaboration, pubcrawl, Random access memory, random-access storage, resilience, Resiliency, Safe Coding, Safety, significant execution time overhead, Software, speculative execution, Speculative Persistence, speculative persistence architecture, storage management, substantial performance boost, volatile caches
Abstract

Byte-addressable non-volatile memory technology is emerging as an alternative to DRAM for main memory. This new Non-Volatile Main Memory (NVMM) allows programmers to store important data in data structures in memory instead of serializing it to the file system, thereby providing a substantial performance boost. However, modern systems reorder memory operations and utilize volatile caches for better performance, making it difficult to ensure a consistent state in NVMM. Intel recently announced a new set of persistence instructions: clflushopt, clwb, and pcommit. These new instructions make it possible to implement fail-safe code on NVMM, but few workloads have been written or characterized using them. In this work, we describe how these instructions work and how they can be used to implement write-ahead-logging-based transactions. We implement several common data structures and kernels and evaluate the performance overhead incurred over traditional non-persistent implementations. In particular, we find that persistence instructions occur in clusters along with expensive fence operations, that they have long latency, and that they add a significant execution time overhead: 20.3% on average over code with logging but without fence instructions to order persists. To deal with this overhead and alleviate the performance bottleneck, we propose to speculate past long-latency persistency operations using checkpoint-based processing. Our speculative persistence architecture reduces the execution time overhead to only 3.6%.
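
To make the write-ahead logging idea in the abstract concrete, the following is a minimal sketch in Python, not the authors' implementation. It models NVMM behind a volatile cache and uses an undo log so that a crash in the middle of a transaction can be rolled back. Everything named here is hypothetical and chosen for illustration: the class `SimulatedNVMM`, the functions `transactional_update` and `recover`, the `crash_point` parameter, and the log addresses. The method `clwb` only imitates the write-back instruction of the same name, and `persist_barrier` stands in for whatever fence sequence orders the persists on real hardware.

```python
class SimulatedNVMM:
    """Toy model: stores land in a volatile cache; only flushed lines survive a crash."""

    def __init__(self):
        self.persistent = {}  # contents that survive a crash
        self.cache = {}       # dirty lines, lost on a crash
        self.pending = []     # write-backs issued by clwb, not yet ordered by a barrier

    def store(self, addr, value):
        self.cache[addr] = value

    def load(self, addr):
        return self.cache.get(addr, self.persistent.get(addr))

    def clwb(self, addr):
        # Request a write-back of the line; it stays in the cache.
        if addr in self.cache:
            self.pending.append((addr, self.cache[addr]))

    def persist_barrier(self):
        # Assumption of this model: write-backs become durable only here.
        for addr, value in self.pending:
            self.persistent[addr] = value
        self.pending.clear()

    def crash(self):
        self.cache.clear()
        self.pending.clear()


LOG_ENTRIES = "log.entries"  # hypothetical address holding the undo records
LOG_VALID = "log.valid"      # hypothetical address holding the log-valid flag


def persist(mem, addr, value):
    """Store, write back, and wait: one cluster of persistence operations."""
    mem.store(addr, value)
    mem.clwb(addr)
    mem.persist_barrier()


def transactional_update(mem, updates, crash_point=None):
    # 1. Persist the old values first, then mark the log valid.
    persist(mem, LOG_ENTRIES, {addr: mem.load(addr) for addr in updates})
    persist(mem, LOG_VALID, True)
    # 2. Update the data in place.
    for i, (addr, value) in enumerate(updates.items()):
        mem.store(addr, value)
        mem.clwb(addr)
        if crash_point == "mid_data" and i == 0:
            mem.persist_barrier()  # models the first line reaching NVMM early
            mem.crash()
            return
    mem.persist_barrier()
    # 3. Commit by invalidating the log.
    persist(mem, LOG_VALID, False)


def recover(mem):
    # An undo log that is still valid means the transaction did not commit.
    if mem.load(LOG_VALID):
        for addr, old in mem.load(LOG_ENTRIES).items():
            mem.store(addr, old)
            mem.clwb(addr)
        mem.persist_barrier()
        persist(mem, LOG_VALID, False)


def demo():
    mem = SimulatedNVMM()
    persist(mem, "a", 100)
    persist(mem, "b", 0)
    transactional_update(mem, {"a": 70, "b": 30}, crash_point="mid_data")
    torn = (mem.load("a"), mem.load("b"))
    recover(mem)
    return torn, (mem.load("a"), mem.load("b"))
```

Calling `demo()` first yields a torn state in which only `a` was updated, and then the rolled-back state after `recover`. Even in this toy model, every transaction issues several write-back and barrier clusters, which is the pattern the abstract identifies as the source of overhead.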

URL: https://dl.acm.org/citation.cfm?doid=3079856.3080240
DOI: 10.1145/3079856.3080240
Citation Key: shin_hiding_2017