Dissecting On-Node Memory Access Performance: A Semantic Approach

Title: Dissecting On-Node Memory Access Performance: A Semantic Approach
Publication Type: Conference Paper
Year of Publication: 2014
Authors: Gimenez, A., Gamblin, T., Rountree, B., Bhatele, A., Jusufi, I., Bremer, P.-T., Hamann, B.
Conference Name: High Performance Computing, Networking, Storage and Analysis, SC14: International Conference for
Date Published: Nov
Keywords: attribute semantic information, code regions, Context, CPU manufacturers, data motion, data objects, design decisions, distributed memory systems, domain decomposition, fine-grained memory access performance data, Hardware, Kernel, Libraries, memory access optimization, memory behaviour, multi-threading, multithreading, on-node memory access performance, performance ramifications, PMU, power efficiency, Program processors, sampled memory accesses, sampling-based performance measurement units, semantic approach, Semantics, storage management, Topology
Abstract

Optimizing memory access is critical for performance and power efficiency. CPU manufacturers have developed sampling-based performance measurement units (PMUs) that report precise costs of memory accesses at specific addresses. However, this data is too low-level to be meaningfully interpreted and contains an excessive amount of irrelevant or uninteresting information. We have developed a method to gather fine-grained memory access performance data for specific data objects and regions of code with low overhead and attribute semantic information to the sampled memory accesses. This information provides the context necessary to more effectively interpret the data. We have developed a tool that performs this sampling and attribution and used the tool to discover and diagnose performance problems in real-world applications. Our techniques provide useful insight into the memory behaviour of applications and allow programmers to understand the performance ramifications of key design decisions: domain decomposition, multi-threading, and data motion within distributed memory systems.
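The attribution step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' tool: it only assumes that each PMU sample has already been reduced to an (address, latency) pair and that the address ranges of the data objects of interest are known. The names `OBJECTS`, `attribute_samples`, and `mean_latency`, and all addresses and latencies, are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical data objects: (name, start address, size in bytes).
# In a real tool these ranges would come from allocation tracking or debug info.
OBJECTS = [
    ("grid", 0x1000, 0x400),
    ("halo", 0x2000, 0x100),
]


def attribute_samples(objects, samples):
    """Group sampled latencies by the data object that contains each address.

    `samples` is an iterable of (address, latency) pairs. Addresses that fall
    outside every known object are collected under "<unattributed>".
    Assumes the object ranges do not overlap.
    """
    ranges = sorted(objects, key=lambda obj: obj[1])
    starts = [start for _, start, _ in ranges]
    per_object = defaultdict(list)
    for address, latency in samples:
        # Index of the last object whose start is <= address, or -1 if none.
        i = bisect_right(starts, address) - 1
        if i >= 0:
            name, start, size = ranges[i]
            if address < start + size:
                per_object[name].append(latency)
                continue
        per_object["<unattributed>"].append(latency)
    return dict(per_object)


def mean_latency(per_object):
    """Average the sampled latencies recorded for each object."""
    return {name: sum(vals) / len(vals) for name, vals in per_object.items()}
```

Grouping samples by named object rather than by raw address is what lets a per-object summary, such as the mean latency above, be read in terms of the program's own data structures.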

URL: https://ieeexplore.ieee.org/document/7013001
DOI: 10.1109/SC.2014.19
Citation Key: 7013001