Improving MPI Collective I/O for High Volume Non-Contiguous Requests With Intra-Node Aggregation

Submitted by grigby1 on Fri, 10/30/2020 - 12:19pm

Title	Improving MPI Collective I/O for High Volume Non-Contiguous Requests With Intra-Node Aggregation
Publication Type	Journal Article
Year of Publication	2020
Authors	Kang, Qiao; Lee, Sunwoo; Hou, Kaiyuan; Ross, Robert; Agrawal, Ankit; Choudhary, Alok; Liao, Wei-keng
Journal	IEEE Transactions on Parallel and Distributed Systems
Volume	31
Pagination	2682–2695
ISSN	1558-2183
Keywords	Aggregates, Benchmark testing, collective MPI-IO functions, Cray XC40 parallel computers, Haswell processors, high volume noncontiguous requests, I-O Systems, i-o systems security, input-output programs, Intel KNL, internode communication contention, intranode aggregation, Libraries, message passing, MPI collective I/O, MPI processes, non-contiguous I/O, Parallel I/O, parallel jobs, parallel processing, performance evaluation, Production, Program processors, pubcrawl, request redistribution, Scalability, security, two-phase I/O, Writing
Abstract	Two-phase I/O is a well-known strategy for implementing collective MPI-IO functions. It redistributes I/O requests among the calling processes into a form that minimizes the file access costs. As modern parallel computers continue to grow into the exascale era, the communication cost of such request redistribution can quickly overwhelm collective I/O performance. This effect has been observed from parallel jobs that run on multiple compute nodes with a high count of MPI processes on each node. To reduce the communication cost, we present a new design for collective I/O by adding an extra communication layer that performs request aggregation among processes within the same compute nodes. This approach can significantly reduce inter-node communication contention when redistributing the I/O requests. We evaluate the performance and compare it with the original two-phase I/O on Cray XC40 parallel computers (Theta and Cori) with Intel KNL and Haswell processors. Using I/O patterns from two large-scale production applications and an I/O benchmark, we show our proposed method effectively reduces the communication cost and hence maintains the scalability for a large number of processes.
URL	https://ieeexplore.ieee.org/document/9109678
DOI	10.1109/TPDS.2020.3000458
Citation Key	kang_improving_2020

Groups:

Science of Security VO