Improving MPI Collective I/O for High Volume Non-Contiguous Requests With Intra-Node Aggregation
| Title | Improving MPI Collective I/O for High Volume Non-Contiguous Requests With Intra-Node Aggregation |
|---|---|
| Publication Type | Journal Article |
| Year of Publication | 2020 |
| Authors | Kang, Qiao; Lee, Sunwoo; Hou, Kaiyuan; Ross, Robert; Agrawal, Ankit; Choudhary, Alok; Liao, Wei-keng |
| Journal | IEEE Transactions on Parallel and Distributed Systems |
| Volume | 31 |
| Pagination | 2682–2695 |
| ISSN | 1558-2183 |
| Keywords | Aggregates, Benchmark testing, collective MPI-IO functions, Cray XC40 parallel computers, Haswell processors, high volume noncontiguous requests, I-O Systems, i-o systems security, input-output programs, Intel KNL, internode communication contention, intranode aggregation, Libraries, message passing, MPI collective I/O, MPI processes, non-contiguous I/O, Parallel I/O, parallel jobs, parallel processing, performance evaluation, Production, Program processors, pubcrawl, request redistribution, Scalability, security, two-phase I/O, Writing |
| Abstract | Two-phase I/O is a well-known strategy for implementing collective MPI-IO functions. It redistributes I/O requests among the calling processes into a form that minimizes the file access costs. As modern parallel computers continue to grow into the exascale era, the communication cost of such request redistribution can quickly overwhelm collective I/O performance. This effect has been observed from parallel jobs that run on multiple compute nodes with a high count of MPI processes on each node. To reduce the communication cost, we present a new design for collective I/O by adding an extra communication layer that performs request aggregation among processes within the same compute nodes. This approach can significantly reduce inter-node communication contention when redistributing the I/O requests. We evaluate the performance and compare it with the original two-phase I/O on Cray XC40 parallel computers (Theta and Cori) with Intel KNL and Haswell processors. Using I/O patterns from two large-scale production applications and an I/O benchmark, we show our proposed method effectively reduces the communication cost and hence maintains the scalability for a large number of processes. |
| URL | https://ieeexplore.ieee.org/document/9109678 |
| DOI | 10.1109/TPDS.2020.3000458 |
| Citation Key | kang_improving_2020 |
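The abstract describes the intra-node aggregation layer only in prose. The toy Python model below is a hedged sketch of the general idea, not the authors' implementation. It assumes that ranks are placed on nodes in contiguous blocks, that the file is split into equal contiguous file domains with one I/O aggregator per domain, and that the first rank on each node acts as the local aggregator. All function names (`domain_range`, `merge`, `redistribution_pairs`, `internode`, `two_phase`, `two_phase_intranode`) are hypothetical. The model counts distinct inter-node sender/aggregator pairs in the redistribution step; it does not model bytes, bandwidth, or time.

```python
"""Toy model (hypothetical, not the paper's code) comparing inter-node
request-redistribution pairs of plain two-phase I/O with a variant that
first aggregates requests inside each compute node."""


def domain_range(offset, length, fd_size):
    """Indices of the contiguous file domains touched by [offset, offset + length), length > 0."""
    return range(offset // fd_size, (offset + length - 1) // fd_size + 1)


def merge(requests):
    """Sort (offset, length) pairs and coalesce those that abut or overlap."""
    merged = []
    for off, length in sorted(requests):
        if merged and off <= merged[-1][0] + merged[-1][1]:
            last_off, last_len = merged[-1]
            merged[-1] = (last_off, max(last_len, off + length - last_off))
        else:
            merged.append((off, length))
    return merged


def redistribution_pairs(requests, fd_size, io_aggrs):
    """(sender, I/O aggregator) pairs needed to ship each sender's requests to the
    aggregators owning the file domains they touch; self-sends are skipped."""
    pairs = set()
    for sender, reqs in requests.items():
        for off, length in reqs:
            for d in domain_range(off, length, fd_size):
                if io_aggrs[d] != sender:
                    pairs.add((sender, io_aggrs[d]))
    return pairs


def internode(pairs, ranks_per_node):
    """Keep only the pairs whose two ranks live on different nodes (block placement)."""
    return {(s, r) for s, r in pairs if s // ranks_per_node != r // ranks_per_node}


def two_phase(requests, fd_size, io_aggrs, ranks_per_node):
    """Plain two-phase I/O: every rank sends directly to the I/O aggregators."""
    pairs = redistribution_pairs(requests, fd_size, io_aggrs)
    return len(internode(pairs, ranks_per_node))


def two_phase_intranode(requests, fd_size, io_aggrs, ranks_per_node):
    """Extra layer: the first rank of each node gathers and merges its node's
    requests (intra-node only), then only these local aggregators take part in
    the inter-node redistribution."""
    gathered = {}
    for rank, reqs in requests.items():
        local = (rank // ranks_per_node) * ranks_per_node
        gathered.setdefault(local, []).extend(reqs)
    gathered = {local: merge(reqs) for local, reqs in gathered.items()}
    pairs = redistribution_pairs(gathered, fd_size, io_aggrs)
    return len(internode(pairs, ranks_per_node)), gathered


if __name__ == "__main__":
    nodes, ranks_per_node, blocks_per_rank = 2, 4, 4
    nprocs = nodes * ranks_per_node
    # Interleaved (strided) pattern: rank r writes 1-byte blocks at r, r+8, r+16, r+24.
    requests = {r: [(i * nprocs + r, 1) for i in range(blocks_per_rank)]
                for r in range(nprocs)}
    io_aggrs = [0, 4]  # one I/O aggregator per node
    fd_size = (nprocs * blocks_per_rank) // len(io_aggrs)
    print(two_phase(requests, fd_size, io_aggrs, ranks_per_node))  # 8
    n, gathered = two_phase_intranode(requests, fd_size, io_aggrs, ranks_per_node)
    print(n, gathered[0])  # 2 [(0, 4), (8, 4), (16, 4), (24, 4)]
```

In this strided example, plain two-phase I/O needs 8 inter-node sender/aggregator pairs while the intra-node variant needs 2, and node 0's sixteen 1-byte requests coalesce into four 4-byte segments. For patterns where every rank touches every file domain, the toy model's inter-node pair count shrinks by roughly the number of ranks per node, which is the kind of contention the abstract targets; actual gains also depend on message sizes, network topology, and the cost of the added intra-node step.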