Visible to the public Biblio

Filters: Author is Ross, Robert  [Clear All Filters]
2022-07-01
Matri, Pierre, Ross, Robert.  2021.  Neon: Low-Latency Streaming Pipelines for HPC. 2021 IEEE 14th International Conference on Cloud Computing (CLOUD). :698—707.
Real time data analysis in the context of e.g. realtime monitoring or computational steering is an important tool in many fields of science, allowing scientists to make the best use of limited resources such as sensors and HPC platforms. These tools typically rely on large amounts of continuously collected data that needs to be processed in near-real time to avoid wasting compute, storage, and networking resources. Streaming pipelines are a natural fit for this use case but are inconvenient to use on high-performance computing (HPC) systems because of the diverging system software environment with big data, increasing both the cost and the complexity of the solution. In this paper we propose Neon, a clean-slate design of a streaming data processing framework for HPC systems that enables users to create arbitrarily large streaming pipelines. The experimental results on the Bebop supercomputer show significant performance improvements compared with Apache Storm, with up to 2x increased throughput and reduced latency.
2020-10-30
Kang, Qiao, Lee, Sunwoo, Hou, Kaiyuan, Ross, Robert, Agrawal, Ankit, Choudhary, Alok, Liao, Wei-keng.  2020.  Improving MPI Collective I/O for High Volume Non-Contiguous Requests With Intra-Node Aggregation. IEEE Transactions on Parallel and Distributed Systems. 31:2682—2695.

Two-phase I/O is a well-known strategy for implementing collective MPI-IO functions. It redistributes I/O requests among the calling processes into a form that minimizes the file access costs. As modern parallel computers continue to grow into the exascale era, the communication cost of such request redistribution can quickly overwhelm collective I/O performance. This effect has been observed from parallel jobs that run on multiple compute nodes with a high count of MPI processes on each node. To reduce the communication cost, we present a new design for collective I/O by adding an extra communication layer that performs request aggregation among processes within the same compute nodes. This approach can significantly reduce inter-node communication contention when redistributing the I/O requests. We evaluate the performance and compare it with the original two-phase I/O on Cray XC40 parallel computers (Theta and Cori) with Intel KNL and Haswell processors. Using I/O patterns from two large-scale production applications and an I/O benchmark, we show our proposed method effectively reduces the communication cost and hence maintains the scalability for a large number of processes.

2017-05-18
Ross, Caitlin, Carothers, Christopher D., Mubarak, Misbah, Carns, Philip, Ross, Robert, Li, Jianping Kelvin, Ma, Kwan-Liu.  2016.  Visual Data-analytics of Large-scale Parallel Discrete-event Simulations. Proceedings of the 7th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems. :87–97.

Parallel discrete-event simulation (PDES) is an important tool in the codesign of extreme-scale systems because PDES provides a cost-effective way to evaluate designs of high-performance computing systems. Optimistic synchronization algorithms for PDES, such as Time Warp, allow events to be processed without global synchronization among the processing elements. A rollback mechanism is provided when events are processed out of timestamp order. Although optimistic synchronization protocols enable the scalability of large-scale PDES, the performance of the simulations must be tuned to reduce the number of rollbacks and provide an improved simulation runtime. To enable efficient large-scale optimistic simulations, one has to gain insight into the factors that affect the rollback behavior and simulation performance. We developed a tool for ROSS model developers that gives them detailed metrics on the performance of their large-scale optimistic simulations at varying levels of simulation granularity. Model developers can use this information for parameter tuning of optimistic simulations in order to achieve better runtime and fewer rollbacks. In this work, we instrument the ROSS optimistic PDES framework to gather detailed statistics about the simulation engine. We have also developed an interactive visualization interface that uses the data collected by the ROSS instrumentation to understand the underlying behavior of the simulation engine. The interface connects real time to virtual time in the simulation and provides the ability to view simulation data at different granularities. We demonstrate the usefulness of our framework by performing a visual analysis of the dragonfly network topology model provided by the CODES simulation framework built on top of ROSS. The instrumentation needs to minimize overhead in order to accurately collect data about the simulation performance. To ensure that the instrumentation does not introduce unnecessary overhead, we perform a scaling study that compares instrumented ROSS simulations with their noninstrumented counterparts in order to determine the amount of perturbation when running at different simulation scales.