Visible to the public Biblio

Filters: Author is Xian-He Sun, Illinois Institute of Technology  [Clear All Filters]
2015-11-11
Ning Liu, Illinois Institute of Technology, Xian-He Sun, Illinois Institute of Technology, Dong Jin, Illinois Institute of Technology.  2015.  On Massively Parallel Simulation of Large-Scale Fat-Tree Networks for HPC Systems and Data Centers (poster). ACM SIGSIM Conference on Principles of Advanced Discrete Simulation.

Best Poster Award, ACM SIGCOMM Conference on Principles of Advanced Discrete Simulation, London, UK, June 10-12, 2015.

Ning Liu, Illinois Institute of Technology, Adnan Haider, Illinois Institute of Technology, Xian-He Sun, Illinois Institute of Technology, Dong Jin, Illinois Institute of Technology.  2015.  FatTreeSim: Modeling a Large-scale Fat-Tree Network for HPC Systems and Data Centers Using Parallel and Discrete Even Simulation. ACM SIGSIM Conference on Principles of Advanced Discrete Simulation.

Fat-tree topologies have been widely adopted as the communication network in data centers in the past decade. Nowa- days, high-performance computing (HPC) system designers are considering using fat-tree as the interconnection network for the next generation supercomputers. For extreme-scale computing systems like the data centers and supercomput- ers, the performance is highly dependent on the intercon- nection networks. In this paper, we present FatTreeSim, a PDES-based toolkit consisting of a highly scalable fat-tree network model, with the goal of better understanding the de- sign constraints of fat-tree networking architectures in data centers and HPC systems, as well as evaluating the applica- tions running on top of the network. FatTreeSim is designed to model and simulate large-scale fat-tree networks up to millions of nodes with protocol-level fidelity. We have con- ducted extensive experiments to validate and demonstrate the accuracy, scalability and usability of FatTreeSim. On Argonne Leadership Computing Facility’s Blue Gene/Q sys- tem, Mira, FatTreeSim is capable of achieving a peak event rate of 305 M/s for a 524,288-node fat-tree model with a total of 567 billion committed events. The strong scaling experiments use up to 32,768 cores and show a near linear scalability. Comparing with a small-scale physical system in Emulab, FatTreeSim can accurately model the latency in the same fat-tree network with less than 10% error rate for most cases. Finally, we demonstrate FatTreeSim’s usability through a case study in which FatTreeSim serves as the net- work module of the YARNsim system, and the error rates for all test cases are less than 13.7%.

Best Paper Award