Visible to the public Biblio

Filters: Author is Ning Liu, Illinois Institute of Technology  [Clear All Filters]
2017-09-01
Ning Liu, Illinois Institute of Technology, Adnan Haider, Illinois Institute of Technology, Dong Jin, Illinois Institute of Technology, Xian He Sun, Illinois Institute of Technology.  2017.  Modeling and Simulation of Extreme-Scale Fat-Tree Networks for HPC Systems and Data Centers. ACM Transactions on Modeling and Computer Simulation (TOMACS). 27(July 2017):2.

As parallel and distributed systems are evolving toward extreme scale, for example, high-performance computing systems involve millions of cores and billion-way parallelism, and high- capacity storage systems require efficient access to petabyte or exabyte of data, many new challenges are posed on designing and deploying next-generation interconnection communication networks in these systems. Fat-tree networks have been widely used in both data centers and high-performance computing (HPC) systems in the past decades and are promising candidates of the next-generation extreme-scale networks. In this article, we present FatTreeSim, a simulation framework that supports modeling and simulation of extreme-scale fattree networks with the goal of understanding the design constraints of next-generation HPC and distributed systems and aiding the design and performance optimization of the applications running on these systems. We have systematically experimented FatTreeSim on Emulab and Blue Gene/Q and analyzed the scalability and fidelity of FatTreeSim with various network configurations. On the Blue Gene/Q Mira, FatTreeSim can achieve a peak performance of 305 million events per second using 16,384 cores. Finally, we have applied FatTreeSim to simulate several large-scale Hadoop YARN applications to demonstrate its usability.

2015-11-11
Ning Liu, Illinois Institute of Technology, Xian-He Sun, Illinois Institute of Technology, Dong Jin, Illinois Institute of Technology.  2015.  On Massively Parallel Simulation of Large-Scale Fat-Tree Networks for HPC Systems and Data Centers (poster). ACM SIGSIM Conference on Principles of Advanced Discrete Simulation.

Best Poster Award, ACM SIGCOMM Conference on Principles of Advanced Discrete Simulation, London, UK, June 10-12, 2015.

Ning Liu, Illinois Institute of Technology, Adnan Haider, Illinois Institute of Technology, Xian-He Sun, Illinois Institute of Technology, Dong Jin, Illinois Institute of Technology.  2015.  FatTreeSim: Modeling a Large-scale Fat-Tree Network for HPC Systems and Data Centers Using Parallel and Discrete Even Simulation. ACM SIGSIM Conference on Principles of Advanced Discrete Simulation.

Fat-tree topologies have been widely adopted as the communication network in data centers in the past decade. Nowa- days, high-performance computing (HPC) system designers are considering using fat-tree as the interconnection network for the next generation supercomputers. For extreme-scale computing systems like the data centers and supercomput- ers, the performance is highly dependent on the intercon- nection networks. In this paper, we present FatTreeSim, a PDES-based toolkit consisting of a highly scalable fat-tree network model, with the goal of better understanding the de- sign constraints of fat-tree networking architectures in data centers and HPC systems, as well as evaluating the applica- tions running on top of the network. FatTreeSim is designed to model and simulate large-scale fat-tree networks up to millions of nodes with protocol-level fidelity. We have con- ducted extensive experiments to validate and demonstrate the accuracy, scalability and usability of FatTreeSim. On Argonne Leadership Computing Facility’s Blue Gene/Q sys- tem, Mira, FatTreeSim is capable of achieving a peak event rate of 305 M/s for a 524,288-node fat-tree model with a total of 567 billion committed events. The strong scaling experiments use up to 32,768 cores and show a near linear scalability. Comparing with a small-scale physical system in Emulab, FatTreeSim can accurately model the latency in the same fat-tree network with less than 10% error rate for most cases. Finally, we demonstrate FatTreeSim’s usability through a case study in which FatTreeSim serves as the net- work module of the YARNsim system, and the error rates for all test cases are less than 13.7%.

Best Paper Award