Biblio

Found 369 results

Filters: Keyword is science of security
2015-11-12
Xia, Weiyi, Kantarcioglu, Murat, Wan, Zhiyu, Heatherly, Raymond, Vorobeychik, Yevgeniy, Malin, Bradley.  2015.  Process-Driven Data Privacy. Proceedings of the 24th ACM International Conference on Information and Knowledge Management. :1021–1030.

The quantity of personal data gathered by service providers via our daily activities continues to grow at a rapid pace. The sharing, and the subsequent analysis of, such data can support a wide range of activities, but concerns around privacy often prompt an organization to transform the data to meet certain protection models (e.g., k-anonymity or ε-differential privacy). These models, however, are based on simplistic adversarial frameworks, which can lead to both under- and over-protection. For instance, such models often assume that an adversary attacks a protected record exactly once. We introduce a principled approach to explicitly model the attack process as a series of steps. Specifically, we engineer a factored Markov decision process (FMDP) to optimally plan an attack from the adversary's perspective and assess the privacy risk accordingly. The FMDP captures the uncertainty in the adversary's belief (e.g., the number of identified individuals that match the de-identified data) and enables the analysis of various real world deterrence mechanisms beyond a traditional protection model, such as a penalty for committing an attack. We present an algorithm to solve the FMDP and illustrate its efficiency by simulating an attack on publicly accessible U.S. census records against a real identified resource of over 500,000 individuals in a voter registry. Our results demonstrate that while traditional privacy models commonly expect an adversary to attack exactly once per record, an optimal attack in our model may involve exploiting none, one, or more individuals in the pool of candidates, depending on context.
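
As a rough, self-contained illustration of the planning view described above (and not the paper's actual factored MDP), the sketch below runs value iteration on a toy attack-or-stop decision process; all states, probabilities, rewards, and the penalty value are hypothetical.

```python
# Illustrative sketch only: a toy finite MDP in which an adversary decides, at each
# step, whether to attempt a re-identification attack or stop. The states, rewards,
# and penalty below are hypothetical; the paper's factored MDP over adversary
# beliefs is substantially richer.

GAMMA = 0.95          # discount factor
P_SUCCESS = 0.4       # hypothetical chance an attack links a record
REWARD_SUCCESS = 10.0 # hypothetical value of a successful re-identification
PENALTY = 3.0         # hypothetical cost/penalty for attempting an attack

def value_iteration(n_candidates=3, n_iters=100):
    """Compute the adversary's optimal expected payoff per state.

    State s = number of candidate records still unexplored (0..n_candidates).
    Actions: "attack" (pay the penalty, succeed with some probability) or "stop".
    """
    V = [0.0] * (n_candidates + 1)   # V[s] = value with s candidates left
    for _ in range(n_iters):
        newV = V[:]
        for s in range(1, n_candidates + 1):
            stop_value = 0.0
            # Attacking consumes one candidate regardless of outcome.
            attack_value = (-PENALTY
                            + P_SUCCESS * (REWARD_SUCCESS + GAMMA * V[s - 1])
                            + (1 - P_SUCCESS) * GAMMA * V[s - 1])
            newV[s] = max(stop_value, attack_value)
        V = newV
    return V

if __name__ == "__main__":
    for s, v in enumerate(value_iteration()):
        print(f"{s} candidates left: value={v:.2f}, action={'attack' if v > 0 else 'stop'}")
```

In this toy model the adversary attacks only when the expected gain outweighs the penalty, echoing the abstract's point that an optimal attack may exploit none, one, or several candidates.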

2016-11-17
Zbigniew Kalbarczyk, University of Illinois at Urbana-Champaign.  2015.  Resilience of Cyber Physical Systems and Technologies.

Presented as a tutorial at the Symposium and Bootcamp on the Science of Security (HotSoS 2015), April 2015.

2015-11-11
John C. Mace, Newcastle University, Charles Morisset, Newcastle University, Aad Van Moorsel, Newcastle University.  2015.  Resiliency Variance in Workflows with Choice. International Workshop on Software Engineering for Resilient Systems (SERENE 2015).

Computing a user-task assignment for a workflow coming with probabilistic user availability provides a measure of completion rate or resiliency. To a workflow designer this indicates a risk of failure, especially useful for workflows which cannot be changed due to rigid security constraints. Furthermore, resiliency can help outline a mitigation strategy which states actions that can be performed to avoid workflow failures. A workflow with choice may have many different resiliency values, one for each of its execution paths. This makes understanding failure risk and mitigation requirements much more complex. We introduce resiliency variance, a new analysis metric for workflows which indicates volatility from the resiliency average. We suggest this metric can help determine the risk taken on by implementing a given workflow with choice. For instance, high average resiliency and low variance would suggest a low risk of workflow failure.
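
As a minimal sketch of the kind of metric the abstract describes (the paper's exact definition may differ), the following computes a probability-weighted mean and variance of per-path resiliency values for a hypothetical workflow with choice; the path probabilities and resiliency numbers are made up.

```python
# Minimal sketch of a "resiliency variance" style metric: given one resiliency value
# per execution path of a workflow with choice, report the spread around the average.
# The path probabilities and resiliency values below are hypothetical.

def resiliency_variance(paths):
    """paths: list of (probability, resiliency) pairs, one per execution path."""
    total_p = sum(p for p, _ in paths)
    mean = sum(p * r for p, r in paths) / total_p
    variance = sum(p * (r - mean) ** 2 for p, r in paths) / total_p
    return mean, variance

if __name__ == "__main__":
    # Three hypothetical execution paths of a workflow with choice.
    example_paths = [(0.5, 0.90), (0.3, 0.85), (0.2, 0.95)]
    mean, var = resiliency_variance(example_paths)
    print(f"average resiliency = {mean:.3f}, resiliency variance = {var:.5f}")
```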

2016-11-17
Eric Badger, University of Illinois at Urbana-Champaign, Phuong Cao, University of Illinois at Urbana-Champaign, Alex Withers, University of Illinois at Urbana-Champaign, Adam Slagell, University of Illinois at Urbana-Champaign, Zbigniew Kalbarczyk, University of Illinois at Urbana-Champaign, Ravishankar Iyer, University of Illinois at Urbana-Champaign.  2015.  Scalable Data Analytics Pipeline for Real-Time Attack Detection: Design, Validation, and Deployment in a Honeypot Environment.

This talk will explore a scalable data analytics pipeline for real-time attack detection through the use of customized honeypots at the National Center for Supercomputing Applications (NCSA). Attack detection tools are common and are constantly improving, but validating these tools is challenging. You must: (i) identify data (e.g., system-level events) that is essential for detecting attacks, (ii) extract this data from multiple data logs collected by runtime monitors, and (iii) present the data to the attack detection tools. On top of this, such an approach must scale with an ever-increasing amount of data, while allowing integration of new monitors and attack detection tools. All of these require an infrastructure to host and validate the developed tools before deployment into a production environment.

We will present a generalized architecture that aims for a real-time, scalable, and extensible pipeline that can be deployed in diverse infrastructures to validate arbitrary attack detection tools. To motivate our approach, we will show an example deployment of our pipeline based on open-sourced tools. The example deployment uses as its data sources: (i) a customized honeypot environment at NCSA and (ii) a container-based testbed infrastructure for interactive attack replay. Each of these data sources is equipped with network and host-based monitoring tools such as Bro (a network-based intrusion detection system) and OSSEC (a host-based intrusion detection system) to allow for the runtime collection of data on system/user behavior. Finally, we will present an attack detection tool that we developed and that we look to validate through our pipeline. In conclusion, the talk will discuss the challenges of transitioning attack detection from theory to practice and how the proposed data analytics pipeline can help that transition.

Presented at the Illinois Information Trust Institute Joint Trust and Security/Science of Security Seminar, October 6, 2016.

Presented at the NSA SoS Quarterly Lablet Meeting, October 2015.

2016-04-08
Abbas, Waseem, Laszka, Aron, Vorobeychik, Yevgeniy, Koutsoukos, Xenofon.  2015.  Scheduling Intrusion Detection Systems in Resource-Bounded Cyber-Physical Systems. Proceedings of the First ACM Workshop on Cyber-Physical Systems-Security and/or PrivaCy. :55–66.

In order to be resilient to attacks, a cyber-physical system (CPS) must be able to detect attacks before they can cause significant damage. To achieve this, intrusion detection systems (IDS) may be deployed, which can detect attacks and alert human operators, who can then intervene. However, the resource-constrained nature of many CPS poses a challenge, since reliable IDS can be computationally expensive. Consequently, computational nodes may not be able to perform intrusion detection continuously, which means that we have to devise a schedule for performing intrusion detection. While a uniformly random schedule may be optimal in a purely cyber system, an optimal schedule for protecting CPS must also take into account the physical properties of the system, since the set of adversarial actions and their consequences depends on the physical system. Here, in the context of water distribution networks, we study IDS scheduling problems in two settings, under constraints on the available battery supplies. In the first problem, the objective is to design, for a given duration of time T, scheduling schemes for IDS so that the probability of detecting an attack is maximized within that duration. We propose efficient heuristic algorithms for this general problem and evaluate them on various networks. In the second problem, our objective is to design scheduling schemes for IDS so that the overall lifetime of the network is maximized while ensuring that an intruder attack is always detected. Various strategies to deal with this problem are presented and evaluated for various networks.
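
As a toy illustration of the first scheduling question (not the algorithms proposed in the paper), the following Monte Carlo sketch estimates how often a uniformly random IDS schedule, constrained by a battery budget of k active slots out of T, catches an attack occupying a random contiguous window; all parameter values are hypothetical.

```python
# Toy Monte Carlo sketch: with enough battery to run the IDS in only k of T time
# slots, how likely is a uniformly random schedule to catch an attack that occupies
# a random contiguous window of slots? The numbers are hypothetical and the model is
# far simpler than the water-network setting studied in the paper.
import random

def detection_probability(T=24, k=6, attack_len=4, trials=20_000, seed=0):
    rng = random.Random(seed)
    detected = 0
    for _ in range(trials):
        schedule = set(rng.sample(range(T), k))        # slots where the IDS is on
        start = rng.randrange(T - attack_len + 1)      # attack window start
        if any(slot in schedule for slot in range(start, start + attack_len)):
            detected += 1
    return detected / trials

if __name__ == "__main__":
    print(f"estimated detection probability: {detection_probability():.3f}")
```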

2016-04-08
Mathieu Dahan, Saurabh Amin.  2015.  Security Games in Network Flow Problems. CoRR. abs/1512.09335

This paper considers a 2-player strategic game for network routing under link disruptions. Player 1 (defender) routes flow through a network to maximize her value of effective flow while facing transportation costs. Player 2 (attacker) simultaneously disrupts one or more links to maximize her value of lost flow but also faces a cost of disrupting links. This game is strategically equivalent to a zero-sum game. Linear programming duality and the max-flow min-cut theorem are applied to obtain properties that are satisfied in any mixed Nash equilibrium. In any equilibrium, both players achieve identical payoffs. While the defender's expected transportation cost decreases in the attacker's marginal value of lost flow, the attacker's expected cost of attack increases in the defender's marginal value of effective flow. Interestingly, the expected amount of effective flow decreases in both of these parameters. These results can be viewed as a generalization of the classical max-flow with minimum transportation cost problem to adversarial environments.
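
The abstract leans on linear programming duality and the max-flow min-cut theorem. As a small illustration of that underlying machinery (not of the game itself, which also involves costs and mixed strategies), the sketch below computes a maximum flow and a minimum cut on a made-up network using networkx.

```python
# Toy illustration of the max-flow / min-cut machinery the abstract builds on.
# The small network and its capacities are made up; the paper's game additionally
# involves transportation and attack costs, which this sketch does not model.
import networkx as nx

G = nx.DiGraph()
# Edges with hypothetical capacities from source 's' to sink 't'.
G.add_edge("s", "a", capacity=4.0)
G.add_edge("s", "b", capacity=3.0)
G.add_edge("a", "t", capacity=2.0)
G.add_edge("a", "b", capacity=2.0)
G.add_edge("b", "t", capacity=4.0)

flow_value, flow_dict = nx.maximum_flow(G, "s", "t")
cut_value, (reachable, non_reachable) = nx.minimum_cut(G, "s", "t")

print(f"max flow = {flow_value}")   # equals min cut by the max-flow min-cut theorem
print(f"min cut  = {cut_value}")
print("cut partition:", reachable, "|", non_reachable)
```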

2016-04-07
Gan, Jiarui, An, Bo, Vorobeychik, Yevgeniy.  2015.  Security Games with Protection Externalities. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. :914–920.

Stackelberg security games have been widely deployed in recent years to schedule security resources. An assumption in most existing security game models is that one security resource assigned to a target only protects that target. However, in many important real-world security scenarios, when a resource is assigned to a target, it exhibits protection externalities: that is, it also protects other "neighbouring" targets. We investigate such Security Games with Protection Externalities (SPEs). First, we demonstrate that computing a strong Stackelberg equilibrium for an SPE is NP-hard, in contrast with traditional Stackelberg security games which can be solved in polynomial time. On the positive side, we propose a novel column generation based approach—CLASPE—to solve SPEs. CLASPE features the following novelties: 1) a novel mixed-integer linear programming formulation for the slave problem; 2) an extended greedy approach with a constant-factor approximation ratio to speed up the slave problem; and 3) a linear-scale linear program that efficiently calculates the upper bounds of target-defined subproblems for pruning. Our experimental evaluation demonstrates that CLASPE enables us to scale to realistic-sized SPE problem instances.

2016-04-11
Lina Sela Perelman, Waseem Abbas, Xenofon D. Koutsoukos, Saurabh Amin.  2015.  Sensor placement for fault location identification in water networks: A minimum test cover approach. CoRR. abs/1507.07134

This paper focuses on the optimal sensor placement problem for the identification of pipe failure locations in large-scale urban water systems. The problem involves selecting the minimum number of sensors such that every pipe failure can be uniquely localized. This problem can be viewed as a minimum test cover (MTC) problem, which is NP-hard. We consider two approaches to obtain approximate solutions to this problem. In the first approach, we transform the MTC problem to a minimum set cover (MSC) problem and use the greedy algorithm that exploits the submodularity property of the MSC problem to compute the solution to the MTC problem. In the second approach, we develop a new augmented greedy algorithm for solving the MTC problem. This approach does not require the transformation of the MTC to MSC. Our augmented greedy algorithm provides a significant computational improvement while guaranteeing the same approximation ratio as the first approach. We propose several metrics to evaluate the performance of the sensor placement designs. Finally, we present detailed computational experiments for a number of real water distribution networks.
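
As an illustration of the greedy step underlying the first approach (minimum set cover), the following sketch greedily picks sensors that cover the most still-uncovered failure events on a hypothetical instance; the paper's minimum test cover formulation, which requires unique localization, is harder than this plain coverage version.

```python
# Rough sketch of the greedy minimum-set-cover heuristic mentioned in the abstract,
# applied to a hypothetical sensor-placement instance: each candidate sensor "covers"
# the set of pipe-failure events it can observe, and we greedily pick the sensor that
# covers the most uncovered events. Treat this as an illustration of the greedy step,
# not as the paper's algorithm.

def greedy_set_cover(universe, candidate_sets):
    """candidate_sets: dict mapping sensor name -> set of failure events it covers."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the sensor covering the largest number of still-uncovered events.
        best = max(candidate_sets, key=lambda s: len(candidate_sets[s] & uncovered))
        if not candidate_sets[best] & uncovered:
            raise ValueError("remaining events cannot be covered")
        chosen.append(best)
        uncovered -= candidate_sets[best]
    return chosen

if __name__ == "__main__":
    failures = {"pipe1", "pipe2", "pipe3", "pipe4", "pipe5"}   # hypothetical events
    sensors = {
        "s1": {"pipe1", "pipe2"},
        "s2": {"pipe2", "pipe3", "pipe4"},
        "s3": {"pipe4", "pipe5"},
        "s4": {"pipe1", "pipe5"},
    }
    print("selected sensors:", greedy_set_cover(failures, sensors))
```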

2015-11-17
Zhenqi Huang, University of Illinois at Urbana-Champaign, Chuchu Fan, University of Illinois at Urbana-Champaign, Alexandru Mereacre, University of Oxford, Sayan Mitra, University of Illinois at Urbana-Champaign, Marta Kwiatkowska, University of Oxford.  2015.  Simulation-based Verification of Cardiac Pacemakers with Guaranteed Coverage. Special Issue of IEEE Design and Test. 32(5)

Design and testing of pacemakers is challenging because of the need to capture the interaction between the physical processes (e.g., the voltage signal in cardiac tissue) and the embedded software (e.g., a pacemaker). At the same time, there is a growing need for design and certification methodologies that can provide quality assurance for the embedded software. We describe recent progress in simulation-based techniques that are capable of ensuring guaranteed coverage. Our methods employ discrepancy functions, which impose bounds on system dynamics, and proceed through iteratively constructing over-approximations of the reachable set of states. We are able to prove time-bounded safety or produce counterexamples. We illustrate the techniques by analyzing a family of pacemaker designs against time duration requirements and synthesize safe parameter ranges. We conclude by outlining the potential uses of this technology to improve the safety of medical device designs.

2016-12-14
Zhenqi Huang, University of Illinois at Urbana-Champaign, Yu Wang, University of Illinois at Urbana-Champaign.  2015.  SMT-Based Controller Synthesis for Linear Dynamical Systems with Adversary.

We present a controller synthesis algorithm for a discrete time reach-avoid problem in the presence of adversaries. Our model of the adversary captures typical malicious attacks envisioned on cyber-physical systems such as sensor spoofing, controller corruption, and actuator intrusion. After formulating the problem in a general setting, we present a sound and complete algorithm for the case with linear dynamics and an adversary with a budget on the total L2-norm of its actions. The algorithm relies on a result from linear control theory that enables us to decompose and precisely compute the reachable states of the system in terms of a symbolic simulation of the adversary-free dynamics and the total uncertainty induced by the adversary. We provide constraint-based synthesis algorithms for synthesizing open-loop and closed-loop controllers using SMT solvers.

Presented at the Joint Trust and Security/Science of Security Seminar, November 3, 2015.

2016-04-07
Pavlovic, Dusko.  2015.  Towards a Science of Trust. Proceedings of the 2015 Symposium and Bootcamp on the Science of Security. :3:1–3:9.

The diverse views of science of security have opened up several alleys towards applying the methods of science to security. We pursue a different kind of connection between science and security. This paper explores the idea that security is not just a suitable subject for science, but that the process of security is also similar to the process of science. This similarity arises from the fact that both science and security depend on the methods of inductive inference. Because of this dependency, a scientific theory can never be definitely proved, but can only be disproved by new evidence, and improved into a better theory. Because of the same dependency, every security claim and method has a lifetime, and always eventually needs to be improved.

In this general framework of security-as-science, we explore the ways to apply the methods of scientific induction in the process of trust. The process of trust building and updating is viewed as hypothesis testing. We propose to formulate the trust hypotheses by the methods of algorithmic learning, and to build more robust trust testing and vetting methodologies on the solid foundations of statistical inference.

2015-11-16
Phuong Cao, University of Illinois at Urbana-Champaign, Eric C. Badger, University of Illinois at Urbana-Champaign, Zbigniew Kalbarczyk, University of Illinois at Urbana-Champaign, Ravishankar K. Iyer, University of Illinois at Urbana-Champaign, Alexander Withers, University of Illinois at Urbana-Champaign, Adam J. Slagell, University of Illinois at Urbana-Champaign.  2015.  Towards an Unified Security Testbed and Security Analytics Framework. Symposium and Bootcamp on the Science of Security (HotSoS 2015).

This paper presents the architecture of an end-to-end security testbed and security analytics framework, which aims to: i) understand real-world exploitation of known security vulnerabilities and ii) preemptively detect multi-stage attacks, i.e., before the system misuse. With the increasing number of security vulnerabilities, it is necessary for security researchers and practitioners to understand: i) system and network behaviors under attacks and ii) potential effects of attacks to the target infrastructure. To safely emulate and instrument exploits of known vulnerabilities, we use virtualization techniques to isolate attacks in containers, e.g., Linux-based containers or Virtual Machines, and to deploy monitors, e.g., kernel probes or network packet captures, across a system and network stack. To infer the evolution of attack stages from monitoring data, we use a probabilistic graphical model, namely AttackTagger, that represents learned knowledge of simulated attacks in our security testbed and real-world attacks. Experiments are being run on a real-world deployment of the framework at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign.

2016-12-09
Jim Blythe, University of Southern California, Sean Smith, Dartmouth College.  2015.  Understanding and Accounting for Human Behavior.

Since computers are machines, it's tempting to think of computer security as purely a technical problem. However, computing systems are created, used, and maintained by humans, and exist to serve the goals of human and institutional stakeholders. Consequently, effectively addressing the security problem requires understanding this human dimension.


In this tutorial, we discuss this challenge and survey principal research approaches to it.
 

Invited Tutorial, Symposium and Bootcamp on the Science of Security (HotSoS 2015), April 2015, Urbana, IL.

2015-11-17
Xusheng Xiao, NEC Laboratories America, Nikolai Tillmann, Microsoft Research, Manuel Fahndrich, Microsoft Research, Jonathan de Halleux, Microsoft Research, Michal Moskal, Microsoft Research, Tao Xie, University of Illinois at Urbana-Champaign.  2015.  User-Aware Privacy Control via Extended Static-Information-Flow Analysis. Automated Software Engineering Journal. 22(3)

Applications in mobile marketplaces may leak private user information without notification. Existing mobile platforms provide little information on how applications use private user data, making it difficult for experts to validate applications and for users to grant applications access to their private data. We propose a user-aware-privacy-control approach, which reveals how private information is used inside applications. We compute static information flows and classify them as safe/unsafe based on a tamper analysis that tracks whether private data is obscured before escaping through output channels. This flow information enables platforms to provide default settings that expose private data for only safe flows, thereby preserving privacy and minimizing decisions required from users. We build our approach into TouchDevelop, an application-creation environment that allows users to write scripts on mobile devices and install scripts published by other users. We evaluate our approach by studying 546 scripts published by 194 users, and the results show that our approach effectively reduces the need to make access-granting choices to only 10.1% (54) of all scripts. We also conduct a user survey that involves 50 TouchDevelop users to assess the effectiveness and usability of our approach. The results show that 90% of the users consider our approach useful in protecting their privacy, and 54% prefer our approach over other privacy-control approaches.

2015-11-11
Jiaqi Yan, Illinois Institute of Technology, Dong Jin, Illinois Institute of Technology.  2015.  VT-Mininet: Virtual-time-enabled Mininet for Scalable and Accurate Software-Defined Network Emulation. ACM SIGCOMM Symposium on SDN Research.

The advancement of software-defined networking (SDN) technology is highly dependent on the successful transformations from in-house research ideas to real-life products. To enable such transformations, a testbed offering a scalable and high fidelity networking environment for testing and evaluating new/existing designs is extremely valuable. Mininet, the most popular SDN emulator by far, is designed to achieve both accuracy and scalability by running unmodified code of network applications in lightweight Linux Containers. However, Mininet cannot guarantee performance fidelity under high workloads, in particular when the number of concurrent active events is more than the number of parallel cores. In this project, we develop a lightweight virtual time system in Linux container and integrate the system with Mininet, so that all the containers have their own virtual clocks rather than using the physical system clock which reflects the serialized execution of multiple containers. With the notion of virtual time, all the containers perceive virtual time as if they run independently and concurrently. As a result, interactions between the containers and the physical system are artificially scaled, making a network appear to be ten times faster from the viewpoint of applications within the containers than it actually is. We also design an adaptive virtual time scheduling subsystem in Mininet, which is responsible for balancing the experiment speed and fidelity. Experimental results demonstrate that embedding virtual time into Mininet significantly enhances its performance fidelity, and therefore, results in a useful platform for the SDN community to conduct scalable experiments with high fidelity.

2017-02-15
Ross Koppel, University of Pennsylvania, Sean W. Smith, Dartmouth College, Jim Blythe, University of Southern California, Vijay Kothari, Dartmouth College.  2015.  Workarounds to Computer Access in Healthcare Organizations: You Want My Password or a Dead Patient? Studies in Health Technology and Informatics: Driving Quality in Informatics: Fulfilling the Promise. 208

Workarounds to computer access in healthcare are sufficiently common that they often go unnoticed. Clinicians focus on patient care, not cybersecurity. We argue and demonstrate that understanding workarounds to healthcare workers’ computer access requires not only analyses of computer rules, but also interviews and observations with clinicians. In addition, we illustrate the value of shadowing clinicians and conducting focus groups to understand their motivations and tradeoffs for circumvention. Ethnographic investigation of the medical workplace emerges as a critical method of research because in the inevitable conflict between even well-intended people versus the machines, it’s the people who are the more creative, flexible, and motivated. We conducted interviews and observations with hundreds of medical workers and with 19 cybersecurity experts, CIOs, CMIOs, CTOs, and IT workers to obtain their perceptions of computer security. We also shadowed clinicians as they worked. We present dozens of ways workers ingeniously circumvent security rules. The clinicians we studied were not “black hat” hackers, but just professionals seeking to accomplish their work despite the security technologies and regulations.
 

Wenxuan Zhou, University of Illinois at Urbana-Champaign, Dong Jin, Illinois Institute of Technology, Jason Croft, University of Illinois at Urbana-Champaign, Matthew Caesar, University of Illinois at Urbana-Champaign, P. Brighten Godfrey, University of Illinois at Urbana-Champaign.  2015.  Enforcing Customizable Consistency Properties in Software-Defined Networks. 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2015).

It is critical to ensure that network policy remains consistent during state transitions. However, existing techniques impose a high cost in update delay, and/or FIB space. We propose the Customizable Consistency Generator (CCG), a fast and generic framework to support customizable consistency policies during network updates. CCG effectively reduces the task of synthesizing an update plan under the constraint of a given consistency policy to a verification problem, by checking whether an update can safely be installed in the network at a particular time, and greedily processing network state transitions to heuristically minimize transition delay. We show that a large class of consistency policies is guaranteed by this greedy heuristic alone; in addition, CCG makes judicious use of existing heavier-weight network update mechanisms to provide guarantees when necessary. As such, CCG nearly achieves the “best of both worlds”: the efficiency of simply passing through updates in most cases, with the consistency guarantees of more heavyweight techniques. Mininet and physical testbed evaluations demonstrate CCG’s capability to achieve various types of consistency, such as path and bandwidth properties, with zero switch memory overhead and up to a 3× delay reduction compared to previous solutions.

2015-11-12
Li, Bo, Vorobeychik, Yevgeniy, Li, Muqun, Malin, Bradley.  2015.  Iterative Classification for Sanitizing Large-Scale Datasets. SIAM International Conference on Data Mining.

Cheap ubiquitous computing enables the collection of massive amounts of personal data in a wide variety of domains. Many organizations aim to share such data while obscuring features that could disclose identities or other sensitive information. Much of the data now collected exhibits weak structure (e.g., natural language text) and machine learning approaches have been developed to identify and remove sensitive entities in such data. Learning-based approaches are never perfect and relying upon them to sanitize data can leak sensitive information as a consequence. However, a small amount of risk is permissible in practice, and, thus, our goal is to balance the value of data published and the risk of an adversary discovering leaked sensitive information. We model data sanitization as a game between 1) a publisher who chooses a set of classifiers to apply to data and publishes only instances predicted to be non-sensitive and 2) an attacker who combines machine learning and manual inspection to uncover leaked sensitive entities (e.g., personal names). We introduce an iterative greedy algorithm for the publisher that provably executes no more than a linear number of iterations, and ensures a low utility for a resource-limited adversary. Moreover, using several real world natural language corpora, we illustrate that our greedy algorithm leaves virtually no automatically identifiable sensitive instances for a state-of-the-art learning algorithm, while sharing over 93% of the original data, and completes after at most 5 iterations.

Nika Haghtalab, Aron Laszka, Ariel D. Procaccia, Yevgeniy Vorobeychik, Xenofon D. Koutsoukos.  2015.  Monitoring Stealthy Diffusion. SIAM International Conference on Data Mining.

Starting with the seminal work by Kempe et al., a broad variety of problems, such as targeted marketing and the spread of viruses and malware, have been modeled as selecting a subset of nodes to maximize diffusion through a network. In cyber-security applications, however, a key consideration largely ignored in this literature is stealth. In particular, an attacker often has a specific target in mind, but succeeds only if the target is reached (e.g., by malware) before the malicious payload is detected and corresponding countermeasures deployed. The dual side of this problem is deployment of a limited number of monitoring units, such as cyber-forensics specialists, so as to limit the likelihood of such targeted and stealthy diffusion processes reaching their intended targets. We investigate the problem of optimal monitoring of targeted stealthy diffusion processes, and show that a number of natural variants of this problem are NP-hard to approximate. On the positive side, we show that if stealthy diffusion starts from randomly selected nodes, the defender’s objective is submodular, and a fast greedy algorithm has provable approximation guarantees. In addition, we present approximation algorithms for the setting in which an attacker optimally responds to the placement of monitoring nodes by adaptively selecting the starting nodes for the diffusion process. Our experimental results show that the proposed algorithms are highly effective and scalable.
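
As a sketch of the greedy placement idea for the random-source case (where the abstract notes the defender's objective is submodular), the following picks monitors one at a time to maximize the number of sampled diffusion runs that touch at least one monitored node; the graph, sampled runs, and budget are hypothetical, and the model omits stealth and timing.

```python
# Illustrative sketch of greedy monitor placement under a coverage-style (submodular)
# objective: a sampled diffusion run counts as detected if it touches at least one
# monitored node, and we greedily add the monitor with the largest marginal gain.
# All data below are hypothetical; the paper's model of stealthy, targeted diffusion
# is more detailed.

def greedy_monitor_placement(scenarios, candidate_nodes, budget):
    """scenarios: list of sets of nodes reached by the diffusion in sampled runs."""
    monitors = set()
    for _ in range(budget):
        def detected(with_node):
            placed = monitors | {with_node}
            return sum(1 for reached in scenarios if reached & placed)
        best = max(candidate_nodes - monitors, key=detected)
        monitors.add(best)
    return monitors

if __name__ == "__main__":
    # Hypothetical sampled diffusion runs (sets of infected nodes) on a small graph.
    sampled_runs = [{"a", "b", "c"}, {"c", "d"}, {"d", "e"}, {"a", "e"}]
    nodes = {"a", "b", "c", "d", "e"}
    print("monitors:", greedy_monitor_placement(sampled_runs, nodes, budget=2))
```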
 

2015-11-11
Ning Liu, Illinois Institute of Technology, Adnan Haider, Illinois Institute of Technology, Xian-He Sun, Illinois Institute of Technology, Dong Jin, Illinois Institute of Technology.  2015.  FatTreeSim: Modeling a Large-scale Fat-Tree Network for HPC Systems and Data Centers Using Parallel and Discrete Event Simulation. ACM SIGSIM Conference on Principles of Advanced Discrete Simulation.

Fat-tree topologies have been widely adopted as the communication network in data centers in the past decade. Nowadays, high-performance computing (HPC) system designers are considering using fat-tree as the interconnection network for the next generation supercomputers. For extreme-scale computing systems like the data centers and supercomputers, the performance is highly dependent on the interconnection networks. In this paper, we present FatTreeSim, a PDES-based toolkit consisting of a highly scalable fat-tree network model, with the goal of better understanding the design constraints of fat-tree networking architectures in data centers and HPC systems, as well as evaluating the applications running on top of the network. FatTreeSim is designed to model and simulate large-scale fat-tree networks up to millions of nodes with protocol-level fidelity. We have conducted extensive experiments to validate and demonstrate the accuracy, scalability and usability of FatTreeSim. On Argonne Leadership Computing Facility’s Blue Gene/Q system, Mira, FatTreeSim is capable of achieving a peak event rate of 305 M/s for a 524,288-node fat-tree model with a total of 567 billion committed events. The strong scaling experiments use up to 32,768 cores and show a near linear scalability. Compared with a small-scale physical system in Emulab, FatTreeSim can accurately model the latency in the same fat-tree network with less than 10% error rate for most cases. Finally, we demonstrate FatTreeSim’s usability through a case study in which FatTreeSim serves as the network module of the YARNsim system, and the error rates for all test cases are less than 13.7%.

Best Paper Award