Multi-model Testbed for the Simulation-based Evaluation of Resilience (Apr '22)

Submitted by himanshu on Thu, 04/07/2022 - 4:59pm

PI(s), Co-PI(s), Researchers:

Peter Volgyesi (PI)
Himanshu Neema (Co-PI)

HARD PROBLEM(S) ADDRESSED
This refers to Hard Problems, released in November 2012.

Security Metrics Driven Evaluation, Design, Development, and Deployment
Resilient Architectures

The goal of the Multi-model Testbed is to provide a collaborative design tool for evaluating various cyber-attack / defense strategies and their effects on the physical infrastructure. The web-based, cloud-hosted environment integrates state-of-the-art simulation engines for the different CPS domains and presents interesting research challenges as ready-to-use scenarios. Input data, model parameters, and simulation results are archived and versioned, with a strong emphasis on repeatability and provenance.

PUBLICATIONS

[1] Himanshu Neema, Thomas Roth, Chenli Wang, Wenqi Guo and Anirban Bhattacharjee. 2022. "Integrating Multiple HLA Federations for Effective Simulation-Based Evaluations of CPS", 4th International Workshop on Design Automation for CPS and IoT (DESTION 2022), Milan, Italy. May 2022.

[Abstract] Cyber-Physical Systems (CPS) are complex systems of computational, physical, and human components integrated to achieve some function over one or more networks. The use of distributed simulation, or co-simulation, is one method often used to analyze the behavior and properties of these systems. High-Level Architecture (HLA) is an IEEE co-simulation standard that supports the development and orchestration of distributed simulations. However, a simple HLA federation constructed with the component simulations (i.e., federates) does not satisfy several requirements that arise in real-world use cases such as the shared use of limited physical and computational resources, the need to selectively hide information from participating federates, the creation of reusable federates and federations for supporting configurable shared services, achieving performant distributed simulations, organizing federations across different model types or application concerns, and coordinating federations across organizations with different information technology policies. This paper describes these core requirements that necessitate the use of multiple HLA federations and presents various mechanisms for constructing such integrated HLA federations. An example use case is implemented using a model-based rapid simulation integration framework called the Universal CPS Environment for Federation (UCEF) to illustrate these requirements and demonstrate techniques for integrating multiple HLA federations.

[2] Meiyi Ma, Himanshu Neema, and Janos Sztipanovits. 2022. "Recovery Planning using Simulation-Based Predictive Monitoring". Chapter accepted for the Springer book "Autonomous Intelligent Cyber-Defense Agents", edited by Alexander Kott of the Army Research Laboratory, Springer, 2022.

[Abstract] Despite the rapid development of cybersecurity, recovery of the operation of the impacted cyber-physical system (CPS) after a cyber-attack, as a core element of cyber resilience, is often left to human decision-makers. There is a high demand for an autonomous intelligent cyber defense agent (AICA) for planning a rapid recovery. In this chapter, we introduce and demonstrate a system for recovery planning using simulation-based predictive monitoring to recover the system from attacks (cyber, physical, or hardware) and disruptions automatically. The recovery planning system first evaluates the impact of system degradation and generates courses of action (COAs) for recovery efficiently. Then, it evaluates these COAs through integrated heterogeneous simulations that account for unavoidable uncertainty. By formalizing security and safety requirements, it formally verifies recovery COAs with confidence guarantees, and obtains the optimal recovery COAs. We present two recovery scenarios in smart cities to demonstrate the effectiveness of our recovery planning system.

KEY HIGHLIGHTS

Threat Modeling and Risk Analysis in Industrial Control Systems

In this effort, we are developing a modeling and analysis framework for threats and cybersecurity risks in Industrial Control Systems (ICS). Identifying system vulnerabilities and implementing appropriate risk mitigation strategies are crucial for ensuring ICS cybersecurity. These vulnerabilities must be evaluated according to their exploitability, impact, mitigation status, and target platform and environment. To assess system vulnerabilities and risk mitigation strategies quantitatively, we are focusing on threat modeling and risk analysis methods for the cybersecurity of Railway Transportation Systems (RTS), which are real-world ICS and have become increasingly vulnerable to cyber-attacks due to their growing reliance on networked physical and computational components.

Another interesting aspect of RTS is that these systems have a continuously changing network topology due to moving locomotives. In general, they are cyber-physical systems with integral but non-stationary components. The key challenge posed by non-stationarity is the evolving nature of threats and vulnerability propagation, owing to dynamic network connections that form and disappear as components move.

Our framework for this effort is called the Risk Analysis Framework (RAF). RAF has seven major components. The first is a modeling environment for system architecture, where the ICS can be modeled with its complete component hierarchy and communication network topology. The second allows for modeling cyber vulnerabilities, specifying attack ports and risk mitigation actions, and defining risk flows across components through attack ports. It also enables creating a library of cyber exploits and mitigations. The third provides validation of all models. The fourth performs vulnerability assessment: it propagates risk within the system through network connections and hierarchical composition and generates the component attack trees and system attack graphs. It also ranks the system vulnerabilities in decreasing order of their impact on the overall system's cyber risk. The fifth generates code and artifacts from the risk assessments. The sixth is a major tool for risk management planning, which allows for cyber gaming the available risk mitigation actions against potential cyber exploits. The seventh supports visualization and analysis of results. We already visualize component attack trees and system attack graphs. The work on visualization of risk management analysis is ongoing.
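The risk propagation and rank-ordering step can be illustrated with a minimal sketch. This is not RAF's algorithm, which the report does not specify. It assumes a hypothetical four-component system, made-up exploit probabilities, an acyclic set of directed links with made-up propagation probabilities, and a simple independence model in which a component is compromised either directly or through at least one incoming link.

```python
from math import prod

# Hypothetical model: probability that each component is exploited directly,
# and directed links (source, target) with a propagation probability.
local_risk = {"hmi": 0.30, "plc": 0.05, "switch": 0.10, "actuator": 0.01}
links = {("hmi", "switch"): 0.8, ("switch", "plc"): 0.9, ("plc", "actuator"): 0.9}

def propagate(local_risk, links, rounds=50):
    """Fixed-point iteration: a component stays safe only if it is not
    exploited directly and no risk arrives over any incoming link."""
    risk = dict(local_risk)
    for _ in range(rounds):
        updated = {}
        for comp, p_local in local_risk.items():
            incoming = [risk[src] * p for (src, dst), p in links.items() if dst == comp]
            p_safe = (1 - p_local) * prod(1 - q for q in incoming)
            updated[comp] = 1 - p_safe
        risk = updated
    return risk

def rank_by_impact(local_risk, links):
    """Order components by how much total system risk drops when the
    component's own vulnerability is fully mitigated (largest drop first)."""
    baseline = sum(propagate(local_risk, links).values())
    impact = {}
    for comp in local_risk:
        mitigated = {**local_risk, comp: 0.0}
        impact[comp] = baseline - sum(propagate(mitigated, links).values())
    return sorted(impact, key=impact.get, reverse=True)

print(rank_by_impact(local_risk, links))
```

In this toy model the entry point of the chain ranks first, because mitigating it also lowers the propagated risk of every downstream component.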

In our work, we have successfully modeled the dynamic network connections and integrated them into dynamic vulnerability propagation algorithms. We previously showcased our work at the HotSoS'21 symposium. Subsequently, we extended the framework to incorporate cyber gaming of exploits versus mitigations to plan for worst-case attacks, and developed methods to deal with dynamic network connections, where the vulnerabilities and their propagation change continually as network connectivity changes. We have published this work in the 16th International Conference on Critical Information Infrastructures Security (CRITIS 2021) and presented it at the conference, where it was well received.
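Planning for worst-case attacks by gaming exploits against mitigations can be sketched as a minimax choice over a residual-risk table. The table, the mitigation and exploit names, and the numbers below are all hypothetical, and the report does not state that RAF uses this exact formulation.

```python
# Hypothetical table: residual[mitigation][exploit] is the risk left in the
# system if the defender deploys `mitigation` and the attacker then uses
# `exploit`. All names and values are illustrative only.
residual = {
    "patch_firmware":  {"replay": 0.40, "spoof_gps": 0.70, "dos_radio": 0.50},
    "segment_network": {"replay": 0.30, "spoof_gps": 0.60, "dos_radio": 0.35},
    "authenticate":    {"replay": 0.10, "spoof_gps": 0.65, "dos_radio": 0.55},
}

def worst_case(table, mitigation):
    """Return the attacker's best response and the risk it leaves."""
    row = table[mitigation]
    exploit = max(row, key=row.get)
    return exploit, row[exploit]

def best_mitigation(table):
    """Pick the mitigation whose worst-case residual risk is smallest."""
    return min(table, key=lambda m: worst_case(table, m)[1])

choice = best_mitigation(residual)
print(choice, worst_case(residual, choice))
```

A mitigation that looks best against one exploit (here `authenticate` against `replay`) can still lose the minimax comparison if another exploit leaves a higher residual risk.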

We are working on further improving the methods and algorithms for dynamic risk management using cyber scenarios, as well as on integrating this framework with our tools that enable integrated, simulation-based quantitative evaluation of the cybersecurity of CPS.

General-Purpose ML Attack Library

Based on our previous work in the CPSWT framework on a general-purpose cyber-attack library and its use in resilience evaluation using courses of action, we have started investigating the idea of creating a general-purpose ML attack library. The idea is that these ML attacks will be generic and can be quickly adapted, through simple configuration, to attack and test the resilience of different ML models. This work is in its initial stages, and we plan to use the DeepForge platform to develop this configurable, reusable ML attack library. DeepForge uses WebGME as its metamodeling environment and supports the Keras ML library for developing ML pipelines.

Resilient Consensus using Centerpoint Algorithm and Hashgraph Blockchain Based Communication

We have started to work on a resilient multi-agent system that integrates a centerpoint algorithm and Hashgraph technology to counteract Byzantine and DDoS attacks. The centerpoint is an aggregation function that finds a safe point in the convex hull of the normal agents' estimates, ensuring convergence to the true global objective even when Byzantine agents try to shift it. The centerpoint algorithm is being developed for a three-dimensional space. So far, we have developed a three-dimensional algorithm that uses centerpoint aggregation to converge in the absence of Byzantine attackers. We are working to extend this algorithm to handle Byzantine agents.
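The idea behind a centerpoint can be illustrated with halfspace (Tukey) depth: a centerpoint of n points in d dimensions is a point for which every closed halfspace containing it holds at least n/(d+1) of the points. The sketch below is not the project's three-dimensional algorithm. It works in two dimensions for brevity, approximates the depth by sampling evenly spaced directions, and uses hypothetical agent estimates with one outlier standing in for a Byzantine agent.

```python
import math

def tukey_depth(point, samples, directions=360):
    """Approximate halfspace depth of `point` in 2-D: the smallest number of
    samples in any closed half-plane whose boundary passes through `point`,
    minimised over evenly spaced directions."""
    depth = len(samples)
    for k in range(directions):
        angle = 2 * math.pi * k / directions
        ux, uy = math.cos(angle), math.sin(angle)
        count = sum(
            1 for (x, y) in samples
            if (x - point[0]) * ux + (y - point[1]) * uy >= -1e-9
        )
        depth = min(depth, count)
    return depth

def is_centerpoint(point, samples, dim=2):
    """Check the depth bound ceil(n / (dim + 1)) for the sampled directions."""
    return tukey_depth(point, samples) >= math.ceil(len(samples) / (dim + 1))

# Four hypothetical normal estimates around the origin plus one outlier.
estimates = [(1, 1), (1, -1), (-1, 1), (-1, -1), (100, 100)]
mean = tuple(sum(axis) / len(estimates) for axis in zip(*estimates))

print(is_centerpoint((0, 0), estimates))  # point among the normal estimates
print(is_centerpoint(mean, estimates))    # plain average, dragged by the outlier
```

The plain average is pulled toward the outlier and fails the depth test, while a point inside the cluster of normal estimates passes it, which is why a centerpoint is a safer aggregate than the mean.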

In tandem with the centerpoint algorithm, we are developing a method for agents to exchange messages through the Hashgraph Blockchain. The Hashgraph Blockchain ensures that the network remains stable as long as fewer than one third of the agents are compromised. This resilient communication technique ensures that the agents have the correct data to aggregate with the centerpoint algorithm.
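The one-third bound can be made concrete with a small helper. It assumes the usual Byzantine fault tolerance condition n > 3f, where n is the number of agents and f the number of compromised ones; the function name is hypothetical.

```python
def max_tolerated_faults(n_agents: int) -> int:
    """Largest f such that n_agents > 3 * f."""
    return (n_agents - 1) // 3

for n in (3, 4, 5, 7):
    print(n, max_tolerated_faults(n))
```

Under this assumption, a network of five agents tolerates one compromised agent, while three agents tolerate none.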

For testing, we are using Microsoft AirSim as a simulator for a multi-drone system. To emulate the drones, we have set up three virtual machines with PX4 Flight Controller software-in-the-loop. PX4 acts as the drone, while the AirSim simulator runs on another machine to visualize the movements of the drones. Each virtual machine will also have a Hashgraph API installed to interact with the Hashgraph Blockchain client and send messages to other agents. The agents will use the Hashgraph client and the centerpoint algorithm to solve target pursuit and formation control problems during attacks.

Resilient Target Pursuit for Multi-UAV Systems

Unmanned Aerial Vehicles (UAVs) are gaining popularity in distributed systems that are used for a variety of tasks, such as inspection of dangerous environments, surveillance, and pursuit of a target. These systems use distributed machine learning algorithms to cooperate towards achieving an objective and are prone to denial of service (DoS) and integrity attacks. We worked on integrating a messaging mechanism and a distributed implementation of a stochastic gradient descent (SGD) algorithm in a cooperative network for target pursuit that is resilient against these attacks. The cooperative network contains agents that send messages containing local data and estimates, and it uses the SGD algorithm to optimize the global loss by aggregating the estimates of immediate neighbors. Compromised agents can suffer from a DoS attack that disrupts the ordering of messages, or from an integrity attack in which one agent sends arbitrary estimates to its neighbors to disrupt the convergence of normal agents towards an optimum state. The messaging mechanism uses a novel Hashgraph blockchain-like consensus algorithm to guarantee a correct ordering of messages. The aggregation of estimates in the SGD algorithm uses a centerpoint-based aggregation that guarantees convergence with a small number of Byzantine agents. We evaluate our network on target pursuit scenarios, both numerically and in a multi-UAV simulation in Microsoft AirSim with PX4 flight controllers. The evaluation results show that, under attack, the distributed system with Hashgraph messaging and centerpoint-based aggregation remains resilient and converges to the approximate optimum state.
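The effect of robust aggregation on cooperative target pursuit can be sketched with a toy simulation. Several simplifications are assumed here and are not taken from the project: the network is fully connected, gradients are noise-free so the run is deterministic, the step size is made up, and a coordinate-wise median is swapped in as a simpler stand-in for centerpoint-based aggregation. The target and false-target coordinates are the horizontal components of the values used in the AirSim experiments.

```python
from statistics import median

TARGET = (-10.0, 10.0)        # horizontal target position
FALSE_TARGET = (20.0, -4.0)   # estimate broadcast by the Byzantine agent
STEP = 0.1                    # hypothetical learning rate

def mean_agg(values):
    """Non-robust baseline: plain average of the received values."""
    return sum(values) / len(values)

def run(aggregate, iterations=250):
    """Simulate four normal agents and one Byzantine agent; return the
    estimate held by the normal agents after the final iteration."""
    estimates = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
    for _ in range(iterations):
        # Local gradient step on the loss 0.5 * ||w - TARGET||^2.
        stepped = [
            tuple(w - STEP * (w - t) for w, t in zip(est, TARGET))
            for est in estimates
        ]
        # Every normal agent receives all estimates, including the false one.
        received = stepped + [FALSE_TARGET]
        combined = tuple(aggregate([r[axis] for r in received]) for axis in range(2))
        estimates = [combined] * len(estimates)
    return estimates[0]

def error(estimate):
    """Euclidean distance between an estimate and the true target."""
    return sum((e - t) ** 2 for e, t in zip(estimate, TARGET)) ** 0.5

print("median aggregation error:", error(run(median)))
print("mean aggregation error:  ", error(run(mean_agg)))
```

With the median, the single false estimate never determines the aggregate, so the normal agents reach the target. With the mean, the same false estimate shifts the fixed point well away from it.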

To evaluate the performance of the integrated algorithm, we set up a network of five nodes. Each node is represented by a VirtualBox virtual machine (VM) with a unique internal IPv4 address. On each node resides a configuration and a software development toolkit (SDK) application for the SWIRLDS Hashgraph Consensus Algorithm. The configuration includes the IPv4 addresses of all nodes in the network, including the node itself, as well as the unique publisher-subscriber ports for each node. Furthermore, the cooperative SGD algorithm with hashgraph messaging and centerpoint aggregation resides on each node in an executable format. Each agent in the mobile adaptive network is associated with its respective node connected to the hashgraph. For the cooperative algorithm to run successfully, the SWIRLDS Hashgraph application is run on all participating nodes to connect them to the Hashgraph algorithm and allow the API to interact with it. The API used to interact with the SWIRLDS application in this work is the ZeroMQ open-source universal messaging library. ZeroMQ is used to send and receive atomic messages over sockets to exchange data with the hashgraph. The library is compatible with the publisher-subscriber asynchronous messaging format used in the approach.

We carried out similar experiments on AirSim, a simulator in Unreal Engine for drones developed by Microsoft. The simulator is compatible with flight controllers such as PX4 for software in the loop (SITL) virtual simulation and hardware in the loop (HITL) physical simulation. Experiments in AirSim with PX4 allow for a realistic simulation of a system of autonomous or manually controlled drones.

We consider the same experimental setup of five agents with an additional setup for AirSim and PX4. We also add a small factor γ = 0.01 to minimize collisions: it keeps the drones from converging to the same location while the center of the drone locations still reaches the target. Each agent contains a PX4 flight controller with an AirSim API to interact with its respective drone in the AirSim simulator. The initial positions of the agents are selected arbitrarily around a global position of (0, 0, 0) with a target position of (-10, 10, 10). Before execution of the cooperative target pursuit algorithm, the drones take off and climb to an altitude of 10 meters, since movement in the algorithm is only two-dimensional. The algorithm and hashgraph messaging system then operate in the same manner as in the previous experiment to converge to the target. For AirSim testing, we evaluate one scenario without attack and three different attack scenarios. The first attack scenario is a simulation of a denial of service attack on one randomly selected agent. The compromised agent is unable to send messages for 250 iterations. The second attack is similar to the first, except that the denial of service attack begins around iteration 100. This means that the system mirrors the scenario without attack up until iteration 100, where its behavior changes dynamically. The final attack scenario is a simulation of an integrity attack. One agent is randomly selected as a compromised agent that sends messages with estimates of a false target at (20, -4, 10) in an attempt to prevent the other agents from converging on the actual target. The Byzantine agent does not use the cooperative algorithm and only travels towards the false target.
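The four AirSim scenarios can be captured in a small configuration structure. This is a hypothetical encoding, not the project's code: the class and field names are made up, iterations are assumed to be numbered from zero, and a DoS attack is assumed to last from its start until the end of the 250-iteration run.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

ITERATIONS = 250  # length of each run

@dataclass(frozen=True)
class Scenario:
    name: str
    dos_start: Optional[int] = None  # first iteration the agent is silenced
    false_target: Optional[Tuple[float, float, float]] = None

SCENARIOS = [
    Scenario("no_attack"),
    Scenario("initial_dos", dos_start=0),
    Scenario("mid_dos", dos_start=100),
    Scenario("integrity", false_target=(20.0, -4.0, 10.0)),
]

def can_send(scenario: Scenario, iteration: int) -> bool:
    """True if the compromised agent can publish a message at this iteration."""
    return scenario.dos_start is None or iteration < scenario.dos_start

for scenario in SCENARIOS:
    sent = sum(can_send(scenario, i) for i in range(ITERATIONS))
    print(scenario.name, "messages sent by compromised agent:", sent)
```

Keeping the scenarios as data makes it straightforward to run the same pursuit loop once per scenario and compare the final networks.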

The following figures show the network deployments of cooperative SGD with centerpoint and hashgraph under no attack and with attack.

(a) Initial Network

(b) Final Network (Without Attack)

(c) Final Network (Initial DoS Attack)

(d) Final Network (Mid DoS Attack)

(e) Final Network (Integrity Attack)

Figure (a) shows the initial state of the simulation, and Figures (b) through (e) show the final states of the simulation after 250 iterations. The small orange sphere on the right of each image marks a point above the target location of (-10, 10, 10). It was placed above the target to prevent collisions with the drones, which would interfere with the experiment. Figures (b) and (c) show that the center of the system of drones converges to the target without adversaries and with the simple denial of service attack. For the more complex attacks in Figures (d) and (e), the center of the system is still able to converge near the target location. We interpret this as the convergence problem being more challenging, while the system is still able to converge to the approximate target.
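The convergence criterion used above, the center of the drone positions reaching the target, can be computed as follows. The drone positions in this sketch are hypothetical and are not measured results.

```python
import math

def centroid(positions):
    """Component-wise average of a list of equal-length position tuples."""
    n = len(positions)
    return tuple(sum(p[i] for p in positions) / n for i in range(len(positions[0])))

def distance_to_target(positions, target):
    """Euclidean distance from the centroid of the positions to the target."""
    return math.dist(centroid(positions), target)

target = (-10.0, 10.0, 10.0)
# Hypothetical final positions: drones spread around the target, not on it.
drones = [
    (-12.0, 10.0, 10.0),
    (-8.0, 10.0, 10.0),
    (-10.0, 12.0, 10.0),
    (-10.0, 8.0, 10.0),
    (-10.0, 10.0, 10.0),
]
print(distance_to_target(drones, target))
```

Measuring the centroid rather than each drone separately matches the setup in which the drones keep apart from each other while their center converges to the target.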

EDUCATIONAL ADVANCES and OUTREACH

Presentations and discussion with US Army 101st Airborne Division

January 18, 2022: CPT Lauren Hansen, Deputy Innovation Officer

February 10, 2020: COL John Lubas, Deputy Commanding Officer

Discussion topics:

Overview of the Lablet project and results with potential future collaboration on wireless/RF security.

Presentations and discussion with Dynetics, Inc.

March 1, 2022: Project presentation and collaboration planning on AI and secure critical infrastructures.

Collaboration with NIST on threat modeling and risk analysis in ICS

Discussion topics:

    Threat modeling in Railway ICS
    Risk Analysis
    Quantitative Risk Evaluation
    Integration with Simulation-Based Evaluation

VU/ISIS Summer Internship Seminar Series

Dr. Himanshu Neema is currently advising an undergraduate student during his internship at our institute. Please note that the student is working with our technologies, but the internship is not funded by this project. The internship project is "Evaluation of Vector Control and Social Policies on Pathogen Spread within Communities." It aims to use agent-based simulations to model arthropod behavior and human activities, as well as social policies for vector control and for changing human behavior, in order to evaluate how these affect the spread of pathogens in humans through mosquito bites. We plan to use integrated simulations for these evaluations. We have developed RESTful APIs for the creation, configuration, parameterization, execution, and control of the disease simulations. Currently, we are working on creating a model-based experimentation environment using these REST APIs. The current work also involves developing a reinforcement learning algorithm for learning effective vector control policies within the constraints of the local county health department. Additionally, this work is being converted into a web-accessible design studio for other researchers to experiment with the platform.
