Defense Against Advanced Persistent Threats: Optimal Network Security Hardening Using Multi-stage Maze Network Game

Title: Defense Against Advanced Persistent Threats: Optimal Network Security Hardening Using Multi-stage Maze Network Game
Publication Type: Conference Paper
Year of Publication: 2020
Authors: Zhang, H., Liu, H., Liang, J., Li, T., Geng, L., Liu, Y., Chen, S.
Conference Name: 2020 IEEE Symposium on Computers and Communications (ISCC)
Keywords: advanced persistent threat, advanced persistent threats, APT, Attack Graphs, continuous method, game model, game theoretic security, game theory, game-theoretic framework, Human Behavior, human factors, learning (artificial intelligence), Markov processes, Metrics, Multistage Maze Network Game, multistage plan, Nash equilibrium, network attacks, optimal network security, policy hill-climbing, policy hill-climbing algorithm, Predictive Metrics, pubcrawl, reinforcement learning (RL), resilience, Resiliency, Scalability, security of data, serious privacy leakage, Stackelberg games, stealthy method
Abstract:

An Advanced Persistent Threat (APT) is a stealthy, continuous, and sophisticated method of network attack that can cause serious privacy leakage and losses in the millions of dollars. In this paper, we introduce a new game-theoretic framework for the interaction between a defender, who uses limited Security Resources (SRs) to harden the network, and an attacker, who adopts a multi-stage plan to attack it. The game model, called the Multi-stage Maze Network Game (M2NG), is derived from Stackelberg games and fully accounts for the characteristics of APTs. The attacker's possible plans are compactly represented using attack graphs (AGs), but this compact representation of the attacker's strategies poses a computational challenge: reaching the Nash Equilibrium (NE) is NP-hard. We present a method that first translates the AGs into a Markov Decision Process (MDP) and then derives the optimal SR allocation using the policy hill-climbing (PHC) algorithm. Finally, we present an empirical evaluation of the model and analyze the scalability and sensitivity of the algorithm. Simulation results show that our proposed reinforcement learning-based SR allocation is feasible and efficient.
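For readers unfamiliar with policy hill-climbing, the sketch below shows the core PHC update (a standard Q-learning backup plus a small step of the mixed policy toward the greedy action) on a generic tabular MDP. This is a minimal illustration of the general algorithm, not the authors' M2NG formulation: the class name PHCAgent, the state/action encoding, the reward, and the hyperparameters alpha, gamma, and delta are all illustrative assumptions.

import numpy as np

class PHCAgent:
    """Tabular policy hill-climbing (PHC) on a generic MDP (illustrative sketch)."""
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, delta=0.05):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.delta = alpha, gamma, delta
        self.Q = np.zeros((n_states, n_actions))                    # action-value estimates
        self.pi = np.full((n_states, n_actions), 1.0 / n_actions)   # mixed policy per state

    def act(self, s, rng):
        # Sample an action from the current mixed strategy for state s.
        return rng.choice(self.n_actions, p=self.pi[s])

    def update(self, s, a, r, s_next):
        # Q-learning backup toward the one-step bootstrapped target.
        td_target = r + self.gamma * self.Q[s_next].max()
        self.Q[s, a] += self.alpha * (td_target - self.Q[s, a])
        # Hill-climb the policy: step the greedy action up by delta,
        # the others down by delta / (|A| - 1).
        greedy = self.Q[s].argmax()
        for b in range(self.n_actions):
            step = self.delta if b == greedy else -self.delta / (self.n_actions - 1)
            self.pi[s, b] += step
        # Project back onto the probability simplex (clip negatives, renormalize).
        self.pi[s] = np.clip(self.pi[s], 0.0, None)
        self.pi[s] /= self.pi[s].sum()

# Example: one learning step on a hypothetical 2-state, 3-action defender MDP,
# where a negative reward stands in for the cost of an attacker advance.
rng = np.random.default_rng(0)
agent = PHCAgent(n_states=2, n_actions=3)
a = agent.act(0, rng)
agent.update(s=0, a=a, r=-1.0, s_next=1)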

DOI: 10.1109/ISCC50000.2020.9219722
Citation Key: zhang_defense_2020