Visible to the public Biblio

Filters: Keyword is Markov Decision Process  [Clear All Filters]
2023-07-11
Hammar, Kim, Stadler, Rolf.  2022.  An Online Framework for Adapting Security Policies in Dynamic IT Environments. 2022 18th International Conference on Network and Service Management (CNSM). :359—363.

We present an online framework for learning and updating security policies in dynamic IT environments. It includes three components: a digital twin of the target system, which continuously collects data and evaluates learned policies; a system identification process, which periodically estimates system models based on the collected data; and a policy learning process that is based on reinforcement learning. To evaluate our framework, we apply it to an intrusion prevention use case that involves a dynamic IT infrastructure. Our results demonstrate that the framework automatically adapts security policies to changes in the IT infrastructure and that it outperforms a state-of-the-art method.

2023-06-09
Rizwan, Kainat, Ahmad, Mudassar, Habib, Muhammad Asif.  2022.  Cyber Automated Network Resilience Defensive Approach against Malware Images. 2022 International Conference on Frontiers of Information Technology (FIT). :237—242.
Cyber threats have been a major issue in the cyber security domain. Every hacker follows a series of cyber-attack stages known as cyber kill chain stages. Each stage has its norms and limitations to be deployed. For a decade, researchers have focused on detecting these attacks. Merely watcher tools are not optimal solutions anymore. Everything is becoming autonomous in the computer science field. This leads to the idea of an Autonomous Cyber Resilience Defense algorithm design in this work. Resilience has two aspects: Response and Recovery. Response requires some actions to be performed to mitigate attacks. Recovery is patching the flawed code or back door vulnerability. Both aspects were performed by human assistance in the cybersecurity defense field. This work aims to develop an algorithm based on Reinforcement Learning (RL) with a Convoluted Neural Network (CNN), far nearer to the human learning process for malware images. RL learns through a reward mechanism against every performed attack. Every action has some kind of output that can be classified into positive or negative rewards. To enhance its thinking process Markov Decision Process (MDP) will be mitigated with this RL approach. RL impact and induction measures for malware images were measured and performed to get optimal results. Based on the Malimg Image malware, dataset successful automation actions are received. The proposed work has shown 98% accuracy in the classification, detection, and autonomous resilience actions deployment.
2022-08-26
Nguyen, Lan K., Nguyen, Duy H. N., Tran, Nghi H., Bosler, Clayton, Brunnenmeyer, David.  2021.  SATCOM Jamming Resiliency under Non-Uniform Probability of Attacks. MILCOM 2021 - 2021 IEEE Military Communications Conference (MILCOM). :85—90.
This paper presents a new framework for SATCOM jamming resiliency in the presence of a smart adversary jammer that can prioritize specific channels to attack with a non-uniform probability of distribution. We first develop a model and a defense action strategy based on a Markov decision process (MDP). We propose a greedy algorithm for the MDP-based defense algorithm's policy to optimize the expected user's immediate and future discounted rewards. Next, we remove the assumption that the user has specific information about the attacker's pattern and model. We develop a Q-learning algorithm-a reinforcement learning (RL) approach-to optimize the user's policy. We show that the Q-learning method provides an attractive defense strategy solution without explicit knowledge of the jammer's strategy. Computer simulation results show that the MDP-based defense strategies are very efficient; they offer a significant data rate advantage over the simple random hopping approach. Also, the proposed Q-learning performance can achieve close to the MDP approach without explicit knowledge of the jammer's strategy or attacking model.
2021-10-12
Paul, Shuva, Ni, Zhen, Ding, Fei.  2020.  An Analysis of Post Attack Impacts and Effects of Learning Parameters on Vulnerability Assessment of Power Grid. 2020 IEEE Power Energy Society Innovative Smart Grid Technologies Conference (ISGT). :1–5.
Due to the increasing number of heterogeneous devices connected to electric power grid, the attack surface increases the threat actors. Game theory and machine learning are being used to study the power system failures caused by external manipulation. Most of existing works in the literature focus on one-shot process of attacks and fail to show the dynamic evolution of the defense strategy. In this paper, we focus on an adversarial multistage sequential game between the adversaries of the smart electric power transmission and distribution system. We study the impact of exploration rate and convergence of the attack strategies (sequences of action that creates large scale blackout based on the system capacity) based on the reinforcement learning approach. We also illustrate how the learned attack actions disrupt the normal operation of the grid by creating transmission line outages, bus voltage violations, and generation loss. This simulation studies are conducted on IEEE 9 and 39 bus systems. The results show the improvement of the defense strategy through the learning process. The results also prove the feasibility of the learned attack actions by replicating the disturbances created in simulated power system.
2021-06-02
Avula, Ramana R., Oechtering, Tobias J..  2020.  On Design of Optimal Smart Meter Privacy Control Strategy Against Adversarial Map Detection. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). :5845—5849.
We study the optimal control problem of the maximum a posteriori (MAP) state sequence detection of an adversary using smart meter data. The privacy leakage is measured using the Bayesian risk and the privacy-enhancing control is achieved in real-time using an energy storage system. The control strategy is designed to minimize the expected performance of a non-causal adversary at each time instant. With a discrete-state Markov model, we study two detection problems: when the adversary is unaware or aware of the control. We show that the adversary in the former case can be controlled optimally. In the latter case, where the optimal control problem is shown to be non-convex, we propose an adaptive-grid approximation algorithm to obtain a sub-optimal strategy with reduced complexity. Although this work focuses on privacy in smart meters, it can be generalized to other sensor networks.
2020-08-07
Pawlick, Jeffrey, Nguyen, Thi Thu Hang, Colbert, Edward, Zhu, Quanyan.  2019.  Optimal Timing in Dynamic and Robust Attacker Engagement During Advanced Persistent Threats. 2019 International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOPT). :1—8.
Advanced persistent threats (APTs) are stealthy attacks which make use of social engineering and deception to give adversaries insider access to networked systems. Against APTs, active defense technologies aim to create and exploit information asymmetry for defenders. In this paper, we study a scenario in which a powerful defender uses honeynets for active defense in order to observe an attacker who has penetrated the network. Rather than immediately eject the attacker, the defender may elect to gather information. We introduce an undiscounted, infinite-horizon Markov decision process on a continuous state space in order to model the defender's problem. We find a threshold of information that the defender should gather about the attacker before ejecting him. Then we study the robustness of this policy using a Stackelberg game. Finally, we simulate the policy for a conceptual network. Our results provide a quantitative foundation for studying optimal timing for attacker engagement in network defense.
2020-07-03
Shaout, Adnan, Crispin, Brennan.  2019.  Markov Augmented Neural Networks for Streaming Video Classification. 2019 International Arab Conference on Information Technology (ACIT). :1—7.

With the growing number of streaming services, internet providers are increasingly needing to be able to identify the types of data and content providers that are being used on their networks. Traditional methods, such as IP and port scanning, are not always available for clients using VPNs or with providers using varying IP addresses. As such, in this paper we explore a potential method using neural networks and Markov Decision Process in order to augment deep packet inspection techniques in identifying the source and class of video streaming services.

2020-03-02
Sahu, Abhijeet, Huang, Hao, Davis, Katherine, Zonouz, Saman.  2019.  SCORE: A Security-Oriented Cyber-Physical Optimal Response Engine. 2019 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm). :1–6.

Automatic optimal response systems are essential for preserving power system resilience and ensuring faster recovery from emergency under cyber compromise. Numerous research works have developed such response engine for cyber and physical system recovery separately. In this paper, we propose a novel cyber-physical decision support system, SCORE, that computes optimal actions considering pure and hybrid cyber-physical states, using Markov Decision Process (MDP). Such an automatic decision making engine can assist power system operators and network administrators to make a faster response to prevent cascading failures and attack escalation respectively. The hybrid nature of the engine makes the reward and state transition model of the MDP unique. Value iteration and policy iteration techniques are used to compute the optimal actions. Tests are performed on three and five substation power systems to recover from attacks that compromise relays to cause transmission line overflow. The paper also analyses the impact of reward and state transition model on computation. Corresponding results verify the efficacy of the proposed engine.

Arifeen, Md Murshedul, Islam, Al Amin, Rahman, Md Mustafizur, Taher, Kazi Abu, Islam, Md.Maynul, Kaiser, M Shamim.  2019.  ANFIS based Trust Management Model to Enhance Location Privacy in Underwater Wireless Sensor Networks. 2019 International Conference on Electrical, Computer and Communication Engineering (ECCE). :1–6.
Trust management is a promising alternative solution to different complex security algorithms for Underwater Wireless Sensor Networks (UWSN) applications due to its several resource constraint behaviour. In this work, we have proposed a trust management model to improve location privacy of the UWSN. Adaptive Neuro Fuzzy Inference System (ANFIS) has been exploited to evaluate trustworthiness of a sensor node. Also Markov Decision Process (MDP) has been considered. At each state of the MDP, a sensor node evaluates trust behaviour of forwarding node utilizing the FIS learning rules and selects a trusted node. Simulation has been conducted in MATLAB and simulation results show that the detection accuracy of trustworthiness is 91.2% which is greater than Knowledge Discovery and Data Mining (KDD) 99 intrusion detection based dataset. So, in our model 91.2% trustworthiness is necessary to be a trusted node otherwise it will be treated as a malicious or compromised node. Our proposed model can successfully eliminate the possibility of occurring any compromised or malicious node in the network.
2020-02-18
Zheng, Jianjun, Siami Namin, Akbar.  2019.  Enforcing Optimal Moving Target Defense Policies. 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC). 1:753–759.
This paper introduces an approach based on control theory to model, analyze and select optimal security policies for Moving Target Defense (MTD) deployment strategies. A Markov Decision Process (MDP) scheme is presented to model states of the system from attacking point of view. The employed value iteration method is based on the Bellman optimality equation for optimal policy selection for each state defined in the system. The model is then utilized to analyze the impact of various costs on the optimal policy. The MDP model is then applied to two case studies to evaluate the performance of the model.
2020-02-10
Hoey, Jesse, Sheikhbahaee, Zahra, MacKinnon, Neil J..  2019.  Deliberative and Affective Reasoning: a Bayesian Dual-Process Model. 2019 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW). :388–394.
The presence of artificial agents in human social networks is growing. From chatbots to robots, human experience in the developed world is moving towards a socio-technical system in which agents can be technological or biological, with increasingly blurred distinctions between. Given that emotion is a key element of human interaction, enabling artificial agents with the ability to reason about affect is a key stepping stone towards a future in which technological agents and humans can work together. This paper presents work on building intelligent computational agents that integrate both emotion and cognition. These agents are grounded in the well-established social-psychological Bayesian Affect Control Theory (BayesAct). The core idea of BayesAct is that humans are motivated in their social interactions by affective alignment: they strive for their social experiences to be coherent at a deep, emotional level with their sense of identity and general world views as constructed through culturally shared symbols. This affective alignment creates cohesive bonds between group members, and is instrumental for collaborations to solidify as relational group commitments. BayesAct agents are motivated in their social interactions by a combination of affective alignment and decision theoretic reasoning, trading the two off as a function of the uncertainty or unpredictability of the situation. This paper provides a high-level view of dual process theories and advances BayesAct as a plausible, computationally tractable model based in social-psychological and sociological theory.
2019-06-24
You, Y., Li, Z., Oechtering, T. J..  2018.  Optimal Privacy-Enhancing And Cost-Efficient Energy Management Strategies For Smart Grid Consumers. 2018 IEEE Statistical Signal Processing Workshop (SSP). :826–830.

The design of optimal energy management strategies that trade-off consumers' privacy and expected energy cost by using an energy storage is studied. The Kullback-Leibler divergence rate is used to assess the privacy risk of the unauthorized testing on consumers' behavior. We further show how this design problem can be formulated as a belief state Markov decision process problem so that standard tools of the Markov decision process framework can be utilized, and the optimal solution can be obtained by using Bellman dynamic programming. Finally, we illustrate the privacy-enhancement and cost-saving by numerical examples.

2019-06-17
Zheng, Jianjun, Siami Namin, Akbar.  2018.  A Markov Decision Process to Determine Optimal Policies in Moving Target. Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. :2321–2323.

Moving Target Defense (MTD) has been introduced as a new game changer strategy in cybersecurity to strengthen defenders and conversely weaken adversaries. The successful implementation of an MTD system can be influenced by several factors including the effectiveness of the employed technique, the deployment strategy, the cost of the MTD implementation, and the impact from the enforced security policies. Several efforts have been spent on introducing various forms of MTD techniques. However, insufficient research work has been conducted on cost and policy analysis and more importantly the selection of these policies in an MTD-based setting. This poster paper proposes a Markov Decision Process (MDP) modeling-based approach to analyze security policies and further select optimal policies for moving target defense implementation and deployment. The adapted value iteration method would solve the Bellman Optimality Equation for optimal policy selection for each state of the system. The results of some simulations indicate that such modeling can be used to analyze the impact of costs of possible actions towards the optimal policies.

2018-02-21
Diovu, R. C., Agee, J. T..  2017.  Quantitative analysis of firewall security under DDoS attacks in smart grid AMI networks. 2017 IEEE 3rd International Conference on Electro-Technology for National Development (NIGERCON). :696–701.

One of the key objectives of distributed denial of service (DDoS) attack on the smart grid advanced metering infrastructure is to threaten the availability of end user's metering data. This will surely disrupt the smooth operations of the grid and third party operators who need this data for billing and other grid control purposes. In previous work, we proposed a cloud-based Openflow firewall for mitigation against DDoS attack in a smart grid AMI. In this paper, PRISM model checker is used to perform a probabilistic best-and worst-case analysis of the firewall with regard to DDoS attack success under different firewall detection probabilities ranging from zero to 1. The results from this quantitative analysis can be useful in determining the extent the DDoS attack can undermine the correctness and performance of the firewall. In addition, the study can also be helpful in knowing the extent the firewall can be improved by applying the knowledge derived from the worst-case performance of the firewall.

2018-02-14
Nam, C., Walker, P., Lewis, M., Sycara, K..  2017.  Predicting trust in human control of swarms via inverse reinforcement learning. 2017 26th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN). :528–533.
In this paper, we study the model of human trust where an operator controls a robotic swarm remotely for a search mission. Existing trust models in human-in-the-loop systems are based on task performance of robots. However, we find that humans tend to make their decisions based on physical characteristics of the swarm rather than its performance since task performance of swarms is not clearly perceivable by humans. We formulate trust as a Markov decision process whose state space includes physical parameters of the swarm. We employ an inverse reinforcement learning algorithm to learn behaviors of the operator from a single demonstration. The learned behaviors are used to predict the trust level of the operator based on the features of the swarm.
2015-11-11
John C. Mace, Newcastle University, Charles Morisset, Newcastle University, Aad Van Moorsel, Newcastle University.  2015.  Modelling User Availability in Workflow Resiliency Analysis. Symposium and Bootcamp on the Science of Security (HotSoS).

Workflows capture complex operational processes and include security constraints limiting which users can perform which tasks. An improper security policy may prevent cer- tain tasks being assigned and may force a policy violation. Deciding whether a valid user-task assignment exists for a given policy is known to be extremely complex, especially when considering user unavailability (known as the resiliency problem). Therefore tools are required that allow automatic evaluation of workflow resiliency. Modelling well defined workflows is fairly straightforward, however user availabil- ity can be modelled in multiple ways for the same workflow. Correct choice of model is a complex yet necessary concern as it has a major impact on the calculated resiliency. We de- scribe a number of user availability models and their encod- ing in the model checker PRISM, used to evaluate resiliency. We also show how model choice can affect resiliency computation in terms of its value, memory and CPU time.