Biblio

Filters: Keyword is q-learning
2023-08-25
Zhang, Xue, Wei, Liang, Jing, Shan, Zhao, Chuan, Chen, Zhenxiang.  2022.  SDN-Based Load Balancing Solution for Deterministic Backbone Networks. 2022 5th International Conference on Hot Information-Centric Networking (HotICN). :119–124.
Traffic in a backbone network has high forwarding-rate requirements, and as the network grows, traffic increases and forwarding rates decrease. In a Software Defined Network (SDN), the controller maintains a global view of the network and controls the forwarding of network traffic. A deterministic network has different forwarding requirements for traffic of different priority levels. Static traffic load balancing is not flexible enough to meet the needs of users and may lead to the overloading of individual links and even network collapse. In this paper, we propose a new backbone network load balancing architecture, EDQN (Edge Deep Q-learning Network), which implements queue-based gate-shaping algorithms at the edge devices and balances traffic load on the backbone links. With the advantages of SDN, the link utilization of the backbone network can be improved, the delay in traffic transmission reduced, and the throughput during transmission increased.
ISSN: 2831-4395
2023-08-04
Ma, Yaodong, Liu, Kai, Luo, Xiling.  2022.  Game Theory Based Multi-agent Cooperative Anti-jamming for Mobile Ad Hoc Networks. 2022 IEEE 8th International Conference on Computer and Communications (ICCC). :901–905.
Currently, mobile ad hoc networks (MANETs) are widely used due to their self-configuring nature. In practice, however, they are vulnerable to malicious jammers. Traditional anti-jamming approaches, such as channel hopping based on deterministic sequences, may not be reliable against intelligent jammers because of their fixed patterns. To address this problem, we propose a distributed game theory-based multi-agent anti-jamming (DMAA) algorithm in this paper. It enables each user to exploit all information from its neighboring users before the network is attacked, and to derive dynamic local policy knowledge that efficiently overcomes intelligent jamming attacks and guides the users to cooperatively hop to the same channel with high probability. Simulation results demonstrate that the proposed algorithm can learn an optimal policy to guide the users to avoid malicious jamming more efficiently and rapidly than the random and independent Q-learning baseline algorithms.
2023-06-09
Rizwan, Kainat, Ahmad, Mudassar, Habib, Muhammad Asif.  2022.  Cyber Automated Network Resilience Defensive Approach against Malware Images. 2022 International Conference on Frontiers of Information Technology (FIT). :237—242.
Cyber threats have been a major issue in the cyber security domain. Every hacker follows a series of cyber-attack stages known as the cyber kill chain, and each stage has its own norms and limitations. For a decade, researchers have focused on detecting these attacks, but passive monitoring tools alone are no longer optimal solutions. As everything in the computer science field becomes autonomous, this leads to the design of an Autonomous Cyber Resilience Defense algorithm in this work. Resilience has two aspects: response and recovery. Response requires actions to be performed to mitigate attacks; recovery is patching the flawed code or back-door vulnerability. Both aspects have traditionally relied on human assistance in the cybersecurity defense field. This work aims to develop an algorithm based on Reinforcement Learning (RL) with a Convolutional Neural Network (CNN), much closer to the human learning process, for malware images. RL learns through a reward mechanism for every performed action: each action produces an output that can be classified into positive or negative rewards. To enhance its decision process, a Markov Decision Process (MDP) is integrated with this RL approach. RL impact and induction measures for malware images were evaluated to obtain optimal results. Successful automated actions are obtained on the Malimg malware image dataset. The proposed work has shown 98% accuracy in classification, detection, and deployment of autonomous resilience actions.
2023-05-12
Zhang, Qirui, Meng, Siqi, Liu, Kun, Dai, Wei.  2022.  Design of Privacy Mechanism for Cyber Physical Systems: A Nash Q-learning Approach. 2022 China Automation Congress (CAC). :6361–6365.

This paper studies the problem of designing an optimal privacy mechanism with low energy cost. The eavesdropper and the defender, each with limited resources, must choose which channel to eavesdrop on and which to defend, respectively. A zero-sum stochastic game framework is used to model the interaction between the two players, and the game is solved through the Nash Q-learning approach. A numerical example is given to verify the proposed method.

ISSN: 2688-0938

2023-01-05
Ma, Xiandong, Su, Zhou, Xu, Qichao, Ying, Bincheng.  2022.  Edge Computing and UAV Swarm Cooperative Task Offloading in Vehicular Networks. 2022 International Wireless Communications and Mobile Computing (IWCMC). :955–960.
Recently, unmanned aerial vehicle (UAV) swarms have been advocated to provide diverse data-centric services, including data relay, content caching, and computing task offloading, in vehicular networks due to their flexibility and convenience. Since offloading computing tasks only to edge computing devices (ECDs) cannot meet the real-time demand of vehicles in peak traffic flow, this paper proposes to combine edge computing and UAV swarms for cooperative task offloading in vehicular networks. Specifically, we first design a cooperative task offloading framework in which vehicles' computing tasks can be executed locally, offloaded to the UAV swarm, or offloaded to ECDs. Then, the selection of the offloading strategy is formulated as a mixed-integer nonlinear programming problem whose objective is to maximize the utility of the vehicle. To solve the problem, we decompose the original problem into two subproblems: minimizing the completion time when offloading to the UAV swarm and optimizing the computing resources when offloading to ECDs. For offloading to the UAV swarm, the computing task is split into multiple subtasks that are offloaded to different UAVs simultaneously for parallel computing. A Q-learning based iterative algorithm is proposed to minimize the computing task's completion time by equalizing the completion times of its subtasks assigned to each UAV. For offloading to ECDs, a gradient descent algorithm is used to optimally allocate computing resources for offloaded tasks. Finally, extensive simulations demonstrate that the proposed scheme can significantly improve the utility of vehicles compared with conventional schemes.
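The equal-completion-time target that the paper's Q-learning iteration converges toward has a simple closed form when UAV compute rates are fixed; the sketch below (with illustrative rates and task size, not the paper's parameters) splits a task so every subtask finishes at the same time:

```python
# Closed-form version of the equal-completion-time balance (illustrative
# values; the paper reaches this balance via Q-learning iterations).
def equal_time_split(task_bits, rates):
    """Split a task so all UAVs finish simultaneously: sizes ~ compute rates."""
    total = sum(rates)
    sizes = [task_bits * r / total for r in rates]
    times = [s / r for s, r in zip(sizes, rates)]
    return sizes, times

# Three hypothetical UAVs with compute rates 1:2:3 share a 120-bit task
sizes, times = equal_time_split(120.0, [1.0, 2.0, 3.0])
# every subtask completes in 120 / (1 + 2 + 3) = 20 time units
```

Sizing subtasks proportionally to compute rates makes the parallel makespan equal for every UAV, which is exactly the condition the iterative algorithm enforces.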
2022-10-20
Jiang, Luanjuan, Chen, Xin.  2021.  Understanding the impact of cyber-physical correlation on security analysis of Cyber-Physical Systems. 2021 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). :529—534.
Cyber-Physical Systems (CPS) have grown rapidly in recent decades, and related security issues have become more important than ever before. Designing an efficient defensive policy for operators and controllers is the foremost task to be considered. In this paper, a stochastic game-theoretic model is developed to study a CPS security problem by considering the interdependence between the cyber and physical spaces of a CPS. The game model is solved with Minimax Q-learning to find the mixed-strategy equilibria. The numerical simulation revealed that defensive factors and attack cost can affect the policies adopted by the system. From the perspective of the operator of a CPS, increasing the probability of a successful defense in the disruption phase helps to improve the probability of the defense strategy when there is a correlation between the cyber layer and the physical layer of a CPS. Conversely, the system defense probability decreases as the total cost of the physical layer increases.
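As a rough illustration of how a zero-sum defense game can be solved by a Q-style iteration, here is a minimal sketch. The payoff matrix is hypothetical, and for brevity the value is computed over pure strategies only; the paper's Minimax Q-learning solves for mixed-strategy equilibria via a matrix game in every state.

```python
# One-state zero-sum security game between a defender (rows) and an
# attacker (columns). R[d][a] is the defender's payoff; the matrix is
# hypothetical and chosen to have a pure saddle point at (d=0, a=1).
GAMMA = 0.9
R = [[2.0, 1.0],
     [3.0, 0.0]]

def saddle_value(Q):
    # Max-min over pure strategies; real Minimax/Nash Q-learning allows
    # mixed strategies by solving a matrix game per state.
    return max(min(row) for row in Q)

def minimax_q_iteration(iters=200):
    Q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(iters):
        v = saddle_value(Q)
        Q = [[R[d][a] + GAMMA * v for a in range(2)] for d in range(2)]
    return Q

Q = minimax_q_iteration()
```

With the saddle value of R equal to 1 and GAMMA = 0.9, the iteration converges to the game value V = 1/(1 - 0.9) = 10, with Q[d][a] = R[d][a] + 9.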
2022-08-26
Rajamalli Keerthana, R, Fathima, G, Florence, Lilly.  2021.  Evaluating the Performance of Various Deep Reinforcement Learning Algorithms for a Conversational Chatbot. 2021 2nd International Conference for Emerging Technology (INCET). :1–8.
Conversational agents are the most popular AI technology in IT trends. Domain-specific chatbots are now used by almost every industry to upgrade their customer service. The proposed paper presents the modelling and performance of one such conversational agent created using deep learning. The proposed model utilizes NMT (Neural Machine Translation) from the TensorFlow software libraries. A BiRNN (Bidirectional Recurrent Neural Network) is used to process input sentences that contain a large number of tokens (20-40 words). To capture the context of the input sentence, an attention model is used along with the BiRNN. Conversational models usually have one drawback: they sometimes provide irrelevant answers to the input. This happens quite often in conversational chatbots, as the chatbot does not realize that it is answering without context. This drawback is addressed in the proposed system using deep reinforcement learning, which follows a reward system that enables the bot to differentiate between right and wrong answers and allows the chatbot to understand the sentiment of the query and reply accordingly. The deep reinforcement learning algorithms used in the proposed system are Q-Learning, Deep Q Neural Network (DQN), and Distributional Reinforcement Learning with Quantile Regression (QR-DQN). The performance of each algorithm is evaluated and compared in this paper to find the best DRL algorithm. The datasets used in the proposed system are the Cornell Movie-Dialogs Corpus and CoQA (A Conversational Question Answering Challenge). CoQA is a large dataset containing data collected from 8000+ conversations in the form of questions and answers. The main goal of the proposed work is to increase the relevancy of the chatbot responses and to improve the perplexity of the conversational chatbot.
Nguyen, Lan K., Nguyen, Duy H. N., Tran, Nghi H., Bosler, Clayton, Brunnenmeyer, David.  2021.  SATCOM Jamming Resiliency under Non-Uniform Probability of Attacks. MILCOM 2021 - 2021 IEEE Military Communications Conference (MILCOM). :85—90.
This paper presents a new framework for SATCOM jamming resiliency in the presence of a smart adversarial jammer that can prioritize specific channels to attack with a non-uniform probability distribution. We first develop a model and a defense action strategy based on a Markov decision process (MDP). We propose a greedy algorithm for the MDP-based defense policy to optimize the user's expected immediate and discounted future rewards. Next, we remove the assumption that the user has specific information about the attacker's pattern and model. We develop a Q-learning algorithm, a reinforcement learning (RL) approach, to optimize the user's policy. We show that the Q-learning method provides an attractive defense strategy without explicit knowledge of the jammer's strategy. Computer simulation results show that the MDP-based defense strategies are very efficient, offering a significant data-rate advantage over the simple random hopping approach. The proposed Q-learning approach achieves performance close to that of the MDP approach without explicit knowledge of the jammer's strategy or attack model.
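A minimal sketch of the learning idea, reduced to a stateless (bandit-style) channel selector facing a jammer with a non-uniform attack distribution; the jamming probabilities and hyperparameters are illustrative assumptions, not the paper's model:

```python
import random

def learn_channel_policy(jam_probs, episodes=5000, eps=0.1, seed=0):
    """Stateless Q-learning (sample-average updates) for picking a SATCOM
    channel against a jammer with a non-uniform attack distribution."""
    rng = random.Random(seed)
    n = len(jam_probs)
    q = [0.0] * n                  # estimated success rate per channel
    pulls = [0] * n
    for _ in range(episodes):
        if rng.random() < eps:
            ch = rng.randrange(n)                   # explore
        else:
            ch = max(range(n), key=q.__getitem__)   # exploit
        # reward 1 when the transmission is not jammed this round
        reward = 0.0 if rng.random() < jam_probs[ch] else 1.0
        pulls[ch] += 1
        q[ch] += (reward - q[ch]) / pulls[ch]
    return q

# Hypothetical jammer that concentrates on channels 0 and 1
q = learn_channel_policy([0.8, 0.6, 0.1, 0.3])
best = max(range(4), key=q.__getitem__)
```

Without being told the jammer's distribution, the learner's estimates converge toward the per-channel success rates, so the greedy choice settles on the least-attacked channel.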
2022-04-19
Hemmati, Mojtaba, Hadavi, Mohammad Ali.  2021.  Using Deep Reinforcement Learning to Evade Web Application Firewalls. 2021 18th International ISC Conference on Information Security and Cryptology (ISCISC). :35–41.
Web application firewalls (WAFs) are the last line of defense in protecting web applications from application-layer security threats like SQL injection and cross-site scripting. Currently, most WAF evasion techniques are still developed manually. In this work, we propose a solution that automatically scans WAFs to find payloads through which they can be bypassed. Our solution uncovers rule defects, which can then be used to tune the rules of rule-based WAFs, and can enrich the training datasets of machine learning-based WAFs for retraining. To this end, we provide a reinforcement learning framework with an environment compatible with the OpenAI Gym toolkit standards, employed to train agents on WAF evasion tasks. The framework acts as an adversary and exploits a set of mutation operators that mutate the malicious payload syntactically without affecting its original semantics. We use Q-learning and proximal policy optimization algorithms with deep neural networks. Our solution successfully evades both signature-based and machine learning-based WAFs.
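The mutation-operator idea can be sketched as follows; the operators and payload here are hypothetical examples of syntactic changes that preserve a payload's semantics (in the full framework, an RL agent would learn which operator sequence evades a given WAF):

```python
import random

# Hypothetical semantics-preserving mutation operators for a SQLi payload
def swap_case(p, rng):
    return "".join(c.upper() if rng.random() < 0.5 else c.lower() for c in p)

def inline_comments(p, rng):
    return p.replace(" ", "/**/")      # spaces -> SQL inline comments

def pad_spaces(p, rng):
    return p.replace(" ", "  ")        # redundant whitespace

OPERATORS = [swap_case, inline_comments, pad_spaces]

def mutate(payload, steps=3, seed=1):
    """Apply a random sequence of operators (an RL agent would choose them)."""
    rng = random.Random(seed)
    for _ in range(steps):
        payload = rng.choice(OPERATORS)(payload, rng)
    return payload

def normalize(p):
    """Crude canonical form, used only to check semantics are preserved."""
    return " ".join(p.replace("/**/", " ").lower().split())

orig = "' or 1=1 --"
mut = mutate(orig)
```

A learning agent would then treat each operator as an action and the WAF's verdict on the mutated payload as the reward signal.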
2021-05-25
Alabadi, Montdher, Albayrak, Zafer.  2020.  Q-Learning for Securing Cyber-Physical Systems : A survey. 2020 International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA). :1–13.
A cyber-physical system (CPS) comprises three main parts: physical elements, communication networks, and control systems. Currently, CPS includes the Internet of Things (IoT), the Internet of Vehicles (IoV), and many other systems. These systems face many security challenges and different types of attacks, such as jamming and DDoS. CPS attacks tend to be much smarter and more dynamic; thus, defending strategies are needed that can handle this level of intelligence and dynamicity. In the last few years, many researchers have used machine learning as a base solution to many CPS security issues. This paper provides a survey of recent works that utilized the Q-Learning algorithm for security enablement and privacy preservation. Different adoptions of Q-Learning for security and defense strategies are studied. The state of the art of Q-learning and CPS systems is classified and analyzed according to attacks, domain, supported techniques, and details of the Q-Learning algorithm. Finally, this work highlights future research trends toward efficient utilization of Q-learning and deep Q-learning for CPS security.
2021-04-27
Pozdniakov, K., Alonso, E., Stankovic, V., Tam, K., Jones, K..  2020.  Smart Security Audit: Reinforcement Learning with a Deep Neural Network Approximator. 2020 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA). :1–8.
A significant challenge in modern computer security is the growing skill gap as intruder capabilities increase, making it necessary to begin automating elements of penetration testing so analysts can contend with the growing number of cyber threats. In this paper, we attempt to assist human analysts by automating a single host penetration attack. To do so, a smart agent performs different attack sequences to find vulnerabilities in a target system. As it does so, it accumulates knowledge, learns new attack sequences and improves its own internal penetration testing logic. As a result, this agent (AgentPen for simplicity) is able to successfully penetrate hosts it has never interacted with before. A computer security administrator using this tool would receive a comprehensive, automated sequence of actions leading to a security breach, highlighting potential vulnerabilities, and reducing the amount of menial tasks a typical penetration tester would need to execute. To achieve autonomy, we apply an unsupervised machine learning algorithm, Q-learning, with an approximator that incorporates a deep neural network architecture. The security audit itself is modelled as a Markov Decision Process in order to test a number of decision-making strategies and compare their convergence to optimality. A series of experimental results is presented to show how this approach can be effectively used to automate penetration testing using a scalable, i.e. not exhaustive, and adaptive approach.
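A minimal sketch of the audit-as-MDP idea, using a tabular Q-learner over a tiny hypothetical single-host attack chain instead of the paper's deep-network approximator:

```python
import random

# Tiny hypothetical single-host attack chain: states are privilege levels
# 0=recon, 1=user access, 2=admin; state 3=root is terminal. In each state
# exactly one of three "exploit" actions advances the intrusion; the rest
# fail. A Q-table stands in for the paper's deep-network approximator.
ADVANCE = {0: 2, 1: 0, 2: 1}       # which action succeeds in each state
N_STATES, N_ACTIONS, GOAL = 4, 3, 3

def step(s, a):
    if a == ADVANCE[s]:
        s2 = s + 1
        return s2, (10.0 if s2 == GOAL else 1.0)
    return s, -1.0                 # failed exploit: stay put, pay a cost

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(50):        # cap episode length
            if rng.random() < eps:
                a = rng.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=Q[s].__getitem__)
            s2, r = step(s, a)
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
            if s == GOAL:
                break
    return Q

Q = train()
# Greedy policy: the learned exploit sequence for the audited host
policy = [max(range(N_ACTIONS), key=Q[s].__getitem__) for s in range(3)]
```

The learned greedy policy is the automated action sequence an auditor would review; the paper replaces the Q-table with a deep neural network so the approach scales beyond toy state spaces.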
2021-01-22
Sahabandu, D., Allen, J., Moothedath, S., Bushnell, L., Lee, W., Poovendran, R..  2020.  Quickest Detection of Advanced Persistent Threats: A Semi-Markov Game Approach. 2020 ACM/IEEE 11th International Conference on Cyber-Physical Systems (ICCPS). :9—19.
Advanced Persistent Threats (APTs) are stealthy, sophisticated, long-term, multi-stage attacks that threaten the security of sensitive information. Dynamic Information Flow Tracking (DIFT) has been proposed as a promising mechanism to detect and prevent various cyber attacks in computer systems. DIFT tracks suspicious information flows in the system and generates a security analysis when anomalous behavior is detected. The number of information flows in a system is typically large, and the amount of resources (such as memory, processing power, and storage) required for analyzing different flows at different system locations varies. Hence, efficient use of resources is essential to maintain an acceptable level of system performance when using DIFT. On the other hand, the quickest detection of APTs is crucial, as APTs are persistent and the damage caused to the system grows the longer the attacker spends in it. We address the problem of detecting APTs and model the trade-off between resource efficiency and quickest detection. We propose a game model that captures the interaction of an APT and a DIFT-based defender as a two-player, multi-stage, zero-sum, Stackelberg semi-Markov game. Our game considers performance parameters such as the false negatives generated by DIFT and the time required for executing various operations in the system. We propose a two-time-scale Q-learning algorithm that converges to a Stackelberg equilibrium under the infinite-horizon, limiting-average payoff criterion. We validate our model and algorithm on a real-world attack dataset obtained using the Refinable Attack INvestigation (RAIN) framework.
2020-12-02
Swain, P., Kamalia, U., Bhandarkar, R., Modi, T..  2019.  CoDRL: Intelligent Packet Routing in SDN Using Convolutional Deep Reinforcement Learning. 2019 IEEE International Conference on Advanced Networks and Telecommunications Systems (ANTS). :1—6.

Software Defined Networking (SDN) provides opportunities for flexible and dynamic traffic engineering. However, in current SDN systems, routing strategies are based on traditional mechanisms that lack real-time adaptability and use resources less efficiently. To overcome these limitations, this paper uses deep learning to improve routing computation in SDN. It proposes the Convolutional Deep Reinforcement Learning (CoDRL) model, based on a deep reinforcement learning agent, for routing optimization in SDN to minimize the mean network delay and packet loss rate. The CoDRL model consists of a Deep Deterministic Policy Gradient (DDPG) agent coupled with a convolution layer. The proposed model automatically adapts packet routing using network data obtained through the SDN controller, and provides routing configurations that reduce network congestion and minimize the mean network delay. Hence, the proposed deep agent exhibits good convergence toward routing configurations that improve network performance.

2020-10-05
Kanellopoulos, Aris, Vamvoudakis, Kyriakos G., Gupta, Vijay.  2019.  Decentralized Verification for Dissipativity of Cascade Interconnected Systems. 2019 IEEE 58th Conference on Decision and Control (CDC). :3629—3634.

In this paper, we consider the problem of decentralized verification for large-scale cascade interconnections of linear subsystems such that dissipativity properties of the overall system are guaranteed with minimum knowledge of the dynamics. In order to achieve compositionality, we distribute the verification process among the individual subsystems, which utilize limited information received locally from their immediate neighbors. Furthermore, to obviate the need for full knowledge of the subsystem parameters, each decentralized verification rule employs a model-free learning structure; a reinforcement learning algorithm that allows for online evaluation of the appropriate storage function that can be used to verify dissipativity of the system up to that point. Finally, we show how the interconnection can be extended by adding learning-enabled subsystems while ensuring dissipativity.

2020-05-15
Fan, Renshi, Du, Gaoming, Xu, Pengfei, Li, Zhenmin, Song, Yukun, Zhang, Duoli.  2019.  An Adaptive Routing Scheme Based on Q-learning and Real-time Traffic Monitoring for Network-on-Chip. 2019 IEEE 13th International Conference on Anti-counterfeiting, Security, and Identification (ASID). :244—248.
In Network-on-Chip (NoC) design, performance optimization has always been a research focus. Compared with static routing schemes, dynamic routing schemes can better reduce packet transmission latency under network congestion. In this paper, we propose a dynamic Q-learning routing approach with real-time monitoring of the NoC. First, we design a real-time monitoring scheme and the corresponding circuits to record the traffic congestion status of the NoC. Second, we propose a novel Q-learning method that finds an optimal path based on the lowest traffic congestion. Finally, we dynamically redistribute network tasks to increase packet transmission speed and balance the traffic load. Compared with C-XY routing and DyXY routing, our method achieved improvements of 25.6%-49.5% and 22.9%-43.8%, respectively.
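A minimal sketch of congestion-aware Q-learning routing on a small mesh, assuming a hypothetical monitor table of per-router congestion costs (topology, costs, and hyperparameters are illustrative, not the paper's design):

```python
import random

# 3x3 mesh; a (hypothetical) real-time monitor reports that the center
# router is congested, so entering it costs 5 instead of 1.
CONGESTION = {(1, 1): 5.0}
SRC, DST = (0, 0), (2, 2)

def next_hops(n):
    x, y = n
    hops = []
    if x < 2:
        hops.append((x + 1, y))    # east
    if y < 2:
        hops.append((x, y + 1))    # south (moves toward DST, XY-style)
    return hops

def q_entry(Q, n):
    return Q.setdefault(n, {m: 0.0 for m in next_hops(n)})

def train(episodes=300, alpha=0.5, gamma=0.95, eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        n = SRC
        while n != DST:
            q = q_entry(Q, n)
            m = rng.choice(list(q)) if rng.random() < eps else max(q, key=q.__getitem__)
            reward = (10.0 if m == DST else 0.0) - CONGESTION.get(m, 1.0)
            future = 0.0 if m == DST else max(q_entry(Q, m).values())
            q[m] += alpha * (reward + gamma * future - q[m])
            n = m
    return Q

Q = train()
path, n = [SRC], SRC               # read off the learned route
while n != DST:
    n = max(Q[n], key=Q[n].__getitem__)
    path.append(n)
```

Because the reward penalizes entering congested routers, the learned route detours around the hot center node while still taking a minimal number of hops.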
2020-03-02
Zhang, Yihan, Wu, Jiajing, Chen, Zhenhao, Huang, Yuxuan, Zheng, Zibin.  2019.  Sequential Node/Link Recovery Strategy of Power Grids Based on Q-Learning Approach. 2019 IEEE International Symposium on Circuits and Systems (ISCAS). :1–5.

Cascading failure, which can be triggered by both physical and cyber attacks, is among the most critical threats to the security and resilience of power grids. In current literature, researchers investigate the issue of cascading failure on smart grids mainly from the attacker's perspective. From the perspective of a grid defender or operator, however, it is also an important issue to restore the smart grid suffering from cascading failure back to normal operation as soon as possible. In this paper, we consider cascading failure in conjunction with the restoration process involving repairing of the failed nodes/links in a sequential fashion. Based on a realistic power flow cascading failure model, we exploit a Q-learning approach to develop a practical and effective policy to identify the optimal way of sequential restorations for large-scale smart grids. Simulation results on three power grid test benchmarks demonstrate the learning ability and the effectiveness of the proposed strategy.
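The sequential-restoration idea can be sketched with a tiny tabular Q-learner whose state is the set of repaired components; the per-node restored loads are illustrative, not taken from the paper's power-flow cascading model:

```python
import random

# State = frozenset of already-repaired components; action = which failed
# node to repair next. LOAD gives the restored load per node (hypothetical
# values); discounting makes restoring high-load nodes first optimal.
LOAD = {"A": 5.0, "B": 1.0, "C": 3.0}
GAMMA = 0.9

def train(episodes=2000, alpha=0.3, eps=0.3, seed=0):
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        repaired = frozenset()
        while repaired != frozenset(LOAD):
            opts = [n for n in LOAD if n not in repaired]
            q = Q.setdefault(repaired, {n: 0.0 for n in opts})
            if rng.random() < eps:
                a = rng.choice(opts)
            else:
                a = max(opts, key=q.__getitem__)
            nxt_state = repaired | {a}
            opts2 = [n for n in LOAD if n not in nxt_state]
            if opts2:
                future = max(Q.setdefault(nxt_state, {n: 0.0 for n in opts2}).values())
            else:
                future = 0.0
            q[a] += alpha * (LOAD[a] + GAMMA * future - q[a])
            repaired = nxt_state
    return Q

Q = train()
order, repaired = [], frozenset()  # read off the greedy repair sequence
while repaired != frozenset(LOAD):
    q = Q[repaired]
    a = max(q, key=q.__getitem__)
    order.append(a)
    repaired = repaired | {a}
```

With discounting, the learned policy repairs nodes in descending order of restored load, the intuitive priority for getting the most load back online soonest.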

2019-02-08
Yousefi, M., Mtetwa, N., Zhang, Y., Tianfield, H..  2018.  A Reinforcement Learning Approach for Attack Graph Analysis. 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :212-217.

The attack graph approach is a common tool for the analysis of network security. However, analysis of attack graphs can be complicated and difficult depending on the attack graph's size. This paper presents an approximate analysis approach for attack graphs based on Q-learning. First, we employ multi-host multi-stage vulnerability analysis (MulVAL) to generate an attack graph for a given network topology. Then we refine the attack graph and generate a simplified graph called a transition graph. Next, we use a Q-learning model to find possible attack routes that an attacker could use to compromise the security of the network. Finally, we evaluate the approach by applying it to a typical IT network scenario with specific services, network configurations, and vulnerabilities.
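A minimal sketch of the final step, Q-learning over a hand-made transition graph (the nodes, edges, and reward are hypothetical; the paper derives its transition graph from MulVAL output):

```python
import random

# Hypothetical transition graph distilled from an attack graph. The attacker
# earns a reward only on reaching the target, so discounting favors the
# shortest viable attack route.
GRAPH = {
    "web":   ["app", "mail"],
    "mail":  ["app"],
    "app":   ["db", "files"],
    "files": [],           # dead end for this attacker goal
    "db":    [],           # target host
}
START, TARGET = "web", "db"

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.3, seed=0):
    rng = random.Random(seed)
    Q = {n: {m: 0.0 for m in succ} for n, succ in GRAPH.items() if succ}
    for _ in range(episodes):
        n = START
        while n != TARGET and GRAPH[n]:
            if rng.random() < eps:
                m = rng.choice(GRAPH[n])
            else:
                m = max(Q[n], key=Q[n].__getitem__)
            r = 1.0 if m == TARGET else 0.0
            future = max(Q[m].values()) if GRAPH[m] else 0.0
            Q[n][m] += alpha * (r + gamma * future - Q[n][m])
            n = m
    return Q

Q = train()
route, n = [START], START          # read off the highest-value attack route
while n != TARGET:
    n = max(Q[n], key=Q[n].__getitem__)
    route.append(n)
```

The greedy route is one candidate attack path; ranking Q-values at each node also exposes alternative routes a defender may want to harden.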

2016-11-15
Keywhan Chung, University of Illinois at Urbana-Champaign, Charles A. Kamhoua, Air Force Research Laboratory, Kevin A. Kwiat, Air Force Research Laboratory, Zbigniew Kalbarczyk, University of Illinois at Urbana-Champaign, Ravishankar K. Iyer, University of Illinois at Urbana-Champaign.  2016.  Game Theory with Learning for Cyber Security Monitoring. IEEE High Assurance Systems Engineering Symposium (HASE 2016).

Recent attacks show that threats to cyber infrastructure are not only increasing in volume but are also getting more sophisticated. The attacks may comprise multiple actions that are hard to differentiate from benign activity, so common detection techniques have to deal with high false positive rates. Because of the imperfect performance of automated detection techniques, responses to such attacks are highly dependent on human-driven decision-making processes. While game theory has been applied to many problems that require rational decision-making, we find limitations in applying such methods to security games. In this work, we propose Q-Learning to react automatically to the adversarial behavior of a suspicious user and secure the system. This work compares variations of Q-Learning with a traditional stochastic game. Simulation results show the feasibility of naive Q-Learning despite restricted information about opponents.