Visible to the public Biblio

Filters: Keyword is reinforcement learning  [Clear All Filters]
2022-09-20
Yao, Pengchao, Hao, Weijie, Yan, Bingjing, Yang, Tao, Wang, Jinming, Yang, Qiang.  2021.  Game-Theoretic Model for Optimal Cyber-Attack Defensive Decision-Making in Cyber-Physical Power Systems. 2021 IEEE 5th Conference on Energy Internet and Energy System Integration (EI2). :2359—2364.

Cyber-Physical Power Systems (CPPSs) currently face an increasing number of security attacks and lack methods for optimal proactive security decisions to defend the attacks. This paper proposed an optimal defensive method based on game theory to minimize the system performance deterioration of CPPSs under cyberspace attacks. The reinforcement learning algorithmic solution is used to obtain the Nash equilibrium and a set of metrics of system vulnerabilities are adopted to quantify the cost of defense against cyber-attacks. The minimax-Q algorithm is utilized to obtain the optimal defense strategy without the availability of the attacker's information. The proposed solution is assessed through experiments based on a realistic power generation microsystem testbed and the numerical results confirmed its effectiveness.

Chen, Tong, Xiang, Yingxiao, Li, Yike, Tian, Yunzhe, Tong, Endong, Niu, Wenjia, Liu, Jiqiang, Li, Gang, Alfred Chen, Qi.  2021.  Protecting Reward Function of Reinforcement Learning via Minimal and Non-catastrophic Adversarial Trajectory. 2021 40th International Symposium on Reliable Distributed Systems (SRDS). :299—309.
Reward functions are critical hyperparameters with commercial values for individual or distributed reinforcement learning (RL), as slightly different reward functions result in significantly different performance. However, existing inverse reinforcement learning (IRL) methods can be utilized to approximate reward functions just based on collected expert trajectories through observing. Thus, in the real RL process, how to generate a polluted trajectory and perform an adversarial attack on IRL for protecting reward functions has become the key issue. Meanwhile, considering the actual RL cost, generated adversarial trajectories should be minimal and non-catastrophic for ensuring normal RL performance. In this work, we propose a novel approach to craft adversarial trajectories disguised as expert ones, for decreasing the IRL performance and realize the anti-IRL ability. Firstly, we design a reward clustering-based metric to integrate both advantages of fine- and coarse-grained IRL assessment, including expected value difference (EVD) and mean reward loss (MRL). Further, based on such metric, we explore an adversarial attack based on agglomerative nesting algorithm (AGNES) clustering and determine targeted states as starting states for reward perturbation. Then we employ the intrinsic fear model to predict the probability of imminent catastrophe, supporting to generate non-catastrophic adversarial trajectories. Extensive experiments of 7 state-of-the-art IRL algorithms are implemented on the Object World benchmark, demonstrating the capability of our proposed approach in (a) decreasing the IRL performance and (b) having minimal and non-catastrophic adversarial trajectories.
2022-09-09
Fu, Zhihan, Fan, Qilin, Zhang, Xu, Li, Xiuhua, Wang, Sen, Wang, Yueyang.  2021.  Policy Network Assisted Monte Carlo Tree Search for Intelligent Service Function Chain Deployment. 2021 IEEE 20th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :1161—1168.
Network function virtualization (NFV) simplies the coniguration and management of security services by migrating the network security functions from dedicated hardware devices to software middle-boxes that run on commodity servers. Under the paradigm of NFV, the service function chain (SFC) consisting of a series of ordered virtual network security functions is becoming a mainstream form to carry network security services. Allocating the underlying physical network resources to the demands of SFCs under given constraints over time is known as the SFC deployment problem. It is a crucial issue for infrastructure providers. However, SFC deployment is facing new challenges in trading off between pursuing the objective of a high revenue-to-cost ratio and making decisions in an online manner. In this paper, we investigate the use of reinforcement learning to guide online deployment decisions for SFC requests and propose a Policy network Assisted Monte Carlo Tree search approach named PACT to address the above challenge, aiming to maximize the average revenue-to-cost ratio. PACT combines the strengths of the policy network, which evaluates the placement potential of physical servers, and the Monte Carlo Tree Search, which is able to tackle problems with large state spaces. Extensive experimental results demonstrate that our PACT achieves the best performance and is superior to other algorithms by up to 30% and 23.8% on average revenue-to-cost ratio and acceptance rate, respectively.
2022-08-26
Russo, Alessio, Proutiere, Alexandre.  2021.  Minimizing Information Leakage of Abrupt Changes in Stochastic Systems. 2021 60th IEEE Conference on Decision and Control (CDC). :2750—2757.
This work investigates the problem of analyzing privacy of abrupt changes for general Markov processes. These processes may be affected by changes, or exogenous signals, that need to remain private. Privacy refers to the disclosure of information of these changes through observations of the underlying Markov chain. In contrast to previous work on privacy, we study the problem for an online sequence of data. We use theoretical tools from optimal detection theory to motivate a definition of online privacy based on the average amount of information per observation of the stochastic system in consideration. Two cases are considered: the full-information case, where the eavesdropper measures all but the signals that indicate a change, and the limited-information case, where the eavesdropper only measures the state of the Markov process. For both cases, we provide ways to derive privacy upper-bounds and compute policies that attain a higher privacy level. It turns out that the problem of computing privacy-aware policies is concave, and we conclude with some examples and numerical simulations for both cases.
Chowdhury, Sayak Ray, Zhou, Xingyu, Shroff, Ness.  2021.  Adaptive Control of Differentially Private Linear Quadratic Systems. 2021 IEEE International Symposium on Information Theory (ISIT). :485—490.
In this paper we study the problem of regret minimization in reinforcement learning (RL) under differential privacy constraints. This work is motivated by the wide range of RL applications for providing personalized service, where privacy concerns are becoming paramount. In contrast to previous works, we take the first step towards non-tabular RL settings, while providing a rigorous privacy guarantee. In particular, we consider the adaptive control of differentially private linear quadratic (LQ) systems. We develop the first private RL algorithm, Private-OFU-RL which is able to attain a sub-linear regret while guaranteeing privacy protection. More importantly, the additional cost due to privacy is only on the order of \$\textbackslashtextbackslashfrac\textbackslashtextbackslashln(1/\textbackslashtextbackslashdelta)ˆ1/4\textbackslashtextbackslashvarepsilonˆ1/2\$ given privacy parameters \$\textbackslashtextbackslashvarepsilon, \textbackslashtextbackslashdelta \textbackslashtextgreater 0\$. Through this process, we also provide a general procedure for adaptive control of LQ systems under changing regularizers, which not only generalizes previous non-private controls, but also serves as the basis for general private controls.
Razack, Aquib Junaid, Ajith, Vysyakh, Gupta, Rajiv.  2021.  A Deep Reinforcement Learning Approach to Traffic Signal Control. 2021 IEEE Conference on Technologies for Sustainability (SusTech). :1–7.
Traffic Signal Control using Reinforcement Learning has been proved to have potential in alleviating traffic congestion in urban areas. Although research has been conducted in this field, it is still an open challenge to find an effective but low-cost solution to this problem. This paper presents multiple deep reinforcement learning-based traffic signal control systems that can help regulate the flow of traffic at intersections and then compares the results. The proposed systems are coupled with SUMO (Simulation of Urban MObility), an agent-based simulator that provides a realistic environment to explore the outcomes of the models.
Mao, Zeyu, Sahu, Abhijeet, Wlazlo, Patrick, Liu, Yijing, Goulart, Ana, Davis, Katherine, Overbye, Thomas J..  2021.  Mitigating TCP Congestion: A Coordinated Cyber and Physical Approach. 2021 North American Power Symposium (NAPS). :1–6.
The operation of the modern power grid is becoming increasingly reliant on its underlying communication network, especially within the context of the rapidly growing integration of Distributed Energy Resources (DERs). This tight cyber-physical coupling brings uncertainties and challenges for the power grid operation and control. To help operators manage the complex cyber-physical environment, ensure the integrity, and continuity of reliable grid operation, a two-stage approach is proposed that is compatible with current ICS protocols to improve the deliverability of time critical operations. With the proposed framework, the impact Denial of Service (DoS) attack can have on a Transmission Control Protocol (TCP) session could be effectively prevented and mitigated. This coordinated approach combines the efficiency of congestion window reconfiguration and the applicability of physical-only mitigation approaches. By expanding the state and action space to encompass both the cyber and physical domains. This approach has been proven to outperform the traditional, physical-only method, in multiple network congested scenarios that were emulated in a real-time cyber-physical testbed.
Rajamalli Keerthana, R, Fathima, G, Florence, Lilly.  2021.  Evaluating the Performance of Various Deep Reinforcement Learning Algorithms for a Conversational Chatbot. 2021 2nd International Conference for Emerging Technology (INCET). :1–8.
Conversational agents are the most popular AI technology in IT trends. Domain specific chatbots are now used by almost every industry in order to upgrade their customer service. The Proposed paper shows the modelling and performance of one such conversational agent created using deep learning. The proposed model utilizes NMT (Neural Machine Translation) from the TensorFlow software libraries. A BiRNN (Bidirectional Recurrent Neural Network) is used in order to process input sentences that contain large number of tokens (20-40 words). In order to understand the context of the input sentence attention model is used along with BiRNN. The conversational models usually have one drawback, that is, they sometimes provide irrelevant answer to the input. This happens quite often in conversational chatbots as the chatbot doesn't realize that it is answering without context. This drawback is solved in the proposed system using Deep Reinforcement Learning technique. Deep reinforcement Learning follows a reward system that enables the bot to differentiate between right and wrong answers. Deep Reinforcement Learning techniques allows the chatbot to understand the sentiment of the query and reply accordingly. The Deep Reinforcement Learning algorithms used in the proposed system is Q-Learning, Deep Q Neural Network (DQN) and Distributional Reinforcement Learning with Quantile Regression (QR-DQN). The performance of each algorithm is evaluated and compared in this paper in order to find the best DRL algorithm. The dataset used in the proposed system is Cornell Movie-dialogs corpus and CoQA (A Conversational Question Answering Challenge). CoQA is a large dataset that contains data collected from 8000+ conversations in the form of questions and answers. The main goal of the proposed work is to increase the relevancy of the chatbot responses and to increase the perplexity of the conversational chatbot.
Wadekar, Isha.  2021.  Artificial Conversational Agent using Robust Adversarial Reinforcement Learning. 2021 International Conference on Computer Communication and Informatics (ICCCI). :1–7.
Reinforcement learning (R.L.) is an effective and practical means for resolving problems where the broker possesses no information or knowledge about the environment. The agent acquires knowledge that is conditioned on two components: trial-and-error and rewards. An R.L. agent determines an effective approach by interacting directly with the setting and acquiring information regarding the circumstances. However, many modern R.L.-based strategies neglect to theorise considering there is an enormous rift within the simulation and the physical world due to which policy-learning tactics displease that stretches from simulation to physical world Even if design learning is achieved in the physical world, the knowledge inadequacy leads to failed generalization policies from suiting to test circumstances. The intention of robust adversarial reinforcement learning(RARL) is where an agent is instructed to perform in the presence of a destabilizing opponent(adversary agent) that connects impedance to the system. The combined trained adversary is reinforced so that the actual agent i.e. the protagonist is equipped rigorously.
Nguyen, Lan K., Nguyen, Duy H. N., Tran, Nghi H., Bosler, Clayton, Brunnenmeyer, David.  2021.  SATCOM Jamming Resiliency under Non-Uniform Probability of Attacks. MILCOM 2021 - 2021 IEEE Military Communications Conference (MILCOM). :85—90.
This paper presents a new framework for SATCOM jamming resiliency in the presence of a smart adversary jammer that can prioritize specific channels to attack with a non-uniform probability of distribution. We first develop a model and a defense action strategy based on a Markov decision process (MDP). We propose a greedy algorithm for the MDP-based defense algorithm's policy to optimize the expected user's immediate and future discounted rewards. Next, we remove the assumption that the user has specific information about the attacker's pattern and model. We develop a Q-learning algorithm-a reinforcement learning (RL) approach-to optimize the user's policy. We show that the Q-learning method provides an attractive defense strategy solution without explicit knowledge of the jammer's strategy. Computer simulation results show that the MDP-based defense strategies are very efficient; they offer a significant data rate advantage over the simple random hopping approach. Also, the proposed Q-learning performance can achieve close to the MDP approach without explicit knowledge of the jammer's strategy or attacking model.
2022-07-29
Luo, Weifeng, Xiao, Liang.  2021.  Reinforcement Learning Based Vulnerability Analysis of Data Injection Attack for Smart Grids. 2021 40th Chinese Control Conference (CCC). :6788—6792.
Smart grids have to protect meter measurements against false data injection attacks. By modifying the meter measurements, the attacker misleads the control decisions of the control center, which results in physical damages of power systems. In this paper, we propose a reinforcement learning based vulnerability analysis scheme for data injection attack without relying on the power system topology. This scheme enables the attacker to choose the data injection attack vector based on the meter measurements, the power system status, the previous injected errors and the number of meters to compromise. By combining deep reinforcement learning with prioritized experience replay, the proposed scheme more frequently replays the successful vulnerability detection experiences while bypassing the bad data detection, which is able to accelerate the learning speed. Simulation results based on the IEEE 14 bus system show that this scheme increases the probability of successful vulnerability detection and reduce the number of meters to compromise compared with the benchmark scheme.
2022-07-15
Fan, Wenqi, Derr, Tyler, Zhao, Xiangyu, Ma, Yao, Liu, Hui, Wang, Jianping, Tang, Jiliang, Li, Qing.  2021.  Attacking Black-box Recommendations via Copying Cross-domain User Profiles. 2021 IEEE 37th International Conference on Data Engineering (ICDE). :1583—1594.
Recommender systems, which aim to suggest personalized lists of items for users, have drawn a lot of attention. In fact, many of these state-of-the-art recommender systems have been built on deep neural networks (DNNs). Recent studies have shown that these deep neural networks are vulnerable to attacks, such as data poisoning, which generate fake users to promote a selected set of items. Correspondingly, effective defense strategies have been developed to detect these generated users with fake profiles. Thus, new strategies of creating more ‘realistic’ user profiles to promote a set of items should be investigated to further understand the vulnerability of DNNs based recommender systems. In this work, we present a novel framework CopyAttack. It is a reinforcement learning based black-box attacking method that harnesses real users from a source domain by copying their profiles into the target domain with the goal of promoting a subset of items. CopyAttack is constructed to both efficiently and effectively learn policy gradient networks that first select, then further refine/craft user profiles from the source domain, and ultimately copy them into the target domain. CopyAttack’s goal is to maximize the hit ratio of the targeted items in the Top-k recommendation list of the users in the target domain. We conducted experiments on two real-world datasets and empirically verified the effectiveness of the proposed framework. The implementation of CopyAttack is available at https://github.com/wenqifan03/CopyAttack.
2022-07-01
Boloka, Tlou, Makondo, Ndivhuwo, Rosman, Benjamin.  2021.  Knowledge Transfer using Model-Based Deep Reinforcement Learning. 2021 Southern African Universities Power Engineering Conference/Robotics and Mechatronics/Pattern Recognition Association of South Africa (SAUPEC/RobMech/PRASA). :1—6.
Deep reinforcement learning has recently been adopted for robot behavior learning, where robot skills are acquired and adapted from data generated by the robot while interacting with its environment through a trial-and-error process. Despite this success, most model-free deep reinforcement learning algorithms learn a task-specific policy from a clean slate and thus suffer from high sample complexity (i.e., they require a significant amount of interaction with the environment to learn reasonable policies and even more to reach convergence). They also suffer from poor initial performance due to executing a randomly initialized policy in the early stages of learning to obtain experience used to train a policy or value function. Model based deep reinforcement learning mitigates these shortcomings. However, it suffers from poor asymptotic performance in contrast to a model-free approach. In this work, we investigate knowledge transfer from a model-based teacher to a task-specific model-free learner to alleviate executing a randomly initialized policy in the early stages of learning. Our experiments show that this approach results in better asymptotic performance, enhanced initial performance, improved safety, better action effectiveness, and reduced sample complexity.
2022-06-09
Limouchi, Elnaz, Mahgoub, Imad.  2021.  Reinforcement Learning-assisted Threshold Optimization for Dynamic Honeypot Adaptation to Enhance IoBT Networks Security. 2021 IEEE Symposium Series on Computational Intelligence (SSCI). :1–7.
Internet of Battlefield Things (IoBT) is the application of Internet of Things (IoT) to a battlefield environment. IoBT networks operate in difficult conditions due to high mobility and unpredictable nature of battle fields and securing them is a challenge. There is increasing interest to use deception techniques to enhance the security of IoBT networks. A honeypot is a system installed on a network as a trap to attract the attention of an attacker and it does not store any valuable data. In this work, we introduce IoBT dual sensor gateways. We propose a Reinforcement Learning (RL)-assisted scheme, in which the IoBT dual sensor gateways intelligently switch between honeypot and real function based on a threshold. The optimal threshold is determined using reinforcement learning approach that adapts to nodes reputation. To focus on the impact of the mobile and uncertain behavior of IoBT networks on the proposed scheme, we consider the nodes as moving vehicles. We statistically analyze the results of our RL-based scheme obtained using ns-3 network simulation, and optimize value of the threshold.
2022-06-08
Chen, Lin, Qiu, Huijun, Kuang, Xiaoyun, Xu, Aidong, Yang, Yiwei.  2021.  Intelligent Data Security Threat Discovery Model Based on Grid Data. 2021 6th International Conference on Image, Vision and Computing (ICIVC). :458–463.
With the rapid construction and popularization of smart grid, the security of data in smart grid has become the basis for the safe and stable operation of smart grid. This paper proposes a data security threat discovery model for smart grid. Based on the prediction data analysis method, combined with migration learning technology, it analyzes different data, uses data matching process to classify the losses, and accurately predicts the analysis results, finds the security risks in the data, and prevents the illegal acquisition of data. The reinforcement learning and training process of this method distinguish the effective authentication and illegal access to data.
2022-05-24
Huang, Yudong, Wang, Shuo, Feng, Tao, Wang, Jiasen, Huang, Tao, Huo, Ru, Liu, Yunjie.  2021.  Towards Network-Wide Scheduling for Cyclic Traffic in IP-based Deterministic Networks. 2021 4th International Conference on Hot Information-Centric Networking (HotICN). :117–122.
The emerging time-sensitive applications, such as industrial automation, smart grids, and telesurgery, pose strong demands for enabling large-scale IP-based deterministic networks. The IETF DetNet working group recently proposes a Cycle Specified Queuing and Forwarding (CSQF) solution. However, CSQF only specifies an underlying device-level primitive while how to achieve network-wide flow scheduling remains undefined. Previous scheduling mechanisms are mostly oriented to the context of local area networks, making them inapplicable to the cyclic traffic in wide area networks. In this paper, we design the Cycle Tags Planning (CTP) mechanism, a first mathematical model to enable network-wide scheduling for cyclic traffic in large-scale deterministic networks. Then, a novel scheduling algorithm named flow offset and cycle shift (FO-CS) is designed to compute the flows' cycle tags. The FO-CS algorithm is evaluated under long-distance network topologies in remote industrial control scenarios. Compared with the Naive algorithm without using FO-CS, simulation results demonstrate that FO-CS improves the scheduling flow number by 31.2% in few seconds.
2022-05-05
Mukherjee, Sayak, Adetola, Veronica.  2021.  A Secure Learning Control Strategy via Dynamic Camouflaging for Unknown Dynamical Systems under Attacks. 2021 IEEE Conference on Control Technology and Applications (CCTA). :905—910.

This paper presents a secure reinforcement learning (RL) based control method for unknown linear time-invariant cyber-physical systems (CPSs) that are subjected to compositional attacks such as eavesdropping and covert attack. We consider the attack scenario where the attacker learns about the dynamic model during the exploration phase of the learning conducted by the designer to learn a linear quadratic regulator (LQR), and thereafter, use such information to conduct a covert attack on the dynamic system, which we refer to as doubly learning-based control and attack (DLCA) framework. We propose a dynamic camouflaging based attack-resilient reinforcement learning (ARRL) algorithm which can learn the desired optimal controller for the dynamic system, and at the same time, can inject sufficient misinformation in the estimation of system dynamics by the attacker. The algorithm is accompanied by theoretical guarantees and extensive numerical experiments on a consensus multi-agent system and on a benchmark power grid model.

2022-05-03
Ma, Weijun, Fang, Junyuan, Wu, Jiajing.  2021.  Sequential Node Attack of Complex Networks Based on Q-Learning Method. 2021 IEEE International Symposium on Circuits and Systems (ISCAS). :1—5.

The security issue of complex network systems, such as communication systems and power grids, has attracted increasing attention due to cascading failure threats. Many existing studies have investigated the robustness of complex networks against cascading failure from an attacker's perspective. However, most of them focus on the synchronous attack in which the network components under attack are removed synchronously rather than in a sequential fashion. Most recent pioneering work on sequential attack designs the attack strategies based on simple heuristics like degree and load information, which may ignore the inside functions of nodes. In the paper, we exploit a reinforcement learning-based sequential attack method to investigate the impact of different nodes on cascading failure. Besides, a candidate pool strategy is proposed to improve the performance of the reinforcement learning method. Simulation results on Barabási-Albert scale-free networks and real-world networks have demonstrated the superiority and effectiveness of the proposed method.

2022-04-19
Hemmati, Mojtaba, Hadavi, Mohammad Ali.  2021.  Using Deep Reinforcement Learning to Evade Web Application Firewalls. 2021 18th International ISC Conference on Information Security and Cryptology (ISCISC). :35–41.
Web application firewalls (WAF) are the last line of defense in protecting web applications from application layer security threats like SQL injection and cross-site scripting. Currently, most evasion techniques from WAFs are still developed manually. In this work, we propose a solution, which automatically scans the WAFs to find payloads through which the WAFs can be bypassed. Our solution finds out rules defects, which can be further used in rule tuning for rule-based WAFs. Also, it can enrich the machine learning-based dataset for retraining. To this purpose, we provide a framework based on reinforcement learning with an environment compatible with OpenAI gym toolset standards, employed for training agents to implement WAF evasion tasks. The framework acts as an adversary and exploits a set of mutation operators to mutate the malicious payload syntactically without affecting the original semantics. We use Q-learning and proximal policy optimization algorithms with the deep neural network. Our solution is successful in evading signature-based and machine learning-based WAFs.
2022-04-13
Mishra, Anupama, Gupta, B. B., Peraković, Dragan, Peñalvo, Francisco José García, Hsu, Ching-Hsien.  2021.  Classification Based Machine Learning for Detection of DDoS attack in Cloud Computing. 2021 IEEE International Conference on Consumer Electronics (ICCE). :1—4.
Distributed Denial of service attack(DDoS)is a network security attack and now the attackers intruded into almost every technology such as cloud computing, IoT, and edge computing to make themselves stronger. As per the behaviour of DDoS, all the available resources like memory, cpu or may be the entire network are consumed by the attacker in order to shutdown the victim`s machine or server. Though, the plenty of defensive mechanism are proposed, but they are not efficient as the attackers get themselves trained by the newly available automated attacking tools. Therefore, we proposed a classification based machine learning approach for detection of DDoS attack in cloud computing. With the help of three classification machine learning algorithms K Nearest Neighbor, Random Forest and Naive Bayes, the mechanism can detect a DDoS attack with the accuracy of 99.76%.
2022-03-25
Tan, Ziya, Karaköse, Mehmet.  2021.  Proximal Policy Based Deep Reinforcement Learning Approach for Swarm Robots. 2021 Zooming Innovation in Consumer Technologies Conference (ZINC). :166—170.
Artificial intelligence technology is becoming more active in all areas of our lives day by day. This technology affects our daily life by more developing in areas such as industry 4.0, security and education. Deep reinforcement learning is one of the most developed algorithms in the field of artificial intelligence. In this study, it is aimed that three different robots in a limited area learn to move without hitting each other, fixed obstacles and the boundaries of the field. These robots have been trained using the deep reinforcement learning approach and Proximal policy optimization (PPO) policy. Instead of uses value-based methods with the discrete action space, PPO that can easily manipulate the continuous action field and successfully determine the action of the robots has been proposed. PPO policy achieves successful results in multi-agent problems, especially with the use of the Actor-Critic network. In addition, information is given about environment control and learning approaches for swarm behavior. We propose parameter sharing and behavior-based method for this study. Finally, trained model is recorded and tested in 9 different environments where the obstacles are located differently. With our method, robots can perform their tasks in closed environments in the real world without damaging anyone or anything.
2022-03-15
Amir, Guy, Schapira, Michael, Katz, Guy.  2021.  Towards Scalable Verification of Deep Reinforcement Learning. 2021 Formal Methods in Computer Aided Design (FMCAD). :193—203.
Deep neural networks (DNNs) have gained significant popularity in recent years, becoming the state of the art in a variety of domains. In particular, deep reinforcement learning (DRL) has recently been employed to train DNNs that realize control policies for various types of real-world systems. In this work, we present the whiRL 2.0 tool, which implements a new approach for verifying complex properties of interest for DRL systems. To demonstrate the benefits of whiRL 2.0, we apply it to case studies from the communication networks domain that have recently been used to motivate formal verification of DRL systems, and which exhibit characteristics that are conducive for scalable verification. We propose techniques for performing k-induction and semi-automated invariant inference on such systems, and leverage these techniques for proving safety and liveness properties that were previously impossible to verify due to the scalability barriers of prior approaches. Furthermore, we show how our proposed techniques provide insights into the inner workings and the generalizability of DRL systems. whiRL 2.0 is publicly available online.
2022-02-07
Narayanankutty, Hrishikesh.  2021.  Self-Adapting Model-Based SDSec For IoT Networks Using Machine Learning. 2021 IEEE 18th International Conference on Software Architecture Companion (ICSA-C). :92–93.
IoT networks today face a myriad of security vulnerabilities in their infrastructure due to its wide attack surface. Large-scale networks are increasingly adopting a Software-Defined Networking approach, it allows for simplified network control and management through network virtualization. Since traditional security mechanisms are incapable of handling virtualized environments, SDSec or Software-Defined Security is introduced as a solution to support virtualized infrastructure, specifically aimed at providing security solutions to SDN frameworks. To further aid large scale design and development of SDN frameworks, Model-Driven Engineering (MDE) has been proposed to be used at the design phase, since abstraction, automation and analysis are inherently key aspects of MDE. This provides an efficient approach to reducing large problems through models that abstract away the complex technicality of the total system. Making adaptations to these models to address security issues faced in IoT networks, largely reduces cost and improves efficiency. These models can be simulated, analysed and supports architecture model adaptation; model changes are then reflected back to the real system. We propose a model-driven security approach for SDSec networks that can self-adapt using machine learning to mitigate security threats. The overall design time changes can be monitored at run time through machine learning techniques (e.g. deep, reinforcement learning) for real time analysis. This approach can be tested in IoT simulation environments, for instance using the CAPS IoT modeling and simulation framework. Using self-adaptation of models and advanced machine learning for data analysis would ensure that the SDSec architecture adapts and improves over time. This largely reduces the overall attack surface to achieve improved end-to-end security in IoT environments.
2022-02-03
Xu, Chengtao, Song, Houbing.  2021.  Mixed Initiative Balance of Human-Swarm Teaming in Surveillance via Reinforcement learning. 2021 IEEE/AIAA 40th Digital Avionics Systems Conference (DASC). :1—10.
Human-machine teaming (HMT) operates in a context defined by the mission. Varying from the complexity and disturbance in the cooperation between humans and machines, a single machine has difficulty handling work with humans in the scales of efficiency and workload. Swarm of machines provides a more feasible solution in such a mission. Human-swarm teaming (HST) extends the concept of HMT in the mission, such as persistent surveillance, search-and-rescue, warfare. Bringing the concept of HST faces several scientific challenges. For example, the strategies of allocation on the high-level decision making. Here, human usually plays the supervisory or decision making role. Performance of such fixed structure of HST in actual mission operation could be affected by the supervisor’s status from many aspects, which could be considered in three general parts: workload, situational awareness, and trust towards the robot swarm teammate and mission performance. Besides, the complexity of a single human operator in accessing multiple machine agents increases the work burdens. An interface between swarm teammates and human operators to simplify the interaction process is desired in the HST.In this paper, instead of purely considering the workload of human teammates, we propose the computational model of human swarm interaction (HSI) in the simulated map surveillance mission. UAV swarm and human supervisor are both assigned in searching a predefined area of interest (AOI). The workload allocation of map monitoring is adjusted based on the status of the human worker and swarm teammate. Workload, situation awareness ability, trust are formulated as independent models, which affect each other. A communication-aware UAV swarm persistent surveillance algorithm is assigned in the swarm autonomy portion. With the different surveillance task loads, the swarm agent’s thrust parameter adjusts the autonomy level to fit the human operator’s needs. Reinforcement learning is applied in seeking the relative balance of workload in both human and swarm sides. Metrics such as mission accomplishment rate, human supervisor performance, mission performance of UAV swarm are evaluated in the end. The simulation results show that the algorithm could learn the human-machine trust interaction to seek the workload balance to reach better mission execution performance. This work inspires us to leverage a more comprehensive HST model in more practical HMT application scenarios.
2022-01-11
Roberts, Ciaran, Ngo, Sy-Toan, Milesi, Alexandre, Scaglione, Anna, Peisert, Sean, Arnold, Daniel.  2021.  Deep Reinforcement Learning for Mitigating Cyber-Physical DER Voltage Unbalance Attacks. 2021 American Control Conference (ACC). :2861–2867.
The deployment of DER with smart-inverter functionality is increasing the controllable assets on power distribution networks and, consequently, the cyber-physical attack surface. Within this work, we consider the use of reinforcement learning as an online controller that adjusts DER Volt/Var and Volt/Watt control logic to mitigate network voltage unbalance. We specifically focus on the case where a network-aware cyber-physical attack has compromised a subset of single-phase DER, causing a large voltage unbalance. We show how deep reinforcement learning successfully learns a policy minimizing the unbalance, both during normal operation and during a cyber-physical attack. In mitigating the attack, the learned stochastic policy operates alongside legacy equipment on the network, i.e. tap-changing transformers, adjusting optimally predefined DER control-logic.