Biblio

Filters: Keyword is reinforcement learning
2022-01-10
Ngo, Quoc-Dung, Nguyen, Huy-Trung, Nguyen, Viet-Dung, Dinh, Cong-Minh, Phung, Anh-Tu, Bui, Quy-Tung.  2021.  Adversarial Attack and Defense on Graph-based IoT Botnet Detection Approach. 2021 International Conference on Electrical, Communication, and Computer Engineering (ICECCE). :1–6.
To reduce the risk of botnet malware, methods of detecting botnet malware using machine learning have received enormous attention in recent years. Most of the traditional methods are based on supervised learning that relies on static features with defined labels. However, recent studies show that supervised machine-learning-based IoT malware botnet models are vulnerable to intentional attacks, known as adversarial attacks. In this paper, we study adversarial attacks on PSI-graph-based research. To perform an efficient attack, we propose a reinforcement learning-based method with a trained target classifier to modify the structures of PSI-graphs. We show that PSI-graphs are vulnerable to such attacks. We also discuss a defense method that uses adversarial training to train a defensive model. Experimental results achieve 94.1% accuracy on the adversarial dataset, showing that our defensive model is much more robust than the previous target classifier.
2021-12-20
NING, Baifeng, Xiao, Liang.  2021.  Defense Against Advanced Persistent Threats in Smart Grids: A Reinforcement Learning Approach. 2021 40th Chinese Control Conference (CCC). :8598–8603.
In smart grids, supervisory control and data acquisition (SCADA) systems have to protect data from advanced persistent threats (APTs), which exploit vulnerabilities of the power infrastructure to launch stealthy and targeted attacks. In this paper, we propose a reinforcement learning-based APT defense scheme for the control center to choose the detection interval and the number of central processing units (CPUs) allocated to the data concentrators, based on the data priority, the size of the collected meter data, the detection-delay history, the previous number of allocated CPUs, and the size of the labeled compromised meter data, without knowledge of the attack interval or the attack's CPU allocation model. The proposed scheme combines deep learning with a policy-gradient-based actor-critic algorithm to accelerate optimization at the control center: an actor network uses a softmax distribution to choose the APT defense policy, and a critic network updates the actor network weights to improve computational performance. An advantage function is applied to reduce the variance of the policy gradient. Simulation results show that our proposed scheme outperforms the benchmarks in terms of detection delay, data protection level, and utility.
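The actor-critic structure this abstract describes can be illustrated with a minimal sketch: a generic advantage actor-critic update with a softmax actor and a scalar critic baseline. This is not the authors' implementation; the toy three-action "defense policy" problem, the utilities, and the learning rates are all illustrative assumptions.

```python
import math
import random

def softmax(prefs):
    # Softmax distribution over action preferences, as the actor network uses.
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

# Toy setting: one state, three candidate defense policies (e.g. detection
# intervals) with hypothetical expected utilities -- made up for the sketch.
true_utility = [0.2, 1.0, 0.5]

prefs = [0.0, 0.0, 0.0]   # actor parameters (action preferences)
value = 0.0               # critic estimate of state value (the baseline)
alpha_actor, alpha_critic = 0.1, 0.1

random.seed(0)
for _ in range(2000):
    probs = softmax(prefs)
    a = random.choices(range(3), weights=probs)[0]
    r = true_utility[a] + random.gauss(0, 0.1)
    advantage = r - value                 # advantage reduces gradient variance
    value += alpha_critic * advantage     # critic update
    for i in range(3):                    # softmax policy-gradient actor update
        grad = (1.0 if i == a else 0.0) - probs[i]
        prefs[i] += alpha_actor * advantage * grad

best = max(range(3), key=lambda i: prefs[i])
```

After training, the actor's preferences concentrate on the highest-utility policy; the advantage term (reward minus the critic's baseline) plays the variance-reduction role mentioned in the abstract.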
2021-12-02
Rao, Poojith U., Sodhi, Balwinder, Sodhi, Ranjana.  2020.  Cyber Security Enhancement of Smart Grids Via Machine Learning - A Review. 2020 21st National Power Systems Conference (NPSC). :1–6.
The evolution of the power system into a smart grid (SG) has not only enhanced the monitoring and control capabilities of the power grid, but also raised security concerns and vulnerabilities. With the boom in the Internet of Things (IoT), a lot of sensors are being deployed across the grid. This has resulted in a huge amount of data available for processing and analysis. Machine learning (ML) and deep learning (DL) algorithms are being widely used to extract useful information from this data. In this context, this paper presents a comprehensive literature survey of different ML and DL techniques that have been used in the smart grid cyber security area. The survey summarizes the different types of cyber threats to which today's SGs are prone, followed by various ML- and DL-assisted defense strategies. The effectiveness of the ML-based methods in enhancing the cyber security of SGs is also demonstrated with the help of a case study.
2021-11-30
Shateri, Mohammadhadi, Messina, Francisco, Piantanida, Pablo, Labeau, Fabrice.  2020.  Privacy-Cost Management in Smart Meters Using Deep Reinforcement Learning. 2020 IEEE PES Innovative Smart Grid Technologies Europe (ISGT-Europe). :929–933.
Smart meters (SMs) play a pivotal role in the smart grid by being able to report the electricity usage of consumers to the utility provider (UP) almost in real time. However, this could leak sensitive information about the consumers to the UP or a third party. Recent works have leveraged the availability of energy storage devices, e.g., a rechargeable battery (RB), in order to provide privacy to the consumers with minimal additional energy cost. In this paper, a privacy-cost management unit (PCMU) is proposed based on a model-free deep reinforcement learning algorithm called deep double Q-learning (DDQL). Empirical results evaluated on actual SM data are presented to compare DDQL with the state of the art, i.e., classical Q-learning (CQL). Additionally, the performance of the method is investigated for two concrete cases where attackers aim to infer the actual demand load and the occupancy status of dwellings. Finally, an abstract information-theoretic characterization is provided.
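The double Q-learning idea underlying DDQL can be sketched in tabular form (the paper replaces the tables with deep networks): two estimators are kept, one selecting the greedy action and the other evaluating it, which reduces the overestimation bias of classical Q-learning. The 2-state, 2-action toy MDP below is purely illustrative, not the paper's privacy-cost environment.

```python
import random

# Tabular double Q-learning sketch with two estimators QA and QB.
S, A = 2, 2
QA = [[0.0] * A for _ in range(S)]
QB = [[0.0] * A for _ in range(S)]
gamma, alpha, eps = 0.9, 0.1, 0.2

def step(s, a):
    # Hypothetical dynamics: action 1 yields the higher reward and leads to
    # state 1; action 0 yields a small reward and leads to state 0.
    return (1.0, 1) if a == 1 else (0.1, 0)

random.seed(1)
s = 0
for _ in range(5000):
    if random.random() < eps:
        a = random.randrange(A)                                   # explore
    else:
        a = max(range(A), key=lambda x: QA[s][x] + QB[s][x])      # exploit
    r, s2 = step(s, a)
    if random.random() < 0.5:
        a_star = max(range(A), key=lambda x: QA[s2][x])           # QA selects...
        QA[s][a] += alpha * (r + gamma * QB[s2][a_star] - QA[s][a])  # ...QB evaluates
    else:
        b_star = max(range(A), key=lambda x: QB[s2][x])
        QB[s][a] += alpha * (r + gamma * QA[s2][b_star] - QB[s][a])
    s = s2

greedy = max(range(A), key=lambda x: QA[0][x] + QB[0][x])
```

Decoupling selection from evaluation is the sole change relative to classical Q-learning, which is why DDQL can be compared against CQL as a drop-in alternative.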
2021-11-29
Lyons, D., Zahra, S..  2020.  Using Taint Analysis and Reinforcement Learning (TARL) to Repair Autonomous Robot Software. 2020 IEEE Security and Privacy Workshops (SPW). :181–184.
It is important to be able to establish formal performance bounds for autonomous systems. However, formal verification techniques require a model of the environment in which the system operates, a challenge for autonomous systems, especially those expected to operate over longer timescales. This paper describes work in progress to automate the monitoring and repair of ROS-based autonomous robot software written for an a priori partially known and possibly incorrect environment model. A taint analysis method is used to automatically extract the dataflow sequence from input topic to publish topic, and to instrument that code. A unique reinforcement learning approximation of MDP utility is calculated, giving an empirical and non-invasive characterization of the inherent objectives of the software designers. By comparing design-time (a priori) utility with deployed-system utility, we show, using a small but real ROS example, that it is possible to monitor a performance criterion and relate violations of the criterion to parts of the software. The software is then patched using automated software repair techniques and evaluated against the original off-line utility.
2021-10-12
El-Sobky, Mariam, Sarhan, Hisham, Abu-ElKheir, Mervat.  2020.  Security Assessment of the Contextual Multi-Armed Bandit - RL Algorithm for Link Adaptation. 2020 2nd Novel Intelligent and Leading Emerging Sciences Conference (NILES). :514–519.
Industry is increasingly adopting Reinforcement Learning (RL) algorithms in production without thoroughly analyzing their security features, despite the potential threats that may arise if the functionality of these algorithms is compromised while in operation. One of the well-known RL algorithms is the Contextual Multi-Armed Bandit (CMAB) algorithm. In this paper, we explore how the CMAB can be used to solve the link adaptation problem, a well-known problem in the telecommunication industry, by learning the optimal transmission parameters that will maximize a communication link's throughput. We analyze the potential vulnerabilities of the algorithm and how they may adversely affect the computation of link parameters. Additionally, we present a provable security assessment for the Contextual Multi-Armed Bandit Reinforcement Learning (CMAB-RL) algorithm in a network environment simulated using Ray, demonstrating CMAB security vulnerabilities both theoretically and practically. Some security controls are proposed for the CMAB agent and the surrounding environment in order to fix those vulnerabilities and mitigate the risk. These controls can be applied to other RL agents in order to design more robust and secure RL agents.
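A contextual bandit for link adaptation can be sketched as per-context epsilon-greedy selection over transmission schemes. This is a generic CMAB sketch, not the paper's CMAB-RL implementation: the contexts, the two modulation-and-coding "arms", and the throughput numbers are hypothetical.

```python
import random

# Contextual multi-armed bandit sketch for link adaptation.
contexts = ["low_snr", "high_snr"]
arms = ["robust_mcs", "fast_mcs"]
# Hypothetical expected throughput (Mbps) per (context, arm) pair.
mean_tput = {("low_snr", "robust_mcs"): 5.0, ("low_snr", "fast_mcs"): 1.0,
             ("high_snr", "robust_mcs"): 5.0, ("high_snr", "fast_mcs"): 12.0}

est = {(c, a): 0.0 for c in contexts for a in arms}   # per-context estimates
count = {(c, a): 0 for c in contexts for a in arms}
eps = 0.1

random.seed(2)
for _ in range(4000):
    c = random.choice(contexts)                       # observed channel context
    if random.random() < eps:
        a = random.choice(arms)                       # explore
    else:
        a = max(arms, key=lambda x: est[(c, x)])      # exploit per context
    reward = mean_tput[(c, a)] + random.gauss(0, 0.5)
    count[(c, a)] += 1
    est[(c, a)] += (reward - est[(c, a)]) / count[(c, a)]  # incremental mean

policy = {c: max(arms, key=lambda x: est[(c, x)]) for c in contexts}
```

The learned policy picks a different arm per context, which also makes the attack surface visible: an adversary who can falsify the context or the reward feedback can steer these per-context estimates, the kind of vulnerability the paper's assessment targets.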
Paul, Shuva, Ni, Zhen, Ding, Fei.  2020.  An Analysis of Post Attack Impacts and Effects of Learning Parameters on Vulnerability Assessment of Power Grid. 2020 IEEE Power Energy Society Innovative Smart Grid Technologies Conference (ISGT). :1–5.
Due to the increasing number of heterogeneous devices connected to the electric power grid, the attack surface available to threat actors is increasing. Game theory and machine learning are being used to study power system failures caused by external manipulation. Most existing works in the literature focus on one-shot attack processes and fail to show the dynamic evolution of the defense strategy. In this paper, we focus on an adversarial multistage sequential game between the adversaries of the smart electric power transmission and distribution system. We study the impact of the exploration rate and the convergence of the attack strategies (sequences of actions that create a large-scale blackout based on the system capacity) using a reinforcement learning approach. We also illustrate how the learned attack actions disrupt the normal operation of the grid by creating transmission line outages, bus voltage violations, and generation loss. These simulation studies are conducted on the IEEE 9- and 39-bus systems. The results show the improvement of the defense strategy through the learning process, and also prove the feasibility of the learned attack actions by replicating the disturbances created in the simulated power system.
2021-09-07
Faqir, Nada, En-Nahnahi, Noureddine, Boumhidi, Jaouad.  2020.  Deep Q-learning Approach for Congestion Problem In Smart Cities. 2020 Fourth International Conference On Intelligent Computing in Data Sciences (ICDS). :1–6.
Traffic congestion is a critical problem in urban areas. In this study, our objective is the control of traffic lights in an urban environment in order to avoid traffic jams and optimize vehicle traffic; we aim to minimize the total waiting time. Our system is based on a new paradigm, deep reinforcement learning; it can automatically learn all the useful characteristics of traffic data and develop a strategy for optimizing adaptive traffic light control. Our system is coupled to an agent-based microscopic simulator (Simulation of Urban MObility - SUMO), providing a synthetic but realistic environment in which the results of potential regulatory actions can be explored.
2021-08-31
Sundar, Agnideven Palanisamy, Li, Feng, Zou, Xukai, Hu, Qin, Gao, Tianchong.  2020.  Multi-Armed-Bandit-based Shilling Attack on Collaborative Filtering Recommender Systems. 2020 IEEE 17th International Conference on Mobile Ad Hoc and Sensor Systems (MASS). :347–355.
Collaborative Filtering (CF) is a popular recommendation system that makes recommendations based on similar users' preferences. Though it is widely used, CF is prone to shilling/profile injection attacks, where fake profiles are injected into the CF system to alter its outcome. Most of the existing shilling attacks do not work on online systems and cannot be efficiently implemented in real-world applications. In this paper, we introduce an efficient multi-armed-bandit-based reinforcement learning method to practically execute online shilling attacks. Our method works by reducing the uncertainty associated with the item selection process and finds the optimal items to enhance attack reach. Such practical online attacks open new avenues for research in building more robust recommender systems. We treat the recommender system as a black box, making our method effective irrespective of the type of CF used. Finally, we also experimentally test our approach against popular state-of-the-art shilling attacks.
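The uncertainty-reducing item selection the abstract describes is the classic bandit exploration-exploitation trade-off. A standard way to realize it is the UCB1 index (empirical mean plus a confidence bonus); the sketch below is generic UCB1, not the paper's attack, and the three-item payoff probabilities are illustrative.

```python
import math
import random

# UCB1 sketch: explore uncertain items, exploit high-payoff ones.
p = [0.2, 0.5, 0.8]           # hypothetical per-item payoff probabilities
n_arms = len(p)
counts = [0] * n_arms
values = [0.0] * n_arms

random.seed(3)
for t in range(1, 3001):
    if 0 in counts:
        a = counts.index(0)   # play each arm once to initialize
    else:
        # UCB1 index: empirical mean + sqrt(2 ln t / n_i) confidence bonus.
        a = max(range(n_arms),
                key=lambda i: values[i] + math.sqrt(2 * math.log(t) / counts[i]))
    r = 1.0 if random.random() < p[a] else 0.0
    counts[a] += 1
    values[a] += (r - values[a]) / counts[a]   # incremental mean update

best = counts.index(max(counts))
```

As the confidence bonus shrinks for well-sampled items, play concentrates on the highest-payoff one, which is why a bandit attacker can converge on the most effective injection items without a model of the recommender.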
Adamov, Alexander, Carlsson, Anders.  2020.  Reinforcement Learning for Anti-Ransomware Testing. 2020 IEEE East-West Design Test Symposium (EWDTS). :1–5.
In this paper, we verify the possibility of creating a ransomware simulation that uses an arbitrary combination of known tactics and techniques to bypass an anti-malware defense. To verify this hypothesis, we conducted an experiment in which an agent was trained with the help of reinforcement learning to run the ransomware simulator in a way that can bypass an anti-ransomware solution and encrypt the target files. The novelty of the proposed method lies in applying reinforcement learning to anti-ransomware testing, which may help to identify weaknesses in the anti-ransomware defense and fix them before a real attack happens.
2021-08-02
Chai, Xinzhong, Wang, Yasen, Yan, Chuanxu, Zhao, Yuan, Chen, Wenlong, Wang, Xiaolei.  2020.  DQ-MOTAG: Deep Reinforcement Learning-based Moving Target Defense Against DDoS Attacks. 2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC). :375—379.
The rapid development of mobile communication and wearable devices greatly improves our daily life, while the massive number of entities and emerging services also makes Cyber-Physical Systems (CPS) much more complicated, and the maintenance of CPS security tends to be more and more difficult. As a "game-changing" new active defense concept, Moving Target Defense (MTD) handles this tricky problem by periodically upsetting and recombining connections between users and servers in the protected system, a process called "shuffling". By this means, adversaries can hardly obtain enough time to compromise the potential victims, which is the indispensable condition for collecting necessary information or conducting further malicious attacks. But every coin has two sides: MTD also introduces unbearably high energy consumption and resource occupation, which has hindered the large-scale application of MTD for quite a long time. In this paper, we propose a novel deep reinforcement learning-based MOTAG system called DQ-MOTAG. To our knowledge, this is the first work to provide self-adaptive shuffle-period adjustment for MTD with a reinforcement learning-based intelligent control mechanism. We also design an algorithm to generate the optimal duration of the next period to guide subsequent shuffling. Finally, we conduct a series of experiments to prove the availability and performance of DQ-MOTAG compared to existing methods. The results highlight our solution in terms of defense performance, error block rate, and network resource consumption.
Zhou, Zan, Xu, Changqiao, Ma, Tengchao, Kuang, Xiaohui.  2020.  Multi-vNIC Intelligent Mutation: A Moving Target Defense to thwart Client-side DNS Cache Attack. ICC 2020 - 2020 IEEE International Conference on Communications (ICC). :1—6.
As massive research efforts are poured into server-side DNS security enhancement in online cloud service platforms, sophisticated APTs tend to develop client-side DNS attacks, where defenders have only limited resources and abilities. The collaborative DNS attack is a representative new client-side paradigm that stealthily undermines the user cache by falsifying DNS responses. Different from existing static methods, in this paper we propose a moving target defense solution named multi-vNIC intelligent mutation to free defenders from arduous work and thwart elusive client-side DNS attacks. Multiple virtual network interface cards are created and switched in a mutating manner, so attackers have to blindly guess the actual NIC at a high risk of exposure. Firstly, we construct a dynamic game-theoretic model to capture the main characteristics of both attacker and defender. Secondly, a reinforcement learning mechanism is developed to generate an adaptive optimal defense strategy. Experimental results also highlight the security performance of our defense method compared to several state-of-the-art technologies.
2021-07-27
Xiao, Wenli, Jiang, Hao, Xia, Song.  2020.  A New Black Box Attack Generating Adversarial Examples Based on Reinforcement Learning. 2020 Information Communication Technologies Conference (ICTC). :141–146.
Machine learning can be misled by adversarial examples, which are formed by making small changes to the original data. Nowadays, there are various methods to produce adversarial examples, but they cannot simultaneously be applied to non-differentiable models, reduce the amount of computation, and shorten the sample generation time. In this paper, we propose a new black-box attack generating adversarial examples based on reinforcement learning. By using a deep Q-learning network, we can train the substitute model and generate adversarial examples at the same time. Experimental results show that this method needs only 7.7 ms to produce an adversarial example, which solves the problems of low efficiency, a large amount of computation, and inapplicability to non-differentiable models.
2021-07-08
Long, Saiqin, Li, Zhetao, Xing, Yun, Tian, Shujuan, Li, Dongsheng, Yu, Rong.  2020.  A Reinforcement Learning-Based Virtual Machine Placement Strategy in Cloud Data Centers. :223—230.
With the widespread use of cloud computing, the energy consumption of cloud data centers is increasing, driven mainly by IT equipment and cooling equipment. This paper argues that once the number of virtual machines on a physical machine reaches a certain level, resource competition occurs, resulting in a performance loss for the virtual machines. Unlike most papers, we do not impose placement constraints on virtual machines by giving a CPU cap to achieve energy savings in cloud data centers. Instead, we use performance loss as the weighing measure. We propose a reinforcement learning-based virtual machine placement strategy (RLVMP) for energy savings in cloud data centers. The strategy considers the weight of virtual machine performance loss and energy consumption, and is finally solved with a greedy strategy. Simulation experiments show that our strategy achieves a certain improvement in energy savings compared with the other algorithms.
2021-06-01
Cideron, Geoffrey, Seurin, Mathieu, Strub, Florian, Pietquin, Olivier.  2020.  HIGhER: Improving instruction following with Hindsight Generation for Experience Replay. 2020 IEEE Symposium Series on Computational Intelligence (SSCI). :225–232.
Language creates a compact representation of the world and allows the description of unlimited situations and objectives through compositionality. While these characterizations may foster instructing, conditioning or structuring interactive agent behavior, it remains an open problem to correctly relate language understanding and reinforcement learning in even simple instruction following scenarios. This joint learning problem is alleviated through expert demonstrations, auxiliary losses, or neural inductive biases. In this paper, we propose an orthogonal approach called Hindsight Generation for Experience Replay (HIGhER) that extends the Hindsight Experience Replay approach to the language-conditioned policy setting. Whenever the agent does not fulfill its instruction, HIGhER learns to output a new directive that matches the agent trajectory, and it relabels the episode with a positive reward. To do so, HIGhER learns to map a state into an instruction by using past successful trajectories, which removes the need to have external expert interventions to relabel episodes as in vanilla HER. We show the efficiency of our approach in the BabyAI environment, and demonstrate how it complements other instruction following methods.
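The hindsight relabeling mechanism that HIGhER extends can be shown in a few lines: a failed goal-conditioned episode is copied with its goal replaced by the outcome actually achieved, turning it into a positive-reward example. In the sketch below, the `describe` function is a trivial hypothetical stand-in for HIGhER's learned trajectory-to-instruction mapping; the states and instructions are illustrative.

```python
# Sketch of hindsight relabeling for instruction following.

def describe(final_state):
    # Stand-in for the learned instruction generator (trained in HIGhER
    # from past successful trajectories).
    return f"go to {final_state}"

def relabel(episode):
    # episode: {"instruction": str, "trajectory": [states], "reward": float}
    achieved = episode["trajectory"][-1]
    return {"instruction": describe(achieved),   # directive matching the trajectory
            "trajectory": episode["trajectory"],
            "reward": 1.0}                       # relabeled episode is a success

replay_buffer = []
failed = {"instruction": "go to red door",
          "trajectory": ["start", "corridor", "blue door"],
          "reward": 0.0}
replay_buffer.append(failed)
if failed["reward"] == 0.0:                      # instruction was not fulfilled
    replay_buffer.append(relabel(failed))        # add the hindsight copy
```

The agent thus trains on both the original failure and a synthetic success, which is what lets sparse-reward instruction following get a dense learning signal without expert relabeling.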
2021-05-05
Rana, Krishan, Dasagi, Vibhavari, Talbot, Ben, Milford, Michael, Sünderhauf, Niko.  2020.  Multiplicative Controller Fusion: Leveraging Algorithmic Priors for Sample-efficient Reinforcement Learning and Safe Sim-To-Real Transfer. 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). :6069—6076.
Learning-based approaches often outperform hand-coded algorithmic solutions for many problems in robotics. However, learning long-horizon tasks on real robot hardware can be intractable, and transferring a learned policy from simulation to reality is still extremely challenging. We present a novel approach to model-free reinforcement learning that can leverage existing sub-optimal solutions as an algorithmic prior during training and deployment. During training, our gated fusion approach enables the prior to guide the initial stages of exploration, increasing sample-efficiency and enabling learning from sparse long-horizon reward signals. Importantly, the policy can learn to improve beyond the performance of the sub-optimal prior since the prior's influence is annealed gradually. During deployment, the policy's uncertainty provides a reliable strategy for transferring a simulation-trained policy to the real world by falling back to the prior controller in uncertain states. We show the efficacy of our Multiplicative Controller Fusion approach on the task of robot navigation and demonstrate safe transfer from simulation to the real world without any fine-tuning. The code for this project is made publicly available at https://sites.google.com/view/mcf-nav/home.
2021-04-27
Piplai, A., Ranade, P., Kotal, A., Mittal, S., Narayanan, S. N., Joshi, A..  2020.  Using Knowledge Graphs and Reinforcement Learning for Malware Analysis. 2020 IEEE International Conference on Big Data (Big Data). :2626—2633.
Machine learning algorithms used to detect attacks are limited by the fact that they cannot incorporate the background knowledge that an analyst has, which limits their suitability for detecting new attacks. Reinforcement learning is different from the traditional machine learning algorithms used in the cybersecurity domain: it does not need a mapping of the input-output space or a specific user-defined metric to compare data points. This is important for the cybersecurity domain, especially for malware detection and mitigation, as not all problems have a single, known, correct answer; often, security researchers have to resort to guided trial and error to understand the presence of a malware sample and mitigate it. In this paper, we incorporate prior knowledge, represented as Cybersecurity Knowledge Graphs (CKGs), to guide the exploration of an RL algorithm to detect malware. CKGs capture semantic relationships between cyber-entities, including those mined from open sources. Instead of trying out random guesses and observing the change in the environment, we take the help of verified knowledge about cyber-attacks to guide our reinforcement learning algorithm to effectively identify ways to detect the presence of malicious filenames so that they can be deleted to mitigate a cyber-attack. We show that such a guided system outperforms a base RL system in detecting malware.
Pozdniakov, K., Alonso, E., Stankovic, V., Tam, K., Jones, K..  2020.  Smart Security Audit: Reinforcement Learning with a Deep Neural Network Approximator. 2020 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA). :1–8.
A significant challenge in modern computer security is the growing skill gap as intruder capabilities increase, making it necessary to begin automating elements of penetration testing so analysts can contend with the growing number of cyber threats. In this paper, we attempt to assist human analysts by automating a single host penetration attack. To do so, a smart agent performs different attack sequences to find vulnerabilities in a target system. As it does so, it accumulates knowledge, learns new attack sequences and improves its own internal penetration testing logic. As a result, this agent (AgentPen for simplicity) is able to successfully penetrate hosts it has never interacted with before. A computer security administrator using this tool would receive a comprehensive, automated sequence of actions leading to a security breach, highlighting potential vulnerabilities, and reducing the amount of menial tasks a typical penetration tester would need to execute. To achieve autonomy, we apply an unsupervised machine learning algorithm, Q-learning, with an approximator that incorporates a deep neural network architecture. The security audit itself is modelled as a Markov Decision Process in order to test a number of decision-making strategies and compare their convergence to optimality. A series of experimental results is presented to show how this approach can be effectively used to automate penetration testing using a scalable, i.e. not exhaustive, and adaptive approach.
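The security-audit-as-MDP formulation can be illustrated with a tabular Q-learning sketch on a toy attack graph (the paper uses a deep neural network approximator rather than a table). The states, actions, and rewards below are hypothetical stand-ins for a real host's attack surface.

```python
import random

# Tabular Q-learning sketch on a toy "attack graph" MDP.
states = ["recon", "foothold", "root"]
actions = ["scan", "exploit"]
Q = {(s, a): 0.0 for s in states for a in actions}
alpha, gamma, eps = 0.2, 0.9, 0.2

def step(s, a):
    # Hypothetical transitions: scanning from recon gains a foothold,
    # exploiting from a foothold reaches root (the security breach).
    if s == "recon" and a == "scan":
        return "foothold", 0.0, False
    if s == "foothold" and a == "exploit":
        return "root", 1.0, True
    return s, -0.1, False            # wasted action: small penalty

random.seed(4)
for _ in range(500):                 # 500 audit episodes
    s, done = "recon", False
    while not done:
        if random.random() < eps:
            a = random.choice(actions)                         # explore
        else:
            a = max(actions, key=lambda x: Q[(s, x)])          # exploit
        s2, r, done = step(s, a)
        target = r if done else r + gamma * max(Q[(s2, x)] for x in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])              # Q-learning update
        s = s2

# Greedy attack sequence recovered from the learned Q-values.
plan = [max(actions, key=lambda x: Q[(s, x)]) for s in ["recon", "foothold"]]
```

The greedy readout of the learned Q-table is exactly the "comprehensive, automated sequence of actions leading to a security breach" the abstract describes; swapping the table for a neural network approximator lets the same update rule scale to hosts the agent has never interacted with.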
2021-03-15
Joykutty, A. M., Baranidharan, B..  2020.  Cognitive Radio Networks: Recent Advances in Spectrum Sensing Techniques and Security. 2020 International Conference on Smart Electronics and Communication (ICOSEC). :878–884.
Wireless networks are very significant in the present world owing to their widespread use and their application in domains like disaster management, smart cities, IoT etc. A wireless network is made up of a group of wireless nodes that communicate with each other without using any formal infrastructure. The topology of the wireless network is not fixed and can vary. The huge increase in the number of wireless devices is a challenge owing to the limited availability of wireless spectrum. Opportunistic spectrum access by cognitive radio enables the efficient usage of limited spectrum resources. The unused channels assigned to the primary users may go to waste in idle time; cognitive radio systems sense the unused channel space and assign it temporarily to secondary users. This paper discusses the recent trends in the two most important aspects of cognitive radio, namely spectrum sensing and security.
2021-02-16
Shi, Y., Sagduyu, Y. E., Erpek, T..  2020.  Reinforcement Learning for Dynamic Resource Optimization in 5G Radio Access Network Slicing. 2020 IEEE 25th International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD). :1—6.
The paper presents a reinforcement learning solution to dynamic resource allocation for 5G radio access network slicing. Available communication resources (frequency-time blocks and transmit powers) and computational resources (processor usage) are allocated to stochastic arrivals of network slice requests. Each request arrives with priority (weight), throughput, computational resource, and latency (deadline) requirements, and, if feasible, it is served with available communication and computational resources allocated over its requested duration. As each resource allocation decision makes some of the resources temporarily unavailable for future requests, a myopic solution that optimizes only the current allocation becomes ineffective for network slicing. Therefore, a Q-learning solution is presented to maximize the network utility, in terms of the total weight of granted network slicing requests over a time horizon, subject to communication and computational constraints. Results show that reinforcement learning provides major improvements in 5G network utility relative to myopic, random, and first-come-first-served solutions. While reinforcement learning sustains scalable performance as the number of served users increases, it can also be used effectively to assign resources to network slices when 5G needs to share the spectrum with incumbent users that may dynamically occupy some of the frequency-time blocks.
2021-01-28
Bhattacharya, A., Ramachandran, T., Banik, S., Dowling, C. P., Bopardikar, S. D..  2020.  Automated Adversary Emulation for Cyber-Physical Systems via Reinforcement Learning. 2020 IEEE International Conference on Intelligence and Security Informatics (ISI). :1—6.
Adversary emulation is an offensive exercise that provides a comprehensive assessment of a system’s resilience against cyber attacks. However, adversary emulation is typically a manual process, making it costly and hard to deploy in cyber-physical systems (CPS) with complex dynamics, vulnerabilities, and operational uncertainties. In this paper, we develop an automated, domain-aware approach to adversary emulation for CPS. We formulate a Markov Decision Process (MDP) model to determine an optimal attack sequence over a hybrid attack graph with cyber (discrete) and physical (continuous) components and related physical dynamics. We apply model-based and model-free reinforcement learning (RL) methods to solve the discrete-continuous MDP in a tractable fashion. As a baseline, we also develop a greedy attack algorithm and compare it with the RL procedures. We summarize our findings through a numerical study on sensor deception attacks in buildings to compare the performance and solution quality of the proposed algorithms.
2021-01-22
Akbari, I., Tahoun, E., Salahuddin, M. A., Limam, N., Boutaba, R..  2020.  ATMoS: Autonomous Threat Mitigation in SDN using Reinforcement Learning. NOMS 2020 - 2020 IEEE/IFIP Network Operations and Management Symposium. :1—9.
Machine Learning has revolutionized many fields of computer science. Reinforcement Learning (RL), in particular, stands out as a solution to sequential decision making problems. With the growing complexity of computer networks in the face of new emerging technologies, such as the Internet of Things and the growing complexity of threat vectors, there is a dire need for autonomous network systems. RL is a viable solution for achieving this autonomy. Software-defined Networking (SDN) provides a global network view and programmability of network behaviour, which can be employed for security management. Previous works in RL-based threat mitigation have mostly focused on very specific problems, mostly non-sequential, with ad-hoc solutions. In this paper, we propose ATMoS, a general framework designed to facilitate the rapid design of RL applications for network security management using SDN. We evaluate our framework for implementing RL applications for threat mitigation, by showcasing the use of ATMoS with a Neural Fitted Q-learning agent to mitigate an Advanced Persistent Threat. We present the RL model's convergence results showing the feasibility of our solution for active threat mitigation.
2020-12-02
Tsiligkaridis, T., Romero, D..  2018.  Reinforcement Learning with Budget-Constrained Nonparametric Function Approximation for Opportunistic Spectrum Access. 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP). :579—583.
Opportunistic spectrum access is one of the emerging techniques for maximizing throughput in congested bands and is enabled by predicting idle slots in spectrum. We propose a kernel-based reinforcement learning approach coupled with a novel budget-constrained sparsification technique that efficiently captures the environment to find the best channel access actions. This approach allows learning and planning over the intrinsic state-action space and extends well to large state spaces. We apply our methods to evaluate coexistence of a reinforcement learning-based radio with a multi-channel adversarial radio and a single-channel carrier-sense multiple-access with collision avoidance (CSMA-CA) radio. Numerical experiments show the performance gains over carrier-sense systems.
2020-11-17
Zhou, Z., Qian, L., Xu, H..  2019.  Intelligent Decentralized Dynamic Power Allocation in MANET at Tactical Edge based on Mean-Field Game Theory. MILCOM 2019 - 2019 IEEE Military Communications Conference (MILCOM). :604—609.
In this paper, the decentralized dynamic power allocation problem is investigated for mobile ad hoc networks (MANET) at the tactical edge. Due to the mobility and self-organizing features of MANET and environmental uncertainties on the battlefield, many existing optimal power allocation algorithms are neither efficient nor practical. Furthermore, the continuously increasing scale of the wireless connection population in the emerging Internet of Battlefield Things (IoBT) introduces additional challenges for optimal power allocation due to the "curse of dimensionality". To address these challenges, a novel actor-critic-mass algorithm is proposed by integrating the emerging mean-field game theory with online reinforcement learning. The proposed approach is able not only to learn the optimal power allocation for IoBT in a decentralized manner, but also to effectively handle uncertainties from the harsh environment at the tactical edge. In the developed scheme, each agent in the IoBT has three neural networks (NNs): 1) a critic NN learns the optimal cost function that minimizes the signal-to-interference-plus-noise ratio (SINR), 2) an actor NN estimates the optimal transmitter power adjustment rate, and 3) a mass NN learns the probability density function of all agents' transmitting power in the IoBT. The three NNs are tuned based on the Fokker-Planck-Kolmogorov (FPK) and Hamilton-Jacobi-Bellman (HJB) equations given by mean-field game theory. An IoBT wireless network has been simulated to evaluate the effectiveness of the proposed algorithm. The results demonstrate that the actor-critic-mass algorithm can effectively approximate the probability distribution of all agents' transmission power and converge to the target SINR. Moreover, the optimal decentralized power allocation is obtained by integrating mean-field game theory with reinforcement learning.
2020-10-05
Chakraborty, Anit, Dutta, Sayandip, Bhattacharyya, Siddhartha, Platos, Jan, Snasel, Vaclav.  2018.  Reinforcement Learning inspired Deep Learned Compositional Model for Decision Making in Tracking. 2018 Fourth International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN). :158—163.
We formulate a tracker which performs incessant decision making in order to track objects that may undergo different challenges such as partial occlusion, a moving camera, cluttered backgrounds, etc. In the process, the agent must decide whether to keep tracking the object when it is occluded or has temporarily moved out of the frame, based on its prediction from the previous location, or to reinitialize the tracker based on the belief that the target has been lost. Instead of heuristic methods, we depend on reward- and penalty-based training that helps the agent reach an optimal solution via this partially observable Markov decision process (POMDP). Furthermore, we employ a deeply learned compositional model to estimate human pose in order to better handle occlusion without needing human inputs. By learning the compositionality of human bodies via a deep neural network, the agent can make better decisions about the presence or absence of a human in a frame under occlusion. We adopt a skeleton-based part representation and do away with the large spatial state requirement. This especially helps in cases where the orientation of the target in focus is unorthodox. Finally, we demonstrate that deep reinforcement learning-based training coupled with pose estimation capabilities allows us to train and tag multiple large video datasets much more quickly than previous works.