
Title: Deep-Reinforcement Learning Multiple Access for Heterogeneous Wireless Networks
Publication Type: Conference Paper
Year of Publication: 2018
Authors: Yu, Yiding; Wang, Taotao; Liew, Soung Chang
Conference Name: 2018 IEEE International Conference on Communications (ICC)
Date Published: May 2018
Publisher: IEEE
ISBN Number: 978-1-5386-3180-5
Keywords: clean slate, clean-slate design, Collaboration, convergence, DARPA SC2, deep reinforcement learning, deep-reinforcement learning multiple access, design framework, DLMA, DRL agent, DRL algorithmic framework, heterogeneous wireless networks, Human Behavior, human factors, learning (artificial intelligence), MAC design, machine learning, machine-learning technique, Media Access Protocol, Metrics, network layers, neural nets, Neural networks, optimal MAC strategy, policy, Policy Based Governance, policy governance, prior knowledge, pubcrawl, radio networks, resilience, Resiliency, Robustness, share spectrum, telecommunication computing, time division multiple access, time-slotted networks, traditional reinforcement learning, universal MAC protocol, wireless networks
Abstract

This paper investigates the use of deep reinforcement learning (DRL) in the design of a "universal" MAC protocol referred to as Deep-reinforcement Learning Multiple Access (DLMA). The design framework is partially inspired by the vision of DARPA SC2, a 3-year competition whereby competitors are to come up with a clean-slate design that "best share spectrum with any network(s), in any environment, without prior knowledge, leveraging on machine-learning technique". While the scope of DARPA SC2 is broad and involves the redesign of PHY, MAC, and Network layers, this paper's focus is narrower and only involves the MAC design. In particular, we consider the problem of sharing time slots among multiple time-slotted networks that adopt different MAC protocols. One of the MAC protocols is DLMA. The other two are TDMA and ALOHA. The DRL agents of DLMA do not know that the other two MAC protocols are TDMA and ALOHA. Yet, through a series of observations of the environment, its own actions, and the rewards - in accordance with the DRL algorithmic framework - a DRL agent can learn the optimal MAC strategy for harmonious co-existence with TDMA and ALOHA nodes. In particular, the use of neural networks in DRL (as opposed to traditional reinforcement learning) allows for fast convergence to optimal solutions and robustness against perturbation in hyperparameter settings, two essential properties for practical deployment of DLMA in real wireless networks.
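The observation-action-reward loop described in the abstract can be illustrated with a toy sketch. This is not code from the paper: the paper trains a deep Q-network, whereas this self-contained example uses plain tabular Q-learning, and the 2-slot frame, reward values, and collision penalty are all assumptions made for illustration. A hypothetical TDMA node occupies slot 0 of every frame; the agent, given no knowledge of that schedule, learns from channel observations alone to transmit only in the free slot.

```python
import random
from collections import defaultdict

random.seed(0)

# Toy setup (assumptions, not from the paper): a TDMA node transmits in
# slot 0 of every 2-slot frame. The agent observes only the outcome of
# each slot and must learn to transmit in slot 1.
FRAME = 2                       # hypothetical TDMA frame length
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

Q = defaultdict(float)          # Q[(state, action)] -> value estimate

def step(t, action):
    """Environment response: (observation, reward) for slot t."""
    tdma_tx = (t % FRAME == 0)          # TDMA node's fixed schedule
    if action == 1:                     # agent transmits
        if tdma_tx:
            return "collision", -1.0    # collision penalty (assumption)
        return "success", 1.0           # successful delivery
    return ("busy" if tdma_tx else "idle"), 0.0

state = ("start",)
for t in range(5000):
    # epsilon-greedy choice over actions {0: wait, 1: transmit}
    if random.random() < EPS:
        action = random.randint(0, 1)
    else:
        action = max((0, 1), key=lambda a: Q[(state, a)])
    obs, reward = step(t, action)
    next_state = (action, obs)          # state = last (action, observation)
    # standard one-step Q-learning update
    best_next = max(Q[(next_state, a)] for a in (0, 1))
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                   - Q[(state, action)])
    state = next_state

# The greedy policy that emerges transmits after observing the channel
# busy (the next slot is free) and waits after an idle or successful slot.
```

The last (action, observation) pair happens to be a sufficient state here because it reveals the frame phase; the paper's deep Q-network instead learns from longer action-observation histories, which is what lets it cope with unknown and mixed protocols such as ALOHA.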

URL: https://ieeexplore.ieee.org/document/8422168
DOI: 10.1109/ICC.2018.8422168
Citation Key: yu_deep-reinforcement_2018