Title | Deep Reinforcement Learning Based Node Pairing Scheme in Edge-Chain for IoT Applications |
Publication Type | Conference Paper |
Year of Publication | 2020 |
Authors | Gao, Yang, Wu, Wenjun, Dong, Junyu, Yin, Yufeng, Si, Pengbo |
Conference Name | GLOBECOM 2020 - 2020 IEEE Global Communications Conference |
Date Published | December 2020 |
Keywords | blockchain, Deep Reinforcement Learning (DRL), edge computing, Internet of Things (IoT), Mobile Edge Computing (MEC), Optimization, Policy Gradient (PG) method, Resource management, Scalability, Servers, Tamper resistance, Task Analysis |
Abstract | Nowadays, the Internet of Things (IoT) plays an important role in our lives. It inevitably generates massive amounts of data and requires more secure transmission. As blockchain technology can build trust in a distributed environment and ensure data traceability and tamper resistance, it is a promising way to support IoT data transmission and sharing. In this paper, edge computing is considered to provide adequate resources for end users to offload computing tasks in the blockchain-enabled IoT system, and the node pairing problem between end users and edge computing servers is studied with consideration of both wireless channel quality and service quality. From the perspective of the end users, the optimization objective is designed to maximize profits and minimize the payments for completing tasks while respecting the resource limits of the edge servers. The deep reinforcement learning (DRL) method is utilized to train an intelligent strategy, and the policy gradient based node pairing (PG-NP) algorithm is proposed. Through a deep neural network, the well-trained policy maps the system states to the optimal actions. The REINFORCE algorithm with baseline is applied to train the policy network. In the training results, with max-credit, max-SINR, random, and max-resource as comparison strategies, the PG-NP algorithm performs about 57% better than the second-best method. Testing results show that PG-NP also has good generalization ability, which is negatively correlated with the training performance to a certain extent. |
DOI | 10.1109/GLOBECOM42002.2020.9322205 |
Citation Key | gao_deep_2020 |
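The training procedure described in the abstract, a policy network that maps system states to pairing actions and is updated with REINFORCE plus a baseline, can be illustrated with a minimal sketch. Everything below (the toy state layout, reward shape, dimensions, and capacity constant) is a hypothetical stand-in chosen for illustration, not the paper's actual system model or the authors' implementation.

```python
# Illustrative sketch only: REINFORCE with a running-average baseline for
# pairing IoT users to edge servers. All environment details are assumptions.
import torch
import torch.nn as nn

N_USERS, N_SERVERS = 4, 3            # toy problem sizes (assumption)
STATE_DIM = N_USERS * N_SERVERS      # one channel-quality value per link

class PairingPolicy(nn.Module):
    """Maps the system state to one categorical pairing distribution per user."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_USERS * N_SERVERS),
        )

    def forward(self, state):
        logits = self.net(state).view(N_USERS, N_SERVERS)
        return torch.distributions.Categorical(logits=logits)

def rollout_reward(state, actions):
    """Hypothetical reward: profit from the chosen links' channel gains,
    minus a penalty when a server's load exceeds a capacity of 2 users."""
    gains = state.view(N_USERS, N_SERVERS)
    profit = gains[torch.arange(N_USERS), actions].sum()
    load = torch.bincount(actions, minlength=N_SERVERS).float()
    penalty = torch.clamp(load - 2.0, min=0.0).sum()  # CAP = 2 (assumption)
    return profit - penalty

policy = PairingPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
baseline = 0.0  # running-average baseline to reduce gradient variance

for episode in range(2000):
    state = torch.rand(STATE_DIM)    # stand-in for channel/service-quality state
    dist = policy(state)
    actions = dist.sample()          # one server index per user
    reward = rollout_reward(state, actions)
    baseline = 0.95 * baseline + 0.05 * reward.item()
    # REINFORCE with baseline: log-probabilities weighted by the advantage
    loss = -(reward.item() - baseline) * dist.log_prob(actions).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The baseline here is a simple exponential moving average of episode rewards; subtracting it from the reward leaves the gradient estimate unbiased while reducing its variance, which is the standard motivation for REINFORCE with baseline.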