Xu, Chengtao, Song, Houbing.
2021.
Mixed Initiative Balance of Human-Swarm Teaming in Surveillance via Reinforcement Learning. 2021 IEEE/AIAA 40th Digital Avionics Systems Conference (DASC). :1–10.
Human-machine teaming (HMT) operates in a context defined by the mission. Owing to the complexity of and disturbances in human-machine cooperation, a single machine has difficulty working with humans at mission scale in terms of efficiency and workload; a swarm of machines offers a more feasible solution. Human-swarm teaming (HST) extends the concept of HMT to missions such as persistent surveillance, search and rescue, and warfare. Realizing HST raises several scientific challenges, for example, allocation strategies for high-level decision making, where the human usually plays a supervisory or decision-making role. The performance of such a fixed HST structure in actual mission operation can be affected by the supervisor's status in many respects, which can be grouped into three general parts: workload, situational awareness, and trust toward the robot swarm teammate and the mission performance. Moreover, the complexity of a single human operator accessing multiple machine agents increases the work burden, so an interface between swarm teammates and human operators that simplifies the interaction is desirable in HST. In this paper, instead of considering only the workload of the human teammate, we propose a computational model of human-swarm interaction (HSI) in a simulated map surveillance mission. A UAV swarm and a human supervisor are jointly assigned to search a predefined area of interest (AOI). The allocation of map-monitoring workload is adjusted based on the status of the human worker and the swarm teammate. Workload, situational awareness, and trust are formulated as independent but mutually influencing models. A communication-aware UAV swarm persistent surveillance algorithm provides the swarm autonomy. As the surveillance task load varies, the swarm agent's trust parameter adjusts the autonomy level to fit the human operator's needs.
Reinforcement learning is applied to seek a relative workload balance between the human and the swarm. Metrics such as mission accomplishment rate, human supervisor performance, and UAV swarm mission performance are evaluated in the end. The simulation results show that the algorithm can learn the human-machine trust interaction to balance the workload and achieve better mission execution performance. This work motivates leveraging a more comprehensive HST model in more practical HMT application scenarios.
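The workload-balancing idea above can be sketched with a toy tabular Q-learning loop (a minimal illustration only, not the paper's algorithm): a learner shifts the swarm's share of AOI coverage so that neither the human supervisor nor the swarm is overloaded. The environment dynamics, reward shape, and workload discretization below are all invented for illustration.

```python
# Toy sketch (NOT the paper's method): tabular Q-learning that allocates
# surveillance workload between a human supervisor and a UAV swarm.
import random

ACTIONS = (-0.1, 0.0, 0.1)   # shift of the swarm's share of AOI coverage
N_STATES = 5                 # discretized human-workload levels

def step(share, action):
    """Invented dynamics: moving work to the swarm lowers human workload."""
    share = min(1.0, max(0.0, share + action))
    human_load = 1.0 - share
    # Reward peaks when neither side is overloaded (balanced teaming).
    reward = -abs(human_load - 0.5)
    state = min(N_STATES - 1, int(human_load * N_STATES))
    return share, state, reward

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        share, state = 0.0, N_STATES - 1   # start with the human fully loaded
        for _ in range(20):
            a = (rng.randrange(len(ACTIONS)) if rng.random() < eps
                 else max(range(len(ACTIONS)), key=lambda i: q[state][i]))
            share, nxt, r = step(share, ACTIONS[a])
            q[state][a] += alpha * (r + gamma * max(q[nxt]) - q[state][a])
            state = nxt
    return q

q = train()
# Greedy rollout: the learned policy should settle near a balanced split.
share, state = 0.0, N_STATES - 1
for _ in range(20):
    a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
    share, state, _ = step(share, ACTIONS[a])
print(round(share, 2))
```

In this toy setting the greedy policy drives the allocation toward an even split; the paper's actual formulation additionally couples workload with situational awareness and trust models, which are omitted here.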
Huang, Chao, Luo, Wenhao, Liu, Rui.
2021.
Meta Preference Learning for Fast User Adaptation in Human-Supervisory Multi-Robot Deployments. 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). :5851–5856.
As multi-robot systems (MRS) are widely used in tasks such as natural disaster response and social security, an MRS is expected to become so ubiquitous that a general user without heavy training can easily operate it. However, humans have various preferences for balancing task performance against safety, imposing different requirements on MRS control. Failing to comply with these preferences makes an MRS difficult to operate and decreases human willingness to use it. Therefore, to improve social acceptance as well as performance, there is an urgent need to adjust MRS behaviors according to human preferences before triggering human corrections, which increase cognitive load. In this paper, a novel Meta Preference Learning (MPL) method was developed to enable an MRS to adapt quickly to user preferences. MPL, based on a meta-learning mechanism, can quickly assess human preferences from limited instructions; a neural-network-based preference model then adjusts MRS behaviors for preference adaptation. To validate the method's effectiveness, a task scenario "An MRS searches victims in an earthquake disaster site" was designed; 20 human users were involved to identify preferences as "aggressive", "medium", or "reserved"; based on user guidance and domain knowledge, about 20,000 preferences were simulated to cover different operations related to "task quality", "task progress", and "robot safety". The effectiveness of MPL in preference adaptation was validated by the reduced duration and frequency of human interventions.
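The fast-adaptation idea can be sketched with a first-order, Reptile-style meta-update (a hedged stand-in, not the paper's exact MPL architecture): simulated users weigh "task quality", "task progress", and "robot safety" differently, and a meta-initialization lets a new user's preference model adapt from only a few pairwise comparisons. All user profiles, features, and hyperparameters below are invented.

```python
# Toy sketch (NOT the paper's MPL implementation): Reptile-style
# meta-learning over pairwise preferences with a linear scorer.
import math
import random

DIM = 3  # invented features: task quality, task progress, robot safety

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def sgd_steps(w, pairs, lr=0.5, steps=10):
    """Logistic pairwise-preference loss: behavior a is preferred over b."""
    w = list(w)
    for _ in range(steps):
        for a, b in pairs:
            diff = [ai - bi for ai, bi in zip(a, b)]
            p = 1.0 / (1.0 + math.exp(-dot(w, diff)))
            g = p - 1.0  # gradient of -log(p) w.r.t. dot(w, diff)
            for i in range(DIM):
                w[i] -= lr * g * diff[i]
    return w

def make_user_pairs(rng, w_true, n):
    """Noise-free comparisons consistent with a user's true weights."""
    pairs = []
    for _ in range(n):
        a = [rng.random() for _ in range(DIM)]
        b = [rng.random() for _ in range(DIM)]
        pairs.append((a, b) if dot(w_true, a) >= dot(w_true, b) else (b, a))
    return pairs

def reptile(meta_iters=200, meta_lr=0.1, seed=0):
    rng = random.Random(seed)
    profiles = {"aggressive": [2.0, 2.0, 0.2],
                "medium":     [1.0, 1.0, 1.0],
                "reserved":   [0.2, 0.5, 2.5]}
    meta_w = [0.0] * DIM
    for _ in range(meta_iters):
        w_true = profiles[rng.choice(list(profiles))]
        adapted = sgd_steps(meta_w, make_user_pairs(rng, w_true, 8))
        # Move the meta-initialization toward the task-adapted weights.
        meta_w = [m + meta_lr * (a - m) for m, a in zip(meta_w, adapted)]
    return meta_w, profiles

meta_w, profiles = reptile()
# Fast adaptation for a new "reserved" user from only 5 comparisons.
rng = random.Random(42)
support = make_user_pairs(rng, profiles["reserved"], 5)
user_w = sgd_steps(meta_w, support, steps=5)
test_pairs = make_user_pairs(rng, profiles["reserved"], 100)
acc = sum(dot(user_w, a) > dot(user_w, b) for a, b in test_pairs) / 100
print(acc)
```

The meta-initialization is what makes adaptation from so few comparisons workable; the paper reports the analogous effect as reduced duration and frequency of human interventions.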