Title | TIRL: Enriching Actor-Critic RL with non-expert human teachers and a Trust Model |
Publication Type | Conference Paper |
Year of Publication | 2020 |
Authors | Rutard, F., Sigaud, O., Chetouani, M. |
Conference Name | 2020 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) |
Date Published | August
Keywords | actor-critic agent, actor-critic RL, attractive tools, computer aided instruction, Human Behavior, human model, human teacher, human trust, learning (artificial intelligence), nonexpert human teachers, nonexpert teachers, optimal policy, physical robots, reinforcement learning algorithms, RL architecture, sequential tasks, simulated teachers, sparse teaching signals, teacher demonstrations, teaching, TIRL, Training data, trust model, trust-based interactive task learning
Abstract | Reinforcement learning (RL) algorithms have proven to be very attractive tools for training agents to achieve sequential tasks. However, these algorithms require too much training data to converge to be efficiently applied to physical robots. By using a human teacher, the learning process can be made faster and more robust, but the overall performance heavily depends on the quality and availability of teacher demonstrations or instructions. In particular, when these teaching signals are inadequate, the agent may fail to learn an optimal policy. In this paper, we introduce a trust-based interactive task learning approach. We propose an RL architecture able to learn both from environment rewards and from various sparse teaching signals provided by non-expert teachers, using an actor-critic agent, a human model and a trust model. We evaluate the performance of this architecture on four different setups using a maze environment with different simulated teachers, and we show the benefits of the trust model. (An illustrative sketch of a trust-weighted update follows this record.)
DOI | 10.1109/RO-MAN47096.2020.9223530 |
Citation Key | rutard_tirl_2020 |
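The abstract describes an agent that learns jointly from environment rewards and from sparse, possibly unreliable teacher signals mediated by a trust model. As a rough illustration only, and not the authors' algorithm, the sketch below shows a tabular actor-critic whose TD target blends the environment reward with trust-weighted human feedback. The maze dynamics, teacher behavior, trust-update rule, and all hyperparameters (alpha, beta, gamma, tau) are hypothetical placeholders.

```python
# Minimal sketch (not the paper's implementation): a tabular actor-critic
# in which sparse human feedback is blended with the environment reward,
# weighted by a scalar trust estimate. Everything here is an illustrative
# assumption: the toy maze, the simulated teacher, and the trust update.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 25, 4               # e.g. a 5x5 maze, flattened
V = np.zeros(n_states)                    # critic: state values
logits = np.zeros((n_states, n_actions))  # actor: action preferences
trust = 0.5                               # estimated teacher reliability in [0, 1]
alpha, beta, gamma, tau = 0.1, 0.1, 0.99, 0.05

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def step(s, a):
    """Toy stand-in for the maze: random transition, sparse goal reward."""
    s_next = int(rng.integers(n_states))
    r = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, r

def teacher_feedback(s, a):
    """Toy stand-in for a non-expert teacher: rare, noisy +/-1 signals."""
    if rng.random() < 0.1:                # teacher speaks up only occasionally
        return 1.0 if rng.random() < 0.8 else -1.0
    return None

s = 0
for _ in range(1000):
    a = rng.choice(n_actions, p=softmax(logits[s]))
    s_next, r_env = step(s, a)
    h = teacher_feedback(s, a)

    # Blend the environment reward with trust-weighted human feedback.
    r = r_env + (trust * h if h is not None else 0.0)

    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error              # critic update
    logits[s, a] += beta * td_error       # actor update (policy-gradient-like)

    # Naive trust update (assumption): raise trust when the teacher's signal
    # agrees in sign with the critic's own TD error, lower it otherwise.
    if h is not None:
        trust = float(np.clip(trust + tau * np.sign(h * td_error), 0.0, 1.0))

    s = s_next
```

The design choice to fold feedback into the reward channel is only one of several plausible readings of the abstract; the paper's actual architecture couples a human model and a trust model with the actor-critic agent, which this sketch does not reproduce.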