Biblio
Mishra, Rajesh K. 2020. Model-free Reinforcement Learning for Stochastic Stackelberg Security Games. 2020 59th IEEE Conference on Decision and Control (CDC): 348–353.

In this paper, we consider a sequential stochastic Stackelberg game with two players, a leader and a follower. The follower observes the state of the system privately, while the leader does not. The players play a Stackelberg equilibrium, in which the follower plays the best response to the leader's strategy. In such a scenario, the leader has the advantage of committing to a policy that maximizes its returns, given the knowledge that the follower will play the best response to that policy. Such a pair of strategies is defined as the Stackelberg equilibrium of the game. Recently, [1] provided a sequential decomposition algorithm to compute the Stackelberg equilibrium of such games, which allows Markovian equilibrium policies to be computed in linear time as opposed to double exponential, as before. In this paper, we extend that idea to the case where the state update dynamics are not known to the players and propose a reinforcement learning (RL) algorithm based on Expected Sarsa that learns the Stackelberg equilibrium policy by simulating a model of the underlying Markov decision process (MDP). We use particle filters to estimate the belief update for a common agent, which computes the optimal policy based on the information common to both players. We present a security game example to illustrate the policy learned by our algorithm.
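The abstract names two generic building blocks, an Expected Sarsa update and a particle-filter belief update. The sketch below shows a plain tabular Expected Sarsa loop, not the paper's common-agent algorithm; the gym-style environment interface (env.reset, env.step), the hashable state representation, and the hyperparameters are assumptions made only for illustration.

```python
import numpy as np
from collections import defaultdict

def expected_sarsa(env, n_episodes=5000, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Expected Sarsa on a simulated MDP (hypothetical env interface)."""
    n_actions = env.action_space.n
    Q = defaultdict(lambda: np.zeros(n_actions))  # state -> action values

    def policy_probs(state):
        # epsilon-greedy distribution over actions under the current Q estimates
        probs = np.full(n_actions, epsilon / n_actions)
        probs[np.argmax(Q[state])] += 1.0 - epsilon
        return probs

    for _ in range(n_episodes):
        state = env.reset()          # assumed to return a hashable state
        done = False
        while not done:
            action = np.random.choice(n_actions, p=policy_probs(state))
            next_state, reward, done, _ = env.step(action)
            # Expected Sarsa target: expectation of Q over the behaviour policy,
            # rather than the sampled next action (Sarsa) or the max (Q-learning)
            expected_q = np.dot(policy_probs(next_state), Q[next_state])
            target = reward + gamma * (0.0 if done else expected_q)
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
    return Q
```

Similarly, a generic bootstrap particle-filter step of the kind mentioned for the common agent's belief update is sketched below; transition_sample and obs_likelihood are hypothetical stand-ins for the (unknown, simulated) private-state dynamics and the observation model, not functions from the paper.

```python
import numpy as np

def particle_filter_update(particles, weights, observation,
                           transition_sample, obs_likelihood):
    """One bootstrap particle-filter step: propagate, reweight, resample."""
    # Propagate each particle through the simulated state dynamics
    particles = np.array([transition_sample(x) for x in particles])
    # Reweight by the likelihood of the new observation and normalize
    weights = weights * np.array([obs_likelihood(observation, x) for x in particles])
    weights = weights / weights.sum()
    # Multinomial resampling to avoid weight degeneracy
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```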