Biblio

Filters: Author is Vasal, Deepanshu
Vasal, Deepanshu.  2022.  Sequential decomposition of Stochastic Stackelberg games. 2022 American Control Conference (ACC). :1266–1271.
In this paper, we consider a discrete-time stochastic Stackelberg game in which a defender (the leader) must protect a target against an attacker (the follower). The attacker has a private type that evolves as a controlled Markov process. The objective is to compute the stochastic Stackelberg equilibrium of the game, in which the defender commits to a strategy, the attacker's strategy is a best response to that commitment, and the defender's strategy is optimal given that the attacker plays this best response. In general, computing such an equilibrium involves solving a fixed-point equation over the whole game. In this paper, we present an algorithm that computes these strategies by solving lower-dimensional fixed-point equations for each time t. Based on this algorithm, we compute the Stackelberg equilibrium of a security example.
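The per-stage fixed-point idea described in this abstract can be illustrated with a small numerical sketch. The toy game below (payoffs, type dynamics, belief grid, and the two-target structure) is an illustrative assumption and not taken from the paper; the sketch only mirrors the structure of solving a lower-dimensional fixed point at each time t by backward induction over a discretized public belief.

import numpy as np

T = 3                                    # horizon
GRID = np.linspace(0.0, 1.0, 21)         # discretized public belief P(type = 1)

def r_def(d, a):                         # leader reward: guard the attacked target
    return 1.0 if d == a else -1.0

def r_att(x, d, a):                      # follower reward depends on its private type x
    return (1.0 if d != a else -1.0) + (0.5 if a == x else 0.0)

def p_next_type(x, a):                   # controlled Markov type dynamics: P(x' = 1 | x, a)
    return 0.8 * x + 0.1 * a

def nearest(pi):                         # index of the closest belief grid point
    return int(np.argmin(np.abs(GRID - pi)))

def belief_after(pi, gamma, a):
    # Bayes update of P(type = 1) when the follower plays strategy gamma and action a is observed.
    w0, w1 = (1.0 - pi) * (gamma[0] == a), pi * (gamma[1] == a)
    if w0 + w1 == 0.0:                   # off-path action: fall back to the prior
        w0, w1 = 1.0 - pi, pi
    return (w0 * p_next_type(0, a) + w1 * p_next_type(1, a)) / (w0 + w1)

V_def = np.zeros((T + 1, GRID.size))         # leader value function
V_att = np.zeros((T + 1, GRID.size, 2))      # follower value function, per private type

for t in reversed(range(T)):
    for k, pi in enumerate(GRID):
        best = None
        for d in (0, 1):                     # leader commitment considered at this stage
            gamma = [0, 0]                   # follower strategy: private type -> action
            for _ in range(20):              # lower-dimensional fixed point for this stage
                new_gamma = []
                for x in (0, 1):
                    q = []
                    for a in (0, 1):
                        kp = nearest(belief_after(pi, gamma, a))
                        p1 = p_next_type(x, a)
                        q.append(r_att(x, d, a)
                                 + (1 - p1) * V_att[t + 1, kp, 0]
                                 + p1 * V_att[t + 1, kp, 1])
                    new_gamma.append(int(np.argmax(q)))
                if new_gamma == gamma:
                    break
                gamma = new_gamma
            val = sum(px * (r_def(d, gamma[x])
                            + V_def[t + 1, nearest(belief_after(pi, gamma, gamma[x]))])
                      for x, px in ((0, 1.0 - pi), (1, pi)))
            if best is None or val > best[0]:
                best = (val, d, gamma)
        V_def[t, k], d_star, g_star = best
        for x in (0, 1):                     # follower value under the committed pair
            a = g_star[x]
            kp = nearest(belief_after(pi, g_star, a))
            p1 = p_next_type(x, a)
            V_att[t, k, x] = (r_att(x, d_star, a)
                              + (1 - p1) * V_att[t + 1, kp, 0]
                              + p1 * V_att[t + 1, kp, 1])

print("Leader value at t = 0 under a uniform prior:", round(V_def[0, nearest(0.5)], 3))

Running this prints the leader's value at a uniform prior; a finer belief grid and more fixed-point iterations trade accuracy for computation, and the per-stage iteration replaces the single game-wide fixed point mentioned in the abstract.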
Mishra, Rajesh K., Vasal, Deepanshu, Vishwanath, Sriram.  2020.  Model-free Reinforcement Learning for Stochastic Stackelberg Security Games. 2020 59th IEEE Conference on Decision and Control (CDC). :348–353.
In this paper, we consider a sequential stochastic Stackelberg game with two players, a leader and a follower. The follower privately observes the state of the system, while the leader does not. The players play a Stackelberg equilibrium in which the follower plays a best response to the leader's strategy. In such a scenario, the leader has the advantage of committing to a policy that maximizes its returns, knowing that the follower will play a best response to that policy. Such a pair of strategies is defined as the Stackelberg equilibrium of the game. Recently, [1] provided a sequential decomposition algorithm that computes the Stackelberg equilibrium of such games and allows Markovian equilibrium policies to be computed in linear time rather than in double-exponential time, as before. In this paper, we extend that idea to the case where the state-update dynamics are not known to the players, and propose a reinforcement learning (RL) algorithm based on Expected Sarsa that learns the Stackelberg equilibrium policy by simulating a model of the underlying Markov decision process (MDP). We use particle filters to estimate the belief update for a common agent that computes the optimal policy based on the information common to both players. We present a security game example to illustrate the policy learned by our algorithm.
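The two computational ingredients named in this abstract, an Expected Sarsa update and a particle-filter belief update for a common agent, can be sketched on a toy simulation. Everything below (the type dynamics, the follower's fixed stochastic response rule, the rewards, the particle count, and the belief discretization) is an illustrative assumption; it is not the paper's algorithm, which learns the follower's best response as part of the equilibrium rather than fixing it in advance.

import numpy as np

rng = np.random.default_rng(0)
N_PARTICLES, N_BINS = 200, 10
EPS, ALPHA, DISCOUNT = 0.1, 0.1, 0.95

def type_step(x, a):                      # hidden follower type: controlled Markov chain
    return rng.random() < 0.8 * x + 0.1 * a

def follower_action(x, d):                # simulated follower response rule (assumption)
    return int(x) if rng.random() < 0.8 else 1 - int(x)

def leader_reward(d, a):                  # leader is rewarded for guarding the attacked target
    return 1.0 if d == a else -1.0

def belief_bin(particles):                # discretize P(type = 1) into a Q-table state
    return min(int(np.mean(particles) * N_BINS), N_BINS - 1)

def pf_update(particles, a):
    # Particle filter step for the common agent after observing the follower's action a:
    # weight by the likelihood of a under the response rule, resample, then propagate.
    w = np.where(particles.astype(int) == a, 0.8, 0.2)
    w = w / w.sum()
    idx = rng.choice(particles.size, size=particles.size, p=w)
    return np.array([type_step(x, a) for x in particles[idx]])

Q = np.zeros((N_BINS, 2))                 # Q(belief bin, leader action)

def expected_q(s):                        # expectation of Q under the eps-greedy policy
    probs = np.full(2, EPS / 2)
    probs[int(np.argmax(Q[s]))] += 1.0 - EPS
    return float(probs @ Q[s])

for episode in range(2000):
    x = rng.random() < 0.5                            # hidden follower type
    particles = rng.random(N_PARTICLES) < 0.5         # common-agent belief particles
    s = belief_bin(particles)
    for t in range(10):
        d = int(np.argmax(Q[s])) if rng.random() > EPS else int(rng.integers(2))
        a = follower_action(x, d)
        r = leader_reward(d, a)
        particles = pf_update(particles, a)           # belief over the next type
        x = type_step(x, a)                           # true type transition
        s_next = belief_bin(particles)
        # Expected Sarsa: bootstrap with the policy's expected value at the next state
        Q[s, d] += ALPHA * (r + DISCOUNT * expected_q(s_next) - Q[s, d])
        s = s_next

print("Learned Q-values per belief bin:")
print(np.round(Q, 2))

Here the Q-table is indexed by a coarse discretization of the particle-filter belief, which stands in for the common information state; a finer discretization or a function approximator would be the natural refinement.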