Biblio

Filters: Keyword is optimal policy
2021-02-01
Rutard, F., Sigaud, O., Chetouani, M.  2020.  TIRL: Enriching Actor-Critic RL with non-expert human teachers and a Trust Model. 2020 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN). :604–611.
Reinforcement learning (RL) algorithms have been demonstrated to be very attractive tools to train agents to achieve sequential tasks. However, these algorithms require too much training data to converge to be efficiently applied to physical robots. By using a human teacher, the learning process can be made faster and more robust, but the overall performance heavily depends on the quality and availability of teacher demonstrations or instructions. In particular, when these teaching signals are inadequate, the agent may fail to learn an optimal policy. In this paper, we introduce a trust-based interactive task learning approach. We propose an RL architecture able to learn both from environment rewards and from various sparse teaching signals provided by non-expert teachers, using an actor-critic agent, a human model and a trust model. We evaluate the performance of this architecture on four different setups using a maze environment with different simulated teachers, and show the benefits of the trust model.
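The abstract describes an actor-critic agent that blends environment rewards with sparse teacher feedback through a trust model. A minimal, hypothetical sketch of that general idea (tabular actor-critic with trust-weighted teacher feedback; all names and update rules are illustrative assumptions, not the TIRL implementation):

```python
import numpy as np

# Hypothetical sketch: tabular actor-critic whose update blends the TD error
# with sparse teacher feedback, weighted by an estimated trust level.
n_states, n_actions = 25, 4
theta = np.zeros((n_states, n_actions))   # actor: policy logits
V = np.zeros(n_states)                    # critic: state values
trust = 0.5                               # estimated teacher reliability
alpha, beta, gamma = 0.1, 0.1, 0.99

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def update(s, a, r_env, s_next, teacher_feedback=None):
    """One actor-critic step; teacher_feedback in {-1, +1} or None (sparse)."""
    global trust
    td_error = r_env + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    advantage = td_error
    if teacher_feedback is not None:
        # Blend environment signal with teacher feedback, weighted by trust.
        advantage = (1 - trust) * td_error + trust * teacher_feedback
        # Raise trust when the teacher's feedback agrees in sign with the TD error.
        agree = float(np.sign(teacher_feedback) == np.sign(td_error))
        trust = 0.95 * trust + 0.05 * agree
    grad = -softmax(theta[s])
    grad[a] += 1.0                         # gradient of log pi(a|s) w.r.t. logits
    theta[s] += beta * advantage * grad
```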
2020-04-03
Liau, David, Zaeem, Razieh Nokhbeh, Barber, K. Suzanne.  2019.  Evaluation Framework for Future Privacy Protection Systems: A Dynamic Identity Ecosystem Approach. 2019 17th International Conference on Privacy, Security and Trust (PST). :1–3.
In this paper, we leverage previous work on the Identity Ecosystem, a Bayesian-network mathematical representation of a person's identity, to create a framework for evaluating identity protection systems. Information dynamics are considered, and a protection game is formed in which the owner and the attacker both gain some level of control over the status of other PII within the dynamic Identity Ecosystem. We present a policy iteration algorithm to solve for the optimal policy of the game and discuss its convergence. Finally, an evaluation and comparison of identity protection strategies is provided, in which the optimal policy is used against different protection policies. This study aims to understand the evolutionary process of identity theft and to provide a framework for evaluating different identity protection strategies and future privacy protection systems.
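Policy iteration alternates exact policy evaluation with greedy policy improvement until the policy stops changing. A generic sketch (a random small MDP stands in for the Identity Ecosystem protection game; the transition tensor and rewards are placeholders, not the authors' model):

```python
import numpy as np

# Generic policy iteration on a small placeholder MDP.
n_states, n_actions, gamma = 5, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.normal(size=(n_states, n_actions))                        # R[s, a]

policy = np.zeros(n_states, dtype=int)
while True:
    # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
    P_pi = P[np.arange(n_states), policy]
    R_pi = R[np.arange(n_states), policy]
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
    # Policy improvement: act greedily with respect to V.
    Q = R + gamma * np.einsum("sat,t->sa", P, V)
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break   # policy is stable, hence optimal for this MDP
    policy = new_policy
```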
2019-06-17
Zheng, Jianjun, Siami Namin, Akbar.  2018.  A Markov Decision Process to Determine Optimal Policies in Moving Target. Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. :2321–2323.

Moving Target Defense (MTD) has been introduced as a game-changing strategy in cybersecurity to strengthen defenders and conversely weaken adversaries. The successful implementation of an MTD system can be influenced by several factors, including the effectiveness of the employed technique, the deployment strategy, the cost of the MTD implementation, and the impact of the enforced security policies. Considerable effort has been devoted to introducing various forms of MTD techniques. However, insufficient research has been conducted on cost and policy analysis and, more importantly, on the selection of these policies in an MTD-based setting. This poster paper proposes a Markov Decision Process (MDP) modeling-based approach to analyze security policies and to select optimal policies for moving target defense implementation and deployment. An adapted value iteration method solves the Bellman optimality equation to obtain the optimal policy for each state of the system. Simulation results indicate that such modeling can be used to analyze the impact of action costs on the optimal policies.
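Value iteration repeatedly applies the Bellman optimality backup until the value function converges, after which the optimal policy is read off greedily. A hedged sketch with per-action costs (the states, transitions, gains, and costs are illustrative placeholders, not the poster's MTD model):

```python
import numpy as np

# Value iteration on a placeholder MDP with per-action costs.
n_states, n_actions, gamma = 4, 3, 0.95
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
security_gain = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
action_cost = rng.uniform(0.0, 0.5, size=n_actions)
R = security_gain - action_cost                                    # net reward

V = np.zeros(n_states)
while True:
    Q = R + gamma * np.einsum("sat,t->sa", P, V)
    V_new = Q.max(axis=1)            # Bellman optimality backup
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
optimal_policy = Q.argmax(axis=1)    # greedy policy w.r.t. the converged V
```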

2018-09-28
Yang, Y., Wunsch, D., Yin, Y.  2017.  Hamiltonian-driven adaptive dynamic programming for nonlinear discrete-time dynamic systems. 2017 International Joint Conference on Neural Networks (IJCNN). :1339–1346.

In this paper, based on the Hamiltonian, an alternative interpretation of the iterative adaptive dynamic programming (ADP) approach from the perspective of optimization is developed for discrete-time nonlinear dynamic systems. The role of the Hamiltonian in iterative ADP is explained. The resulting Hamiltonian-driven ADP is able to evaluate the performance with respect to arbitrary admissible policies, compare two different admissible policies, and further improve a given admissible policy. The convergence of the Hamiltonian-driven ADP to the optimal policy is proven. Implementation of the Hamiltonian-driven ADP by neural networks is discussed under the assumption that each iterative policy and value function can be updated exactly. Finally, a simulation is conducted to verify the effectiveness of the presented Hamiltonian-driven ADP.
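Iterative ADP of this kind alternates evaluation of the current admissible policy with improvement by minimizing the discrete-time Hamiltonian H(x, u) = U(x, u) + V(f(x, u)) - V(x). A rough sketch on a scalar system with a discretized state and control grid (the dynamics, stage cost, and grids are assumptions for illustration, not the paper's neural-network implementation):

```python
import numpy as np

# Sketch of Hamiltonian-based policy evaluation/improvement for a scalar
# discrete-time system x_{k+1} = f(x_k, u_k); everything here is a placeholder.
def f(x, u):
    return 0.8 * np.sin(x) + u            # hypothetical nonlinear dynamics

def U(x, u):
    return x ** 2 + u ** 2                 # stage cost

xs = np.linspace(-2.0, 2.0, 81)            # state grid
us = np.linspace(-1.0, 1.0, 21)            # admissible control grid
V = np.zeros_like(xs)
policy = np.zeros_like(xs)                 # initial admissible policy u = 0

def V_interp(x):
    return np.interp(np.clip(x, xs[0], xs[-1]), xs, V)

for _ in range(50):
    # Evaluation: one Bellman backup of V under the current policy.
    V = np.array([U(x, u) + V_interp(f(x, u)) for x, u in zip(xs, policy)])
    # Improvement: minimize U(x, u) + V(f(x, u)), i.e. the Hamiltonian
    # H(x, u) = U(x, u) + V(f(x, u)) - V(x), over the control grid.
    Q = np.array([[U(x, u) + V_interp(f(x, u)) for u in us] for x in xs])
    policy = us[Q.argmin(axis=1)]
```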