Mixed Initiative and Collaborative Learning in Adversarial Environments (Apr 2020)
PIs: Claire Tomlin (Lead), Shankar Sastry, Xenofon Koutsoukos, Janos Sztipanovits
HARD PROBLEM(S) ADDRESSED: Human Behavior (primary), Resilient Architectures (secondary), and Scalability and Composability (secondary)
We have been developing a framework for incorporating human behavior into resilient robot motion planning, along with scalable, online safety updates of these motion plans.
PUBLICATIONS
- D. Fridovich-Keil, E. Ratner, L. Peters, A. D. Dragan, and C. J. Tomlin, ‘‘Efficient iterative linear-quadratic approximations for nonlinear multi-player general-sum differential games,’’ in International Conference on Robotics and Automation (ICRA), 2020.
- L. Peters, D. Fridovich-Keil, C. J. Tomlin, and Z. Sunberg, ‘‘Inference-based strategy alignment for general-sum differential games,’’ in International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2020.
- M. H. Lim, C. J. Tomlin, and Z. N. Sunberg, ‘‘Sparse tree search optimality guarantees in POMDPs with continuous observation spaces,’’ in International Joint Conference on Artificial Intelligence and Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI), 2020.
- S. Bansal, A. Bajcsy, E. Ratner, A. Dragan, and C. J. Tomlin, ‘‘A Hamilton-Jacobi reachability-based framework for predicting and analyzing human motion for safe planning,’’ to appear in International Conference on Robotics and Automation (ICRA), 2020.
KEY HIGHLIGHTS
Many problems in robotics involve multiple decision making agents. To operate efficiently in such settings, a robot must reason about the impact of its decisions on the behavior of other agents. Differential games offer an expressive theoretical framework for formulating these types of multi-agent problems. Unfortunately, most numerical solution techniques scale poorly with state dimension and are rarely used in real-time applications. For this reason, it is common to predict the future decisions of other agents and solve the resulting decoupled, i.e., single-agent, optimal control problem. This decoupling neglects the underlying interactive nature of the problem; however, efficient solution techniques do exist for broad classes of optimal control problems. We take inspiration from one such technique, the iterative linear-quadratic regulator (ILQR), which solves repeated approximations with linear dynamics and quadratic costs. Similarly, our proposed algorithm solves repeated linear-quadratic games.

We experimentally benchmark our algorithm in several examples with a variety of initial conditions and show that the resulting strategies exhibit complex interactive behavior. Our results indicate that our algorithm converges reliably and runs in real time. In a three-player, 14-state simulated intersection problem, our algorithm initially converges in under 0.25 s, and receding horizon invocations converge in under 50 ms in a hardware collision-avoidance test.
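The linear-quadratic game subroutine at the core of this approach can be illustrated in a minimal scalar, two-player, general-sum setting. The sketch below is our own toy illustration, not the paper's implementation: the full algorithm works with multi-dimensional states, linearizes the nonlinear dynamics, and quadraticizes the costs around the current trajectory iterate before re-solving a game of this form. All parameter names and values here are hypothetical.

```python
def lq_game_gains(a, b1, b2, q1, q2, r1, r2, horizon):
    """Feedback Nash gains for a scalar two-player LQ game (toy sketch).

    Dynamics:       x' = a*x + b1*u1 + b2*u2
    Player i cost:  sum_t q_i*x_t**2 + r_i*u_{i,t}**2
    Backward coupled Riccati recursion with value V_i(x) = z_i*x**2.
    """
    z1, z2 = q1, q2            # terminal values of the Riccati variables
    gains = []
    for _ in range(horizon):
        # Stationarity of each player's cost-to-go under u_i = -k_i*x gives
        # a coupled 2x2 linear system for the simultaneous gains (k1, k2):
        #   (r1 + z1*b1^2)*k1 +  z1*b1*b2   *k2 = z1*b1*a
        #    z2*b1*b2   *k1 + (r2 + z2*b2^2)*k2 = z2*b2*a
        a11, a12, c1 = r1 + z1 * b1 * b1, z1 * b1 * b2, z1 * b1 * a
        a21, a22, c2 = z2 * b1 * b2, r2 + z2 * b2 * b2, z2 * b2 * a
        det = a11 * a22 - a12 * a21
        k1 = (c1 * a22 - a12 * c2) / det
        k2 = (a11 * c2 - c1 * a21) / det
        a_cl = a - b1 * k1 - b2 * k2        # closed-loop dynamics coefficient
        z1 = q1 + r1 * k1 * k1 + z1 * a_cl * a_cl
        z2 = q2 + r2 * k2 * k2 + z2 * a_cl * a_cl
        gains.append((k1, k2))
    gains.reverse()                          # time-ordered: t = 0 .. T-1
    return gains
```

Run on an open-loop-unstable system (e.g., a = 1.1), the recursion produces gain pairs whose joint closed loop is stable; in the iterative scheme described above, a subroutine like this is invoked once per iterate of the nonlinear game.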
Partially observable Markov decision processes (POMDPs) with continuous state and observation spaces have powerful flexibility for representing real-world decision and control problems but are notoriously difficult to solve. Recent online sampling-based algorithms that use observation likelihood weighting have shown unprecedented effectiveness in domains with continuous observation spaces. However, there has been no formal theoretical justification for this technique. This work offers such a justification, proving that a simplified algorithm, partially observable weighted sparse sampling (POWSS), will estimate Q-values accurately with high probability and can be made to perform arbitrarily near the optimal solution by increasing computational power.
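The observation-likelihood-weighting idea can be sketched on a toy 1-D problem. The dynamics, observation model, rewards, and parameters below are hypothetical choices of our own for illustration, not the POWSS setup from the paper: each belief node carries weighted particles, and each sampled observation spawns a branch whose belief re-weights all sibling next-state particles by that observation's likelihood.

```python
import math
import random

# Toy 1-D POMDP (illustrative only): the state drifts by the chosen action
# plus noise, observations are noisy state readings, and the reward prefers
# staying near the origin.
def transition(s, a):
    return s + a + random.gauss(0.0, 0.1)

def observe(s):
    return s + random.gauss(0.0, 0.5)

def obs_likelihood(o, s, sigma=0.5):
    return math.exp(-((o - s) ** 2) / (2 * sigma ** 2))

def reward(s):
    return -abs(s)

ACTIONS = (-1.0, 0.0, 1.0)
GAMMA = 0.9

def powss_q(states, weights, a, depth):
    """Sparse-sampling Q estimate with observation-likelihood weighting."""
    total_w = sum(weights)
    # Immediate reward: likelihood-weighted average over the belief particles.
    r = sum(w * reward(s) for s, w in zip(states, weights)) / total_w
    if depth == 0:
        return r
    nexts = [transition(s, a) for s in states]
    # Each propagated particle j spawns an observation branch; that branch's
    # belief re-weights ALL next-state particles by the observation likelihood.
    v = 0.0
    for j, sj in enumerate(nexts):
        o = observe(sj)
        new_w = [w * obs_likelihood(o, sp) for w, sp in zip(weights, nexts)]
        v += (weights[j] / total_w) * max(
            powss_q(nexts, new_w, ap, depth - 1) for ap in ACTIONS)
    return r + GAMMA * v
```

From a belief concentrated near s = 1, the estimated Q-value of moving toward the origin exceeds that of moving away, as expected; the theoretical result summarized above bounds how accurate such estimates are as the number of particles grows.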
Real-world autonomous systems often employ probabilistic predictive models of human behavior during planning to reason about their future motion. Since accurately modeling human behavior a priori is challenging, such models are often parameterized, enabling the robot to adapt predictions based on observations by maintaining a distribution over the model parameters. Although this enables data and priors to improve the human model, observation models are difficult to specify and priors may be incorrect, leading to erroneous state predictions that can degrade the safety of the robot motion plan. In this work, we seek to design a predictor which is more robust to misspecified models and priors, but can still leverage human behavioral data online to reduce conservatism in a safe way. To do this, we cast human motion prediction as a Hamilton-Jacobi reachability problem in the joint state space of the human and the belief over the model parameters. We construct a new continuous-time dynamical system, where the inputs are the observations of human behavior and the dynamics include how the belief over the model parameters changes. The results of this reachability computation enable us both to analyze the effect of incorrect priors on future predictions in continuous state and time and to make predictions of the future human state. We compare our approach to the worst-case forward reachable set and to a stochastic predictor which uses Bayesian inference and produces full future state distributions. Our comparisons in simulation and in hardware demonstrate how our framework can enable robust planning while not being overly conservative, even when the human model is inaccurate.
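The belief-augmented dynamical system can be sketched with a discrete belief over two candidate human models. This is a toy enumeration of our own with hypothetical models and noise levels, not the paper's continuous-state, continuous-time Hamilton-Jacobi computation: the observed human action acts as the input that simultaneously drives the physical state and the Bayes update of the belief.

```python
import math

DT = 0.1
SIGMA = 0.3  # assumed action-noise level in each candidate human model

# Two hypothetical human models: theta = 0 heads to goal +1, theta = 1 to -1.
def model_action(theta, x):
    goal = 1.0 if theta == 0 else -1.0
    return max(-1.0, min(1.0, goal - x))

def augmented_step(x, b, u):
    """One step of the joint (human state, belief) dynamical system.

    x : human position;  b : P(theta = 0);  u : observed human action.
    The observed action is the 'input' driving both components.
    """
    like0 = math.exp(-((u - model_action(0, x)) ** 2) / (2 * SIGMA ** 2))
    like1 = math.exp(-((u - model_action(1, x)) ** 2) / (2 * SIGMA ** 2))
    b_next = b * like0 / (b * like0 + (1 - b) * like1)
    return x + u * DT, b_next

def forward_reachable(x0, b0, steps, inputs=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Enumerative stand-in for the forward reachable set: propagate the
    augmented state under every sequence of discretized observed actions."""
    frontier = {(x0, b0)}
    for _ in range(steps):
        frontier = {augmented_step(x, b, u)
                    for (x, b) in frontier for u in inputs}
    return frontier
```

Feeding in actions consistent with one model drives the belief toward that model, while the reachable set over all action sequences captures how wrong priors could evolve; the paper's framework performs the analogous computation over continuous state and time.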
COMMUNITY ENGAGEMENTS
Claire Tomlin will run the 6th installment of Berkeley Girls in Engineering (GiE), a program held at UC Berkeley for middle school students, in Summer 2020. Traditionally, the program runs for 4 weeks, with 30 students participating per week, for a total of 120 students each summer. The week-long day camp includes hands-on modules across all types of engineering, teaching bioengineering, robotics, materials science, computer science, water treatment, concrete design, and a range of other engineering topics. Students team up to complete a poster about an engineering problem and how they would solve it, presented at the end of the week to the camp and family members. This year, we are planning a virtual format, in which we will prepare a kit, as well as a loaned Chromebook and Wi-Fi hotspot access, for each participant.
EDUCATIONAL ADVANCES
We are developing a new course in systems theory at Berkeley, to be taken by upper-level undergraduates and first- and second-year graduate students, on a rapprochement between control theory and reinforcement learning. The course will focus on a modern viewpoint on modeling, analysis, and control design, leveraging tools and successes from both systems and control theory and machine learning. The first version of this course is being taught by Shankar Sastry in Spring 2020.