Mixed Initiative and Collaborative Learning in Adversarial Environments (Jan-Mar 2022)
PIs: Claire Tomlin (Lead), Shankar Sastry, Xenofon Koutsoukos, Janos Sztipanovits
Reporting Period: (1/1/2022 – 3/31/2022)
Hard Problems Addressed
Resilient Architectures (primary)
Scalability and Composability (secondary)
In the past quarter (Jan-Mar 2022), our focus has continued on developing resilient control structures that use learning mechanisms. This work includes learning-based system identification methods, an analysis of how cost function design affects the computation needed to obtain stabilizing controllers, and Lyapunov density models, which merge training-data density models with Lyapunov techniques.
Publications
[1] T. Westenbroek, A. Siththaranjan, M. Sarwari, C. Tomlin, and S. Sastry, "On the Computational Consequences of Cost Function Design in Nonlinear Optimal Control," submitted to the IEEE Conference on Decision and Control (CDC), March 2022.
[2] S. A. Deka, A. M. Valle, and C. J. Tomlin, "Koopman-based Neural Lyapunov Functions for General Attractors," submitted to the IEEE Conference on Decision and Control (CDC), March 2022.
[3] K. Kang, P. Gradu, J. Choi, C. Tomlin, and S. Levine, "Lyapunov Density Models: Constraining Distribution Shift in Learning-Based Control," submitted to ICML, 2022.
Key Highlights
Optimal control is an essential tool for stabilizing complex nonlinear systems. However, despite the extensive impact of methods such as receding horizon control, dynamic programming, and reinforcement learning, the design of cost functions for a particular system often remains a heuristic-driven process of trial and error. In this work we seek to gain insight into how the choice of cost function interacts with the underlying structure of the control system and impacts the amount of computation required to obtain a stabilizing controller. We treat cost design as a two-step process in which the designer first specifies outputs of the system to be penalized and then modulates the relative weighting of the inputs and the outputs in the cost. We then bound the length of the prediction horizon T > 0 required for receding horizon control methods to stabilize the system, as a concrete way of characterizing the computational difficulty of stabilizing the system using the chosen cost function. Drawing on insights from the ‘cheap control’ literature, we investigate cases where the chosen outputs lead to minimum phase and non-minimum phase input-output dynamics. When the system is minimum phase, the prediction horizon needed to ensure stability can be made arbitrarily small by making the penalty on the control small enough. This indicates that cost functions which implicitly induce minimum phase behavior lead to an optimal control problem from which it is ‘easy’ to obtain a stabilizing controller. Using these insights, we investigate empirically how the choice of cost function affects the ability of modern reinforcement learning algorithms to learn a stabilizing controller. Taken together, the results in this paper indicate that cost functions which induce non-minimum phase behavior lead to inherent computational difficulties.
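To make the interaction between cost weighting and horizon length concrete, the sketch below is an illustrative toy study, not the paper's analysis or experiments: it assumes a discrete-time double integrator, two hypothetical output choices (one minimum phase, one with a zero outside the unit circle), and a terminal cost equal to the stage cost, then reports the shortest receding-horizon length whose closed loop A + B K_0 is stable for each control penalty r.

```python
# Toy sketch (assumed system and weights): for each penalized output and each
# control penalty r, find the shortest finite-horizon LQR / receding-horizon
# length T whose closed loop A + B*K_0 has spectral radius < 1.
import numpy as np

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])        # hypothetical double integrator
B = np.array([[0.0], [dt]])
outputs = {
    "y = x1 (minimum phase)": np.array([[1.0, 0.0]]),
    "y = x1 - 0.5*x2 (non-minimum phase)": np.array([[1.0, -0.5]]),  # zero outside the unit circle
}

def first_rhc_gain(Q, r, T):
    """Backward Riccati recursion over horizon T; return the gain RHC applies at each step."""
    R = np.array([[r]])
    P, K = Q.copy(), np.zeros((1, 2))        # terminal cost Q_f = Q (a design choice)
    for _ in range(T):
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A + B @ K)
    return K

def min_stabilizing_horizon(Q, r, T_max=300):
    """Shortest horizon for which the receding-horizon closed loop is stable."""
    for T in range(1, T_max + 1):
        K = first_rhc_gain(Q, r, T)
        if max(abs(np.linalg.eigvals(A + B @ K))) < 1.0:
            return T
    return None                              # not stabilized within T_max

for name, C in outputs.items():
    Q = C.T @ C                              # state cost induced by the chosen output
    horizons = {r: min_stabilizing_horizon(Q, r) for r in (1e0, 1e-2, 1e-4, 1e-6)}
    print(name, horizons)
```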
Koopman spectral theory has grown over the past decade into a powerful tool for dynamical systems analysis and control. In this paper, we show how recent data-driven techniques for estimating Koopman-invariant subspaces with neural networks can be leveraged to extract Lyapunov certificates for the underlying system. We specifically focus on systems with limit cycles, beyond just an isolated equilibrium point, and use Koopman eigenfunctions to efficiently parameterize candidate Lyapunov functions to construct forward-invariant sets under some (unknown) attractor dynamics. Additionally, when the dynamics are polynomial and neural networks are replaced by polynomials as the choice of function approximator, one can further leverage Sum-of-Squares programs and/or nonlinear programs to yield provably correct Lyapunov certificates. In this polynomial case, our Koopman-based approach for constructing Lyapunov functions uses significantly fewer decision variables than directly formulating and solving a Sum-of-Squares optimization problem.
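As a concrete illustration of the parameterization, the sketch below is a toy EDMD example with an assumed polynomial system and dictionary, rather than the paper's neural-network pipeline or a limit-cycle attractor: it estimates approximate Koopman eigenfunctions from data and assembles a candidate Lyapunov function from those whose eigenvalues lie inside the unit circle. A genuine certificate would still require the Sum-of-Squares or nonlinear programming step described above.

```python
# Toy sketch (assumed system and dictionary, not the paper's method): fit an
# EDMD matrix K with Psi_Y ≈ Psi_X @ K, take its right eigenvectors as
# approximate Koopman eigenfunctions phi(x) = psi(x) @ v, and form the
# candidate V(x) = sum_i |phi_i(x)|^2 over contracting eigenfunctions.
import numpy as np

rng = np.random.default_rng(0)
a, b, c = 0.9, 0.5, 1.0

def step(x):                                   # hypothetical discrete-time dynamics
    return np.array([a * x[0], b * x[1] + c * x[0] ** 2])

def psi(x):                                    # polynomial dictionary of observables
    return np.array([x[0], x[1], x[0] ** 2])

# Snapshot pairs and the least-squares EDMD matrix.
X = rng.uniform(-1, 1, size=(500, 2))
Psi_X = np.array([psi(x) for x in X])
Psi_Y = np.array([psi(step(x)) for x in X])
K, *_ = np.linalg.lstsq(Psi_X, Psi_Y, rcond=None)

eigvals, V_right = np.linalg.eig(K)
stable = np.abs(eigvals) < 1.0                 # keep contracting eigenfunctions only

def lyap_candidate(x):
    phi = psi(x) @ V_right[:, stable]
    return float(np.sum(np.abs(phi) ** 2))

# Spot-check decrease along rollouts (a proof needs the SOS / NLP step from the paper).
for _ in range(5):
    x = rng.uniform(-1, 1, size=2)
    vals = []
    for _ in range(20):
        vals.append(lyap_candidate(x))
        x = step(x)
    print("monotone decrease:", all(v2 <= v1 + 1e-9 for v1, v2 in zip(vals, vals[1:])))
```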
Learned models and policies can generalize effectively when evaluated within the distribution of the training data, but can produce unpredictable and erroneous outputs on out-of-distribution inputs. In order to avoid distribution shift when deploying learning-based control algorithms, we seek a mechanism to constrain the agent to states and actions that resemble those that it was trained on. In control theory, Lyapunov stability and control-invariant sets allow us to make guarantees about controllers that stabilize the system around specific states, while in machine learning, density models allow us to estimate the training data distribution. Can we combine these two concepts, producing learning-based control algorithms that constrain the system to in-distribution states using only in-distribution actions? In this work, we propose to do this by combining concepts from Lyapunov stability and density estimation, introducing Lyapunov density models: a generalization of control Lyapunov functions and density models that provides guarantees on an agent's ability to stay in-distribution over its entire trajectory.
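The sketch below illustrates the constraining idea in a simplified, hypothetical form; it is not the Lyapunov density model construction itself, which provides trajectory-level guarantees rather than a greedy per-step filter. It assumes a Gaussian density model fit to training states, a stand-in one-step dynamics model, and an arbitrary log-density threshold, and simply filters candidate actions whose predicted next state would leave the estimated training distribution.

```python
# Simplified, assumed sketch of "stay in-distribution" action filtering:
# fit a density model on training states, then prefer actions whose predicted
# next state keeps the log-density above a floor.
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training data and one-step dynamics model (stand-ins for learned components).
train_states = rng.normal(0.0, 1.0, size=(1000, 2))
def predict_next(state, action):
    return state + 0.1 * action

# Simple Gaussian density model of the training-state distribution.
mu = train_states.mean(axis=0)
cov = np.cov(train_states.T) + 1e-6 * np.eye(2)
cov_inv = np.linalg.inv(cov)
log_norm = -0.5 * (2 * np.log(2 * np.pi) + np.log(np.linalg.det(cov)))
def log_density(x):
    d = x - mu
    return float(log_norm - 0.5 * d @ cov_inv @ d)

# Assumed threshold: the log-density two standard deviations out in each coordinate.
LOG_DENSITY_FLOOR = log_density(mu + 2.0 * np.sqrt(np.diag(cov)))

def constrained_action(state, candidate_actions):
    """Prefer candidates that clear the density floor; otherwise pick the densest option."""
    scored = [(log_density(predict_next(state, a)), a) for a in candidate_actions]
    feasible = [(s, a) for s, a in scored if s >= LOG_DENSITY_FLOOR]
    return max(feasible or scored, key=lambda sa: sa[0])[1]

state = np.array([1.5, -0.5])
candidates = [np.array(a) for a in ([1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0])]
print("chosen action:", constrained_action(state, candidates))
```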
Community Engagement
March 2020 to present: Shankar Sastry launched a new institute, the C3 Digital Transformation Institute (https://c3dti.ai), a partnership of Berkeley and UIUC (co-leads) with U Chicago, CMU, MIT, Princeton, Stanford, and the Royal Institute of Technology, to develop the science and technology of digital transformation. The private philanthropy that supports this institute builds substantially on Federal research support such as this SoS lablet. We have been furthering the SoS agenda in the workshops that this institute runs (see https://c3dti.ai/events/workshops). Recent workshops include one on Networks of Machine Learning, for Machine Learning, by Machine Learning (September 22–24, 2021) and a second on the Digital Transformation of the Built Environment (October 26 & 28, 2021).