SaTC: CORE: Small: Towards Adversarially Robust Machine Learning

Project Details

Performance Period: Aug 15, 2018 - Jul 31, 2021
Institution(s): Massachusetts Institute of Technology
Award Number:
Machine learning has witnessed tremendous progress over the last decade and is rapidly becoming a critical part of key aspects of our lives, from health care and financial services to the way we evaluate job applications, commute to work, and consume media. This gives rise to a fundamental question: how will all these machine learning solutions fare when applied in real-world settings that are safety-sensitive or even security-critical? Will they be sufficiently reliable and resistant to malicious tampering? These concerns are hardly unjustified. In fact, the current state-of-the-art machine learning toolkit is tailored to optimize "average-case" performance and turns out to be catastrophically vulnerable to "worst-case" inputs and manipulation. A particularly prominent problem is the existence of so-called adversarial examples, i.e., inputs that are almost indistinguishable from natural data and yet cause the model to act incorrectly.

This project's overarching goal is to rethink machine learning methodology so that it delivers solutions that are robust and secure. Much of the focus will be on building a principled and holistic understanding of adversarial robustness, i.e., resistance to adversarial examples, as a phenomenon both in machine learning and in security. As part of this goal, the team will conduct community-building and outreach efforts, including hosting challenge competitions to test machine learning robustness, disseminating datasets and code, and developing course materials and research projects suitable for a variety of undergraduate and high school curricula.
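To make the notion of an adversarial example concrete, the sketch below perturbs an input with the fast gradient sign method, a standard construction from the literature. It is a minimal illustration rather than this project's own attack code; the PyTorch classifier, labels, and perturbation budget epsilon are assumed placeholders.

```python
# Minimal sketch: crafting an adversarial example with the fast gradient
# sign method (FGSM). The model, inputs, and epsilon are illustrative
# assumptions, not artifacts of this project.
import torch
import torch.nn as nn

def fgsm_example(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                 epsilon: float = 8 / 255) -> torch.Tensor:
    """Return a copy of x perturbed within an L-infinity ball of radius
    epsilon so as to increase the model's loss on the true labels y."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step in the direction that maximally increases the loss, then clamp
    # back to the valid pixel range so the input stays well-formed.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

For a trained image classifier, a per-pixel change of at most 8/255 is typically imperceptible to a human, yet an input produced this way can be enough to flip the predicted class.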

The planned work will pursue three main thrusts: exploring the complexity landscape of adversarial robustness, analyzing the power and limitations of adversarial training (the currently dominant approach to producing adversarially robust models), and designing classifiers whose adversarial robustness can be rigorously verified. The intent is not only to develop the theoretical foundations of the studied concepts but also to leverage them to engage with the practical aspects of the problem. More specifically, on the one hand, the project aims to establish formal threat models and characterize the fundamental limits of what security can and cannot be achieved under them. The directions to be explored span complexity analyses of adversarially robust generalization and model stealing, the design of new regularization and optimization techniques for these contexts, and the adaptation of existing, mostly continuous, methods to discrete domains. On the other hand, the plan is to use the developed techniques to deploy adversarially robust models and then validate their robustness via a mix of public security challenges and benchmarks as well as formal verification tools. Designing these verification tools is also part of the project and involves exploring methods for reducing the number of non-linearities used by deep learning models as well as applying convex-programming approaches.
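Adversarial training, the dominant approach mentioned above, is usually cast as a min-max problem: an inner loop searches for a worst-case perturbation of each training input, and the outer step updates the model on those perturbed inputs. The sketch below shows one common instantiation with a projected gradient descent (PGD) inner loop under an L-infinity threat model; the PyTorch model, optimizer, and hyperparameters are illustrative assumptions, not the project's actual training setup.

```python
# Minimal sketch of adversarial training as a min-max problem. The inner
# loop (PGD) approximately maximizes the loss over an L-infinity ball; the
# outer step minimizes the loss on the resulting worst-case inputs.
# All names and hyperparameters here are placeholder assumptions.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=8 / 255, step_size=2 / 255, steps=10):
    """Inner maximization: find a perturbation of x within the epsilon ball
    that approximately maximizes the classification loss."""
    delta = torch.zeros_like(x).uniform_(-epsilon, epsilon).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += step_size * delta.grad.sign()
            delta.clamp_(-epsilon, epsilon)            # project onto the ball
            delta.data = (x + delta).clamp(0, 1) - x   # keep inputs valid
        delta.grad.zero_()
    return delta.detach()

def adversarial_training_step(model, optimizer, x, y):
    """Outer minimization: update the model on worst-case perturbed inputs."""
    delta = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x + delta), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this formulation, the strength of the inner attack (epsilon, step size, number of steps) defines the threat model against which robustness is sought, which is why analyzing the power and limitations of this procedure, and verifying the robustness of the resulting models, are treated as separate research thrusts above.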