Mixed Initiative and Collaborative Learning in Adversarial Environments (July 2019)
PIs: Claire Tomlin (Lead), Shankar Sastry, Xenofon Koutsoukos, Janos Sztipanovits
HARD PROBLEM(S) ADDRESSED: Human Behavior (primary), Resilient Architectures (secondary), and Scalability and Composability (secondary)
Our primary work on the lablet has focused on the brittleness of machine learning (especially deep learning) algorithms when used for intrusion detection or for the detection of Advanced Persistent Threats. In a number of papers we have addressed the following concerns:
1. Step Size Matters in Deep Learning: We showed that the choice of step size is critical in determining whether deep learning algorithms converge to a fixed point or settle into limit cycles. This has implications for their use in intrusion detection.
2. Cross-Entropy Loss and Low-Rank Features Have Responsibility for Adversarial Examples: State-of-the-art neural networks (especially those trained with the cross-entropy loss) are vulnerable to adversarial examples; they can easily misclassify inputs that are imperceptibly different from their training and test data. We establish that the use of the cross-entropy loss function and the low-rank features of the training data are responsible for this misclassification.
PUBLICATIONS
1. K. Nar and S. Sastry, "Step Size Matters in Deep Learning", Proceedings of Advances in Neural Information Processing Systems (NeurIPS), December 2018.
2. K. Nar, O. Ocal, S. Sastry and K. Ramachandran, "Cross-Entropy Loss and Low-Rank Features Have Responsibility for Adversarial Examples", https://arxiv.org/abs/1901.08360, January 2019.
KEY HIGHLIGHTS
1. Step Size Matters in Deep Learning
Training a neural network with the gradient descent algorithm gives rise to a discrete-time nonlinear dynamical system. Consequently, behaviors that are typically observed in these systems emerge during training, such as convergence to an orbit but not to a fixed point or dependence of convergence on the initialization. To elucidate the effects of the step size on training of neural networks, we study the gradient descent algorithm as a discrete-time dynamical system, and by analyzing the Lyapunov stability of different solutions, we show the relationship between the step size of the algorithm and the solutions that can be obtained with this algorithm. The results provide an explanation for several phenomena observed in practice, including the deterioration in the training error with increased depth, the hardness of estimating linear mappings with large singular values, and the distinct performance of deep residual networks.
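To make the connection concrete, a minimal sketch is given below. It uses a toy two-parameter factorized model rather than the networks analyzed in the paper, and the target value, step sizes, and function names are illustrative choices, not the paper's code. Gradient descent on the product w1*w2 fitting a scalar target is already a discrete-time nonlinear dynamical system, and the step size decides whether the iterates settle at a minimizer or fall into an orbit.

```python
# Illustrative sketch (not the paper's code): gradient descent on a two-parameter
# factorized model w1*w2 fitting a scalar target c. The training iteration is a
# discrete-time nonlinear dynamical system; whether it converges to a minimizer
# or falls into an orbit depends on the step size.
import numpy as np

def train(step_size, c=4.0, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    w1, w2 = rng.normal(size=2)           # small random initialization
    losses = []
    for _ in range(iters):
        err = w1 * w2 - c                 # residual of the estimated mapping
        losses.append(0.5 * err ** 2)
        # simultaneous gradient step on 0.5 * err**2 with respect to w1 and w2
        w1, w2 = w1 - step_size * err * w2, w2 - step_size * err * w1
    return losses

for lr in (0.05, 0.3):
    tail = np.round(train(lr)[-4:], 3)
    print(f"step size {lr}: last losses {tail}")
# With step size 0.05 the loss settles at (near) zero: the minimizer is a stable
# fixed point of the induced map. With step size 0.3 no minimizer is stable for
# this target, and the loss keeps oscillating instead of settling -- an orbit
# rather than a fixed point.
```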
2. Cross-Entropy Loss and Low-Rank Features Have Responsibility for Adversarial Examples
State-of-the-art neural networks are vulnerable to adversarial examples; they can easily misclassify inputs that are imperceptibly different from their training and test data. In this work, we establish that the use of the cross-entropy loss function and the low-rank features of the training data are responsible for the existence of these inputs. Based on this observation, we suggest that addressing adversarial examples requires rethinking the use of the cross-entropy loss function and looking for an alternative that is better suited to minimization with low-rank features. In this direction, we present a training scheme called differential training, which uses a loss function defined on the differences between the features of points from opposite classes. We show that differential training can ensure a large margin between the decision boundary of the neural network and the points in the training dataset. This larger margin increases the amount of perturbation needed to flip the prediction of the classifier and makes it harder to find an adversarial example with small perturbations. We test differential training on a binary classification task with the CIFAR-10 dataset and demonstrate that it radically reduces the fraction of images for which an adversarial example can be found -- not only in the training dataset, but in the test dataset as well.
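As an illustration of the idea only: the sketch below assumes a linear classifier on synthetic two-dimensional data, rather than the networks and CIFAR-10 setup studied in the paper, and places a hinge-style loss on score differences between pairs of points from opposite classes so that every such pair must clear a common threshold. The data, learning rate, and variable names are assumptions made for the example, not the authors' implementation.

```python
# Illustrative sketch (not the authors' implementation) of differential training for
# a linear binary classifier: the loss is defined on score differences between pairs
# of points from opposite classes, pushing every pair past a common threshold.
import numpy as np

rng = np.random.default_rng(1)
pos = rng.normal(loc=[2.0, 2.0], scale=0.4, size=(50, 2))    # synthetic class +1
neg = rng.normal(loc=[0.0, 0.0], scale=0.4, size=(50, 2))    # synthetic class -1

# all pairwise differences x_plus - x_minus between opposite-class points
diffs = (pos[:, None, :] - neg[None, :, :]).reshape(-1, 2)

w = np.zeros(2)
lr = 0.1
for _ in range(500):
    scores = diffs @ w                       # f(x+) - f(x-) for the linear model f(x) = w @ x
    violating = scores < 1.0                 # hinge-style: every difference score should exceed 1
    if not violating.any():
        break
    w += lr * diffs[violating].mean(axis=0)  # gradient step on the violated pairs

# place the bias midway between the closest points of the two classes
b = -0.5 * ((pos @ w).min() + (neg @ w).max())
margin = ((pos @ w).min() - (neg @ w).max()) / (2 * np.linalg.norm(w))
print("direction:", np.round(w, 3), "bias:", round(b, 3), "geometric margin:", round(margin, 3))
# Because the loss only sees differences between opposite-class points, the learned
# direction is dictated by the closest such pairs, which yields a boundary with a
# margin on both sides rather than one fit to individual examples.
```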
COMMUNITY ENGAGEMENTS
Our results have drawn the ire of large segments of the machine learning community because they demonstrate relatively simple ways in which deep learning is brittle.
EDUCATIONAL ADVANCES
We need to rethink how machine learning is taught, in particular the relationship between the low-rank features of the data and the behavior of machine learning algorithms. We are planning to teach a class on this topic in Fall 2019 with Professor Yi Ma.