Title | Fidelity: Towards Measuring the Trustworthiness of Neural Network Classification |
Publication Type | Conference Paper |
Year of Publication | 2019 |
Authors | Yang, Z. |
Conference Name | 2019 IEEE Conference on Dependable and Secure Computing (DSC) |
Keywords | adversarial attack detection, adversarial examples, adversarial settings, composability, Computational modeling, learning (artificial intelligence), machine learning, machine learning model, neural nets, neural network classification, neural network system, Neural networks, pattern classification, Perturbation methods, pubcrawl, security of data, security-critical tasks, Sociology, Statistics, Task Analysis, Trusted Computing, trustworthiness |
Abstract | With the increasing performance of neural networks on many security-critical tasks, the security concerns of machine learning have become increasingly prominent. Recent studies have shown that neural networks are vulnerable to adversarial examples: carefully crafted inputs, formed by adding negligible perturbations to legitimate samples, can mislead a neural network into producing adversary-selected outputs while humans still classify them correctly. Therefore, we need an additional measure of the trustworthiness of a machine learning model's results, especially in adversarial settings. In this paper, we analyse the root cause of adversarial examples and propose a new property of machine learning models, namely fidelity, to describe the gap between what a model learns and the ground truth learned by humans. One of its benefits is detecting adversarial attacks. We formally define fidelity and propose a novel approach to quantify it. We evaluate the quantification of fidelity in adversarial settings on two neural networks. The study shows that incorporating fidelity enables a neural network system to detect adversarial examples with a true positive rate of 97.7% and a false positive rate of 1.67% on one of the studied neural networks. |
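Note | The abstract does not specify how the adversarial examples were crafted or how fidelity is quantified, so the sketch below is only an illustration of the "negligible perturbation" attacks such a detector must flag, using the well-known FGSM method rather than the paper's own procedure; the model, tensors, and epsilon value are hypothetical placeholders. |

```python
# Illustrative sketch (not the paper's fidelity quantification): craft an FGSM
# adversarial example, i.e. a small signed-gradient perturbation of a legitimate
# input that can flip a classifier's prediction. Model and data are stand-ins.
import torch
import torch.nn as nn

def fgsm_example(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                 eps: float = 0.03) -> torch.Tensor:
    """Take one signed-gradient step of size eps to increase the loss on (x, y)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    # Perturb toward higher loss, then clamp back to the valid pixel range.
    return (x_adv + eps * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

if __name__ == "__main__":
    # Tiny stand-in classifier for 28x28 grayscale inputs (MNIST-sized images).
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    x = torch.rand(1, 1, 28, 28)   # a random "legitimate" sample
    y = torch.tensor([3])          # its assumed ground-truth label
    x_adv = fgsm_example(model, x, y)
    print("perturbation L_inf norm:", (x_adv - x).abs().max().item())
    print("clean prediction:", model(x).argmax(1).item(),
          "adversarial prediction:", model(x_adv).argmax(1).item())
```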
DOI | 10.1109/DSC47296.2019.8937572 |
Citation Key | yang_fidelity_2019 |