Fidelity: Towards Measuring the Trustworthiness of Neural Network Classification

Title: Fidelity: Towards Measuring the Trustworthiness of Neural Network Classification
Publication Type: Conference Paper
Year of Publication: 2019
Authors: Yang, Z.
Conference Name: 2019 IEEE Conference on Dependable and Secure Computing (DSC)
Keywords: adversarial attack detection, adversarial examples, adversarial settings, composability, Computational modeling, learning (artificial intelligence), machine learning, machine learning model, neural nets, neural network classification, neural network system, Neural networks, pattern classification, Perturbation methods, pubcrawl, security of data, security-critical tasks, Sociology, Statistics, Task Analysis, Trusted Computing, trustworthiness
Abstract: With the increasing performance of neural networks on many security-critical tasks, the security concerns of machine learning have become increasingly prominent. Recent studies have shown that neural networks are vulnerable to adversarial examples: carefully crafted inputs with negligible perturbations on legitimate samples could mislead a neural network to produce adversary-selected outputs while humans can still correctly classify them. Therefore, we need an additional measurement on the trustworthiness of the results of a machine learning model, especially in adversarial settings. In this paper, we analyse the root cause of adversarial examples, and propose a new property, namely fidelity, of machine learning models to describe the gap between what a model learns and the ground truth learned by humans. One of its benefits is detecting adversarial attacks. We formally define fidelity, and propose a novel approach to quantify it. We evaluate the quantification of fidelity in adversarial settings on two neural networks. The study shows that involving the fidelity enables a neural network system to detect adversarial examples with true positive rate 97.7%, and false positive rate 1.67% on a studied neural network.
DOI: 10.1109/DSC47296.2019.8937572
Citation Key: yang_fidelity_2019