Black Box Attacks on Explainable Artificial Intelligence (XAI) Methods in Cyber Security
Title | Black Box Attacks on Explainable Artificial Intelligence (XAI) Methods in Cyber Security |
Publication Type | Conference Paper |
Year of Publication | 2020 |
Authors | Kuppa, A., Le-Khac, N.-A. |
Conference Name | 2020 International Joint Conference on Neural Networks (IJCNN) |
Keywords | adversarial attack, Analytical models, artificial intelligence, artificial intelligence security, binary output, black box attack, Black Box Attacks, black box encryption, black box settings, black-box models, composability, computer security, cyber security, cybersecurity domain, Data analysis, Data models, Deep Learning, domain experts, exact properties, explainable artificial intelligence, explainable artificial intelligence methods, gradient-based XAI, learning (artificial intelligence), Metrics, ML models, Predictive models, predictive security metrics, privacy, pubcrawl, Resiliency, Robustness, Scalability, security, security domain, security of data, security-relevant data-sets, threat models, white box, White Box Security, white box setting, xai, XAI methods |
Abstract | The cybersecurity community is slowly leveraging Machine Learning (ML) to combat ever-evolving threats. One of the biggest drivers for the successful adoption of these models is how well domain experts and users are able to understand and trust their functionality. As these black-box models are being employed to make important predictions, the demand from stakeholders for transparency and explainability is increasing. Explanations supporting the output of ML models are crucial in cyber security, where experts require far more information from the model than a simple binary output for their analysis. Recent approaches in the literature have focused on three different areas: (a) creating and improving explainability methods that help users better understand the internal workings of ML models and their outputs; (b) attacks on interpreters in the white-box setting; and (c) defining the exact properties and metrics of the explanations generated by models. However, they have not covered the security properties and threat models relevant to the cybersecurity domain, nor attacks on explainable models in black-box settings. In this paper, we bridge this gap by proposing a taxonomy for Explainable Artificial Intelligence (XAI) methods that covers the security properties and threat models relevant to the cyber security domain. We design a novel black-box attack for analyzing the consistency, correctness, and confidence security properties of gradient-based XAI methods. We validate our proposed system on three security-relevant data-sets and models, and demonstrate that the method achieves the attacker's goal of misleading both the classifier and the explanation report, or only the explainability method without affecting the classifier output. Our evaluation of the proposed approach shows promising results that can help in designing secure and robust XAI methods. |
DOI | 10.1109/IJCNN48605.2020.9206780 |
Citation Key | kuppa_black_2020 |
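As context for the abstract above, the following is a minimal sketch (assuming a PyTorch classifier) of what a gradient-based XAI method computes, here vanilla gradient saliency, and of the attacker's goal the paper describes: shifting the explanation while preserving the classifier's output. The toy model, the random perturbation, and the cosine-distance metric are illustrative placeholders, not the authors' attack or their consistency/correctness/confidence metrics.

```python
# Sketch only: gradient saliency plus a check that a perturbed input keeps
# the predicted label while changing the explanation. Not the paper's method.
import torch
import torch.nn as nn

def gradient_saliency(model: nn.Module, x: torch.Tensor, target: int) -> torch.Tensor:
    """Vanilla gradient saliency: |d(score_target)/dx| per input feature."""
    x = x.clone().detach().requires_grad_(True)
    score = model(x)[0, target]
    score.backward()
    return x.grad.detach().abs()

def explanation_shift(model: nn.Module, x: torch.Tensor, x_adv: torch.Tensor):
    """Attacker's goal from the abstract: same predicted class,
    but a measurably different explanation."""
    y = model(x).argmax(dim=1).item()
    y_adv = model(x_adv).argmax(dim=1).item()
    s = gradient_saliency(model, x, y)
    s_adv = gradient_saliency(model, x_adv, y_adv)
    # Cosine distance between flattened saliency maps as a simple
    # stand-in for a real explanation-similarity metric.
    cos = torch.nn.functional.cosine_similarity(s.flatten(), s_adv.flatten(), dim=0)
    return y == y_adv, 1.0 - cos.item()

# Usage with a toy classifier and a random perturbation (illustration only;
# a real black-box attack would craft x_adv from query feedback).
model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
x = torch.randn(1, 20)
x_adv = x + 0.05 * torch.randn_like(x)
same_label, shift = explanation_shift(model, x, x_adv)
print(f"label preserved: {same_label}, explanation shift: {shift:.3f}")
```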