Automated Threat Report Classification over Multi-Source Data
Title | Automated Threat Report Classification over Multi-Source Data |
Publication Type | Conference Paper |
Year of Publication | 2018 |
Authors | Ayoade, G., Chandra, S., Khan, L., Hamlen, K., Thuraisingham, B. |
Conference Name | 2018 IEEE 4th International Conference on Collaboration and Internet Computing (CIC) |
Date Published | October |
Keywords | advanced persistent threats, automated threat report classification, bias correction, business data processing, Collaboration, command and control systems, data mining, defense systems, document handling, enterprise system defenders, feature extraction, Human Behavior, learning (artificial intelligence), machine learning model, Metrics, multisource data, natural language processing, natural language processing techniques, NLP, Organizations, pattern classification, pubcrawl, Resiliency, Scalability, security, security of data, Standards organizations, Threat report, threat report documents, Training |
Abstract | With the increase in targeted attacks such as advanced persistent threats (APTs), enterprise system defenders require comprehensive frameworks that allow them to collaborate and to evaluate their defense systems against such attacks. MITRE has developed a framework that includes a database of the kill chains, tactics, techniques, and procedures that attackers employ to carry out these attacks. In this work, we leverage natural language processing techniques to extract attacker actions from threat report documents produced by different organizations and automatically classify them into standardized tactics and techniques, while providing relevant mitigation advisories for each attack. A naive approach is to train a machine learning model to predict labels that associate each report with its relevant categories. In practice, however, sufficient labeled data for training is not always readily available, so training and test data often come from different sources, which biases the training data relative to the test data. A naive model typically underperforms in this situation. We address this major challenge with an importance weighting scheme, called bias correction, that makes efficient use of the available labeled data by weighting it according to the threat reports whose categories are to be predicted automatically. We empirically evaluated our approach on 18,257 real-world threat reports produced between 2000 and 2018 by various computer security organizations, and demonstrated its superiority by comparing its performance with that of an existing approach. |
URL | https://ieeexplore.ieee.org/document/8537838 |
DOI | 10.1109/CIC.2018.00040 |
Citation Key | ayoade_automated_2018 |
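The abstract describes bias correction only as an importance weighting scheme; this record does not give its exact form. The sketch below illustrates the general idea under covariate shift: each labeled training report is weighted by an estimate of w(x) = p_test(x) / p_train(x), so that statistics computed on the training set approximate those of the test reports. It is not the authors' method. As an assumption for illustration, each report is reduced to one discrete feature (for example a hypothetical source bucket such as "vendor_A"), and the density ratio is estimated with smoothed frequency counts; practical systems typically estimate the ratio over full feature vectors with methods such as kernel mean matching. The function names, feature values, and labels are all hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Hashable, Sequence


def importance_weights(train_x: Sequence[Hashable],
                       test_x: Sequence[Hashable],
                       alpha: float = 1.0) -> list[float]:
    """Estimate w(x) = p_test(x) / p_train(x) for each training example.

    Toy histogram estimator (an assumption, not the paper's scheme): both
    densities are add-alpha smoothed frequencies over the union of
    feature values seen in either set.
    """
    if not train_x or not test_x:
        raise ValueError("need at least one training and one test example")
    vocab = set(train_x) | set(test_x)
    tr, te = Counter(train_x), Counter(test_x)
    z_tr = len(train_x) + alpha * len(vocab)  # smoothed training normalizer
    z_te = len(test_x) + alpha * len(vocab)   # smoothed test normalizer
    return [((te[x] + alpha) / z_te) / ((tr[x] + alpha) / z_tr)
            for x in train_x]


def weighted_label_distribution(labels: Sequence[Hashable],
                                weights: Sequence[float]) -> dict:
    """Self-normalized importance-weighted estimate of the test label prior."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    dist: dict = {}
    for y, w in zip(labels, weights):
        dist[y] = dist.get(y, 0.0) + w / total
    return dist


if __name__ == "__main__":
    # Hypothetical toy data: training reports mostly come from one source,
    # test reports mostly from another.
    train_x = ["vendor_A", "vendor_A", "vendor_A", "vendor_B"]
    test_x = ["vendor_B", "vendor_B", "vendor_B", "vendor_A"]
    train_y = ["tactic_1", "tactic_1", "tactic_1", "tactic_2"]

    w = importance_weights(train_x, test_x, alpha=0.0)
    print(w)  # ≈ [0.333, 0.333, 0.333, 3.0]
    print(weighted_label_distribution(train_y, w))
    # ≈ {'tactic_1': 0.25, 'tactic_2': 0.75}; unweighted counts give 0.75 / 0.25
```

The same weights can be passed to any learner that accepts per-example weights, so that its training loss is reweighted toward the test distribution; the label-prior estimate is used here only because it makes the effect of reweighting easy to see.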