Detecting Cyber Threats in Non-English Hacker Forums: An Adversarial Cross-Lingual Knowledge Transfer Approach
Title | Detecting Cyber Threats in Non-English Hacker Forums: An Adversarial Cross-Lingual Knowledge Transfer Approach |
Publication Type | Conference Paper |
Year of Publication | 2020 |
Authors | Ebrahimi, M., Samtani, S., Chai, Y., Chen, H. |
Conference Name | 2020 IEEE Security and Privacy Workshops (SPW) |
Date Published | May 2020 |
Publisher | IEEE |
ISBN Number | 978-1-7281-9346-5 |
Keywords | adversarial learning, Computer hacking, cross-lingual knowledge transfer, dark web, Generative Adversarial Learning, generative adversarial networks, hacker forums, Human Behavior, human factors, Knowledge engineering, knowledge transfer, Long short-term memory, machine learning algorithms, Predictive Metrics, privacy, pubcrawl, Resiliency, Scalability, Semantics |
Abstract | The regularity of devastating cyber-attacks has made cybersecurity a grand societal challenge. Many cybersecurity professionals are closely examining the international Dark Web to proactively pinpoint potential cyber threats. Despite its potential, the Dark Web contains hundreds of thousands of non-English posts. While machine translation (MT) is the prevailing approach to processing non-English text, applying MT to hacker forum text results in mistranslations. In this study, we draw upon Long Short-Term Memory (LSTM), Cross-Lingual Knowledge Transfer (CLKT), and Generative Adversarial Networks (GANs) principles to design a novel Adversarial CLKT (A-CLKT) approach. A-CLKT operates on untranslated text to retain the original semantics of the language and leverages the collective knowledge about cyber threats across languages to create a language-invariant representation without any manual feature engineering or external resources. Three experiments demonstrate how A-CLKT outperforms state-of-the-art machine learning, deep learning, and CLKT algorithms in identifying cyber threats in French and Russian forums. |
URL | https://ieeexplore.ieee.org/document/9283883 |
DOI | 10.1109/SPW50608.2020.00021 |
Citation Key | ebrahimi_detecting_2020 |