A Generative Adversarial Learning Framework for Breaking Text-Based CAPTCHA in the Dark Web

Submitted by grigby1 on Fri, 01/15/2021 - 12:23pm

Title	A Generative Adversarial Learning Framework for Breaking Text-Based CAPTCHA in the Dark Web
Publication Type	Conference Paper
Year of Publication	2020
Authors	Zhang, N., Ebrahimi, M., Li, W., Chen, H.
Conference Name	2020 IEEE International Conference on Intelligence and Security Informatics (ISI)
Date Published	Nov. 2020
Publisher	IEEE
ISBN Number	978-1-7281-8800-3
Keywords	automated CAPTCHA breaking, captchas, cyber threat intelligence, dark web, Generative Adversarial Learning, generative adversarial networks, Human Behavior, human factors, Predictive Metrics, pubcrawl, Resiliency, Scalability
Abstract	Cyber threat intelligence (CTI) requires automated, large-scale monitoring of dark web platforms (e.g., Dark Net Markets and carding shops). While methods exist for collecting data from the surface web, large-scale dark web data collection is commonly hindered by anti-crawling measures, of which text-based CAPTCHA is the most prohibitive. Text-based CAPTCHA requires the user to recognize a combination of hard-to-read characters. Dark web CAPTCHA patterns are intentionally designed with additional background noise and variable character length to prevent automated CAPTCHA breaking. Existing CAPTCHA breaking methods cannot address these challenges and are therefore not applicable to the dark web. In this study, we propose a novel framework for breaking text-based CAPTCHA in the dark web. The proposed framework uses a Generative Adversarial Network (GAN) to counteract dark web-specific background noise and leverages an enhanced character segmentation algorithm. We evaluated the proposed method on both benchmark and dark web CAPTCHA testbeds. It significantly outperformed state-of-the-art baseline methods on all datasets, achieving a success rate of over 92.08% on the dark web testbeds. Our research enables the CTI community to develop advanced capabilities for large-scale dark web monitoring.
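The abstract mentions character segmentation as a way to cope with variable character length, but it does not describe the enhanced algorithm itself. As a rough illustration only, the sketch below shows a generic, textbook baseline (vertical projection segmentation) on an already denoised, binarized image; it is not the authors' method. The function name `segment_columns`, the `min_width` parameter, and the list-of-lists image representation are hypothetical choices made for this example. Because each returned span is treated as one character, the number of spans gives a variable character count. Plain projection cannot split touching or overlapping glyphs, which is one reason a more elaborate segmentation step may be needed.

```python
from typing import List, Optional, Tuple


def segment_columns(binary: List[List[int]], min_width: int = 1) -> List[Tuple[int, int]]:
    """Hypothetical baseline: split a binarized CAPTCHA into character spans.

    `binary` is a list of rows, with 1 = foreground (ink) and 0 = background,
    and is assumed to be already denoised. Returns (start, end) column ranges,
    end exclusive. Runs narrower than `min_width` are dropped as residual noise.
    """
    if not binary:
        return []
    width = len(binary[0])
    # Vertical projection: count foreground pixels in each column.
    profile = [sum(row[c] for row in binary) for c in range(width)]

    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for c, count in enumerate(profile):
        if count > 0 and start is None:
            start = c  # entering a run of inked columns
        elif count == 0 and start is not None:
            if c - start >= min_width:
                spans.append((start, c))  # close the run at an empty column
            start = None
    # Close a run that reaches the right edge of the image.
    if start is not None and width - start >= min_width:
        spans.append((start, width))
    return spans


# Toy 3x7 image with two glyphs separated by empty columns.
image = [
    [1, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 1, 1, 1],
]
print(segment_columns(image))  # [(0, 2), (4, 7)]
```

Here `len(segment_columns(image))` serves as the predicted character count. Each span would then be cropped and passed to a per-character classifier, a step this sketch leaves out.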
URL	https://ieeexplore.ieee.org/document/9280537
DOI	10.1109/ISI49825.2020.9280537
Citation Key	zhang_generative_2020

Groups:

Science of Security VO