Combining convolutional neural network and self-adaptive algorithm to defeat synthetic multi-digit text-based CAPTCHA

Submitted by grigby1 on Wed, 12/20/2017 - 1:03pm

Title	Combining convolutional neural network and self-adaptive algorithm to defeat synthetic multi-digit text-based CAPTCHA
Publication Type	Conference Paper
Year of Publication	2017
Authors	Wang, Y., Huang, Y., Zheng, W., Zhou, Z., Liu, D., Lu, M.
Conference Name	2017 IEEE International Conference on Industrial Technology (ICIT)
Keywords	add-on recognition part, Business, CAPTCHA, CAPTCHA segmentations, captchas, character recognition, character segmentation, China, clustering, Clustering algorithms, composability, convolution, convolutional neural network, data entry, Human Behavior, human beings, human factors, image segmentation, neural nets, Neural networks, Optical character recognition software, pubcrawl, reverse Turing test, Segmentation, self-adaptive algorithm, synthetic multidigit text-based CAPTCHA, text analysis, text-based scheme
Abstract	CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is commonly used to prevent automated bots from performing data entry. Although there are various kinds of CAPTCHAs, the text-based scheme is still the most widely applied, because it is one of the most convenient and user-friendly options for everyday users [1]. Segmentation differs across types of CAPTCHAs, which means that segmentation is one of the bottlenecks in solving them. Once the characters can be split accurately, the problem becomes much easier to solve. Unfortunately, the best way to divide them is still case by case; that is, there is no universal way to achieve it. In this paper, we present a novel algorithm that achieves state-of-the-art performance; moreover, we construct a new convolutional neural network as an add-on recognition part to stabilize the state-of-the-art performance of the whole CAPTCHA system. The CAPTCHA dataset we use is from the State Administration for Industry & Commerce of the People's Republic of China. This dataset contains 33 CAPTCHA entrances in total. In our experiments, we assume that each entrance is known. Results are provided showing that our algorithms work well on these CAPTCHAs.
DOI	10.1109/ICIT.2017.7915494
Citation Key	wang_combining_2017
