Biblio

Filters: Keyword is image recognition
2021-07-02
Lehman, Sarah M., Alrumayh, Abrar S., Ling, Haibin, Tan, Chiu C..  2020.  Stealthy Privacy Attacks Against Mobile AR Apps. 2020 IEEE Conference on Communications and Network Security (CNS). :1–5.
The proliferation of mobile augmented reality applications and the toolkits to create them has serious implications for user privacy. In this paper, we explore how malicious AR app developers can leverage capabilities offered by commercially available AR libraries, and describe how edge computing can be used to address this privacy problem.
2021-06-24
Lee, Dongseop, Kim, Hyunjin, Ryou, Jaecheol.  2020.  Poisoning Attack on Show and Tell Model and Defense Using Autoencoder in Electric Factory. 2020 IEEE International Conference on Big Data and Smart Computing (BigComp). :538–541.
Recently, deep neural network technology has been developed and used in various fields. An image recognition model can be used for automatic safety checks at an electric factory. However, as deep neural networks develop, the importance of security increases. A poisoning attack is one such security problem: it degrades a model by inserting malicious data into its training data set. This paper generates adversarial data that shifts feature values toward different targets while manipulating fewer RGB values. It then mounts a poisoning attack on one image recognition model, the Show and Tell model, and uses an autoencoder to defend against the adversarial data.
2021-04-27
Marchisio, A., Nanfa, G., Khalid, F., Hanif, M. A., Martina, M., Shafique, M..  2020.  Is Spiking Secure? A Comparative Study on the Security Vulnerabilities of Spiking and Deep Neural Networks. 2020 International Joint Conference on Neural Networks (IJCNN). :1–8.
Spiking Neural Networks (SNNs) claim to present many advantages in terms of biological plausibility and energy efficiency compared to standard Deep Neural Networks (DNNs). Recent works have shown that DNNs are vulnerable to adversarial attacks, i.e., small perturbations added to the input data can lead to targeted or random misclassifications. In this paper, we aim at investigating the key research question: "Are SNNs secure?" Towards this, we perform a comparative study of the security vulnerabilities in SNNs and DNNs w.r.t. the adversarial noise. Afterwards, we propose a novel black-box attack methodology, i.e., without the knowledge of the internal structure of the SNN, which employs a greedy heuristic to automatically generate imperceptible and robust adversarial examples (i.e., attack images) for the given SNN. We perform an in-depth evaluation for a Spiking Deep Belief Network (SDBN) and a DNN having the same number of layers and neurons (to obtain a fair comparison), in order to study the efficiency of our methodology and to understand the differences between SNNs and DNNs w.r.t. the adversarial examples. Our work opens new avenues of research towards the robustness of the SNNs, considering their similarities to the human brain's functionality.
2021-03-29
Begaj, S., Topal, A. O., Ali, M..  2020.  Emotion Recognition Based on Facial Expressions Using Convolutional Neural Network (CNN). 2020 International Conference on Computing, Networking, Telecommunications Engineering Sciences Applications (CoNTESA). :58–63.

Over the last few years, there has been an increasing number of studies about facial emotion recognition because of its importance and impact on the interaction of humans with computers. With the growing number of challenging datasets, the application of deep learning techniques has become necessary. In this paper, we study the challenges of emotion recognition datasets and try different parameters and architectures of Convolutional Neural Networks (CNNs) in order to detect seven emotions in human faces: anger, fear, disgust, contempt, happiness, sadness and surprise. We have chosen iCV MEFED (Multi-Emotion Facial Expression Dataset) as the main dataset for our study, which is relatively new, interesting and very challenging.

Jia, C., Li, C. L., Ying, Z..  2020.  Facial expression recognition based on the ensemble learning of CNNs. 2020 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC). :1–5.

As a part of body language, facial expression reflects the current emotional state of the person. Recognition of facial expressions can help to understand others and enhance communication with others. We propose a facial expression recognition method based on convolutional neural network ensemble learning in this paper. Our model is composed of three sub-networks, and uses an SVM classifier to integrate the output of the three networks to get the final result. The model's expression recognition accuracy on the FER2013 dataset reached 71.27%. The results show that the method has high test accuracy and short prediction time, and can realize real-time, high-performance facial recognition.

Xu, X., Ruan, Z., Yang, L..  2020.  Facial Expression Recognition Based on Graph Neural Network. 2020 IEEE 5th International Conference on Image, Vision and Computing (ICIVC). :211–214.

Facial expressions are one of the most powerful, natural and immediate means for human beings to convey their emotions and intentions. In this paper, we present a novel method for fully automatic facial expression recognition. The facial landmarks are detected for characterizing facial expressions. A graph convolutional neural network is proposed for feature extraction and facial expression classification. The experiments were performed on three facial expression databases. The results show that the proposed FER method can achieve recognition accuracy of up to 95.85%.

2021-02-08
Wang, R., Li, L., Hong, W., Yang, N..  2009.  A THz Image Edge Detection Method Based on Wavelet and Neural Network. 2009 Ninth International Conference on Hybrid Intelligent Systems. 3:420–424.

A THz image edge detection approach based on wavelet and neural network is proposed in this paper. First, the source image is decomposed by wavelet. On the coarsest level of the wavelet decomposition, the edges in the low-frequency sub-image are detected using a neural network method and the edges in the high-frequency sub-images are detected using a wavelet transform method; the two edge images are fused according to some fusion rules to obtain the edge image of this level, which is then projected to the next level. Afterwards, the final edge image of level L-1 is obtained according to some fusion rule. This process is repeated until level 0 is reached, yielding the final integrated and clear edge image. The experimental results show that our fusion-based approach is superior to the Canny operator method and to the wavelet transform method alone.

2020-12-28
Raju, R. S., Lipasti, M..  2020.  BlurNet: Defense by Filtering the Feature Maps. 2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W). :38–46.

Recently, the field of adversarial machine learning has been garnering attention by showing that state-of-the-art deep neural networks are vulnerable to adversarial examples, stemming from small perturbations being added to the input image. Adversarial examples are generated by a malicious adversary by obtaining access to the model parameters, such as gradient information, to alter the input or by attacking a substitute model and transferring those malicious examples over to attack the victim model. Specifically, one of these attack algorithms, Robust Physical Perturbations (RP2), generates adversarial images of stop signs with black and white stickers to achieve high targeted misclassification rates against standard-architecture traffic sign classifiers. In this paper, we propose BlurNet, a defense against the RP2 attack. First, we motivate the defense with a frequency analysis of the first layer feature maps of the network on the LISA dataset, which shows that high frequency noise is introduced into the input image by the RP2 algorithm. To remove the high frequency noise, we introduce a depthwise convolution layer of standard blur kernels after the first layer. We perform a blackbox transfer attack to show that low-pass filtering the feature maps is more beneficial than filtering the input. We then present various regularization schemes to incorporate this lowpass filtering behavior into the training regime of the network and perform white-box attacks. We conclude with an adaptive attack evaluation to show that the success rate of the attack drops from 90% to 20% with total variation regularization, one of the proposed defenses.
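
The low-pass filtering step in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a 3x3 box blur as the "standard blur kernel", zero padding at the borders, and plain nested lists as feature maps. The names `blur_channel`, `depthwise_blur`, and `BOX_3X3` are hypothetical. "Depthwise" here means that each channel is filtered independently with the same kernel, with no mixing across channels.

```python
# Illustrative sketch only: depthwise blur of first-layer feature maps.
# Assumptions (not from the paper): 3x3 box kernel, zero padding, list-based maps.

BOX_3X3 = [[1.0 / 9.0] * 3 for _ in range(3)]  # hypothetical choice of blur kernel


def blur_channel(channel, kernel):
    """Filter one 2D feature map with a small square kernel (zero-padded border).

    The kernel is applied without flipping; for a symmetric blur kernel this
    is identical to a true convolution.
    """
    h, w = len(channel), len(channel[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - r, x + dx - r
                    if 0 <= yy < h and 0 <= xx < w:  # outside pixels count as 0
                        acc += kernel[dy][dx] * channel[yy][xx]
            out[y][x] = acc
    return out


def depthwise_blur(feature_maps, kernel=BOX_3X3):
    """Apply the same blur to every channel independently (no channel mixing)."""
    return [blur_channel(channel, kernel) for channel in feature_maps]
```

An isolated spike in a feature map, the kind of high-frequency content the abstract attributes to the RP2 perturbation, is spread over its neighbourhood and reduced in amplitude by this filter.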

2020-12-07
Li, Y., Zhang, T., Han, X., Qi, Y..  2018.  Image Style Transfer in Deep Learning Networks. 2018 5th International Conference on Systems and Informatics (ICSAI). :660–664.

Since Gatys et al. showed that a convolutional neural network (CNN) can be used to generate new images with artistic styles by separating and recombining the styles and contents of images, Neural Style Transfer has attracted wide attention from computer vision researchers. This paper provides an overview of the development of deep learning networks for style transfer and introduces the classical style transfer models. Based on collecting and organizing the research on style transfer in deep learning networks, it puts forward solutions to problems gathered during the investigation. Finally, some classical models are applied to image style transfer, and their results are displayed and compared.

2020-11-04
Khalid, F., Hanif, M. A., Rehman, S., Ahmed, R., Shafique, M..  2019.  TrISec: Training Data-Unaware Imperceptible Security Attacks on Deep Neural Networks. 2019 IEEE 25th International Symposium on On-Line Testing and Robust System Design (IOLTS). :188–193.

Most of the data manipulation attacks on deep neural networks (DNNs) during the training stage introduce a perceptible noise that can be catered for by preprocessing during inference, or can be identified during the validation phase. Therefore, data poisoning attacks during inference (e.g., adversarial attacks) are becoming more popular. However, many of them do not consider the imperceptibility factor in their optimization algorithms, and can be detected by correlation and structural similarity analysis, or are noticeable (e.g., by humans) in multi-level security systems. Moreover, the majority of inference attacks rely on some knowledge about the training dataset. In this paper, we propose a novel methodology which automatically generates imperceptible attack images by using the back-propagation algorithm on pre-trained DNNs, without requiring any information about the training dataset (i.e., completely training data-unaware). We present a case study on traffic sign detection using the VGGNet trained on the German Traffic Sign Recognition Benchmarks dataset in an autonomous driving use case. Our results demonstrate that the generated attack images successfully perform misclassification while remaining imperceptible in both “subjective” and “objective” quality tests.

2020-10-26
Clincy, Victor, Shahriar, Hossain.  2019.  IoT Malware Analysis. 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC). 1:920–921.
IoT devices can be used to fulfil many of our daily tasks. IoT could be wearable devices, home appliances, or even light bulbs. With the introduction of this new technology, however, vulnerabilities are being introduced and can be leveraged or exploited by malicious users. One common vehicle of exploitation is malicious software, or malware. Malware can be extremely harmful and compromise the confidentiality, integrity and availability (CIA triad) of information systems. This paper analyzes the types of malware attacks, introduces some mitigation approaches and discusses future challenges.
2020-10-05
Cruz, Rodrigo Santa, Fernando, Basura, Cherian, Anoop, Gould, Stephen.  2018.  Neural Algebra of Classifiers. 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). :729–737.

The world is fundamentally compositional, so it is natural to think of visual recognition as the recognition of basic visual primitives that are composed according to well-defined rules. This strategy allows us to recognize unseen complex concepts from simple visual primitives. However, the current trend in visual recognition follows a data greedy approach where huge amounts of data are required to learn models for any desired visual concept. In this paper, we build on the compositionality principle and develop an "algebra" to compose classifiers for complex visual concepts. To this end, we learn neural network modules to perform boolean algebra operations on simple visual classifiers. Since these modules form a complete functional set, a classifier for any complex visual concept defined as a boolean expression of primitives can be obtained by recursively applying the learned modules, even if we do not have a single training sample. As our experiments show, using such a framework, we can compose classifiers for complex visual concepts outperforming standard baselines on two well-known visual recognition benchmarks. Finally, we present a qualitative analysis of our method and its properties.

2020-09-11
Azakami, Tomoka, Shibata, Chihiro, Uda, Ryuya, Kinoshita, Toshiyuki.  2019.  Creation of Adversarial Examples with Keeping High Visual Performance. 2019 IEEE 2nd International Conference on Information and Computer Technologies (ICICT). :52–56.
The accuracy of image classification by convolutional neural networks now exceeds human ability and contributes to various fields. However, improvements in image recognition technology deal a heavy blow to image-based security systems such as CAPTCHA. In particular, because character-string CAPTCHAs already add distortion and noise to keep computers from reading them, reduced human readability becomes a problem. Adversarial examples are images crafted to make a machine learning image classifier misclassify them intentionally. The key feature of this technique is that humans comparing the original image with the adversarial example cannot perceive the difference in appearance. However, adversarial examples created with conventional FGSM cannot completely fool strongly nonlinear networks such as CNNs. Osadchy et al. applied adversarial examples to CAPTCHA and attempted to make a CNN misclassify them, but could not make the CNN misclassify character images. In this research, we propose a method that applies FGSM to character-string CAPTCHAs and makes a CNN misclassify them.
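
For readers unfamiliar with FGSM (the fast gradient sign method), the following is a minimal sketch of the basic perturbation rule, x_adv = x + eps * sign(gradient of the loss with respect to x). It is not the method proposed in the paper above: as a loud simplification, the classifier is a tiny logistic-regression model instead of a CNN, so that the input gradient can be written in closed form. The function names and all numeric values are hypothetical.

```python
import math

# Illustrative sketch only: FGSM against a logistic-regression stand-in classifier.
# Assumption (not from the paper): the model is p = sigmoid(w . x + b), not a CNN.


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    """Return the predicted class (0 or 1) for input vector x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if sigmoid(z) >= 0.5 else 0


def fgsm(x, y, w, b, eps):
    """Perturb x by eps in the direction that increases the cross-entropy loss.

    For logistic regression the input gradient of the loss is (p - y) * w.
    Pixel values are clipped to the valid range [0, 1].
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad = [(p - y) * wi for wi in w]
    signs = [(g > 0) - (g < 0) for g in grad]
    return [min(1.0, max(0.0, xi + eps * s)) for xi, s in zip(x, signs)]
```

Each input component changes by at most `eps`, which is what keeps the perturbed image visually close to the original; a larger `eps` makes misclassification more likely but more visible.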
Shu, Yujin, Xu, Yongjin.  2019.  End-to-End Captcha Recognition Using Deep CNN-RNN Network. 2019 IEEE 3rd Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC). :54–58.
With the development of the Internet, captcha technology has also been widely used. Captcha technology is used to distinguish between humans and machines; the name stands for Completely Automated Public Turing test to tell Computers and Humans Apart. In this paper, an end-to-end deep CNN-RNN network model is constructed by studying captcha recognition technology, which realizes the recognition of 4-character text captchas. The CNN-RNN model first constructs a deep residual convolutional neural network based on the residual network structure to accurately extract the features of the input captcha picture. Then, through the constructed variant RNN network, that is, a two-layer GRU network, the deep internal features of the captcha are extracted, and finally, the output sequence is the 4-character captcha. The experimental results show that the end-to-end deep CNN-RNN network model performs well on different captcha datasets, achieving 99% accuracy. An experiment on a small dataset with only 4000 training samples also shows an accuracy of 72.9% and a certain generalization ability.
2020-08-10
Kwon, Hyun, Yoon, Hyunsoo, Park, Ki-Woong.  2019.  Selective Poisoning Attack on Deep Neural Network to Induce Fine-Grained Recognition Error. 2019 IEEE Second International Conference on Artificial Intelligence and Knowledge Engineering (AIKE). :136–139.

Deep neural networks (DNNs) provide good performance for image recognition, speech recognition, and pattern recognition. However, a poisoning attack is a serious threat to a DNN's security. The poisoning attack is a method to reduce the accuracy of a DNN by adding malicious training data during the DNN training process. In some situations, such as military applications, it may be necessary to reduce the accuracy of only a chosen class in the model. For example, an attacker who wants only nuclear facilities to go unrecognized may need to intentionally prevent a UAV from correctly recognizing nuclear-related facilities. In this paper, we propose a selective poisoning attack that reduces the accuracy of only a chosen class in the model. The proposed method reduces the accuracy of a chosen class in the model by training on malicious training data corresponding to the chosen class, while maintaining the accuracy of the remaining classes. For the experiments, we used TensorFlow as the machine learning library and MNIST and CIFAR10 as datasets. Experimental results show that the proposed method can reduce the accuracy of the chosen class to 43.2% and 55.3% in MNIST and CIFAR10, while maintaining the accuracy of the remaining classes.

2020-08-03
Xiong, Chen, Chen, Hua, Cai, Ming, Gao, Jing.  2019.  A Vehicle Trajectory Adversary Model Based on VLPR Data. 2019 5th International Conference on Transportation Information and Safety (ICTIS). :903–912.
Although transport agencies have employed desensitization techniques to deal with private information when publicizing vehicle license plate recognition (VLPR) data, adversaries can still eavesdrop on vehicle trajectories by certain means and further acquire the associated person and vehicle information through background knowledge. In this work, a privacy attack method that uses the desensitized VLPR data to link the vehicle trajectory is proposed. First, the road average speed is evaluated by analyzing the changes in traffic flow, and is used to estimate the vehicle's travel time to the next VLPR system. Then the vehicle suspicion list is constructed through the time relevance of neighboring VLPR systems. Finally, since vehicles may have the same features such as color, type, etc., the target trajectory is located by filtering the suspicion list by the rule of qualified identifier (QI) attributes and the closest time method. The method is tested on Foshan City's VLPR data, and the results show that the correct vehicle trajectory can be linked, which proves that the current way of publishing VLPR data carries a risk of privacy disclosure. Finally, the effects of related parameters on the proposed method are discussed and effective suggestions are made for publicizing VLPR data in the future.
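
The three linking steps in this abstract (travel-time estimate, suspicion list, QI filtering with closest time) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' algorithm: the record fields (`time`, `color`, `type`), the fixed time tolerance, and the function name `link_next_sighting` are all hypothetical, and the average speed is passed in directly instead of being derived from traffic flow.

```python
# Illustrative sketch only: link a target vehicle to its sighting at the next
# VLPR checkpoint. Field names and the tolerance parameter are assumptions.


def link_next_sighting(target, sightings, distance_m, avg_speed_mps, tolerance_s):
    """Return the most plausible next sighting of `target`, or None.

    target / sightings: dicts with 'time' (seconds), 'color', and 'type'.
    """
    # Step 1: estimate the arrival time from the road's average speed.
    expected = target["time"] + distance_m / avg_speed_mps

    # Step 2: suspicion list = sightings close enough to the expected arrival.
    suspects = [s for s in sightings if abs(s["time"] - expected) <= tolerance_s]

    # Step 3: keep suspects whose quasi-identifier attributes match the target,
    # then pick the one whose timestamp is closest to the expected arrival.
    matches = [
        s for s in suspects
        if s["color"] == target["color"] and s["type"] == target["type"]
    ]
    return min(matches, key=lambda s: abs(s["time"] - expected), default=None)
```

Repeating this step checkpoint by checkpoint would chain the individual links into a trajectory, which is the privacy risk the abstract describes.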
Iula, Antonio, Micucci, Monica.  2019.  Palmprint recognition based on ultrasound imaging. 2019 42nd International Conference on Telecommunications and Signal Processing (TSP). :621–624.
Biometric recognition systems based on ultrasound images have been investigated for several decades, and nowadays ultrasonic fingerprint sensors are fully integrated in portable devices. The main advantage of ultrasound over other technologies is the possibility of collecting 3D images, which provide information on under-skin features that improves recognition accuracy and resistance to spoofing. Also, ultrasound images are not sensitive to several kinds of skin contamination, humidity, or non-uniform ambient illumination. An ultrasound system able to acquire 3D images of the human palm has been recently proposed. In this work, a recognition procedure based on 2D palmprint images collected with this system is proposed and evaluated through verification experiments carried out on a homemade database composed of 141 samples collected from 24 users. Prospects for upgrading the recognition procedure to provide a 3D template able to account for the depth of palm lines are finally highlighted and discussed.
2020-07-30
Perez, Claudio A., Estévez, Pablo A, Galdames, Francisco J., Schulz, Daniel A., Perez, Juan P., Bastías, Diego, Vilar, Daniel R..  2018.  Trademark Image Retrieval Using a Combination of Deep Convolutional Neural Networks. 2018 International Joint Conference on Neural Networks (IJCNN). :1–7.
Trademarks are recognizable images and/or words used to distinguish various products or services. They become associated with the reputation, innovation, quality, and warranty of the products. Countries around the world have offices for industrial/intellectual property (IP) registration. A new trademark image in application for registration should be distinct from all the registered trademarks. Due to the volume of trademark registration applications and the size of the databases containing existing trademarks, it is impossible for humans to make all the comparisons visually. Therefore, technological tools are essential for this task. In this work we use a pre-trained, publicly available Convolutional Neural Network (CNN) VGG19 that was trained on the ImageNet database. We adapted the VGG19 for the trademark image retrieval (TIR) task by fine tuning the network using two different databases. The VGG19v was trained with a database organized with trademark images using visual similarities, and the VGG19c was trained using trademarks organized by using conceptual similarities. The database for the VGG19v was built using trademarks downloaded from the WEB, and organized by visual similarity according to experts from the IP office. The database for the VGG19c was built using trademark images from the United States Patent and Trademarks Office and organized according to the Vienna conceptual protocol. The TIR was assessed using the normalized average rank for a test set from the METU database that has 922,926 trademark images. We computed the normalized average ranks for VGG19v, VGG19c, and for a combination of both networks. Our method achieved significantly better results on the METU database than those published previously.
2020-07-03
Dinama, Dima Maharika, A’yun, Qurrota, Syahroni, Achmad Dahlan, Adji Sulistijono, Indra, Risnumawan, Anhar.  2019.  Human Detection and Tracking on Surveillance Video Footage Using Convolutional Neural Networks. 2019 International Electronics Symposium (IES). :534–538.

Safety is a basic human need, so we need security systems that can prevent crime. Commonly, surveillance video is used to watch the environment and human behaviour at a location. However, surveillance video can only record images or videos, with no additional information. Therefore, a more advanced camera is needed to obtain additional information such as human position and movement. This research extracts that information from surveillance video footage using a human detection and tracking algorithm. The human detection framework is based on deep learning convolutional neural networks, a very popular branch of artificial intelligence. For tracking, a channel and spatial correlation filter is used to track detected humans. The system generates and exports the tracked movement in the footage as additional information, which can be analysed further in other research on surveillance video problems.

2020-06-22
Adesuyi, Tosin A., Kim, Byeong Man.  2019.  Preserving Privacy in Convolutional Neural Network: An ϵ-tuple Differential Privacy Approach. 2019 IEEE 2nd International Conference on Knowledge Innovation and Invention (ICKII). :570–573.
Recent breakthroughs in neural networks have led to the birth of the convolutional neural network (CNN), which has been found to be very efficient, especially in the areas of image recognition and classification. This success is traceable to the availability of large datasets and the CNN's capability to learn salient and complex data features, which subsequently produce a reusable output model (Fθ). The Fθ is often made available (e.g. on the cloud as-a-service) for others (clients) to train on their data or do transfer learning. However, an adversary can perpetrate a model inversion attack on the model Fθ to recover training data, hence compromising the sensitive data used to build the model. This is possible because a CNN, as a variant of deep neural network, memorizes most of its training data during learning. Consequently, this poses a privacy concern, especially when medical or financial data are used to build the model. Existing research that proffers privacy-preserving approaches, however, suffers from significant accuracy degradation, and this has left privacy-preserving models on the theoretical desk. In this paper, we propose an ϵ-tuple differential privacy approach that is based on neuron impact factor estimation to preserve the privacy of a CNN model without significant accuracy degradation. We evaluate our approach on two large datasets and the results show no significant accuracy degradation.
2020-06-19
Liu, Keng-Cheng, Hsu, Chen-Chien, Wang, Wei-Yen, Chiang, Hsin-Han.  2019.  Facial Expression Recognition Using Merged Convolution Neural Network. 2019 IEEE 8th Global Conference on Consumer Electronics (GCCE). :296–298.

In this paper, a merged convolution neural network (MCNN) is proposed to improve the accuracy and robustness of real-time facial expression recognition (FER). Although there are many ways to improve the performance of facial expression recognition, a revamp of the training framework and image preprocessing renders better results in applications. When the camera is capturing images at high speed, however, changes in image characteristics may occur at certain moments due to the influence of light and other factors. Such changes can result in incorrect recognition of human facial expression. To solve this problem, we propose a statistical method for recognition results obtained from previous images, instead of using the current recognition output. Experimental results show that the proposed method can satisfactorily recognize seven basic facial expressions in real time.

2020-01-21
Suksomboon, Kalika, Shen, Zhishu, Ueda, Kazuaki, Tagami, Atsushi.  2019.  C2P2: Content-Centric Privacy Platform for Privacy-Preserving Monitoring Services. 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC). 1:252–261.
Motivated by ubiquitous surveillance cameras in a smart city, a monitoring service can be provided to citizens. However, the rise of privacy concerns may disrupt this advanced service. Yet, the existing cloud-based services have not clearly proven that they can preserve Wth-privacy in which the relationship of three types of information, i.e., who requests the service, what the target is and where the camera is, does not leak. We address this problem by proposing a content-centric privacy platform (C2P2) that enables the construction of a Wth-privacy-preserving monitoring service without cloud dependency. C2P2 uses an image classification model of a target serving as the key to access the monitoring service specific to the target. In C2P2, communication is based on information-centric networking (ICN) that enables privacy preservation to be centered on the content itself rather than relying on a centralized system. Moreover, to preserve the privacy of bystanders, C2P2 separates the sensitive information (e.g., human faces) from the non-sensitive information (e.g., image background), while the privacy-aware forwarding strategies in C2P2 enable data aggregation and prevent privacy leakage resulting from false positive of image recognition. We evaluate the privacy leakage of C2P2 compared to that of the cloud-based system. The privacy analysis shows that, compared to the cloud-based system, C2P2 achieves a lower privacy loss ratio while reducing the communication cost significantly.
2019-12-30
Liu, Keng-Cheng, Hsu, Chen-Chien, Wang, Wei-Yen, Chiang, Hsin-Han.  2019.  Real-Time Facial Expression Recognition Based on CNN. 2019 International Conference on System Science and Engineering (ICSSE). :120–123.
In this paper, we propose a method for improving the robustness of real-time facial expression recognition. Although there are many ways to improve the accuracy of facial expression recognition, a revamp of the training framework and image preprocessing allows better results in applications. One existing problem is that when the camera is capturing images at high speed, changes in image characteristics may occur at certain moments due to the influence of light and other factors. Such changes can result in incorrect recognition of the human facial expression. To solve this problem while keeping system operation smooth and maintaining recognition speed, we take changes in image characteristics during high-speed capture into account. The proposed method does not use the immediate output for reference, but refers to the previous images for averaging to facilitate recognition. In this way, we are able to reduce interference from the characteristics of individual images. The experimental results show that after adopting this method, overall robustness and accuracy of facial expression recognition are greatly improved compared to those obtained by the convolution neural network (CNN) alone.
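
The averaging idea in this abstract can be sketched as a sliding window over recent per-frame class probabilities. This is only a plausible reading of "refers to the previous image for averaging", not the authors' code: the window length, the unweighted mean, and the class name `SmoothedPredictor` are assumptions made for illustration.

```python
from collections import deque

# Illustrative sketch only: smooth per-frame CNN outputs over recent frames.
# Assumptions (not from the paper): fixed-size window, unweighted mean.


class SmoothedPredictor:
    def __init__(self, window=5):
        # Keeps only the most recent `window` probability vectors.
        self.history = deque(maxlen=window)

    def update(self, probs):
        """Add one frame's class probabilities; return the smoothed class index."""
        self.history.append(list(probs))
        n = len(self.history)
        # Average each class's probability across the frames in the window.
        averaged = [sum(column) / n for column in zip(*self.history)]
        return max(range(len(averaged)), key=averaged.__getitem__)
```

With this scheme, a single frame disturbed by lighting does not flip the reported expression unless the disturbance persists for a good part of the window; the trade-off is a short delay before a genuine change of expression is reported.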
2019-04-01
Hu, Y., Chen, L., Cheng, J..  2018.  A CAPTCHA recognition technology based on deep learning. 2018 13th IEEE Conference on Industrial Electronics and Applications (ICIEA). :617–620.
Completely Automated Public Turing Test to Tell Computers and Humans Apart (CAPTCHA) is an important human-machine distinction technology that websites use to prevent automated malicious program attacks. CAPTCHA recognition studies can find security breaches in CAPTCHA and improve CAPTCHA technology; they can also promote the technologies of license plate recognition and handwriting recognition. This paper proposes a method based on a Convolutional Neural Network (CNN) model to identify CAPTCHAs and avoid traditional image processing techniques such as location and segmentation. An adaptive learning rate is introduced to accelerate the convergence of the model, and the problems of over-fitting and local optimal solutions are solved. A multi-task joint training model is used to improve the accuracy and generalization ability of model recognition. The experimental results show that the model has a good recognition effect on CAPTCHAs with background noise and character adhesion distortion.
Zhang, T., Zheng, H., Zhang, L..  2018.  Verification CAPTCHA Based on Deep Learning. 2018 37th Chinese Control Conference (CCC). :9056–9060.
At present, the captcha is widely used on the Internet. A method of captcha recognition using convolutional neural networks is introduced in this paper. For captchas that are easier to segment, a simply trained convolutional neural network model was applied, with a network structure imitating the VGGNet model, and the accuracy exceeds 90%. For captchas that are more difficult to segment, the end-to-end approach of training on the captcha as a whole can be used; in this way, the recognition rate reaches about 85%.