Visible to the public Biblio

Filters: Keyword is word processing  [Clear All Filters]
2021-03-18
Bi, X., Liu, X..  2020.  Chinese Character Captcha Sequential Selection System Based on Convolutional Neural Network. 2020 International Conference on Computer Vision, Image and Deep Learning (CVIDL). :554—559.

To ensure security, Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is widely used in people's online lives. This paper presents a Chinese character captcha sequential selection system based on convolutional neural network (CNN). Captchas composed of English and digits can already be identified with extremely high accuracy, but Chinese character captcha recognition is still challenging. The task we need to complete is to identify Chinese characters with different colors and different fonts that are not on a straight line with rotation and affine transformation on pictures with complex backgrounds, and then perform word order restoration on the identified Chinese characters. We divide the task into several sub-processes: Chinese character detection based on Faster R-CNN, Chinese character recognition and word order recovery based on N-Gram. In the Chinese character recognition sub-process, we have made outstanding contributions. We constructed a single Chinese character data set and built a 10-layer convolutional neural network. Eventually we achieved an accuracy of 98.43%, and completed the task perfectly.

2020-11-02
Pan, C., Huang, J., Gong, J., Yuan, X..  2019.  Few-Shot Transfer Learning for Text Classification With Lightweight Word Embedding Based Models. IEEE Access. 7:53296–53304.
Many deep learning architectures have been employed to model the semantic compositionality for text sequences, requiring a huge amount of supervised data for parameters training, making it unfeasible in situations where numerous annotated samples are not available or even do not exist. Different from data-hungry deep models, lightweight word embedding-based models could represent text sequences in a plug-and-play way due to their parameter-free property. In this paper, a modified hierarchical pooling strategy over pre-trained word embeddings is proposed for text classification in a few-shot transfer learning way. The model leverages and transfers knowledge obtained from some source domains to recognize and classify the unseen text sequences with just a handful of support examples in the target problem domain. The extensive experiments on five datasets including both English and Chinese text demonstrate that the simple word embedding-based models (SWEMs) with parameter-free pooling operations are able to abstract and represent the semantic text. The proposed modified hierarchical pooling method exhibits significant classification performance in the few-shot transfer learning tasks compared with other alternative methods.
2018-02-27
Bours, P., Brahmanpally, S..  2017.  Language Dependent Challenge-Based Keystroke Dynamics. 2017 International Carnahan Conference on Security Technology (ICCST). :1–6.

Keystroke Dynamics can be used as an unobtrusive method to enhance password authentication, by checking the typing rhythm of the user. Fixed passwords will give an attacker the possibility to try to learn to mimic the typing behaviour of a victim. In this paper we will investigate the performance of a keystroke dynamic (KD) system when the users have to type given (English) words. Under the assumption that it is easy to type words in your native language and difficult in a foreign language will we also test the performance of such a challenge-based KD system when the challenges are not common English words, but words in the native language of the user. We collected data from participants with 6 different native language backgrounds and had them type random 8-12 character words in each of the 6 languages. The participants also typed random English words and random French words. English was assumed to be a language familiar to all participants, while French was not a native language to any participant and most likely most participants were not fluent in French. Analysis showed that using language dependent words gave a better performance of the challenge-based KD compared to an all English challenge-based system. When using words in a native language, then the performance of the participants with their mother-tongue equal to that native language had a similar performance compared to the all English challenge-based system, but the non-native speakers had an FMR that was significantly lower than the native language speakers. We found that native Telugu speakers had an FMR of less than 1% when writing Spanish or Slovak words. We also found that duration features were best to recognize genuine users, but latency features performed best to recognize non-native impostor users.

2017-12-12
Zhou, G., Huang, J. X..  2017.  Modeling and Learning Distributed Word Representation with Metadata for Question Retrieval. IEEE Transactions on Knowledge and Data Engineering. 29:1226–1239.

Community question answering (cQA) has become an important issue due to the popularity of cQA archives on the Web. This paper focuses on addressing the lexical gap problem in question retrieval. Question retrieval in cQA archives aims to find the existing questions that are semantically equivalent or relevant to the queried questions. However, the lexical gap problem brings a new challenge for question retrieval in cQA. In this paper, we propose to model and learn distributed word representations with metadata of category information within cQA pages for question retrieval using two novel category powered models. One is a basic category powered model called MB-NET and the other one is an enhanced category powered model called ME-NET which can better learn the distributed word representations and alleviate the lexical gap problem. To deal with the variable size of word representation vectors, we employ the framework of fisher kernel to transform them into the fixed-length vectors. Experimental results on large-scale English and Chinese cQA data sets show that our proposed approaches can significantly outperform state-of-the-art retrieval models for question retrieval in cQA. Moreover, we further conduct our approaches on large-scale automatic evaluation experiments. The evaluation results show that promising and significant performance improvements can be achieved.

2015-05-05
Babour, A., Khan, J.I..  2014.  Tweet Sentiment Analytics with Context Sensitive Tone-Word Lexicon. Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on. 1:392-399.

In this paper we propose a twitter sentiment analytics that mines for opinion polarity about a given topic. Most of current semantic sentiment analytics depends on polarity lexicons. However, many key tone words are frequently bipolar. In this paper we demonstrate a technique which can accommodate the bipolarity of tone words by context sensitive tone lexicon learning mechanism where the context is modeled by the semantic neighborhood of the main target. Performance analysis shows that ability to contextualize the tone word polarity significantly improves the accuracy.