Biblio

Filters: Keyword is optical character recognition
Shao, Rulin, Shi, Zhouxing, Yi, Jinfeng, Chen, Pin-Yu, Hsieh, Cho-Jui.  2022.  Robust Text CAPTCHAs Using Adversarial Examples. 2022 IEEE International Conference on Big Data (Big Data). :1495–1504.
CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a widely used technology to distinguish real users from automated users such as bots. However, the advance of AI technologies weakens many CAPTCHA tests and can induce security concerns. In this paper, we propose a user-friendly text-based CAPTCHA generation method named Robust Text CAPTCHA (RTC). In the first stage, the foregrounds and backgrounds are constructed with font and background images respectively sampled from font and image libraries, and they are then synthesized into identifiable pseudo adversarial CAPTCHAs. In the second stage, we utilize a highly transferable adversarial attack designed for text CAPTCHAs to better obstruct CAPTCHA solvers. Our experiments cover comprehensive models including shallow models such as KNN, SVM and random forest, as well as various deep neural networks and OCR models. Experiments show that our CAPTCHAs have a failure rate lower than one millionth in general and high usability. They are also robust against various defensive techniques that attackers may employ, including adversarially trained CAPTCHA solvers and solvers trained with collected RTCs using manual annotation. Codes available at
Raut, Yash, Pote, Shreyash, Boricha, Harshank, Gunjgur, Prathmesh.  2022.  A Robust Captcha Scheme for Web Security. 2022 6th International Conference On Computing, Communication, Control And Automation (ICCUBEA). :1–6.
The internet has grown increasingly important in everyone's everyday lives due to the availability of numerous web services such as email, cloud storage, video streaming, music streaming, and search engines. On the other hand, attacks by computer programmes such as bots are a common hazard to these internet services. Captcha is a computer program that helps a server-side company determine whether or not a real user is requesting access. Captcha is a security feature that prevents unauthorised access to a user's account by protecting restricted areas from automated programmes, bots, or hackers. Many websites utilise Captcha to prevent spam and other hazardous assaults when visitors log in. However, in recent years, Captchas have become difficult for humans to solve too, making them less user friendly. To solve this, we propose creating a Captcha that is both simple and engaging for people while also robust enough to protect sensitive data from bots and hackers on the internet. The suggested captcha scheme employs animated artifacts, rotation, and variable fonts as resistance techniques. The proposed captcha technique proves successful against OCR bots, which achieve less than 15% accuracy, while remaining easy for human users, who solve it with more than 98% accuracy.
ISSN: 2771-1358
Banday, M. T., Sheikh, S. A..  2020.  Improving Security Control of Text-Based CAPTCHA Challenges using Honeypot and Timestamping. 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC). :704–708.

Two major concerns in designing CAPTCHA schemes are resistance to attacks aimed at breaking the challenges and usability, that is, the effectiveness, efficiency and satisfaction of human users in solving them. User-friendliness, universality, and accessibility are related dimensions of usability, which must also be addressed adequately. With recent advances in segmentation and optical character recognition techniques, complex distortions, degradations and transformations are added to text-based CAPTCHA challenges, resulting in reduced usability. The extent of these deformations can be decreased if an additional security mechanism is incorporated in such challenges. This paper proposes an additional security mechanism that can add an extra layer of protection to any text-based CAPTCHA challenge, making it more challenging for bots and scripts that might be used to attack websites and web applications. It proposes the use of hidden text boxes for user entry of the CAPTCHA string, which serve as honeypots for bots and automated scripts. The honeypot technique is used to trick bots and automated scripts into filling in input fields that legitimate human users cannot fill in. The paper reports an implementation of the honeypot technique and the results of tests carried out over three months, during which form submissions were logged for analysis. The results demonstrated great effectiveness of the honeypot technique in improving the security control and usability of text-based CAPTCHA challenges.
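The honeypot idea in this abstract can be illustrated with a minimal server-side sketch. It is not the authors' implementation: the field names, the 3-second threshold, and the timing check itself (suggested only by the word "Timestamping" in the title) are assumptions made for illustration.

```python
import time

# Hypothetical names and threshold; the abstract does not specify any of them.
HONEYPOT_FIELD = "captcha_confirm"  # text box hidden from humans (e.g. via CSS)
TIMESTAMP_FIELD = "form_issued_at"  # time the server rendered the form
MIN_FILL_SECONDS = 3.0              # assumed minimum time a human needs


def is_probably_bot(form, now=None):
    """Return True if a submitted form looks automated.

    `form` is a dict of submitted field names to string values.
    """
    now = time.time() if now is None else now

    # A human never sees the hidden field, so any value in it is suspicious.
    if str(form.get(HONEYPOT_FIELD) or "").strip():
        return True

    # Assumed timing check: reject missing, malformed, or too-recent timestamps.
    # A real deployment would also sign the timestamp so clients cannot forge it.
    try:
        issued_at = float(form.get(TIMESTAMP_FIELD))
    except (TypeError, ValueError):
        return True
    return (now - issued_at) < MIN_FILL_SECONDS
```

A submission that leaves the hidden field empty and arrives a plausible time after the form was issued passes both checks. It would then go on to the normal CAPTCHA verification.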

Dangiwa, Bello Ahmed, Kumar, Smitha S.  2018.  A Business Card Reader Application for iOS devices based on Tesseract. 2018 International Conference on Signal Processing and Information Security (ICSPIS). :1–4.
As high-resolution smartphone cameras have become more accessible and computational speed has improved, it is now convenient to build Business Card Readers on mobile phones. The project aims to design and develop a Business Card Reader (BCR) application for iOS devices using an open-source OCR engine, Tesseract. The system's accuracy was tested and evaluated using a dataset of 55 digital business cards obtained from an online repository. The system achieved an accuracy of up to 74% in both text recognition and data detection. A comparative analysis was carried out against a commercial business card reader application, and our application performed reasonably well.
Stein, G., Peng, Q..  2018.  Low-Cost Breaking of a Unique Chinese Language CAPTCHA Using Curriculum Learning and Clustering. 2018 IEEE International Conference on Electro/Information Technology (EIT). :0595–0600.

Text-based CAPTCHAs are still commonly used to attempt to prevent automated access to web services. By displaying an image of distorted text, they attempt to create a challenge image that OCR software cannot interpret correctly, but to which a human user can easily determine the correct response. This work focuses on a CAPTCHA used by a popular Chinese language question-and-answer website and how resilient it is to modern machine learning methods. While the majority of text-based CAPTCHAs focus on transcription tasks, the CAPTCHA solved in this work is based on localization of inverted symbols in a distorted image. A convolutional neural network (CNN) was created to evaluate the likelihood of a region in the image belonging to an inverted character. It is used with a feature map and clustering to identify potential locations of inverted characters. Training of the CNN was performed using curriculum learning and compared to other potential training methods. The proposed method was able to determine the correct response in 95.2% of cases of a simulated CAPTCHA and 67.6% on a set of real CAPTCHAs. Potential methods to increase difficulty of the CAPTCHA and the success rate of the automated solver are considered.
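The localization step, turning a map of per-region likelihoods into candidate character positions, can be sketched as follows. The abstract does not say which clustering algorithm was used, so this sketch substitutes a simple stand-in: it thresholds the map and groups 4-connected cells. The function name, the 0.5 threshold, and the list-of-lists input format are all assumptions.

```python
from collections import deque


def locate_clusters(score_map, threshold=0.5):
    """Group above-threshold cells into 4-connected clusters.

    `score_map` is a list of rows of likelihood scores (a stand-in for the
    CNN-derived feature map). Returns one (row, col) centroid per cluster,
    in row-major order of each cluster's first cell.
    """
    rows = len(score_map)
    cols = len(score_map[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    centroids = []

    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or score_map[r][c] < threshold:
                continue
            # Breadth-first search collects every cell of this cluster.
            queue = deque([(r, c)])
            seen[r][c] = True
            cells = []
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and score_map[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # The cluster centroid is the candidate character location.
            cy = sum(y for y, _ in cells) / len(cells)
            cx = sum(x for _, x in cells) / len(cells)
            centroids.append((cy, cx))
    return centroids
```

Each returned centroid is a candidate location for an inverted character. A real solver would map these grid coordinates back to pixel positions in the CAPTCHA image.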

Hassen, H., Khemakhem, M..  2014.  A secured distributed OCR system in a pervasive environment with authentication as a service in the Cloud. Multimedia Computing and Systems (ICMCS), 2014 International Conference on. :1200–1205.

In this paper we explore the potential for securing a distributed Arabic Optical Character Recognition (OCR) system via cloud computing technology in a pervasive and mobile environment. The goal of the system is to achieve full accuracy, high speed and security when taking into account large vocabularies and amounts of documents. This issue has been resolved by integrating the recognition process and the security issue with multiprocessing and distributed computing technologies.