Search Prevention with Captcha Against Web Indexing: A Proof of Concept

Title: Search Prevention with Captcha Against Web Indexing: A Proof of Concept
Publication Type: Conference Paper
Year of Publication: 2019
Authors: Kim, Donghoon; Sample, Luke
Conference Name: 2019 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC)
Date Published: August
Keywords: CAPTCHA, captcha version, captchas, composability, containing sensitive words, Crawlers, Google, Google search engine, Human Behavior, indexing, information retrieval, Internet, malicious web crawlers, pubcrawl, search engine, search engine bot, search engine database, search engines, search prevention, search prevention algorithm, security, security of data, Web crawler, web index, web indexing, Web pages, web-based captcha conversion tool, webpages
Abstract: A website appears in search results because a search engine bot (e.g., a web crawler) has indexed it. Some webpages contain sensitive information, and their owners do not want them to be found easily. Several methods exist to keep web crawlers from adding a page to a search engine's database, but malicious web crawlers can still index such pages. In this study, we explore a paradoxical new use of captchas: search prevention. Sensitive words are converted to captchas so that web crawlers cannot index them. We implemented a web-based captcha conversion tool based on our search prevention algorithm, and we describe a proof of concept in which a web-based chat application is modified to use the algorithm. We evaluated the idea on the Google search engine with two versions of webpages, one containing plain text and the other containing sensitive words converted to captchas. The results show that Google's search engine cannot find the sensitive words on the captcha versions of the webpages, while it can find them on the plain-text versions.
DOI: 10.1109/CSE/EUC.2019.00049
Citation Key: kim_search_2019
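
The abstract describes converting sensitive words to captchas so that they never appear as indexable text. It does not give the details of the authors' algorithm, so the following is only a minimal sketch of the general idea, not the paper's implementation. The function name `convert_sensitive_words`, the `/captcha/` URL prefix, and the token scheme are all hypothetical, and rendering the actual captcha image for each token is assumed to happen elsewhere on the server.

```python
import html
import re
import secrets


def convert_sensitive_words(text, sensitive_words, captcha_base="/captcha/"):
    """Replace each sensitive word in `text` with an <img> placeholder.

    Returns (html_fragment, token_to_word). The fragment contains no
    sensitive word as plain text. The mapping is meant to stay on the
    server, where a (hypothetical) image endpoint would render the
    captcha for each token.
    """
    if not sensitive_words:
        return html.escape(text), {}

    # Try longer entries first so a long phrase wins over a shorter prefix.
    # Assumes entries start and end with word characters (needed for \b).
    ordered = sorted(sensitive_words, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in ordered) + r")\b",
        re.IGNORECASE,
    )

    token_to_word = {}
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the ordinary text between matches for safe HTML output.
        parts.append(html.escape(text[last:match.start()]))
        # A random token keeps the word itself out of the URL.
        token = secrets.token_hex(8)
        token_to_word[token] = match.group(0)
        # Empty alt text: the word must not leak into the markup.
        parts.append(f'<img src="{captcha_base}{token}.png" alt="">')
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts), token_to_word
```

Calling `convert_sensitive_words("My password is hunter2", ["hunter2"])` would return an HTML fragment in which `hunter2` is replaced by an image tag, together with a token-to-word mapping for the server to keep. Random tokens are used instead of a hash of the word because a crawler could otherwise recover the word by hashing a dictionary.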