SaTC: CORE: Small: Understanding, Measuring, and Defending against Malicious Web Crawlers

Project Details

Performance Period

Sep 01, 2018 - Aug 31, 2021

Institution(s)

SUNY at Stony Brook

Award Number


Given the constant expansion of the web, search engines rely on automated crawlers to discover and index new web pages. Beyond search engines, many other industries rely on web crawlers, ranging from security-related crawlers that find abusive pages to crawlers that take snapshots of content to show page previews on social networks. At the same time, attackers use malicious crawlers to automatically find and exploit vulnerabilities on websites, to scrape content and email addresses, and to brute-force login forms. This project focuses on better understanding malicious web crawlers, gathering data about their activity online, and developing defensive systems that can differentiate between benign and malicious web crawlers.

The project seeks to understand, measure, and defend against malicious web crawlers through a multi-pronged approach. First, the project proposes the development of honeypot-like infrastructure for collecting information on existing benign and malicious crawlers. This information is used to track the most abusive crawlers and offer statistics about crawler activity on the web. Second, the project includes the design of tools and techniques for differentiating between real browsing users and malicious crawlers that pretend to be real users. Third, the project proposes the design, development, and evaluation of technologies for real-time detection of web crawlers and for defending against them. Last, the project includes the design of new crawling protocols that allow legitimate crawlers to work unhindered while severely restricting the crawling abilities of malicious crawlers. The outcomes of this research effort are expected to improve the understanding of malicious crawler activity on the web and to achieve substantial practical impact in protecting benign websites against malicious crawlers. Moreover, by improving the detection of malicious crawlers that compromise websites and exfiltrate user data, the project improves the security of all web users.
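As an illustration only, the honeypot-style data collection described above could be sketched roughly as follows. The abstract does not describe the project's actual implementation, so everything here is an assumption: the `/trap/` path, the `label_clients` helper, the sample addresses, and the rule that a single request to a path disallowed by robots.txt marks a client as suspicious.

```python
from collections import defaultdict
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a honeypot site. The /trap/ path is an
# assumption: it would never be linked visibly, so only clients that
# ignore robots.txt (or guess URLs) should ever request it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


def build_parser(robots_txt):
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def label_clients(requests, parser):
    """Label each client by whether it requested a disallowed path.

    `requests` is an iterable of (client_ip, user_agent, path) tuples,
    a simplified stand-in for real web server logs.
    """
    stats = defaultdict(lambda: {"total": 0, "violations": 0})
    for client_ip, user_agent, path in requests:
        entry = stats[client_ip]
        entry["total"] += 1
        # can_fetch returns False when the path is disallowed for this agent.
        if not parser.can_fetch(user_agent, path):
            entry["violations"] += 1
    return {
        ip: "suspicious" if s["violations"] > 0 else "compliant"
        for ip, s in stats.items()
    }


if __name__ == "__main__":
    sample_log = [
        ("10.0.0.1", "Mozilla/5.0", "/index.html"),
        ("10.0.0.2", "Mozilla/5.0", "/index.html"),
        ("10.0.0.2", "Mozilla/5.0", "/trap/admin.php"),
    ]
    print(label_clients(sample_log, build_parser(ROBOTS_TXT)))
```

A single signal like this is easy to evade and can mislabel clients, which is one reason the project pairs data collection with separate work on distinguishing real users from crawlers and on real-time detection.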