Biblio
Machine-based tracking is a type of behavior that extracts information on a user's machine, which can then be used for fingerprinting, tracking, or profiling purposes. In this paper, we focus on JavaScript-oriented machine-based tracking as JavaScript is widely accessible in all browsers. We find that coarse features related to JavaScript access, cookie access, and URL length subdomain information can perform well in creating a classifier that can identify these machine-based trackers with 97.7% accuracy. We then use the classifier on real-world datasets based on 30-minute website crawls of different types of websites – including websites that target children and websites that target a popular audience – and find 85%+ of all websites utilize machine-based tracking, even when they target a regulated group (children) as their primary audience.
Most anti-phishing solutions that exist today require scanning a large portion of the web, which is vast and equivalent to finding a needle in a haystack. In addition, such solutions are not very efficient. We propose a different approach. Our solution does not rely on the scanning of the entire Internet or a large portion of it and only needs access to the brand's traffic in order to be able to detect phishing attempts against that brand. By analyzing a sample of phishing websites, we find features that can be used to distinguish phishing websites from the legitimate ones. We then use these features to train a machine learning classifier capable of helping brands detect phishing attempts against them. Our approach can detect up to 86% of phishing attacks against the brands and is best used as a complementary tool to the existing anti-phishing solutions.