Random forest explorations for URL classification

Submitted by grigby1 on Wed, 12/20/2017 - 1:08pm

Title	Random forest explorations for URL classification
Publication Type	Conference Paper
Year of Publication	2017
Authors	Weedon, M., Tsaptsinos, D., Denholm-Price, J.
Conference Name	2017 International Conference On Cyber Situational Awareness, Data Analytics And Assessment (Cyber SA)
Date Published	June
Publisher	IEEE
ISBN Number	978-1-5090-5060-4
Keywords	blacklisting, Classification algorithms, common defence users, Computer crime, feature extraction, Human Behavior, human factors, Internet, learning (artificial intelligence), machine learning algorithms, pattern classification, phishing, phishing Websites, pubcrawl, Radio frequency, Random forest explorations, Testing, Training, Uniform resource locators, unsolicited e-mail, URL classification, Web sites
Abstract	Phishing is a major concern on the Internet today, and many users fall victim to criminals' deceitful tactics. Blacklisting is still the most common defence users have against phishing websites, but it is failing to cope with their increasing number. In recent years, researchers have devised modern ways of detecting such websites using machine learning. One such method is to build machine-learnt models of URL features that classify whether a URL is phishing. However, opinions vary on the best choice of features and algorithms. The objective of this paper is to evaluate the performance of the Random Forest algorithm on a lexical-only dataset. Its performance is benchmarked against other machine learning algorithms and against results reported in the literature. Initial experimental results indicate that the Random Forest algorithm performs best, yielding an accuracy of 86.9%.
URL	http://ieeexplore.ieee.org/document/8073403/
DOI	10.1109/CyberSA.2017.8073403
Citation Key	weedon_random_2017
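
The abstract refers to a "lexical only" dataset, meaning features computed from the URL string itself rather than from page content or network lookups. This record does not list the features the authors used, so the following is only a minimal sketch of what lexical feature extraction might look like. The function names and the feature set are hypothetical, chosen for illustration.

```python
from urllib.parse import urlsplit


def _looks_like_ipv4(host: str) -> bool:
    """Return True if host is four dot-separated decimal labels, each <= 255."""
    labels = host.split(".")
    return len(labels) == 4 and all(
        label.isascii() and label.isdigit() and int(label) <= 255
        for label in labels
    )


def lexical_features(url: str) -> dict:
    """Compute simple string-based features of a URL.

    Assumption: this feature set is illustrative only; it is not the
    feature set used in the paper, which this record does not specify.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parts.path),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        "has_at_symbol": int("@" in url),
        "host_is_ipv4": int(_looks_like_ipv4(host)),
    }


if __name__ == "__main__":
    for example in (
        "http://192.168.0.1/login-update/index.php?id=42",
        "http://example.com/",
    ):
        print(example, lexical_features(example))
```

Feature dictionaries like these could be converted to numeric rows and passed, together with phishing/benign labels, to a random forest implementation such as scikit-learn's RandomForestClassifier. Whether the authors used that library is not stated here.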
