Using Supervised Machine Learning Algorithms to Detect Suspicious URLs in Online Social Networks

Title: Using Supervised Machine Learning Algorithms to Detect Suspicious URLs in Online Social Networks
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Al-Janabi, Mohammed; Quincey, Ed de; Andras, Peter
Conference Name: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-4993-2
Keywords: Human Behavior, malicious URLs, Metrics, phishing, pubcrawl, Random Forest, Scalability, spam, spam detection, Twitter
Abstract

The increasing volume of malicious content in social networks requires automated methods to detect and eliminate such content. This paper describes a supervised machine learning classification model built to detect the distribution of malicious content in online social networks (OSNs). Multisource features are used to detect social network posts that contain malicious Uniform Resource Locators (URLs). These URLs could direct users to websites that host malicious content, drive-by download attacks, phishing, spam, and scams. For data collection, the Twitter streaming application programming interface (API) was used, and VirusTotal was used to label the dataset. A random forest classification model was used with a combination of features derived from a range of sources. Without any tuning or feature selection, the random forest model produced a recall of 0.89. After applying parameter tuning and feature selection methods, we improved the classifier's recall to 0.92.
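
The abstract reports its results as recall, that is, the share of truly malicious posts that the classifier flags. As a reading aid, the metric can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `recall` and the label lists are hypothetical, and the labels are made up rather than taken from the paper's dataset.

```python
def recall(y_true, y_pred, positive=1):
    """Return TP / (TP + FN): the fraction of positive items that were flagged."""
    pairs = list(zip(y_true, y_pred))
    # True positives: malicious items the classifier also marked malicious.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False negatives: malicious items the classifier missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("recall is undefined when there are no positive labels")
    return tp / (tp + fn)


# Made-up labels for illustration only: 1 = malicious URL, 0 = benign URL.
y_true = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1, 0, 1, 0, 1]

score = recall(y_true, y_pred)
print(f"recall = {score:.2f}")
```

In this toy data, six posts are malicious and five of them are flagged, so one malicious post is missed. A recall of 0.92, as reported above, would correspond to roughly 8 in 100 malicious posts going undetected.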

URL: http://doi.acm.org/10.1145/3110025.3116201
DOI: 10.1145/3110025.3116201
Citation Key: al-janabi_using_2017