Using Supervised Machine Learning Algorithms to Detect Suspicious URLs in Online Social Networks
Title | Using Supervised Machine Learning Algorithms to Detect Suspicious URLs in Online Social Networks |
Publication Type | Conference Paper |
Year of Publication | 2017 |
Authors | Al-Janabi, Mohammed; de Quincey, Ed; Andras, Peter |
Conference Name | Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017 |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-4503-4993-2 |
Keywords | Human Behavior, malicious URLs, Metrics, phishing, pubcrawl, Random Forest, Scalability, spam, spam detection, Twitter |
Abstract | The increasing volume of malicious content in social networks requires automated methods to detect and eliminate such content. This paper describes a supervised machine learning classification model built to detect the distribution of malicious content in online social networks (OSNs). Multisource features are used to detect social network posts that contain malicious Uniform Resource Locators (URLs). These URLs could direct users to websites that host malicious content, drive-by download attacks, phishing, spam, and scams. Data were collected through the Twitter streaming application programming interface (API), and VirusTotal was used to label the dataset. A random forest classification model was used with a combination of features derived from a range of sources. Without any tuning or feature selection, the random forest model produced a recall of 0.89. After further investigation, and after applying parameter tuning and feature selection methods, we were able to improve the classifier's recall to 0.92. |
URL | http://doi.acm.org/10.1145/3110025.3116201 |
DOI | 10.1145/3110025.3116201 |
Citation Key | al-janabi_using_2017 |
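
The abstract reports classifier performance as recall, that is, the share of truly malicious items the classifier flags. The sketch below shows how that metric is computed from labels and predictions. It is not the authors' code: the function name `recall` and the label lists are made up for illustration, and the resulting number has no relation to the 0.89 and 0.92 figures reported in the paper.

```python
def recall(y_true, y_pred, positive=1):
    """Return TP / (TP + FN): the fraction of positive items that were detected."""
    # True positives: items that are positive and were predicted positive.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    # False negatives: items that are positive but were predicted otherwise.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("recall is undefined when there are no positive examples")
    return tp / (tp + fn)


# Hypothetical labels, for illustration only (1 = malicious URL, 0 = benign).
y_true = [1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]

example_recall = recall(y_true, y_pred)
print(f"recall = {example_recall:.3f}")
```

Recall ignores false positives (benign items flagged as malicious), so a high value says that few malicious URLs were missed, not that every flagged URL was actually malicious.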