Phishing URL Detection via CNN and Attention-Based Hierarchical RNN

Submitted by aekwall on Mon, 01/20/2020 - 12:13pm

Title	Phishing URL Detection via CNN and Attention-Based Hierarchical RNN
Publication Type	Conference Paper
Year of Publication	2019
Authors	Huang, Yongjie, Yang, Qiping, Qin, Jinghui, Wen, Wushao
Conference Name	2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE)
Date Published	August
Keywords	attention-based hierarchical recurrent neural network module, attention-based hierarchical RNN, blacklisting, character-level spatial feature representations, Computer crime, convolutional neural nets, convolutional neural network module, cyber security, Deep Learning, deep learning-based approach, Electronic mail, feature engineering, feature extraction, Fuses, human factors, Internet, learning (artificial intelligence), machine learning, pattern classification, phishing, Phishing Detection, phishing uniform resource locators, phishing URL classifier, phishing URL detection, phishing Websites, policy-based governance, pubcrawl, recurrent neural nets, Resiliency, Scalability, three-layer CNN, Uniform resource locators, unsolicited e-mail, Web sites, word-level temporal feature representations, zero trust, zero-day phishing attacks
Abstract	Phishing websites have long been a serious threat to cyber security. For decades, many researchers have been devoted to developing novel techniques to detect phishing websites automatically. While state-of-the-art solutions can achieve superior performance, they require substantial manual feature engineering and are not adept at detecting newly emerging phishing attacks. Therefore, developing techniques that can detect phishing websites automatically and handle zero-day phishing attacks swiftly is still an open challenge in this area. In this work, we propose PhishingNet, a deep learning-based approach for timely detection of phishing Uniform Resource Locators (URLs). Specifically, we use a Convolutional Neural Network (CNN) module to extract character-level spatial feature representations of URLs; meanwhile, we employ an attention-based hierarchical Recurrent Neural Network (RNN) module to extract word-level temporal feature representations of URLs. We then fuse these feature representations via a three-layer CNN to build accurate feature representations of URLs, on which we train a phishing URL classifier. Extensive experiments on a verified dataset collected from the Internet demonstrate that the feature representations extracted automatically are conducive to the improvement of the generalization ability of our approach on newly emerging URLs, which makes our approach achieve competitive performance against other state-of-the-art approaches.
DOI	10.1109/TrustCom/BigDataSE.2019.00024
Citation Key	huang_phishing_2019
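
Illustrative sketch (not from the paper): the abstract describes two views of a URL, a character-level one fed to a CNN and a word-level one fed to an attention-based RNN. The Python below shows only what those two input views and a generic attention-pooling step might look like. The function names (char_ids, words, attention_pool), the delimiter-based tokenization, the padding id, and the dot-product scoring are all assumptions made for illustration; the abstract does not specify any of them, and no trained network is involved.

```python
import math
import re


def char_ids(url, max_len=32):
    """Character-level view: one integer id per character, padded with 0.

    Assumption: ids are ASCII code + 1 so that 0 can serve as padding.
    """
    ids = [min(ord(c), 127) + 1 for c in url[:max_len]]
    return ids + [0] * (max_len - len(ids))


def words(url):
    """Word-level view: split on runs of non-alphanumeric characters.

    Assumption: this simple delimiter rule stands in for whatever
    tokenization the authors actually used.
    """
    return [w for w in re.split(r"[^A-Za-z0-9]+", url.lower()) if w]


def attention_pool(vectors, query):
    """Generic attention pooling: softmax-weighted sum of the vectors.

    Each vector is scored by its dot product with `query`; the scores are
    normalized with a numerically stable softmax.
    """
    scores = [sum(v_i * q_i for v_i, q_i in zip(v, query)) for v in vectors]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [
        sum(w * v[d] for w, v in zip(weights, vectors))
        for d in range(len(query))
    ]
    return pooled, weights


if __name__ == "__main__":
    url = "http://login.example.com/verify-account"
    print(words(url))
    print(char_ids(url)[:8])
```

In a full model, the ids from char_ids would index a learned character embedding consumed by the CNN, and each token from words would be embedded and passed through the RNN before a pooling step such as attention_pool summarizes the sequence. The fusion stage (the three-layer CNN in the abstract) is not sketched here.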
