Spear Phishing Emails Detection Based on Machine Learning

Title: Spear Phishing Emails Detection Based on Machine Learning
Publication Type: Conference Paper
Year of Publication: 2021
Authors: Ding, Xiong; Liu, Baoxu; Jiang, Zhengwei; Wang, Qiuyun; Xin, Liling
Conference Name: 2021 IEEE 24th International Conference on Computer Supported Cooperative Work in Design (CSCWD)
Date Published: May
Keywords: Companies, Conferences, feature extraction, Forwarding features, Human Behavior, interpolation, KMSMOTE, machine learning, machine learning algorithms, phishing, pubcrawl, Reputation features, spear phishing emails
Abstract: Spear phishing emails target a specific individual or organization; they are more elaborate, targeted, and harmful than ordinary phishing emails. Attackers usually harvest information about the recipient by any available means, then craft a carefully camouflaged email that lures the recipient into performing dangerous actions. In this paper we present a new, effective approach to detecting spear phishing emails based on machine learning. First, we extracted 21 stylometric features from the emails, 3 forwarding features from an Email Forwarding Relationship Graph Database (EFRGD), and 3 reputation features from two third-party threat intelligence platforms, VirusTotal (VT) and PhishTank (PT). Then we improved the Synthetic Minority Oversampling Technique (SMOTE) algorithm, naming the result KM-SMOTE, to reduce the impact of unbalanced data. Finally, we applied 4 machine learning algorithms to distinguish spear phishing emails from non-spear phishing emails. Our dataset consists of 417 spear phishing emails and 13,916 non-spear phishing emails. We achieved a maximum recall of 95.56%, precision of 98.85%, and F1-score of 97.16% with the help of the forwarding features, reputation features, and the KM-SMOTE algorithm.
DOI: 10.1109/CSCWD49262.2021.9437758
Citation Key: ding_spear_2021
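
Illustrative note: the abstract names SMOTE as the starting point for KM-SMOTE but does not describe the modification. The sketch below therefore shows only plain SMOTE interpolation, not the authors' KM-SMOTE. The function name `smote_oversample`, its parameters, and the toy feature vectors are assumptions made for illustration; only the two class counts come from the abstract.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Plain SMOTE sketch (not KM-SMOTE): create n_new synthetic points by
    interpolating between a minority sample and one of its k nearest
    minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority samples (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Class counts reported in the abstract.
n_spear, n_other = 417, 13916
imbalance_ratio = n_other / n_spear  # non-spear emails per spear phishing email

# Hypothetical 2-D feature vectors standing in for minority-class emails.
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_oversample(toy_minority, n_new=6, k=2)
```

With the reported counts, the majority class outnumbers the spear phishing class by roughly 33 to 1, which is the kind of imbalance oversampling is meant to reduce. Each synthetic point lies on a line segment between two existing minority samples, so it stays inside the region the minority class already occupies.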