Detecting phishing attacks from URL by using NLP techniques

Submitted by K_Hooper on Wed, 01/10/2018 - 11:15am

Title	Detecting phishing attacks from URL by using NLP techniques
Publication Type	Conference Paper
Year of Publication	2017
Authors	Buber, E., Dırı, B., Sahingoz, O. K.
Conference Name	2017 International Conference on Computer Science and Engineering (UBMK)
ISBN Number	978-1-5386-0930-9
Keywords	Computer crime, Cyber Attack Detection, cyber attack threats, cyber security, Human Behavior, Internet, Internet users, Law, learning (artificial intelligence), machine learning, machine learning-based system, Markov processes, Nanoelectromechanical systems, natural language processing, natural language processing techniques, NLP, phishing attack, phishing attack analysis report, Postal services, pubcrawl, random forest algorithm, Resiliency, Scalability, security of data, Uniform resource locators, unsolicited e-mail, URL
Abstract	Nowadays, cyber attacks affect many institutions and individuals and cause them serious financial losses. Phishing is one of the most common types of cyber attack; it exploits people's weaknesses to obtain confidential information about them. This type of attack threatens almost all Internet users and institutions. Reducing the financial losses it causes requires both user awareness and applications able to detect such attacks. In the Phishing Attack Analysis report for the last quarter of 2016, which covers 45 countries, Turkey ranks second behind China, with an impact rate of approximately 43%. In this study, the characteristics of this type of attack are first explained, and then a machine learning-based system is proposed to detect it. In the proposed system, some features were extracted using Natural Language Processing (NLP) techniques, and the system uses these features to examine URLs used in phishing attacks before they are opened. Several tests were applied to the system, and among the algorithms tested, Random Forest performed best, with a success rate of 89.9%.
URL	http://ieeexplore.ieee.org/document/8093406/
DOI	10.1109/UBMK.2017.8093406
Citation Key	buber_detecting_2017
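The abstract does not list the features the authors extracted. As a rough illustration only, the sketch below turns a raw URL into a fixed-length feature vector of the kind a URL-based phishing classifier consumes. The specific features (URL and host length, word tokens split on delimiters, digit count, an IPv4 host, HTTPS use) and the SUSPICIOUS_WORDS list are assumptions chosen for the example, not the feature set used by Buber et al.; all function names are hypothetical.

```python
import re
from urllib.parse import urlparse

# Hypothetical "bait" vocabulary; an illustrative assumption, not the paper's list.
SUSPICIOUS_WORDS = {"login", "signin", "secure", "account", "verify",
                    "update", "bank", "confirm", "password"}

# Fixed feature order so every URL maps to a vector of the same shape.
FEATURE_NAMES = [
    "url_length", "host_length", "host_dots", "num_words",
    "avg_word_length", "longest_word", "num_digits",
    "has_ip_host", "suspicious_word_count", "uses_https",
]

IPV4_RE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def tokenize_url(url: str) -> list[str]:
    """Split the URL (minus its scheme) into lowercase word tokens on non-alphanumeric delimiters."""
    without_scheme = url.split("://", 1)[-1]
    return [t for t in re.split(r"[^a-z0-9]+", without_scheme.lower()) if t]


def extract_features(url: str) -> dict[str, float]:
    """Compute simple lexical, NLP-style features from a raw URL string."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    words = tokenize_url(url)
    lengths = [len(w) for w in words]
    return {
        "url_length": float(len(url)),
        "host_length": float(len(host)),
        "host_dots": float(host.count(".")),
        "num_words": float(len(words)),
        "avg_word_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "longest_word": float(max(lengths, default=0)),
        "num_digits": float(sum(c.isdigit() for c in url)),
        "has_ip_host": float(IPV4_RE.fullmatch(host) is not None),
        "suspicious_word_count": float(sum(w in SUSPICIOUS_WORDS for w in words)),
        "uses_https": float(parsed.scheme == "https"),
    }


def to_vector(url: str) -> list[float]:
    """Return the features in FEATURE_NAMES order, ready for a classifier."""
    features = extract_features(url)
    return [features[name] for name in FEATURE_NAMES]


if __name__ == "__main__":
    for u in ["https://example.com/docs",
              "http://paypal-login.secure-update.example.com/verify"]:
        print(u, to_vector(u))
```

Vectors produced by `to_vector` for a labeled set of phishing and legitimate URLs could then be passed to a classifier such as scikit-learn's `RandomForestClassifier`, the algorithm family the abstract reports as best. The paper's actual training data, features, and parameters are not described in this record.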

Groups:

Science of Security VO