Detecting phishing attacks from URL by using NLP techniques
Title | Detecting phishing attacks from URL by using NLP techniques |
Publication Type | Conference Paper |
Year of Publication | 2017 |
Authors | Buber, E., Dırı, B., Sahingoz, O. K. |
Conference Name | 2017 International Conference on Computer Science and Engineering (UBMK) |
ISBN Number | 978-1-5386-0930-9 |
Keywords | Computer crime, Cyber Attack Detection, cyber attack threats, cyber security, Human Behavior, Internet, Internet users, Law, learning (artificial intelligence), machine learning, machine learning-based system, Markov processes, Nanoelectromechanical systems, natural language processing, natural language processing techniques, NLP, phishing attack, phishing attack analysis report, Postal services, pubcrawl, random forest algorithm, Resiliency, Scalability, security of data, Uniform resource locators, unsolicited e-mail, URL |
Abstract | Cyber attacks affect many institutions and individuals and cause them serious financial losses. Phishing is one of the most common types of cyber attack; it exploits people's weaknesses to obtain their confidential information. This type of attack threatens almost all internet users and institutions. Reducing the financial loss it causes requires both user awareness and applications capable of detecting such attacks. According to the Phishing Attack Analysis report covering 45 countries, Turkey ranked second behind China in the last quarter of 2016, with an impact rate of approximately 43%. In this study, the characteristics of this type of attack are first explained, and then a machine learning based system is proposed to detect them. In the proposed system, features are extracted from URLs by using Natural Language Processing (NLP) techniques, so that URLs used in phishing attacks can be examined before they are opened. Many tests were run on the system, and the best of the tested algorithms was Random Forest, with a success rate of 89.9%. |
URL | http://ieeexplore.ieee.org/document/8093406/ |
DOI | 10.1109/UBMK.2017.8093406 |
Citation Key | buber_detecting_2017 |
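
The abstract describes extracting NLP-style features from a URL before it is opened and passing them to a classifier. The record does not list the paper's actual features, so the following is only a minimal sketch under stated assumptions: the keyword list, the feature names, and the helper functions `tokenize_url` and `extract_features` are all hypothetical and chosen for illustration. It uses only the Python standard library.

```python
import re
from urllib.parse import urlsplit

# Hypothetical keyword list for illustration only; not taken from the paper.
SUSPICIOUS_WORDS = {"login", "verify", "secure", "account", "update", "signin"}


def tokenize_url(text: str) -> list[str]:
    """Split text into lowercase alphanumeric tokens, dropping empty strings."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", text.lower()) if t]


def extract_features(url: str) -> dict[str, int]:
    """Compute a few word-based features from a URL without fetching it."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Tokenize host, path, and query together; the scheme is ignored.
    tokens = tokenize_url(host + parts.path + "?" + parts.query)
    return {
        "url_length": len(url),
        "host_dot_count": host.count("."),
        "token_count": len(tokens),
        "longest_token_length": max((len(t) for t in tokens), default=0),
        "suspicious_word_count": sum(t in SUSPICIOUS_WORDS for t in tokens),
        "digit_count": sum(ch.isdigit() for ch in url),
        # 1 if the host looks like a dotted IPv4 address, else 0.
        "host_is_ip": int(bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))),
    }


if __name__ == "__main__":
    example = "http://secure-login.example.com/verify/account?id=123"
    for name, value in extract_features(example).items():
        print(f"{name}: {value}")
```

Feature dictionaries like these could then be converted to numeric vectors and used to train a random forest classifier (for example, scikit-learn's `RandomForestClassifier`), the algorithm the abstract reports as performing best. The 89.9% figure belongs to the paper's own feature set and data, not to this sketch.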