Advanced Phishing Filter Using Autoencoder and Denoising Autoencoder

Title: Advanced Phishing Filter Using Autoencoder and Denoising Autoencoder
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Douzi, Samira; Amar, Meryem; El Ouahidi, Bouabid
Conference Name: Proceedings of the International Conference on Big Data and Internet of Thing
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-5430-1
Keywords: autoencoder, denoising autoencoder, Human Behavior, Metrics, phishing, pubcrawl, Scalability, spam detection, Spam-filter
Abstract

Phishing is an attempt to obtain sensitive information, such as usernames, passwords, and credit card details (and, indirectly, money), for malicious purposes by masquerading as a trustworthy entity in an electronic communication [1]. Hackers and malicious users often use emails as phishing tools to obtain the personal data of legitimate users: they send emails that appear to come from authentic identities and carry legitimate-looking content, but that also contain a malicious URL used to steal the recipient's data. High-dimensional data in the phishing context contains a large number of redundant features, which significantly increases the classification error. In addition, the time required to perform classification grows with the number of features. Extracting complex features from phishing emails therefore requires determining which features are relevant and fundamental to phishing detection. The dominant approaches to phishing detection are based on machine learning techniques that rely on manual feature engineering, which is time-consuming. Deep learning is a promising alternative to these traditional methods. The main idea of deep learning techniques is to learn complex features from the data with minimal external contribution [2]. In this paper, we propose a new phishing detection and prevention approach. It relies first on our previous spam filter [3] to classify the textual content of an email. Second, it uses an autoencoder and a denoising autoencoder (DAE) to extract a relevant and robust feature set from the URL (the one to which the website is actually directed). The feature space can thus be reduced considerably, which decreases the phishing detection time.
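To make the feature-reduction idea concrete, the following is a minimal sketch of a denoising autoencoder, not the architecture or feature set used in the paper. The six binary URL features, the toy samples, the hidden size of 2, the masking-noise rate, and the learning rate are all hypothetical choices made for illustration. The network receives a corrupted copy of each feature vector and is trained to reconstruct the clean one, so the hidden layer ends up as a compact code for the URL.

```python
import math
import random

# Hypothetical binary URL features (illustrative only, not taken from the paper).
FEATURES = ["has_ip_host", "has_at_symbol", "many_subdomains",
            "long_url", "uses_https", "well_known_domain"]

# Made-up toy samples: first three phishing-like, last three legitimate-like.
DATA = [
    [1.0, 1.0, 1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 0.0, 1.0, 1.0],
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def corrupt(x, drop_prob, rng):
    """Masking noise: set each feature to 0 with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]

class DenoisingAutoencoder:
    def __init__(self, n_in, n_hidden, seed=0):
        self.rng = random.Random(seed)
        init = lambda: self.rng.uniform(-0.5, 0.5)
        self.W1 = [[init() for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[init() for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.W1, self.b1)]

    def decode(self, h):
        return [sigmoid(sum(w * v for w, v in zip(row, h)) + b)
                for row, b in zip(self.W2, self.b2)]

    def train_step(self, x, lr, drop_prob):
        x_noisy = corrupt(x, drop_prob, self.rng)   # encoder sees the noisy input
        h = self.encode(x_noisy)
        y = self.decode(h)
        # Gradients of the squared error against the CLEAN input x.
        d_out = [(yi - xi) * yi * (1.0 - yi) for yi, xi in zip(y, x)]
        d_hid = [hj * (1.0 - hj) *
                 sum(d_out[i] * self.W2[i][j] for i in range(len(d_out)))
                 for j, hj in enumerate(h)]
        for i, d in enumerate(d_out):
            for j, hj in enumerate(h):
                self.W2[i][j] -= lr * d * hj
            self.b2[i] -= lr * d
        for j, d in enumerate(d_hid):
            for k, v in enumerate(x_noisy):
                self.W1[j][k] -= lr * d * v
            self.b1[j] -= lr * d
        return sum((yi - xi) ** 2 for yi, xi in zip(y, x)) / len(x)

def train(data, n_hidden=2, epochs=300, lr=0.5, drop_prob=0.25, seed=0):
    dae = DenoisingAutoencoder(len(data[0]), n_hidden, seed)
    history = []  # mean reconstruction error per epoch
    for _ in range(epochs):
        losses = [dae.train_step(x, lr, drop_prob) for x in data]
        history.append(sum(losses) / len(losses))
    return dae, history

dae, history = train(DATA)
codes = [dae.encode(x) for x in DATA]  # 2 learned features instead of 6 raw ones
```

In a pipeline of the kind the abstract describes, the low-dimensional `codes` would be passed to a downstream classifier in place of the raw URL features. Setting `drop_prob=0.0` turns the same sketch into a plain autoencoder.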

URL: http://doi.acm.org/10.1145/3175684.3175690
DOI: 10.1145/3175684.3175690
Citation Key: douzi_advanced_2017