Biblio | CPS-VO

Robic-Butez, Pierrick, Win, Thu Yein. 2019. Detection of Phishing websites using Generative Adversarial Network. 2019 IEEE International Conference on Big Data (Big Data). :3216—3221.

Phishing is typically deployed as an attack vector in the initial stages of a hacking endeavour. Due to it low-risk rightreward nature it has seen a widespread adoption, and detecting it has become a challenge in recent times. This paper proposes a novel means of detecting phishing websites using a Generative Adversarial Network. Taking into account the internal structure and external metadata of a website, the proposed approach uses a generator network which generates both legitimate as well as synthetic phishing features to train a discriminator network. The latter then determines if the features are either normal or phishing websites, before improving its detection accuracy based on the classification error. The proposed approach is evaluated using two different phishing datasets and is found to achieve a detection accuracy of up to 94%.

Korczynski, M., Tajalizadehkhoob, S., Noroozian, A., Wullink, M., Hesselman, C., v Eeten, M.. 2017. Reputation Metrics Design to Improve Intermediary Incentives for Security of TLDs. 2017 IEEE European Symposium on Security and Privacy (EuroS P). :579–594.

Over the years cybercriminals have misused the Domain Name System (DNS) - a critical component of the Internet - to gain profit. Despite this persisting trend, little empirical information about the security of Top-Level Domains (TLDs) and of the overall 'health' of the DNS ecosystem exists. In this paper, we present security metrics for this ecosystem and measure the operational values of such metrics using three representative phishing and malware datasets. We benchmark entire TLDs against the rest of the market. We explicitly distinguish these metrics from the idea of measuring security performance, because the measured values are driven by multiple factors, not just by the performance of the particular market player. We consider two types of security metrics: occurrence of abuse and persistence of abuse. In conjunction, they provide a good understanding of the overall health of a TLD. We demonstrate that attackers abuse a variety of free services with good reputation, affecting not only the reputation of those services, but of entire TLDs. We find that, when normalized by size, old TLDs like .com host more bad content than new generic TLDs. We propose a statistical regression model to analyze how the different properties of TLD intermediaries relate to abuse counts. We find that next to TLD size, abuse is positively associated with domain pricing (i.e. registries who provide free domain registrations witness more abuse). Last but not least, we observe a negative relation between the DNSSEC deployment rate and the count of phishing domains.

Korczynski, M., Tajalizadehkhoob, S., Noroozian, A., Wullink, M., Hesselman, C., v Eeten, M.. 2017. Reputation Metrics Design to Improve Intermediary Incentives for Security of TLDs. 2017 IEEE European Symposium on Security and Privacy (EuroS P). :579–594.

Over the years cybercriminals have misused the Domain Name System (DNS) - a critical component of the Internet - to gain profit. Despite this persisting trend, little empirical information about the security of Top-Level Domains (TLDs) and of the overall 'health' of the DNS ecosystem exists. In this paper, we present security metrics for this ecosystem and measure the operational values of such metrics using three representative phishing and malware datasets. We benchmark entire TLDs against the rest of the market. We explicitly distinguish these metrics from the idea of measuring security performance, because the measured values are driven by multiple factors, not just by the performance of the particular market player. We consider two types of security metrics: occurrence of abuse and persistence of abuse. In conjunction, they provide a good understanding of the overall health of a TLD. We demonstrate that attackers abuse a variety of free services with good reputation, affecting not only the reputation of those services, but of entire TLDs. We find that, when normalized by size, old TLDs like .com host more bad content than new generic TLDs. We propose a statistical regression model to analyze how the different properties of TLD intermediaries relate to abuse counts. We find that next to TLD size, abuse is positively associated with domain pricing (i.e. registries who provide free domain registrations witness more abuse). Last but not least, we observe a negative relation between the DNSSEC deployment rate and the count of phishing domains.

Abdelhamid, N., Thabtah, F., Abdel-jaber, H.. 2017. Phishing detection: A recent intelligent machine learning comparison based on models content and features. 2017 IEEE International Conference on Intelligence and Security Informatics (ISI). :72–77.

In the last decade, numerous fake websites have been developed on the World Wide Web to mimic trusted websites, with the aim of stealing financial assets from users and organizations. This form of online attack is called phishing, and it has cost the online community and the various stakeholders hundreds of million Dollars. Therefore, effective counter measures that can accurately detect phishing are needed. Machine learning (ML) is a popular tool for data analysis and recently has shown promising results in combating phishing when contrasted with classic anti-phishing approaches, including awareness workshops, visualization and legal solutions. This article investigates ML techniques applicability to detect phishing attacks and describes their pros and cons. In particular, different types of ML techniques have been investigated to reveal the suitable options that can serve as anti-phishing tools. More importantly, we experimentally compare large numbers of ML techniques on real phishing datasets and with respect to different metrics. The purpose of the comparison is to reveal the advantages and disadvantages of ML predictive models and to show their actual performance when it comes to phishing attacks. The experimental results show that Covering approach models are more appropriate as anti-phishing solutions, especially for novice users, because of their simple yet effective knowledge bases in addition to their good phishing detection rate.