“X-Phish: Days of Future Past”‡: Adaptive & Privacy Preserving Phishing Detection

Submitted by grigby1 on Wed, 10/12/2022 - 2:36pm

Title	“X-Phish: Days of Future Past”‡: Adaptive & Privacy Preserving Phishing Detection
Publication Type	Conference Paper
Year of Publication	2021
Authors	Deval, Shalin Kumar; Tripathi, Meenakshi; Bezawada, Bruhadeshwar; Ray, Indrakshi
Conference Name	2021 IEEE Conference on Communications and Network Security (CNS)
Keywords	Adaptation models, Adaptive, collaborative learning, feature extraction, Human Behavior, machine learning, Network security, Organizations, phishing, Phishing Detection, privacy, privacy preserving, pubcrawl
Abstract	Website phishing continues to persist as one of the most important security threats of the modern Internet era. A major concern has been that machine learning based approaches, which have been the cornerstones of deployed phishing detection solutions, have not been able to adapt to the evolving nature of phishing attacks. To create updated machine learning models, the collection of a sufficient corpus of real-time phishing data has always been a challenging problem, as most phishing websites are short-lived. In this work, for the first time, we address these important concerns and describe an adaptive phishing detection solution that is able to adapt to changes in phishing attacks. Our solution has two major contributions. First, our solution allows multiple organizations to collaborate in a privacy preserving manner and generate a robust machine learning model for phishing detection. Second, our solution is designed to be flexible in order to adapt to the novel phishing features introduced by attackers. Our solution not only allows for incorporating novel features into the existing machine learning model, but also can help, to a certain extent, with the "unlearning" of existing features that have become obsolete in current phishing attacks. We evaluated our approach on a large real-world dataset collected over a period of six months. Our results achieve a high true positive rate of 97%, which is on par with existing state-of-the-art centralized solutions. Importantly, our results demonstrate that a machine learning model can incorporate new features while selectively "unlearning" the older obsolete features.
DOI	10.1109/CNS53000.2021.9705052
Citation Key	deval_x-phish_2021
