“X-Phish: Days of Future Past”‡: Adaptive & Privacy Preserving Phishing Detection

Title: “X-Phish: Days of Future Past”‡: Adaptive & Privacy Preserving Phishing Detection
Publication Type: Conference Paper
Year of Publication: 2021
Authors: Deval, Shalin Kumar; Tripathi, Meenakshi; Bezawada, Bruhadeshwar; Ray, Indrakshi
Conference Name: 2021 IEEE Conference on Communications and Network Security (CNS)
Keywords: Adaptation models, Adaptive, collaborative learning, feature extraction, Human Behavior, machine learning, Network security, Organizations, phishing, Phishing Detection, privacy, privacy preserving, pubcrawl
Abstract: Website phishing continues to persist as one of the most important security threats of the modern Internet era. A major concern has been that machine learning based approaches, which have been the cornerstones of deployed phishing detection solutions, have not been able to adapt to the evolving nature of phishing attacks. To create updated machine learning models, the collection of a sufficient corpus of real-time phishing data has always been a challenging problem, as most phishing websites are short-lived. In this work, for the first time, we address these important concerns and describe an adaptive phishing detection solution that is able to adapt to changes in phishing attacks. Our solution has two major contributions. First, our solution allows multiple organizations to collaborate in a privacy preserving manner and generate a robust machine learning model for phishing detection. Second, our solution is designed to be flexible in order to adapt to the novel phishing features introduced by attackers. Our solution not only allows for incorporating novel features into the existing machine learning model, but can also help, to a certain extent, with the "unlearning" of existing features that have become obsolete in current phishing attacks. We evaluated our approach on a large real-world dataset collected over a period of six months. Our results achieve a high true positive rate of 97%, which is on par with existing state-of-the-art centralized solutions. Importantly, our results demonstrate that a machine learning model can incorporate new features while selectively "unlearning" the older obsolete features.
DOI: 10.1109/CNS53000.2021.9705052
Citation Key: deval_x-phish_2021