Detection of Hurriedly Created Abnormal Profiles in Recommender Systems

Title: Detection of Hurriedly Created Abnormal Profiles in Recommender Systems
Publication Type: Conference Paper
Year of Publication: 2018
Authors: Panagiotakis, C., Papadakis, H., Fragopoulou, P.
Conference Name: 2018 International Conference on Intelligent Systems (IS)
ISBN Number: 978-1-5386-7097-2
Keywords: anomalous rating profiles, attackers, clustering methods, dimensionality reduction, feature extraction, gaussian distribution, Hidden Markov models, human factors, hurriedly created abnormal profiles, information filtering, k-means clustering, Labeling, learning (artificial intelligence), Outliers, pattern clustering, profile injection attacks, pubcrawl, Random Forest, random ratings, recommender systems, resilience, Resiliency, Scalability, security of data, specific items, specific ratings, spurious profiles, synthetic coordinates, system ratings, user rating behavior, user-item rating matrix
Abstract

Recommender systems try to predict the preferences of users for specific items. These systems suffer from profile injection attacks, in which attackers with some prior knowledge of the system ratings try to promote or demote a particular item by introducing abnormal (anomalous) ratings. The detection of both cases is a challenging problem. In this paper, we propose a framework to spot anomalous rating profiles (outliers), where the outliers hurriedly create a profile that injects into the system either random ratings or specific ratings, without any prior knowledge of the existing ratings. The proposed detection method is based on the unpredictable behavior of the outliers in a validation set, on the user-item rating matrix, and on the similarity between users. The proposed system is fully unsupervised, and in its last step it uses the k-means clustering method to spot the spurious profiles automatically. For the cases where labeled sample data are available, a random forest classifier is trained to show how supervised methods outperform unsupervised ones. Experimental results on the MovieLens 100k and the MovieLens 1M datasets demonstrate the high performance of the proposed schemes.
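
The final unsupervised step described in the abstract (k-means separating spurious profiles from genuine ones) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the synthetic rating matrix, the single feature (each user's mean absolute deviation from the item means), and the helper names `make_ratings`, `deviation_feature`, `two_means_1d`, and `flag_outliers` are all hypothetical. The paper's actual features draw on a validation set and on user similarity, which this sketch does not reproduce.

```python
import random


def make_ratings(n_genuine=60, n_injected=6, n_items=100, seed=0):
    """Synthetic dense rating matrix (assumption): genuine rows first, injected rows last."""
    rng = random.Random(seed)
    base = [rng.randint(1, 5) for _ in range(n_items)]  # underlying item quality
    rows = []
    for _ in range(n_genuine):
        # Genuine users rate within one star of the item's underlying quality.
        rows.append([min(5, max(1, b + rng.choice((-1, 0, 1)))) for b in base])
    for _ in range(n_injected):
        # Hurriedly created profiles: uniformly random ratings.
        rows.append([rng.randint(1, 5) for _ in range(n_items)])
    return rows


def deviation_feature(rows):
    """Per-user mean absolute deviation from the item means (illustrative feature)."""
    n_users, n_items = len(rows), len(rows[0])
    item_mean = [sum(r[j] for r in rows) / n_users for j in range(n_items)]
    return [
        sum(abs(r[j] - item_mean[j]) for j in range(n_items)) / n_items
        for r in rows
    ]


def two_means_1d(values, iters=50):
    """Plain k-means with k=2 on scalars; label 1 marks the higher-centroid cluster."""
    lo, hi = min(values), max(values)  # initialise centroids at the extremes
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [int(abs(v - hi) < abs(v - lo)) for v in values]
        low = [v for v, lab in zip(values, labels) if lab == 0]
        high = [v for v, lab in zip(values, labels) if lab == 1]
        if not low or not high:
            break  # degenerate case: all values identical
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # converged
        lo, hi = new_lo, new_hi
    return labels


def flag_outliers(rows):
    """Return the indices of users assigned to the high-deviation cluster."""
    labels = two_means_1d(deviation_feature(rows))
    return {user for user, lab in enumerate(labels) if lab == 1}


if __name__ == "__main__":
    flagged = flag_outliers(make_ratings())
    print(sorted(flagged))
```

In this toy setting the random profiles deviate from the item means far more than genuine users do, so a two-cluster split on that one feature is enough; real rating data is sparse and would need richer features of the kind the abstract describes.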

URL: https://ieeexplore.ieee.org/document/8710589
DOI: 10.1109/IS.2018.8710589
Citation Key: panagiotakis_detection_2018