Leveraging Time for Spammers Detection on Twitter

Title: Leveraging Time for Spammers Detection on Twitter
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Washha, Mahdi; Qaroush, Aziz; Sedes, Florence
Conference Name: Proceedings of the 8th International Conference on Management of Digital EcoSystems
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-4267-4
Keywords: honeypot, Human Behavior, legitimate users, machine learning, Metrics, pubcrawl, Scalability, spam, spam detection, time
Abstract

Twitter is one of the most popular microblogging social systems, providing a set of distinctive posting services that operate in real time. The flexibility of these services has attracted unethical individuals, so-called "spammers", who aim to spread malicious, phishing, and misleading information. Unfortunately, the existence of spam results in non-ignorable problems related to search and users' privacy. In the battle against spam, various detection methods have been designed, which automate the detection process by combining the "features" concept with machine learning methods. However, the existing features are not effective enough to adapt to spammers' tactics because the features are easy to manipulate. Also, graph features are not suitable for Twitter-based applications, despite the high performance obtainable when applying such features. In this paper, beyond simple statistical features such as the number of hashtags and the number of URLs, we examine the time property by advancing the design of some features used in the literature and by proposing new time-based features. The new design of features is divided between robust advanced statistical features that explicitly incorporate the time attribute, and behavioral features that identify any posting behavior pattern. The experimental results show that the new form of features correctly classifies the majority of spammers, with an accuracy higher than 93% when using the Random Forest learning algorithm applied to a collected and annotated data set. The results outperform the accuracy of state-of-the-art features by about 6%, demonstrating the significance of leveraging time in detecting spam accounts.
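
The abstract does not list the paper's actual features, so the following is only a minimal sketch of what "statistical features incorporating the time attribute" might look like. The function `time_features`, its feature names, and the sample timestamps are all hypothetical and are not taken from the paper. It contrasts a count normalized by elapsed time with statistics over the gaps between posts.

```python
from statistics import mean, pstdev


def time_features(timestamps, hashtag_counts):
    """Illustrative time-aware features for one account (hypothetical, not the paper's).

    timestamps: posting times in seconds, in any order.
    hashtag_counts: number of hashtags in each tweet, one per timestamp.
    """
    if len(timestamps) != len(hashtag_counts):
        raise ValueError("one hashtag count per timestamp is required")
    ts = sorted(timestamps)
    # Gaps between consecutive posts; very regular gaps suggest automation.
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    span_hours = (ts[-1] - ts[0]) / 3600 if len(ts) > 1 else 0.0
    return {
        "mean_gap_s": mean(gaps) if gaps else 0.0,
        "gap_std_s": pstdev(gaps) if gaps else 0.0,
        # A plain hashtag count ignores time; dividing by the span adds it.
        "hashtags_per_hour": sum(hashtag_counts) / span_hours if span_hours else 0.0,
    }


# Made-up data: one account posting every 10 minutes, one posting irregularly.
bot_like = time_features([0, 600, 1200, 1800, 2400], [3, 3, 3, 3, 3])
human_like = time_features([0, 90, 4000, 4100, 9000], [0, 1, 0, 2, 0])
print(bot_like)
print(human_like)
```

Feature vectors of this kind could then be passed to a classifier such as Random Forest, which the abstract names as the learning algorithm. The training setup itself is not described here and is left out of the sketch.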

URL: http://doi.acm.org/10.1145/3012071.3012078
DOI: 10.1145/3012071.3012078
Citation Key: washha_leveraging_2016