Detecting Blog Spam Hashtags Using Topic Modeling
Title | Detecting Blog Spam Hashtags Using Topic Modeling |
Publication Type | Conference Paper |
Year of Publication | 2016 |
Authors | Hyun, Yoonjin; Kim, Namgyu |
Conference Name | Proceedings of the 18th Annual International Conference on Electronic Commerce: E-Commerce in Smart Connected World |
Date Published | August 2016 |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-4503-4222-3 |
Keywords | hash tag spam, Human Behavior, Metrics, pubcrawl, Scalability, spam detection, text mining, topic modeling |
Abstract | Tremendous amounts of data are generated daily. Accordingly, unstructured text data that is distributed through news, blogs, and social media has gained much attention from many researchers, as this data contains abundant information about various consumers' opinions. However, as the usefulness of text data is increasing, attempts to gain profits by distorting text data maliciously or non-maliciously are also increasing. In this sense, various types of spam detection techniques have been studied to prevent the side effects of spamming. The most representative studies include e-mail spam detection, web spam detection, and opinion spam detection. "Spam" is recognized on the basis of three characteristics and actions: (1) if a certain user is recognized as a spammer, then all content created by that user should be recognized as spam; (2) if certain content is exposed to other users (regardless of the users' intention), then that content is recognized as spam; and (3) any content that contains malicious or non-malicious false information is recognized as spam. Many studies have been performed to solve type (1) and type (2) spamming by analyzing various metadata, such as user networks and spam words. In the case of type (3), however, relatively few studies have been conducted because it is difficult to determine the veracity of a certain word or piece of information. In this study, we regard a hashtag that is irrelevant to the content of a blog post as spam and devise a methodology to detect such spam hashtags. |
URL | https://dl.acm.org/doi/10.1145/2971603.2971646 |
DOI | 10.1145/2971603.2971646 |
Citation Key | hyun_detecting_2016 |