Visible to the public Biblio

Filters: Author is Hyun, Yoonjin  [Clear All Filters]
2017-09-19
Hyun, Yoonjin, Kim, Namgyu.  2016.  Detecting Blog Spam Hashtags Using Topic Modeling. Proceedings of the 18th Annual International Conference on Electronic Commerce: E-Commerce in Smart Connected World. :43:1–43:6.

Tremendous amounts of data are generated daily. Accordingly, unstructured text data that is distributed through news, blogs, and social media has gained much attention from many researchers as this data contains abundant information about various consumers' opinions. However, as the usefulness of text data is increasing, attempts to gain profits by distorting text data maliciously or non-maliciously are also increasing. In this sense, various types of spam detection techniques have been studied to prevent the side effects of spamming. The most representative studies include e-mail spam detection, web spam detection, and opinion spam detection. "Spam" is recognized on the basis of three characteristics and actions: (1) if a certain user is recognized as a spammer, then all content created by that user should be recognized as spam; (2) if certain content is exposed to other users (regardless of the users' intention), then content is recognized as spam; and (3) any content that contains malicious or non-malicious false information is recognized as spam. Many studies have been performed to solve type (1) and type (2) spamming by analyzing various metadata, such as user networks and spam words. In the case of type (3), however, relatively few studies have been conducted because it is difficult to determine the veracity of a certain word or information. In this study, we regard a hashtag that is irrelevant to the content of a blog post as spam and devise a methodology to detect such spam hashtags.