Finding Collusive Spam in Community Question Answering Platforms: A Pattern and Burstiness Based Method

Title: Finding Collusive Spam in Community Question Answering Platforms: A Pattern and Burstiness Based Method
Publication Type: Conference Paper
Year of Publication: 2022
Authors: Xu, Mingming; Zhang, Lu; Zhu, Haiting
Conference Name: 2021 Ninth International Conference on Advanced Cloud and Big Data (CBD)
Date Published: Mar
Keywords: Big Data, Burstiness Detecting, Collusive Spamming Activities, community question answering, crowdsourcing, Frequent Pattern, Human Behavior, Knowledge engineering, machine learning, Metrics, pubcrawl, Q&A Spam Detection, Scalability, spam detection, unsolicited e-mail, Vocabulary, Web and internet services
Abstract: Community question answering (CQA) websites have become very popular platforms, attracting numerous participants who share and acquire knowledge and information on the Internet. However, with the rapid growth of crowdsourcing systems, many malicious users organize collusive attacks against CQA platforms to promote a target (a product or service) by posting suggestive questions and deceptive answers. This manipulated, deceptive content, aggregated into multiple collusive question-and-answer (Q&A) spam groups, can fully control the sentiment toward a target and distort users' decisions, which pollutes the CQA environment and makes it less credible. In this paper, we propose a Pattern and Burstiness based Collusive Q&A Spam Detection method (PBCSD) to identify deceptive questions and answers. Specifically, we intensively study the campaign process of crowdsourcing tasks and summarize the clues at the vocabulary usage level of Q&As when collusive attacks are launched. Based on these clues, we extract Q&A groups using frequent pattern mining and further purify them by the burstiness of the Q&As' posting times. By designing several discriminative features at the Q&A group level, multiple machine learning based classifiers can be used to judge groups as deceptive or ordinary, and the Q&As in deceptive groups are finally identified as collusive Q&A spam. We evaluate the proposed PBCSD method on a real-world dataset collected from Baidu Zhidao, a famous CQA platform in China, and the experimental results demonstrate that PBCSD is effective for collusive Q&A spam detection and outperforms a number of state-of-the-art methods.
DOI: 10.1109/CBD54617.2021.00024
Citation Key: xu_finding_2022
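
Illustrative sketch: the abstract describes two grouping steps, extracting Q&A groups with frequent pattern mining and then purifying them by the burstiness of posting times. The Python below is a minimal sketch of that general idea only, not the authors' PBCSD implementation. Everything in it is an assumption made for illustration: the function names, the use of word pairs as the "patterns", the fixed time window, and the thresholds `min_support`, `window`, and `min_size`. The feature design and classification stages mentioned in the abstract are not covered.

```python
from collections import defaultdict
from itertools import combinations


def frequent_pairs(posts, min_support):
    """Index posts by word pair and keep pairs found in at least min_support posts.

    posts maps a post id to (list_of_words, unix_timestamp).
    Word pairs stand in for frequent patterns here (an assumption).
    """
    index = defaultdict(set)
    for post_id, (words, _timestamp) in posts.items():
        for pair in combinations(sorted(set(words)), 2):
            index[pair].add(post_id)
    return {pair: ids for pair, ids in index.items() if len(ids) >= min_support}


def densest_burst(ids, posts, window, min_size):
    """Return the largest subset of ids posted within one time window.

    Uses a sliding window over posts sorted by timestamp. Returns an empty
    set when the largest subset has fewer than min_size posts.
    """
    ordered = sorted(ids, key=lambda post_id: posts[post_id][1])
    best = []
    start = 0
    for end in range(len(ordered)):
        # Shrink the window from the left until it spans at most `window` seconds.
        while posts[ordered[end]][1] - posts[ordered[start]][1] > window:
            start += 1
        if end - start + 1 > len(best):
            best = ordered[start:end + 1]
    return set(best) if len(best) >= min_size else set()


def candidate_groups(posts, min_support=3, window=3600, min_size=3):
    """Group posts sharing a frequent word pair, then keep only the bursty part."""
    groups = {}
    for pair, ids in frequent_pairs(posts, min_support).items():
        burst = densest_burst(ids, posts, window, min_size)
        if burst:
            groups[pair] = burst
    return groups


# Toy data (invented for this sketch): four posts share "best" and "vpn",
# but only three of them were posted within the same hour.
posts = {
    "q1": (["best", "vpn", "cheap"], 0),
    "a1": (["best", "vpn", "fast"], 600),
    "a2": (["vpn", "best", "trust"], 1200),
    "a3": (["best", "vpn"], 90000),
    "q2": (["python", "list", "sort"], 100),
}
print(candidate_groups(posts))
```

On the toy data, the pair ("best", "vpn") is the only one shared by at least three posts, and the burst filter drops "a3" because it was posted far outside the one-hour window. A real system would likely use a proper frequent itemset miner and a burst definition tuned to the platform; both choices here are placeholders.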