Spam Detection Framework for Online Reviews Using Hadoop’s Computational Capability

Title: Spam Detection Framework for Online Reviews Using Hadoop’s Computational Capability
Publication Type: Conference Paper
Year of Publication: 2018
Authors: Lekshmi, M. B., Deepthi, V. R.
Conference Name: 2018 International CET Conference on Control, Communication, and Computing (IC4)
Date Published: July
Keywords: Big Data, big data processing, Business, business decisions, decision making, Distribution functions, false reviews, feature extraction, feature type extraction, Hadoop, Hadoop computational capability, Human Behavior, Internet, Labeling, learning (artificial intelligence), machine learning techniques, MapReduce, MapReduce feature, metadata, Metrics, NetSpam, online review, online reviews, online shopping, parallel programming, pattern classification, pubcrawl, reliability, retail data processing, review dataset classification, Scalability, spam detection, spam detection framework, spam detection methods, spam detection procedure, spam features, spam review, spam reviews, unsolicited e-mail
Abstract: Online reviews have become a vital element of online shopping. Organizations and individuals use this information to buy the right products and make business decisions. This has motivated spammers and unethical businesses to create false reviews that promote their products and outdo competitors. Spammers have developed sophisticated systems that can generate spam reviews in bulk on any website within hours. To tackle this problem, studies have sought effective ways to detect spam reviews. Various spam detection methods have been introduced, most of which extract meaningful features from the text or use machine learning techniques. These approaches gave little importance to the type of feature extracted or to the processing rate. NetSpam [1] defines a framework that classifies the review dataset based on spam features and maps them to a spam detection procedure that outperforms previous work in predictive accuracy. This work proposes a method that improves the processing rate by applying a distributed approach to the review dataset using MapReduce, the parallel programming model used for processing big data in Hadoop. The solution parallelises the algorithm defined in NetSpam and defines a spam detection procedure with better predictive accuracy and processing rate.
DOI: 10.1109/CETIC4.2018.8530957
Citation Key: lekshmi_spam_2018