Biblio
In recent years, websites that incorporate user reviews, such as Amazon, IMDB and YELP, have become exceedingly popular. As an important factor affecting users purchasing behavior, review information has been becoming increasingly important, and accordingly, the reliability of review information becomes an important issue. This paper proposes a method to more accurately detect the appearance period of spam reviews and to identify the spam reviews by verifying the consistency of review information among multiple review sites. Evaluation experiments were conducted to show the accuracy of the detection results, and compared the newly proposed method with our previously proposed method.
Recent Development in Hardware and Software Technology for the communication email is preferred. But due to the unbidden emails, it affects communication. There is a need for detection and classification of spam email. In this present research email spam detection and classification, models are built. We have used different Machine learning classifiers like Naive Bayes, SVM, KNN, Bagging and Boosting (Adaboost), and Ensemble Classifiers with a voting mechanism. Evaluation and testing of classifiers is performed on email spam dataset from UCI Machine learning repository and Kaggle website. Different accuracy measures like Accuracy Score, F measure, Recall, Precision, Support and ROC are used. The preliminary result shows that Ensemble Classifier with a voting mechanism is the best to be used. It gives the minimum false positive rate and high accuracy.
Spam is a genuine and irritating issue for quite a longtime. Despite the fact that a lot of arrangements have been advanced, there still remains a considerable measure to be advanced in separating spam messages all the more proficiently. These days a noteworthy issue in spam separating also as content characterization in common dialect handling is the colossal size of vector space because of the various element terms, which is normally the reason for broad figuring and moderate order. Extracting semantic implications from the substance of writings and utilizing these as highlight terms to develop the vector space, rather than utilizing words as highlight terms in convention ways, could decrease the component of vectors viably and advance the characterization in the meantime. In spite of the fact that there are a wide range of techniques to square spam messages, a large portion of program designers just mean to square spam messages from being conveyed to their customers. In this paper, we present an effective way to deal with keep spam messages from being exchanged.In this work, a Collaborative filtering approach with semantics-based text classification technology was proposed and the related feature terms were selected from the semantic meanings of the text content.
A robust and reliable system of detecting spam reviews is a crying need in todays world in order to purchase products without being cheated from online sites. In many online sites, there are options for posting reviews, and thus creating scopes for fake paid reviews or untruthful reviews. These concocted reviews can mislead the general public and put them in a perplexity whether to believe the review or not. Prominent machine learning techniques have been introduced to solve the problem of spam review detection. The majority of current research has concentrated on supervised learning methods, which require labeled data - an inadequacy when it comes to online review. Our focus in this article is to detect any deceptive text reviews. In order to achieve that we have worked with both labeled and unlabeled data and proposed deep learning methods for spam review detection which includes Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN) and a variant of Recurrent Neural Network (RNN) that is Long Short-Term Memory (LSTM). We have also applied some traditional machine learning classifiers such as Nave Bayes (NB), K Nearest Neighbor (KNN) and Support Vector Machine (SVM) to detect spam reviews and finally, we have shown the performance comparison for both traditional and deep learning classifiers.
Comment spam is one of the great challenges faced by forum administrators. Detecting and blocking comment spam can relieve the load on servers, improve user experience and purify the network conditions. This paper focuses on the detection of comment spam. The behaviors of spammer and the content of spam were analyzed. According to analysis results, two types of effective features are extracted which can make a better description of spammer characteristics. Additionally, a gradient boosting tree algorithm was used to construct the comment spam detector based on the extracted features. Our proposed method is examined on a blog spam dataset which was published by previous research, and the result illustrates that our method performs better than the previous method on detection accuracy. Moreover, the CPU time is recorded to demonstrate that the time spent on both training and testing maintains a small value.
Emails are the fundamental unit of web applications. There is an exponential growth in sending and receiving emails online. However, spam mail has turned into an intense issue in email correspondence condition. There are number of substance based channel systems accessible to be specific content based filter(CBF), picture based sifting and many other systems to channel spam messages. The existing technological solution consists of a combination of porter stemer algorithm(PSA) and k means clustering which is adaptive in nature. These procedures are more expensive in regard of the calculation and system assets as they required the examination of entire spam message and calculation of the entire substance of the server. These are the channels must additionally not powerful in nature life on the grounds that the idea of spam block mail and spamming changes much of the time. We propose a starting point based spam mail-sifting system benefit, which works considering top head notcher data of the mail message paying little respect to the body substance of the mail. It streamlines the system and server execution by increasing the precision, recall and accuracy than the existing methods. To design an effective and efficient of autonomous and efficient spam detection system to improve network performance from unknown privileged user attacks.
We propose a new spam detection approach based solely on meta data features gained from email headers. The approach achieves above 99 % classification accuracy on the CSDMC2010 dataset, which matches or surpasses state-of-the-art spam classifiers. We utilize a static set of engineered features, supplemented with automatically extracted features. The approach is just as effective for spam detection in end-to-end encryption, as our feature set remains unchanged for encrypted emails. In contrast to most established spam detectors, we disregard the email body completely and can therefore deliver very high classification speeds, as computationally expensive text preprocessing is not necessary.
Short messages usage has been tremendously increased such as SMS, tweets and status updates. Due to its popularity and ease of use, many companies use it for advertisement purpose. Hackers also use SMS to defraud users and steal personal information. In this paper, the use of Graphs centrality metrics is proposed for spam SMS detection. The graph centrality measures: degree, closeness, and eccentricity are used for classification of SMS. Graphs for each class are created using labeled SMS and then unlabeled SMS is classified using the centrality scores of the token available in the unclassified SMS. Our results show that highest precision and recall is achieved by using degree centrality. Degree centrality achieved the highest precision i.e. 0.81 and recall i.e., 0.76 for spam messages.
This paper presents the details of the roving proxy framework for SMS spam and SMS phishing (SMishing) detection. The framework aims to protect organizations and enterprises from the danger of SMishing attacks. Feasibility and functionality studies of the framework are presented along with an update process study to define the minimum requirements for the system to adapt with the latest spam and SMishing trends.
Spams are unsolicited and unnecessary messages which may contain harmful codes or links for activation of malicious viruses and spywares. Increasing popularity of social networks attracts the spammers to perform malicious activities in social networks. So an efficient spam detection method is necessary for social networks. In this paper, feed forward neural network with back propagation based spam detection model is proposed. The quality of the learning process is improved by tuning initial weights of feed forward neural network using proposed enhanced step size firefly algorithm which reduces the time for finding optimal weights during the learning process. The model is applied for twitter dataset and the experimental results show that, the proposed model performs well in terms of accuracy and detection rate and has lower false positive rate.
E-mail is widespread and an essential communication technology in modern times. Since e-mail has problems with spam mails and spoofed e-mails, countermeasures are required. Although SPF, DKIM and DMARC have been proposed as sender domain authentication, these mechanisms cannot detect non-spoofing spam mails. To overcome this issue, this paper proposes a method to detect spam domains by supervised learning with features extracted from e-mail reception log and active DNS data, such as the result of Sender Authentication, the Sender IP address, the number of each DNS record, and so on. As a result of the experiment, our method can detect spam domains with 88.09% accuracy and 97.11% precision. We confirmed that our method can detect spam domains with detection accuracy 19.40% higher than the previous study by utilizing not only active DNS data but also e-mail reception log in combination.
E-mail communication is one of today's indispensable communication ways. The widespread use of email has brought about some problems. The most important one of these problems are spam (unwanted) e-mails, often composed of advertisements or offensive content, sent without the recipient's request. In this study, it is aimed to analyze the content information of e-mails written in Turkish with the help of Naive Bayes Classifier and Vector Space Model from machine learning methods, to determine whether these e-mails are spam e-mails and classify them. Both methods are subjected to different evaluation criteria and their performances are compared.