Biblio

Filters: Keyword is spam detection
2016
Reaves, Bradley, Blue, Logan, Tian, Dave, Traynor, Patrick, Butler, Kevin R.B..  2016.  Detecting SMS Spam in the Age of Legitimate Bulk Messaging. Proceedings of the 9th ACM Conference on Security & Privacy in Wireless and Mobile Networks. :165–170.

Text messaging is used by more people around the world than any other communications technology. As such, it presents a desirable medium for spammers. While this problem has been studied by many researchers over the years, the recent increase in legitimate bulk traffic (e.g., account verification, 2FA, etc.) has dramatically changed the mix of traffic seen in this space, reducing the effectiveness of previous spam classification efforts. This paper demonstrates the performance degradation of those detectors when used on a large-scale corpus of text messages containing both bulk and spam messages. Against our labeled dataset of text messages collected over 14 months, the precision and recall of past classifiers fall to 23.8% and 61.3% respectively. However, using our classification techniques and labeled clusters, precision and recall rise to 100% and 96.8%. We not only show that our collected dataset helps to correct many of the overtraining errors seen in previous studies, but also present insights into a number of current SMS spam campaigns.
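
The paper's own pipeline and dataset are not reproduced here; as a minimal sketch of the general idea of grouping near-duplicate SMS text into clusters and labeling whole clusters, assuming scikit-learn and toy example messages (not the authors' data or features):

```python
# Minimal sketch, not the authors' pipeline: group near-duplicate SMS text with
# TF-IDF character n-grams + KMeans, then tag whole clusters by majority label.
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

messages = [
    "Your verification code is 482913",        # legitimate bulk
    "Your verification code is 771204",        # legitimate bulk
    "WIN a FREE cruise! Reply YES now",        # spam
    "WIN a FREE cruise! Reply YES today",      # spam
]
labels = ["bulk", "bulk", "spam", "spam"]      # hypothetical ground truth

vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))
X = vec.fit_transform(messages)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Assign each cluster the majority label of its labeled members.
cluster_label = {}
for c in set(km.labels_):
    members = [labels[i] for i in range(len(messages)) if km.labels_[i] == c]
    cluster_label[c] = Counter(members).most_common(1)[0][0]
print(cluster_label)
```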

Goyal, Shruti, Bindu, P. V., Thilagam, P. Santhi.  2016.  Dynamic Structure for Web Graphs with Extended Functionalities. Proceedings of the International Conference on Advances in Information Communication Technology & Computing. :46:1–46:6.

The hyperlink structure of the World Wide Web is modeled as a directed, dynamic, and huge web graph. Web graphs are analyzed for determining page rank, fighting web spam, detecting communities, and so on, by performing tasks such as clustering, classification, and reachability. These tasks involve operations such as graph navigation, checking link existence, and identifying active links, which demand scanning of entire graphs. Frequent scanning of very large graphs involves more I/O operations and memory overhead. To rectify these issues, several data structures have been proposed to represent graphs in a compact manner. Even though the problem of representing graphs has been actively studied in the literature, there has been much less focus on the representation of dynamic graphs. In this paper, we propose Tree-Dictionary-Representation (TDR), a compressed graph representation that supports the dynamic nature of graphs as well as the various graph operations. Our experimental study shows that this representation works efficiently with limited main memory use and provides fast traversal of edges.
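
TDR itself is a compressed structure described in the paper; for orientation only, an uncompressed dynamic adjacency structure supporting the operations the abstract mentions (edge insertion and deletion, link-existence checks, neighbor traversal) could be sketched as:

```python
# Uncompressed baseline sketch of a dynamic directed web graph (not TDR itself):
# supports add/remove edge, link-existence checks, and out-neighbor traversal.
from collections import defaultdict

class DynamicWebGraph:
    def __init__(self):
        self.out_edges = defaultdict(set)   # page -> set of pages it links to

    def add_edge(self, src, dst):
        self.out_edges[src].add(dst)

    def remove_edge(self, src, dst):
        self.out_edges[src].discard(dst)

    def has_link(self, src, dst):
        return dst in self.out_edges[src]

    def neighbors(self, src):
        return iter(self.out_edges[src])

g = DynamicWebGraph()
g.add_edge("a.html", "b.html")
g.add_edge("a.html", "c.html")
g.remove_edge("a.html", "c.html")
print(g.has_link("a.html", "b.html"), list(g.neighbors("a.html")))
```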

Washha, Mahdi, Qaroush, Aziz, Sedes, Florence.  2016.  Leveraging Time for Spammers Detection on Twitter. Proceedings of the 8th International Conference on Management of Digital EcoSystems. :109–116.

Twitter is one of the most popular microblogging social systems, providing a set of distinctive posting services that operate in real time. The flexibility of these services has attracted unethical individuals, so-called "spammers", aiming at spreading malicious, phishing, and misleading information. Unfortunately, the existence of spam results in non-negligible problems related to search and user privacy. In the battle against spam, various detection methods have been designed that automate the detection process using the "features" concept combined with machine learning methods. However, the existing features are not effective enough to adapt to spammers' tactics, because the features are easy to manipulate. Graph features, despite the high performance obtainable when applying them, are also not suitable for Twitter-based applications. In this paper, beyond simple statistical features such as the number of hashtags and the number of URLs, we examine the time property by advancing the design of some features used in the literature and proposing new time-based features. The new design of features is divided between robust advanced statistical features that incorporate the time attribute explicitly and behavioral features that identify any posting behavior pattern. The experimental results show that the new form of features correctly classifies the majority of spammers with an accuracy higher than 93% when using the Random Forest learning algorithm, applied on a collected and annotated data-set. The results obtained outperform the accuracy of the state-of-the-art features by about 6%, proving the significance of leveraging time in detecting spam accounts.
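
The paper defines its own feature set; a hedged sketch of the general recipe, with made-up time-based posting features and a scikit-learn Random Forest on synthetic data, might look like:

```python
# Sketch only: hand-made time-aware posting features per account, classified
# with a Random Forest. Feature names and data are illustrative, not the
# paper's feature set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def account_features(post_times, n_urls, n_hashtags):
    """post_times: sorted POSIX timestamps of an account's tweets."""
    gaps = np.diff(post_times) if len(post_times) > 1 else np.array([0.0])
    span = max(post_times[-1] - post_times[0], 1.0)
    return [
        len(post_times) / span,        # tweeting rate
        gaps.mean(),                   # mean inter-tweet gap
        gaps.std(),                    # burstiness of posting
        n_urls / len(post_times),      # URLs per tweet
        n_hashtags / len(post_times),  # hashtags per tweet
    ]

print(account_features([0, 60, 65, 70, 3600], n_urls=3, n_hashtags=5))

# Stand-in feature matrix and labels (1 = spammer) instead of a real crawl.
rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = rng.integers(0, 2, 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```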

Tong, Van, Nguyen, Giang.  2016.  A Method for Detecting DGA Botnet Based on Semantic and Cluster Analysis. Proceedings of the Seventh Symposium on Information and Communication Technology. :272–277.

Botnets play major roles in a vast number of threats to network security, such as DDoS attacks, generation of spam emails, and information theft. Detecting botnets is a difficult task due to the complexity and performance issues of analyzing the huge amount of data from real large-scale networks. In major botnet malware, the use of Domain Generation Algorithms (DGAs) lowers the chance of detection by whitelist-blacklist schemes, so DGA botnets have higher survivability. This paper proposes a DGA botnet detection scheme based on DNS traffic analysis which utilizes semantic measures such as entropy, meaning level of the domain, and frequency of n-gram appearances, together with the Mahalanobis distance, for domain classification. The proposed method improves on the Phoenix botnet detection mechanism: in the classification phase, a modified Mahalanobis distance is used instead of the original, and the clustering phase is based on a modified k-means algorithm for better effectiveness. The effectiveness of the proposed method was measured and compared with the Phoenix, Linguistic, and SVM Light methods. The experimental results show that the accuracy of the proposed botnet detection scheme ranges from 90% to 99.97% depending on the botnet type.
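
As a rough illustration of the kinds of lexical measures named in the abstract (character entropy, n-gram statistics) and of a Mahalanobis distance to a cluster of benign domains, using assumed example domains rather than the paper's data:

```python
# Illustrative only: character entropy and bigram features for domain names,
# plus Mahalanobis distance to a small cluster of benign domains.
import math
import numpy as np
from collections import Counter

def char_entropy(s):
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def features(domain):
    name = domain.split(".")[0]
    bigrams = [name[i:i + 2] for i in range(len(name) - 1)]
    return np.array([
        char_entropy(name),
        len(name),
        len(set(bigrams)) / max(len(bigrams), 1),   # bigram diversity
    ])

benign = np.array([features(d) for d in
                   ["google.com", "wikipedia.org", "amazon.com", "github.com",
                    "youtube.com", "reddit.com", "twitter.com", "openssl.org"]])
mu = benign.mean(axis=0)
cov_inv = np.linalg.pinv(np.cov(benign, rowvar=False))

def mahalanobis(x):
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

for d in ["mail.example.com", "xkqjzvplwrtmno.biz"]:   # plausible vs. DGA-like
    print(d, round(mahalanobis(features(d)), 2))
```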

Tu, Guan-Hua, Li, Chi-Yu, Peng, Chunyi, Li, Yuanjie, Lu, Songwu.  2016.  New Security Threats Caused by IMS-based SMS Service in 4G LTE Networks. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. :1118–1130.

SMS (Short Messaging Service) is a text messaging service for mobile users to exchange short text messages. It is also widely used to provide SMS-powered services (e.g., mobile banking). With the rapid deployment of all-IP 4G mobile networks, the underlying technology of SMS evolves from the legacy circuit-switched network to the IMS (IP Multimedia Subsystem) system over packet-switched network. In this work, we study the insecurity of the IMS-based SMS. We uncover its security vulnerabilities and exploit them to devise four SMS attacks: silent SMS abuse, SMS spoofing, SMS client DoS, and SMS spamming. We further discover that those SMS threats can propagate towards SMS-powered services, thereby leading to three malicious attacks: social network account hijacking, unauthorized donation, and unauthorized subscription. Our analysis reveals that the problems stem from the loose security regulations among mobile phones, carrier networks, and SMS-powered services. We finally propose remedies to the identified security issues.

Russu, Paolo, Demontis, Ambra, Biggio, Battista, Fumera, Giorgio, Roli, Fabio.  2016.  Secure Kernel Machines Against Evasion Attacks. Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security. :59–69.

Machine learning is widely used in security-sensitive settings like spam and malware detection, although it has been shown that malicious data can be carefully modified at test time to evade detection. To overcome this limitation, adversary-aware learning algorithms have been developed, exploiting robust optimization and game-theoretical models to incorporate knowledge of potential adversarial data manipulations into the learning algorithm. Although these techniques have been shown to be effective in some adversarial learning tasks, their adoption in practice is hindered by different factors, including the difficulty of meeting specific theoretical requirements, the complexity of implementation, and scalability issues in terms of the computational time and space required during training. In this work, we aim to develop secure kernel machines against evasion attacks that are not computationally more demanding than their non-secure counterparts. In particular, leveraging recent work on robustness and regularization, we show that the security of a linear classifier can be drastically improved by selecting a proper regularizer, depending on the kind of evasion attack, as well as by unbalancing the cost of classification errors. We then discuss the security of nonlinear kernel machines, and show that a proper choice of the kernel function is crucial. We also show that unbalancing the cost of classification errors and varying some kernel parameters can further improve classifier security, yielding decision functions that better enclose the legitimate data. Our results on spam and PDF malware detection corroborate our analysis.
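
The paper's construction is more involved, but the two levers the abstract highlights, choice of regularizer and unbalanced error costs, map onto standard options; a minimal scikit-learn sketch on synthetic data:

```python
# Minimal sketch, not the paper's exact construction: regularizer choice
# (l1 vs l2) and unbalanced class costs (class_weight) on synthetic data.
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# l2 spreads weight over many features; l1 yields sparse weights. Which is
# more robust depends on whether the attacker changes few or many features.
l2_clf = LinearSVC(penalty="l2", dual=False, C=1.0).fit(X, y)
l1_clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=1.0).fit(X, y)

# Penalize errors on the malicious class (label 1) more heavily.
weighted_clf = LinearSVC(class_weight={0: 1.0, 1: 5.0}, dual=False).fit(X, y)

print("l2 nonzero weights:", (l2_clf.coef_ != 0).sum())
print("l1 nonzero weights:", (l1_clf.coef_ != 0).sum())
```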

Hyun, Yoonjin, Kim, Namgyu.  2016.  Detecting Blog Spam Hashtags Using Topic Modeling. Proceedings of the 18th Annual International Conference on Electronic Commerce: E-Commerce in Smart Connected World. :43:1–43:6.

Tremendous amounts of data are generated daily. Accordingly, unstructured text data that is distributed through news, blogs, and social media has gained much attention from many researchers, as this data contains abundant information about various consumers' opinions. However, as the usefulness of text data increases, attempts to gain profits by distorting text data maliciously or non-maliciously are also increasing. In this sense, various types of spam detection techniques have been studied to prevent the side effects of spamming. The most representative studies include e-mail spam detection, web spam detection, and opinion spam detection. "Spam" is recognized on the basis of three characteristics and actions: (1) if a certain user is recognized as a spammer, then all content created by that user should be recognized as spam; (2) if certain content is exposed to other users (regardless of the users' intention), then the content is recognized as spam; and (3) any content that contains malicious or non-malicious false information is recognized as spam. Many studies have been performed to solve type (1) and type (2) spamming by analyzing various metadata, such as user networks and spam words. In the case of type (3), however, relatively few studies have been conducted because it is difficult to determine the veracity of a certain word or piece of information. In this study, we regard a hashtag that is irrelevant to the content of a blog post as spam and devise a methodology to detect such spam hashtags.
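
A toy sketch of the underlying idea (not the paper's methodology): fit a topic model on post text and flag a hashtag whose word has low probability under the post's dominant topic, using scikit-learn's LDA and invented example posts:

```python
# Toy sketch: fit LDA on post text, then score a hashtag word by its
# probability under the post's dominant topic. Data is invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

posts = [
    "new camera lens review photo shoot lighting tips",
    "camera settings for night photo low light lens",
    "easy pasta recipe garlic tomato dinner cooking",
    "cooking dinner recipe quick pasta sauce tomato",
]

vec = CountVectorizer()
X = vec.fit_transform(posts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
word_index = {w: i for i, w in enumerate(vec.get_feature_names_out())}

def hashtag_relevance(post, hashtag_word):
    """Probability of the hashtag word under the post's dominant topic."""
    topic = lda.transform(vec.transform([post]))[0].argmax()
    word_probs = lda.components_[topic] / lda.components_[topic].sum()
    return word_probs[word_index[hashtag_word]] if hashtag_word in word_index else 0.0

print(hashtag_relevance(posts[0], "lens"))     # relevant hashtag -> higher score
print(hashtag_relevance(posts[0], "pasta"))    # off-topic hashtag -> lower score
```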

Kumar, Vimal, Kumar, Satish, Gupta, Avadhesh Kumar.  2016.  Real-time Detection of Botnet Behavior in Cloud Using Domain Generation Algorithm. Proceedings of the International Conference on Advances in Information Communication Technology & Computing. :69:1–69:3.

In the last few years, the high acceptability of service computing delivered over the internet has created immense security challenges for service providers. Cyber criminals use advanced malware such as polymorphic botnets to participate in our everyday online activities and to access desired information such as personal details, credit card numbers, and banking credentials. The polymorphic botnet attack is one of the biggest attacks in the history of cybercrime, and currently millions of computers around the world are infected by botnet clients. A botnet attack is an intelligent and highly coordinated distributed attack consisting of a large number of bots that generate big volumes of spam e-mails and launch distributed denial of service (DDoS) attacks on victim machines in a heterogeneous network environment. Therefore, it is necessary to detect malicious bots and prevent their planned attacks in the cloud environment. A number of techniques have been developed in the past literature for detecting malicious bots in a network. This paper recognizes the ineffectiveness of signature-based detection and of network-traffic-based detection such as NetFlow or traffic-flow detection and anomaly-based detection. We propose a real-time malware detection methodology based on the Domain Generation Algorithm. It increases throughput in terms of early detection of malicious bots and high accuracy in identifying suspicious behavior.

Hu, Xuan, Li, Banghuai, Zhang, Yang, Zhou, Changling, Ma, Hao.  2016.  Detecting Compromised Email Accounts from the Perspective of Graph Topology. Proceedings of the 11th International Conference on Future Internet Technologies. :76–82.

While email plays a growingly important role on the Internet, we are faced with more severe challenges brought by compromised email accounts, especially for the administrators of institutional email service providers. Inspired by previous experience on spam filtering and compromised-account detection, we propose several criteria, such as Success Outdegree Proportion, Reverse PageRank, Recipient Clustering Coefficient, and Legitimate Recipient Proportion, for detecting compromised email accounts from the perspective of graph topology. Specifically, several widely used social network analysis metrics are used and adapted according to the characteristics of mail log analysis. We evaluate our methods on a dataset constructed by mining one month (30 days) of mail logs from a university with 118,617 local users and 11,460,399 mail log entries. The experimental results demonstrate that our methods achieve very positive performance, and we also show that these methods can be efficiently applied to even larger datasets.
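
The criteria themselves are defined in the paper; two of the named graph metrics, Reverse PageRank and a recipient clustering coefficient, can be approximated with networkx on a toy sender-to-recipient mail graph:

```python
# Toy illustration (not the paper's full criteria): Reverse PageRank and a
# local clustering coefficient on a small sender->recipient mail graph.
import networkx as nx

G = nx.DiGraph()
# sender -> recipient edges from a hypothetical mail log
G.add_edges_from([
    ("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
    ("carol", "alice"),
    ("compromised", "x1"), ("compromised", "x2"),
    ("compromised", "x3"), ("compromised", "x4"),   # blasts to strangers
])

# Reverse PageRank: PageRank on the reversed graph; accounts that mostly
# send and rarely receive score low.
rev_pr = nx.pagerank(G.reverse(copy=True))

# Local clustering coefficient on the undirected mail graph: a legitimate
# user's contacts tend to mail each other, a spam blast's recipients do not.
cc = nx.clustering(G.to_undirected())

for node in ["alice", "compromised"]:
    print(node, round(rev_pr[node], 3), round(cc[node], 3))
```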

Eom, Chris Soo-Hyun, Lee, Wookey, Lee, James Jung-Hun.  2016.  Spammer Detection for Real-time Big Data Graphs. Proceedings of the Sixth International Conference on Emerging Databases: Technologies, Applications, and Theory. :51–60.

In recent years, the prodigious explosion of social network services has triggered new business models. However, it also has negative aspects, such as personal information leakage and spamming. Among conventional spam detection approaches, studies based on vertex degrees or the Local Clustering Coefficient have produced false positives, so that normal vertices can be labeled as spammers. In this paper, we propose a novel approach that employs the circuit structure in social networks, and we demonstrate the advantages of our work through experiments.

2017
Douzi, Samira, Amar, Meryem, El Ouahidi, Bouabid.  2017.  Advanced Phishing Filter Using Autoencoder and Denoising Autoencoder. Proceedings of the International Conference on Big Data and Internet of Thing. :125–129.

Phishing refers to an attempt to obtain sensitive information, such as usernames, passwords, and credit card details (and, indirectly, money), for malicious reasons, by disguising as a trustworthy entity in an electronic communication [1]. Hackers and malicious users often use emails as phishing tools to obtain the personal data of legitimate users, by sending emails with authentic identities and legitimate content but also with malicious URLs, which help them steal consumers' data. High-dimensional data in the phishing context contains a large number of redundant features that significantly elevate the classification error. Additionally, the time required to perform classification increases with the number of features. So extracting complex features from phishing emails requires us to determine which features are relevant and fundamental to phishing detection. The dominant approaches to phishing are based on machine learning techniques; these rely on manual feature engineering, which is time consuming. On the other hand, deep learning is a promising alternative to traditional methods. The main idea of deep learning techniques is to learn complex features extracted from data with minimum external contribution [2]. In this paper, we propose a new phishing detection and prevention approach, based first on our previous spam filter [3] to classify the textual content of emails, and second on an Autoencoder and a Denoising Autoencoder (DAE) to extract a relevant and robust feature set from the URL (to which the website is actually directed); the feature space can therefore be reduced considerably, decreasing the phishing detection time.
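
As a rough sketch of the feature-extraction idea only (not the paper's architecture), a small Keras denoising autoencoder over a crude fixed-length character encoding of URLs, trained to reconstruct clean inputs from noisy ones, might look like:

```python
# Rough sketch: a small denoising autoencoder over fixed-length character-level
# URL encodings, used as a feature extractor. Architecture and encoding are
# assumptions, not the paper's design.
import numpy as np
from tensorflow.keras import layers, models

MAX_LEN = 64

def encode_url(url):
    """Crude fixed-length character encoding in [0, 1]; illustrative only."""
    codes = [ord(c) % 128 for c in url[:MAX_LEN]]
    codes += [0] * (MAX_LEN - len(codes))
    return np.array(codes, dtype="float32") / 127.0

urls = ["http://example.com/login", "http://paypa1-secure.example.biz/verify"] * 64
X = np.stack([encode_url(u) for u in urls])
X_noisy = (X + np.random.normal(0, 0.05, X.shape)).astype("float32")  # corrupt input

inp = layers.Input(shape=(MAX_LEN,))
code = layers.Dense(16, activation="relu")(inp)          # compressed URL features
out = layers.Dense(MAX_LEN, activation="sigmoid")(code)
dae = models.Model(inp, out)
dae.compile(optimizer="adam", loss="mse")
dae.fit(X_noisy, X, epochs=5, batch_size=16, verbose=0)  # reconstruct clean input

encoder = models.Model(inp, code)
features = encoder.predict(X[:2], verbose=0)             # reduced feature vectors
print(features.shape)
```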

Divita, Joseph, Hallman, Roger A..  2017.  An Approach to Botnet Malware Detection Using Nonparametric Bayesian Methods. Proceedings of the 12th International Conference on Availability, Reliability and Security. :75:1–75:9.

Botnet malware, which infects Internet-connected devices and seizes control for a remote botmaster, is a long-standing threat to Internet-connected users and systems. Botnets are used to conduct DDoS attacks, distributed computing (e.g., mining bitcoins), spread electronic spam and malware, conduct cyberwarfare, conduct click-fraud scams, and steal personal user information. Current approaches to the detection and classification of botnet malware include syntactic, or signature-based, and semantic, or context-based, detection techniques. Both methods have shortcomings and botnets remain a persistent threat. In this paper, we propose a method of botnet detection using Nonparametric Bayesian Methods.

Yao, Yuanshun, Viswanath, Bimal, Cryan, Jenna, Zheng, Haitao, Zhao, Ben Y..  2017.  Automated Crowdturfing Attacks and Defenses in Online Review Systems. Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. :1143–1158.

Malicious crowdsourcing forums are gaining traction as sources of spreading misinformation online, but are limited by the costs of hiring and managing human workers. In this paper, we identify a new class of attacks that leverage deep learning language models (Recurrent Neural Networks or RNNs) to automate the generation of fake online reviews for products and services. Not only are these attacks cheap and therefore more scalable, but they can control the rate of content output to eliminate the signature burstiness that makes crowdsourced campaigns easy to detect. Using Yelp reviews as an example platform, we show how a two-phased review generation and customization attack can produce reviews that are indistinguishable by state-of-the-art statistical detectors. We conduct a survey-based user study to show these reviews not only evade human detection, but also score high on "usefulness" metrics by users. Finally, we develop novel automated defenses against these attacks by leveraging the lossy transformation introduced by the RNN training and generation cycle. We consider countermeasures against our mechanisms, show that they produce unattractive cost-benefit tradeoffs for attackers, and that they can be further curtailed by simple constraints imposed by online service providers.

Azad, Muhammad Ajmal, Bag, Samiran.  2017.  Decentralized Privacy-aware Collaborative Filtering of Smart Spammers in a Telecommunication Network. Proceedings of the Symposium on Applied Computing. :1711–1717.

Smart spammers and telemarketers circumvent standalone spam detection systems by making low-rate spamming activity to a large number of recipients distributed across many telecommunication operators. Collaboration among multiple telecommunication operators (OPs) would allow operators to get rid of unwanted callers at an early stage of their spamming activity. The challenge in designing a collaborative spam detection system is that OPs are not willing to share certain information about the behaviour of their users/customers because of privacy concerns. Ideally, operators would agree to share certain aggregated statistical information if the collaboration process ensured complete privacy protection of users and their network data. To address this challenge and convince OPs to collaborate, this paper proposes a decentralized reputation aggregation protocol that enables OPs to take part in a collaboration process without the use of a trusted third-party centralized system and without developing a predefined trust relationship with other OPs. To this end, collaboration among operators is achieved through the exchange of cryptographic reputation scores among OPs, thus fully protecting the relationship network and reputation scores of users even in the presence of colluders. We evaluate the performance of the proposed protocol on simulated data consisting of five collaborators. Experimental results reveal that the proposed approach outperforms standalone systems in terms of true positive rate and false positive rate.
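
The actual protocol is cryptographic and specified in the paper; as a loose illustration of the privacy goal, that collaborators only ever learn an aggregate and never another operator's raw per-user score, additive secret sharing over a modulus can be sketched (values are made up):

```python
# Loose illustration of the privacy goal, not the paper's protocol: each
# operator splits its local reputation score for a caller into random additive
# shares, so only the sum across operators is ever reconstructed.
import random

MOD = 2 ** 61 - 1
SCALE = 1000          # fixed-point scaling for fractional scores

def share(score, n_parties):
    value = int(score * SCALE) % MOD
    shares = [random.randrange(MOD) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares

operators_local_scores = [0.9, 0.2, 0.7, 0.1, 0.4]   # 5 collaborating operators
n = len(operators_local_scores)

# Each operator sends one share to every other operator; each party only sees
# random-looking values, never another operator's raw score.
all_shares = [share(s, n) for s in operators_local_scores]
partial_sums = [sum(all_shares[i][j] for i in range(n)) % MOD for j in range(n)]
total = sum(partial_sums) % MOD
print("aggregated reputation:", total / SCALE)
```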

García-Recuero, Álvaro.  2017.  Efficient Privacy-preserving Adversarial Learning in Decentralized Online Social Networks. Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017. :1132–1135.

In the last decade we have witnessed a more than prolific growth of online social media content in sites designed for online social interactions. These systems have traditionally been designed as centralized silos, which unfortunately suffer from abusive behavior ranging from spam and cyberbullying to even censorship. This paper investigates the utility of supervised learning techniques for abuse detection in future decentralized settings, where less metadata remains available for use in learning algorithms. We present a method that uses a privacy-preserving protocol to exchange a fingerprint of the neighborhood of a pair of nodes, namely sender and receiver. Our method extracts social graph metadata to form a subset of key features, namely neighborhood knowledge, some of which we approximate to reduce the communication and computational requirements of such a protocol. In our benchmarking we show that a data minimization approach can obtain features 13% faster while providing similar or, as with the SVM classifier, even better abuse detection rates with just approximated Private Set Intersection.

Jianyu, Wang, Chunming, Wu, Shouling, Ji, Qinchen, Gu, Zhao, Li.  2017.  Fraud Detection via Coding Nominal Attributes. Proceedings of the 2017 2nd International Conference on Multimedia Systems and Signal Processing. :42–45.

Research on advertisement has mainly focused on how to accurately predict the click-through rate (CTR). Much less is known about fraud detection and defense against malicious behavior. Previous studies usually use statistics, design thresholds, and manually craft strategies, which cannot find potential fraud behavior effectively and suffer from new attacks. In this paper, we take the first step towards understanding the types of malicious activities on large-scale online advertising platforms. By analyzing each feature comprehensively, we propose a novel coding approach to transform nominal attributes into numeric ones while maintaining the most effective information of the original data for fraud detection. Next, we code important features such as IP and cookie in our dataset and train machine learning methods to detect fraud traffic automatically. Experimental results on real datasets demonstrate that the proposed fraud detection method performs well in terms of both accuracy and efficiency. Finally, we discuss how to design a defense system and which methods could be used for anti-spam gaming in the future.
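
The coding scheme is the paper's contribution; a generic stand-in that conveys the idea of turning nominal attributes such as IP or cookie into informative numeric values, here via simple frequency and per-category click statistics on invented data:

```python
# Generic stand-in for the idea (not the paper's coding scheme): map nominal
# attributes like IP or cookie to numeric values via frequency and per-category
# label statistics computed on training data.
import pandas as pd

train = pd.DataFrame({
    "ip":    ["1.2.3.4", "1.2.3.4", "5.6.7.8", "9.9.9.9", "1.2.3.4"],
    "click": [1, 1, 0, 0, 1],      # 1 = clicked (possibly fraudulent traffic)
})

freq = train["ip"].value_counts(normalize=True)          # how common each IP is
click_rate = train.groupby("ip")["click"].mean()         # per-IP click-through

train["ip_freq"] = train["ip"].map(freq)
train["ip_click_rate"] = train["ip"].map(click_rate)
print(train[["ip", "ip_freq", "ip_click_rate"]])
```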

Carpineto, Claudio, Romano, Giovanni.  2017.  Learning to Detect and Measure Fake Ecommerce Websites in Search-engine Results. Proceedings of the International Conference on Web Intelligence. :403–410.

When searching for a brand name in search engines, one is very likely to come across websites that sell counterfeit products of that brand. In this paper, we study how to tackle and measure this problem automatically. Our solution consists of a pipeline with two learning stages. We first detect the ecommerce websites (including shopbots) present in the list of search results and then discriminate between legitimate and fake ecommerce websites. We identify suitable learning features for each stage and show through a prototype system termed RI.SI.CO. that this approach is feasible, fast, and highly effective. Experimenting with one goods sector, we found that RI.SI.CO. achieved better classification accuracy than non-expert humans. We next show that the information extracted by our method can be used to generate sector-level 'counterfeiting charts' that allow us to analyze and compare the counterfeit risk associated with different brands in the same sector. We also show that the risk of coming across counterfeit websites is affected by the particular web search engine and type of search query used by shoppers. Our research offers new insights and some very practical and useful means for analyzing and measuring counterfeit ecommerce websites in search-engine results, thus enabling targeted anti-counterfeiting actions.

Muñoz-González, Luis, Biggio, Battista, Demontis, Ambra, Paudice, Andrea, Wongrassamee, Vasin, Lupu, Emil C., Roli, Fabio.  2017.  Towards Poisoning of Deep Learning Algorithms with Back-Gradient Optimization. Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security. :27–38.

A number of online services nowadays rely upon machine learning to extract valuable information from data collected in the wild. This exposes learning algorithms to the threat of data poisoning, i.e., a coordinated attack in which a fraction of the training data is controlled by the attacker and manipulated to subvert the learning process. To date, these attacks have been devised only against a limited class of binary learning algorithms, due to the inherent complexity of the gradient-based procedure used to optimize the poisoning points (a.k.a. adversarial training examples). In this work, we first extend the definition of poisoning attacks to multiclass problems. We then propose a novel poisoning algorithm based on the idea of back-gradient optimization, i.e., to compute the gradient of interest through automatic differentiation, while also reversing the learning procedure to drastically reduce the attack complexity. Compared to current poisoning strategies, our approach is able to target a wider class of learning algorithms, trained with gradient-based procedures, including neural networks and deep learning architectures. We empirically evaluate its effectiveness on several application examples, including spam filtering, malware detection, and handwritten digit recognition. We finally show that, similarly to adversarial test examples, adversarial training examples can also be transferred across different learning algorithms.

Al-Janabi, Mohammed, Quincey, Ed de, Andras, Peter.  2017.  Using Supervised Machine Learning Algorithms to Detect Suspicious URLs in Online Social Networks. Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017. :1104–1111.

The increasing volume of malicious content in social networks requires automated methods to detect and eliminate such content. This paper describes a supervised machine learning classification model that has been built to detect the distribution of malicious content in online social networks (OSNs). Multisource features have been used to detect social network posts that contain malicious Uniform Resource Locators (URLs). These URLs could direct users to websites that contain malicious content, drive-by download attacks, phishing, spam, and scams. For the data collection stage, the Twitter streaming application programming interface (API) was used, and VirusTotal was used for labelling the dataset. A random forest classification model was used with a combination of features derived from a range of sources. The random forest model without any tuning and feature selection produced a recall value of 0.89. After further investigation and applying parameter tuning and feature selection methods, however, we were able to improve the classifier performance to 0.92 in recall.

2018
Liu, Ninghao, Yang, Hongxia, Hu, Xia.  2018.  Adversarial Detection with Model Interpretation. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. :1803–1811.

Machine learning (ML) systems have been increasingly applied in web security applications such as spammer detection, malware detection, and fraud detection. These applications have an intrinsic adversarial nature where intelligent attackers can adaptively change their behaviors to avoid being detected by the deployed detectors. Existing efforts against adversaries are usually limited by the type of applied ML models or the specific applications, such as image classification. Additionally, the working mechanisms of ML models usually cannot be well understood by users, which in turn impedes them from understanding the vulnerabilities of models or improving their robustness. To bridge the gap, in this paper we propose to investigate whether model interpretation could potentially help adversarial detection. Specifically, we develop a novel adversary-resistant detection framework by utilizing the interpretation of ML models. The interpretation process explains the mechanism of how the target ML model makes its prediction for a given instance, thus providing more insights for crafting adversarial samples. The robustness of the detectors is then improved through adversarial training with the adversarial samples. A data-driven method is also developed to empirically estimate the costs of adversaries in feature manipulation. Our approach is model-agnostic and can be applied to various types of classification models. Our experimental results on two real-world datasets demonstrate the effectiveness of interpretation-based attacks and how the estimated feature manipulation cost would affect the behavior of adversaries.

Ho, Kenny, Liesaputra, Veronica, Yongchareon, Sira, Mohaghegh, Mahsa.  2018.  Evaluating Social Spammer Detection Systems. Proceedings of the Australasian Computer Science Week Multiconference. :18:1–18:7.

The rising popularity of social network services, such as Twitter, has attracted many spammers and created a large number of fake accounts, overwhelming legitimate users with advertising, malware, and unwanted and disruptive information. This not only inconveniences users' social activities but also causes financial loss and privacy issues. Identifying social spammers is challenging because spammers continually change their strategies to fool existing anti-spamming systems. Thus, many researchers have tried to propose new classification systems using various types of features extracted from the content and user information. However, no comprehensive comparative study has been done to compare the effectiveness and efficiency of the existing systems. At this stage, it is hard to know what the best anti-spamming system is and why. This paper proposes a unified evaluation workbench that allows researchers to access various user- and content-based features, implement new features, and evaluate and compare the performance of their systems against existing systems. Through our analysis, we can identify the most effective and efficient social spammer detection features and help develop a faster and more accurate classifier model with higher true positives and lower false positives.

Gupta, M., Bakliwal, A., Agarwal, S., Mehndiratta, P..  2018.  A Comparative Study of Spam SMS Detection Using Machine Learning Classifiers. 2018 Eleventh International Conference on Contemporary Computing (IC3). :1–7.

With technological advancements and the increase in content-based advertisement, the use of the Short Message Service (SMS) on phones has grown to such a significant level that devices are sometimes flooded with spam SMS messages. These spam messages can also lead to loss of private data. There are many content-based machine learning techniques which have proven to be effective in filtering spam emails. Modern-day researchers have used some stylistic features of text messages to classify them as ham or spam. SMS spam detection can be greatly influenced by the presence of known words, phrases, abbreviations, and idioms. This paper aims to compare different classification techniques on different datasets collected from previous research works, and to evaluate them on the basis of their accuracy, precision, recall, and CAP curve. The comparison is performed between traditional machine learning techniques and deep learning methods.

Peng, W., Huang, L., Jia, J., Ingram, E..  2018.  Enhancing the Naive Bayes Spam Filter Through Intelligent Text Modification Detection. 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :849–854.

Spam emails have been a chronic issue in computer security. They are very costly economically and extremely dangerous for computers and networks. Despite the emergence of social networks and other Internet-based information exchange venues, dependence on email communication has increased over the years, and this dependence has resulted in an urgent need to improve spam filters. Although many spam filters have been created to help prevent spam emails from entering a user's inbox, there is a lack of research focusing on text modifications. Currently, Naive Bayes is one of the most popular methods of spam classification because of its simplicity and efficiency. Naive Bayes is also very accurate; however, it is unable to correctly classify emails when they contain leetspeak or diacritics. Thus, in this paper, we implement a novel algorithm for enhancing the accuracy of the Naive Bayes spam filter so that it can detect text modifications and correctly classify the email as spam or ham. Our Python algorithm combines semantic-based, keyword-based, and machine learning algorithms to increase the accuracy of Naive Bayes compared to SpamAssassin by over two hundred percent. Additionally, we have discovered a relationship between the length of the email and the spam score, indicating that Bayesian poisoning, a controversial topic, is actually a real phenomenon utilized by spammers.
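
The combined algorithm is the paper's own; the core preprocessing idea, undoing leetspeak and diacritics before Bayesian scoring, is easy to sketch with scikit-learn and a made-up substitution table:

```python
# Sketch of the preprocessing idea only: normalize leetspeak and diacritics so
# a standard Naive Bayes spam filter sees the unobfuscated tokens.
import unicodedata
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical leetspeak substitution table; the paper's mapping may differ.
LEET = str.maketrans({"0": "o", "1": "l", "3": "e", "4": "a",
                      "5": "s", "7": "t", "@": "a", "$": "s"})

def normalize(text):
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))  # drop accents
    return text.lower().translate(LEET)

train_texts = ["cheap meds online buy now", "meeting moved to friday",
               "win a free prize today", "lunch at noon tomorrow"]
train_labels = [1, 0, 1, 0]    # 1 = spam

vec = CountVectorizer()
X = vec.fit_transform(normalize(t) for t in train_texts)
clf = MultinomialNB().fit(X, train_labels)

obfuscated = "ch3ap m3d$ 0nline buy n0w"   # leetspeak evasion attempt
print(clf.predict(vec.transform([normalize(obfuscated)])))
```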

Vishagini, V., Rajan, A. K..  2018.  An Improved Spam Detection Method with Weighted Support Vector Machine. 2018 International Conference on Data Science and Engineering (ICDSE). :1–5.

Email is the most admired method of exchanging messages using the Internet. One of the challenges facing email users is detecting the spam they receive, which can be addressed using different detection and filtering techniques. Machine learning algorithms, especially the Support Vector Machine (SVM), can play a vital role in spam detection. We propose the use of a weighted SVM for spam filtering, using weight variables obtained by the KFCM algorithm. The weight variables reflect the importance of the different classes, and the misclassification of emails is reduced as the weight values grow. We evaluate the impact on spam detection of SVM, WSVM with KPCM, and WSVM with KFCM. The UCI Repository SMS Spam Base dataset is used for our experimentation.
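
The KFCM weight computation is the paper's detail; how per-sample weights enter an SVM in practice can be shown with scikit-learn's sample_weight argument (the weights below are stand-ins, not KFCM outputs):

```python
# Sketch only: training an SVM with per-sample weights (stand-ins for the
# KFCM-derived weights described in the abstract) via sample_weight.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hypothetical weights: emphasize the spam class (label 1) to reduce its
# misclassification, as the abstract describes.
weights = np.where(y == 1, 2.0, 1.0)

plain = SVC(kernel="rbf").fit(X, y)
weighted = SVC(kernel="rbf").fit(X, y, sample_weight=weights)
print(plain.score(X, y), weighted.score(X, y))
```
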
Ali, S. S., Maqsood, J..  2018.  .Net library for SMS spam detection using machine learning: A cross platform solution. 2018 15th International Bhurban Conference on Applied Sciences and Technology (IBCAST). :470–476.

Short Message Service (SMS) is nowadays the most used way of communicating in the electronic world. While much research exists on email spam detection, there is little insight into spam within SMS messages. This might be because the frequency of spam in these short messages is much lower than in emails. This paper presents different ways of analyzing SMS spam and a new pre-processing method to obtain an actual dataset of spam messages. This dataset was then used with different algorithms to find the best-performing algorithm in terms of both accuracy and recall. The Random Forest algorithm was then implemented in a real-world application library written in C# for cross-platform .NET development. This library is capable of using a prebuilt model to classify new data as spam or ham.