Biblio

Filters: Keyword is Blogs
2023-09-20
Rawat, Amarjeet, Maheshwari, Himani, Khanduja, Manisha, Kumar, Rajiv, Memoria, Minakshi, Kumar, Sanjeev.  2022.  Sentiment Analysis of Covid19 Vaccines Tweets Using NLP and Machine Learning Classifiers. 2022 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COM-IT-CON). 1:225–230.
Sentiment Analysis (SA) is an approach for detecting subjective information such as thoughts, outlooks, reactions, and emotional state. Most previous SA work treats it as a text-classification problem that requires labelled input to train the model. However, obtaining a tagged dataset is difficult and usually must be done by hand. Another concern is that the lack of cross-domain portability makes it hard to reuse the same labelled data across applications, so data must be classified manually for each domain. This research work applies sentiment analysis to evaluate the entire vaccine Twitter dataset. The work involves lexicon analysis using NLP libraries such as neattext and textblob, and multi-class classification using BERT. This work then evaluates and compares the results of the machine learning algorithms.
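
The lexicon step described in the abstract can be sketched with a tiny hand-rolled polarity scorer. The word lists below are toy stand-ins, not the paper's neattext/textblob pipeline or its lexicon:

```python
# Illustrative lexicon-based polarity scorer; POSITIVE/NEGATIVE are toy
# word lists, assumed for the sketch rather than taken from the paper.
POSITIVE = {"safe", "effective", "good", "grateful", "relieved"}
NEGATIVE = {"fear", "bad", "sick", "worried", "dangerous"}

def polarity(tweet: str) -> float:
    """Return a score in [-1, 1]: (pos - neg) / matched words."""
    words = [w.strip(".,!?#@").lower() for w in tweet.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    matched = pos + neg
    return 0.0 if matched == 0 else (pos - neg) / matched

def label(tweet: str) -> str:
    s = polarity(tweet)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"
```

A real pipeline would clean the tweets first (as neattext does) and feed the labelled output to a classifier such as BERT.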
2023-04-14
Zuo, Xiaojiang, Wang, Xiao, Han, Rui.  2022.  An Empirical Analysis of CAPTCHA Image Design Choices in Cloud Services. IEEE INFOCOM 2022 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). :1–6.
Cloud services use CAPTCHAs to protect themselves from malicious programs. With the explosive development of AI technology and the emergence of third-party recognition services, the factors that influence a CAPTCHA's security are becoming more complex. In this situation, evaluating the security of mainstream CAPTCHAs in cloud services can guide providers toward better CAPTCHA design choices. In this paper, we evaluate and analyze the security of six mainstream CAPTCHA image designs in public cloud services. Based on the evaluation results, we make suggestions on CAPTCHA image design choices to cloud service providers. In addition, we particularly discuss the CAPTCHA images adopted by Facebook and Twitter. The evaluations are separated into two stages: (i) using AI techniques alone, and (ii) using both AI techniques and third-party services. The former is based on open-source models; the latter is conducted under our proposed framework, CAPTCHAMix.
2023-02-17
Das, Lipsa, Ahuja, Laxmi, Pandey, Adesh.  2022.  Analysis of Twitter Spam Detection Using Machine Learning Approach. 2022 3rd International Conference on Intelligent Engineering and Management (ICIEM). :764–769.
Online social networks (OSNs) are very popular among Internet users, who use these platforms to find new connections and to share their activities and thoughts. Twitter is one such platform, and a very popular one: surveys report more than 310 million monthly active users posting over 500 million tweets per day, which attracts spammers and cyber-criminals who misuse the platform for malicious gain. Common spammer activities include product advertisement, phishing legitimate users, propagating pornography, hijacking trending news, and sharing malicious links to lure victims for profit. In August 2014, Twitter disclosed that 8.5% of its monthly active users, approximately 23 million accounts, had automatically contacted its servers for regular updates. Thus, a spam-free Twitter environment requires detecting and filtering these spammers from the legitimate users. In this research paper, the effectiveness and features of Twitter spam detection are summarized, and various methods are presented with their benefits and limitations.
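
Many of the surveyed detectors start from hand-crafted per-tweet features. A minimal sketch of such feature extraction, with a feature set assumed for illustration rather than taken from any one surveyed method:

```python
import re

# Illustrative per-tweet features of the kind used in Twitter spam
# detection; the feature set is an assumption for this sketch.
def tweet_features(text: str) -> dict:
    words = text.split()
    urls = len(re.findall(r"https?://\S+", text))
    return {
        "n_words": len(words),
        "n_urls": urls,
        "n_hashtags": text.count("#"),
        "n_mentions": text.count("@"),
        "url_ratio": urls / max(len(words), 1),
    }

f = tweet_features("Win $$$ now http://spam.example #win #free @you")
```

A classifier (e.g., a decision tree or SVM) would then be trained on vectors like `f` labelled spam/ham.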
2022-11-08
Drakopoulos, Georgios, Giannoukou, Ioanna, Mylonas, Phivos, Sioutas, Spyros.  2020.  A Graph Neural Network For Assessing The Affective Coherence Of Twitter Graphs. 2020 IEEE International Conference on Big Data (Big Data). :3618–3627.
Graph neural networks (GNNs) are an emerging class of iterative connectionist models that take full advantage of the interaction patterns in an underlying domain. Depending on their configuration, GNNs aggregate local state information to obtain robust estimates of global properties. Since graphs inherently represent high-dimensional data, GNNs can effectively perform dimensionality reduction for certain aggregator selections. One such task is assigning sentiment polarity labels to the vertices of a large social network based on local ground-truth state vectors containing structural, functional, and affective attributes. Emotions have long been identified as key factors in overall social network resiliency, and determining such labels robustly would be a major indicator of it. As a concrete example, the proposed methodology has been applied to two benchmark graphs obtained from political Twitter, with topic sampling covering the Greek 1821 Independence Revolution and the US 2020 Presidential Elections. Based on the results, recommendations for researchers and practitioners are offered.
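
The aggregation step at the heart of a GNN can be sketched in a few lines. The graph, the one-dimensional state vectors, and the plain mean "update" below are toy assumptions; a real GNN applies a learned transformation after aggregating:

```python
# One round of neighborhood mean aggregation, the core GNN step the
# abstract describes. Graph and features are illustrative toy values.
def aggregate(adj, h):
    """adj: {node: [neighbors]}, h: {node: [float, ...]} -> new states."""
    new_h = {}
    for v, nbrs in adj.items():
        msgs = [h[u] for u in nbrs] + [h[v]]          # include self state
        dim = len(h[v])
        new_h[v] = [sum(m[i] for m in msgs) / len(msgs) for i in range(dim)]
    return new_h

adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
h = {"a": [1.0], "b": [0.0], "c": [-1.0]}
h1 = aggregate(adj, h)
```

Iterating this smooths local sentiment states across the graph, which is how vertex polarity labels can be estimated from neighbors' ground truth.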
2022-09-09
Cardaioli, Matteo, Conti, Mauro, Sorbo, Andrea Di, Fabrizio, Enrico, Laudanna, Sonia, Visaggio, Corrado A..  2021.  It’s a Matter of Style: Detecting Social Bots through Writing Style Consistency. 2021 International Conference on Computer Communications and Networks (ICCCN). :1–9.
Social bots are computer algorithms able to autonomously produce content and interact with other users on social media, trying to emulate and possibly influence humans’ behavior. Indeed, bots are largely employed for malicious purposes, like spreading disinformation and conditioning electoral campaigns. Nowadays, bots’ capability of emulating human behaviors has become increasingly sophisticated, making their detection harder. In this paper, we aim to recognize bot-driven accounts by evaluating the consistency of users’ writing style over time. In particular, we leverage the intuition that while bots compose posts according to fairly deterministic processes, humans are influenced by subjective factors (e.g., emotions) that can alter their writing style. To verify this assumption, using stylistic consistency indicators, we characterize the writing style of more than 12,000 bot-driven and human-operated Twitter accounts and find statistically significant differences between the different types of users. We then evaluate the effectiveness of different machine learning (ML) algorithms based on stylistic-consistency features in discerning between human-operated and bot-driven Twitter accounts, and show that the experimented ML algorithms can achieve high performance (i.e., F-measure values up to 98%) in social bot detection tasks.
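
One toy stylistic-consistency indicator, assumed purely for illustration (the paper uses a richer feature set), is the variance of mean word length across an account's posts; a near-deterministic posting process yields low variance:

```python
import statistics

# Toy consistency indicator: variance of mean word length per post.
# Low variance suggests a deterministic, bot-like generator; the sample
# posts below are invented for the sketch.
def style_consistency(posts):
    means = [statistics.mean(len(w) for w in p.split()) for p in posts]
    return statistics.pvariance(means)

bot = ["buy now cheap", "buy now fast", "buy now easy"]
human = ["ugh long day...", "absolutely thrilled about tomorrow!!", "ok"]
```

Feeding several such indicators into a standard ML classifier is the detection setup the abstract describes.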
2022-05-19
Fareed, Samsad Beagum Sheik.  2021.  API Pipeline for Visualising Text Analytics Features of Twitter Texts. 2021 International Conference of Women in Data Science at Taif University (WiDSTaif). :1–6.
Twitter text analysis is quite useful for analysing the emotions, sentiments, and feedback of consumers on products and services. This helps service providers and manufacturers improve their products and services, address serious issues before they lead to a crisis, and improve business acumen. Twitter texts also form a data source for various research studies, where they are used in topic analysis, sentiment analysis, content analysis, and thematic analysis. In this paper, we present a pipeline for searching, analysing, and visualizing the text-analytics features of Twitter texts using web APIs. It allows researchers and other interested users to build a simple yet powerful Twitter text-analytics tool.
2022-02-25
Bolbol, Noor, Barhoom, Tawfiq.  2021.  Mitigating Web Scrapers using Markup Randomization. 2021 Palestinian International Conference on Information and Communication Technology (PICICT). :157–162.

Web scraping is the technique of extracting desired data in an automated way by scanning a website's internal links and content, an activity usually performed by systematically programmed bots. This paper explains our proposed solution for protecting blog content from theft and from being copied to other destinations by mitigating scraping bots. To achieve this, we apply two steps at two levels. First, at the main blog-page level, we hinder crawler bots by adding extra empty article anchors among the real articles; second, at the article-page level, we add a random number of empty, hidden spans with randomly generated text throughout the article's body. To assess this solution, we applied it to a local project developed in PHP with the Laravel framework and defined four criteria to measure its effectiveness. The results show that file size is essentially unchanged before and after applying the solution, and that processing time increases by only a few milliseconds, which remains acceptable. Using an HTML-similarity tool, we obtained very good results showing that the style is preserved, with only slight changes to the structure. Finally, to assess the effect on bots, a scraper bot was run again and received the expected (poisoned) results from the programmed middleware. These results show that the solution is feasible to adopt for protecting blog content.
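
The second step, scattering hidden randomly filled spans through the article body, can be sketched as below. The `decoy` class name, the span markup, and the counts are illustrative assumptions; the paper's middleware is PHP/Laravel, and a real deployment would also hide the spans via CSS:

```python
import random
import string

# Sketch of markup randomization: interleave hidden, randomly filled
# spans with real paragraphs to poison naive scrapers.
def randomize(paragraphs, seed=None):
    rng = random.Random(seed)
    out = []
    for p in paragraphs:
        out.append("<p>" + p + "</p>")
        for _ in range(rng.randint(1, 3)):        # random number of decoys
            junk = "".join(rng.choice(string.ascii_lowercase) for _ in range(12))
            out.append('<span class="decoy" hidden>' + junk + "</span>")
    return "\n".join(out)

html = randomize(["Real article text.", "More real text."], seed=7)
```

A scraper that blindly concatenates text nodes now collects the junk spans along with the article, while browsers render only the visible paragraphs.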

2021-11-29
Gupta, Hritvik, Patel, Mayank.  2020.  Study of Extractive Text Summarizer Using The Elmo Embedding. 2020 Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC). :829–834.
In recent times, data excessiveness has become a major problem in fields such as education, news, blogs, and social media. With such a vast amount of text data, it is challenging for a human to extract only the valuable content in a concise form. Text summarization, which extracts data from a document and generates a short, concise version of it, enables humans to retrieve the relevant and useful text. One widely used approach is automatic text summarization, which analyzes large textual data and condenses it into short summaries containing the data's valuable information. Automatic text summarization is further divided into two types: (1) extractive and (2) abstractive. This article considers the extractive approach, in which the model generates a concise summary by picking the most relevant sentences from the text document. The paper focuses on retrieving the valuable content using ELMo embeddings in extractive text summarization. ELMo is a contextual embedding that many researchers have previously used in abstractive summarization techniques; this paper focuses on using it in an extractive summarizer.
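
The extractive scheme, score each sentence against the document and keep the top ones, can be sketched with plain bag-of-words vectors standing in for the paper's ELMo embeddings (an assumption made so the sketch stays self-contained):

```python
import math
from collections import Counter

# Minimal extractive summarizer: rank sentences by cosine similarity to
# the document's bag-of-words centroid. Word counts here stand in for
# the contextual ELMo vectors the paper actually uses.
def vector(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def summarize(sentences, k=1):
    doc = vector(" ".join(sentences))
    ranked = sorted(sentences, key=lambda s: cosine(vector(s), doc), reverse=True)
    return ranked[:k]

top = summarize(["dogs bark loudly", "dogs bark at dogs", "rain fell"])
```

Swapping `vector` for a sentence embedding (ELMo, etc.) gives the contextual variant without changing the selection logic.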
2021-02-15
Drakopoulos, G., Giotopoulos, K., Giannoukou, I., Sioutas, S..  2020.  Unsupervised Discovery Of Semantically Aware Communities With Tensor Kruskal Decomposition: A Case Study In Twitter. 2020 15th International Workshop on Semantic and Social Media Adaptation and Personalization (SMAP). :1–8.
Substantial empirical evidence, including the success of synthetic graph generation models as well as of analytical methodologies, suggests that large, real graphs have a recursive community structure. The latter results, at least in part, in other important properties of these graphs, such as low diameter, high clustering-coefficient values, a heavy degree-distribution tail, and a clustered graph spectrum. Notice that this structure need not be official or moderated like Facebook groups; it can also take an ad hoc and unofficial form depending on the functionality of the social network under study, for instance the follow relationship on Twitter or the connections between news aggregators on Reddit. Community discovery is paramount in numerous applications such as political campaigns, digital marketing, crowdfunding, and fact checking. Here a tensor representation for Twitter subgraphs is proposed which takes into consideration both the follow-follower relationships and the coherency in hashtags. Community structure discovery then reduces to the computation of the Tucker tensor decomposition, a higher-order counterpart of the well-known unsupervised learning method of singular value decomposition (SVD). Tucker decomposition clearly outperforms the SVD in terms of finding a more compact community size distribution in experiments done in Julia on a Twitter subgraph. This can be attributed to the facts that the proposed methodology combines both structural and functional Twitter elements and that hashtags carry an increased semantic weight in comparison to ordinary tweets.
2020-09-04
Khan, Aasher, Rehman, Suriya, Khan, Muhammad U.S, Ali, Mazhar.  2019.  Synonym-based Attack to Confuse Machine Learning Classifiers Using Black-box Setting. 2019 4th International Conference on Emerging Trends in Engineering, Sciences and Technology (ICEEST). :1–7.
Twitter, the most popular content-sharing platform, is giving rise to automated accounts called “bots”, which make up a large share of its users. Various machine learning (ML) algorithms are designed to detect bots, but these ML-based models have vulnerability constraints of their own. This paper exploits vulnerabilities of ML algorithms through a black-box attack: an adversarial text sequence causes deep learning (DL) bot-detection classifiers to misclassify. The literature shows that ML models are vulnerable to such attacks. The aim of this paper is to degrade the accuracy of ML-based bot-detection algorithms by replacing original words in tweets with their synonyms. Our results show a 7.2% decrease in accuracy on bot tweets, i.e., bot tweets are classified as legitimate tweets.
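
The perturbation itself is simple word substitution; the mechanics can be sketched as below. The synonym table is a toy assumption, where the paper would draw candidates from a full lexical resource and query the classifier in a black-box loop:

```python
# Sketch of the synonym-replacement attack: swap words for synonyms so
# a classifier's learned surface features no longer match. The table is
# a toy stand-in for a real synonym source.
SYNONYMS = {"great": "excellent", "buy": "purchase", "cheap": "inexpensive"}

def perturb(tweet: str) -> str:
    return " ".join(SYNONYMS.get(w.lower(), w) for w in tweet.split())
```

In the black-box setting, the attacker keeps a perturbed tweet only if the target classifier's output flips from "bot" to "legitimate".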
2020-08-28
Jafariakinabad, Fereshteh, Hua, Kien A..  2019.  Style-Aware Neural Model with Application in Authorship Attribution. 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA). :325–328.

Writing style is a combination of consistent decisions associated with a specific author at different levels of language production, including lexical, syntactic, and structural. In this paper, we introduce a style-aware neural model to encode document information from three stylistic levels and evaluate it in the domain of authorship attribution. First, we propose a simple way to jointly encode syntactic and lexical representations of sentences. Subsequently, we employ an attention-based hierarchical neural network to encode the syntactic and semantic structure of sentences in documents while rewarding the sentences which contribute more to capturing the writing style. Our experimental results, based on four benchmark datasets, reveal the benefits of encoding document information from all three stylistic levels when compared to the baseline methods in the literature.

2020-02-18
Tung Hoang, Xuan, Dung Bui, Ngoc.  2019.  An Enhanced Semantic-Based Cache Replacement Algorithm for Web Systems. 2019 IEEE-RIVF International Conference on Computing and Communication Technologies (RIVF). :1–6.

As Web traffic on the Internet increases, caching solutions for Web systems are becoming more important, since they can greatly improve system scalability. An important part of a caching solution is the cache replacement policy, which selects victim items to be removed in order to make space for new objects. Typical replacement policies used in practice only exploit temporal locality of reference by removing the least recently or least frequently requested items from the cache. Although those policies work well for memory or filesystem caches, they are inefficient for Web systems because they do not exploit the semantic relationships between Web items. This paper presents a semantic-aware caching policy that can be used in Web systems to enhance scalability. The proposed caching mechanism defines a semantic distance from a web page to a set of pivot pages and uses these semantic distances as the metric for choosing victims. It also uses a function-based metric that combines access frequency and cache-item size for tie-breaking. Our simulations show that our enhancements outperform traditional methods in terms of hit rate, which can be useful for websites with many small, similar-sized web objects.
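
The victim-selection rule, evict the page semantically farthest from the pivots and break ties by a frequency/size function, can be sketched as follows. Here the semantic distances are supplied directly and the tie-break function is an assumption; the paper derives distances from page content:

```python
# Sketch of the semantic-aware replacement policy. Each item carries a
# precomputed semantic distance to the pivot set; distances and the
# freq/size tie-break are illustrative assumptions.
class SemanticCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = {}          # url -> (distance, freq, size)

    def put(self, url, distance, size):
        if url not in self.items and len(self.items) >= self.capacity:
            # victim: max semantic distance; ties favor evicting items
            # with a low frequency-to-size ratio
            victim = max(self.items,
                         key=lambda u: (self.items[u][0],
                                        -self.items[u][1] / self.items[u][2]))
            del self.items[victim]
        d, f, s = self.items.get(url, (distance, 0, size))
        self.items[url] = (d, f + 1, s)

cache = SemanticCache(capacity=2)
cache.put("/a", 0.1, 10)
cache.put("/b", 0.9, 10)
cache.put("/c", 0.2, 10)   # evicts "/b", the semantically farthest page
```

An LRU cache in the same situation would have evicted `/a`, the oldest entry, regardless of how related it is to the rest of the working set.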

2020-02-10
Li, Meng, Wu, Bin, Wang, Yaning.  2019.  Comment Spam Detection via Effective Features Combination. ICC 2019 - 2019 IEEE International Conference on Communications (ICC). :1–6.

Comment spam is one of the great challenges faced by forum administrators. Detecting and blocking comment spam can relieve the load on servers, improve user experience, and purify network conditions. This paper focuses on the detection of comment spam. We analyzed spammer behavior and spam content, and from the results extracted two types of effective features that better describe spammer characteristics. Additionally, a gradient boosting tree algorithm was used to construct the comment-spam detector based on the extracted features. Our proposed method is examined on a blog-spam dataset published by previous research, and the results illustrate that our method outperforms the previous method in detection accuracy. Moreover, recorded CPU times demonstrate that the time spent on both training and testing remains small.

2018-03-19
Faust, C., Dozier, G., Xu, J., King, M. C..  2017.  Adversarial Authorship, Interactive Evolutionary Hill-Climbing, and Author CAAT-III. 2017 IEEE Symposium Series on Computational Intelligence (SSCI). :1–8.

We are currently witnessing the development of increasingly effective author identification systems (AISs) that have the potential to track users across the Internet based on their writing style. In this paper, we discuss two methods for providing user anonymity with respect to writing style: Adversarial Stylometry and Adversarial Authorship. With Adversarial Stylometry, a user attempts to obfuscate their writing style by consciously altering it. With Adversarial Authorship, a user selects an author cluster target (ACT) and writes toward this target with the intention of subverting an AIS so that the user's writing sample will be misclassified. Our results show that Adversarial Authorship via interactive evolutionary hill-climbing outperforms Adversarial Stylometry.
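
The hill-climbing loop at the core of the approach can be sketched on a numeric style vector. This is only an illustration of the search: real Adversarial Authorship edits text toward an author-cluster target with a human in the loop, whereas here a vector is mutated toward an assumed target:

```python
import random

# Toy hill climb toward an author-cluster target: mutate a style vector
# (e.g., feature frequencies) and keep only improving mutations. The
# vectors and step size are illustrative assumptions.
def hill_climb(style, target, steps=500, seed=0):
    rng = random.Random(seed)
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    best = list(style)
    for _ in range(steps):
        cand = [x + rng.uniform(-0.05, 0.05) for x in best]
        if dist(cand, target) < dist(best, target):
            best = cand                      # accept improving mutation
    return best

disguised = hill_climb([0.0, 0.0], [1.0, 1.0])
```

In the interactive evolutionary variant, the "fitness" judgment comes from the user (and the AIS's response) rather than a fixed distance function.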

2015-05-06
Xingbang Tian, Baohua Huang, Min Wu.  2014.  A transparent middleware for encrypting data in MongoDB. Electronics, Computer and Applications, 2014 IEEE Workshop on. :906–909.

Due to the development of cloud computing and NoSQL databases, more and more sensitive information is stored in NoSQL databases, which exposes many security vulnerabilities. This paper discusses the security features of the MongoDB database and proposes a transparent middleware implementation. Analysis of the experimental results shows that this transparent middleware can efficiently encrypt user-specified sensitive data at the dataset level. Existing application systems need few modifications in order to adopt this middleware.
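
The "transparent" idea, intercept writes and reads so applications never see ciphertext handling, can be sketched over a dict-backed stand-in for a collection. The XOR cipher is a deliberately weak placeholder for a real algorithm such as AES, and the field names and storage backend are assumptions, not MongoDB's API:

```python
import base64

# Sketch of a transparent encryption layer: sensitive fields are
# encrypted on insert and decrypted on read, invisibly to the caller.
# XOR is a toy placeholder for a real cipher; "ssn" is an assumed field.
class EncryptedCollection:
    def __init__(self, key: bytes, sensitive=("ssn",)):
        self.key, self.sensitive, self.docs = key, set(sensitive), []

    def _xor(self, data: bytes) -> bytes:
        return bytes(b ^ self.key[i % len(self.key)] for i, b in enumerate(data))

    def insert(self, doc: dict):
        enc = {k: base64.b64encode(self._xor(v.encode())).decode()
               if k in self.sensitive else v
               for k, v in doc.items()}
        self.docs.append(enc)

    def find_one(self, **query):
        for enc in self.docs:
            doc = {k: self._xor(base64.b64decode(v)).decode()
                   if k in self.sensitive else v
                   for k, v in enc.items()}
            if all(doc.get(k) == v for k, v in query.items()):
                return doc
        return None

store = EncryptedCollection(b"secret-key")
store.insert({"name": "alice", "ssn": "123-45-6789"})
```

Because encryption happens inside the middleware, the application code above reads and writes plain dicts exactly as it would against an unencrypted store.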

Goseva-Popstojanova, K., Dimitrijevikj, A..  2014.  Distinguishing between Web Attacks and Vulnerability Scans Based on Behavioral Characteristics. Advanced Information Networking and Applications Workshops (WAINA), 2014 28th International Conference on. :42–48.

The numbers of vulnerabilities and reported attacks on Web systems show increasing trends, which clearly illustrates the need for better understanding of malicious cyber activities. In this paper we use clustering to classify attacker activities aimed at Web systems. The empirical analysis is based on four datasets, each several months in duration, collected by high-interaction honeypots. The results show that behavioral clustering analysis can be used to distinguish between attack sessions and vulnerability-scan sessions; however, the performance depends heavily on the dataset. Furthermore, the results show that attacks differ from vulnerability scans in only a small number of features (i.e., session characteristics). Specifically, for each dataset, the best feature-selection method (in terms of high probability of detection and low probability of false alarm) selects only three features and results in three to four clusters, significantly improving the performance of clustering compared to the case when all features are used. The best subset of features and the extent of the improvement, however, also depend on the dataset.
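
Behavioral clustering over per-session features can be sketched with a minimal k-means. The two features (requests per second, distinct URLs) and the sample values are assumptions for the sketch, not the paper's selected features:

```python
# Minimal k-means over per-session behavioral features; the feature
# choice and data points are illustrative, not the paper's.
def kmeans(points, k=2, iters=20):
    centers = points[:k]                      # deterministic init: first k points
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c])))
            groups[i].append(p)
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers, groups

# (requests/sec, distinct URLs): two noisy "scan-like" and two "attack-like" sessions
sessions = [(10.0, 50.0), (11.0, 48.0), (0.5, 3.0), (0.4, 2.0)]
centers, groups = kmeans(sessions)
```

The point of the paper is that a handful of such features already separates the two session types, so the hard part is feature selection rather than the clustering itself.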

2015-05-04
Okuno, S., Asai, H., Yamana, H..  2014.  A challenge of authorship identification for ten-thousand-scale microblog users. Big Data (Big Data), 2014 IEEE International Conference on. :52–54.

Internet security issues require authorship identification for all kinds of Internet content; however, authorship identification for microblog users is much harder than for other documents because microblog texts are so short. Moreover, when the number of candidates becomes large, i.e., big data, identification takes a long time. Our proposed method addresses these problems. The experimental results show that our method successfully identifies authorship among 10,000 microblog users with 53.2% precision, in almost half the execution time of the previous method.
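
A standard baseline for short-text authorship is character n-gram profile matching, sketched below. The profiles, distance, and sample texts are illustrative assumptions, not the paper's exact method:

```python
from collections import Counter

# Character n-gram profiles: build a normalized trigram histogram per
# author, then attribute a sample to the nearest profile. Texts and the
# squared-difference distance are illustrative.
def profile(texts, n=3):
    c = Counter()
    for t in texts:
        c.update(t[i:i + n] for i in range(len(t) - n + 1))
    total = sum(c.values())
    return {g: v / total for g, v in c.items()}

def identify(sample, profiles):
    p = profile([sample])
    def dist(q):
        grams = set(p) | set(q)
        return sum((p.get(g, 0) - q.get(g, 0)) ** 2 for g in grams)
    return min(profiles, key=lambda name: dist(profiles[name]))

profiles = {
    "casual": profile(["lol gonna be late lol", "lol ok see ya"]),
    "formal": profile(["Indeed, I shall attend forthwith."]),
}
```

Scaling this to ten thousand candidates is exactly where execution time becomes the bottleneck the abstract highlights.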

2015-04-30
Fonseca, J., Seixas, N., Vieira, M., Madeira, H..  2014.  Analysis of Field Data on Web Security Vulnerabilities. Dependable and Secure Computing, IEEE Transactions on. 11:89–100.

Most web applications have critical bugs (faults) affecting their security, which makes them vulnerable to attacks by hackers and organized crime. To prevent these security problems from occurring, it is of utmost importance to understand the typical software faults. This paper contributes to this body of knowledge by presenting a field study on two of the most widespread and critical web application vulnerabilities: SQL injection and XSS. It analyzes the source code of security patches of widely used web applications written in weakly and strongly typed languages. Results show that only a small subset of software fault types, affecting a restricted collection of statements, is related to security. To understand how these vulnerabilities are really exploited by hackers, this paper also presents an analysis of the source code of the scripts used to attack them. The outcomes of this study can be used to train software developers and code inspectors in the detection of such faults, and are also the foundation for research on realistic vulnerability and attack injectors that can be used to assess security mechanisms such as intrusion detection systems, vulnerability scanners, and static code analyzers.
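
The dominant SQL-injection fault class such patch studies trace, unsanitized string concatenation, and its usual one-line fix, parameterized queries, can be shown on an in-memory SQLite table (the table and data are illustrative; the studied applications are real web apps, not SQLite):

```python
import sqlite3

# Demonstrates the SQL-injection fault pattern and its fix on toy data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, secret TEXT)")
db.execute("INSERT INTO users VALUES ('alice', 's3cr3t')")

payload = "' OR '1'='1"

# Vulnerable: attacker-controlled input is spliced into the statement,
# so the quote terminates the literal and the OR clause matches all rows.
vulnerable = db.execute(
    "SELECT secret FROM users WHERE name = '" + payload + "'").fetchall()

# Patched: the driver binds the value, so the quote is just data.
patched = db.execute(
    "SELECT secret FROM users WHERE name = ?", (payload,)).fetchall()
```

The study's observation that security fixes touch only a restricted collection of statements matches this pattern: the patch changes one query-construction line, not the surrounding logic.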