Biblio
Social media has become one of the most effective and precise channels for expressing public opinion. This paper discusses a strategy that uses and illustrates Twitter data to infer public sentiment. People express sentiments about particular entities with varying strength and intensity, and these sentiments are closely related to their personal moods and emotions. Many methods and lexical resources have been proposed for analyzing sentiment in natural language texts that address diverse opinions. This article proposes an approach for improving Twitter sentiment classification that uses various sentiment proportions as meta-level features. The analysis was performed on tweets about the iPhone 6.
In recent years, websites that incorporate user reviews, such as Amazon, IMDB and YELP, have become exceedingly popular. Review information is an increasingly important factor in users' purchasing behavior, and accordingly its reliability has become an important issue. This paper proposes a method to more accurately detect the period in which spam reviews appear and to identify the spam reviews by verifying the consistency of review information among multiple review sites. Evaluation experiments were conducted to show the accuracy of the detection results and to compare the newly proposed method with our previously proposed method.
Although Stylometry has been effectively used for Authorship Attribution, there is a growing number of methods being developed that allow authors to mask their identity [2, 13]. In this paper, we investigate the usage of non-traditional feature sets for Authorship Attribution. By using non-traditional feature sets, one may be able to reveal the identity of adversarial authors who are attempting to evade detection from Authorship Attribution systems that are based on more traditional feature sets. In addition, we demonstrate how GEFeS (Genetic & Evolutionary Feature Selection) can be used to evolve high-performance hybrid feature sets composed of two non-traditional feature sets for Authorship Attribution: LIWC (Linguistic Inquiry & Word Count) and Sentiment Analysis. These hybrids were able to reduce the Adversarial Effectiveness on a test set presented in [2] by approximately 33.4%.
Although various techniques have been proposed to generate adversarial samples for white-box attacks on text, little attention has been paid to a black-box attack, which is a more realistic scenario. In this paper, we present a novel algorithm, DeepWordBug, to effectively generate small text perturbations in a black-box setting that forces a deep-learning classifier to misclassify a text input. We develop novel scoring strategies to find the most important words to modify such that the deep classifier makes a wrong prediction. Simple character-level transformations are applied to the highest-ranked words in order to minimize the edit distance of the perturbation. We evaluated DeepWordBug on two real-world text datasets: Enron spam emails and IMDB movie reviews. Our experimental results indicate that DeepWordBug can reduce the classification accuracy from 99% to 40% on Enron and from 87% to 26% on IMDB. Our results strongly demonstrate that the generated adversarial sequences from a deep-learning model can similarly evade other deep models.
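The black-box procedure described in the abstract above (score words, then perturb the highest-ranked ones at the character level) can be illustrated with a minimal sketch. It is not the authors' implementation: the leave-one-out score stands in for the paper's scoring strategies, the adjacent-character swap is assumed as one example of a simple character-level transformation, and `toy_spam_score` is a hypothetical keyword classifier used only so the example runs on its own.

```python
from typing import Callable, List


def word_scores(words: List[str], predict: Callable[[str], float]) -> List[float]:
    """Leave-one-out importance: drop in the classifier score when a word is removed.

    Assumption: a stand-in for the paper's scoring strategies. Only the
    classifier's output is queried, so the setting remains black-box.
    """
    base = predict(" ".join(words))
    return [
        base - predict(" ".join(words[:i] + words[i + 1:]))
        for i in range(len(words))
    ]


def swap_adjacent(word: str) -> str:
    """Swap the two middle characters (edit distance of at most 2)."""
    if len(word) < 4:
        return word
    mid = len(word) // 2
    chars = list(word)
    chars[mid - 1], chars[mid] = chars[mid], chars[mid - 1]
    return "".join(chars)


def perturb(text: str, predict: Callable[[str], float], budget: int = 1) -> str:
    """Apply the character swap to the `budget` highest-scoring words."""
    words = text.split()
    scores = word_scores(words, predict)
    ranked = sorted(range(len(words)), key=lambda i: scores[i], reverse=True)
    for i in ranked[:budget]:
        words[i] = swap_adjacent(words[i])
    return " ".join(words)


def toy_spam_score(text: str) -> float:
    """Hypothetical classifier: fraction of tokens that are known spam words."""
    spam_words = {"free", "winner", "prize"}
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(t in spam_words for t in tokens) / len(tokens)


adversarial = perturb("claim your free prize now", toy_spam_score, budget=2)
print(adversarial, toy_spam_score(adversarial))
```

In this toy setting, the two keyword tokens are ranked highest, and swapping their middle characters is enough to push the toy score to zero while the text stays readable to a human.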
We transfer a key idea from the field of sentiment analysis to a new domain: community question answering (cQA). The cQA task we are interested in is the following: given a question and a thread of comments, we want to re-rank the comments so that the ones that are good answers to the question are ranked higher than the bad ones. We notice that good vs. bad comments use specific vocabulary and that one can often predict the goodness/badness of a comment even when ignoring the question, based on the comment contents only. This leads us to the idea of building a good/bad polarity lexicon as an analogy to the positive/negative sentiment polarity lexicons commonly used in sentiment analysis. In particular, we use pointwise mutual information to build large-scale goodness polarity lexicons in a semi-supervised manner, starting with a small number of initial seeds. The evaluation results show an improvement of 0.7 MAP points absolute over a very strong baseline, and state-of-the-art performance on SemEval-2016 Task 3.
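A PMI-based polarity score of the kind mentioned in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it scores each word as PMI(word, good) minus PMI(word, bad), which simplifies to a smoothed log-ratio of the word's relative frequencies in the two classes. The semi-supervised expansion from seeds is omitted, and the function name `pmi_lexicon` and the example comments are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def pmi_lexicon(good_docs: List[str], bad_docs: List[str],
                smoothing: float = 1.0) -> Dict[str, float]:
    """Score each word by PMI(word, good) - PMI(word, bad).

    The difference reduces to log2(p(word | good) / p(word | bad)).
    Add-`smoothing` counts keep the ratio finite for words that occur
    in only one class.
    """
    good = Counter(w for d in good_docs for w in d.lower().split())
    bad = Counter(w for d in bad_docs for w in d.lower().split())
    vocab = set(good) | set(bad)
    n_good = sum(good.values()) + smoothing * len(vocab)
    n_bad = sum(bad.values()) + smoothing * len(vocab)
    return {
        w: math.log2(((good[w] + smoothing) / n_good)
                     / ((bad[w] + smoothing) / n_bad))
        for w in vocab
    }


# Hypothetical toy comments; a real lexicon would be built from a large corpus.
lexicon = pmi_lexicon(
    good_docs=["try the embassy website", "call the embassy"],
    bad_docs=["thanks for the question", "lol same question"],
)
print(sorted(lexicon.items(), key=lambda kv: kv[1], reverse=True))
```

Words with positive scores lean toward good answers and words with negative scores toward bad ones; summing the scores over a comment gives a simple question-independent feature.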
We present a novel multimodal fusion model for affective content analysis, combining visual, audio and deep visual-sentiment descriptors from the media content with automated facial action measurements from naturalistic responses to the media. We collected a dataset of 48,867 facial responses to 384 media clips and extracted a rich feature set from the facial responses and media content. The stimulus videos were validated to be informative, inspiring, persuasive, sentimental or amusing. By combining the features, we were able to obtain a classification accuracy of 63% (weighted F1-score: 0.62) for a five-class task. This was a significant improvement over using the media content features alone. By analyzing the feature sets independently, we found that states of informed and persuaded were difficult to differentiate from facial responses alone due to the presence of similar sets of action units in each state (AU 2 occurring frequently in both cases). Facial actions were beneficial in differentiating between amused and informed states whereas media content features alone performed less well due to similarities in the visual and audio make up of the content. We highlight examples of content and reactions from each class. This is the first affective content analysis based on reactions of 10,000s of people.
Traditional text classification methods usually follow this process: a sentence is first treated as a bag of words (BOW) and then transformed into a sentence feature vector, which is classified by a method such as maximum entropy (ME), Naive Bayes (NB), or support vector machines (SVM). However, these methods often fail to produce ideal results for text classification. The main reason is that the semantic relations between words are very important for text categorization, yet the traditional methods cannot capture them. Sentiment classification, a special case of text classification, is a binary classification task (positive or negative). Inspired by sentiment analysis, we use a novel deep learning model based on recurrent neural networks (RNNs) for the automatic security audit of short messages from prisons, classifying each message as secure or non-secure. In this paper, the features of short messages are extracted by word2vec, which captures word order information, and each sentence is mapped to a feature vector. In particular, words with similar meanings are mapped to similar positions in the vector space, and the resulting vectors are then classified by RNNs. RNNs are now widely used, and their network structure makes them well suited to processing sequence data. We preprocess the short messages, extract typical features from existing secure and non-secure short messages via word2vec, and classify the messages with RNNs, which accept a fixed-sized vector as input and produce a fixed-sized vector as output. The experimental results show that the RNN model achieves an average accuracy of 92.7%, which is higher than that of SVM.
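The limitation of the bag-of-words representation noted in the abstract above can be seen in a small sketch. The vocabulary and sentences are hypothetical, and the helper `bow_vector` is an illustrative name, not part of the described system.

```python
from collections import Counter
from typing import List


def bow_vector(sentence: str, vocabulary: List[str]) -> List[int]:
    """Map a sentence to word counts over a fixed vocabulary (bag of words)."""
    counts = Counter(sentence.lower().split())
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary and sentences, chosen to differ only in word order.
vocabulary = ["dog", "bites", "man"]
first = bow_vector("dog bites man", vocabulary)
second = bow_vector("man bites dog", vocabulary)
print(first, second, first == second)
```

Both sentences map to the same vector even though their meanings differ, so any classifier operating on these vectors cannot tell them apart. This is the kind of information a sequence model such as an RNN can, in principle, exploit.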
Sentiment analysis methods are becoming increasingly popular, especially as the number of social media users grows. In this context, this paper presents a sentiment analysis approach that can faithfully capture the sentiment orientation of Arabic Twitter posts, based on a novel data representation and machine learning techniques. The proposed approach applies a wide range of features: lexical, surface-form, syntactic, etc. We also make use of lexicon features inferred from two Arabic sentiment word lexicons. To build our supervised sentiment analysis system, we use several standard classification methods (Support Vector Machines, K-Nearest Neighbour, Naïve Bayes, Decision Trees, Random Forest) known for their effectiveness on such classification problems. In our study, the Support Vector Machines classifier outperforms the other supervised algorithms in Arabic Twitter sentiment analysis. Through ablation experiments, we show the positive impact of lexicon-based features on prediction performance.
Building trust among remote developers is challenging because trust typically grows through close face-to-face interaction. In this paper, we present the preparatory design of an empirical study aimed to assess whether affective trust, established through social communication between developers, is a predictor of successful collaboration in distributed projects. Specifically, we intend to measure affective trust through sentiment analysis of pull-request comments.
Information threatening the security of critical infrastructures is exchanged over the Internet through communication platforms, such as online discussion forums. This information can be used by malicious hackers to attack critical computer networks and data systems. Much of the literature on the hacking of critical infrastructure has focused on developing typologies of cyber-attacks, but has not examined the communication activities of the actors involved. To address this gap in the literature, the language of hackers was analyzed using automated analysis tools to identify potential threats against critical infrastructures. First, discussion posts were collected from a selected hacker forum using a customized web crawler. Posts were analyzed using a part-of-speech tagger, which helped determine a list of keywords used to query the data. Next, a sentiment analysis tool scored these keywords, and the scores were then analyzed to determine the effectiveness of this method.
In this paper we propose a Twitter sentiment analytics method that mines opinion polarity about a given topic. Most current semantic sentiment analytics depend on polarity lexicons. However, many key tone words are frequently bipolar. We demonstrate a technique that accommodates the bipolarity of tone words through a context-sensitive tone lexicon learning mechanism, in which the context is modeled by the semantic neighborhood of the main target. Performance analysis shows that the ability to contextualize tone word polarity significantly improves accuracy.
An increasing number of people are using online social networking services (SNSs), and a significant amount of information related to consumption experiences is shared in this new media form. Text mining is an emerging technique for mining useful information from the web. We aim to discover semantic patterns in consumers' discussions on social media, in particular in tweets. Specifically, the purposes of this study are twofold: 1) finding similarity and dissimilarity between two sets of textual documents that express consumers' sentiment polarities, in the two forms of positive vs. negative opinions, and 2) deriving actual content from the textual data that has a semantic trend. The considered tweets include consumers' opinions on US retail companies (e.g., Amazon, Walmart). Cosine similarity and K-means clustering methods are used to achieve the former goal, and Latent Dirichlet Allocation (LDA), a popular topic modeling algorithm, is used for the latter purpose. This is the first study that discovers semantic properties of textual data in a consumption context beyond sentiment analysis. In addition to the major findings, we applied LDA to the same data and drew latent topics that represent consumers' positive opinions and negative opinions on social media.
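The cosine similarity used for the first goal in the abstract above can be sketched over raw term-frequency vectors. This is a minimal illustration, assuming whitespace tokenization and no weighting (the study's actual preprocessing is not described here), and the example tweets are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Hypothetical tweets about a retailer.
print(cosine_similarity("fast shipping", "slow shipping"))
print(cosine_similarity("fast shipping", "rude staff"))
```

Values near 1 indicate documents with nearly the same word distribution, and 0 indicates no shared terms. The same measure can serve as the similarity function when clustering documents.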