Visible to the public Biblio

Filters: Keyword is Word2Vec  [Clear All Filters]
2021-08-05
Ramasubramanian, Muthukumaran, Muhammad, Hassan, Gurung, Iksha, Maskey, Manil, Ramachandran, Rahul.  2020.  ES2Vec: Earth Science Metadata Keyword Assignment using Domain-Specific Word Embeddings. 2020 SoutheastCon. :1—6.
Earth science metadata keyword assignment is a challenging problem. Dataset curators select appropriate keywords from the Global Change Master Directory (GCMD) set of keywords. The keywords are integral part of search and discovery of these datasets. Hence, selection of keywords are crucial in increasing the discoverability of datasets. Utilizing machine learning techniques, we provide users with automated keyword suggestions as an improved approach to complement manual selection. We trained a machine learning model that leverages the semantic embedding ability of Word2Vec models to process abstracts and suggest relevant keywords. A user interface tool we built to assist data curators in assignment of such keywords is also described.
2021-02-10
Lei, L., Chen, M., He, C., Li, D..  2020.  XSS Detection Technology Based on LSTM-Attention. 2020 5th International Conference on Control, Robotics and Cybernetics (CRC). :175—180.
Cross-site scripting (XSS) is one of the main threats of Web applications, which has great harm. How to effectively detect and defend against XSS attacks has become more and more important. Due to the malicious obfuscation of attack codes and the gradual increase in number, the traditional XSS detection methods have some defects such as poor recognition of malicious attack codes, inadequate feature extraction and low efficiency. Therefore, we present a novel approach to detect XSS attacks based on the attention mechanism of Long Short-Term Memory (LSTM) recurrent neural network. First of all, the data need to be preprocessed, we used decoding technology to restore the XSS codes to the unencoded state for improving the readability of the code, then we used word2vec to extract XSS payload features and map them to feature vectors. And then, we improved the LSTM model by adding attention mechanism, the LSTM-Attention detection model was designed to train and test the data. We used the ability of LSTM model to extract context-related features for deep learning, the added attention mechanism made the model extract more effective features. Finally, we used the classifier to classify the abstract features. Experimental results show that the proposed XSS detection model based on LSTM-Attention achieves a precision rate of 99.3% and a recall rate of 98.2% in the actually collected dataset. Compared with traditional machine learning methods and other deep learning methods, this method can more effectively identify XSS attacks.
2020-09-28
Akaishi, Sota, Uda, Ryuya.  2019.  Classification of XSS Attacks by Machine Learning with Frequency of Appearance and Co-occurrence. 2019 53rd Annual Conference on Information Sciences and Systems (CISS). :1–6.
Cross site scripting (XSS) attack is one of the attacks on the web. It brings session hijack with HTTP cookies, information collection with fake HTML input form and phishing with dummy sites. As a countermeasure of XSS attack, machine learning has attracted a lot of attention. There are existing researches in which SVM, Random Forest and SCW are used for the detection of the attack. However, in the researches, there are problems that the size of data set is too small or unbalanced, and that preprocessing method for vectorization of strings causes misclassification. The highest accuracy of the classification was 98% in existing researches. Therefore, in this paper, we improved the preprocessing method for vectorization by using word2vec to find the frequency of appearance and co-occurrence of the words in XSS attack scripts. Moreover, we also used a large data set to decrease the deviation of the data. Furthermore, we evaluated the classification results with two procedures. One is an inappropriate procedure which some researchers tend to select by mistake. The other is an appropriate procedure which can be applied to an attack detection filter in the real environment.
2020-05-18
Sel, Slhami, Hanbay, Davut.  2019.  E-Mail Classification Using Natural Language Processing. 2019 27th Signal Processing and Communications Applications Conference (SIU). :1–4.
Thanks to the rapid increase in technology and electronic communications, e-mail has become a serious communication tool. In many applications such as business correspondence, reminders, academic notices, web page memberships, e-mail is used as primary way of communication. If we ignore spam e-mails, there remain hundreds of e-mails received every day. In order to determine the importance of received e-mails, the subject or content of each e-mail must be checked. In this study we proposed an unsupervised system to classify received e-mails. Received e-mails' coordinates are determined by a method of natural language processing called as Word2Vec algorithm. According to the similarities, processed data are grouped by k-means algorithm with an unsupervised training model. In this study, 10517 e-mails were used in training. The success of the system is tested on a test group of 200 e-mails. In the test phase M3 model (window size 3, min. Word frequency 10, Gram skip) consolidated the highest success (91%). Obtained results are evaluated in section VI.
2019-06-10
Tran, T. K., Sato, H., Kubo, M..  2018.  One-Shot Learning Approach for Unknown Malware Classification. 2018 5th Asian Conference on Defense Technology (ACDT). :8-13.

Early detection of new kinds of malware always plays an important role in defending the network systems. Especially, if intelligent protection systems could themselves detect an existence of new malware types in their system, even with a very small number of malware samples, it must be a huge benefit for the organization as well as the social since it help preventing the spreading of that kind of malware. To deal with learning from few samples, term ``one-shot learning'' or ``fewshot learning'' was introduced, and mostly used in computer vision to recognize images, handwriting, etc. An approach introduced in this paper takes advantage of One-shot learning algorithms in solving the malware classification problem by using Memory Augmented Neural Network in combination with malware's API calls sequence, which is a very valuable source of information for identifying malware behavior. In addition, it also use some advantages of the development in Natural Language Processing field such as word2vec, etc. to convert those API sequences to numeric vectors before feeding to the one-shot learning network. The results confirm very good accuracies compared to the other traditional methods.

2019-02-14
Georgakopoulos, Spiros V., Tasoulis, Sotiris K., Vrahatis, Aristidis G., Plagianakos, Vassilis P..  2018.  Convolutional Neural Networks for Toxic Comment Classification. Proceedings of the 10th Hellenic Conference on Artificial Intelligence. :35:1-35:6.
Flood of information is produced in a daily basis through the global internet usage arising from the online interactive communications among users. While this situation contributes significantly to the quality of human life, unfortunately it involves enormous dangers, since online texts with high toxicity can cause personal attacks, online harassment and bullying behaviors. This has triggered both industrial and research community in the last few years while there are several attempts to identify an efficient model for online toxic comment prediction. However, these steps are still in their infancy and new approaches and frameworks are required. On parallel, the data explosion that appears constantly, makes the construction of new machine learning computational tools for managing this information, an imperative need. Thankfully advances in hardware, cloud computing and big data management allow the development of Deep Learning approaches appearing very promising performance so far. For text classification in particular the use of Convolutional Neural Networks (CNN) have recently been proposed approaching text analytics in a modern manner emphasizing in the structure of words in a document. In this work, we employ this approach to discover toxic comments in a large pool of documents provided by a current Kaggle's competition regarding Wikipedia's talk page edits. To justify this decision we choose to compare CNNs against the traditional bag-of-words approach for text analysis combined with a selection of algorithms proven to be very effective in text classification. The reported results provide enough evidence that CNN enhance toxic comment classification reinforcing research interest towards this direction.
2019-02-08
Fang, Yong, Li, Yang, Liu, Liang, Huang, Cheng.  2018.  DeepXSS: Cross Site Scripting Detection Based on Deep Learning. Proceedings of the 2018 International Conference on Computing and Artificial Intelligence. :47-51.

Nowadays, Cross Site Scripting (XSS) is one of the major threats to Web applications. Since it's known to the public, XSS vulnerability has been in the TOP 10 Web application vulnerabilities based on surveys published by the Open Web Applications Security Project (OWASP). How to effectively detect and defend XSS attacks are still one of the most important security issues. In this paper, we present a novel approach to detect XSS attacks based on deep learning (called DeepXSS). First of all, we used word2vec to extract the feature of XSS payloads which captures word order information and map each payload to a feature vector. And then, we trained and tested the detection model using Long Short Term Memory (LSTM) recurrent neural networks. Experimental results show that the proposed XSS detection model based on deep learning achieves a precision rate of 99.5% and a recall rate of 97.9% in real dataset, which means that the novel approach can effectively identify XSS attacks.

2018-03-19
Greenstein-Messica, Asnat, Rokach, Lior, Friedman, Michael.  2017.  Session-Based Recommendations Using Item Embedding. Proceedings of the 22Nd International Conference on Intelligent User Interfaces. :629–633.

Recent methods for learning vector space representations of words, word embedding, such as GloVe and Word2Vec have succeeded in capturing fine-grained semantic and syntactic regularities. We analyzed the effectiveness of these methods for e-commerce recommender systems by transferring the sequence of items generated by users' browsing journey in an e-commerce website into a sentence of words. We examined the prediction of fine-grained item similarity (such as item most similar to iPhone 6 64GB smart phone) and item analogy (such as iPhone 5 is to iPhone 6 as Samsung S5 is to Samsung S6) using real life users' browsing history of an online European department store. Our results reveal that such methods outperform related models such as singular value decomposition (SVD) with respect to item similarity and analogy tasks across different product categories. Furthermore, these methods produce a highly condensed item vector space representation, item embedding, with behavioral meaning sub-structure. These vectors can be used as features in a variety of recommender system applications. In particular, we used these vectors as features in a neural network based models for anonymous user recommendation based on session's first few clicks. It is found that recurrent neural network that preserves the order of user's clicks outperforms standard neural network, item-to-item similarity and SVD (recall@10 value of 42% based on first three clicks) for this task.

2017-11-20
You, L., Li, Y., Wang, Y., Zhang, J., Yang, Y..  2016.  A deep learning-based RNNs model for automatic security audit of short messages. 2016 16th International Symposium on Communications and Information Technologies (ISCIT). :225–229.

The traditional text classification methods usually follow this process: first, a sentence can be considered as a bag of words (BOW), then transformed into sentence feature vector which can be classified by some methods, such as maximum entropy (ME), Naive Bayes (NB), support vector machines (SVM), and so on. However, when these methods are applied to text classification, we usually can not obtain an ideal result. The most important reason is that the semantic relations between words is very important for text categorization, however, the traditional method can not capture it. Sentiment classification, as a special case of text classification, is binary classification (positive or negative). Inspired by the sentiment analysis, we use a novel deep learning-based recurrent neural networks (RNNs)model for automatic security audit of short messages from prisons, which can classify short messages(secure and non-insecure). In this paper, the feature of short messages is extracted by word2vec which captures word order information, and each sentence is mapped to a feature vector. In particular, words with similar meaning are mapped to a similar position in the vector space, and then classified by RNNs. RNNs are now widely used and the network structure of RNNs determines that it can easily process the sequence data. We preprocess short messages, extract typical features from existing security and non-security short messages via word2vec, and classify short messages through RNNs which accept a fixed-sized vector as input and produce a fixed-sized vector as output. The experimental results show that the RNNs model achieves an average 92.7% accuracy which is higher than SVM.

2017-05-18
Nguyen, Trong Duc, Nguyen, Anh Tuan, Nguyen, Tien N..  2016.  Mapping API Elements for Code Migration with Vector Representations. Proceedings of the 38th International Conference on Software Engineering Companion. :756–758.

Problem. Code migration between languages is challenging partly because different languages require developers to use different software libraries and frameworks. For example, in Java, Java Development Kit library (JDK) is a popular toolkit while .NET is the main framework used in C\# software development. Code migration requires not only the mappings between the language constructs (e.g., statements, expressions) but also the mappings among the APIs of the libraries/frameworks used in two languages. For example, in Java, to write to a file, one can use FileWriter.write of FileWriter, and in C\#, one can achieve the same function with StreamWriter.Write of StreamWriter. Such mapping is called API mapping.