Biblio

Filters: Keyword is word embedding
2023-09-18
Amer, Eslam, Samir, Adham, Mostafa, Hazem, Mohamed, Amer, Amin, Mohamed.  2022.  Malware Detection Approach Based on the Swarm-Based Behavioural Analysis over API Calling Sequence. 2022 2nd International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC). :27–32.
Rapidly increasing malware threats must be countered with new, effective malware detection methodologies. Current malware threats are not limited to daily personal transactions but have penetrated deeply into large enterprises and organizations. This paper introduces a new methodology for detecting and discriminating malicious versus normal applications. We employed ant colony optimization to generate two behavioural graphs that characterize the difference in execution behavior between malware and normal applications. Our proposed approach relies on the API call sequence generated when an application is executed. We used API calls because they are among the most widely used features in dynamic malware analysis. Our proposed method showed distinctive behavioral differences between malicious and non-malicious applications. Our experimental results showed performance comparable to other machine learning methods. Therefore, our method can be employed as an efficient technique for capturing malicious applications.
2021-11-29
Piazza, Nancirose.  2020.  Classification Between Machine Translated Text and Original Text By Part Of Speech Tagging Representation. 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA). :739–740.
Classifiers that distinguish machine-translated text from original text often tokenize on the vocabulary of the corpora. With N-grams larger than unigrams, one can create a model that estimates a decision boundary based on the word frequency probability distribution; however, this approach is exponentially expensive because of high dimensionality and sparsity. Instead, we represent samples of the corpora by their part-of-speech tags, which yields a significantly smaller vocabulary. With fewer trigram permutations, we can create a model from the trigram frequency probability distribution. In this paper, we explore less conventional techniques for handling documents, dictionaries, and the like.
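As a rough illustration of the representation described in this abstract, the sketch below computes a trigram frequency distribution over part-of-speech tags. It assumes the text has already been tagged; the function name `pos_trigram_distribution` and the tag labels are hypothetical and are not taken from the paper.

```python
from collections import Counter

def pos_trigram_distribution(tags):
    """Relative frequency of each consecutive POS-tag trigram in one sample."""
    trigrams = list(zip(tags, tags[1:], tags[2:]))
    total = len(trigrams)
    if total == 0:
        return {}
    counts = Counter(trigrams)
    return {trigram: n / total for trigram, n in counts.items()}

# Hypothetical tag sequence for a short sample (tagging is assumed done upstream).
tags = ["DET", "NOUN", "VERB", "DET", "NOUN", "VERB"]
dist = pos_trigram_distribution(tags)
```

A distribution like this could serve as the feature vector for a classifier; because the tag set is small, the number of possible trigrams stays far below that of word trigrams.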
2021-02-22
Koda, S., Kambara, Y., Oikawa, T., Furukawa, K., Unno, Y., Murakami, M.  2020.  Anomalous IP Address Detection on Traffic Logs Using Novel Word Embedding. 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC). :1504–1509.
This paper presents an anomalous IP address detection algorithm for network traffic logs. It is based on word embedding techniques derived from natural language processing to extract the representative features of IP addresses. However, the features extracted from vanilla word embeddings are not always compatible with machine learning-based anomaly detection algorithms. Therefore, we developed an algorithm that enables the extraction of more compatible features of IP addresses for anomaly detection than conventional methods. The proposed algorithm optimizes the objective functions of word embedding-based feature extraction and anomaly detection, simultaneously. According to the experimental results, the proposed algorithm outperformed conventional approaches; it improved the detection performance from 0.876 to 0.990 in the area under the curve criterion in a task of detecting the IP addresses of attackers from network traffic logs.
2021-01-22
Burr, B., Wang, S., Salmon, G., Soliman, H.  2020.  On the Detection of Persistent Attacks using Alert Graphs and Event Feature Embeddings. NOMS 2020 - 2020 IEEE/IFIP Network Operations and Management Symposium. :1–4.
Intrusion Detection Systems (IDS) generate a high volume of alerts that security analysts do not have the resources to explore fully. Modelling attacks, especially the coordinated campaigns of Advanced Persistent Threats (APTs), in a visually interpretable way is a useful approach for network security. Graph models combine multiple alerts and are well suited for visualization and interpretation, increasing security effectiveness. In this paper, we use feature embeddings, learned from network event logs, and community detection to construct and segment alert graphs of related alerts and network hosts. We posit that such graphs can aid security analysts in investigating alerts and may capture multiple aspects of an APT attack. The eventual goal of this approach is to construct interpretable attack graphs and extract causality information to identify coordinated attacks.
2020-05-18
Lee, Hyun-Young, Kang, Seung-Shik.  2019.  Word Embedding Method of SMS Messages for Spam Message Filtering. 2019 IEEE International Conference on Big Data and Smart Computing (BigComp). :1–4.
SVM has been one of the most popular machine learning methods for binary classification tasks such as sentiment analysis and spam message filtering. We explored a word embedding method for constructing the feature vector and a deep learning method for the binary classification. CBOW is used as the word embedding technique, and a feedforward neural network is applied to classify SMS messages as ham or spam. The accuracy of the two classification methods, SVM and neural network, is compared on the binary classification task. The experimental results show that the deep learning method is more accurate than the conventional machine learning method, SVM-light, in the binary classification.
Chen, Long.  2019.  Assertion Detection in Clinical Natural Language Processing: A Knowledge-Poor Machine Learning Approach. 2019 IEEE 2nd International Conference on Information and Computer Technologies (ICICT). :37–40.
Natural language processing (NLP) has recently been used to extract clinical information from free text in Electronic Health Records (EHR). One challenge in clinical NLP is that the meaning of clinical entities is heavily affected by assertion modifiers such as negation, uncertainty, hypothetical, experiencer and so on. Incorrect assertion assignment could cause inaccurate diagnosis of a patient's condition or negatively influence subsequent studies such as disease modeling. Thus, clinical NLP systems that can detect the assertion status of given target medical findings (e.g. disease, symptom) in clinical context are in high demand. In this work, we propose a deep-learning system based on word embedding, RNN and an attention mechanism (more specifically, Attention-based Bidirectional Long Short-Term Memory networks) for assertion detection in clinical notes. Unlike previous state-of-the-art methods, which require knowledge input or feature engineering, our system is a knowledge-poor machine learning system and can be easily extended or transferred to other domains. The evaluation of our system on public benchmarking corpora demonstrates that a knowledge-poor deep-learning system can also achieve high performance in detecting negation and assertions compared to state-of-the-art systems.
2019-03-28
McDermott, C. D., Petrovski, A. V., Majdani, F.  2018.  Towards Situational Awareness of Botnet Activity in the Internet of Things. 2018 International Conference On Cyber Situational Awareness, Data Analytics And Assessment (Cyber SA). :1–8.
The following topics are dealt with: security of data; risk management; decision making; computer crime; invasive software; critical infrastructures; data privacy; insurance; Internet of Things; learning (artificial intelligence).
2019-01-16
Gao, J., Lanchantin, J., Soffa, M. L., Qi, Y.  2018.  Black-Box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers. 2018 IEEE Security and Privacy Workshops (SPW). :50–56.

Although various techniques have been proposed to generate adversarial samples for white-box attacks on text, little attention has been paid to black-box attacks, which are a more realistic scenario. In this paper, we present a novel algorithm, DeepWordBug, to effectively generate small text perturbations in a black-box setting that force a deep-learning classifier to misclassify a text input. We develop novel scoring strategies to find the most important words to modify such that the deep classifier makes a wrong prediction. Simple character-level transformations are applied to the highest-ranked words in order to minimize the edit distance of the perturbation. We evaluated DeepWordBug on two real-world text datasets: Enron spam emails and IMDB movie reviews. Our experimental results indicate that DeepWordBug can reduce the classification accuracy from 99% to 40% on Enron and from 87% to 26% on IMDB. Our results strongly demonstrate that the generated adversarial sequences from a deep-learning model can similarly evade other deep models.
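The two-step idea in this abstract (rank words by importance, then apply a small character-level change to the top-ranked ones) can be sketched roughly as follows. This is not the DeepWordBug implementation: the importance scores are supplied by hand rather than derived from a classifier, the adjacent-character swap is just one illustrative transformation, and the names `swap_middle` and `perturb` are hypothetical.

```python
def swap_middle(word):
    """Swap two adjacent interior characters (one illustrative transformation)."""
    if len(word) < 4:
        return word  # too short to change without touching the first or last letter
    mid = len(word) // 2
    chars = list(word)
    chars[mid - 1], chars[mid] = chars[mid], chars[mid - 1]
    return "".join(chars)

def perturb(tokens, scores, budget):
    """Transform the `budget` highest-scored tokens and leave the rest unchanged."""
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:budget])
    return [swap_middle(t) if i in chosen else t for i, t in enumerate(tokens)]

tokens = ["this", "movie", "was", "terrible"]
scores = [0.1, 0.3, 0.05, 0.9]  # hypothetical importance scores, not model output
adv = perturb(tokens, scores, budget=1)  # ['this', 'movie', 'was', 'terirble']
```

Limiting the change to one swap per selected word keeps the perturbed text close to the original while still producing a token the classifier is unlikely to have seen.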

2018-03-19
Greenstein-Messica, Asnat, Rokach, Lior, Friedman, Michael.  2017.  Session-Based Recommendations Using Item Embedding. Proceedings of the 22nd International Conference on Intelligent User Interfaces. :629–633.

Recent word embedding methods for learning vector space representations of words, such as GloVe and Word2Vec, have succeeded in capturing fine-grained semantic and syntactic regularities. We analyzed the effectiveness of these methods for e-commerce recommender systems by converting the sequence of items generated by a user's browsing journey on an e-commerce website into a sentence of words. We examined the prediction of fine-grained item similarity (such as the item most similar to an iPhone 6 64GB smartphone) and item analogy (such as iPhone 5 is to iPhone 6 as Samsung S5 is to Samsung S6) using real-life users' browsing history from an online European department store. Our results reveal that such methods outperform related models such as singular value decomposition (SVD) on item similarity and analogy tasks across different product categories. Furthermore, these methods produce a highly condensed item vector space representation, item embedding, with behaviorally meaningful sub-structure. These vectors can be used as features in a variety of recommender system applications. In particular, we used these vectors as features in neural network-based models for anonymous user recommendation based on a session's first few clicks. We found that a recurrent neural network that preserves the order of the user's clicks outperforms a standard neural network, item-to-item similarity and SVD (recall@10 value of 42% based on the first three clicks) for this task.
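The item analogy task mentioned in this abstract is commonly evaluated with vector arithmetic and cosine similarity, as in word analogy tests. The sketch below shows that general technique on hand-made toy vectors; the vectors, item names, and the `analogy` helper are hypothetical and do not come from the paper's data or code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def analogy(vectors, a, b, c):
    """Return the item d that best completes 'a is to b as c is to d'."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (k for k in vectors if k not in (a, b, c))
    return max(candidates, key=lambda k: cosine(vectors[k], target))

# Toy 3-dimensional item embeddings, invented for illustration only.
vectors = {
    "iphone5": [1.0, 0.0, 1.0],
    "iphone6": [1.0, 0.0, 2.0],
    "samsung_s5": [0.0, 1.0, 1.0],
    "samsung_s6": [0.0, 1.0, 2.0],
    "toaster": [3.0, 3.0, 0.0],
}
best = analogy(vectors, "iphone5", "iphone6", "samsung_s5")  # 'samsung_s6'
```

In practice the vectors would be learned from browsing sessions treated as sentences, and the query items are excluded from the candidates so the answer is not trivially one of the inputs.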

2018-01-10
Zheng, Y., Shi, Y., Guo, K., Li, W., Zhu, L.  2017.  Enhanced word embedding with multiple prototypes. 2017 4th International Conference on Industrial Economics System and Industrial Security Engineering (IEIS). :1–5.

Word embedding is one of the basic word representation methods in natural language processing; it maps a word into a dense real-valued vector space based on the hypothesis that words with similar contexts have similar meanings. Models such as NNLM, C&W, CBOW and Skip-gram have been designed for learning word embeddings and are widely used in many NLP tasks. However, these models assume that a word has only one semantic meaning, which is contrary to how real language works. In this paper we propose a new word unit with multiple meanings and an algorithm to distinguish them by context. This new unit can be embedded in most language models and yields a series of efficient representations by learning variable embeddings. We evaluate a new model, MCBOW, which integrates CBOW with our word unit, on a word similarity evaluation task and some downstream experiments. The results indicate that our new model can learn different meanings of a word and achieves better results on some other tasks.