Visible to the public Biblio

Filters: Keyword is Doc2Vec  [Clear All Filters]
2022-03-10
Ozan, Şükrü, Taşar, D. Emre.  2021.  Auto-tagging of Short Conversational Sentences using Natural Language Processing Methods. 2021 29th Signal Processing and Communications Applications Conference (SIU). :1—4.
In this study, we aim to find a method to autotag sentences specific to a domain. Our training data comprises short conversational sentences extracted from chat conversations between company's customer representatives and web site visitors. We manually tagged approximately 14 thousand visitor inputs into ten basic categories, which will later be used in a transformer-based language model with attention mechanisms for the ultimate goal of developing a chatbot application that can produce meaningful dialogue.We considered three different stateof- the-art models and reported their auto-tagging capabilities. We achieved the best performance with the bidirectional encoder representation from transformers (BERT) model. Implementation of the models used in these experiments can be cloned from our GitHub repository and tested for similar auto-tagging problems without much effort.
2021-01-15
Kadoguchi, M., Kobayashi, H., Hayashi, S., Otsuka, A., Hashimoto, M..  2020.  Deep Self-Supervised Clustering of the Dark Web for Cyber Threat Intelligence. 2020 IEEE International Conference on Intelligence and Security Informatics (ISI). :1—6.

In recent years, cyberattack techniques have become more and more sophisticated each day. Even if defense measures are taken against cyberattacks, it is difficult to prevent them completely. It can also be said that people can only fight defensively against cyber criminals. To address this situation, it is necessary to predict cyberattacks and take appropriate measures in advance, and the use of intelligence is important to make this possible. In general, many malicious hackers share information and tools that can be used for attacks on the dark web or in the specific communities. Therefore, we assume that a lot of intelligence, including this illegal content exists in cyber space. By using the threat intelligence, detecting attacks in advance and developing active defense is expected these days. However, such intelligence is currently extracted manually. In order to do this more efficiently, we apply machine learning to various forum posts that exist on the dark web, with the aim of extracting forum posts containing threat information. By doing this, we expect that detecting threat information in cyber space in a timely manner will be possible so that the optimal preventive measures will be taken in advance.

2020-01-28
KADOGUCHI, Masashi, HAYASHI, Shota, HASHIMOTO, Masaki, OTSUKA, Akira.  2019.  Exploring the Dark Web for Cyber Threat Intelligence Using Machine Leaning. 2019 IEEE International Conference on Intelligence and Security Informatics (ISI). :200–202.

In recent years, cyber attack techniques are increasingly sophisticated, and blocking the attack is more and more difficult, even if a kind of counter measure or another is taken. In order for a successful handling of this situation, it is crucial to have a prediction of cyber attacks, appropriate precautions, and effective utilization of cyber intelligence that enables these actions. Malicious hackers share various kinds of information through particular communities such as the dark web, indicating that a great deal of intelligence exists in cyberspace. This paper focuses on forums on the dark web and proposes an approach to extract forums which include important information or intelligence from huge amounts of forums and identify traits of each forum using methodologies such as machine learning, natural language processing and so on. This approach will allow us to grasp the emerging threats in cyberspace and take appropriate measures against malicious activities.

2018-12-10
Ndichu, S., Ozawa, S., Misu, T., Okada, K..  2018.  A Machine Learning Approach to Malicious JavaScript Detection using Fixed Length Vector Representation. 2018 International Joint Conference on Neural Networks (IJCNN). :1–8.

To add more functionality and enhance usability of web applications, JavaScript (JS) is frequently used. Even with many advantages and usefulness of JS, an annoying fact is that many recent cyberattacks such as drive-by-download attacks exploit vulnerability of JS codes. In general, malicious JS codes are not easy to detect, because they sneakily exploit vulnerabilities of browsers and plugin software, and attack visitors of a web site unknowingly. To protect users from such threads, the development of an accurate detection system for malicious JS is soliciting. Conventional approaches often employ signature and heuristic-based methods, which are prone to suffer from zero-day attacks, i.e., causing many false negatives and/or false positives. For this problem, this paper adopts a machine-learning approach to feature learning called Doc2Vec, which is a neural network model that can learn context information of texts. The extracted features are given to a classifier model (e.g., SVM and neural networks) and it judges the maliciousness of a JS code. In the performance evaluation, we use the D3M Dataset (Drive-by-Download Data by Marionette) for malicious JS codes and JSUPACK for benign ones for both training and test purposes. We then compare the performance to other feature learning methods. Our experimental results show that the proposed Doc2Vec features provide better accuracy and fast classification in malicious JS code detection compared to conventional approaches.