Biblio

Found 168 results

Filters: Keyword is natural language processing  [Clear All Filters]
2021-11-29
Gupta, Hritvik, Patel, Mayank.  2020.  Study of Extractive Text Summarizer Using The Elmo Embedding. 2020 Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC). :829–834.
In recent times, data excessiveness has become a major problem in the field of education, news, blogs, social media, etc. Due to an increase in such a vast amount of text data, it became challenging for a human to extract only the valuable amount of data in a concise form. In other words, summarizing the text, enables human to retrieves the relevant and useful texts, Text summarizing is extracting the data from the document and generating the short or concise text of the document. One of the major approaches that are used widely is Automatic Text summarizer. Automatic text summarizer analyzes the large textual data and summarizes it into the short summaries containing valuable information of the data. Automatic text summarizer further divided into two types 1) Extractive text summarizer, 2) Abstractive Text summarizer. In this article, the extractive text summarizer approach is being looked for. Extractive text summarization is the approach in which model generates the concise summary of the text by picking up the most relevant sentences from the text document. This paper focuses on retrieving the valuable amount of data using the Elmo embedding in Extractive text summarization. Elmo embedding is a contextual embedding that had been used previously by many researchers in abstractive text summarization techniques, but this paper focus on using it in extractive text summarizer.
2021-02-22
Alzahrani, A., Feki, J..  2020.  Toward a Natural Language-Based Approach for the Specification of Decisional-Users Requirements. 2020 3rd International Conference on Computer Applications Information Security (ICCAIS). :1–6.
The number of organizations adopting the Data Warehouse (DW) technology along with data analytics in order to improve the effectiveness of their decision-making processes is permanently increasing. Despite the efforts invested, the DW design remains a great challenge research domain. More accurately, the design quality of the DW depends on several aspects; among them, the requirement-gathering phase is a critical and complex task. In this context, we propose a Natural language (NL) NL-template based design approach, which is twofold; firstly, it facilitates the involvement of decision-makers in the early step of the DW design; indeed, using NL is a good and natural means to encourage the decision-makers to express their requirements as query-like English sentences. Secondly, our approach aims to generate a DW multidimensional schema from a set of gathered requirements (as OLAP: On-Line-Analytical-Processing queries, written according to the NL suggested templates). This approach articulates around: (i) two NL-templates for specifying multidimensional components, and (ii) a set of five heuristic rules for extracting the multidimensional concepts from requirements. Really, we are developing a software prototype that accepts the decision-makers' requirements then automatically identifies the multidimensional components of the DW model.
2021-05-18
Fidalgo, Ana, Medeiros, Ibéria, Antunes, Paulo, Neves, Nuno.  2020.  Towards a Deep Learning Model for Vulnerability Detection on Web Application Variants. 2020 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW). :465–476.
Reported vulnerabilities have grown significantly over the recent years, with SQL injection (SQLi) being one of the most prominent, especially in web applications. For these, such increase can be explained by the integration of multiple software parts (e.g., various plugins and modules), often developed by different organizations, composing thus web application variants. Machine Learning has the potential to be a great ally on finding vulnerabilities, aiding experts by reducing the search space or even by classifying programs on their own. However, previous work usually does not consider SQLi or utilizes techniques hard to scale. Moreover, there is a clear gap in vulnerability detection with machine learning for PHP, the most popular server-side language for web applications. This paper presents a Deep Learning model able to classify PHP slices as vulnerable (or not) to SQLi. As slices can belong to any variant, we propose the use of an intermediate language to represent the slices and interpret them as text, resorting to well-studied Natural Language Processing (NLP) techniques. Preliminary results of the use of the model show that it can discover SQLi, helping programmers and precluding attacks that would eventually cost a lot to repair.
2021-02-22
Martinelli, F., Marulli, F., Mercaldo, F., Marrone, S., Santone, A..  2020.  Enhanced Privacy and Data Protection using Natural Language Processing and Artificial Intelligence. 2020 International Joint Conference on Neural Networks (IJCNN). :1–8.

Artificial Intelligence systems have enabled significant benefits for users and society, but whilst the data for their feeding are always increasing, a side to privacy and security leaks is offered. The severe vulnerabilities to the right to privacy obliged governments to enact specific regulations to ensure privacy preservation in any kind of transaction involving sensitive information. In the case of digital and/or physical documents comprising sensitive information, the right to privacy can be preserved by data obfuscation procedures. The capability of recognizing sensitive information for obfuscation is typically entrusted to the experience of human experts, who are over-whelmed by the ever increasing amount of documents to process. Artificial intelligence could proficiently mitigate the effort of the human officers and speed up processes. Anyway, until enough knowledge won't be available in a machine readable format, automatic and effectively working systems can't be developed. In this work we propose a methodology for transferring and leveraging general knowledge across specific-domain tasks. We built, from scratch, specific-domain knowledge data sets, for training artificial intelligence models supporting human experts in privacy preserving tasks. We exploited a mixture of natural language processing techniques applied to unlabeled domain-specific documents corpora for automatically obtain labeled documents, where sensitive information are recognized and tagged. We performed preliminary tests just over 10.000 documents from the healthcare and justice domains. Human experts supported us during the validation. Results we obtained, estimated in terms of precision, recall and F1-score metrics across these two domains, were promising and encouraged us to further investigations.

2021-01-28
He, H. Y., Yang, Z. Guo, Chen, X. N..  2020.  PERT: Payload Encoding Representation from Transformer for Encrypted Traffic Classification. 2020 ITU Kaleidoscope: Industry-Driven Digital Transformation (ITU K). :1—8.

Traffic identification becomes more important yet more challenging as related encryption techniques are rapidly developing nowadays. In difference to recent deep learning methods that apply image processing to solve such encrypted traffic problems, in this paper, we propose a method named Payload Encoding Representation from Transformer (PERT) to perform automatic traffic feature extraction using a state-of-the-art dynamic word embedding technique. Based on this, we further provide a traffic classification framework in which unlabeled traffic is utilized to pre-train an encoding network that learns the contextual distribution of traffic payload bytes. Then, the downward classification reuses the pre-trained network to obtain an enhanced classification result. By implementing experiments on a public encrypted traffic data set and our captured Android HTTPS traffic, we prove the proposed method can achieve an obvious better effectiveness than other compared baselines. To the best of our knowledge, this is the first time the encrypted traffic classification with the dynamic word embedding alone with its pre-training strategy has been addressed.

2021-02-08
Zhang, J..  2020.  DeepMal: A CNN-LSTM Model for Malware Detection Based on Dynamic Semantic Behaviours. 2020 International Conference on Computer Information and Big Data Applications (CIBDA). :313–316.
Malware refers to any software accessing or being installed in a system without the authorisation of administrators. Various malware has been widely used for cyber-criminals to accomplish their evil intentions and goals. To combat the increasing amount and reduce the threat of malicious programs, a novel deep learning framework, which uses NLP techniques for reference, combines CNN and LSTM neurones to capture the locally spatial correlations and learn from sequential longterm dependency is proposed. Hence, high-level abstractions and representations are automatically extracted for the malware classification task. The classification accuracy improves from 0.81 (best one by Random Forest) to approximately 1.0.
2021-02-22
Rivera, S., Fei, Z., Griffioen, J..  2020.  POLANCO: Enforcing Natural Language Network Policies. 2020 29th International Conference on Computer Communications and Networks (ICCCN). :1–9.
Network policies govern the use of an institution's networks, and are usually written in a high-level human-readable natural language. Normally these policies are enforced by low-level, technically detailed network configurations. The translation from network policies into network configurations is a tedious, manual and error-prone process. To address this issue, we propose a new intermediate language called POlicy LANguage for Campus Operations (POLANCO), which is a human-readable network policy definition language intended to approximate natural language. Because POLANCO is a high-level language, the translation from natural language policies to POLANCO is straightforward. Despite being a high-level human readable language, POLANCO can be used to express network policies in a technically precise way so that policies written in POLANCO can be automatically translated into a set of software defined networking (SDN) rules and actions that enforce the policies. Moreover, POLANCO is capable of incorporating information about the current network state, reacting to changes in the network and adjusting SDN rules to ensure network policies continue to be enforced correctly. We present policy examples found on various public university websites and show how they can be written as simplified human-readable statements using POLANCO and how they can be automatically translated into SDN rules that correctly enforce these policies.
2021-04-08
Bouzar-Benlabiod, L., Rubin, S. H., Belaidi, K., Haddar, N. E..  2020.  RNN-VED for Reducing False Positive Alerts in Host-based Anomaly Detection Systems. 2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI). :17–24.
Host-based Intrusion Detection Systems HIDS are often based on anomaly detection. Several studies deal with anomaly detection by analyzing the system-call traces and get good detection rates but also a high rate off alse positives. In this paper, we propose a new anomaly detection approach applied on the system-call traces. The normal behavior learning is done using a Sequence to sequence model based on a Variational Encoder-Decoder (VED) architecture that integrates Recurrent Neural Networks (RNN) cells. We exploit the semantics behind the invoking order of system-calls that are then seen as sentences. A preprocessing phase is added to structure and optimize the model input-data representation. After the learning step, a one-class classification is run to categorize the sequences as normal or abnormal. The architecture may be used for predicting abnormal behaviors. The tests are achieved on the ADFA-LD dataset.
2021-06-24
Saletta, Martina, Ferretti, Claudio.  2020.  A Neural Embedding for Source Code: Security Analysis and CWE Lists. 2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). :523—530.
In this paper, we design a technique for mapping the source code into a vector space and we show its application in the recognition of security weaknesses. By applying ideas commonly used in Natural Language Processing, we train a model for producing an embedding of programs starting from their Abstract Syntax Trees. We then show how such embedding is able to infer clusters roughly separating different classes of software weaknesses. Even if the training of the embedding is unsupervised and made on a generic Java dataset, we show that the model can be used for supervised learning of specific classes of vulnerabilities, helping to capture some features distinguishing them in code. Finally, we discuss how our model performs over the different types of vulnerabilities categorized by the CWE initiative.
2021-05-18
Iorga, Denis, Corlătescu, Dragos, Grigorescu, Octavian, Săndescu, Cristian, Dascălu, Mihai, Rughiniş, Razvan.  2020.  Early Detection of Vulnerabilities from News Websites using Machine Learning Models. 2020 19th RoEduNet Conference: Networking in Education and Research (RoEduNet). :1–6.
The drawbacks of traditional methods of cybernetic vulnerability detection relate to the required time to identify new threats, to register them in the Common Vulnerabilities and Exposures (CVE) records, and to score them with the Common Vulnerabilities Scoring System (CVSS). These problems can be mitigated by early vulnerability detection systems relying on social media and open-source data. This paper presents a model that aims to identify emerging cybernetic vulnerabilities in cybersecurity news articles, as part of a system for automatic detection of early cybernetic threats using Open Source Intelligence (OSINT). Three machine learning models were trained on a novel dataset of 1000 labeled news articles to create a strong baseline for classifying cybersecurity articles as relevant (i.e., introducing new security threats), or irrelevant: Support Vector Machines, a Multinomial Naïve Bayes classifier, and a finetuned BERT model. The BERT model obtained the best performance with a mean accuracy of 88.45% on the test dataset. Our experiments support the conclusion that Natural Language Processing (NLP) models are an appropriate choice for early vulnerability detection systems in order to extract relevant information from cybersecurity news articles.
2021-02-22
Koda, S., Kambara, Y., Oikawa, T., Furukawa, K., Unno, Y., Murakami, M..  2020.  Anomalous IP Address Detection on Traffic Logs Using Novel Word Embedding. 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC). :1504–1509.
This paper presents an anomalous IP address detection algorithm for network traffic logs. It is based on word embedding techniques derived from natural language processing to extract the representative features of IP addresses. However, the features extracted from vanilla word embeddings are not always compatible with machine learning-based anomaly detection algorithms. Therefore, we developed an algorithm that enables the extraction of more compatible features of IP addresses for anomaly detection than conventional methods. The proposed algorithm optimizes the objective functions of word embedding-based feature extraction and anomaly detection, simultaneously. According to the experimental results, the proposed algorithm outperformed conventional approaches; it improved the detection performance from 0.876 to 0.990 in the area under the curve criterion in a task of detecting the IP addresses of attackers from network traffic logs.
2021-09-07
Vamsi, G Krishna, Rasool, Akhtar, Hajela, Gaurav.  2020.  Chatbot: A Deep Neural Network Based Human to Machine Conversation Model. 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT). :1–7.
A conversational agent (chatbot) is computer software capable of communicating with humans using natural language processing. The crucial part of building any chatbot is the development of conversation. Despite many developments in Natural Language Processing (NLP) and Artificial Intelligence (AI), creating a good chatbot model remains a significant challenge in this field even today. A conversational bot can be used for countless errands. In general, they need to understand the user's intent and deliver appropriate replies. This is a software program of a conversational interface that allows a user to converse in the same manner one would address a human. Hence, these are used in almost every customer communication platform, like social networks. At present, there are two basic models used in developing a chatbot. Generative based models and Retrieval based models. The recent advancements in deep learning and artificial intelligence, such as the end-to-end trainable neural networks have rapidly replaced earlier methods based on hand-written instructions and patterns or statistical methods. This paper proposes a new method of creating a chatbot using a deep neural learning method. In this method, a neural network with multiple layers is built to learn and process the data.
2021-02-01
Wu, L., Chen, X., Meng, L., Meng, X..  2020.  Multitask Adversarial Learning for Chinese Font Style Transfer. 2020 International Joint Conference on Neural Networks (IJCNN). :1–8.
Style transfer between Chinese fonts is challenging due to both the complexity of Chinese characters and the significant difference between fonts. Existing algorithms for this task typically learn a mapping between the reference and target fonts for each character. Subsequently, this mapping is used to generate the characters that do not exist in the target font. However, the characters available for training are unlikely to cover all fine-grained parts of the missing characters, leading to the overfitting problem. As a result, the generated characters of the target font may suffer problems of incomplete or even radicals and dirty dots. To address this problem, this paper presents a multi-task adversarial learning approach, termed MTfontGAN, to generate more vivid Chinese characters. MTfontGAN learns to transfer a reference font to multiple target ones simultaneously. An alignment is imposed on the encoders of different tasks to make them focus on the important parts of the characters in general style transfer. Such cross-task interactions at the feature level effectively improve the generalization capability of MTfontGAN. The performance of MTfontGAN is evaluated on three Chinese font datasets. Experimental results show that MTfontGAN outperforms the state-of-the-art algorithms in a single-task setting. More importantly, increasing the number of tasks leads to better performance in all of them.
2021-03-18
Bi, X., Liu, X..  2020.  Chinese Character Captcha Sequential Selection System Based on Convolutional Neural Network. 2020 International Conference on Computer Vision, Image and Deep Learning (CVIDL). :554—559.

To ensure security, Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is widely used in people's online lives. This paper presents a Chinese character captcha sequential selection system based on convolutional neural network (CNN). Captchas composed of English and digits can already be identified with extremely high accuracy, but Chinese character captcha recognition is still challenging. The task we need to complete is to identify Chinese characters with different colors and different fonts that are not on a straight line with rotation and affine transformation on pictures with complex backgrounds, and then perform word order restoration on the identified Chinese characters. We divide the task into several sub-processes: Chinese character detection based on Faster R-CNN, Chinese character recognition and word order recovery based on N-Gram. In the Chinese character recognition sub-process, we have made outstanding contributions. We constructed a single Chinese character data set and built a 10-layer convolutional neural network. Eventually we achieved an accuracy of 98.43%, and completed the task perfectly.

2021-02-22
Hirlekar, V. V., Kumar, A..  2020.  Natural Language Processing based Online Fake News Detection Challenges – A Detailed Review. 2020 5th International Conference on Communication and Electronics Systems (ICCES). :748–754.
Online social media plays an important role during real world events such as natural calamities, elections, social movements etc. Since the social media usage has increased, fake news has grown. The social media is often used by modifying true news or creating fake news to spread misinformation. The creation and distribution of fake news poses major threats in several respects from a national security point of view. Hence Fake news identification becomes an essential goal for enhancing the trustworthiness of the information shared on online social network. Over the period of time many researcher has used different methods, algorithms, tools and techniques to identify fake news content from online social networks. The aim of this paper is to review and examine these methodologies, different tools, browser extensions and analyze the degree of output in question. In addition, this paper discuss the general approach of fake news detection as well as taxonomy of feature extraction which plays an important role to achieve maximum accuracy with the help of different Machine Learning and Natural Language Processing algorithms.
2021-06-01
Ming, Kun.  2020.  Chinese Coreference Resolution via Bidirectional LSTMs using Word and Token Level Representations. 2020 16th International Conference on Computational Intelligence and Security (CIS). :73–76.
Coreference resolution is an important task in the field of natural language processing. Most existing methods usually utilize word-level representations, ignoring massive information from the texts. To address this issue, we investigate how to improve Chinese coreference resolution by using span-level semantic representations. Specifically, we propose a model which acquires word and character representations through pre-trained Skip-Gram embeddings and pre-trained BERT, then explicitly leverages span-level information by performing bidirectional LSTMs among above representations. Experiments on CoNLL-2012 shared task have demonstrated that the proposed model achieves 62.95% F1-score, outperforming our baseline methods.
2021-11-29
Nazemi, Kawa, Klepsch, Maike J., Burkhardt, Dirk, Kaupp, Lukas.  2020.  Comparison of Full-Text Articles and Abstracts for Visual Trend Analytics through Natural Language Processing. 2020 24th International Conference Information Visualisation (IV). :360–367.
Scientific publications are an essential resource for detecting emerging trends and innovations in a very early stage, by far earlier than patents may allow. Thereby Visual Analytics systems enable a deep analysis by applying commonly unsupervised machine learning methods and investigating a mass amount of data. A main question from the Visual Analytics viewpoint in this context is, do abstracts of scientific publications provide a similar analysis capability compared to their corresponding full-texts? This would allow to extract a mass amount of text documents in a much faster manner. We compare in this paper the topic extraction methods LSI and LDA by using full text articles and their corresponding abstracts to obtain which method and which data are better suited for a Visual Analytics system for Technology and Corporate Foresight. Based on a easy replicable natural language processing approach, we further investigate the impact of lemmatization for LDA and LSI. The comparison will be performed qualitative and quantitative to gather both, the human perception in visual systems and coherence values. Based on an application scenario a visual trend analytics system illustrates the outcomes.
2021-06-24
Stöckle, Patrick, Grobauer, Bernd, Pretschner, Alexander.  2020.  Automated Implementation of Windows-related Security-Configuration Guides. 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE). :598—610.
Hardening is the process of configuring IT systems to ensure the security of the systems' components and data they process or store. The complexity of contemporary IT infrastructures, however, renders manual security hardening and maintenance a daunting task. In many organizations, security-configuration guides expressed in the SCAP (Security Content Automation Protocol) are used as a basis for hardening, but these guides by themselves provide no means for automatically implementing the required configurations. In this paper, we propose an approach to automatically extract the relevant information from publicly available security-configuration guides for Windows operating systems using natural language processing. In a second step, the extracted information is verified using the information of available settings stored in the Windows Administrative Template files, in which the majority of Windows configuration settings is defined. We show that our implementation of this approach can extract and implement 83% of the rules without any manual effort and 96% with minimal manual effort. Furthermore, we conduct a study with 12 state-of-the-art guides consisting of 2014 rules with automatic checks and show that our tooling can implement at least 97% of them correctly. We have thus significantly reduced the effort of securing systems based on existing security-configuration guides. In many organizations, security-configuration guides expressed in the SCAP (Security Content Automation Protocol) are used as a basis for hardening, but these guides by themselves provide no means for automatically implementing the required configurations. In this paper, we propose an approach to automatically extract the relevant information from publicly available security-configuration guides for Windows operating systems using natural language processing. In a second step, the extracted information is verified using the information of available settings stored in the Windows Administrative Template files, in which the majority of Windows configuration settings is defined. We show that our implementation of this approach can extract and implement 83% of the rules without any manual effort and 96% with minimal manual effort. Furthermore, we conduct a study with 12 state-of-the-art guides consisting of 2014 rules with automatic checks and show that our tooling can implement at least 97% of them correctly. We have thus significantly reduced the effort of securing systems based on existing security-configuration guides. In this paper, we propose an approach to automatically extract the relevant information from publicly available security-configuration guides for Windows operating systems using natural language processing. In a second step, the extracted information is verified using the information of available settings stored in the Windows Administrative Template files, in which the majority of Windows configuration settings is defined. We show that our implementation of this approach can extract and implement 83% of the rules without any manual effort and 96% with minimal manual effort. Furthermore, we conduct a study with 12 state-of-the-art guides consisting of 2014 rules with automatic checks and show that our tooling can implement at least 97% of them correctly. We have thus significantly reduced the effort of securing systems based on existing security-configuration guides. We show that our implementation of this approach can extract and implement 83% of the rules without any manual effort and 96% with minimal manual effort. Furthermore, we conduct a study with 12 state-of-the-art guides consisting of 2014 rules with automatic checks and show that our tooling can implement at least 97% of them correctly. We have thus significantly reduced the effort of securing systems based on existing security-configuration guides.
2020-05-18
Sharma, Sarika, Kumar, Deepak.  2019.  Agile Release Planning Using Natural Language Processing Algorithm. 2019 Amity International Conference on Artificial Intelligence (AICAI). :934–938.
Once the requirement is gathered in agile, it is broken down into smaller pre-defined format called user stories. These user stories are then scoped in various sprint releases and delivered accordingly. Release planning in Agile becomes challenging when the number of user stories goes up in hundreds. In such scenarios it is very difficult to manually identify similar user stories and package them together into a release. Hence, this paper suggests application of natural language processing algorithms for identifying similar user stories and then scoping them into a release This paper takes the approach to build a word corpus for every project release identified in the project and then to convert the provided user stories into a vector of string using Java utility for calculating top 3 most occurring words from the given project corpus in a user story. Once all the user stories are represented as vector array then by using RV coefficient NLP algorithm the user stories are clustered into various releases of the software project. Using the proposed approach, the release planning for large and complex software engineering projects can be simplified resulting into efficient planning in less time. The automated commercial tools like JIRA and Rally can be enhanced to include suggested algorithms for managing release planning in Agile.
Chen, Long.  2019.  Assertion Detection in Clinical Natural Language Processing: A Knowledge-Poor Machine Learning Approach. 2019 IEEE 2nd International Conference on Information and Computer Technologies (ICICT). :37–40.
Natural language processing (NLP) have been recently used to extract clinical information from free text in Electronic Health Record (EHR). In clinical NLP one challenge is that the meaning of clinical entities is heavily affected by assertion modifiers such as negation, uncertain, hypothetical, experiencer and so on. Incorrect assertion assignment could cause inaccurate diagnosis of patients' condition or negatively influence following study like disease modeling. Thus, clinical NLP systems which can detect assertion status of given target medical findings (e.g. disease, symptom) in clinical context are highly demanded. Here in this work, we propose a deep-learning system based on word embedding, RNN and attention mechanism (more specifically: Attention-based Bidirectional Long Short-Term Memory networks) for assertion detection in clinical notes. Unlike previous state-of-art methods which require knowledge input or feature engineering, our system is a knowledge poor machine learning system and can be easily extended or transferred to other domains. The evaluation of our system on public benchmarking corpora demonstrates that a knowledge poor deep-learning system can also achieve high performance for detecting negation and assertions comparing to state-of-the-art systems.
Thejaswini, S, Indupriya, C.  2019.  Big Data Security Issues and Natural Language Processing. 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI). :1307–1312.
Whenever we talk about big data, the concern is always about the security of the data. In recent days the most heard about technology is the Natural Language Processing. This new and trending technology helps in solving the ever ending security problems which are not completely solved using big data. Starting with the big data security issues, this paper deals with addressing the topics related to cyber security and information security using the Natural Language Processing technology. Including the well-known cyber-attacks such as phishing identification and spam detection, this paper also addresses issues on information assurance and security such as detection of Advanced Persistent Threat (APT) in DNS and vulnerability analysis. The goal of this paper is to provide the overview of how natural language processing can be used to address cyber security issues.
2020-03-23
Xu, Yilin, Ge, Weimin, Li, Xiaohong, Feng, Zhiyong, Xie, Xiaofei, Bai, Yude.  2019.  A Co-Occurrence Recommendation Model of Software Security Requirement. 2019 International Symposium on Theoretical Aspects of Software Engineering (TASE). :41–48.
To guarantee the quality of software, specifying security requirements (SRs) is essential for developing systems, especially for security-critical software systems. However, using security threat to determine detailed SR is quite difficult according to Common Criteria (CC), which is too confusing and technical for non-security specialists. In this paper, we propose a Co-occurrence Recommend Model (CoRM) to automatically recommend software SRs. In this model, the security threats of product are extracted from security target documents of software, in which the related security requirements are tagged. In order to establish relationships between software security threat and security requirement, semantic similarities between different security threat is calculated by Skip-thoughts Model. To evaluate our CoRM model, over 1000 security target documents of 9 types software products are exploited. The results suggest that building a CoRM model via semantic similarity is feasible and reliable.
2020-05-18
Nambiar, Sindhya K, Leons, Antony, Jose, Soniya, Arunsree.  2019.  Natural Language Processing Based Part of Speech Tagger using Hidden Markov Model. 2019 Third International conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC). :782–785.
In various natural language processing applications, PART-OF-SPEECH (POS) tagging is performed as a preprocessing step. For making POS tagging accurate, various techniques have been explored. But in Indian languages, not much work has been done. This paper describes the methods to build a Part of speech tagger by using hidden markov model. Supervised learning approach is implemented in which, already tagged sentences in malayalam is used to build hidden markov model.
2020-07-06
Chai, Yadeng, Liu, Yong.  2019.  Natural Spoken Instructions Understanding for Robot with Dependency Parsing. 2019 IEEE 9th Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER). :866–871.
This paper presents a method based on syntactic information, which can be used for intent determination and slot filling tasks in a spoken language understanding system including the spoken instructions understanding module for robot. Some studies in recent years attempt to solve the problem of spoken language understanding via syntactic information. This research is a further extension of these approaches which is based on dependency parsing. In this model, the input for neural network are vectors generated by a dependency parsing tree, which we called window vector. This vector contains dependency features that improves performance of the syntactic-based model. The model has been evaluated on the benchmark ATIS task, and the results show that it outperforms many other syntactic-based approaches, especially in terms of slot filling, it has a performance level on par with some state of the art deep learning algorithms in recent years. Also, the model has been evaluated on FBM3, a dataset of the RoCKIn@Home competition. The overall rate of correctly understanding the instructions for robot is quite good but still not acceptable in practical use, which is caused by the small scale of FBM3.
2020-05-18
Bakhtin, Vadim V., Isaeva, Ekaterina V..  2019.  New TSBuilder: Shifting towards Cognition. 2019 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus). :179–181.
The paper reviews a project on the automation of term system construction. TSBuilder (Term System Builder) was developed in 2014 as a multilayer Rosenblatt's perceptron for supervised machine learning, namely 1-3 word terms identification in natural language texts and their rigid categorization. The program is being modified to reduce the rigidity of categorization which will bring text mining more in line with human thinking.We are expanding the range of parameters (semantical, morphological, and syntactical) for categorization, removing the restriction of the term length of three words, using convolution on a continuous sequence of terms, and present the probabilities of a term falling into different categories. The neural network will not assign a single category to a term but give N answers (where N is the number of predefined classes), each of which O ∈ [0, 1] is the probability of the term to belong to a given class.