Biblio


Found 165 results

Filters: Keyword is natural language processing

2022-04-19

Srinivasan, Sudarshan, Begoli, Edmon, Mahbub, Maria, Knight, Kathryn. 2021. Nomen Est Omen - The Role of Signatures in Ascribing Email Author Identity with Transformer Neural Networks. 2021 IEEE Security and Privacy Workshops (SPW). :291–297.

Authorship attribution, an NLP problem where anonymous text is matched to its author, has important, cross-disciplinary applications, particularly those concerning cyber-defense. Our research examines the degree of sensitivity that attention-based models have to adversarial perturbations. We ask, what is the minimal amount of change necessary to maximally confuse a transformer model? In our investigation we examine a balanced subset of emails from the Enron email dataset, calculating the performance of our model before and after email signatures have been perturbed. Results show that the model's performance changed significantly in the absence of a signature, indicating the importance of email signatures in email authorship detection. Furthermore, we show that these models rely on signatures for shorter emails much more than for longer emails. We also indicate that additional research is necessary to investigate stylometric features and adversarial training to further improve classification model robustness.

2022-03-10

Pölöskei, István. 2021. Continuous natural language processing pipeline strategy. 2021 IEEE 15th International Symposium on Applied Computational Intelligence and Informatics (SACI). :000221—000224.

Natural language processing (NLP) is a division of artificial intelligence. The constructed model's quality is entirely reliant on the training dataset's quality. A data streaming pipeline is an adhesive application, completing a managed connection from data sources to machine learning methods. The recommended NLP pipeline composition has well-defined procedures. The implemented message broker design is a usual apparatus for delivering events. It makes it achievable to construct a robust training dataset for machine learning use-case and serve the model's input. The reconstructed dataset is a valid input for the machine learning processes. Based on the data pipeline's product, the model recreation and redeployment can be scheduled automatically.

Sanyal, Hrithik, Shukla, Sagar, Agrawal, Rajneesh. 2021. Natural Language Processing Technique for Generation of SQL Queries Dynamically. 2021 6th International Conference for Convergence in Technology (I2CT). :1—6.

Natural Language Processing is being used in every field of human to machine interaction. Although database queries have a confined set of instructions, they are still found to be complex, and dedicated human resources are required to write, test, optimize and execute structured query language statements. This makes the process difficult, time-consuming and often inaccurate. Such difficulties can be overcome if the queries are formed dynamically with standard procedures. In this work, the parsing, lexical analysis, synonym detection and formation processes of natural language processing are proposed for dynamically generating SQL queries and optimizing them for fast processing with high accuracy. NLP parsing of user-entered English text is proposed to dynamically create queries for retrieving, creating and inserting data. This will help users of the system to generate reports from the data as required without the complexities of SQL. The proposed system will not only generate queries dynamically but will also provide high accuracy and performance.

Ozan, Şükrü, Taşar, D. Emre. 2021. Auto-tagging of Short Conversational Sentences using Natural Language Processing Methods. 2021 29th Signal Processing and Communications Applications Conference (SIU). :1—4.

In this study, we aim to find a method to auto-tag sentences specific to a domain. Our training data comprises short conversational sentences extracted from chat conversations between a company's customer representatives and web site visitors. We manually tagged approximately 14 thousand visitor inputs into ten basic categories, which will later be used in a transformer-based language model with attention mechanisms for the ultimate goal of developing a chatbot application that can produce meaningful dialogue. We considered three different state-of-the-art models and reported their auto-tagging capabilities. We achieved the best performance with the bidirectional encoder representations from transformers (BERT) model. Implementation of the models used in these experiments can be cloned from our GitHub repository and tested for similar auto-tagging problems without much effort.

Qin, Shuangling, Xu, Chaozhi, Zhang, Fang, Jiang, Tao, Ge, Wei, Li, Jihong. 2021. Research on Application of Chinese Natural Language Processing in Constructing Knowledge Graph of Chronic Diseases. 2021 International Conference on Communications, Information System and Computer Engineering (CISCE). :271—274.

A knowledge graph can describe the concepts in the objective world and the relationships between these concepts in a structured way, and identify, discover and infer the relationships between things and concepts. It has been developed in the field of medical and health care. In this paper, natural language processing methods, such as named entity recognition and relationship extraction, are used to build a chronic disease knowledge graph. This method is beneficial to forecast analysis of chronic disease, network monitoring, basic education, etc. The research in this paper can greatly help medical experts in the treatment of chronic disease, assist primary clinicians in making more scientific decisions, and help patients with chronic diseases to improve medical efficiency. Finally, it also has practical significance for clinical scientific research on chronic disease.

Gupta, Subhash Chand, Singh, Nidhi Raj, Sharma, Tulsi, Tyagi, Akshita, Majumdar, Rana. 2021. Generating Image Captions using Deep Learning and Natural Language Processing. 2021 9th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO). :1—4.

In today's world, there is rapid progress in the field of artificial intelligence and image captioning. Image captioning has become a fascinating task that has seen widespread interest. The task comprises generating an image description based on a hybrid combination of deep learning, natural language processing, and various approaches of machine learning and computer vision. In this work, the authors emphasize how the model generates a short description of the input image using deep learning and natural language processing. The model can help visually impaired people, and can also be used on various web sites to automate the generation of captions, easing the task of writing captions by hand.

Zhang, Zhongtang, Liu, Shengli, Yang, Qichao, Guo, Shichen. 2021. Semantic Understanding of Source and Binary Code based on Natural Language Processing. 2021 IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC). 4:2010—2016.

With the development of open source projects, a large number of open source codes will be reused in binary software, and bugs in source codes will also be introduced into binary codes. In order to detect the reused open source codes in binary codes, it is sometimes necessary to compare and analyze the similarity between source codes and binary codes. One of the main challenges is that the compilation process can generate different binary code representations for the same source code under different compiler versions, compilation optimization options and target architectures, which greatly increases the difficulty of semantic similarity detection between source code and binary code. In order to remove the influence of the compilation process on the comparison of semantic similarity of codes, this paper transforms the source code and binary code into LLVM intermediate representation (LLVM IR), which is a universal intermediate representation independent of source code and binary code. We carry out semantic feature extraction and embedding training on LLVM IR based on a natural language processing model. Experimental results show that LLVM IR eliminates the influence of compilation on the syntax differences between source code and binary code, and the semantic features of code are well represented and preserved.

Ge, Xin. 2021. Internet of things device recognition method based on natural language processing and text similarity. 2021 4th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE). :137—140.

Effective identification of Internet of things devices in cyberspace is of great significance to the protection of Cyberspace Security. However, there are a large number of such devices in cyberspace, which can not be identified by the existing methods of identifying IoT devices because of the lack of key information such as manufacturer name and device name in the response message. Their existence brings hidden danger to Cyberspace Security. In order to identify the IoT devices with missing key information in these response messages, this paper proposes an IoT device identification method, IoTCatcher. IoTCatcher uses HTTP response message and the structure and style characteristics of HTML document, and based on natural language processing technology and text similarity technology, classifies and compares the IoT devices whose response message lacks key information, so as to generate their device finger information. This paper proves that the recognition precision of IoTCatcher is 95.29%, and the recall rate is 91.01%. Compared with the existing methods, the overall performance is improved by 38.83%.
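
The text-similarity step described in the abstract above can be illustrated with a small sketch. It is not IoTCatcher itself: the tokenizer, the bag-of-words cosine measure, the function names and the sample HTML snippets are all assumptions chosen for illustration. The idea shown is only that a response lacking a manufacturer or device name can be compared against responses whose device type is already known.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphanumeric tokens; tag names and attribute values both survive.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two token-count vectors (Counter objects).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_label(unknown, labeled):
    # Return (score, label) for the labeled response most similar to `unknown`.
    vec = Counter(tokenize(unknown))
    return max(
        (cosine(vec, Counter(tokenize(text))), label)
        for label, text in labeled.items()
    )


# Hypothetical labeled responses; real ones would come from scanned devices.
labeled = {
    "camera": "<title>Web Camera Login</title> <div class='video'>",
    "router": "<title>Router Admin</title> <table class='wan'>",
}
unknown = "<title>Login</title><div class='video'>stream</div>"
score, label = nearest_label(unknown, labeled)
print(label, round(score, 3))
```

A production system would likely weight structural features (tag layout, style attributes) separately from visible text; the single bag of tokens here is the simplest stand-in.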

Yang, Mengde. 2021. A Survey on Few-Shot Learning in Natural Language Processing. 2021 International Conference on Artificial Intelligence and Electromechanical Automation (AIEA). :294—297.

The annotated dataset is the foundation for Supervised Natural Language Processing. However, the cost of obtaining dataset is high. In recent years, the Few-Shot Learning has gradually attracted the attention of researchers. From the definition, in this paper, we conclude the difference in Few-Shot Learning between Natural Language Processing and Computer Vision. On that basis, the current Few-Shot Learning on Natural Language Processing is summarized, including Transfer Learning, Meta Learning and Knowledge Distillation. Furthermore, we conclude the solutions to Few-Shot Learning in Natural Language Processing, such as the method based on Distant Supervision, Meta Learning and Knowledge Distillation. Finally, we present the challenges facing Few-Shot Learning in Natural Language Processing.

Ahirrao, Mayur, Joshi, Yash, Gandhe, Atharva, Kotgire, Sumeet, Deshmukh, Rohini G.. 2021. Phrase Composing Tool using Natural Language Processing. 2021 International Conference on Intelligent Technologies (CONIT). :1—4.

In this fast-running world, machine communication plays a vital role. To compete with this world, human-machine interaction is a necessary thing. To enhance this, Natural Language Processing technique is used widely. Using this technique, we can reduce the interaction gap between the machine and human. Till now, many such applications have been developed using this technique. This tool deals with the various methods which are used for development of grammar error correction. These methods include rule-based method, classifier-based method and machine translation-based method. Also, models regarding the Natural Language Processing (NLP) pipeline are trained and implemented in this project accordingly. Additionally, the tool can also perform speech to text operation.

2022-02-24

Duan, Xuanyu, Ge, Mengmeng, Minh Le, Triet Huynh, Ullah, Faheem, Gao, Shang, Lu, Xuequan, Babar, M. Ali. 2021. Automated Security Assessment for the Internet of Things. 2021 IEEE 26th Pacific Rim International Symposium on Dependable Computing (PRDC). :47–56.

Internet of Things (IoT) based applications face an increasing number of potential security risks, which need to be systematically assessed and addressed. Expert-based manual assessment of IoT security is a predominant approach, which is usually inefficient. To address this problem, we propose an automated security assessment framework for IoT networks. Our framework first leverages machine learning and natural language processing to analyze vulnerability descriptions for predicting vulnerability metrics. The predicted metrics are then input into a two-layered graphical security model, which consists of an attack graph at the upper layer to present the network connectivity and an attack tree for each node in the network at the bottom layer to depict the vulnerability information. This security model automatically assesses the security of the IoT network by capturing potential attack paths. We evaluate the viability of our approach using a proof-of-concept smart building system model which contains a variety of real-world IoT devices and potential vulnerabilities. Our evaluation of the proposed framework demonstrates its effectiveness in terms of automatically predicting the vulnerability metrics of new vulnerabilities with more than 90% accuracy, on average, and identifying the most vulnerable attack paths within an IoT network. The produced assessment results can serve as a guideline for cybersecurity professionals to take further actions and mitigate risks in a timely manner.
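
The two-layered model in the abstract above can be sketched roughly as follows. This is an illustrative simplification, not the authors' framework: the upper layer is a plain adjacency dictionary, each node's attack tree is collapsed into a single assumed exploit probability, and the device names, probabilities and function names are hypothetical.

```python
def attack_paths(graph, source, target, path=None):
    # Enumerate all simple (cycle-free) paths from source to target.
    path = (path or []) + [source]
    if source == target:
        return [path]
    paths = []
    for nxt in graph.get(source, []):
        if nxt not in path:
            paths.extend(attack_paths(graph, nxt, target, path))
    return paths


def path_probability(path, node_prob):
    # Multiply per-node exploit probabilities, skipping the attacker's entry point.
    p = 1.0
    for node in path[1:]:
        p *= node_prob[node]
    return p


def most_vulnerable_path(graph, node_prob, source, target):
    # Pick the path an attacker is most likely to complete, or None if unreachable.
    paths = attack_paths(graph, source, target)
    return max(paths, key=lambda p: path_probability(p, node_prob), default=None)


# Hypothetical smart-building network (upper layer: connectivity).
graph = {"attacker": ["camera", "tv"], "camera": ["hub"], "tv": ["hub"], "hub": []}
# Hypothetical per-node probabilities standing in for each node's attack tree.
node_prob = {"camera": 0.9, "tv": 0.5, "hub": 0.6}

best = most_vulnerable_path(graph, node_prob, "attacker", "hub")
print(best, path_probability(best, node_prob))
```

In the described framework the per-node values would be derived from the predicted vulnerability metrics; here they are fixed by hand to keep the example self-contained.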

Paudel, Upakar, Dolan, Andy, Majumdar, Suryadipta, Ray, Indrakshi. 2021. Context-Aware IoT Device Functionality Extraction from Specifications for Ensuring Consumer Security. 2021 IEEE Conference on Communications and Network Security (CNS). :155–163.

Internet of Things (IoT) devices are being widely used in smart homes and organizations. An IoT device has some intended purposes, but may also have hidden functionalities. Typically, the device is installed in a home or an organization and the network traffic associated with the device is captured and analyzed to infer high-level functionality to the extent possible. However, such analysis is dynamic in nature, and requires the installation of the device and access to network data which is often hard to get for privacy and confidentiality reasons. We propose an alternative static approach which can infer the functionality of a device from vendor materials using Natural Language Processing (NLP) techniques. Information about IoT device functionality can be used in various applications, one of which is ensuring security in a smart home. We demonstrate how security policies associated with device functionality in a smart home can be formally represented using the NIST Next Generation Access Control (NGAC) model and automatically analyzed using Alloy, which is a formal verification tool. This will provide assurance to the consumer that these devices will be compliant with the home or organizational policy even before they have been purchased.

2022-01-31

Zhao, Rui. 2021. The Vulnerability of the Neural Networks Against Adversarial Examples in Deep Learning Algorithms. 2021 2nd International Conference on Computing and Data Science (CDS). :287–295.

With further development in the fields of computer vision, network security, natural language processing and so on, deep learning technology has gradually exposed certain security risks. The existing deep learning algorithms cannot effectively describe the essential characteristics of data, making the algorithm unable to give the correct result in the face of malicious input. Based on current security threats faced by deep learning, this paper introduces the problem of adversarial examples in deep learning, sorts out the existing black-box and white-box attack and defense methods, and classifies them. It briefly describes the application of some adversarial examples in different scenarios in recent years, compares several defense technologies against adversarial examples, and finally summarizes the problems in this research field and the prospects for its future development. This paper introduces the common white-box attack methods in detail, and further compares the similarities and differences between black-box and white-box attacks. Correspondingly, the author also introduces the defense methods, and analyzes the performance of these methods against black-box and white-box attacks.

2022-01-25

Marulli, Fiammetta, Balzanella, Antonio, Campanile, Lelio, Iacono, Mauro, Mastroianni, Michele. 2021. Exploring a Federated Learning Approach to Enhance Authorship Attribution of Misleading Information from Heterogeneous Sources. 2021 International Joint Conference on Neural Networks (IJCNN). :1–8.

Authorship Attribution (AA) is currently applied in several applications, among which fraud detection and anti-plagiarism checks: this task can leverage stylometry and Natural Language Processing techniques. In this work, we explored some strategies to enhance the performance of an AA task for the automatic detection of false and misleading information (e.g., fake news). We set up a text classification model for AA based on stylometry exploiting recurrent deep neural networks and implemented two learning tasks trained on the same collection of fake and real news, comparing their performances: one is based on Federated Learning architecture, the other on a centralized architecture. The goal was to discriminate potential fake information from true ones when the fake news comes from heterogeneous sources, with different styles. Preliminary experiments show that a distributed approach significantly improves recall with respect to the centralized model. As expected, precision was lower in the distributed model. This aspect, coupled with the statistical heterogeneity of data, represents some open issues that will be further investigated in future work.

2022-01-10

M, Babu, R, Hemchandhar, D, Harish Y., S, Akash, K, Abhishek Todi. 2021. Voice Prescription with End-to-End Security Enhancements. 2021 6th International Conference on Communication and Electronics Systems (ICCES). :1–8.

Recent analysis indicates that more than 250,000 people in the United States of America (USA) die every year because of medical errors. A World Health Organisation (WHO) report states that 2.6 million deaths occur due to medical and prescription errors. Many of the errors are related to wrong drug or dosage administration by caregivers to patients due to indecipherable handwriting, drug interactions, confusing drug names, etc. The adoption of mobile-based speech recognition applications will eliminate these errors. This allows physicians to narrate the prescription instead of writing it. The application can be accessed through smartphones and can be used easily by everyone. An application program interface has been created for handling requests. Natural language processing is used to read text, interpret it and determine the important words for generating prescriptions. The patient data is stored and used according to the Health Insurance Portability and Accountability Act of 1996 (HIPAA) guidelines. The SMS4-BSK encryption scheme is used to secure data transmission over wireless LAN.

2021-12-21

Li, Kemeng, Zheng, Dong, Guo, Rui. 2021. An Anonymous Editable Blockchain Scheme Based on Certificateless Aggregate Signature. 2021 3rd International Conference on Natural Language Processing (ICNLP). :57–67.

Blockchain technology has gradually replaced traditional centralized data storage methods, and provided people reliable data storage services with its decentralized and non-tamperable features. However, the current blockchain data supervision is insufficient and the data cannot be modified once it is on the blockchain, which causes the blockchain system to face various problems, such as illegal information that cannot be deleted and smart contract breaches that cannot be fixed in time. To address these issues, we propose an anonymous editable blockchain scheme based on the reconstruction of the SpaceMint blockchain structure combined with the certificateless aggregate signature algorithm. Users register with their real identities and use pseudonyms in the system to achieve their anonymity. If the number of users who agree to edit meets the threshold, the data on the blockchain can be modified or deleted, and our scheme has the function of accountability for malicious behavior. The security analysis shows that the proposed certificateless aggregate signature algorithm enjoys unforgeability under the adaptive selected message attack. Moreover, the method of setting the threshold of related users is adopted to guarantee the effectiveness and security of editing blockchain data. Finally, we evaluate the performance of our certificateless aggregate signature algorithm and related schemes in theoretical analysis and experimental simulation, which demonstrates that our scheme is feasible and efficient in storage, bandwidth and computational cost.

2021-11-29

Hu, Shengze, He, Chunhui, Ge, Bin, Liu, Fang. 2020. Enhanced Word Embedding Method in Text Classification. 2020 6th International Conference on Big Data and Information Analytics (BigDIA). :18–22.

For natural language processing (NLP) tasks, word embedding technology has a certain impact on the accuracy of deep neural network algorithms. The current word embedding method cannot realize the coexistence of words and phrases in the same vector space. Therefore, we propose an enhanced word embedding (EWE) method. Before completing the word embedding, this method introduces a unique sentence reorganization technology to rewrite all the sentences in the original training corpus. Then, the original corpus and the reorganized corpus are merged together as the training corpus of the distributed word embedding model, so as to realize the coexistence of words and phrases in the same vector space. We carried out experiments to demonstrate the effectiveness of the EWE algorithm on three classic benchmark datasets. The results show that the EWE method can significantly improve the classification performance of the CNN model.

Gupta, Hritvik, Patel, Mayank. 2020. Study of Extractive Text Summarizer Using The Elmo Embedding. 2020 Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC). :829–834.

In recent times, data excess has become a major problem in the fields of education, news, blogs, social media, etc. Due to the increase in such a vast amount of text data, it has become challenging for a human to extract only the valuable data in a concise form. In other words, summarizing the text enables humans to retrieve the relevant and useful text. Text summarization is extracting the data from the document and generating a short or concise text of the document. One of the major approaches that is widely used is the automatic text summarizer. An automatic text summarizer analyzes large textual data and summarizes it into short summaries containing the valuable information of the data. Automatic text summarizers are further divided into two types: 1) extractive text summarizers and 2) abstractive text summarizers. In this article, the extractive text summarizer approach is examined. Extractive text summarization is the approach in which the model generates a concise summary of the text by picking the most relevant sentences from the text document. This paper focuses on retrieving the valuable data using the ELMo embedding in extractive text summarization. ELMo embedding is a contextual embedding that had been used previously by many researchers in abstractive text summarization techniques, but this paper focuses on using it in an extractive text summarizer.
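
The extractive approach described in the abstract above, picking the most relevant sentences from the document, can be sketched in a few lines. The sketch does not use ELMo: as a loudly simplified stand-in, it scores each sentence by the average frequency of its words across the document. The sentence splitter, the scoring rule and the function names are assumptions made for illustration.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, k=2):
    # Return the k highest-scoring sentences, kept in their original order.
    sentences = split_sentences(text)
    tokens = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    freq = Counter(word for sent in tokens for word in sent)
    # Score = average document-wide frequency of the words in the sentence.
    scores = [
        sum(freq[w] for w in sent) / len(sent) if sent else 0.0
        for sent in tokens
    ]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


text = "Cats chase mice. Cats chase cats. Dogs bark."
print(summarize(text, k=1))
```

Swapping the frequency score for a similarity between contextual sentence embeddings would bring the sketch closer to the embedding-based approach the abstract describes; the selection step would stay the same.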

Nazemi, Kawa, Klepsch, Maike J., Burkhardt, Dirk, Kaupp, Lukas. 2020. Comparison of Full-Text Articles and Abstracts for Visual Trend Analytics through Natural Language Processing. 2020 24th International Conference Information Visualisation (IV). :360–367.

Scientific publications are an essential resource for detecting emerging trends and innovations at a very early stage, far earlier than patents may allow. Visual Analytics systems enable a deep analysis by applying commonly unsupervised machine learning methods and investigating a mass amount of data. A main question from the Visual Analytics viewpoint in this context is whether abstracts of scientific publications provide a similar analysis capability compared to their corresponding full texts. This would allow extracting a mass amount of text documents in a much faster manner. In this paper, we compare the topic extraction methods LSI and LDA using full-text articles and their corresponding abstracts to determine which method and which data are better suited for a Visual Analytics system for Technology and Corporate Foresight. Based on an easily replicable natural language processing approach, we further investigate the impact of lemmatization for LDA and LSI. The comparison is performed qualitatively and quantitatively to gather both the human perception in visual systems and coherence values. Based on an application scenario, a visual trend analytics system illustrates the outcomes.

2021-09-07

Vamsi, G Krishna, Rasool, Akhtar, Hajela, Gaurav. 2020. Chatbot: A Deep Neural Network Based Human to Machine Conversation Model. 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT). :1–7.

A conversational agent (chatbot) is computer software capable of communicating with humans using natural language processing. The crucial part of building any chatbot is the development of conversation. Despite many developments in Natural Language Processing (NLP) and Artificial Intelligence (AI), creating a good chatbot model remains a significant challenge in this field even today. A conversational bot can be used for countless errands. In general, they need to understand the user's intent and deliver appropriate replies. This is a software program of a conversational interface that allows a user to converse in the same manner one would address a human. Hence, these are used in almost every customer communication platform, like social networks. At present, there are two basic models used in developing a chatbot: generative-based models and retrieval-based models. Recent advancements in deep learning and artificial intelligence, such as end-to-end trainable neural networks, have rapidly replaced earlier methods based on hand-written instructions and patterns or statistical methods. This paper proposes a new method of creating a chatbot using a deep neural learning method. In this method, a neural network with multiple layers is built to learn and process the data.

2021-08-05

Ramasubramanian, Muthukumaran, Muhammad, Hassan, Gurung, Iksha, Maskey, Manil, Ramachandran, Rahul. 2020. ES2Vec: Earth Science Metadata Keyword Assignment using Domain-Specific Word Embeddings. 2020 SoutheastCon. :1—6.

Earth science metadata keyword assignment is a challenging problem. Dataset curators select appropriate keywords from the Global Change Master Directory (GCMD) set of keywords. The keywords are an integral part of search and discovery of these datasets. Hence, selection of keywords is crucial in increasing the discoverability of datasets. Utilizing machine learning techniques, we provide users with automated keyword suggestions as an improved approach to complement manual selection. We trained a machine learning model that leverages the semantic embedding ability of Word2Vec models to process abstracts and suggest relevant keywords. A user interface tool we built to assist data curators in assignment of such keywords is also described.
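
One way to picture embedding-based keyword suggestion, as described in the abstract above, is to average the word vectors of an abstract and rank candidate keywords by cosine similarity to that average. The sketch below is not the ES2Vec model: it uses a tiny hand-made table of 2-D vectors in place of trained Word2Vec embeddings, and the table, the candidate keywords and the function names are all hypothetical.

```python
import math

# Hypothetical 2-D embeddings; a trained Word2Vec model would supply real ones.
TOY_VECTORS = {
    "rain": (1.0, 0.1),
    "precipitation": (0.9, 0.2),
    "storm": (0.8, 0.3),
    "ocean": (0.1, 1.0),
    "salinity": (0.2, 0.9),
}


def mean_vector(words):
    # Average the vectors of the words we have embeddings for.
    known = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not known:
        return None
    return tuple(sum(component) / len(known) for component in zip(*known))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def suggest(abstract, keywords, top_n=1):
    # Rank candidate keywords by similarity to the abstract's mean vector.
    doc = mean_vector(abstract.lower().split())
    if doc is None:
        return []
    scored = []
    for keyword in keywords:
        vec = mean_vector(keyword.lower().split())
        if vec is not None:
            scored.append((cosine(doc, vec), keyword))
    scored.sort(reverse=True)
    return [keyword for _, keyword in scored[:top_n]]


print(suggest("storm rain totals", ["precipitation", "salinity"]))
```

Words without an embedding (such as "totals" here) are simply skipped, which is one common way to handle out-of-vocabulary terms in averaged-vector approaches.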

Alecakir, Huseyin, Kabukcu, Muhammet, Can, Burcu, Sen, Sevil. 2020. Discovering Inconsistencies between Requested Permissions and Application Metadata by using Deep Learning. 2020 International Conference on Information Security and Cryptology (ISCTURKEY). :56—56.

Android gives us the opportunity to extract meaningful information from metadata. From the security point of view, important information missing from an application's metadata could be a sign of a suspicious application, which could be directed for extensive analysis. In particular, the usage of dangerous permissions is expected to be explained in app descriptions. The permission-to-description fidelity problem in the literature aims to discover such inconsistencies between the usage of permissions and descriptions. This study proposes a new method based on natural language processing and recurrent neural networks. The effect of user reviews on finding such inconsistencies is also investigated in addition to application descriptions. The experimental results show that high precision is obtained by the proposed solution, and the proposed method could be used for triage of Android applications.

2021-06-24

Stöckle, Patrick, Grobauer, Bernd, Pretschner, Alexander. 2020. Automated Implementation of Windows-related Security-Configuration Guides. 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE). :598—610.

Hardening is the process of configuring IT systems to ensure the security of the systems' components and data they process or store. The complexity of contemporary IT infrastructures, however, renders manual security hardening and maintenance a daunting task. In many organizations, security-configuration guides expressed in the SCAP (Security Content Automation Protocol) are used as a basis for hardening, but these guides by themselves provide no means for automatically implementing the required configurations. In this paper, we propose an approach to automatically extract the relevant information from publicly available security-configuration guides for Windows operating systems using natural language processing. In a second step, the extracted information is verified using the information of available settings stored in the Windows Administrative Template files, in which the majority of Windows configuration settings is defined. We show that our implementation of this approach can extract and implement 83% of the rules without any manual effort and 96% with minimal manual effort. Furthermore, we conduct a study with 12 state-of-the-art guides consisting of 2014 rules with automatic checks and show that our tooling can implement at least 97% of them correctly. We have thus significantly reduced the effort of securing systems based on existing security-configuration guides.

Saletta, Martina, Ferretti, Claudio. 2020. A Neural Embedding for Source Code: Security Analysis and CWE Lists. 2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). :523—530.

In this paper, we design a technique for mapping the source code into a vector space and we show its application in the recognition of security weaknesses. By applying ideas commonly used in Natural Language Processing, we train a model for producing an embedding of programs starting from their Abstract Syntax Trees. We then show how such embedding is able to infer clusters roughly separating different classes of software weaknesses. Even if the training of the embedding is unsupervised and made on a generic Java dataset, we show that the model can be used for supervised learning of specific classes of vulnerabilities, helping to capture some features distinguishing them in code. Finally, we discuss how our model performs over the different types of vulnerabilities categorized by the CWE initiative.

2021-06-01

Ming, Kun. 2020. Chinese Coreference Resolution via Bidirectional LSTMs using Word and Token Level Representations. 2020 16th International Conference on Computational Intelligence and Security (CIS). :73–76.

Coreference resolution is an important task in the field of natural language processing. Most existing methods usually utilize word-level representations, ignoring massive information from the texts. To address this issue, we investigate how to improve Chinese coreference resolution by using span-level semantic representations. Specifically, we propose a model which acquires word and character representations through pre-trained Skip-Gram embeddings and pre-trained BERT, then explicitly leverages span-level information by performing bidirectional LSTMs among above representations. Experiments on CoNLL-2012 shared task have demonstrated that the proposed model achieves 62.95% F1-score, outperforming our baseline methods.