Visible to the public Biblio

Filters: Keyword is named entity recognition  [Clear All Filters]
2023-09-01
Sumoto, Kensuke, Kanakogi, Kenta, Washizaki, Hironori, Tsuda, Naohiko, Yoshioka, Nobukazu, Fukazawa, Yoshiaki, Kanuka, Hideyuki.  2022.  Automatic labeling of the elements of a vulnerability report CVE with NLP. 2022 IEEE 23rd International Conference on Information Reuse and Integration for Data Science (IRI). :164—165.
Common Vulnerabilities and Exposures (CVE) databases contain information about vulnerabilities of software products and source code. If individual elements of CVE descriptions can be extracted and structured, then the data can be used to search and analyze CVE descriptions. Herein we propose a method to label each element in CVE descriptions by applying Named Entity Recognition (NER). For NER, we used BERT, a transformer-based natural language processing model. Using NER with machine learning can label information from CVE descriptions even if there are some distortions in the data. An experiment involving manually prepared label information for 1000 CVE descriptions shows that the labeling accuracy of the proposed method is about 0.81 for precision and about 0.89 for recall. In addition, we devise a way to train the data by dividing it into labels. Our proposed method can be used to label each element automatically from CVE descriptions.
2023-06-09
Liu, Luchen, Lin, Xixun, Zhang, Peng, Zhang, Lei, Wang, Bin.  2022.  Learning Common Dependency Structure for Unsupervised Cross-Domain Ner. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). :8347—8351.
Unsupervised cross-domain NER task aims to solve the issues when data in a new domain are fully-unlabeled. It leverages labeled data from source domain to predict entities in unlabeled target domain. Since training models on large domain corpus is time-consuming, in this paper, we consider an alternative way by introducing syntactic dependency structure. Such information is more accessible and can be shared between sentences from different domains. We propose a novel framework with dependency-aware GNN (DGNN) to learn these common structures from source domain and adapt them to target domain, alleviating the data scarcity issue and bridging the domain gap. Experimental results show that our method outperforms state-of-the-art methods.
2022-10-06
He, Bingjun, Chen, Jianfeng.  2021.  Named Entity Recognition Method in Network Security Domain Based on BERT-BiLSTM-CRF. 2021 IEEE 21st International Conference on Communication Technology (ICCT). :508–512.
With the increase of the number of network threats, the knowledge graph is an effective method to quickly analyze the network threats from the mass of network security texts. Named entity recognition in network security domain is an important task to construct knowledge graph. Aiming at the problem that key Chinese entity information in network security related text is difficult to identify, a named entity recognition model in network security domain based on BERT-BiLSTM-CRF is proposed to identify key named entities in network security related text. This model adopts the BERT pre-training model to obtain the word vectors of the preceding and subsequent text information, and the obtained word vectors will be input to the subsequent BiLSTM module and CRF module for encoding and sorting. The test results show that this model has a good effect on the data set of network security domain. The recognition effect of this model is better than that of LSTM-CRF, BERT-LSTM-CRF, BERT-CRF and other models, and the F1=93.81%.
Zhu, Xiaoyan, Zhang, Yu, Zhu, Lei, Hei, Xinhong, Wang, Yichuan, Hu, Feixiong, Yao, Yanni.  2021.  Chinese named entity recognition method for the field of network security based on RoBERTa. 2021 International Conference on Networking and Network Applications (NaNA). :420–425.
As the mobile Internet is developing rapidly, people who use cell phones to access the Internet dominate, and the mobile Internet has changed the development environment of online public opinion and made online public opinion events spread more widely. In the online environment, any kind of public issues may become a trigger for the generation of public opinion and thus need to be controlled for network supervision. The method in this paper can identify entities from the event texts obtained from mobile Today's Headlines, People's Daily, etc., and informatize security of public opinion in event instances, thus strengthening network supervision and control in mobile, and providing sufficient support for national security event management. In this paper, we present a SW-BiLSTM-CRF model, as well as a model combining the RoBERTa pre-trained model with the classical neural network BiLSTM model. Our experiments show that this approach provided achieves quite good results on Chinese emergency corpus, with accuracy and F1 values of 87.21% and 78.78%, respectively.
2022-09-20
Herwanto, Guntur Budi, Quirchmayr, Gerald, Tjoa, A Min.  2021.  A Named Entity Recognition Based Approach for Privacy Requirements Engineering. 2021 IEEE 29th International Requirements Engineering Conference Workshops (REW). :406—411.
The presence of experts, such as a data protection officer (DPO) and a privacy engineer is essential in Privacy Requirements Engineering. This task is carried out in various forms including threat modeling and privacy impact assessment. The knowledge required for performing privacy threat modeling can be a serious challenge for a novice privacy engineer. We aim to bridge this gap by developing an automated approach via machine learning that is able to detect privacy-related entities in the user stories. The relevant entities include (1) the Data Subject, (2) the Processing, and (3) the Personal Data entities. We use a state-of-the-art Named Entity Recognition (NER) model along with contextual embedding techniques. We argue that an automated approach can assist agile teams in performing privacy requirements engineering techniques such as threat modeling, which requires a holistic understanding of how personally identifiable information is used in a system. In comparison to other domain-specific NER models, our approach achieves a reasonably good performance in terms of precision and recall.
2022-04-12
Rane, Prachi, Rao, Aishwarya, Verma, Diksha, Mhaisgawali, Amrapali.  2021.  Redacting Sensitive Information from the Data. 2021 International Conference on Smart Generation Computing, Communication and Networking (SMART GENCON). :1—5.
Redaction of personal, confidential and sensitive information from documents is becoming increasingly important for individuals and organizations. In past years, there have been many well-publicized cases of data leaks from various popular companies. When the data contains sensitive information, these leaks pose a serious threat. To protect and conceal sensitive information, many companies have policies and laws about processing and sanitizing sensitive information in business documents.The traditional approach of manually finding and matching millions of words and then redacting is slow and error-prone. This paper examines different models to automate the identification and redaction of personal and sensitive information contained within the documents using named entity recognition. Sensitive entities example person’s name, bank account details or Aadhaar numbers targeted for redaction, are recognized based on the file’s content, providing users with an interactive approach to redact the documents by changing selected sensitive terms.
Evangelatos, Pavlos, Iliou, Christos, Mavropoulos, Thanassis, Apostolou, Konstantinos, Tsikrika, Theodora, Vrochidis, Stefanos, Kompatsiaris, Ioannis.  2021.  Named Entity Recognition in Cyber Threat Intelligence Using Transformer-based Models. 2021 IEEE International Conference on Cyber Security and Resilience (CSR). :348—353.
The continuous increase in sophistication of threat actors over the years has made the use of actionable threat intelligence a critical part of the defence against them. Such Cyber Threat Intelligence is published daily on several online sources, including vulnerability databases, CERT feeds, and social media, as well as on forums and web pages from the Surface and the Dark Web. Named Entity Recognition (NER) techniques can be used to extract the aforementioned information in an actionable form from such sources. In this paper we investigate how the latest advances in the NER domain, and in particular transformer-based models, can facilitate this process. To this end, the dataset for NER in Threat Intelligence (DNRTI) containing more than 300 pieces of threat intelligence reports from open source threat intelligence websites is used. Our experimental results demonstrate that transformer-based techniques are very effective in extracting cybersecurity-related named entities, by considerably outperforming the previous state- of-the-art approaches tested with DNRTI.
2022-03-10
Qin, Shuangling, Xu, Chaozhi, Zhang, Fang, Jiang, Tao, Ge, Wei, Li, Jihong.  2021.  Research on Application of Chinese Natural Language Processing in Constructing Knowledge Graph of Chronic Diseases. 2021 International Conference on Communications, Information System and Computer Engineering (CISCE). :271—274.
Knowledge Graph can describe the concepts in the objective world and the relationships between these concepts in a structured way, and identify, discover and infer the relationships between things and concepts. It has been developed in the field of medical and health care. In this paper, the method of natural language processing has been used to build chronic disease knowledge graph, such as named entity recognition, relationship extraction. This method is beneficial to forecast analysis of chronic disease, network monitoring, basic education, etc. The research of this paper can greatly help medical experts in the treatment of chronic disease treatment, and assist primary clinicians with making more scientific decision, and can help Patients with chronic diseases to improve medical efficiency. In the end, it also has practical significance for clinical scientific research of chronic disease.
2020-05-22
Khadilkar, Kunal, Kulkarni, Siddhivinayak, Bone, Poojarani.  2018.  Plagiarism Detection Using Semantic Knowledge Graphs. 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA). :1—6.

Every day, huge amounts of unstructured text is getting generated. Most of this data is in the form of essays, research papers, patents, scholastic articles, book chapters etc. Many plagiarism softwares are being developed to be used in order to reduce the stealing and plagiarizing of Intellectual Property (IP). Current plagiarism softwares are mainly using string matching algorithms to detect copying of text from another source. The drawback of some of such plagiarism softwares is their inability to detect plagiarism when the structure of the sentence is changed. Replacement of keywords by their synonyms also fails to be detected by these softwares. This paper proposes a new method to detect such plagiarism using semantic knowledge graphs. The method uses Named Entity Recognition as well as semantic similarity between sentences to detect possible cases of plagiarism. The doubtful cases are visualized using semantic Knowledge Graphs for thorough analysis of authenticity. Rules for active and passive voice have also been considered in the proposed methodology.

2017-05-17
Burdick, Doug, De, Soham, Raschid, Louiqa, Shao, Mingchao, Xu, Zheng, Zotkina, Elena.  2016.  resMBS: Constructing a Financial Supply Chain from Prospectus. Proceedings of the Second International Workshop on Data Science for Macro-Modeling. :7:1–7:6.

Understanding the behavior of complex financial supply chains is usually difficult due to a lack of data capturing the interactions between financial institutions (FIs) and the roles that they play in financial contracts (FCs). resMBS is an example supply chain corresponding to the US residential mortgage backed securities that were critical in the 2008 US financial crisis. In this paper, we describe the process of creating the resMBS graph dataset from financial prospectus. We use the SystemT rule-based text extraction platform to develop two tools, ORG NER and Dict NER, for named entity recognition of financial institution (FI) names. The resMBS graph comprises a set of FC nodes (each prospectus) and the corresponding FI nodes that are extracted from the prospectus. A Role-FI extractor matches a role keyword such as originator, sponsor or servicer, with FI names. We study the performance of the Role-FI extractor, and ORG NER and Dict NER, in constructing the resMBS dataset. We also present preliminary results of a clustering based analysis to identify financial communities and their evolution in the resMBS financial supply chain.