Title | Automatic labeling of the elements of a vulnerability report CVE with NLP |
Publication Type | Conference Paper |
Year of Publication | 2022 |
Authors | Sumoto, Kensuke; Kanakogi, Kenta; Washizaki, Hironori; Tsuda, Naohiko; Yoshioka, Nobukazu; Fukazawa, Yoshiaki; Kanuka, Hideyuki |
Conference Name | 2022 IEEE 23rd International Conference on Information Reuse and Integration for Data Science (IRI) |
Keywords | BERT, composability, compositionality, CVE, Data Science, Databases, distortion, Information Reuse, machine learning, named entity recognition, natural language processing, pubcrawl, resilience, Resiliency, security, security knowledge repository, Software, Technological, Transformers |
Abstract | Common Vulnerabilities and Exposures (CVE) databases contain information about vulnerabilities of software products and source code. If individual elements of CVE descriptions can be extracted and structured, then the data can be used to search and analyze CVE descriptions. Herein we propose a method to label each element in CVE descriptions by applying Named Entity Recognition (NER). For NER, we used BERT, a transformer-based natural language processing model. Using NER with machine learning can label information from CVE descriptions even if there are some distortions in the data. An experiment involving manually prepared label information for 1000 CVE descriptions shows that the labeling accuracy of the proposed method is about 0.81 for precision and about 0.89 for recall. In addition, we devise a way to train the data by dividing it into labels. Our proposed method can be used to label each element automatically from CVE descriptions. |
DOI | 10.1109/IRI54793.2022.00045 |
Citation Key | sumoto_automatic_2022 |
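
The abstract reports labeling quality as precision and recall over NER output. As a rough illustration of how such figures can be computed, the sketch below scores predicted labels against manually prepared ones at the span level. It rests on several assumptions that are not taken from the paper: the BIO tagging scheme, the label names (`VULN`, `PRODUCT`, `VERSION`, `ATTACKER`), the example sentence, and exact-match span scoring are all hypothetical, and the helper functions `bio_to_spans` and `precision_recall` are introduced here only for illustration. No BERT model is involved; the predicted tags are hard-coded.

```python
from typing import List, Set, Tuple

# A span is (start_token, end_token_exclusive, label).
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of labeled spans."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        inside = tag.startswith("I-") and label == tag[2:]
        if not inside:
            if label is not None:
                spans.add((start, i, label))
                start, label = None, None
            # "B-X" opens a span; a stray "I-X" is leniently treated as opening one too.
            if tag.startswith("B-") or tag.startswith("I-"):
                start, label = i, tag[2:]
    return spans


def precision_recall(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float]:
    """Exact-match span precision and recall."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical CVE-style fragment with hypothetical labels.
tokens = ["Buffer", "overflow", "in", "Foo", "1.2", "allows", "remote", "attackers"]
gold_tags = ["B-VULN", "I-VULN", "O", "B-PRODUCT", "B-VERSION", "O", "B-ATTACKER", "I-ATTACKER"]
# Hard-coded "model output" that merges the version into the product span.
pred_tags = ["B-VULN", "I-VULN", "O", "B-PRODUCT", "I-PRODUCT", "O", "B-ATTACKER", "I-ATTACKER"]

gold_spans = bio_to_spans(gold_tags)
pred_spans = bio_to_spans(pred_tags)
precision, recall = precision_recall(gold_spans, pred_spans)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the three predicted spans match the gold annotation exactly, and two of the four gold spans are recovered. Exact-match scoring is strict: the merged product and version span counts as an error for both measures. Whether the paper scores at the token or span level is not stated in the abstract.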