Biblio
Software1 vulnerabilities are closely associated with information systems security, a major and critical field in today's technology. Vulnerabilities constitute a constant and increasing threat for various aspects of everyday life, especially for safety and economy, since the social impact from the problems that they cause is complicated and often unpredictable. Although there is an entire research branch in software engineering that deals with the identification and elimination of vulnerabilities, the growing complexity of software products and the variability of software production procedures are factors contributing to the ongoing occurrence of vulnerabilities, Hence, another area that is being developed in parallel focuses on the study and management of the vulnerabilities that have already been reported and registered in databases. The information contained in such databases includes, a textual description and a number of metrics related to vulnerabilities. The purpose of this paper is to investigate to what extend the assessment of the vulnerability severity can be inferred directly from the corresponding textual description, or in other words, to examine the informative power of the description with respect to the vulnerability severity. For this purpose, text mining techniques, i.e. text analysis and three different classification methods (decision trees, neural networks and support vector machines) were employed. The application of text mining to a sample of 70,678 vulnerabilities from a public data source shows that the description itself is a reliable and highly accurate source of information for vulnerability prioritization.