Biblio
An important ingredient for a successful recipe for solving machine learning problems is the availability of a suitable dataset. However, such a dataset may have to be extracted from a large unstructured and semi-structured data like programming code, scripts, and text. In this work, we propose a plug-in based, extensible feature extraction framework for which we have prototyped as a tool. The proposed framework is demonstrated by extracting features from two different sources of semi-structured and unstructured data. The semi-structured data comprised of web page and script based data whereas the other data was taken from email data for spam filtering. The usefulness of the tool was also assessed on the aspect of ease of programming.
Mastery of the subtleties of object-oriented programming lan- guages is undoubtedly challenging to achieve. Design patterns have been proposed some decades ago in order to support soft- ware designers and developers in overcoming recurring challeng- es in the design of object-oriented software systems. However, given that dozens if not hundreds of patterns have emerged so far, it can be assumed that their mastery has become a serious chal- lenge in its own right. In this paper, we describe a proof of con- cept implementation of a recommendation system that aims to detect opportunities for the Strategy design pattern that developers have missed so far. For this purpose, we have formalized natural language pattern guidelines from the literature and quantified them for static code analysis with data mined from a significant collection of open source systems. Moreover, we present the re- sults from analyzing 25 different open source systems with this prototype as it discovered more than 200 candidates for imple- menting the Strategy pattern and the encouraging results of a pre- liminary evaluation with experienced developers. Finally, we sketch how we are currently extending this work to other patterns.
Software security is an important aspect of ensuring software quality. Early detection of vulnerable code during development is essential for the developers to make cost and time effective software testing. The traditional software metrics are used for early detection of software vulnerability, but they are not directly related to code constructs and do not specify any particular granularity level. The goal of this study is to help developers evaluate software security using class-level traceable patterns called micro patterns to reduce security risks. The concept of micro patterns is similar to design patterns, but they can be automatically recognized and mined from source code. If micro patterns can better predict vulnerable classes compared to traditional software metrics, they can be used in developing a vulnerability prediction model. This study explores the performance of class-level patterns in vulnerability prediction and compares them with traditional class-level software metrics. We studied security vulnerabilities as reported for one major release of Apache Tomcat, Apache Camel and three stand-alone Java web applications. We used machine learning techniques for predicting vulnerabilities using micro patterns and class-level metrics as features. We found that micro patterns have higher recall in detecting vulnerable classes than the software metrics.