Development of a Plugin Based Extensible Feature Extraction Framework

Title: Development of a Plugin Based Extensible Feature Extraction Framework
Publication Type: Conference Paper
Year of Publication: 2018
Authors: Malviya, Vikas; Rai, Sawan; Gupta, Atul
Conference Name: Proceedings of the 33rd Annual ACM Symposium on Applied Computing
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-5191-1
Keywords: classification, Cross Site Scripting, Design patterns, feature extraction, Human Behavior, machine learning, object oriented programming, plugin based framework, pubcrawl, resilience, Scalability, spam filtering
Abstract

An important ingredient in a successful recipe for solving machine learning problems is the availability of a suitable dataset. However, such a dataset may have to be extracted from large volumes of unstructured and semi-structured data such as programming code, scripts, and text. In this work, we propose a plug-in based, extensible feature extraction framework, which we have prototyped as a tool. The proposed framework is demonstrated by extracting features from two different sources of semi-structured and unstructured data. The semi-structured data comprised web page and script based data, whereas the other data was taken from email for spam filtering. The usefulness of the tool was also assessed with respect to ease of programming.
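
To make the idea of a plug-in based feature extractor concrete, the following is a minimal sketch in Python. It is not the authors' tool or design: the `FeaturePlugin` interface, the `register` decorator, the two example plugins, and `extract_features` are all hypothetical names chosen for illustration. It only shows how new features might be added as independent plugins without touching the core extraction loop.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Optional, Type


class FeaturePlugin(ABC):
    """Hypothetical plugin interface: each plugin computes one named feature."""

    name: str

    @abstractmethod
    def extract(self, raw: str) -> float:
        """Return a numeric feature value for one raw input document."""


# Registry mapping feature names to plugin classes (an assumed design).
_REGISTRY: Dict[str, Type[FeaturePlugin]] = {}


def register(cls: Type[FeaturePlugin]) -> Type[FeaturePlugin]:
    """Class decorator that adds a plugin to the registry under its name."""
    _REGISTRY[cls.name] = cls
    return cls


@register
class ScriptTagCount(FeaturePlugin):
    """Illustrative web-page feature: number of opening <script tags."""

    name = "script_tag_count"

    def extract(self, raw: str) -> float:
        return float(raw.lower().count("<script"))


@register
class UppercaseRatio(FeaturePlugin):
    """Illustrative email feature: fraction of letters that are uppercase."""

    name = "uppercase_ratio"

    def extract(self, raw: str) -> float:
        letters = [c for c in raw if c.isalpha()]
        if not letters:
            return 0.0
        return sum(c.isupper() for c in letters) / len(letters)


def extract_features(
    raw: str, plugin_names: Optional[Iterable[str]] = None
) -> Dict[str, float]:
    """Run the selected plugins (all registered ones by default) on one input."""
    names = list(plugin_names) if plugin_names is not None else sorted(_REGISTRY)
    return {n: _REGISTRY[n]().extract(raw) for n in names}
```

Under this sketch, supporting a new data source would mean writing and registering another `FeaturePlugin` subclass; `extract_features` itself stays unchanged.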

URL: http://doi.acm.org/10.1145/3167132.3167328
DOI: 10.1145/3167132.3167328
Citation Key: malviya_development_2018