Development of a Plugin Based Extensible Feature Extraction Framework
Title | Development of a Plugin Based Extensible Feature Extraction Framework |
Publication Type | Conference Paper |
Year of Publication | 2018 |
Authors | Malviya, Vikas; Rai, Sawan; Gupta, Atul |
Conference Name | Proceedings of the 33rd Annual ACM Symposium on Applied Computing |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-4503-5191-1 |
Keywords | classification, Cross Site Scripting, Design patterns, feature extraction, Human Behavior, machine learning, object oriented programming, plugin based framework, pubcrawl, resilience, Scalability, spam filtering |
Abstract | An important ingredient in a successful recipe for solving machine learning problems is the availability of a suitable dataset. However, such a dataset may have to be extracted from large volumes of unstructured and semi-structured data such as program code, scripts, and text. In this work, we propose a plug-in based, extensible feature extraction framework, which we have prototyped as a tool. The proposed framework is demonstrated by extracting features from two different sources of semi-structured and unstructured data. The semi-structured data consisted of web pages and scripts, whereas the unstructured data was taken from email for spam filtering. The usefulness of the tool was also assessed in terms of ease of programming. |
URL | http://doi.acm.org/10.1145/3167132.3167328 |
DOI | 10.1145/3167132.3167328 |
Citation Key | malviya_development_2018 |
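
The abstract describes a plug-in based, extensible feature extraction framework but this record does not show its design. The following is only a minimal sketch of how such a plug-in arrangement could look, not the authors' implementation. All names (`FeaturePlugin`, `FeatureExtractor`, `ScriptTagCount`, `UppercaseRatio`) and the two example features are hypothetical, chosen to echo the web page/script and spam email use cases mentioned in the abstract.

```python
import re
from abc import ABC, abstractmethod


class FeaturePlugin(ABC):
    """Hypothetical plugin interface: each plugin computes one named feature."""

    name: str

    @abstractmethod
    def extract(self, text: str) -> float:
        """Return the feature value for one raw input document."""


class ScriptTagCount(FeaturePlugin):
    """Illustrative web-page feature: number of opening <script> tags."""

    name = "script_tag_count"

    def extract(self, text: str) -> float:
        return float(len(re.findall(r"<script\b", text, flags=re.IGNORECASE)))


class UppercaseRatio(FeaturePlugin):
    """Illustrative email feature: share of letters that are uppercase."""

    name = "uppercase_ratio"

    def extract(self, text: str) -> float:
        letters = [c for c in text if c.isalpha()]
        if not letters:
            return 0.0
        return sum(c.isupper() for c in letters) / len(letters)


class FeatureExtractor:
    """Hypothetical host: holds registered plugins and runs them on an input."""

    def __init__(self) -> None:
        self._plugins: dict[str, FeaturePlugin] = {}

    def register(self, plugin: FeaturePlugin) -> None:
        # Reject duplicate names so every feature column stays unambiguous.
        if plugin.name in self._plugins:
            raise ValueError(f"plugin already registered: {plugin.name}")
        self._plugins[plugin.name] = plugin

    def extract(self, text: str) -> dict[str, float]:
        # One dataset row: feature name -> value, in registration order.
        return {name: p.extract(text) for name, p in self._plugins.items()}


if __name__ == "__main__":
    extractor = FeatureExtractor()
    extractor.register(ScriptTagCount())
    extractor.register(UppercaseRatio())
    print(extractor.extract("<script>alert(1)</script> CLICK here"))
```

In this sketch, supporting a new feature means writing one more `FeaturePlugin` subclass and registering it; the host class does not change, which is the extensibility property the abstract emphasizes.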