FeatureSmith: Automatically Engineering Features for Malware Detection by Mining the Security Literature

Title: FeatureSmith: Automatically Engineering Features for Malware Detection by Mining the Security Literature
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Zhu, Ziyun; Dumitras, Tudor
Conference Name: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-4139-4
Keywords: Android malware, automatic feature engineering, Human Behavior, Metrics, natural language processing, pubcrawl, Resiliency, semantic network, text mining
Abstract

Malware detection increasingly relies on machine learning techniques, which utilize multiple features to separate the malware from the benign apps. The effectiveness of these techniques primarily depends on the manual feature engineering process, based on human knowledge and intuition. However, given the adversaries' efforts to evade detection and the growing volume of publications on malware behaviors, the feature engineering process likely draws from a fraction of the relevant knowledge. We propose an end-to-end approach for automatic feature engineering. We describe techniques for mining documents written in natural language (e.g. scientific papers) and for representing and querying the knowledge about malware in a way that mirrors the human feature engineering process. Specifically, we first identify abstract behaviors that are associated with malware, and then we map these behaviors to concrete features that can be tested experimentally. We implement these ideas in a system called FeatureSmith, which generates a feature set for detecting Android malware. We train a classifier using these features on a large data set of benign and malicious apps. This classifier achieves a 92.5% true positive rate with only 1% false positives, which is comparable to the performance of a state-of-the-art Android malware detector that relies on manually engineered features. In addition, FeatureSmith is able to suggest informative features that are absent from the manually engineered set and to link the features generated to abstract concepts that describe malware behaviors.
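The headline result in the abstract is a true positive rate reported at a fixed false positive rate. As a rough illustration of how such a figure can be read, the sketch below computes the best true positive rate a scored classifier reaches while staying within a false positive budget. It is not taken from the paper: the function name `tpr_at_fpr`, the score convention (higher means more likely malicious), and the toy data are all assumptions made for this example.

```python
def tpr_at_fpr(scores, labels, max_fpr=0.01):
    """Best true positive rate with false positive rate <= max_fpr.

    Hypothetical helper for illustration only.
    scores: classifier scores, higher = more likely malicious (assumed).
    labels: 1 for malicious, 0 for benign.
    An app is flagged when its score is >= the chosen threshold.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one malicious and one benign sample")

    # Walk thresholds from the highest score downward.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    best_tpr = 0.0
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples sharing a score are flagged together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / negatives > max_fpr:
            break  # lowering the threshold further exceeds the budget
        best_tpr = tp / positives
    return best_tpr


# Toy data (made up): 100 benign apps and 4 malicious apps.
benign_scores = [i / 100 for i in range(100)]
malicious_scores = [0.995, 0.992, 0.985, 0.5]
scores = benign_scores + malicious_scores
labels = [0] * 100 + [1] * 4

print(tpr_at_fpr(scores, labels, max_fpr=0.01))  # prints 0.75
```

With this toy data, allowing one false positive out of 100 benign apps lets the threshold drop far enough to catch three of the four malicious apps, so the reported rate is 0.75.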

URL: http://doi.acm.org/10.1145/2976749.2978304
DOI: 10.1145/2976749.2978304
Citation Key: zhu_featuresmith:_2016