FeatureSmith: Automatically Engineering Features for Malware Detection by Mining the Security Literature
Title | FeatureSmith: Automatically Engineering Features for Malware Detection by Mining the Security Literature |
Publication Type | Conference Paper |
Year of Publication | 2016 |
Authors | Zhu, Ziyun; Dumitras, Tudor |
Conference Name | Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-4503-4139-4 |
Keywords | Android malware, automatic feature engineering, Human Behavior, Metrics, natural language processing, pubcrawl, Resiliency, semantic network, text mining |
Abstract | Malware detection increasingly relies on machine learning techniques, which utilize multiple features to separate the malware from the benign apps. The effectiveness of these techniques primarily depends on the manual feature engineering process, based on human knowledge and intuition. However, given the adversaries' efforts to evade detection and the growing volume of publications on malware behaviors, the feature engineering process likely draws from a fraction of the relevant knowledge. We propose an end-to-end approach for automatic feature engineering. We describe techniques for mining documents written in natural language (e.g. scientific papers) and for representing and querying the knowledge about malware in a way that mirrors the human feature engineering process. Specifically, we first identify abstract behaviors that are associated with malware, and then we map these behaviors to concrete features that can be tested experimentally. We implement these ideas in a system called FeatureSmith, which generates a feature set for detecting Android malware. We train a classifier using these features on a large data set of benign and malicious apps. This classifier achieves a 92.5% true positive rate with only 1% false positives, which is comparable to the performance of a state-of-the-art Android malware detector that relies on manually engineered features. In addition, FeatureSmith is able to suggest informative features that are absent from the manually engineered set and to link the features generated to abstract concepts that describe malware behaviors. |
URL | http://doi.acm.org/10.1145/2976749.2978304 |
DOI | 10.1145/2976749.2978304 |
Citation Key | zhu_featuresmith:_2016 |
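
Note on the reported metric: the abstract quotes a 92.5% true positive rate at 1% false positives. The sketch below illustrates how a figure of that kind is typically computed from classifier scores, by sweeping the decision threshold and keeping the best detection rate whose false positive rate stays within the budget. It is not taken from the paper; the function name `tpr_at_fpr`, its parameters, and the toy data are assumptions made for illustration only.

```python
def tpr_at_fpr(scores, labels, max_fpr=0.01):
    """Return the best true positive rate achievable with FPR <= max_fpr.

    Hypothetical helper, not part of FeatureSmith.
    scores: higher means "more likely malware".
    labels: 1 = malware, 0 = benign.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one malicious and one benign sample")

    # Sweep the decision threshold from the highest score downward.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    best_tpr = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        # Stop once the false positive budget is exceeded.
        if fp / negatives > max_fpr:
            break
        best_tpr = tp / positives
    return best_tpr


# Toy data (made up): 4 malicious and 4 benign apps.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 1, 0, 0, 0]
print(tpr_at_fpr(toy_scores, toy_labels, max_fpr=0.0))
print(tpr_at_fpr(toy_scores, toy_labels, max_fpr=0.25))
```

With a budget of `max_fpr=0.01`, the same function would report the detection rate at the operating point the abstract describes, given the scores of whatever classifier is being evaluated.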