Malware Classification and Class Imbalance via Stochastic Hashed LZJD
Title | Malware Classification and Class Imbalance via Stochastic Hashed LZJD |
Publication Type | Conference Paper |
Year of Publication | 2017 |
Authors | Raff, Edward, Nicholas, Charles |
Conference Name | Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-4503-5202-4 |
Keywords | Collaboration, cyber security, Human Behavior, human factors, lzjd, malware classification, Metrics, policy-based governance, pubcrawl, resilience, Resiliency, security weaknesses, shwel |
Abstract | There are currently few methods for malware classification problems that do not require domain knowledge to apply. In this work, we develop our new SHWeL feature vector representation by extending the recently proposed Lempel-Ziv Jaccard Distance. These SHWeL vectors improve upon LZJD's accuracy, outperform byte n-grams, and allow us to build efficient algorithms for both training (a weakness of byte n-grams) and inference (a weakness of LZJD). Furthermore, our new SHWeL method also allows us to directly tackle the class imbalance problem, which is common for malware-related tasks. Compared to existing methods like SMOTE, SHWeL provides significantly improved accuracy while reducing algorithmic complexity to O(N). Because our approach is developed without the use of domain knowledge, it can be easily re-applied to any new domain where there is a need to classify byte sequences. |
URL | https://dl.acm.org/citation.cfm?doid=3128572.3140446 |
DOI | 10.1145/3128572.3140446 |
Citation Key | raff_malware_2017 |
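The abstract builds on the Lempel-Ziv Jaccard Distance (LZJD). As background, the idea behind LZJD can be sketched as follows: split each byte sequence into the set of phrases produced by a Lempel-Ziv style parse, then take the Jaccard distance between the two sets. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names `lz_set` and `lzjd` are hypothetical, the sketch computes the exact Jaccard distance rather than the min-hash approximation the published LZJD uses for speed, and it does not implement SHWeL's stochastic hashed feature vectors.

```python
def lz_set(data: bytes) -> set[bytes]:
    """Lempel-Ziv phrase set (hypothetical helper): starting at the current
    position, grow a substring until it has not been seen before, record it,
    and restart right after it. A trailing already-seen phrase is dropped."""
    phrases: set[bytes] = set()
    start, end = 0, 1
    while end <= len(data):
        phrase = data[start:end]
        if phrase in phrases:
            end += 1  # already seen: extend the current phrase by one byte
        else:
            phrases.add(phrase)  # new phrase: record it and start a new one
            start = end
            end = start + 1
    return phrases


def lzjd(a: bytes, b: bytes) -> float:
    """Exact Jaccard distance between the LZ phrase sets of a and b
    (hypothetical helper; LZJD itself approximates this with min-hashing)."""
    sa, sb = lz_set(a), lz_set(b)
    union = sa | sb
    if not union:
        return 0.0  # two empty inputs are treated as identical
    return 1.0 - len(sa & sb) / len(union)


if __name__ == "__main__":
    print(lz_set(b"abab"))            # phrases a, b, ab (set order may vary)
    print(lzjd(b"abab", b"abab"))     # identical inputs -> 0.0
    print(lzjd(b"aaaa", b"bbbb"))     # no shared phrases -> 1.0
```

Because the representation is just a set of byte substrings, nothing in it depends on file-format knowledge, which is the property the abstract relies on when it says the approach can be re-applied to any domain with byte sequences.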