Malware Classification and Class Imbalance via Stochastic Hashed LZJD

Submitted by grigby1 on Wed, 05/30/2018 - 4:02pm

Title	Malware Classification and Class Imbalance via Stochastic Hashed LZJD
Publication Type	Conference Paper
Year of Publication	2017
Authors	Raff, Edward; Nicholas, Charles
Conference Name	Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security
Publisher	ACM
Conference Location	New York, NY, USA
ISBN Number	978-1-4503-5202-4
Keywords	Collaboration, cyber security, Human Behavior, human factors, lzjd, malware classification, Metrics, policy-based governance, pubcrawl, resilience, Resiliency, security weaknesses, shwel
Abstract	There are currently few methods that can be applied to malware classification problems which don't require domain knowledge to apply. In this work, we develop our new SHWeL feature vector representation, by extending the recently proposed Lempel-Ziv Jaccard Distance. These SHWeL vectors improve upon LZJD's accuracy, outperform byte n-grams, and allow us to build efficient algorithms for both training (a weakness of byte n-grams) and inference (a weakness of LZJD). Furthermore, our new SHWeL method also allows us to directly tackle the class imbalance problem, which is common for malware-related tasks. Compared to existing methods like SMOTE, SHWeL provides significantly improved accuracy while reducing algorithmic complexity to O(N). Because our approach is developed without the use of domain knowledge, it can be easily re-applied to any new domain where there is a need to classify byte sequences.
URL	https://dl.acm.org/citation.cfm?doid=3128572.3140446
DOI	10.1145/3128572.3140446
Citation Key	raff_malware_2017
