Malware Classification and Class Imbalance via Stochastic Hashed LZJD

Title: Malware Classification and Class Imbalance via Stochastic Hashed LZJD
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Raff, Edward; Nicholas, Charles
Conference Name: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security
Publisher: ACM
Conference Location: New York, NY, USA
ISBN: 978-1-4503-5202-4
Keywords: Collaboration, cyber security, Human Behavior, human factors, LZJD, malware classification, Metrics, policy-based governance, pubcrawl, resilience, Resiliency, security weaknesses, SHWeL
Abstract

Few existing methods for malware classification can be applied without domain knowledge. In this work, we develop SHWeL, a new feature vector representation that extends the recently proposed Lempel-Ziv Jaccard Distance (LZJD). SHWeL vectors improve upon LZJD's accuracy, outperform byte n-grams, and allow us to build efficient algorithms for both training (a weakness of byte n-grams) and inference (a weakness of LZJD). SHWeL also lets us tackle the class imbalance problem directly, which is common in malware-related tasks. Compared with existing methods such as SMOTE, SHWeL provides significantly improved accuracy while reducing algorithmic complexity to O(N). Because our approach is developed without domain knowledge, it can be readily reapplied to any new domain that requires classifying byte sequences.

URL: https://dl.acm.org/citation.cfm?doid=3128572.3140446
DOI: 10.1145/3128572.3140446
Citation Key: raff_malware_2017