Biblio
Ransomware, as a specialized form of malicious software, has recently emerged as a major threat in computer security. With an ability to lock out user access to their content, recent ransomware attacks have caused severe impact at an individual and organizational level. While research in malware detection can be adapted directly for ransomware, specific structural properties of ransomware can further improve the quality of detection. In this paper, we adapt the deep learning methods used in malware detection for detecting ransomware from emulation sequences. We present specialized recurrent neural networks for capturing local event patterns in ransomware sequences using the concept of attention mechanisms. We demonstrate the performance of enhanced LSTM models on a sequence dataset derived by the emulation of ransomware executables targeting the Windows environment.
Recently researchers have proposed using deep learning-based systems for malware detection. Unfortunately, all deep learning classification systems are vulnerable to adversarial learning-based attacks, or adversarial attacks, where miscreants can avoid detection by the classification algorithm with very few perturbations of the input data. Previous work has studied adversarial attacks against static analysis-based malware classifiers which only classify the content of the unknown file without execution. However, since the majority of malware is either packed or encrypted, malware classification based on static analysis often fails to detect these types of files. To overcome this limitation, anti-malware companies typically perform dynamic analysis by emulating each file in the anti-malware engine or performing in-depth scanning in a virtual machine. These strategies allow the analysis of the malware after unpacking or decryption. In this work, we study different strategies of crafting adversarial samples for dynamic analysis. These strategies operate on sparse, binary inputs in contrast to continuous inputs such as pixels in images. We then study the effects of two, previously proposed defensive mechanisms against crafted adversarial samples including the distillation and ensemble defenses. We also propose and evaluate the weight decay defense. Experiments show that with these three defenses, the number of successfully crafted adversarial samples is reduced compared to an unprotected baseline system. In particular, the ensemble defense is the most resilient to adversarial attacks. Importantly, none of the defenses significantly reduce the classification accuracy for detecting malware. Finally, we show that while adding additional hidden layers to neural models does not significantly improve the malware classification accuracy, it does significantly increase the classifier's robustness to adversarial attacks.
Despite widespread use of commercial anti-virus products, the number of malicious files detected on home and corporate computers continues to increase at a significant rate. Recently, anti-virus companies have started investing in machine learning solutions to augment signatures manually designed by analysts. A malicious file's determination is often represented as a hierarchical structure consisting of a type (e.g. Worm, Backdoor), a platform (e.g. Win32, Win64), a family (e.g. Rbot, Rugrat) and a family variant (e.g. A, B). While there has been substantial research in automated malware classification, the aforementioned hierarchical structure, which can provide additional information to the classification models, has been ignored. In this paper, we propose the novel idea and study the performance of employing hierarchical learning algorithms for automated classification of malicious files. To the best of our knowledge, this is the first research effort which incorporates the hierarchical structure of the malware label in its automated classification and in the security domain, in general. It is important to note that our method does not require any additional effort by analysts because they typically assign these hierarchical labels today. Our empirical results on a real world, industrial-scale malware dataset of 3.6 million files demonstrate that incorporation of the label hierarchy achieves a significant reduction of 33.1% in the binary error rate as compared to a non-hierarchical classifier which is traditionally used in such problems.