ScriptNet: Neural Static Analysis for Malicious JavaScript Detection

Submitted by aekwall on Mon, 11/30/2020 - 12:03pm

Title	ScriptNet: Neural Static Analysis for Malicious JavaScript Detection
Publication Type	Conference Paper
Year of Publication	2019
Authors	Stokes, J. W., Agrawal, R., McDonald, G., Hausknecht, M.
Conference Name	MILCOM 2019 - 2019 IEEE Military Communications Conference (MILCOM)
Keywords	composability, computer infection threat vector, Deep Learning, deep learning model, discriminative training, Human Behavior, Internet-scale Computing Security, Internet-scale processing, invasive software, Java, JavaScript files, learning (artificial intelligence), LSTM, machine learning, Malware, malware detection, Metrics, neural malicious JavaScript detection, Neural models, neural nets, Neural networks, neural static analysis, PIL model, policy governance, preinformant learning, Privacy-invasive software, program diagnostics, pubcrawl, Resiliency, ScriptNet system, sequential processing layers, Vectors
Abstract	Malicious scripts are an important computer infection threat vector for computer users. For internet-scale processing, static analysis offers substantial computing efficiencies. We propose the ScriptNet system for neural malicious JavaScript detection, which is based on static analysis. We also propose a novel deep learning model, Pre-Informant Learning (PIL), which processes JavaScript files as byte sequences. Lower layers capture the sequential nature of these byte sequences while higher layers classify the resulting embedding as malicious or benign. Unlike previously proposed solutions, our model variants are trained in an end-to-end fashion allowing discriminative training even for the sequential processing layers. Evaluating this model on a large corpus of 212,408 JavaScript files indicates that the best performing PIL model offers a 98.10% true positive rate (TPR) for the first 60K byte subsequences and 81.66% for the full-length files, at a false positive rate (FPR) of 0.50%. Both models significantly outperform several baseline models. The best performing PIL model can successfully detect 92.02% of unknown malware samples in a hindsight experiment where the true labels of the malicious JavaScript files were not known when the model was trained.
DOI	10.1109/MILCOM47813.2019.9020870
Citation Key	stokes_scriptnet_2019
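
The abstract reports detection quality as a true positive rate at a fixed false positive rate (for example, TPR at an FPR of 0.50%). The following is a minimal sketch of how such a figure could be computed from classifier scores; it is not the authors' evaluation code. The function name `tpr_at_fpr`, the convention that higher scores mean "more likely malicious", and the 0/1 label encoding are all assumptions made for illustration.

```python
def tpr_at_fpr(scores, labels, max_fpr):
    """Return the best TPR achievable while keeping FPR <= max_fpr.

    Assumptions (not from the paper):
      - scores: higher means more likely malicious
      - labels: 1 for malicious, 0 for benign
      - a sample is flagged when score >= threshold
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one malicious and one benign sample")

    best_tpr = 0.0
    # Try every distinct score as a candidate decision threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        # Keep the threshold only if it respects the FPR budget.
        if fp / negatives <= max_fpr:
            best_tpr = max(best_tpr, tp / positives)
    return best_tpr
```

With the paper's operating point, a call would look like `tpr_at_fpr(scores, labels, 0.005)`. The quadratic loop keeps the sketch short; a sorted single pass would be preferable on a corpus of the size described in the abstract.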
