ScriptNet: Neural Static Analysis for Malicious JavaScript Detection

Title: ScriptNet: Neural Static Analysis for Malicious JavaScript Detection
Publication Type: Conference Paper
Year of Publication: 2019
Authors: Stokes, J. W., Agrawal, R., McDonald, G., Hausknecht, M.
Conference Name: MILCOM 2019 - 2019 IEEE Military Communications Conference (MILCOM)
Keywords: composability, computer infection threat vector, Deep Learning, deep learning model, discriminative training, Human Behavior, Internet-scale Computing Security, Internet-scale processing, invasive software, Java, JavaScript files, learning (artificial intelligence), LSTM, machine learning, Malware, malware detection, Metrics, neural malicious JavaScript detection, Neural models, neural nets, Neural networks, neural static analysis, PIL model, policy governance, preinformant learning, Privacy-invasive software, program diagnostics, pubcrawl, Resiliency, ScriptNet system, sequential processing layers, Vectors
Abstract: Malicious scripts are an important computer infection threat vector for computer users. For internet-scale processing, static analysis offers substantial computing efficiencies. We propose the ScriptNet system for neural malicious JavaScript detection which is based on static analysis. We also propose a novel deep learning model, Pre-Informant Learning (PIL), which processes JavaScript files as byte sequences. Lower layers capture the sequential nature of these byte sequences while higher layers classify the resulting embedding as malicious or benign. Unlike previously proposed solutions, our model variants are trained in an end-to-end fashion allowing discriminative training even for the sequential processing layers. Evaluating this model on a large corpus of 212,408 JavaScript files indicates that the best performing PIL model offers a 98.10% true positive rate (TPR) for the first 60K byte subsequences and 81.66% for the full-length files, at a false positive rate (FPR) of 0.50%. Both models significantly outperform several baseline models. The best performing PIL model can successfully detect 92.02% of unknown malware samples in a hindsight experiment where the true labels of the malicious JavaScript files were not known when the model was trained.
DOI: 10.1109/MILCOM47813.2019.9020870
Citation Key: stokes_scriptnet_2019
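
Note on the reported metric: the abstract quotes TPR at a fixed FPR of 0.50%. The sketch below shows one common way such a figure can be computed from classifier scores. It is an illustration only, not the authors' evaluation code; the function name `tpr_at_fpr`, the label convention (1 = malicious, 0 = benign), and the "flag when score >= threshold" rule are all assumptions made for this example.

```python
from typing import Sequence


def tpr_at_fpr(scores: Sequence[float], labels: Sequence[int], max_fpr: float) -> float:
    """Return the highest TPR reachable while keeping FPR <= max_fpr.

    Hypothetical helper: labels use 1 for malicious and 0 for benign, and a
    file is flagged when its score is >= the chosen threshold.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one malicious and one benign sample")

    # Sweep candidate thresholds from the highest score downward.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best_tpr = 0.0
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # All samples sharing this score are flagged together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / negatives > max_fpr:
            break  # FPR can only grow as the threshold drops further
        best_tpr = tp / positives
    return best_tpr


if __name__ == "__main__":
    # Toy scores, invented for illustration; not data from the paper.
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]
    print(tpr_at_fpr(toy_scores, toy_labels, max_fpr=0.005))
```

With a budget as tight as 0.50%, the threshold is set by a small number of benign files, so a figure like this is only stable on a large benign test set.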