Static Analysis with Paragraph Vector for Malware Detection

Title	Static Analysis with Paragraph Vector for Malware Detection
Publication Type	Conference Paper
Year of Publication	2017
Authors	Nagano, Yuta; Uda, Ryuya
Conference Name	Proceedings of the 11th International Conference on Ubiquitous Information Management and Communication
Date Published	January 2017
Publisher	ACM
Conference Location	New York, NY, USA
ISBN Number	978-1-4503-4888-1
Keywords	Human Behavior, k-nearest neighbor algorithm, machine learning, Malware, malware analysis, Metrics, paragraph vector, privacy, pubcrawl, Resiliency, static analysis, support vector machine, threat vectors
Abstract	Malware damages computers and remains a serious threat. It can be detected by pattern matching or by dynamic heuristic methods, but existing methods struggle to detect every new malware subspecies. In this paper, we propose a method that automatically detects new malware subspecies through static analysis of executable files combined with machine learning. The method distinguishes malware from benignware and also classifies malware subspecies into malware families. It combines static analysis of executable files with a machine learning classifier and with natural language processing based on machine learning. DLL import information, assembly code, and hexdumps are obtained by static analysis of malware and benignware executables and used to create feature vectors. Paragraph vectors of this information are produced by training the PV-DBOW model from natural language processing. A support vector machine (SVM) and a k-nearest neighbor classifier are then trained on these paragraph vectors. Unknown executable files are classified as malware or benignware by the trained SVM, and malware subspecies are classified into malware families by the trained k-nearest neighbor classifier. We evaluate the classification accuracy experimentally. We believe that our method can effectively detect new malware subspecies without relying on existing malware analysis methods such as generic methods and dynamic heuristic methods.
URL	https://dl.acm.org/doi/10.1145/3022227.3022306
DOI	10.1145/3022227.3022306
Citation Key	nagano_static_2017
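
The family-classification step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper learns paragraph vectors with PV-DBOW, whereas this sketch swaps in plain bag-of-token counts so that it needs only the Python standard library. The 2-byte token width, the toy byte strings, the family labels, and the helper names (`hexdump_tokens`, `cosine`, `knn_predict`) are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def hexdump_tokens(data: bytes, width: int = 2) -> list[str]:
    """Split raw bytes into fixed-width hex 'words' (assumed tokenisation)."""
    hexed = data.hex()
    step = width * 2  # two hex characters per byte
    return [hexed[i:i + step] for i in range(0, len(hexed), step)]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, query: Counter, k: int = 3) -> str:
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(train, key=lambda item: cosine(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up byte strings standing in for hexdumps of labelled samples.
raw_samples = [
    (b"\x4d\x5a\x90\x00\xe8\x00\x00\x00", "family_a"),
    (b"\x4d\x5a\x90\x00\xe8\x00\x00\x01", "family_a"),
    (b"\x55\x8b\xec\x83\xc3\x90\x90\x90", "family_b"),
    (b"\x55\x8b\xec\x83\xc3\x90\x90\xcc", "family_b"),
]

# Count vectors stand in for the paragraph vectors used in the paper.
train = [(Counter(hexdump_tokens(data)), label) for data, label in raw_samples]

unknown = b"\x4d\x5a\x90\x00\xe8\x00\x00\x02"
predicted = knn_predict(train, Counter(hexdump_tokens(unknown)))
print(predicted)  # family_a
```

In a fuller pipeline, the count vectors would presumably be replaced by learned PV-DBOW embeddings, and a separately trained SVM would handle the malware versus benignware decision before the k-nearest neighbor step assigns a family.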
