Android Malware Family Classification Based on Sensitive Opcode Sequence

Title: Android Malware Family Classification Based on Sensitive Opcode Sequence
Publication Type: Conference Paper
Year of Publication: 2019
Authors: Jiang, Jianguo; Li, Song; Yu, Min; Li, Gang; Liu, Chao; Chen, Kai; Liu, Hui; Huang, Weiqing
Conference Name: 2019 IEEE Symposium on Computers and Communications (ISCC)
Date Published: July 2019
Publisher: IEEE
Keywords: Android (operating system), Android malware, Android malware analysis, Android Malware Detection, Android malware family classification model, Android malware forensics, application program interfaces, code specific semantic information, digital forensics, Drebin dataset, family classification, feature extraction, Human Behavior, invasive software, learning (artificial intelligence), malware classification, Metrics, mobile computing, multiple class classification, oversampling technique, pattern classification, privacy, pubcrawl, resilience, Resiliency, Semantic, semantic related vector, sensitive API, sensitive opcode, sensitive semantic feature-sensitive opcode sequence
Abstract

Android malware family classification is an advanced task in Android malware analysis, detection and forensics. Existing methods and models have achieved some success in Android malware detection, but their accuracy and efficiency still fall short of expectations, especially for multiple class classification with imbalanced training data. To address these challenges, we propose an Android malware family classification model that analyzes the code's specific semantic information based on the sensitive opcode sequence. In this work, we construct a sensitive semantic feature, the sensitive opcode sequence, from opcodes, sensitive APIs, STRs and actions. Based on this feature, we analyze the code's specific semantic information and generate a semantic related vector for Android malware family classification. In addition, for families with few samples, we adopt an oversampling technique based on the sensitive opcode sequence. Finally, we evaluate our method on the Drebin dataset, selecting the top 40 malware families for experiments. The experimental results show that the Total Accuracy and Average AUC (Area Under Curve) reach 99.50% and 98.86%, at 45.17 s per Android malware sample, and that these results remain good even as the number of malware families increases.
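
The abstract mentions oversampling for minority families but does not describe the procedure. As a rough illustration of the general idea only, the sketch below balances classes by plain random duplication, which is a generic stand-in and not the paper's opcode-sequence-based technique. The function name `oversample_minority`, the family labels, and the toy opcode sequences are all hypothetical.

```python
import random
from collections import Counter


def oversample_minority(samples, labels, seed=0):
    """Randomly duplicate samples so every family reaches the size of the largest one.

    This is generic random oversampling, used here only as an assumption-laden
    stand-in for the (undescribed) technique referred to in the abstract.
    """
    rng = random.Random(seed)

    # Group samples by their family label.
    by_family = {}
    for sample, label in zip(samples, labels):
        by_family.setdefault(label, []).append(sample)
    if not by_family:
        return [], []

    # Every family is padded up to the size of the largest family.
    target = max(len(group) for group in by_family.values())

    out_samples, out_labels = [], []
    for label, group in by_family.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical toy data: each sample is a short list of Dalvik opcode names.
samples = [
    ["invoke-virtual", "move-result"],
    ["const-string", "invoke-static"],
    ["if-eqz", "goto"],
    ["new-instance", "invoke-direct"],
]
labels = ["FamilyA", "FamilyA", "FamilyA", "FamilyB"]

balanced_samples, balanced_labels = oversample_minority(samples, labels)
print(Counter(balanced_labels))
```

In this toy run the single `FamilyB` sample is duplicated until both families have three entries. Duplication adds no new information, which is presumably why the paper derives its oversampling from the sensitive opcode sequence instead.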

URL: https://ieeexplore.ieee.org/document/8969656
DOI: 10.1109/ISCC47284.2019.8969656
Citation Key: jiang_android_2019