Pixel-based Feature for Android Malware Family Classification using Machine Learning Algorithms

Title: Pixel-based Feature for Android Malware Family Classification using Machine Learning Algorithms
Publication Type: Conference Paper
Year of Publication: 2021
Authors: Osman, Mohd Zamri; Abidin, Ahmad Firdaus Zainal; Romli, Rahiwan Nazar; Darmawan, Mohd Faaizie
Conference Name: 2021 International Conference on Software Engineering Computer Systems and 4th International Conference on Computational Science and Information Management (ICSECS-ICOCSIM)
Keywords: Android malware, Decision Tree, Decision trees, Gray-scale, Human Behavior, k-nearest neighbours, machine learning, machine learning algorithms, Malware, malware classification, naive Bayes, pixel-based, Predictive Metrics, privacy, pubcrawl, Random Forest, Resiliency, Scientific computing, support vector machine, support vector machine classification
Abstract: Malicious software, or malware, is a serious threat to the security and privacy of all mobile phone users. The popularity of smartphones, primarily Android devices, makes them an attractive target for spreading malware. Many earlier solutions have proved ineffective and produced many false positives. The ability to identify and classify malware will help prevent it from spreading and evolving. In this paper, we study the effectiveness of classifying malware families using pixel-level features. The study applies well-known machine learning classifiers, including k-Nearest Neighbours (k-NN), Support Vector Machine (SVM), Naive Bayes (NB), Decision Tree, and Random Forest. Binary files from 25 malware families are converted into fixed-size grayscale images. Each 100x100 grayscale image is then flattened into a single row of 10,000 columns, one per pixel. No columns are removed during this phase, so that the patterns in each malware family are preserved. The experimental results show that the approach achieved 92% accuracy with Random Forest, 88% with SVM, 81% with Decision Tree, 80% with k-NN, and 56% with Naive Bayes. Overall, pixel-based features appear to be a promising technique for identifying malware families with high accuracy, especially with the Random Forest classifier.
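The feature-extraction step the abstract describes (binary, then fixed-size grayscale image, then a flat row of pixel values) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how bytes are laid out or how images are resized, so the row width (`ROW_WIDTH = 256`), zero-padding, nearest-neighbour resampling, and the helper names `bytes_to_rows`, `resize_nearest`, and `pixel_features` are all hypothetical. Each byte is treated as one gray pixel (0 to 255), and a 100x100 image yields 100 * 100 = 10,000 features.

```python
import math
from typing import List

SIDE = 100        # target image is SIDE x SIDE (100x100, as in the abstract)
ROW_WIDTH = 256   # ASSUMPTION: hypothetical width used to lay raw bytes out as rows


def bytes_to_rows(data: bytes, width: int = ROW_WIDTH) -> List[List[int]]:
    """Lay the byte stream out row by row; each byte (0-255) is one gray pixel.
    The last row is zero-padded to the full width (an assumption)."""
    if not data:
        raise ValueError("empty binary")
    n_rows = math.ceil(len(data) / width)
    padded = data + bytes(n_rows * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(n_rows)]


def resize_nearest(img: List[List[int]], side: int = SIDE) -> List[List[int]]:
    """ASSUMPTION: nearest-neighbour resampling of an H x W image to side x side."""
    h, w = len(img), len(img[0])
    return [[img[(r * h) // side][(c * w) // side] for c in range(side)]
            for r in range(side)]


def pixel_features(data: bytes) -> List[int]:
    """Binary -> fixed 100x100 grayscale image -> flat vector of 10,000 pixels.
    No columns are dropped, mirroring the abstract's description."""
    img = resize_nearest(bytes_to_rows(data))
    return [px for row in img for px in row]


# Usage with dummy bytes (not a real malware sample):
sample = bytes(range(256)) * 40   # 10,240 bytes -> 40 rows of 256 pixels
vec = pixel_features(sample)
print(len(vec))  # 10000
```

Stacking one such vector per sample gives an N x 10,000 feature matrix that could then be passed to any of the classifiers named above, for example scikit-learn's `RandomForestClassifier` via `fit(X, y)`; the layout and resampling choices here are illustrative and may differ from the paper's actual preprocessing.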
DOI: 10.1109/ICSECS52883.2021.00107
Citation Key: osman_pixel-based_2021