Machine Learning based Malware Detection in Cloud Environment using Clustering Approach

Submitted by grigby1 on Wed, 05/05/2021 - 12:56pm

Title	Machine Learning based Malware Detection in Cloud Environment using Clustering Approach
Publication Type	Conference Paper
Year of Publication	2020
Authors	Kumar, Rahul; Sethi, Kamalakanta; Prajapati, Nishant; Rout, Rashmi Ranjan; Bera, Padmalochan
Conference Name	2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT)
Date Published	July 2020
Publisher	IEEE
ISBN Number	978-1-7281-6851-7
Keywords	cloud, cloud computing, clustering, Collaboration, collaboration agreements, composability, Computational modeling, cuckoo sandbox, False Positive Rate (FPR), feature extraction, machine learning, Malware, malware detection, policy-based governance, Principal Component Analysis (PCA), pubcrawl, Sandboxing, Scalability, Solid modeling, Training, Trend Micro Locality Sensitive Hashing (TLSH)
Abstract	Enforcing security and resilience in a cloud platform is an essential but challenging problem due to the large number of heterogeneous applications running on shared resources. A security analysis system that can detect threats or malware must exist inside the cloud infrastructure. Much research has been done on machine learning-driven malware analysis, but existing approaches are limited by their computational complexity and detection accuracy. To overcome these drawbacks, we proposed a new malware detection system based on clustering and Trend Micro Locality Sensitive Hashing (TLSH). We used the Cuckoo sandbox, which provides dynamic analysis reports of files by executing them in an isolated environment. We used a novel feature extraction algorithm to extract essential features from the malware reports obtained from the Cuckoo sandbox. The most important features were then selected using principal component analysis (PCA), random forest, and Chi-square feature selection methods. Experimental results were obtained for the clustering and non-clustering approaches on three classifiers: Decision Tree, Random Forest, and Logistic Regression. The model achieves higher classification accuracy and a lower false positive rate (FPR) than state-of-the-art works and the non-clustering approach, at significantly lower computational cost.
URL	https://ieeexplore.ieee.org/document/9225627
DOI	10.1109/ICCCNT49239.2020.9225627
Citation Key	kumar_machine_2020
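
Note on the evaluation metrics named in the abstract: the sketch below shows how classification accuracy and false positive rate (FPR) are conventionally computed from binary predictions. It is an illustration only and is not taken from the paper. The function names, the label encoding (1 = malware, 0 = benign), and the sample labels are all assumptions made for this example.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN, assuming 1 = malware and 0 = benign."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1  # benign file flagged as malware
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1  # malware that was missed
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of all samples that were classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def false_positive_rate(y_true, y_pred):
    """FPR = FP / (FP + TN): share of benign samples flagged as malware."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    benign = fp + tn
    return fp / benign if benign else 0.0


# Hypothetical labels, made up for illustration (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

print(accuracy(y_true, y_pred))             # 0.75
print(false_positive_rate(y_true, y_pred))  # 0.25
```

A lower FPR means fewer benign files are wrongly flagged, so a detector is better when accuracy rises and FPR falls together.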
