Machine Learning based Malware Detection in Cloud Environment using Clustering Approach
Title | Machine Learning based Malware Detection in Cloud Environment using Clustering Approach |
Publication Type | Conference Paper |
Year of Publication | 2020 |
Authors | Kumar, Rahul; Sethi, Kamalakanta; Prajapati, Nishant; Rout, Rashmi Ranjan; Bera, Padmalochan |
Conference Name | 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT) |
Date Published | July 2020 |
Publisher | IEEE |
ISBN Number | 978-1-7281-6851-7 |
Keywords | cloud, cloud computing, clustering, Collaboration, collaboration agreements, composability, Computational modeling, cuckoo sandbox, False Positive Rate (FPR), feature extraction, machine learning, Malware, malware detection, policy-based governance, Principal Component Analysis (PCA), pubcrawl, Sandboxing, Scalability, Solid modeling, Training, Trend Micro Locality Sensitive Hashing (TLSH) |
Abstract | Enforcing security and resilience in a cloud platform is an essential but challenging problem because a large number of heterogeneous applications run on shared resources. A security analysis system that can detect threats or malware must exist inside the cloud infrastructure. Much research has been done on machine learning-driven malware analysis, but existing approaches are limited in terms of computational complexity and detection accuracy. To overcome these drawbacks, we propose a new malware detection system based on clustering and Trend Micro Locality Sensitive Hashing (TLSH). We use the Cuckoo sandbox, which provides dynamic analysis reports of files by executing them in an isolated environment. A novel feature extraction algorithm extracts essential features from the malware reports obtained from the Cuckoo sandbox. The most important features are then selected using principal component analysis (PCA), random forest, and Chi-square feature selection methods. Experimental results are obtained for clustering and non-clustering approaches on three classifiers: Decision Tree, Random Forest, and Logistic Regression. The proposed model achieves higher classification accuracy and a lower false positive rate (FPR) than state-of-the-art works and the non-clustering approach, at significantly lower computational cost. |
URL | https://ieeexplore.ieee.org/document/9225627 |
DOI | 10.1109/ICCCNT49239.2020.9225627 |
Citation Key | kumar_machine_2020 |
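
The abstract reports results in terms of classification accuracy and false positive rate (FPR). As a minimal sketch of how these two metrics are conventionally computed for a binary malware/benign classifier, the following assumes labels encoded as 1 = malware and 0 = benign. The function names and the sample labels are hypothetical and illustrative only; they are not taken from the paper and do not reproduce its results.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (assumed: 1 = malware, 0 = benign)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0
            fn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of samples classified correctly: (TP + TN) / N."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def false_positive_rate(y_true, y_pred):
    """Fraction of benign samples flagged as malware: FP / (FP + TN)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    negatives = fp + tn
    return fp / negatives if negatives else 0.0


# Made-up labels for illustration only (not data from the paper).
sample_true = [1, 1, 0, 0, 0, 1]
sample_pred = [1, 0, 0, 1, 0, 1]
print(f"accuracy = {accuracy(sample_true, sample_pred):.3f}")
print(f"FPR      = {false_positive_rate(sample_true, sample_pred):.3f}")
```

A lower FPR means fewer benign files are wrongly flagged as malware, which is why the abstract reports it alongside accuracy.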