Title | Similarity Analysis of Ransomware based on Portable Executable (PE) File Metadata |
Publication Type | Conference Paper |
Year of Publication | 2021 |
Authors | Ayub, Md. Ahsan; Siraj, Ambareen |
Conference Name | 2021 IEEE Symposium Series on Computational Intelligence (SSCI) |
Date Published | December |
Keywords | Classification algorithms, Clustering algorithms, composability, Data analysis, feature extraction, Forestry, machine learning, metadata, Metrics, pubcrawl, ransomware, Resiliency, static analysis, Support vector machines |
Abstract | Threats posed by ransomware are rapidly increasing, and the cost of ransomware at both national and global scales is rising significantly, as evidenced by recent events. Ransomware carries out an irreversible process in which it encrypts victims' digital assets to seek financial compensation. Adversaries use different means to gain initial access to target machines, such as phishing emails, vulnerable public-facing software, Remote Desktop Protocol (RDP), brute-force attacks, and stolen accounts. To combat these threats, this paper aims to help researchers gain a better understanding of ransomware application profiles through static analysis, in which we identify a list of suspicious indicators and similarities among 727 active ransomware samples. We start by generating portable executable (PE) metadata for all the studied samples. Using our domain knowledge and exploratory data analysis, we introduce some suspicious indicators in the structure of ransomware files. We reduce the dimensionality of the generated dataset with Principal Component Analysis (PCA) and discover clusters by applying the K-Means algorithm. This motivates us to apply one-class classification algorithms to the generated dataset. As a result, the algorithms learn the common data boundary in the structure of the studied ransomware samples, which yields data-driven similarities. We use the findings to evaluate the trained classifiers on the test samples and observe that the Local Outlier Factor (LOF) performs better on all the selected feature spaces than the One-Class SVM and Isolation Forest algorithms. |
DOI | 10.1109/SSCI50451.2021.9660019 |
Citation Key | ayub_similarity_2021 |
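
The abstract's first step, generating PE metadata for each sample, can be illustrated with a minimal sketch. It is not the authors' tooling: the function names `parse_pe_header` and `build_demo_image` and the choice of fields are assumptions made for illustration, and the input is a synthetic header rather than a real sample. It reads the `e_lfanew` offset from the DOS header, checks the `PE\0\0` signature, and unpacks the 20-byte COFF file header.

```python
import struct


def parse_pe_header(data: bytes) -> dict:
    """Extract a few COFF file-header fields from raw PE bytes (illustrative only)."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a DOS/PE image: missing MZ magic")
    # e_lfanew: 4-byte little-endian offset of the PE signature, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    if len(data) < pe_offset + 24:
        raise ValueError("truncated COFF file header")
    # COFF file header: Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics.
    (machine, n_sections, timestamp, _symtab, _nsyms,
     opt_size, characteristics) = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    return {
        "machine": machine,
        "number_of_sections": n_sections,
        "time_date_stamp": timestamp,
        "size_of_optional_header": opt_size,
        "characteristics": characteristics,
    }


def build_demo_image() -> bytes:
    """Build a synthetic, header-only image for the demo (hypothetical values)."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)  # PE signature follows the DOS header
    coff = struct.pack("<HHIIIHH", 0x8664, 3, 0, 0, 0, 0xF0, 0x0022)
    return bytes(dos) + b"PE\x00\x00" + coff


if __name__ == "__main__":
    print(parse_pe_header(build_demo_image()))
```

Fields like these would form one row of a metadata table per sample. The later steps named in the abstract (PCA, K-Means, and the one-class classifiers) would operate on such rows, but which features the paper actually selects is not stated in this record.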