Similarity Analysis of Ransomware based on Portable Executable (PE) File Metadata

Title: Similarity Analysis of Ransomware based on Portable Executable (PE) File Metadata
Publication Type: Conference Paper
Year of Publication: 2021
Authors: Ayub, Md. Ahsan; Siraj, Ambareen
Conference Name: 2021 IEEE Symposium Series on Computational Intelligence (SSCI)
Date Published: December
Keywords: Classification algorithms, Clustering algorithms, composability, Data analysis, feature extraction, Forestry, machine learning, metadata, Metrics, pubcrawl, ransomware, Resiliency, static analysis, Support vector machines
Abstract: Threats posed by ransomware are rapidly increasing, and their cost on both national and global scales is becoming significantly high, as evidenced by recent events. Ransomware carries out an irreversible process in which it encrypts victims' digital assets to seek financial compensation. Adversaries use different means to gain initial access to target machines, such as phishing emails, vulnerable public-facing software, the Remote Desktop Protocol (RDP), brute-force attacks, and stolen accounts. To combat these threats, this paper aims to help researchers gain a better understanding of ransomware application profiles through static analysis, in which we identify a list of suspicious indicators and similarities among 727 active ransomware samples. We start by generating portable executable (PE) metadata for all the studied samples. Using our domain knowledge and exploratory data analysis, we introduce some suspicious indicators in the structure of ransomware files. We reduce the dimensionality of the generated dataset with Principal Component Analysis (PCA) and discover clusters by applying the KMeans algorithm. This motivates us to apply one-class classification algorithms to the generated dataset. As a result, the algorithms learn the common data boundary in the structure of the studied ransomware samples, and thereby we derive data-driven similarities. We use these findings to evaluate the trained classifiers on the test samples and observe that the Local Outlier Factor (LoF) performs better than the One-Class SVM and the Isolation Forest algorithms on all the selected feature spaces.
DOI: 10.1109/SSCI50451.2021.9660019
Citation Key: ayub_similarity_2021
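The abstract reports that the Local Outlier Factor (LoF) was the best-performing one-class classifier. As a rough illustration of how LOF-based novelty detection works, and not the authors' code or feature set, the following is a minimal pure-Python sketch. It is fitted on feature vectors of known samples and then scores an unseen vector by comparing its local density with that of its k nearest neighbours. The class name, the value of k, the 1.5 decision threshold, and the 2-D toy vectors (standing in for PCA-projected PE-metadata features) are all hypothetical assumptions chosen for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class LocalOutlierFactor:
    """Minimal novelty-mode LOF sketch (hypothetical helper, not a library API).

    fit() learns the local densities of known samples; score() returns the LOF of
    an unseen sample: values near 1 mean "as dense as its neighbours", values
    well above 1 mean the sample lies outside the learned boundary.
    """

    def __init__(self, k: int = 3) -> None:
        self.k = k

    def _neighbors(self, point: Vector, exclude: int = -1) -> List[int]:
        # Indices of the k nearest training samples, skipping `exclude` (the point itself).
        candidates = [i for i in range(len(self.train)) if i != exclude]
        candidates.sort(key=lambda i: euclidean(point, self.train[i]))
        return candidates[: self.k]

    def _lrd(self, point: Vector, nbrs: List[int]) -> float:
        # Local reachability density: inverse mean reachability distance,
        # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
        reach = [max(self.kdist[o], euclidean(point, self.train[o])) for o in nbrs]
        return 1.0 / (sum(reach) / len(reach) + 1e-10)  # epsilon guards duplicate points

    def fit(self, train: List[Vector]) -> "LocalOutlierFactor":
        if len(train) <= self.k:
            raise ValueError("need more training samples than k")
        self.train = list(train)
        nbrs = [self._neighbors(p, exclude=i) for i, p in enumerate(self.train)]
        # k-distance of each sample: distance to its k-th nearest neighbour.
        self.kdist = [euclidean(p, self.train[nb[-1]]) for p, nb in zip(self.train, nbrs)]
        self.lrd = [self._lrd(p, nb) for p, nb in zip(self.train, nbrs)]
        return self

    def score(self, query: Vector) -> float:
        # LOF = mean ratio of the neighbours' densities to the query's density.
        nbrs = self._neighbors(query)
        lrd_q = self._lrd(query, nbrs)
        return sum(self.lrd[o] for o in nbrs) / (len(nbrs) * lrd_q)

    def predict(self, query: Vector, threshold: float = 1.5) -> int:
        # +1 = inlier (fits the learned profile), -1 = outlier; threshold is an assumption.
        return 1 if self.score(query) <= threshold else -1


if __name__ == "__main__":
    # Hypothetical 2-D vectors standing in for PCA-projected PE-metadata features.
    known = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (0.2, 0.1)]
    model = LocalOutlierFactor(k=3).fit(known)
    for sample in [(0.05, 0.08), (3.0, 3.0)]:
        print(sample, round(model.score(sample), 2), model.predict(sample))
```

A sample that sits inside the cluster of known vectors receives an LOF close to 1 and is accepted, while a distant one receives a much larger LOF and is rejected. For real experiments, scikit-learn's `sklearn.neighbors.LocalOutlierFactor` with `novelty=True` provides the same kind of scoring on new data, alongside `OneClassSVM` and `IsolationForest` for the other two classifiers named in the abstract.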