Text Mining for Malware Classification Using Multivariate All Repeated Patterns Detection

Title: Text Mining for Malware Classification Using Multivariate All Repeated Patterns Detection
Publication Type: Conference Paper
Year of Publication: 2019
Authors: Xylogiannopoulos, Konstantinos F.; Karampelas, Panagiotis; Alhajj, Reda
Conference Name: 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)
Date Published: August 2019
Publisher: ACM
Keywords: Android Malware Detection, Androids, ARPaD, bank accounts, cyber criminals, data mining, Human Behavior, Humanoid robots, Internet, invasive software, legitimate application, LERP-RSA, Malware, malware classification, malware dataset, malware detection, malware families, malware family classification, Metrics, mobile computing, mobile devices, Mobile handsets, mobile phones, mobile users, multivariate all repeated pattern detection, Payloads, privacy, pubcrawl, randomly selected malware applications, resilience, Resiliency, smart phones, smartphone users, Social network services, text mining
Abstract

Mobile phones have become a commodity for the majority of people. Using them, people can access the Internet and connect with friends, colleagues at work, or even strangers with common interests. Cyber criminals have seen this proliferation of mobile devices as an opportunity to deceive smartphone users and steal their money, either directly, by accessing their bank accounts through their smartphones, or indirectly, by blackmailing them or selling their private data, such as photos and credit card data, to third parties. This is usually achieved by masking a malevolent payload as a legitimate application and advertising it in the hope that mobile users will install it on their devices. Any existing application can thus easily be modified by integrating malware and then presented as a legitimate one. In response, scientists have proposed a number of malware detection and classification methods using a variety of techniques. Even though several of them achieve relatively high precision in malware classification, there is still room for improvement. In this paper, we propose a text mining all repeated pattern detection method which uses the decompiled files of an application to classify a suspicious application into one of the known malware families. Based on the experimental results using a real malware dataset, the methodology tries to correctly classify (without any misclassification) all randomly selected malware applications of 3 categories with 3 different families each.
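
As a rough illustration of the general idea in the abstract (classifying a sample by the repeated text patterns it shares with known families), the sketch below may help. It is not the authors' ARPaD or LERP-RSA implementation: the naive substring enumeration, the overlap score, the function names `repeated_patterns` and `classify`, the length bounds, and the toy strings are all assumptions made for this example.

```python
from collections import Counter


def repeated_patterns(text, min_len=4, max_len=8):
    """Return every substring of length min_len..max_len occurring at least twice.

    Naive enumeration for illustration only; this is NOT ARPaD or LERP-RSA.
    """
    counts = Counter(
        text[i:i + n]
        for n in range(min_len, max_len + 1)
        for i in range(len(text) - n + 1)
    )
    return {pattern for pattern, count in counts.items() if count >= 2}


def classify(sample, family_texts, min_len=4, max_len=8):
    """Pick the family whose repeated patterns overlap most with the sample.

    family_texts maps a (hypothetical) family name to text standing in for
    decompiled files of known members of that family.
    """
    # All substrings of the sample within the same length bounds.
    sample_subs = {
        sample[i:i + n]
        for n in range(min_len, max_len + 1)
        for i in range(len(sample) - n + 1)
    }
    # Score = number of a family's repeated patterns that also occur in the sample.
    scores = {
        family: len(repeated_patterns(text, min_len, max_len) & sample_subs)
        for family, text in family_texts.items()
    }
    return max(scores, key=scores.get), scores


# Toy data (invented for this sketch, not taken from the paper's dataset).
family_texts = {
    "famA": "sendSMS;readContacts;sendSMS;readContacts;",
    "famB": "openCamera;uploadFile;openCamera;uploadFile;",
}
sample = "init;sendSMS;readContacts;exit;"
family, scores = classify(sample, family_texts)
print(family, scores)
```

Enumerating all substrings this way grows quickly with input size, so it only suits toy inputs; the paper's own data structures and algorithm are what make the approach practical on real decompiled applications.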

URL: https://dl.acm.org/doi/10.1145/3341161.3350841
DOI: 10.1145/3341161.3350841
Citation Key: xylogiannopoulos_text_2019