Biblio
Filters: Keyword is malware analysis [Clear All Filters]
Detecting Malware, Malicious URLs and Virus Using Machine Learning and Signature Matching. 2021 2nd International Conference for Emerging Technology (INCET). :1–5.
.
2021. Nowadays most of our data is stored on an electronic device. The risk of that device getting infected by Viruses, Malware, Worms, Trojan, Ransomware, or any unwanted invader has increased a lot these days. This is mainly because of easy access to the internet. Viruses and malware have evolved over time so identification of these files has become difficult. Not only by viruses and malware your device can be attacked by a click on forged URLs. Our proposed solution for this problem uses machine learning techniques and signature matching techniques. The main aim of our solution is to identify the malicious programs/URLs and act upon them. The core idea in identifying the malware is selecting the key features from the Portable Executable file headers using these features we trained a random forest model. This RF model will be used for scanning a file and determining if that file is malicious or not. For identification of the virus, we are using the signature matching technique which is used to match the MD5 hash of the file with the virus signature database containing the MD5 hash of the identified viruses and their families. To distinguish between benign and illegitimate URLs there is a logistic regression model used. The regression model uses a tokenizer for feature extraction from the URL that is to be classified. The tokenizer separates all the domains, sub-domains and separates the URLs on every `/'. Then a TfidfVectorizer (Term Frequency - Inverse Document Frequency) is used to convert the text into a weighted value. These values are used to predict if the URL is safe to visit or not. On the integration of all three modules, the final application will provide full system protection against malicious software.
Machine Learning Based Improved Malware Detection Schemes. 2021 11th International Conference on Cloud Computing, Data Science Engineering (Confluence). :925–931.
.
2021. In recent years, cyber security has become a challenging task to protect the networks and computing systems from various types of digital attacks. Therefore, to preserve these systems, various innovative methods have been reported and implemented in practice. However, still more research work needs to be carried out to have malware free computing system. In this paper, an attempt has been made to develop simple but reliable ML based malware detection systems which can be implemented in practice. Keeping this in view, the present paper has proposed and compared the performance of three ML based malware detection systems applicable for computer systems. The proposed methods include k-NN, RF and LR for detection purpose and the features extracted comprise of Byte and ASM. The performance obtained from the simulation study of the proposed schemes has been evaluated in terms of ROC, Log loss plot, accuracy, precision, recall, specificity, sensitivity and F1-score. The analysis of the various results clearly demonstrates that the RF based malware detection scheme outperforms the model based on k-NN and LR The efficiency of detection of proposed ML models is either same or comparable to deep learning-based methods.
Malware Subspecies Detection Method by Suffix Arrays and Machine Learning. 2021 55th Annual Conference on Information Sciences and Systems (CISS). :1–6.
.
2021. Malware such as metamorphic virus changes its codes and it cannot be detected by pattern matching. Such malware can be detected by surface analysis, dynamic analysis or static analysis. We focused on surface analysis since neither virtual environments nor high level engineering is required. A representative method in surface analysis is n-gram with machine learning. On the other hand, important features are sometimes cut off by n-gram since n is not variable in some existing methods. Hence, scores of malware detection methods are not perfect. Moreover, creating n-gram features takes long time for comparing files. Furthermore, in some n-gram methods, invisible malware can be created when the methods are known to attackers. Therefore, we proposed a new malware subspecies detection method by suffix arrays and machine learning. We evaluated the method with four real malware subspecies families and succeeded to classify them with almost 100% accuracy.
Detection of Malware using Machine Learning based on Operation Code Frequency. 2021 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT). :214–220.
.
2021. One of the many methods for identifying malware is to disassemble the malware files and obtain the opcodes from them. Since malware have predominantly been found to contain specific opcode sequences in them, the presence of the same sequences in any incoming file or network content can be taken up as a possible malware identification scheme. Malware detection systems help us to understand more about ways on how malware attack a system and how it can be prevented. The proposed method analyses malware executable files with the help of opcode information by converting the incoming executable files to assembly language thereby extracting opcode information (opcode count) from the same. The opcode count is then converted into opcode frequency which is stored in a CSV file format. The CSV file is passed to various machine learning algorithms like Decision Tree Classifier, Random Forest Classifier and Naive Bayes Classifier. Random Forest Classifier produced the highest accuracy and hence the same model was used to predict whether an incoming file contains a potential malware or not.
Analysis of Data Transforming Technology for Malware Detection. 2021 21st ACIS International Winter Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD-Winter). :224–229.
.
2021. As AI technology advances and its use increases, efforts to incorporate machine learning for malware detection are increasing. However, for malware learning, a standardized data set is required. Because malware is unstructured data, it cannot be directly learned. In order to solve this problem, many studies have attempted to convert unstructured data into structured data. In this study, the features and limitations of each were analyzed by investigating and analyzing the method of converting unstructured data proposed in each study into structured data. As a result, most of the data conversion techniques suggest conversion mechanisms, but the scope of each technique has not been determined. The resulting data set is not suitable for use as training data because it has infinite properties.
Graph-Based Malware Detection Using Opcode Sequences. 2021 9th International Symposium on Digital Forensics and Security (ISDFS). :1–5.
.
2021. The impact of malware grows for IT (information technology) systems day by day. The number, the complexity, and the cost of them increase rapidly. While researchers are developing new and better detection algorithms, attackers are also evolving malware to fail the current detection techniques. Therefore malware detection becomes one of the most challenging tasks in cyber security. To increase the performance of the detection techniques, researchers benefit from different approaches. But some of them might cost a lot both in time and hardware resources. This situation puts forward fast and cheap detection methods. In this context, static analysis provides these utilities but it is important to keep detection accuracy high while reducing resource consumption. Opcodes (operational codes) are commonly used in static analysis but sometimes feature extraction from opcodes might be difficult since an opcode sequence might have a great length. Furthermore, most of the malware developers use obfuscation and encryption techniques to avoid detection methods based on static analysis. This kind of malware is called packed malware and according to common belief, packed malware should be either unpacked or analyzed dynamically in order to detect them. In this study, a graph-based malware detection method has been proposed to overcome these problems. The proposed method relies on obtaining the opcode graph of every executable file in the dataset and using them for future extraction. In this way, the proposed method reaches up to 98% detection accuracy. In addition to the accuracy rate, the proposed method makes it possible to detect packed malware without the need for unpacking or dynamic analysis.
Clustering Analysis of Email Malware Campaigns. 2021 IEEE International Conference on Cyber Security and Resilience (CSR). :95–102.
.
2021. The task of malware labeling on real datasets faces huge challenges—ever-changing datasets and lack of ground-truth labels—owing to the rapid growth of malware. Clustering malware on their respective families is a well known tool used for improving the efficiency of the malware labeling process. In this paper, we addressed the challenge of clustering email malware, and carried out a cluster analysis on a real dataset collected from email campaigns over a 13-month period. Our main original contribution is to analyze the usefulness of email’s header information for malware clustering (a novel approach proposed by Burton [1]), and compare it with features collected from the malware directly. We compare clustering based on email header’s information with traditional features extracted from varied resources provided by VirusTotal [2], including static and dynamic analysis. We show that email header information has an excellent performance.
EntropLyzer: Android Malware Classification and Characterization Using Entropy Analysis of Dynamic Characteristics. 2021 Reconciling Data Analytics, Automation, Privacy, and Security: A Big Data Challenge (RDAAPS). :1–12.
.
2021. The unmatched threat of Android malware has tremendously increased the need for analyzing prominent malware samples. There are remarkable efforts in static and dynamic malware analysis using static features and API calls respectively. Nonetheless, there is a void to classify Android malware by analyzing its behavior using multiple dynamic characteristics. This paper proposes EntropLyzer, an entropy-based behavioral analysis technique for classifying the behavior of 12 eminent Android malware categories and 147 malware families taken from CCCS-CIC-AndMal2020 dataset. This work uses six classes of dynamic characteristics including memory, API, network, logcat, battery, and process to classify and characterize Android malware. Results reveal that the entropy-based analysis successfully determines the behavior of all malware categories and most of the malware families before and after rebooting the emulator.
Automatic Generation of Different Malware. 2021 29th Signal Processing and Communications Applications Conference (SIU). :1–4.
.
2021. The use of mobile devices has increased dramatically in recent years. These smart devices allow us to easily perform many functions such as e-mail, internet, Bluetooth, SMS and MMS without restriction of time and place. Thus, these devices have become an indispensable part of our lives today. Due to this high usage, malware developers have turned to this platform and many mobile malware has emerged in recent years. Many security companies and experts have developed methods to protect our mobile devices. In this study, in order to contribute to mobile malware detection and analysis, an application has been implemented that automatically injects payload into normal apk. With this application, it is aimed to create a data set that can be used by security companies and experts.
Pay Attention: Improving Classification of PE Malware Using Attention Mechanisms Based on System Call Analysis. 2021 International Joint Conference on Neural Networks (IJCNN). :1–8.
.
2021. Malware poses a threat to computing systems worldwide, and security experts work tirelessly to detect and classify malware as accurately and quickly as possible. Since malware can use evasion techniques to bypass static analysis and security mechanisms, dynamic analysis methods are more useful for accurately analyzing the behavioral patterns of malware. Previous studies showed that malware behavior can be represented by sequences of executed system calls and that machine learning algorithms can leverage such sequences for the task of malware classification (a.k.a. malware categorization). Accurate malware classification is helpful for malware signature generation and is thus beneficial to antivirus vendors; this capability is also valuable to organizational security experts, enabling them to mitigate malware attacks and respond to security incidents. In this paper, we propose an improved methodology for malware classification, based on analyzing sequences of system calls invoked by malware in a dynamic analysis environment. We show that adding an attention mechanism to a LSTM model improves accuracy for the task of malware classification, thus outperforming the state-of-the-art algorithm by up to 6%. We also show that the transformer architecture can be used to analyze very long sequences with significantly lower time complexity for training and prediction. Our proposed method can serve as the basis for a decision support system for security experts, for the task of malware categorization.
Cross Platform IoT-Malware Family Classification Based on Printable Strings. 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :775–784.
.
2020. In this era of rapid network development, Internet of Things (IoT) security considerations receive a lot of attention from both the research and commercial sectors. With limited computation resource, unfriendly interface, and poor software implementation, legacy IoT devices are vulnerable to many infamous mal ware attacks. Moreover, the heterogeneity of IoT platforms and the diversity of IoT malware make the detection and classification of IoT malware even more challenging. In this paper, we propose to use printable strings as an easy-to-get but effective cross-platform feature to identify IoT malware on different IoT platforms. The discriminating capability of these strings are verified using a set of machine learning algorithms on malware family classification across different platforms. The proposed scheme shows a 99% accuracy on a large scale IoT malware dataset consisted of 120K executable fils in executable and linkable format when the training and test are done on the same platform. Meanwhile, it also achieves a 96% accuracy when training is carried out on a few popular IoT platforms but test is done on different platforms. Efficient malware prevention and mitigation solutions can be enabled based on the proposed method to prevent and mitigate IoT malware damages across different platforms.
Combinatorial Code Classification Amp; Vulnerability Rating. 2020 Second International Conference on Transdisciplinary AI (TransAI). :80–83.
.
2020. Empirical analysis of source code of Android Fluoride Bluetooth stack demonstrates a novel approach of classification of source code and rating for vulnerability. A workflow that combines deep learning and combinatorial techniques with a straightforward random forest regression is presented. Two kinds of embedding are used: code2vec and LSTM, resulting in a distance matrix that is interpreted as a (combinatorial) graph whose vertices represent code components, functions and methods. Cluster Editing is then applied to partition the vertex set of the graph into subsets representing nearly complete subgraphs. Finally, the vectors representing the components are used as features to model the components for vulnerability risk.
Malware Detection for Industrial Internet Based on GAN. 2020 IEEE International Conference on Information Technology,Big Data and Artificial Intelligence (ICIBA). 1:475–481.
.
2020. This thesis focuses on the detection of malware in industrial Internet. The basic flow of the detection of malware contains feature extraction and sample identification. API graph can effectively represent the behavior information of malware. However, due to the high algorithm complexity of solving the problem of subgraph isomorphism, the efficiency of analysis based on graph structure feature is low. Due to the different scales of API graph of different malicious codes, the API graph needs to be normalized. Considering the difficulties of sample collection and manual marking, it is necessary to expand the number of malware samples in industrial Internet. This paper proposes a method that combines PageRank with TF-IDF to process the API graph. Besides, this paper proposes a method to construct the adversarial samples of malwares based on GAN.
"Digital Bombs" Neutralization Method. 2020 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus). :446–451.
.
2020. The article discusses new models and methods for timely identification and blocking of malicious code of critically important information infrastructure based on static and dynamic analysis of executable program codes. A two-stage method for detecting malicious code in the executable program codes (the so-called "digital bombs") is described. The first step of the method is to build the initial program model in the form of a control graph, the construction is carried out at the stage of static analysis of the program. The article discusses the purpose, features and construction criteria of an ordered control graph. The second step of the method is to embed control points in the program's executable code for organizing control of the possible behavior of the program using a specially designed recognition automaton - an automaton of dynamic control. Structural criteria for the completeness of the functional control of the subprogram are given. The practical implementation of the proposed models and methods was completed and presented in a special instrumental complex IRIDA.
Analysis and Modelling of Multi-Stage Attacks. 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :1268–1275.
.
2020. Honeypots are the information system resources used for capturing and analysis of cyber attacks. Highinteraction Honeypots are capable of capturing attacks in their totality and hence are an ideal choice for capturing multi-stage cyber attacks. The term multi-stage attack is an abstraction that refers to a class of cyber attacks consisting of multiple attack stages. These attack stages are executed either by malicious codes, scripts or sometimes even inbuilt system tools. In the work presented in this paper we have proposed a framework for capturing, analysis and modelling of multi-stage cyber attacks. The objective of our work is to devise an effective mechanism for the classification of multi-stage cyber attacks. The proposed framework comprise of a network of high interaction honeypots augmented with an attack analysis engine. The analysis engine performs rule based labeling of captured honeypot data. The labeling engine labels the attack data as generic events. These events are further fused to generate attack graphs. The hence generated attack graphs are used to characterize and later classify the multi-stage cyber attacks.
An Efficient Malware Detection Technique Using Complex Network-Based Approach. 2020 National Conference on Communications (NCC). :1–6.
.
2020. System security is becoming an indispensable part of our daily life due to the rapid proliferation of unknown malware attacks. Recent malware found to have a very complicated structure that is hard to detect by the traditional malware detection techniques such as antivirus, intrusion detection systems, and network scanners. In this paper, we propose a complex network-based malware detection technique, Malware Detection using Complex Network (MDCN), that considers Application Program Interface Call Transition Matrix (API-CTM) to generate complex network topology and then extracts various feature set by analyzing different metrics of the complex network to distinguish malware and benign applications. The generated feature set is then sent to several machine learning classifiers, which include naive-Bayes, support vector machine, random forest, and multilayer perceptron, to comparatively analyze the performance of MDCN-based technique. The analysis reveals that MDCN shows higher accuracy, with lower false-positive cases, when the multilayer perceptron-based classifier is used for the detection of malware. MDCN technique can efficiently be deployed in the design of an integrated enterprise network security system.
A Semi-Supervised Learning Scheme to Detect Unknown DGA Domain Names Based on Graph Analysis. 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :1578–1583.
.
2020. A large amount of malware families use the domain generation algorithms (DGA) to randomly generate a large amount of domain names. It is a good way to bypass conventional blacklists of domain names, because we cannot predict which of the randomly generated domain names are selected for command and control (C&C) communications. An effective approach for detecting known DGA families is to investigate the malware with reverse engineering to find the adopted generation algorithms. As reverse engineering cannot handle the variants of DGA families, some researches leverage supervised learning to find new variants. However, the explainability of supervised learning is low and cannot find previously unseen DGA families. In this paper, we propose a graph-based semi-supervised learning scheme to track the evolution of known DGA families and find previously unseen DGA families. With a domain relation graph, we can clearly figure out how new variants relate to known DGA domain names, which induces better explainability. We deployed the proposed scheme on real network scenarios and show that the proposed scheme can not only comprehensively and precisely find known DGA families, but also can find new DGA families which have not seen before.
Automatically Generating Malware Summary Using Semantic Behavior Graphs (SBGs). 2020 Information Communication Technologies Conference (ICTC). :282–291.
.
2020. In malware behavior analysis, there are limitations in the analysis method of control flow and data flow. Researchers analyzed data flow by dynamic taint analysis tools, however, it cost a lot. In this paper, we proposed a method of generating malware summary based on semantic behavior graphs (SBGs, Semantic Behavior Graphs) to address this issue. In this paper, we considered various situation where behaviors be capable of being associated, thus an algorithm of generating semantic behavior graphs was given firstly. Semantic behavior graphs are composed of behavior nodes and associated data edges. Then, we extracted behaviors and logical relationships between behaviors from semantic behavior graphs, and finally generated a summary of malware behaviors with true intension. Experimental results showed that our approach can effectively identify and describe malicious behaviors and generate accurate behavior summary.
LGMal: A Joint Framework Based on Local and Global Features for Malware Detection. 2020 International Wireless Communications and Mobile Computing (IWCMC). :463–468.
.
2020. With the gradual advancement of smart city construction, various information systems have been widely used in smart cities. In order to obtain huge economic benefits, criminals frequently invade the information system, which leads to the increase of malware. Malware attacks not only seriously infringe on the legitimate rights and interests of users, but also cause huge economic losses. Signature-based malware detection algorithms can only detect known malware, and are susceptible to evasion techniques such as binary obfuscation. Behavior-based malware detection methods can solve this problem well. Although there are some malware behavior analysis works, they may ignore semantic information in the malware API call sequence. In this paper, we design a joint framework based on local and global features for malware detection to solve the problem of network security of smart cities, called LGMal, which combines the stacked convolutional neural network and graph convolutional networks. Specially, the stacked convolutional neural network is used to learn API call sequence information to capture local semantic features and the graph convolutional networks is used to learn API call semantic graph structure information to capture global semantic features. Experiments on Alibaba Cloud Security Malware Detection datasets show that the joint framework gets better results. The experimental results show that the precision is 87.76%, the recall is 88.08%, and the F1-measure is 87.79%. We hope this paper can provide a useful way for malware detection and protect the network security of smart city.
A Malware Similarity Analysis Method Based on Network Control Structure Graph. 2020 IEEE 11th International Conference on Software Engineering and Service Science (ICSESS). :295–300.
.
2020. Recently, graph-based malware similarity analysis has been widely used in the field of malware detection. However, the wide application of code obfuscation, polymorphism, and deformation changes the structure of malicious code, which brings great challenges to the malware similarity analysis. To solve these problems, in this paper, we present a new approach to malware similarity analysis based on the network control structure graph (NCSG). This method analyzed the behavior of malware by application program interface (API) association and constructed NCSG. The graph could reflect the command-and-control(C&C) logic of malware. Therefore, it can resist the interference of code obfuscation technology. The structural features extracted from NCSG will be used as the basis of similarity analysis for training the detection model. Finally, we tested the dataset constructed from five known malware family samples, and the experimental results showed that the accuracy of this method for malware variation analysis reached 92.75%. In conclusion, the malware similarity analysis based on NCSG has a strong application value for identifying the same family of malware.
Analysis of Malware Prediction Based on Infection Rate Using Machine Learning Techniques. 2020 IEEE Region 10 Symposium (TENSYMP). :706–709.
.
2020. In this modern, technological age, the internet has been adopted by the masses. And with it, the danger of malicious attacks by cybercriminals have increased. These attacks are done via Malware, and have resulted in billions of dollars of financial damage. This makes the prevention of malicious attacks an essential part of the battle against cybercrime. In this paper, we are applying machine learning algorithms to predict the malware infection rates of computers based on its features. We are using supervised machine learning algorithms and gradient boosting algorithms. We have collected a publicly available dataset, which was divided into two parts, one being the training set, and the other will be the testing set. After conducting four different experiments using the aforementioned algorithms, it has been discovered that LightGBM is the best model with an AUC Score of 0.73926.
A framework for automated dynamic malware analysis for Linux. 2020 28th Telecommunications Forum (℡FOR). :1–4.
.
2020. Development of malware protection tools requires a more advanced test environment comparing to safe software. This kind of development includes a safe execution of many malware samples in order to evaluate the protective power of the tool. The host machine needs to be protected from the harmful effects of malware samples and provide a realistic simulation of the execution environment. In this paper, a framework for automated malware analysis on Linux is presented. Different types of malware analysis methods are discussed, as well as the properties of a good framework for dynamic malware analysis.
A Malware Detection Approach Using Malware Images and Autoencoders. 2020 IEEE 17th International Conference on Mobile Ad Hoc and Sensor Systems (MASS). :1–6.
.
2020. Most machine learning-based malware detection systems use various supervised learning methods to classify different instances of software as benign or malicious. This approach provides no information regarding the behavioral characteristics of malware. It also requires a large amount of training data and is prone to labeling difficulties and can reduce accuracy due to redundant training data. Therefore, we propose a malware detection method based on deep learning, which uses malware images and a set of autoencoders to detect malware. The method is to design an autoencoder to learn the functional characteristics of malware, and then to observe the reconstruction error of autoencoder to realize the classification and detection of malware and benign software. The proposed approach achieves 93% accuracy and comparatively better F1-score values while detecting malware and needs little training data when compared with traditional malware detection systems.
Forensic Malware Identification Using Naive Bayes Method. 2020 International Conference on Information Technology Systems and Innovation (ICITSI). :1–7.
.
2020. Malware is a kind of software that, if installed on a malware victim's device, might carry malicious actions. The malicious actions might be data theft, system failure, or denial of service. Malware analysis is a process to identify whether a piece of software is a malware or not. However, with the advancement of malware technologies, there are several evasion techniques that could be implemented by malware developers to prevent analysis, such as polymorphic and oligomorphic. Therefore, this research proposes an automatic malware detection system. In the system, the malware characteristics data were obtained through both static and dynamic analysis processes. Data from the analysis process were classified using Naive Bayes algorithm to identify whether the software is a malware or not. The process of identifying malware and benign files using the Naive Bayes machine learning method has an accuracy value of 93 percent for the detection process using static characteristics and 85 percent for detection through dynamic characteristics.
Malware Analysis using Machine Learning and Deep Learning techniques. 2020 SoutheastCon. 2:1–7.
.
2020. In this era, where the volume and diversity of malware is rising exponentially, new techniques need to be employed for faster and accurate identification of the malwares. Manual heuristic inspection of malware analysis are neither effective in detecting new malware, nor efficient as they fail to keep up with the high spreading rate of malware. Machine learning approaches have therefore gained momentum. They have been used to automate static and dynamic analysis investigation where malware having similar behavior are clustered together, and based on the proximity unknown malwares get classified to their respective families. Although many such research efforts have been conducted where data-mining and machine-learning techniques have been applied, in this paper we show how the accuracy can further be improved using deep learning networks. As deep learning offers superior classification by constructing neural networks with a higher number of potentially diverse layers it leads to improvement in automatic detection and classification of the malware variants.In this research, we present a framework which extracts various feature-sets such as system calls, operational codes, sections, and byte codes from the malware files. In the experimental and result section, we compare the accuracy obtained from each of these features and demonstrate that feature vector for system calls yields the highest accuracy. The paper concludes by showing how deep learning approach performs better than the traditional shallow machine learning approaches.