Biblio

List
Filter

Found 15086 results

Filters: Keyword is pubcrawl [Clear All Filters]

2023-09-20

He, Zhenghao. 2022. Comparison Of Different Machine Learning Methods Applied To Obesity Classification. 2022 International Conference on Machine Learning and Intelligent Systems Engineering (MLISE). :467—472.

Estimation for obesity levels is always an important topic in medical field since it can provide useful guidance for people that would like to lose weight or keep fit. The article tries to find a model that can predict obesity and provides people with the information of how to avoid overweight. To be more specific, this article applied dimension reduction to the data set to simplify the data and tried to Figure out a most decisive feature of obesity through Principal Component Analysis (PCA) based on the data set. The article also used some machine learning methods like Support Vector Machine (SVM), Decision Tree to do prediction of obesity and wanted to find the major reason of obesity. In addition, the article uses Artificial Neural Network (ANN) to do prediction which has more powerful feature extraction ability to do this. Finally, the article found that family history of obesity is the most decisive feature, and it may because of obesity may be greatly affected by genes or the family eating diet may have great influence. And both ANN and Decision tree’s accuracy of prediction is higher than 90%.

Samia, Bougareche, Soraya, Zehani, Malika, Mimi. 2022. Fashion Images Classification using Machine Learning, Deep Learning and Transfer Learning Models. 2022 7th International Conference on Image and Signal Processing and their Applications (ISPA). :1—5.

Fashion is the way we present ourselves which mainly focuses on vision, has attracted great interest from computer vision researchers. It is generally used to search fashion products in online shopping malls to know the descriptive information of the product. The main objectives of our paper is to use deep learning (DL) and machine learning (ML) methods to correctly identify and categorize clothing images. In this work, we used ML algorithms (support vector machines (SVM), K-Nearest Neirghbors (KNN), Decision tree (DT), Random Forest (RF)), DL algorithms (Convolutionnal Neurals Network (CNN), AlexNet, GoogleNet, LeNet, LeNet5) and the transfer learning using a pretrained models (VGG16, MobileNet and RestNet50). We trained and tested our models online using google colaboratory with Tensorflow/Keras and Scikit-Learn libraries that support deep learning and machine learning in Python. The main metric used in our study to evaluate the performance of ML and DL algorithms is the accuracy and matrix confusion. The best result for the ML models is obtained with the use of ANN (88.71%) and for the DL models is obtained for the GoogleNet architecture (93.75%). The results obtained showed that the number of epochs and the depth of the network have an effect in obtaining the best results.

Zhang, Chengzhao, Tang, Huiyue. 2022. Empirical Research on Multifactor Quantitative Stock Selection Strategy Based on Machine Learning. 2022 3rd International Conference on Pattern Recognition and Machine Learning (PRML). :380—383.

In this paper, stock selection strategy design based on machine learning and multi-factor analysis is a research hotspot in quantitative investment field. Four machine learning algorithms including support vector machine, gradient lifting regression, random forest and linear regression are used to predict the rise and fall of stocks by taking stock fundamentals as input variables. The portfolio strategy is constructed on this basis. Finally, the stock selection strategy is further optimized. The empirical results show that the multifactor quantitative stock selection strategy has a good stock selection effect, and yield performance under the support vector machine algorithm is the best. With the increase of the number of factors, there is an inverse relationship between the fitting degree and the yield under various algorithms.

Dixit, Utkarsh, Bhatia, Suman, Bhatia, Pramod. 2022. Comparison of Different Machine Learning Algorithms Based on Intrusion Detection System. 2022 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COM-IT-CON). 1:667—672.

An IDS is a system that helps in detecting any kind of doubtful activity on a computer network. It is capable of identifying suspicious activities at both the levels i.e. locally at the system level and in transit at the network level. Since, the system does not have its own dataset as a result it is inefficient in identifying unknown attacks. In order to overcome this inefficiency, we make use of ML. ML assists in analysing and categorizing attacks on diverse datasets. In this study, the efficacy of eight machine learning algorithms based on KDD CUP99 is assessed. Based on our implementation and analysis, amongst the eight Algorithms considered here, Support Vector Machine (SVM), Random Forest (RF) and Decision Tree (DT) have the highest testing accuracy of which got SVM does have the highest accuracy

Rawat, Amarjeet, Maheshwari, Himani, Khanduja, Manisha, Kumar, Rajiv, Memoria, Minakshi, Kumar, Sanjeev. 2022. Sentiment Analysis of Covid19 Vaccines Tweets Using NLP and Machine Learning Classifiers. 2022 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COM-IT-CON). 1:225—230.

Sentiment Analysis (SA) is an approach for detecting subjective information such as thoughts, outlooks, reactions, and emotional state. The majority of previous SA work treats it as a text-classification problem that requires labelled input to train the model. However, obtaining a tagged dataset is difficult. We will have to do it by hand the majority of the time. Another concern is that the absence of sufficient cross-domain portability creates challenging situation to reuse same-labelled data across applications. As a result, we will have to manually classify data for each domain. This research work applies sentiment analysis to evaluate the entire vaccine twitter dataset. The work involves the lexicon analysis using NLP libraries like neattext, textblob and multi class classification using BERT. This word evaluates and compares the results of the machine learning algorithms.

Kumar Sahoo, Goutam, Kanike, Keerthana, Das, Santos Kumar, Singh, Poonam. 2022. Machine Learning-Based Heart Disease Prediction: A Study for Home Personalized Care. 2022 IEEE 32nd International Workshop on Machine Learning for Signal Processing (MLSP). :01—06.

This study develops a framework for personalized care to tackle heart disease risk using an at-home system. The machine learning models used to predict heart disease are Logistic Regression, K - Nearest Neighbor, Support Vector Machine, Naive Bayes, Decision Tree, Random Forest and XG Boost. Timely and efficient detection of heart disease plays an important role in health care. It is essential to detect cardiovascular disease (CVD) at the earliest, consult a specialist doctor before the severity of the disease and start medication. The performance of the proposed model was assessed using the Cleveland Heart Disease dataset from the UCI Machine Learning Repository. Compared to all machine learning algorithms, the Random Forest algorithm shows a better performance accuracy score of 90.16%. The best model may evaluate patient fitness rather than routine hospital visits. The proposed work will reduce the burden on hospitals and help hospitals reach only critical patients.

Shi, Yong. 2022. A Machine Learning Study on the Model Performance of Human Resources Predictive Algorithms. 2022 4th International Conference on Applied Machine Learning (ICAML). :405—409.

A good ecological environment is crucial to attracting talents, cultivating talents, retaining talents and making talents fully effective. This study provides a solution to the current mainstream problem of how to deal with excellent employee turnover in advance, so as to promote the sustainable and harmonious human resources ecological environment of enterprises with a shortage of talents.This study obtains open data sets and conducts data preprocessing, model construction and model optimization, and describes a set of enterprise employee turnover prediction models based on RapidMiner workflow. The data preprocessing is completed with the help of the data statistical analysis software IBM SPSS Statistic and RapidMiner.Statistical charts, scatter plots and boxplots for analysis are generated to realize data visualization analysis. Machine learning, model application, performance vector, and cross-validation through RapidMiner's multiple operators and workflows. Model design algorithms include support vector machines, naive Bayes, decision trees, and neural networks. Comparing the performance parameters of the algorithm model from the four aspects of accuracy, precision, recall and F1-score. It is concluded that the performance of the decision tree algorithm model is the highest. The performance evaluation results confirm the effectiveness of this model in sustainable exploring of enterprise employee turnover prediction in human resource management.

Hu, Ningyuan. 2022. Classification of Mobile Phone Price Dataset Using Machine Learning Algorithms. 2022 3rd International Conference on Pattern Recognition and Machine Learning (PRML). :438—443.

With the development of technology, mobile phones are an indispensable part of human life. Factors such as brand, internal memory, wifi, battery power, camera and availability of 4G are now modifying consumers' decisions on buying mobile phones. But people fail to link those factors with the price of mobile phones; in this case, this paper is aimed to figure out the problem by using machine learning algorithms like Support Vector Machine, Decision Tree, K Nearest Neighbors and Naive Bayes to train the mobile phone dataset before making predictions of the price level. We used appropriate algorithms to predict smartphone prices based on accuracy, precision, recall and F1 score. This not only helps customers have a better choice on the mobile phone but also gives advice to businesses selling mobile phones that the way to set reasonable prices with the different features they offer. This idea of predicting prices level will give support to customers to choose mobile phones wisely in the future. The result illustrates that among the 4 classifiers, SVM returns to the most desirable performance with 94.8% of accuracy, 97.3 of F1 score (without feature selection) and 95.5% of accuracy, 97.7% of F1 score (with feature selection).

Shen, Qiyuan. 2022. A machine learning approach to predict the result of League of Legends. 2022 International Conference on Machine Learning and Knowledge Engineering (MLKE). :38—45.

Nowadays, the MOBA game is the game type with the most audiences and players around the world. Recently, the League of Legends has become an official sport as an e-sport among 37 events in the 2022 Asia Games held in Hangzhou. As the development in the e-sport, analytical skills are also involved in this field. The topic of this research is to use the machine learning approach to analyze the data of the League of Legends and make a prediction about the result of the game. In this research, the method of machine learning is applied to the dataset which records the first 10 minutes in diamond-ranked games. Several popular machine learning (AdaBoost, GradientBoost, RandomForest, ExtraTree, SVM, Naïve Bayes, KNN, LogisticRegression, and DecisionTree) are applied to test the performance by cross-validation. Then several algorithms that outperform others are selected to make a voting classifier to predict the game result. The accuracy of the voting classifier is 72.68%.

Winahyu, R R Kartika, Somantri, Maman, Nurhayati, Oky Dwi. 2022. Predicting Creditworthiness of Smartphone Users in Indonesia during the COVID-19 pandemic using Machine Learning. 2021 International Seminar on Machine Learning, Optimization, and Data Science (ISMODE). :223—227.

In this research work, we attempted to predict the creditworthiness of smartphone users in Indonesia during the COVID-19 pandemic using machine learning. Principal Component Analysis (PCA) and Kmeans algorithms are used for the prediction of creditworthiness with the used a dataset of 1050 respondents consisting of twelve questions to smartphone users in Indonesia during the COVID-19 pandemic. The four different classification algorithms (Logistic Regression, Support Vector Machine, Decision Tree, and Naive Bayes) were tested to classify the creditworthiness of smartphone users in Indonesia. The tests carried out included testing for accuracy, precision, recall, F1-score, and Area Under Curve Receiver Operating Characteristics (AUCROC) assesment. Logistic Regression algorithm shows the perfect performances whereas Naïve Bayes (NB) shows the least. The results of this research also provide new knowledge about the influential and non-influential variables based on the twelve questions conducted to the respondents of smartphone users in Indonesia during the COVID-19 pandemic.

Zhang, Zhe, Wang, Yaonan, Zhang, Jing, Xiao, Xu. 2022. Dynamic analysis for a novel fractional-order malware propagation model system with time delay. 2022 China Automation Congress (CAC). :6561—6566.

The rapid development of network information technology, individual’s information networks security has become a very critical issue in our daily life. Therefore, it is necessary to study the malware propagation model system. In this paper, the traditional integer order malware propagation model system is extended to the field of fractional-order. Then we analyze the asymptotic stability of the fractional-order malware propagation model system when the equilibrium point is the origin and the time delay is 0. Next, the asymptotic stability and bifurcation analysis of the fractional-order malware propagation model system when the equilibrium point is the origin and the time delay is not 0 are carried out. Moreover, we study the asymptotic stability of the fractional-order malware propagation model system with an interior equilibrium point. In the end, so as to verify our theoretical results, many numerical simulations are provided.

Alsmadi, Izzat, Al-Ahmad, Bilal, Alsmadi, Mohammad. 2022. Malware analysis and multi-label category detection issues: Ensemble-based approaches. 2022 International Conference on Intelligent Data Science Technologies and Applications (IDSTA). :164—169.

Detection of malware and security attacks is a complex process that can vary in its details and analysis activities. As part of the detection process, malware scanners try to categorize a malware once it is detected under one of the known malware categories (e.g. worms, spywares, viruses, etc.). However, many studies and researches indicate problems with scanners categorizing or identifying a particular malware under more than one malware category. This paper, and several others, show that machine learning can be used for malware detection especially with ensemble base prediction methods. In this paper, we evaluated several custom-built ensemble models. We focused on multi-label malware classification as individual or classical classifiers showed low accuracy in such territory.This paper showed that recent machine models such as ensemble and deep learning can be used for malware detection with better performance in comparison with classical models. This is very critical in such a dynamic and yet important detection systems where challenges such as the detection of unknown or zero-day malware will continue to exist and evolve.

Haidros Rahima Manzil, Hashida, Naik S, Manohar. 2022. DynaMalDroid: Dynamic Analysis-Based Detection Framework for Android Malware Using Machine Learning Techniques. 2022 International Conference on Knowledge Engineering and Communication Systems (ICKES). :1—6.

Android malware is continuously evolving at an alarming rate due to the growing vulnerabilities. This demands more effective malware detection methods. This paper presents DynaMalDroid, a dynamic analysis-based framework to detect malicious applications in the Android platform. The proposed framework contains three modules: dynamic analysis, feature engineering, and detection. We utilized the well-known CICMalDroid2020 dataset, and the system calls of apps are extracted through dynamic analysis. We trained our proposed model to recognize malware by selecting features obtained through the feature engineering module. Further, with these selected features, the detection module applies different Machine Learning classifiers like Random Forest, Decision Tree, Logistic Regression, Support Vector Machine, Naïve-Bayes, K-Nearest Neighbour, and AdaBoost, to recognize whether an application is malicious or not. The experiments have shown that several classifiers have demonstrated excellent performance and have an accuracy of up to 99%. The models with Support Vector Machine and AdaBoost classifiers have provided better detection accuracy of 99.3% and 99.5%, respectively.

Abdullah, Muhammed Amin, Yu, Yongbin, Cai, Jingye, Imrana, Yakubu, Tettey, Nartey Obed, Addo, Daniel, Sarpong, Kwabena, Agbley, Bless Lord Y., Appiah, Benjamin. 2022. Disparity Analysis Between the Assembly and Byte Malware Samples with Deep Autoencoders. 2022 19th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP). :1—4.

Malware attacks in the cyber world continue to increase despite the efforts of Malware analysts to combat this problem. Recently, Malware samples have been presented as binary sequences and assembly codes. However, most researchers focus only on the raw Malware sequence in their proposed solutions, ignoring that the assembly codes may contain important details that enable rapid Malware detection. In this work, we leveraged the capabilities of deep autoencoders to investigate the presence of feature disparities in the assembly and raw binary Malware samples. First, we treated the task as outliers to investigate whether the autoencoder would identify and justify features as samples from the same family. Second, we added noise to all samples and used Deep Autoencoder to reconstruct the original samples by denoising. Experiments with the Microsoft Malware dataset showed that the byte samples' features differed from the assembly code samples.

Salsabila, Hanifah, Mardhiyah, Syafira, Budiarto Hadiprakoso, Raden. 2022. Flubot Malware Hybrid Analysis on Android Operating System. 2022 International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS). :202—206.

The rising use of smartphones each year is matched by the development of the smartphone's operating system, Android. Due to the immense popularity of the Android operating system, many unauthorized users (in this case, the attackers) wish to exploit this vulnerability to get sensitive data from every Android user. The flubot malware assault, which happened in 2021 and targeted Android devices practically globally, is one of the attacks on Android smartphones. It was known at the time that the flubot virus stole information, particularly from banking applications installed on the victim's device. To prevent this from happening again, we research the signature and behavior of flubot malware. In this study, a hybrid analysis will be conducted on three samples of flubot malware that are available on the open-source Hatching Triage platform. Using the Android Virtual Device (AVD) as the primary environment for malware installation, the analysis was conducted with the Android Debug Bridge (ADB) and Burpsuite as supporting tools for dynamic analysis. During the static analysis, the Mobile Security Framework (MobSF) and the Bytecode Viewer were used to examine the source code of the three malware samples. Analysis of the flubot virus revealed that it extracts or drops dex files on the victim's device, where the file is the primary malware. The Flubot virus will clone the messaging application or Short Message Service (SMS) on the default device. Additionally, we discovered a form of flubot malware that operates as a Domain Generation Algorithm (DGA) and communicates with its Command and Control (C&C) server.

Mantoro, Teddy, Fahriza, Muhammad Elky, Agni Catur Bhakti, Muhammad. 2022. Effective of Obfuscated Android Malware Detection using Static Analysis. 2022 IEEE 8th International Conference on Computing, Engineering and Design (ICCED). :1—5.

The effective security system improvement from malware attacks on the Android operating system should be updated and improved. Effective malware detection increases the level of data security and high protection for the users. Malicious software or malware typically finds a means to circumvent the security procedure, even when the user is unaware whether the application can act as malware. The effectiveness of obfuscated android malware detection is evaluated by collecting static analysis data from a data set. The experiment assesses the risk level of which malware dataset using the hash value of the malware and records malware behavior. A set of hash SHA256 malware samples has been obtained from an internet dataset and will be analyzed using static analysis to record malware behavior and evaluate which risk level of the malware. According to the results, most of the algorithms provide the same total score because of the multiple crime inside the malware application.

Preeti, Agrawal, Animesh Kumar. 2022. A Comparative Analysis of Open Source Automated Malware Tools. 2022 9th International Conference on Computing for Sustainable Global Development (INDIACom). :226—230.

Malwares are designed to cause harm to the machine without the user's knowledge. Malwares belonging to different families infect the system in its own unique way causing damage which could be irreversible and hence there is a need to detect and analyse the malwares. Manual analysis of all types of malwares is not a practical approach due to the huge effort involved and hence Automated Malware Analysis is resorted to so that the burden on humans can be decreased and the process is made robust. A lot of Automated Malware Analysis tools are present right now both offline and online but the problem arises as to which tool to select while analysing a suspicious binary. A comparative analysis of three most widely used automated tools has been done with different malware class samples. These tools are Cuckoo Sandbox, Any. Run and Intezer Analyze. In order to check the efficacy of the tool in both online and offline analysis, Cuckoo Sandbox was configured for offline use, and Any. Run and Intezer Analyze were configured for online analysis. Individual tools analyse each malware sample and after analysis is completed, a comparative chart is prepared to determine which tool is good at finding registry changes, processes created, files created, network connections, etc by the malicious binary. The findings conclude that Intezer Analyze tool recognizes file changes better than others but otherwise Cuckoo Sandbox and Any. Run tools are better in determining other functionalities.

Dhalaria, Meghna, Gandotra, Ekta. 2022. Android Malware Risk Evaluation Using Fuzzy Logic. 2022 Seventh International Conference on Parallel, Distributed and Grid Computing (PDGC). :341—345.

The static and dynamic malware analysis are used by industrialists and academics to understand malware capabilities and threat level. The antimalware industries calculate malware threat levels using different techniques which involve human involvement and a large number of resources and analysts. As malware complexity, velocity and volume increase, it becomes impossible to allocate so many resources. Due to this reason, it is projected that the number of malware apps will continue to rise, and that more devices will be targeted in order to commit various sorts of cybercrime. It is therefore necessary to develop techniques that can calculate the damage or threat posed by malware automatically as soon as it is identified. In this way, early warnings about zero-day (unknown) malware can assist in allocating resources for carrying out a close analysis of it as soon as it is identified. In this paper, a fuzzy modelling approach is described for calculating the potential risk of malicious programs through static malware analysis.

Ismael, Maher F., Thanoon, Karam H.. 2022. Investigation Malware Analysis Depend on Reverse Engineering Using IDAPro. 2022 8th International Conference on Contemporary Information Technology and Mathematics (ICCITM). :227—231.

Any software that runs malicious payloads on victims’ computers is referred to as malware. It is an increasing threat that costs people, businesses, and organizations a lot of money. Attacks on security have developed significantly in recent years. Malware may infiltrate both offline and online media, like: chat, SMS, and spam (email, or social media), because it has a built-in defensive mechanism and may conceal itself from antivirus software or even corrupt it. As a result, there is an urgent need to detect and prevent malware before it damages critical assets around the world. In fact, there are lots of different techniques and tools used to combat versus malware. In this paper, the malware samples were analyzing in the Virtual Box environment using in-depth analysis based on reverse engineering using advanced static malware analysis techniques. The results Obtained from malware analysis which represent a set of valuable information, all anti-malware and anti-virus program companies need for in order to update their products.

Khalil, Md Yusuf, Vivek, Anand, Kumar, Paul, Antarlina, Grover, Rahul. 2022. PDF Malware Analysis. 2022 7th International Conference on Computing, Communication and Security (ICCCS). :1—4.

This document addresses the issue of the actual security level of PDF documents. Two types of detection approaches are utilized to detect dangerous elements within malware: static analysis and dynamic analysis. Analyzing malware binaries to identify dangerous strings, as well as reverse-engineering is included in static analysis for t1he malware to disassemble it. On the other hand, dynamic analysis monitors malware activities by running them in a safe environment, such as a virtual machine. Each method has its own set of strengths and weaknesses, and it is usually best to employ both methods while analyzing malware. Malware detection could be simplified without sacrificing accuracy by reducing the number of malicious traits. This may allow the researcher to devote more time to analysis. Our worry is that there is no obvious need to identify malware with numerous functionalities when it isn't necessary. We will solve this problem by developing a system that will identify if the given file is infected with malware or not.

2023-09-18

Dvorak, Stepan, Prochazka, Pavel, Bajer, Lukas. 2022. GNN-Based Malicious Network Entities Identification In Large-Scale Network Data. NOMS 2022-2022 IEEE/IFIP Network Operations and Management Symposium. :1—4.

A reliable database of Indicators of Compromise (IoC’s) is a cornerstone of almost every malware detection system. Building the database and keeping it up-to-date is a lengthy and often manual process where each IoC should be manually reviewed and labeled by an analyst. In this paper, we focus on an automatic way of identifying IoC’s intended to save analysts’ time and scale to the volume of network data. We leverage relations of each IoC to other entities on the internet to build a heterogeneous graph. We formulate a classification task on this graph and apply graph neural networks (GNNs) in order to identify malicious domains. Our experiments show that the presented approach provides promising results on the task of identifying high-risk malware as well as legitimate domains classification.

Amer, Eslam, Samir, Adham, Mostafa, Hazem, Mohamed, Amer, Amin, Mohamed. 2022. Malware Detection Approach Based on the Swarm-Based Behavioural Analysis over API Calling Sequence. 2022 2nd International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC). :27—32.

The rapidly increasing malware threats must be coped with new effective malware detection methodologies. Current malware threats are not limited to daily personal transactions but dowelled deeply within large enterprises and organizations. This paper introduces a new methodology for detecting and discriminating malicious versus normal applications. In this paper, we employed Ant-colony optimization to generate two behavioural graphs that characterize the difference in the execution behavior between malware and normal applications. Our proposed approach relied on the API call sequence generated when an application is executed. We used the API calls as one of the most widely used malware dynamic analysis features. Our proposed method showed distinctive behavioral differences between malicious and non-malicious applications. Our experimental results showed a comparative performance compared to other machine learning methods. Therefore, we can employ our method as an efficient technique in capturing malicious applications.

Cao, Michael, Ahmed, Khaled, Rubin, Julia. 2022. Rotten Apples Spoil the Bunch: An Anatomy of Google Play Malware. 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE). :1919—1931.

This paper provides an in-depth analysis of Android malware that bypassed the strictest defenses of the Google Play application store and penetrated the official Android market between January 2016 and July 2021. We systematically identified 1,238 such malicious applications, grouped them into 134 families, and manually analyzed one application from 105 distinct families. During our manual analysis, we identified malicious payloads the applications execute, conditions guarding execution of the payloads, hiding techniques applications employ to evade detection by the user, and other implementation-level properties relevant for automated malware detection. As most applications in our dataset contain multiple payloads, each triggered via its own complex activation logic, we also contribute a graph-based representation showing activation paths for all application payloads in form of a control- and data-flow graph. Furthermore, we discuss the capabilities of existing malware detection tools, put them in context of the properties observed in the analyzed malware, and identify gaps and future research directions. We believe that our detailed analysis of the recent, evasive malware will be of interest to researchers and practitioners and will help further improve malware detection tools.

Herath, Jerome Dinal, Wakodikar, Priti Prabhakar, Yang, Ping, Yan, Guanhua. 2022. CFGExplainer: Explaining Graph Neural Network-Based Malware Classification from Control Flow Graphs. 2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). :172—184.

With the ever increasing threat of malware, extensive research effort has been put on applying Deep Learning for malware classification tasks. Graph Neural Networks (GNNs) that process malware as Control Flow Graphs (CFGs) have shown great promise for malware classification. However, these models are viewed as black-boxes, which makes it hard to validate and identify malicious patterns. To that end, we propose CFG-Explainer, a deep learning based model for interpreting GNN-oriented malware classification results. CFGExplainer identifies a subgraph of the malware CFG that contributes most towards classification and provides insight into importance of the nodes (i.e., basic blocks) within it. To the best of our knowledge, CFGExplainer is the first work that explains GNN-based mal-ware classification. We compared CFGExplainer against three explainers, namely GNNExplainer, SubgraphX and PGExplainer, and showed that CFGExplainer is able to identify top equisized subgraphs with higher classification accuracy than the other three models.

Jia, Jingyun, Chan, Philip K.. 2022. Representation Learning with Function Call Graph Transformations for Malware Open Set Recognition. 2022 International Joint Conference on Neural Networks (IJCNN). :1—8.

Open set recognition (OSR) problem has been a challenge in many machine learning (ML) applications, such as security. As new/unknown malware families occur regularly, it is difficult to exhaust samples that cover all the classes for the training process in ML systems. An advanced malware classification system should classify the known classes correctly while sensitive to the unknown class. In this paper, we introduce a self-supervised pre-training approach for the OSR problem in malware classification. We propose two transformations for the function call graph (FCG) based malware representations to facilitate the pretext task. Also, we present a statistical thresholding approach to find the optimal threshold for the unknown class. Moreover, the experiment results indicate that our proposed pre-training process can improve different performances of different downstream loss functions for the OSR problem.