Visible to the public Biblio

Found 871 results

Filters: Keyword is feature extraction  [Clear All Filters]
2019-12-02
Wang, Dinghua, Feng, Dongqin.  2018.  Intrusion Detection Model of SCADA Using Graphical Features. 2018 IEEE 3rd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC). :1208–1214.
Supervisory control and data acquisition system is an important part of the country's critical infrastructure, but its inherent network characteristics are vulnerable to attack by intruders. The vulnerability of supervisory control and data acquisition system was analyzed, combining common attacks such as information scanning, response injection, command injection and denial of service in industrial control systems, and proposed an intrusion detection model based on graphical features. The time series of message transmission were visualized, extracting the vertex coordinates and various graphic area features to constitute a new data set, and obtained classification model of intrusion detection through training. An intrusion detection experiment environment was built using tools such as MATLAB and power protocol testers. IEC 60870-5-104 protocol which is widely used in power systems had been taken as an example. The results of tests have good effectiveness.
2019-11-26
Zabihimayvan, Mahdieh, Doran, Derek.  2019.  Fuzzy Rough Set Feature Selection to Enhance Phishing Attack Detection. 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). :1-6.

Phishing as one of the most well-known cybercrime activities is a deception of online users to steal their personal or confidential information by impersonating a legitimate website. Several machine learning-based strategies have been proposed to detect phishing websites. These techniques are dependent on the features extracted from the website samples. However, few studies have actually considered efficient feature selection for detecting phishing attacks. In this work, we investigate an agreement on the definitive features which should be used in phishing detection. We apply Fuzzy Rough Set (FRS) theory as a tool to select most effective features from three benchmarked data sets. The selected features are fed into three often used classifiers for phishing detection. To evaluate the FRS feature selection in developing a generalizable phishing detection, the classifiers are trained by a separate out-of-sample data set of 14,000 website samples. The maximum F-measure gained by FRS feature selection is 95% using Random Forest classification. Also, there are 9 universal features selected by FRS over all the three data sets. The F-measure value using this universal feature set is approximately 93% which is a comparable result in contrast to the FRS performance. Since the universal feature set contains no features from third-part services, this finding implies that with no inquiry from external sources, we can gain a faster phishing detection which is also robust toward zero-day attacks.

Patil, Srushti, Dhage, Sudhir.  2019.  A Methodical Overview on Phishing Detection along with an Organized Way to Construct an Anti-Phishing Framework. 2019 5th International Conference on Advanced Computing Communication Systems (ICACCS). :588-593.

Phishing is a security attack to acquire personal information like passwords, credit card details or other account details of a user by means of websites or emails. Phishing websites look similar to the legitimate ones which make it difficult for a layman to differentiate between them. As per the reports of Anti Phishing Working Group (APWG) published in December 2018, phishing against banking services and payment processor was high. Almost all the phishy URLs use HTTPS and use redirects to avoid getting detected. This paper presents a focused literature survey of methods available to detect phishing websites. A comparative study of the in-use anti-phishing tools was accomplished and their limitations were acknowledged. We analyzed the URL-based features used in the past to improve their definitions as per the current scenario which is our major contribution. Also, a step wise procedure of designing an anti-phishing model is discussed to construct an efficient framework which adds to our contribution. Observations made out of this study are stated along with recommendations on existing systems.

Wang, Pengfei, Wang, Fengyu, Lin, Fengbo, Cao, Zhenzhong.  2018.  Identifying Peer-to-Peer Botnets Through Periodicity Behavior Analysis. 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :283-288.

Peer-to-Peer botnets have become one of the significant threat against network security due to their distributed properties. The decentralized nature makes their detection challenging. It is important to take measures to detect bots as soon as possible to minimize their harm. In this paper, we propose PeerGrep, a novel system capable of identifying P2P bots. PeerGrep starts from identifying hosts that are likely engaged in P2P communications, and then distinguishes P2P bots from P2P hosts by analyzing their active ratio, packet size and the periodicity of connection to destination IP addresses. The evaluation shows that PeerGrep can identify all P2P bots with quite low FPR even if the malicious P2P application and benign P2P application coexist within the same host or there is only one bot in the monitored network.

Shukla, Anjali, Rakshit, Arnab, Konar, Amit, Ghosh, Lidia, Nagar, Atulya K..  2018.  Decoding of Mind-Generated Pattern Locks for Security Checking Using Type-2 Fuzzy Classifier. 2018 IEEE Symposium Series on Computational Intelligence (SSCI). :1976-1981.

Brain Computer Interface (BCI) aims at providing a better quality of life to people suffering from neuromuscular disability. This paper establishes a BCI paradigm to provide a biometric security option, used for locking and unlocking personal computers or mobile phones. Although it is primarily meant for the people with neurological disorder, its application can safely be extended for the use of normal people. The proposed scheme decodes the electroencephalogram signals liberated by the brain of the subjects, when they are engaged in selecting a sequence of dots in(6×6)2-dimensional array, representing a pattern lock. The subject, while selecting the right dot in a row, would yield a P300 signal, which is decoded later by the brain-computer interface system to understand the subject's intention. In case the right dots in all the 6 rows are correctly selected, the subject would yield P300 signals six times, which on being decoded by a BCI system would allow the subject to access the system. Because of intra-subjective variation in the amplitude and wave-shape of the P300 signal, a type 2 fuzzy classifier has been employed to classify the presence/absence of the P300 signal in the desired window. A comparison of performances of the proposed classifier with others is also included. The functionality of the proposed system has been validated using the training instances generated for 30 subjects. Experimental results confirm that the classification accuracy for the present scheme is above 90% irrespective of subjects.

2019-11-19
Ying, Huan, Zhang, Yanmiao, Han, Lifang, Cheng, Yushi, Li, Jiyuan, Ji, Xiaoyu, Xu, Wenyuan.  2019.  Detecting Buffer-Overflow Vulnerabilities in Smart Grid Devices via Automatic Static Analysis. 2019 IEEE 3rd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC). :813-817.

As a modern power transmission network, smart grid connects plenty of terminal devices. However, along with the growth of devices are the security threats. Different from the previous separated environment, an adversary nowadays can destroy the power system by attacking these devices. Therefore, it's critical to ensure the security and safety of terminal devices. To achieve this goal, detecting the pre-existing vulnerabilities of the device program and enhance the terminal security, are of great importance and necessity. In this paper, we propose a novel approach that detects existing buffer-overflow vulnerabilities of terminal devices via automatic static analysis (ASA). We utilize the static analysis to extract the device program information and build corresponding program models. By further matching the generated program model with pre-defined vulnerability patterns, we achieve vulnerability detection and error reporting. The evaluation results demonstrate that our method can effectively detect buffer-overflow vulnerabilities of smart terminals with a high accuracy and a low false positive rate.

2019-11-04
Li, Teng, Ma, Jianfeng, Pei, Qingqi, Shen, Yulong, Sun, Cong.  2018.  Anomalies Detection of Routers Based on Multiple Information Learning. 2018 International Conference on Networking and Network Applications (NaNA). :206-211.

Routers are important devices in the networks that carry the burden of transmitting information among the communication devices on the Internet. If a malicious adversary wants to intercept the information or paralyze the network, it can directly attack the routers and then achieve the suspicious goals. Thus, preventing router security is of great importance. However, router systems are notoriously difficult to understand or diagnose for their inaccessibility and heterogeneity. The common way of gaining access to the router system and detecting the anomaly behaviors is to inspect the router syslogs or monitor the packets of information flowing to the routers. These approaches just diagnose the routers from one aspect but do not consider them from multiple views. In this paper, we propose an approach to detect the anomalies and faults of the routers with multiple information learning. We try to use the routers' information not from the developer's view but from the user' s view, which does not need any expert knowledge. First, we do the offline learning to transform the benign or corrupted user actions into the syslogs. Then, we try to decide whether the input routers' conditions are poor or not with clustering. During the detection phase, we use the distance between the event and the cluster to decide if it is the anomaly event and we can provide the corresponding solutions. We have applied our approach in a university network which contains Cisco, Huawei and Dlink routers for three months. We aligned our experiment with former work as a baseline for comparison. Our approach can gain 89.6% accuracy in detecting the attacks which is 5.1% higher than the former work. The results show that our approach performs in limited time as well as memory usages and has high detection and low false positives.

Sallam, Asmaa, Bertino, Elisa.  2018.  Detection of Temporal Data Ex-Filtration Threats to Relational Databases. 2018 IEEE 4th International Conference on Collaboration and Internet Computing (CIC). :146–155.
According to recent reports, the most common insider threats to systems are unauthorized access to or use of corporate information and exposure of sensitive data. While anomaly detection techniques have proved to be effective in the detection of early signs of data theft, these techniques are not able to detect sophisticated data misuse scenarios in which malicious insiders seek to aggregate knowledge by executing and combining the results of several queries. We thus need techniques that are able to track users' actions across time to detect correlated ones that collectively flag anomalies. In this paper, we propose such techniques for the detection of anomalous accesses to relational databases. Our approach is to monitor users' queries, sequences of queries and sessions of database connection to detect queries that retrieve amounts of data larger than the normal. Our evaluation of the proposed techniques indicates that they are very effective in the detection of anomalies.
2019-10-15
Panagiotakis, C., Papadakis, H., Fragopoulou, P..  2018.  Detection of Hurriedly Created Abnormal Profiles in Recommender Systems. 2018 International Conference on Intelligent Systems (IS). :499–506.

Recommender systems try to predict the preferences of users for specific items. These systems suffer from profile injection attacks, where the attackers have some prior knowledge of the system ratings and their goal is to promote or demote a particular item introducing abnormal (anomalous) ratings. The detection of both cases is a challenging problem. In this paper, we propose a framework to spot anomalous rating profiles (outliers), where the outliers hurriedly create a profile that injects into the system either random ratings or specific ratings, without any prior knowledge of the existing ratings. The proposed detection method is based on the unpredictable behavior of the outliers in a validation set, on the user-item rating matrix and on the similarity between users. The proposed system is totally unsupervised, and in the last step it uses the k-means clustering method automatically spotting the spurious profiles. For the cases where labeling sample data is available, a random forest classifier is trained to show how supervised methods outperforms unsupervised ones. Experimental results on the MovieLens 100k and the MovieLens 1M datasets demonstrate the high performance of the proposed schemata.

Zhang, F., Deng, Z., He, Z., Lin, X., Sun, L..  2018.  Detection Of Shilling Attack In Collaborative Filtering Recommender System By Pca And Data Complexity. 2018 International Conference on Machine Learning and Cybernetics (ICMLC). 2:673–678.

Collaborative filtering (CF) recommender system has been widely used for its well performing in personalized recommendation, but CF recommender system is vulnerable to shilling attacks in which shilling attack profiles are injected into the system by attackers to affect recommendations. Design robust recommender system and propose attack detection methods are the main research direction to handle shilling attacks, among which unsupervised PCA is particularly effective in experiment, but if we have no information about the number of shilling attack profiles, the unsupervised PCA will be suffered. In this paper, a new unsupervised detection method which combine PCA and data complexity has been proposed to detect shilling attacks. In the proposed method, PCA is used to select suspected attack profiles, and data complexity is used to pick out the authentic profiles from suspected attack profiles. Compared with the traditional PCA, the proposed method could perform well and there is no need to determine the number of shilling attack profiles in advance.

2019-10-02
Hussein, A., Salman, O., Chehab, A., Elhajj, I., Kayssi, A..  2019.  Machine Learning for Network Resiliency and Consistency. 2019 Sixth International Conference on Software Defined Systems (SDS). :146–153.

Being able to describe a specific network as consistent is a large step towards resiliency. Next to the importance of security lies the necessity of consistency verification. Attackers are currently focusing on targeting small and crutial goals such as network configurations or flow tables. These types of attacks would defy the whole purpose of a security system when built on top of an inconsistent network. Advances in Artificial Intelligence (AI) are playing a key role in ensuring a fast responce to the large number of evolving threats. Software Defined Networking (SDN), being centralized by design, offers a global overview of the network. Robustness and adaptability are part of a package offered by programmable networking, which drove us to consider the integration between both AI and SDN. The general goal of our series is to achieve an Artificial Intelligence Resiliency System (ARS). The aim of this paper is to propose a new AI-based consistency verification system, which will be part of ARS in our future work. The comparison of different deep learning architectures shows that Convolutional Neural Networks (CNN) give the best results with an accuracy of 99.39% on our dataset and 96% on our consistency test scenario.

2019-09-05
Sun, Y., Zhang, L., Zhao, C..  2018.  A Study of Network Covert Channel Detection Based on Deep Learning. 2018 2nd IEEE Advanced Information Management,Communicates,Electronic and Automation Control Conference (IMCEC). :637-641.

Information security has become a growing concern. Computer covert channel which is regarded as an important area of information security research gets more attention. In order to detect these covert channels, a variety of detection algorithms are proposed in the course of the research. The algorithms of machine learning type show better results in these detection algorithms. However, the common machine learning algorithms have many problems in the testing process and have great limitations. Based on the deep learning algorithm, this paper proposes a new idea of network covert channel detection and forms a new detection model. On the one hand, this algorithmic model can detect more complex covert channels and, on the other hand, greatly improve the accuracy of detection due to the use of a new deep learning model. By optimizing this test model, we can get better results on the evaluation index.

Elsadig, M. A., Fadlalla, Y. A..  2018.  Packet Length Covert Channel: A Detection Scheme. 2018 1st International Conference on Computer Applications Information Security (ICCAIS). :1-7.

A covert channel is a communication channel that is subjugated for illegal flow of information in a way that violates system security policies. It is a dangerous, invisible, undetectable, and developed security attack. Recently, Packet length covert channel has motivated many researchers as it is a one of the most undetectable network covert channels. Packet length covert channel generates a covert traffic that is very similar to normal terrific which complicates the detection of such type of covert channels. This motivates us to introduce a machine learning based detection scheme. Recently, a machine learning approach has proved its capability in many different fields especially in security field as it usually brings up a reliable and realistic results. Based in our developed content and frequency-based features, the developed detection scheme has been fully trained and tested. Our detection scheme has gained an excellent degree of detection accuracy which reaches 98% (zero false negative rate and 0.02 false positive rate).

2019-08-05
Kaiafas, G., Varisteas, G., Lagraa, S., State, R., Nguyen, C. D., Ries, T., Ourdane, M..  2018.  Detecting Malicious Authentication Events Trustfully. NOMS 2018 - 2018 IEEE/IFIP Network Operations and Management Symposium. :1-6.

Anomaly detection on security logs is receiving more and more attention. Authentication events are an important component of security logs, and being able to produce trustful and accurate predictions minimizes the effort of cyber-experts to stop false attacks. Observed events are classified into Normal, for legitimate user behavior, and Malicious, for malevolent actions. These classes are consistently excessively imbalanced which makes the classification problem harder; in the commonly used Los Alamos dataset, the malicious class comprises only 0.00033% of the total. This work proposes a novel method to extract advanced composite features, and a supervised learning technique for classifying authentication logs trustfully; the models are Random Forest, LogitBoost, Logistic Regression, and ultimately Majority Voting which leverages the predictions of the previous models and gives the final prediction for each authentication event. We measure the performance of our experiments by using the False Negative Rate and False Positive Rate. In overall we achieve 0 False Negative Rate (i.e. no attack was missed), and on average a False Positive Rate of 0.0019.

Hu, Xinyi, Zhao, Yaqun.  2018.  One to One Identification of Cryptosystem Using Fisher's Discriminant Analysis. Proceedings of the 6th ACM/ACIS International Conference on Applied Computing and Information Technology. :7–12.
Distinguishing analysis is an important part of cryptanalysis. It is an important content of discriminating analysis that how to identify ciphertext is encrypted by which cryptosystems when it knows only ciphertext. In this paper, Fisher's discriminant analysis (FDA), which is based on statistical method and machine learning, is used to identify 4 stream ciphers and 7 block ciphers one to one by extracting 9 different features. The results show that the accuracy rate of the FDA can reach 80% when identifying files that are encrypted by the stream cipher and the block cipher in ECB mode respectively, and files encrypted by the block cipher in ECB mode and CBC mode respectively. The average one to one identification accuracy rates of stream ciphers RC4, Grain, Sosemanuk are more than 55%. The maximum accuracy rate can reach 60% when identifying SMS4 from block ciphers in CBC mode one to one. The identification accuracy rate of entropy-based features is apparently higher than the probability-based features.
Aigner, Alexander.  2018.  FALKE-MC: A Neural Network Based Approach to Locate Cryptographic Functions in Machine Code. Proceedings of the 13th International Conference on Availability, Reliability and Security. :2:1–2:8.
The localization and classification of cryptographic functions in binary files is a growing challenge in information security, not least because of the increasing use of such functions in malware. Nevertheless, it is still a time consuming and laborious task. Some of the most commonly used techniques are based on dynamic methods, signatures or manual reverse engineering. In this paper we present FALKE-MC, a novel framework that creates classifiers for arbitrary cryptographic algorithms from sample binaries. It processes multiple file formats and architectures and is easily expandable due to its modular design. Functions are automatically detected and features as well as constants are extracted. They are used to train a neural network, which can then be applied to classify functions in unknown binary files. The framework is fully automated, from the input of binary files and the creation of a classifier through to the output of classification results. In addition to that, it can deal with class imbalance between cryptographic and non-cryptographic samples during training. Our evaluation shows that this approach offers a high detection rate in combination with a low false positive rate. We are confident that FALKE-MC can accelerate the localization and classification of cryptographic functions in practice.
2019-07-01
Amjad, N., Afzal, H., Amjad, M. F., Khan, F. A..  2018.  A Multi-Classifier Framework for Open Source Malware Forensics. 2018 IEEE 27th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE). :106-111.

Traditional anti-virus technologies have failed to keep pace with proliferation of malware due to slow process of their signatures and heuristics updates. Similarly, there are limitations of time and resources in order to perform manual analysis on each malware. There is a need to learn from this vast quantity of data, containing cyber attack pattern, in an automated manner to proactively adapt to ever-evolving threats. Machine learning offers unique advantages to learn from past cyber attacks to handle future cyber threats. The purpose of this research is to propose a framework for multi-classification of malware into well-known categories by applying different machine learning models over corpus of malware analysis reports. These reports are generated through an open source malware sandbox in an automated manner. We applied extensive pre-modeling techniques for data cleaning, features exploration and features engineering to prepare training and test datasets. Best possible hyper-parameters are selected to build machine learning models. These prepared datasets are then used to train the machine learning classifiers and to compare their prediction accuracy. Finally, these results are validated through a comprehensive 10-fold cross-validation methodology. The best results are achieved through Gaussian Naive Bayes classifier with random accuracy of 96% and 10-Fold Cross Validation accuracy of 91.2%. The said framework can be deployed in an operational environment to learn from malware attacks for proactively adapting matching counter measures.

2019-06-24
Ijaz, M., Durad, M. H., Ismail, M..  2019.  Static and Dynamic Malware Analysis Using Machine Learning. 2019 16th International Bhurban Conference on Applied Sciences and Technology (IBCAST). :687–691.

Malware detection is an indispensable factor in security of internet oriented machines. The combinations of different features are used for dynamic malware analysis. The different combinations are generated from APIs, Summary Information, DLLs and Registry Keys Changed. Cuckoo sandbox is used for dynamic malware analysis, which is customizable, and provide good accuracy. More than 2300 features are extracted from dynamic analysis of malware and 92 features are extracted statically from binary malware using PEFILE. Static features are extracted from 39000 malicious binaries and 10000 benign files. Dynamically 800 benign files and 2200 malware files are analyzed in Cuckoo Sandbox and 2300 features are extracted. The accuracy of dynamic malware analysis is 94.64% while static analysis accuracy is 99.36%. The dynamic malware analysis is not effective due to tricky and intelligent behaviours of malwares. The dynamic analysis has some limitations due to controlled network behavior and it cannot be analyzed completely due to limited access of network.

Naeem, H., Guo, B., Naeem, M. R..  2018.  A light-weight malware static visual analysis for IoT infrastructure. 2018 International Conference on Artificial Intelligence and Big Data (ICAIBD). :240–244.

Recently a huge trend on the internet of things (IoT) and an exponential increase in automated tools are helping malware producers to target IoT devices. The traditional security solutions against malware are infeasible due to low computing power for large-scale data in IoT environment. The number of malware and their variants are increasing due to continuous malware attacks. Consequently, the performance improvement in malware analysis is critical requirement to stop rapid expansion of malicious attacks in IoT environment. To solve this problem, the paper proposed a novel framework for classifying malware in IoT environment. To achieve flne-grained malware classification in suggested framework, the malware image classification system (MICS) is designed for representing malware image globally and locally. MICS first converts the suspicious program into the gray-scale image and then captures hybrid local and global malware features to perform malware family classification. Preliminary experimental outcomes of MICS are quite promising with 97.4% classification accuracy on 9342 windows suspicious programs of 25 families. The experimental results indicate that proposed framework is quite capable to process large-scale IoT malware.

2019-06-10
Roseline, S. A., Geetha, S..  2018.  Intelligent Malware Detection Using Oblique Random Forest Paradigm. 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI). :330-336.

With the increase in the popularity of computerized online applications, the analysis, and detection of a growing number of newly discovered stealthy malware poses a significant challenge to the security community. Signature-based and behavior-based detection techniques are becoming inefficient in detecting new unknown malware. Machine learning solutions are employed to counter such intelligent malware and allow performing more comprehensive malware detection. This capability leads to an automatic analysis of malware behavior. The proposed oblique random forest ensemble learning technique is efficient for malware classification. The effectiveness of the proposed method is demonstrated with three malware classification datasets from various sources. The results are compared with other variants of decision tree learning models. The proposed system performs better than the existing system in terms of classification accuracy and false positive rate.

Jiang, J., Yin, Q., Shi, Z., Li, M..  2018.  Comprehensive Behavior Profiling Model for Malware Classification. 2018 IEEE Symposium on Computers and Communications (ISCC). :00129-00135.

In view of the great threat posed by malware and the rapid growing trend about malware variants, it is necessary to determine the category of new samples accurately for further analysis and taking appropriate countermeasures. The network behavior based classification methods have become more popular now. However, the behavior profiling models they used usually only depict partial network behavior of samples or require specific traffic selection in advance, which may lead to adverse effects on categorizing advanced malware with complex activities. In this paper, to overcome the shortages of traditional models, we raise a comprehensive behavior model for profiling the behavior of malware network activities. And we also propose a corresponding malware classification method which can extract and compare the major behavior of samples. The experimental and comparison results not only demonstrate our method can categorize samples accurately in both criteria, but also prove the advantage of our profiling model to two other approaches in accuracy performance, especially under scenario based criteria.

Kornish, D., Geary, J., Sansing, V., Ezekiel, S., Pearlstein, L., Njilla, L..  2018.  Malware Classification Using Deep Convolutional Neural Networks. 2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR). :1-6.

In recent years, deep convolution neural networks (DCNNs) have won many contests in machine learning, object detection, and pattern recognition. Furthermore, deep learning techniques achieved exceptional performance in image classification, reaching accuracy levels beyond human capability. Malware variants from similar categories often contain similarities due to code reuse. Converting malware samples into images can cause these patterns to manifest as image features, which can be exploited for DCNN classification. Techniques for converting malware binaries into images for visualization and classification have been reported in the literature, and while these methods do reach a high level of classification accuracy on training datasets, they tend to be vulnerable to overfitting and perform poorly on previously unseen samples. In this paper, we explore and document a variety of techniques for representing malware binaries as images with the goal of discovering a format best suited for deep learning. We implement a database for malware binaries from several families, stored in hexadecimal format. These malware samples are converted into images using various approaches and are used to train a neural network to recognize visual patterns in the input and classify malware based on the feature vectors. Each image type is assessed using a variety of learning models, such as transfer learning with existing DCNN architectures and feature extraction for support vector machine classifier training. Each technique is evaluated in terms of classification accuracy, result consistency, and time per trial. Our preliminary results indicate that improved image representation has the potential to enable more effective classification of new malware.

Alsulami, B., Mancoridis, S..  2018.  Behavioral Malware Classification Using Convolutional Recurrent Neural Networks. 2018 13th International Conference on Malicious and Unwanted Software (MALWARE). :103-111.

Behavioral malware detection aims to improve on the performance of static signature-based techniques used by anti-virus systems, which are less effective against modern polymorphic and metamorphic malware. Behavioral malware classification aims to go beyond the detection of malware by also identifying a malware's family according to a naming scheme such as the ones used by anti-virus vendors. Behavioral malware classification techniques use run-time features, such as file system or network activities, to capture the behavioral characteristic of running processes. The increasing volume of malware samples, diversity of malware families, and the variety of naming schemes given to malware samples by anti-virus vendors present challenges to behavioral malware classifiers. We describe a behavioral classifier that uses a Convolutional Recurrent Neural Network and data from Microsoft Windows Prefetch files. We demonstrate the model's improvement on the state-of-the-art using a large dataset of malware families and four major anti-virus vendor naming schemes. The model is effective in classifying malware samples that belong to common and rare malware families and can incrementally accommodate the introduction of new malware samples and families.

Kim, H. M., Song, H. M., Seo, J. W., Kim, H. K..  2018.  Andro-Simnet: Android Malware Family Classification Using Social Network Analysis. 2018 16th Annual Conference on Privacy, Security and Trust (PST). :1-8.

While the rapid adaptation of mobile devices changes our daily life more conveniently, the threat derived from malware is also increased. There are lots of research to detect malware to protect mobile devices, but most of them adopt only signature-based malware detection method that can be easily bypassed by polymorphic and metamorphic malware. To detect malware and its variants, it is essential to adopt behavior-based detection for efficient malware classification. This paper presents a system that classifies malware by using common behavioral characteristics along with malware families. We measure the similarity between malware families with carefully chosen features commonly appeared in the same family. With the proposed similarity measure, we can classify malware by malware's attack behavior pattern and tactical characteristics. Also, we apply community detection algorithm to increase the modularity within each malware family network aggregation. To maintain high classification accuracy, we propose a process to derive the optimal weights of the selected features in the proposed similarity measure. During this process, we find out which features are significant for representing the similarity between malware samples. Finally, we provide an intuitive graph visualization of malware samples which is helpful to understand the distribution and likeness of the malware networks. In the experiment, the proposed system achieved 97% accuracy for malware classification and 95% accuracy for prediction by K-fold cross-validation using the real malware dataset.

Xue, S., Zhang, L., Li, A., Li, X., Ruan, C., Huang, W..  2018.  AppDNA: App Behavior Profiling via Graph-Based Deep Learning. IEEE INFOCOM 2018 - IEEE Conference on Computer Communications. :1475-1483.

Better understanding of mobile applications' behaviors would lead to better malware detection/classification and better app recommendation for users. In this work, we design a framework AppDNA to automatically generate a compact representation for each app to comprehensively profile its behaviors. The behavior difference between two apps can be measured by the distance between their representations. As a result, the versatile representation can be generated once for each app, and then be used for a wide variety of objectives, including malware detection, app categorizing, plagiarism detection, etc. Based on a systematic and deep understanding of an app's behavior, we propose to perform a function-call-graph-based app profiling. We carefully design a graph-encoding method to convert a typically extremely large call-graph to a 64-dimension fix-size vector to achieve robust app profiling. Our extensive evaluations based on 86,332 benign and malicious apps demonstrate that our system performs app profiling (thus malware detection, classification, and app recommendation) to a high accuracy with extremely low computation cost: it classifies 4024 (benign/malware) apps using around 5.06 second with accuracy about 93.07%; it classifies 570 malware's family (total 21 families) using around 0.83 second with accuracy 82.3%; it classifies 9,730 apps' functionality with accuracy 33.3% for a total of 7 categories and accuracy of 88.1 % for 2 categories.