Biblio

Found 5734 results

Filters: Keyword is Human Behavior
2020-11-20
Chin, J., Zufferey, T., Shyti, E., Hug, G..  2019.  Load Forecasting of Privacy-Aware Consumers. 2019 IEEE Milan PowerTech. :1—6.

The roll-out of smart meters (SMs) in the electric grid has enabled data-driven grid management and planning techniques. SM data can be used together with short-term load forecasts (STLFs) to overcome polling frequency constraints for better grid management. However, the use of SMs that report consumption data at high spatial and temporal resolutions entails consumer privacy risks, motivating work in protecting consumer privacy. The impact of privacy protection schemes on STLF accuracy is not well studied, especially for smaller aggregations of consumers, whose load profiles are subject to more volatility and are, thus, harder to predict. In this paper, we analyse the impact of two user-demand-shaping privacy protection schemes, model-distribution predictive control (MDPC) and load-levelling, on STLF accuracy. Support vector regression is used to predict the load profiles at different consumer aggregation levels. Results indicate that, while the MDPC algorithm marginally affects forecast accuracy for smaller consumer aggregations, this effect diminishes at higher aggregation levels. More importantly, the load-levelling scheme significantly improves STLF accuracy as it smooths out the grid-visible consumer load profile.
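
As a rough illustration of the forecasting step described above, a support-vector-regression STLF over lagged smart-meter readings might look like the sketch below; the 24-hour lag window, RBF kernel settings and synthetic load series are illustrative assumptions, not the paper's setup.

```python
# Minimal SVR-based short-term load forecast on lagged consumption
# values. The lag window and kernel settings are assumptions.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def make_lagged(load, n_lags=24):
    """Build (lagged-feature, next-hour-target) pairs from an hourly series."""
    X = np.array([load[i:i + n_lags] for i in range(len(load) - n_lags)])
    return X, load[n_lags:]

rng = np.random.default_rng(0)
load = 50 + 10 * np.sin(np.arange(2000) * 2 * np.pi / 24) + rng.normal(0, 2, 2000)

X, y = make_lagged(load)
split = int(0.8 * len(X))
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.5))
model.fit(X[:split], y[:split])

mape = np.mean(np.abs((model.predict(X[split:]) - y[split:]) / y[split:])) * 100
print(f"test MAPE: {mape:.2f}%")
```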

2020-11-04
Apruzzese, G., Colajanni, M., Ferretti, L., Marchetti, M..  2019.  Addressing Adversarial Attacks Against Security Systems Based on Machine Learning. 2019 11th International Conference on Cyber Conflict (CyCon). 900:1—18.

Machine-learning solutions are successfully adopted in multiple contexts, but the application of these techniques to the cyber security domain is complex and still immature. Among the many open issues affecting security systems based on machine learning, we concentrate on adversarial attacks that aim to affect the detection and prediction capabilities of machine-learning models. We consider realistic types of poisoning and evasion attacks targeting security solutions devoted to malware, spam and network intrusion detection. We explore the possible damage that an attacker can cause to a cyber detector and present some existing and original defensive techniques in the context of intrusion detection systems. This paper contains several performance evaluations based on extensive experiments using large traffic datasets. The results highlight that modern adversarial attacks are highly effective against machine-learning classifiers for cyber detection, and that existing solutions require improvements in several directions. The paper paves the way for more robust machine-learning-based techniques that can be integrated into cyber security platforms.

Khalid, F., Hanif, M. A., Rehman, S., Ahmed, R., Shafique, M..  2019.  TrISec: Training Data-Unaware Imperceptible Security Attacks on Deep Neural Networks. 2019 IEEE 25th International Symposium on On-Line Testing and Robust System Design (IOLTS). :188—193.

Most of the data manipulation attacks on deep neural networks (DNNs) during the training stage introduce a perceptible noise that can be countered by preprocessing during inference, or can be identified during the validation phase. Therefore, data poisoning attacks during inference (e.g., adversarial attacks) are becoming more popular. However, many of them do not consider the imperceptibility factor in their optimization algorithms, and so can be detected by correlation and structural similarity analysis, or are noticeable (e.g., by humans) in a multi-level security system. Moreover, the majority of inference attacks rely on some knowledge of the training dataset. In this paper, we propose a novel methodology which automatically generates imperceptible attack images by using the back-propagation algorithm on pre-trained DNNs, without requiring any information about the training dataset (i.e., completely training data-unaware). We present a case study on traffic sign detection using the VGGNet trained on the German Traffic Sign Recognition Benchmarks dataset in an autonomous driving use case. Our results demonstrate that the generated attack images successfully perform misclassification while remaining imperceptible in both “subjective” and “objective” quality tests.
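
The general recipe the abstract describes, perturbing an input via back-propagation through a pre-trained network with no access to training data, can be sketched as follows; the bound, step size and loop count are illustrative, and TrISec's additional perceptual constraints (e.g., SSIM) are omitted here.

```python
# Sketch of a training-data-unaware, gradient-based attack: only a
# pre-trained model and one input are needed. eps/alpha/steps are
# illustrative; the paper additionally enforces imperceptibility
# metrics such as SSIM, which this sketch does not.
import torch
import torch.nn.functional as F

def craft_attack_image(model, x, y_true, eps=2/255, alpha=0.5/255, steps=40):
    """Return x plus a small perturbation that raises the loss on y_true."""
    model.eval()
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y_true)
        loss.backward()
        delta.data = (delta + alpha * delta.grad.sign()).clamp(-eps, eps)
        delta.data = (x + delta).clamp(0, 1) - x   # keep pixels in valid range
        delta.grad.zero_()
    return (x + delta).detach()

# adv = craft_attack_image(pretrained_vgg, image, label)  # names illustrative
```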

Chacon, H., Silva, S., Rad, P..  2019.  Deep Learning Poison Data Attack Detection. 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI). :971—978.

Deep neural networks are widely used in many walks of life. Techniques such as transfer learning enable neural networks pre-trained on certain tasks to be retrained for a new duty, often with much less data. Users have access to both pre-trained model parameters and model definitions along with testing data, but have either limited access to training data or just a subset of it. This is risky for system-critical applications, where adversarial information can be maliciously included during the training phase to attack the system. Determining the existence and level of attack in a model is challenging. In this paper, we present evidence of how adversarially attacking training data expands the boundary of model parameters, using a CNN model and the MNIST data set as a test case. This expansion is due to new characteristics of the poisonous data that are added to the training data. Approaching the problem from the feature space learned by the network provides a relation between the features and the possible parameters taken by the model during the training phase. An algorithm is proposed to determine whether a given network was attacked during training by comparing the boundaries of the parameter distributions in intermediate layers of the model, estimated using the maximum entropy principle and the variational inference approach.

Zhang, J., Chen, J., Wu, D., Chen, B., Yu, S..  2019.  Poisoning Attack in Federated Learning using Generative Adversarial Nets. 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :374—380.

Federated learning is a novel distributed learning framework in which a deep learning model is trained collaboratively among thousands of participants. Only model parameters are shared between the server and participants, which prevents the server from directly accessing the private training data. However, we observe that the federated learning architecture is vulnerable to an active attack from insider participants, called a poisoning attack, in which the attacker acts as a benign participant and uploads poisoned updates to the server in order to affect the performance of the global model. In this work, we study and evaluate a poisoning attack on a federated learning system based on generative adversarial nets (GAN). The attacker first acts as a benign participant and stealthily trains a GAN to mimic prototypical samples from the other participants' training sets, which do not belong to the attacker. These generated samples are then fully controlled by the attacker to produce the poisoning updates, and the global model is compromised when the attacker uploads the scaled poisoning updates to the server. In our evaluation, we show that the attacker in our construction can successfully generate samples of other benign participants using the GAN, and that the global model achieves more than 80% accuracy on both the poisoning tasks and the main tasks.
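
The core of the threat model, a single insider whose scaled update dominates federated averaging, can be shown with a toy round; the scaling factor and vector dimensions below are illustrative, and the GAN-based sample generation step is omitted.

```python
# Toy FedAvg round: one malicious client scales its poisoned update so
# it survives averaging with many benign clients. Scale choice is an
# illustrative assumption.
import numpy as np

def fedavg(updates):
    return np.mean(updates, axis=0)

d, n_clients = 10, 20
rng = np.random.default_rng(0)
benign = [rng.normal(0, 0.01, d) for _ in range(n_clients - 1)]

poison_direction = np.full(d, 0.05)        # attacker's desired model shift
malicious = n_clients * poison_direction   # boost to counteract the 1/n average

aggregate = fedavg(benign + [malicious])
print("mean benign update norm:", np.mean([np.linalg.norm(u) for u in benign]))
print("aggregated update norm :", np.linalg.norm(aggregate))
```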

Liang, Y., He, D., Chen, D..  2019.  Poisoning Attack on Load Forecasting. 2019 IEEE Innovative Smart Grid Technologies - Asia (ISGT Asia). :1230—1235.

Short-term load forecasting systems for power grids have demonstrated high accuracy and have been widely employed for commercial use. However, classic load forecasting systems, which are based on statistical methods, are vulnerable to training data poisoning. In this paper, we demonstrate a data poisoning strategy that effectively corrupts the forecasting model even in the presence of outlier detection. To the best of our knowledge, poisoning attacks on short-term load forecasting with outlier detection have not been studied in previous work. Our method applies to several forecasting models, including the most widely adopted and best-performing ones, such as multiple linear regression (MLR) and neural network (NN) models. Starting with the MLR model, we develop a novel closed-form solution to quickly estimate the new MLR model after a round of data poisoning without retraining. We then employ line search and simulated annealing to find the poisoning attack solution. Furthermore, we use the MLR attacking solution to generate a numerical solution for other models, such as NN. The effectiveness of our algorithm has been tested on the Global Energy Forecasting Competition (GEFCom2012) data set in the presence of outlier detection.
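
For the MLR case, a retraining-free refit after appending one point follows from the Sherman-Morrison identity; the sketch below shows that generic ordinary-least-squares update, as an assumption about the shape of such a closed form rather than the authors' exact derivation.

```python
# Closed-form OLS refit after appending one (possibly poisoned) point,
# via Sherman-Morrison: no retraining needed. Generic sketch, not the
# paper's exact formulation.
import numpy as np

def ols_fit(X, y):
    A_inv = np.linalg.inv(X.T @ X)
    Xty = X.T @ y
    return A_inv, Xty, A_inv @ Xty

def add_point(A_inv, Xty, x_new, y_new):
    """Update (X^T X)^{-1}, X^T y and beta for one appended row."""
    Au = A_inv @ x_new
    A_inv = A_inv - np.outer(Au, Au) / (1.0 + x_new @ Au)
    Xty = Xty + y_new * x_new
    return A_inv, Xty, A_inv @ Xty

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.arange(1, 6) + rng.normal(size=200)
A_inv, Xty, beta = ols_fit(X, y)
_, _, beta_poisoned = add_point(A_inv, Xty, x_new=np.ones(5), y_new=100.0)
print("coefficient shift:", np.linalg.norm(beta_poisoned - beta))
```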

Kim, Y., Ahn, S., Thang, N. C., Choi, D., Park, M..  2019.  ARP Poisoning Attack Detection Based on ARP Update State in Software-Defined Networks. 2019 International Conference on Information Networking (ICOIN). :366—371.

Recently, the novel networking technologies Software-Defined Networking (SDN) and Service Function Chaining (SFC) have been growing rapidly, and security issues are emerging for both. However, research on security and safety in these novel networking environments is still insufficient, and vulnerabilities continue to be revealed. Among these security issues, this paper addresses the ARP poisoning attack that exploits an SFC vulnerability, and proposes a method to defend against it. The proposed method detects ARP poisoning attacks by recognizing the repetitive ARP replies that characterize them. It overcomes the limitations of existing detection methods and detects the presence of an attack more accurately.
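
A minimal sketch of the detection idea, counting repetitive ARP replies per (MAC, IP) claim inside a time window, is given below; the window length and alert threshold are illustrative, and event parsing by the SDN controller is assumed.

```python
# Sliding-window counter for the repetitive ARP replies that
# characterize poisoning. Threshold/window are illustrative, not the
# paper's tuned values; events are assumed pre-parsed.
from collections import defaultdict, deque

WINDOW_SEC = 10
THRESHOLD = 5          # replies per window before alerting

reply_times = defaultdict(deque)

def on_arp_reply(ts, sender_mac, claimed_ip):
    q = reply_times[(sender_mac, claimed_ip)]
    q.append(ts)
    while q and ts - q[0] > WINDOW_SEC:
        q.popleft()                     # drop replies outside the window
    if len(q) >= THRESHOLD:
        print(f"ALERT: possible ARP poisoning by {sender_mac} claiming {claimed_ip}")

for t in range(8):                      # simulated burst of unsolicited replies
    on_arp_reply(t, "aa:bb:cc:dd:ee:ff", "10.0.0.1")
```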

Jin, Y., Tomoishi, M., Matsuura, S..  2019.  A Detection Method Against DNS Cache Poisoning Attacks Using Machine Learning Techniques: Work in Progress. 2019 IEEE 18th International Symposium on Network Computing and Applications (NCA). :1—3.

DNS-based domain name resolution is one of the most fundamental Internet services. Meanwhile, DNS cache poisoning attacks have become a critical threat in the cyber world. In addition to Kaminsky attacks, falsified data from compromised authoritative DNS servers has also become a threat. Several solutions have been proposed in the literature to prevent DNS cache poisoning attacks of the former kind, such as DNSSEC (DNS Security Extensions); however, no effective solutions have been proposed for the latter. Moreover, due to performance issues and the significant workload increase on DNS cache servers, DNSSEC has not yet been widely deployed. In this work, we propose an advanced detection method against DNS cache poisoning attacks using machine learning techniques. In the proposed method, in addition to the basic 5-tuple information of a DNS packet, we add a number of specialized features extracted from the standard DNS protocols as well as heuristic aspects such as “time related features”, “GeoIP related features” and “trigger of cached DNS data”, in order to identify the DNS response packets used for cache poisoning attacks, especially those from compromised authoritative DNS servers. In this paper, as a work in progress, we describe the basic idea and concept of our proposed method as well as the intended network topology of the experimental environment, while the prototype implementation, training data preparation, model creation and evaluations are left to future work.
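
Since the feature set itself is still future work, the sketch below only illustrates the kind of per-response feature vector the proposal suggests; every field name and the geo lookup are hypothetical placeholders.

```python
# Hypothetical per-response features beyond the 5-tuple, in the spirit
# of the proposal: timing, GeoIP and cache-trigger signals. All names
# are placeholders, not the paper's feature set.
from types import SimpleNamespace

def extract_features(resp, query_ts, cache, geo_lookup):
    return {
        # time-related: a spoofed reply often races the genuine one
        "response_delay": resp["arrival_ts"] - query_ts,
        "ttl": resp["ttl"],
        # GeoIP-related: where does the answered address resolve to?
        "geo_country": geo_lookup(resp["answer_ip"]),
        # trigger of cached data: was this record solicited by a live query?
        "was_queried": resp["qname"] in cache.pending_queries,
        "answer_count": len(resp["answers"]),
    }

cache = SimpleNamespace(pending_queries={"example.com"})
resp = {"arrival_ts": 12.34, "ttl": 300, "answer_ip": "93.184.216.34",
        "qname": "example.com", "answers": ["93.184.216.34"]}
print(extract_features(resp, query_ts=12.30, cache=cache, geo_lookup=lambda ip: "US"))
```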

Khurana, N., Mittal, S., Piplai, A., Joshi, A..  2019.  Preventing Poisoning Attacks On AI Based Threat Intelligence Systems. 2019 IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP). :1—6.

As AI systems become more ubiquitous, securing them becomes an emerging challenge. Over the years, with the surge in online social media use and the data available for analysis, AI systems have been built to extract, represent and use this information. The credibility of this information extracted from open sources, however, can often be questionable. Malicious or incorrect information can cause a loss of money, reputation, and resources; and in certain situations, pose a threat to human life. In this paper, we use an ensembled semi-supervised approach to determine the credibility of Reddit posts by estimating their reputation score to ensure the validity of information ingested by AI systems. We demonstrate our approach in the cybersecurity domain, where security analysts utilize these systems to determine possible threats by analyzing the data scattered on social media websites, forums, blogs, etc.
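
One plausible shape for such an ensembled semi-supervised scorer, a label-propagation model averaged with a supervised one, is sketched below; the features, weights and toy labels are assumptions, and the paper's Reddit feature set is not reproduced.

```python
# Hedged sketch of an ensembled semi-supervised reputation scorer:
# propagate a few analyst labels over unlabeled posts, then average
# with a supervised model. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))                  # stand-in post features
y = np.full(300, -1)                           # -1 marks unlabeled posts
labeled = rng.choice(300, 30, replace=False)
y[labeled] = (X[labeled, 0] > 0).astype(int)   # toy credibility labels

semi = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y)
sup = LogisticRegression().fit(X[labeled], y[labeled])

# simple two-member ensemble: mean of the two credibility estimates
score = 0.5 * semi.label_distributions_[:, 1] + 0.5 * sup.predict_proba(X)[:, 1]
print("mean reputation score:", round(float(score.mean()), 3))
```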

Shen, J., Zhu, X., Ma, D..  2019.  TensorClog: An Imperceptible Poisoning Attack on Deep Neural Network Applications. IEEE Access. 7:41498—41506.

Internet application providers now have more incentive than ever to collect user data, which greatly increases the risk of user privacy violations due to the emergence of deep neural networks. In this paper, we propose TensorClog, a poisoning attack technique designed for privacy protection against deep neural networks. TensorClog has three properties, each serving a privacy protection purpose: 1) training on TensorClog-poisoned data results in lower inference accuracy, reducing the incentive for abusive data collection; 2) training on TensorClog-poisoned data converges to a larger loss, which prevents the neural network from learning private information; and 3) TensorClog regularizes the perturbation to retain high structural similarity, so that the poisoning does not affect the actual content of the data. Applying our TensorClog poisoning technique to the CIFAR-10 dataset increases both the converged training loss and the test error, by 300% and 272%, respectively. It maintains the data's human-perceived content with a high SSIM index of 0.9905. More experiments, including different limited-information attack scenarios and a real-world application transferred from pre-trained ImageNet models, are presented to further evaluate TensorClog's effectiveness in more complex situations.

2020-11-02
Pan, C., Huang, J., Gong, J., Yuan, X..  2019.  Few-Shot Transfer Learning for Text Classification With Lightweight Word Embedding Based Models. IEEE Access. 7:53296–53304.

Many deep learning architectures have been employed to model the semantic compositionality of text sequences, but they require a huge amount of supervised data for parameter training, which is infeasible when numerous annotated samples are unavailable or do not exist. Unlike data-hungry deep models, lightweight word embedding-based models can represent text sequences in a plug-and-play way thanks to their parameter-free property. In this paper, a modified hierarchical pooling strategy over pre-trained word embeddings is proposed for text classification in a few-shot transfer learning setting. The model leverages and transfers knowledge obtained from some source domains to recognize and classify unseen text sequences with just a handful of support examples in the target problem domain. Extensive experiments on five datasets including both English and Chinese text demonstrate that simple word embedding-based models (SWEMs) with parameter-free pooling operations are able to abstract and represent the semantics of text. The proposed modified hierarchical pooling method exhibits significant classification performance in few-shot transfer learning tasks compared with other alternative methods.
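
The parameter-free pooling at the heart of this approach is easy to state concretely; the sketch below shows plain hierarchical pooling (window averages followed by a max), with the window size as an assumption and the paper's modification not reproduced.

```python
# Parameter-free hierarchical pooling over pre-trained word vectors:
# average within each local window, then max-pool across windows.
# Window size is an assumption.
import numpy as np

def hierarchical_pool(embeddings, window=3):
    """embeddings: (seq_len, dim) array of pre-trained word vectors."""
    n, _ = embeddings.shape
    if n < window:
        return embeddings.mean(axis=0)
    local_means = np.stack([embeddings[i:i + window].mean(axis=0)
                            for i in range(n - window + 1)])
    return local_means.max(axis=0)      # fixed-size sequence representation

sentence = np.random.default_rng(0).normal(size=(12, 50))  # 12 tokens, 50-d
print(hierarchical_pool(sentence).shape)                   # -> (50,)
```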

Bilanová, Z., Perháč, J..  2019.  About possibilities of applying logical analysis of natural language in computer science. 2019 IEEE 13th International Symposium on Applied Computational Intelligence and Informatics (SACI). :251–256.

This paper compares the most popular methods for the logical analysis of natural language: Montague intensional logic and Transparent intensional logic. First, these logical apparatuses are compared in terms of their founding theoretical principles. Then, a selected sentence is examined through logical analysis. The aim of the paper is to identify the more expressive logical method, which will be a suitable basis for the future design of an algorithm for the automated translation of natural language into a formal representation of its meaning through a semantic machine.

Zhong, J., Yang, C..  2019.  A Compositionality Assembled Model for Learning and Recognizing Emotion from Bodily Expression. 2019 IEEE 4th International Conference on Advanced Robotics and Mechatronics (ICARM). :821–826.

When we express internal states such as emotions, the body expressions we use follow the compositionality principle: a theory from linguistics which proposes that the individual components of a bodily presentation, together with the rules used to combine them, are the major parts of this process. In this paper, this principle is applied to the process of expressing and recognizing emotional states through body expression, in which certain key features can be learned to represent primitives of the internal emotional state in the form of basic variables. This is done with a hierarchical recurrent neural network (RNN) learning framework, whose nonlinear dynamic bifurcation allows variables to be learned at different hierarchies. In addition, we apply adaptive learning techniques from machine learning to meet the requirement of real-time emotion recognition, so that a stable representation can be maintained compared to previous work. The model is examined by comparing the PB values between the training and recognition phases. This hierarchical model supports the compositionality hypothesis through RNN learning and explains how key features can be used and combined in bodily expression to convey the emotional state.

2020-10-29
Vi, Bao Ngoc, Nguyen, Huu Noi, Nguyen, Ngoc Tran, Tran, Cao Truong.  2019.  Adversarial Examples Against Image-based Malware Classification Systems. 2019 11th International Conference on Knowledge and Systems Engineering (KSE). :1—5.

Malicious software, known as malware, has become an urgent and serious threat to computer security, so automatic malware classification techniques have received increasing attention. In recent years, deep learning (DL) techniques for computer vision have been successfully applied to malware classification by visualizing malware files and then using DL to classify the visualized images. Although DL-based classification systems have been proven to be much more accurate than conventional ones, these systems have been shown to be vulnerable to adversarial attacks. However, there has been little research considering the danger of adversarial attacks to visualized image-based malware classification systems. This paper proposes a gradient-based adversarial attack method against image-based malware classification systems that introduces perturbations in the resource section of PE files. The experimental results on the Malimg dataset show that, with a small interference, the proposed method can achieve a high attack success rate when challenging convolutional neural network malware classifiers.
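
The visualization step these systems share, rendering a binary's raw bytes as a grayscale image, can be sketched as follows; the fixed row width is a common heuristic and an assumption here, as is the file path.

```python
# Malimg-style visualization: one byte of the binary becomes one
# grayscale pixel. Fixed width of 256 is a common heuristic, not
# necessarily the authors' choice.
import numpy as np
from PIL import Image

def binary_to_image(path, width=256):
    data = np.frombuffer(open(path, "rb").read(), dtype=np.uint8)
    rows = len(data) // width
    return Image.fromarray(data[:rows * width].reshape(rows, width), mode="L")

# binary_to_image("sample.exe").save("sample.png")   # path is illustrative
```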

Xylogiannopoulos, Konstantinos F., Karampelas, Panagiotis, Alhajj, Reda.  2019.  Text Mining for Malware Classification Using Multivariate All Repeated Patterns Detection. 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). :887—894.

Mobile phones have nowadays become a commodity for the majority of people. Using them, people are able to access the world of the Internet and connect with their friends, their colleagues at work, or even unknown people with common interests. This proliferation of mobile devices has also been seen as an opportunity for cyber criminals to deceive smartphone users and steal their money, directly or indirectly, by accessing their bank accounts through their smartphones, by blackmailing them, or by selling their private data such as photos and credit card data to third parties. This is usually achieved by installing malware on smartphones, masking its malevolent payload as a legitimate application and advertising it to users in the hope that they will install it on their devices. Thus, any existing application can easily be modified by integrating malware and then presented as a legitimate one. In response to this, scientists have proposed a number of malware detection and classification methods using a variety of techniques. Although several of them achieve relatively high precision in malware classification, there is still space for improvement. In this paper, we propose a text-mining all-repeated-patterns detection method which uses the decompiled files of an application in order to classify a suspicious application into one of the known malware families. Based on experimental results using a real malware dataset, the methodology correctly classifies, without any misclassification, all randomly selected malware applications from 3 categories with 3 different families each.
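
A greatly simplified stand-in for repeated-pattern mining over decompiled code is sketched below, counting opcode n-grams that repeat and intersecting them across samples; the authors' actual all-repeated-patterns detection structures are far more complete.

```python
# Toy repeated-pattern miner: opcode n-grams that occur more than once
# in a sample, intersected across samples as family evidence. A very
# reduced stand-in for the paper's detection method.
from collections import Counter

def repeated_ngrams(tokens, n=3):
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {g for g, c in counts.items() if c > 1}

sample_a = "invoke const move invoke const move invoke goto".split()
sample_b = "const move invoke const move invoke return".split()
print(repeated_ngrams(sample_a) & repeated_ngrams(sample_b))
```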

Choi, Seok-Hwan, Shin, Jin-Myeong, Liu, Peng, Choi, Yoon-Ho.  2019.  Robustness Analysis of CNN-based Malware Family Classification Methods Against Various Adversarial Attacks. 2019 IEEE Conference on Communications and Network Security (CNS). :1—6.

As malware family classification methods, image-based classification methods have attracted much attention. In particular, Convolutional Neural Network (CNN)-based malware family classification methods have been studied owing to their fast classification speed and high classification accuracy. However, previous studies of CNN-based classification methods focused only on improving the classification accuracy of malware families. That is, previous studies did not consider cases in which the accuracy of CNN-based malware classification methods can decrease in the presence of adversarial attacks. In this paper, we analyze the robustness of various CNN-based malware family classification models under adversarial attacks. While adding imperceptible non-random perturbations to the input image, we measured how the accuracy of the CNN-based malware family classification model is affected. We also showed the influence of three significant visualization parameters (i.e., the size of the input image, the dimension of the input image, and the conversion color of a special character) on the accuracy variation under adversarial attacks. From the evaluation results using the Microsoft malware dataset, we showed that the accuracy of the CNN-based malware family classification method, over 98% on clean inputs, can be decreased to less than 7%.

Roseline, S. Abijah, Sasisri, A. D., Geetha, S., Balasubramanian, C..  2019.  Towards Efficient Malware Detection and Classification using Multilayered Random Forest Ensemble Technique. 2019 International Carnahan Conference on Security Technology (ICCST). :1—6.

The exponential growth rate of malware causes significant security concern in this digital era to computer users and to private and government organizations. Traditional malware detection methods employ static and dynamic analysis, which are ineffective in identifying unknown malware. Malware authors develop new malware by applying polymorphic and evasion techniques to existing malware to escape detection. Newly arriving malware samples are variants of existing malware, and their patterns can be analyzed using a vision-based method: malware patterns are visualized as images and their features are characterized. Class vectors and feature vectors are generated alternately using ensemble forests in multiple sequential layers to classify malware. This paper proposes a hybrid stacked multilayered ensembling approach that is more robust and efficient than deep learning models. The proposed model outperforms machine learning and deep learning models with an accuracy of 98.91%. The proposed system works well for both small-scale and large-scale data, since it sets its parameters (the number of sequential levels) adaptively and automatically. It is computationally efficient in terms of resources and time, and uses far fewer hyper-parameters than deep neural networks.
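
The layer-by-layer alternation of feature and class vectors reads like a cascade of forests; a hedged sketch of that general pattern, with out-of-fold class probabilities appended to the features at each layer, is shown below (layer count and forest sizes are assumptions).

```python
# Sketch of a multilayered random-forest ensemble: each layer appends
# its out-of-fold class probabilities to the features for the next.
# Depth and forest sizes are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

def cascade_fit_predict(X_train, y_train, X_test, n_layers=3):
    A_train, A_test = X_train, X_test
    for _ in range(n_layers):
        rf = RandomForestClassifier(n_estimators=100, random_state=0)
        # out-of-fold probabilities avoid leaking labels into the next layer
        p_train = cross_val_predict(rf, A_train, y_train, cv=3,
                                    method="predict_proba")
        p_test = rf.fit(A_train, y_train).predict_proba(A_test)
        A_train = np.hstack([X_train, p_train])
        A_test = np.hstack([X_test, p_test])
    return rf.fit(A_train, y_train).predict(A_test)

X, y = make_classification(n_samples=300, n_features=20, n_classes=3,
                           n_informative=5, random_state=0)
preds = cascade_fit_predict(X[:200], y[:200], X[200:])
print("test accuracy:", (preds == y[200:]).mean())
```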

Priyamvada Davuluru, Venkata Salini, Narayanan Narayanan, Barath, Balster, Eric J..  2019.  Convolutional Neural Networks as Classification Tools and Feature Extractors for Distinguishing Malware Programs. 2019 IEEE National Aerospace and Electronics Conference (NAECON). :273—278.

Classifying malware programs is a research area attracting great interest from the anti-malware industry. In this research, we propose a system that visualizes malware programs as images and distinguishes them using Convolutional Neural Networks (CNNs). We study the performance of several well-established CNN-based algorithms such as AlexNet, ResNet and VGG16 using transfer learning approaches. We also propose a computationally efficient CNN-based architecture for classification of malware programs. In addition, we study the performance of these CNNs as feature extractors, using Support Vector Machines (SVM) and K-nearest Neighbors (kNN) for classification purposes. We also propose fusion methods to boost the performance further. We make use of the publicly available database provided by the Microsoft Malware Classification Challenge (BIG 2015) for this study. Our overall performance is 99.4% on a set of 2174 test samples comprising 9 different classes, thereby setting a new benchmark.
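
The feature-extractor experiments follow a standard pattern: strip a pretrained CNN's classification head and hand its embeddings to an SVM or kNN. A minimal sketch, with ResNet-18 standing in for the larger networks the authors used:

```python
# Pretrained CNN as a fixed feature extractor feeding a classical
# classifier. ResNet-18 stands in for AlexNet/ResNet/VGG16 here.
import torch
import torchvision.models as models
from sklearn.svm import SVC

resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
resnet.fc = torch.nn.Identity()       # drop the classification head
resnet.eval()

@torch.no_grad()
def extract(batch):                   # batch: (N, 3, 224, 224) image tensors
    return resnet(batch).numpy()      # -> (N, 512) feature vectors

# svm = SVC(kernel="rbf").fit(extract(train_images), train_labels)
# preds = svm.predict(extract(test_images))   # names are illustrative
```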

Qu, Wei, Shi, Xiao, Li, Dongbao.  2019.  Malware Classification System Based on Machine Learning. 2019 Chinese Control And Decision Conference (CCDC). :647—652.

The main challenge for malware researchers is the large amount of data and files that need to be evaluated for potential threats. Researchers analyze a large number of new malware samples daily and classify them in order to extract common features. Therefore, a system that can ensure and improve the efficiency and accuracy of the classification is of great significance for the study of malware characteristics. This paper proposes a high-performance, high-efficiency automatic classification system based on machine learning with multi-feature selection fusion. According to our experiments, its performance and efficiency are greatly improved compared to single-feature systems.

Mahajan, Ginika, Saini, Bhavna, Anand, Shivam.  2019.  Malware Classification Using Machine Learning Algorithms and Tools. 2019 Second International Conference on Advanced Computational and Communication Paradigms (ICACCP). :1—8.

Malware classification is the process of categorizing families of malware on the basis of their signatures. This work focuses on classifying emerging malware on the basis of comparable features of similar malware. This paper proposes a novel framework that categorizes malware samples into their families and can identify new malware samples for analysis. For this, six diverse machine learning classification techniques are used. To obtain more comparative, and thus more accurate, classification results, the analysis is done using two different tools, namely KNIME and Orange. The proposed work can help in identifying, and thus cleaning, new malware and classifying malware into families. The correctness of family classification is investigated in terms of the confusion matrix, accuracy and Cohen's kappa. The evaluation shows that Random Forest gives the highest accuracy.
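
The three reported quality measures are standard and easy to reproduce; a minimal sketch with toy family labels (the label names here are illustrative, not the paper's dataset):

```python
# Confusion matrix, accuracy and Cohen's kappa for a family classifier.
# Labels are toy examples.
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

y_true = ["zeus", "zeus", "ramnit", "lollipop", "ramnit", "zeus"]
y_pred = ["zeus", "ramnit", "ramnit", "lollipop", "ramnit", "zeus"]

print(confusion_matrix(y_true, y_pred, labels=["lollipop", "ramnit", "zeus"]))
print("accuracy:", accuracy_score(y_true, y_pred))
print("Cohen's kappa:", cohen_kappa_score(y_true, y_pred))
```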

Tran, Trung Kien, Sato, Hiroshi, Kubo, Masao.  2019.  Image-Based Unknown Malware Classification with Few-Shot Learning Models. 2019 Seventh International Symposium on Computing and Networking Workshops (CANDARW). :401—407.

Knowing the malware type in every malware attack is very helpful to administrators in setting proper defense policies for their systems. It would be a massive benefit to organizations, as well as society, if automatic protection systems could themselves detect and classify new malware types across the whole network system from only a few malware samples. This capability helps to prevent the spread of malware as soon as any damage is caused to the networks. The approach introduced in this paper takes advantage of one-shot/few-shot learning algorithms to solve the malware classification problem, using well-known models such as Matching Networks and Prototypical Networks. To demonstrate the efficiency of the approach, we run experiments on two malware datasets (namely, MalImg and the Microsoft Malware Classification Challenge), and both experiments yield very high accuracies. We confirm that correctly applying models from the machine learning area can bring excellent performance compared with traditional methods, opening a new area of malware research.
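
The prototypical-network step reduces to a simple rule: embed the few labeled samples, average per class, and assign queries to the nearest prototype. A sketch with the embedding network abstracted away (the synthetic vectors below stand in for embedded malware images):

```python
# Core of a prototypical-network episode: class prototype = mean
# embedded support vector; queries go to the nearest prototype.
# Embeddings here are synthetic stand-ins.
import numpy as np

def prototype_classify(support, support_labels, queries):
    classes = np.unique(support_labels)
    protos = np.stack([support[support_labels == c].mean(axis=0) for c in classes])
    d = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return classes[d.argmin(axis=1)]

rng = np.random.default_rng(0)
support = np.vstack([rng.normal(0, 1, (5, 64)), rng.normal(3, 1, (5, 64))])
labels = np.array([0] * 5 + [1] * 5)
queries = rng.normal(3, 1, (4, 64))            # should resemble class 1
print(prototype_classify(support, labels, queries))
```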

Lo, Wai Weng, Yang, Xu, Wang, Yapeng.  2019.  An Xception Convolutional Neural Network for Malware Classification with Transfer Learning. 2019 10th IFIP International Conference on New Technologies, Mobility and Security (NTMS). :1—5.

In this work, we applied a deep Convolutional Neural Network (CNN) with the Xception model to perform malware image classification. The Xception model is a recently developed special CNN architecture that is more powerful, with fewer overfitting problems, than currently popular CNN models such as VGG16. However, only a few use cases of the Xception model can be found in the literature, and it has never been used for the malware classification problem. The performance of our approach was compared with other methods including KNN, SVM and VGG16. Experiments on two datasets (Malimg and the Microsoft Malware Dataset) demonstrated that the Xception model achieves higher training accuracy than all other approaches, including the champion approach, and higher validation accuracy than all other image-based malware classification approaches, including the VGG16 model (except the champion solution, for which this information was not provided). Additionally, we proposed a novel ensemble model that combines the predictions from .bytes files and .asm files, showing that a lower logloss can be achieved. Although the champion of the Microsoft Malware Dataset competition achieved a slightly lower logloss, our approach requires no feature engineering, making it better able to adapt to future evolution in malware and far less time-consuming than the champion's solution.
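
A typical transfer-learning setup with an Xception base, of the kind the paper applies to malware images, looks like the sketch below; the head size, class count and training schedule are assumptions (grayscale malware images would also need replicating to three channels for the ImageNet weights).

```python
# Frozen ImageNet-pretrained Xception base with a new classification
# head. Head width and the 25-class output (e.g., Malimg families)
# are illustrative assumptions.
import tensorflow as tf

base = tf.keras.applications.Xception(weights="imagenet", include_top=False,
                                      input_shape=(299, 299, 3), pooling="avg")
base.trainable = False                # keep pretrained features fixed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(25, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # datasets assumed
```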

Jiang, Jianguo, Li, Song, Yu, Min, Li, Gang, Liu, Chao, Chen, Kai, Liu, Hui, Huang, Weiqing.  2019.  Android Malware Family Classification Based on Sensitive Opcode Sequence. 2019 IEEE Symposium on Computers and Communications (ISCC). :1—7.

Android malware family classification is an advanced task in Android malware analysis, detection and forensics. Existing methods and models have achieved a certain success in Android malware detection, but accuracy and efficiency still fall short of expectations, especially in the context of multi-class classification with imbalanced training data. To address these challenges, we propose an Android malware family classification model that analyzes the code's specific semantic information based on sensitive opcode sequences. In this work, we construct a sensitive semantic feature, the sensitive opcode sequence, from opcodes, sensitive APIs, STRs and actions, and generate a semantically related vector for Android malware family classification based on this feature. In addition, for minority families, we adopt an oversampling technique based on the sensitive opcode sequence. Finally, we evaluate our method on the Drebin dataset, selecting the top 40 malware families for experiments. The experimental results show that the total accuracy and average AUC (area under curve) reach 99.50% and 98.86%, at 45.17 s per Android malware sample, and these results remain good even as the number of malware families increases.

2020-10-26
Black, Paul, Gondal, Iqbal, Vamplew, Peter, Lakhotia, Arun.  2019.  Evolved Similarity Techniques in Malware Analysis. 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :404–410.

Malware authors are known to reuse existing code; this development process results in software evolution and a sequence of versions of a malware family containing functions that show a divergence from the initial version. This paper proposes the term evolved similarity to account for this gradual divergence of similarity across the version history of a malware family. While existing techniques are able to match functions in different versions of malware, these techniques work best when the version changes are relatively small. This paper introduces the concept of evolved similarity and presents automated Evolved Similarity Techniques (EST). EST differs from existing malware function similarity techniques by focusing on the identification of significantly modified functions in adjacent malware versions, and may also be used to identify function similarity in malware samples that differ by several versions. The challenge in identifying evolved malware function pairs lies in identifying features that are relatively invariant across evolved code. The research in this paper makes use of the function call graph to establish these features and then demonstrates the use of these techniques on Zeus malware.

Li, Huhua, Zhan, Dongyang, Liu, Tianrui, Ye, Lin.  2019.  Using Deep-Learning-Based Memory Analysis for Malware Detection in Cloud. 2019 IEEE 16th International Conference on Mobile Ad Hoc and Sensor Systems Workshops (MASSW). :1–6.

Malware is one of the biggest threats in cloud computing. Malware running inside virtual machines or containers can steal critical information or go on to attack other cloud nodes. To detect malware in the cloud, especially zero-day malware, signature- and machine-learning-based approaches have been proposed to analyze the execution binary. However, malicious binaries may not be stored permanently in the file system of the virtual machine or container, so a periodic scanner may not find the target files. Dynamic analysis approaches usually introduce run-time overhead to virtual machines and are therefore not widely used in the cloud. To solve these problems, we propose a memory analysis approach to detect malware, employing deep learning. The system analyzes the memory image periodically during malware execution, which introduces no run-time overhead. We first extract a memory snapshot from the running virtual machine or container. Then, the snapshot is converted to a grayscale image. Finally, we employ a CNN to detect malware. In the learning phase, the model is trained on malicious and benign software. In the testing phase, we test our system with real-world malware.
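
The detection loop can be pictured as below: snapshot guest memory, render it as a fixed-size grayscale image, and score it with the trained CNN; the snapshot source, image size and model are placeholders for the paper's introspection pipeline.

```python
# Sketch of the periodic scan: memory snapshot -> grayscale image ->
# CNN verdict. read_guest_memory and trained_cnn are hypothetical
# placeholders for the paper's VM/container introspection and model.
import numpy as np

def snapshot_to_image(raw_bytes, side=224):
    arr = np.frombuffer(raw_bytes, dtype=np.uint8)
    arr = np.resize(arr, side * side)          # repeat or truncate to fit
    return arr.reshape(side, side).astype(np.float32) / 255.0

def scan(snapshot_bytes, cnn):
    img = snapshot_to_image(snapshot_bytes)[None, :, :, None]  # NHWC batch of 1
    return float(cnn.predict(img)[0, 0])       # probability of malware

# score = scan(read_guest_memory(vm_id), trained_cnn)  # names hypothetical
```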
Malware is one of the biggest threats in cloud computing. Malware running inside virtual machines or containers could steal critical information or continue to attack other cloud nodes. To detect malware in cloud, especially zero-day malware, signature-and machine-learning-based approaches are proposed to analyze the execution binary. However, malicious binary files may not permanently be stored in the file system of virtual machine or container, periodically scanner may not find the target files. Dynamic analysis approach usually introduce run-time overhead to virtual machines, which is not widely used in cloud. To solve these problems, we propose a memory analysis approach to detect malware, employing the deep learning technology. The system analyzes the memory image periodically during malware execution, which will not introduce run-time overhead. We first extract the memory snapshot from running virtual machines or containers. Then, the snapshot is converted to a grayscale image. Finally, we employ CNN to detect malware. In the learning phase, malicious and benign software are trained. In the testing phase, we test our system with real-world malwares.