Biblio

Found 267 results

Filters: Keyword is graph theory
2023-09-18
Dvorak, Stepan, Prochazka, Pavel, Bajer, Lukas.  2022.  GNN-Based Malicious Network Entities Identification In Large-Scale Network Data. NOMS 2022-2022 IEEE/IFIP Network Operations and Management Symposium. :1–4.
A reliable database of Indicators of Compromise (IoCs) is a cornerstone of almost every malware detection system. Building the database and keeping it up-to-date is a lengthy and often manual process where each IoC should be manually reviewed and labeled by an analyst. In this paper, we focus on an automatic way of identifying IoCs intended to save analysts’ time and scale to the volume of network data. We leverage relations of each IoC to other entities on the internet to build a heterogeneous graph. We formulate a classification task on this graph and apply graph neural networks (GNNs) in order to identify malicious domains. Our experiments show that the presented approach provides promising results on the task of identifying high-risk malware as well as on classifying legitimate domains.
Amer, Eslam, Samir, Adham, Mostafa, Hazem, Mohamed, Amer, Amin, Mohamed.  2022.  Malware Detection Approach Based on the Swarm-Based Behavioural Analysis over API Calling Sequence. 2022 2nd International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC). :27–32.
The rapidly increasing malware threats must be countered with new, effective malware detection methodologies. Current malware threats are not limited to daily personal transactions but reach deep into large enterprises and organizations. This paper introduces a new methodology for detecting and discriminating malicious versus normal applications. We employed ant colony optimization to generate two behavioural graphs that characterize the difference in execution behavior between malware and normal applications. Our proposed approach relied on the API call sequence generated when an application is executed; API calls are among the most widely used features in dynamic malware analysis. Our proposed method showed distinctive behavioral differences between malicious and non-malicious applications. Our experimental results showed performance comparable to other machine learning methods. Therefore, our method can be employed as an efficient technique for capturing malicious applications.
Cao, Michael, Ahmed, Khaled, Rubin, Julia.  2022.  Rotten Apples Spoil the Bunch: An Anatomy of Google Play Malware. 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE). :1919–1931.
This paper provides an in-depth analysis of Android malware that bypassed the strictest defenses of the Google Play application store and penetrated the official Android market between January 2016 and July 2021. We systematically identified 1,238 such malicious applications, grouped them into 134 families, and manually analyzed one application from 105 distinct families. During our manual analysis, we identified malicious payloads the applications execute, conditions guarding execution of the payloads, hiding techniques applications employ to evade detection by the user, and other implementation-level properties relevant for automated malware detection. As most applications in our dataset contain multiple payloads, each triggered via its own complex activation logic, we also contribute a graph-based representation showing activation paths for all application payloads in the form of a control- and data-flow graph. Furthermore, we discuss the capabilities of existing malware detection tools, put them in the context of the properties observed in the analyzed malware, and identify gaps and future research directions. We believe that our detailed analysis of the recent, evasive malware will be of interest to researchers and practitioners and will help further improve malware detection tools.
Herath, Jerome Dinal, Wakodikar, Priti Prabhakar, Yang, Ping, Yan, Guanhua.  2022.  CFGExplainer: Explaining Graph Neural Network-Based Malware Classification from Control Flow Graphs. 2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). :172–184.
With the ever increasing threat of malware, extensive research effort has been put into applying Deep Learning to malware classification tasks. Graph Neural Networks (GNNs) that process malware as Control Flow Graphs (CFGs) have shown great promise for malware classification. However, these models are viewed as black boxes, which makes it hard to validate and identify malicious patterns. To that end, we propose CFGExplainer, a deep learning based model for interpreting GNN-oriented malware classification results. CFGExplainer identifies a subgraph of the malware CFG that contributes most towards classification and provides insight into the importance of the nodes (i.e., basic blocks) within it. To the best of our knowledge, CFGExplainer is the first work that explains GNN-based malware classification. We compared CFGExplainer against three explainers, namely GNNExplainer, SubgraphX and PGExplainer, and showed that CFGExplainer is able to identify top equisized subgraphs with higher classification accuracy than the other three models.
Jia, Jingyun, Chan, Philip K.  2022.  Representation Learning with Function Call Graph Transformations for Malware Open Set Recognition. 2022 International Joint Conference on Neural Networks (IJCNN). :1–8.
The open set recognition (OSR) problem has been a challenge in many machine learning (ML) applications, such as security. As new/unknown malware families occur regularly, it is difficult to exhaust samples that cover all the classes for the training process in ML systems. An advanced malware classification system should classify the known classes correctly while remaining sensitive to the unknown class. In this paper, we introduce a self-supervised pre-training approach for the OSR problem in malware classification. We propose two transformations for the function call graph (FCG) based malware representations to facilitate the pretext task. Also, we present a statistical thresholding approach to find the optimal threshold for the unknown class. Moreover, the experiment results indicate that our proposed pre-training process can improve the performance of different downstream loss functions for the OSR problem.
Ding, Zhenquan, Xu, Hui, Guo, Yonghe, Yan, Longchuan, Cui, Lei, Hao, Zhiyu.  2022.  Mal-Bert-GCN: Malware Detection by Combining Bert and GCN. 2022 IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :175–183.
With the dramatic increase in malicious software, the sophistication and innovation of malware have increased over the years. In particular, dynamic analysis based on deep neural networks has shown high accuracy in malware detection. However, most of the existing methods only employ the raw API sequence feature, which cannot accurately reflect the actual behavior of malicious programs in detail. The relationship between API calls is critical for detecting suspicious behavior. Therefore, this paper proposes a malware detection method based on the graph neural network. We first connect the API sequences executed by different processes to build a directed process graph. Then, we apply Bert to encode the API sequences of each process into node embeddings, which capture the semantic execution information inside the processes. Finally, we employ GCN to mine the deep semantic information based on the directed process graph and node embeddings. In addition to presenting the design, we have implemented and evaluated our method on a dataset of 10,000 malware and 10,000 benign software samples. The results show that the precision and recall of our detection model reach 97.84% and 97.83%, verifying the effectiveness of our proposed method.
Pranav, Putsa Rama Krishna, Verma, Sachin, Shenoy, Sahana, Saravanan, S.  2022.  Detection of Botnets in IoT Networks using Graph Theory and Machine Learning. 2022 6th International Conference on Trends in Electronics and Informatics (ICOEI). :590–597.
The Internet of things (IoT) is proving to be a boon in granting internet access to regularly used objects and devices. Sensors, programs, and other innovations interact and trade information with different gadgets and frameworks over the web. Even in modern times, IoT gadgets suffer from primary security threats, which expose them to many dangers and malware, one among them being IoT botnets. Botnets carry out attacks by serving as a vector, and this has become one of the significant dangers on the Internet. These vectors act against associations and carry out cybercrimes. They are used to produce spam, DDoS attacks, and click fraud, and to steal confidential data. Unlike common malware on PCs and Android devices, IoT gadgets bring various challenges because they have heterogeneous processor architectures. Numerous studies use static or dynamic analysis for detection and classification of botnets on IoT gadgets. Most researchers have not addressed the multi-architecture issue, and they use a lot of computing resources for analysis. Therefore, this approach attempts to classify botnets in IoT by using PSI-Graphs, which effectively addresses the problem of encryption in IoT botnet detection, tackles the multi-architecture problem, and reduces computation time. It proposes another methodology for describing and recognizing botnets, utilizing graph-based Machine Learning techniques and Exploratory Data Analysis to analyze the data and identify how separable the data is, so that bots can be recognized at an earlier stage and IoT devices can be protected from attack.
Warmsley, Dana, Waagen, Alex, Xu, Jiejun, Liu, Zhining, Tong, Hanghang.  2022.  A Survey of Explainable Graph Neural Networks for Cyber Malware Analysis. 2022 IEEE International Conference on Big Data (Big Data). :2932–2939.
Malicious cybersecurity activities have become increasingly worrisome for individuals and companies alike. While machine learning methods like Graph Neural Networks (GNNs) have proven successful on the malware detection task, their output is often difficult to understand. Explainable malware detection methods are needed to automatically identify malicious programs and present results to malware analysts in a way that is human interpretable. In this survey, we outline a number of GNN explainability methods and compare their performance on a real-world malware detection dataset. Specifically, we formulated the detection problem as a graph classification problem on the malware Control Flow Graphs (CFGs). We find that gradient-based methods outperform perturbation-based methods in terms of computational expense and performance on explainer-specific metrics (e.g., Fidelity and Sparsity). Our results provide insights into designing new GNN-based models for cyber malware detection and attribution.
Wang, Rui, Zheng, Jun, Shi, Zhiwei, Tan, Yu'an.  2022.  Detecting Malware Using Graph Embedding and DNN. 2022 International Conference on Blockchain Technology and Information Security (ICBCTIS). :28–31.
Nowadays, the popularity of intelligent terminals makes malware an increasingly serious problem. Among the many features of an application, the call graph can accurately express the behavior of the application. The rapid development of graph neural networks in recent years provides a new solution for malware analysis of applications using call graphs as features. However, there are still problems such as low accuracy. This paper established a large-scale data set containing more than 40,000 samples, selected the class call graph extracted from the application as the feature, and used graph embedding combined with a deep neural network to detect malware. The experimental results show that the accuracy of the detection model proposed in this paper is 97.7%; the precision is 96.6%; the recall is 96.8%; the F1-score is 96.4%, which is better than the existing detection model based on Markov chain and graph embedding detection model.
Oshio, Kei, Takada, Satoshi, Han, Chansu, Tanaka, Akira, Takeuchi, Jun'ichi.  2022.  Poster: Flexible Function Estimation of IoT Malware Using Graph Embedding Technique. 2022 IEEE Symposium on Computers and Communications (ISCC). :1–3.
Most IoT malware consists of variants generated by editing and reusing parts of the functions in publicly available source code. In our previous study, we proposed a method to estimate the functions of a specimen using the Function Call Sequence Graph (FCSG), which is a directed graph of the execution sequence of function calls. In the FCSG-based method, the subgraph corresponding to a malware functionality is manually created and called a signature-FCSG. The specimens with the signature-FCSG are expected to have the corresponding functionality. However, this method cannot detect the specimens with a slightly different subgraph from the signature-FCSG. This paper found that these specimens were supposed to have the same functionality for a signature-FCSG. These specimens need more flexible signature matching, and we propose a graph embedding technique to realize it.
2023-05-12
Zhang, Chen, Wu, Zhouyang, Li, Xianghua, Liang, Jian, Jiang, Zhongyao, Luo, Ceheng, Wen, Fangjun, Wang, Guangda, Dai, Wei.  2022.  Resilience Assessment Method of Integrated Electricity and Gas System Based on Hetero-functional Graph Theory. 2022 2nd International Conference on Electrical Engineering and Control Science (IC2ECS). :34–39.
The resilience assessment of electric and gas networks gains importance due to increasing interdependencies caused by the coupling of gas-fired units. However, the gradually increasing scale of the integrated electricity and gas system (IEGS) poses a significant challenge to current assessment methods. The numerical analysis method is accurate but time-consuming, which may incur a significant computational cost in large-scale IEGS. Therefore, this paper proposes a resilience assessment method based on hetero-functional graph theory (HFGT) for IEGS to balance the accuracy with the computational complexity. In contrast to traditional graph theory, HFGT can effectively depict the coupled systems with inherent heterogeneity and can represent the structure of heterogeneous functional systems in a clear and unambiguous way. In addition, due to the advantages of modelling the system functionality, the effect of line-pack in the gas network on the system resilience is depicted more precisely in this paper. Simulation results on an IEGS with the IEEE 9-bus system and a 7-node gas system verify the effectiveness of the proposed method.
2023-04-28
Zhu, Yuwen, Yu, Lei.  2022.  A Modeling Method of Cyberspace Security Structure Based on Layer-Level Division. 2022 IEEE 5th International Conference on Computer and Communication Engineering Technology (CCET). :247–251.
As the cyberspace structure becomes more and more complex, the problems of dynamic network space topology, complex composition structure, large spanning space scale, and a high degree of self-organization are becoming more and more important. In this paper, we model the cyberspace elements and their dependencies by combining the knowledge of graph theory. Layer adopts a network space modeling method combining virtual and real, and level adopts a spatial iteration method. Combining the layer-level models into one, this paper proposes a fast modeling method for cyberspace security structure model with network connection relationship, hierarchical relationship, and vulnerability information as input. This method can not only clearly express the individual vulnerability constraints in the network space, but also clearly express the hierarchical relationship of the complex dependencies of network individuals. For independent network elements or independent network element groups, it has flexibility and can greatly reduce the computational complexity in later applications.
Gao, Hongbin, Wang, Shangxing, Zhang, Hongbin, Liu, Bin, Zhao, Dongmei, Liu, Zhen.  2022.  Network Security Situation Assessment Method Based on Absorbing Markov Chain. 2022 International Conference on Networking and Network Applications (NaNA). :556–561.
This paper proposes a new network security evaluation method based on an absorbing Markov chain. This method is different from other network security situation assessment methods based on graph theory. It effectively addresses issues such as the poor objectivity of other methods, incomplete consideration of evaluation factors, and mismatch between evaluation results and the actual situation of the network. Firstly, this method collects the security elements in the network. Then, using graph theory combined with the absorbing Markov chain, the threat values of vulnerable nodes are calculated and sorted. Finally, the maximum possible attack path is obtained by blending network asset information to determine the current network security status. The experimental results show that the method fully considers the vulnerability and threat node ranking and the specific case of system network assets, which makes the evaluation result close to the actual network situation.
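The abstract does not give the authors' formulas, so the following is only a minimal sketch of the standard absorbing Markov chain machinery it refers to, not the paper's implementation. Given the transient-to-transient block Q of a transition matrix, the fundamental matrix N = (I - Q)^-1 gives the expected number of visits to each transient state before absorption. The three-node attack graph, its node roles, and its transition probabilities are hypothetical.

```python
from fractions import Fraction

def fundamental_matrix(Q):
    """Return N = (I - Q)^-1 using Gauss-Jordan elimination on exact fractions."""
    n = len(Q)
    # Augmented matrix [I - Q | I].
    A = [[Fraction(int(i == j)) - Fraction(Q[i][j]) for j in range(n)]
         + [Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for col in range(n):
        # Pick a row with a non-zero pivot and move it into place.
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [x / pv for x in A[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]

# Hypothetical transient states: 0 = entry host, 1 = web server, 2 = database.
# Missing probability mass in each row leads to the absorbing "goal" state.
half, quarter = Fraction(1, 2), Fraction(1, 4)
Q = [[0, half, quarter],
     [0, 0, half],
     [0, 0, 0]]

N = fundamental_matrix(Q)
expected_visits_from_entry = N[0]              # visits to each node from node 0
expected_steps_from_entry = sum(N[0])          # steps before absorption
```

Sorting the entries of `expected_visits_from_entry` is one plausible way to rank nodes by exposure; whether the paper ranks threat values this way is an assumption.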
Yang, Hongna, Zhang, Yiwei.  2022.  On an extremal problem of regular graphs related to fractional repetition codes. 2022 IEEE International Symposium on Information Theory (ISIT). :1566–1571.
Fractional repetition (FR) codes are a special family of regenerating codes with the repair-by-transfer property. The constructions of FR codes are naturally related to combinatorial designs, graphs, and hypergraphs. Given the file size of an FR code, it is desirable to determine the minimum number of storage nodes needed. The problem is related to an extremal graph theory problem, which asks for the minimum number of vertices of an α-regular graph such that any subgraph with k vertices has at most δ edges. In this paper, we present a class of regular graphs for this problem to give the bounds for the minimum number of storage nodes for the FR codes.
ISSN: 2157-8117
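The extremal condition stated in this abstract can be checked by brute force on small graphs. The sketch below is an illustration of the problem statement only, not the construction from the paper: it tests whether a graph is α-regular and computes the largest number of edges induced by any k-vertex subset, which must not exceed δ. The function names and the 5-cycle example are hypothetical.

```python
from collections import Counter
from itertools import combinations

def is_regular(edges, vertices, alpha):
    """True if every vertex has degree exactly alpha."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return all(degree[v] == alpha for v in vertices)

def max_edges_on_k_vertices(edges, vertices, k):
    """Largest number of edges induced by any k-vertex subset (exhaustive search)."""
    best = 0
    for subset in combinations(vertices, k):
        chosen = set(subset)
        induced = sum(1 for u, v in edges if u in chosen and v in chosen)
        best = max(best, induced)
    return best

# Hypothetical example: the 5-cycle, a 2-regular graph on 5 vertices.
vertices = list(range(5))
edges = [(i, (i + 1) % 5) for i in vertices]

regular = is_regular(edges, vertices, alpha=2)
worst_case = max_edges_on_k_vertices(edges, vertices, k=3)
```

The exhaustive search is exponential in k, so it is only useful for sanity-checking small candidate graphs.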
2022-12-09
Zhai, Lijing, Vamvoudakis, Kyriakos G., Hugues, Jérôme.  2022.  A Graph-Theoretic Security Index Based on Undetectability for Cyber-Physical Systems. 2022 American Control Conference (ACC). :1479–1484.
In this paper, we investigate the conditions for the existence of dynamically undetectable attacks and perfectly undetectable attacks. Then we provide a quantitative measure of the security of discrete-time linear time-invariant (LTI) systems under both actuator and sensor attacks based on undetectability. Finally, the computation of the proposed security index is reduced to a min-cut problem for the structured systems by graph theory. Numerical examples are provided to illustrate the theoretical results.
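How the paper builds its graph from the structured system is not described in the abstract, so the sketch below only illustrates the generic min-cut step. It computes a maximum flow with the Edmonds-Karp algorithm; by max-flow/min-cut duality that value equals the capacity of a minimum cut. The node names and unit capacities are hypothetical.

```python
from collections import deque

def min_cut_value(capacity, source, sink):
    """Max flow via Edmonds-Karp; equals the min-cut capacity by duality."""
    # Residual capacities, with reverse edges initialised to 0.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left
        # Bottleneck capacity along the path found.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push flow along the path and update the residual graph.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

# Hypothetical graph with unit capacities.
capacity = {
    "s": {"a": 1, "b": 1},
    "a": {"t": 1, "b": 1},
    "b": {"t": 1},
}
cut = min_cut_value(capacity, "s", "t")
```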
2022-12-01
Kandaperumal, Gowtham, Pandey, Shikhar, Srivastava, Anurag.  2022.  AWR: Anticipate, Withstand, and Recover Resilience Metric for Operational and Planning Decision Support in Electric Distribution System. IEEE Transactions on Smart Grid. 13:179–190.
With the increasing number of catastrophic weather events and resulting disruption in the energy supply to essential loads, the distribution grid operators’ focus has shifted from reliability to resiliency against high impact, low-frequency events. Given the enhanced automation to enable the smarter grid, there are several assets/resources at the disposal of electric utilities to enhance resiliency. However, with a lack of comprehensive resilience tools for informed operational decisions and planning, utilities face a challenge in investing and prioritizing operational control actions for resiliency. The distribution system resilience is also highly dependent on system attributes, including network, control, generating resources, location of loads and resources, as well as the progression of an extreme event. In this work, we present a novel multi-stage resilience measure called the Anticipate-Withstand-Recover (AWR) metrics. The AWR metrics are based on integrating relevant ‘system characteristics based factors’, before, during, and after the extreme event. The developed methodology utilizes a pragmatic and flexible approach by adopting concepts from the national emergency preparedness paradigm, proactive and reactive controls of grid assets, graph theory with system and component constraints, and multi-criteria decision-making process. The proposed metrics are applied to provide decision support for a) the operational resilience and b) planning investments, and validated for a real system in Alaska during the entirety of the event progression.
2022-09-20
Koteshwara, Sandhya.  2021.  Security Risk Assessment of Server Hardware Architectures Using Graph Analysis. 2021 Asian Hardware Oriented Security and Trust Symposium (AsianHOST). :1–4.
The growing complexity of server architectures, which incorporate several components with state, has necessitated rigorous assessment of the security risk both during design and operation. In this paper, we propose a novel technique to model the security risk of servers by mapping their architectures to graphs. This allows us to leverage tools from computational graph theory, which we combine with probability theory for deriving quantitative metrics for risk assessment. Probability of attack is derived for server components, with prior probabilities assigned based on knowledge of existing vulnerabilities and countermeasures. The resulting analysis is further used to compute measures of impact and exploitability of attack. The proposed methods are demonstrated on two open-source server designs with different architectures.
2022-03-08
Choucri, Nazli, Agarwal, Gaurav.  2022.  International Law for Cyber Operations: Networks, Complexity, Transparency. MIT Political Science Network. :1–38.
Policy documents are usually written in text form—word after word, sentence after sentence, page after page, section after section, chapter after chapter—which often masks some of their most critical features. The text form cannot easily show interconnections among elements, identify the relative salience of issues, or represent feedback dynamics, for example. These are “hidden” features that are difficult to situate. This paper presents a computational analysis of Tallinn Manual 2.0 on the International Law Applicable to Cyber Operations, a seminal work in International Law. Tallinn Manual 2.0 is a seminal document for many reasons, including but not limited to, its (a) authoritative focus on cyber operations, (b) foundation in the fundamental legal principles of the international order and (c) direct relevance to theory, practice, and policy in international relations. The results identify the overwhelming dominance of specific Rules, the centrality of select Rules, the Rules with autonomous standing (that is, not connected to the rest of the corpus), and highlight different aspects of Tallinn Manual 2.0, notably situating authority, security of information -- the feedback structure that keeps the pieces together. This study serves as a “proof of concept” for the use of computational logics to enhance our understanding of policy documents.
2021-09-21
Barr, Joseph R., Shaw, Peter, Abu-Khzam, Faisal N., Yu, Sheng, Yin, Heng, Thatcher, Tyler.  2020.  Combinatorial Code Classification & Vulnerability Rating. 2020 Second International Conference on Transdisciplinary AI (TransAI). :80–83.
Empirical analysis of source code of Android Fluoride Bluetooth stack demonstrates a novel approach of classification of source code and rating for vulnerability. A workflow that combines deep learning and combinatorial techniques with a straightforward random forest regression is presented. Two kinds of embedding are used: code2vec and LSTM, resulting in a distance matrix that is interpreted as a (combinatorial) graph whose vertices represent code components, functions and methods. Cluster Editing is then applied to partition the vertex set of the graph into subsets representing nearly complete subgraphs. Finally, the vectors representing the components are used as features to model the components for vulnerability risk.
Li, Mingxuan, Lv, Shichao, Shi, Zhiqiang.  2020.  Malware Detection for Industrial Internet Based on GAN. 2020 IEEE International Conference on Information Technology, Big Data and Artificial Intelligence (ICIBA). 1:475–481.
This thesis focuses on the detection of malware in the industrial Internet. The basic flow of malware detection contains feature extraction and sample identification. An API graph can effectively represent the behavior information of malware. However, due to the high algorithmic complexity of solving the subgraph isomorphism problem, the efficiency of analysis based on graph structure features is low. Because the API graphs of different malicious codes differ in scale, the API graph needs to be normalized. Considering the difficulties of sample collection and manual marking, it is necessary to expand the number of malware samples in the industrial Internet. This paper proposes a method that combines PageRank with TF-IDF to process the API graph. Besides, this paper proposes a method to construct adversarial malware samples based on GAN.
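The abstract does not say how PageRank and TF-IDF are combined, so the sketch below covers only the PageRank half: a plain power iteration over a small directed API call graph. It is not the authors' method. The graph, the choice of API names as nodes, and the damping factor of 0.85 are assumptions made for illustration.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict mapping each node to its successors."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node receives the teleportation share first.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            # A node without successors spreads its rank over all nodes.
            targets = graph[u] or nodes
            share = damping * rank[u] / len(targets)
            for v in targets:
                new_rank[v] += share
        rank = new_rank
    return rank

# Hypothetical API call graph; every node must appear as a key.
api_graph = {
    "OpenProcess": ["WriteProcessMemory"],
    "RegOpenKey": ["WriteProcessMemory"],
    "WriteProcessMemory": ["CreateRemoteThread"],
    "CreateRemoteThread": [],
}
ranks = pagerank(api_graph)
```

Because the scores sum to one regardless of graph size, they offer one possible way to compare API graphs of different scales, which is the normalization concern the abstract raises.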
Petrenko, Sergei A., Petrenko, Alexey S., Makoveichuk, Krystina A., Olifirov, Alexander V.  2020.  "Digital Bombs" Neutralization Method. 2020 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus). :446–451.
The article discusses new models and methods for timely identification and blocking of malicious code of critically important information infrastructure based on static and dynamic analysis of executable program codes. A two-stage method for detecting malicious code in the executable program codes (the so-called "digital bombs") is described. The first step of the method is to build the initial program model in the form of a control graph; the construction is carried out at the stage of static analysis of the program. The article discusses the purpose, features and construction criteria of an ordered control graph. The second step of the method is to embed control points in the program's executable code for organizing control of the possible behavior of the program using a specially designed recognition automaton - an automaton of dynamic control. Structural criteria for the completeness of the functional control of the subprogram are given. The practical implementation of the proposed models and methods was completed and presented in a special instrumental complex IRIDA.
Chamotra, Saurabh, Barbhuiya, Ferdous Ahmed.  2020.  Analysis and Modelling of Multi-Stage Attacks. 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :1268–1275.
Honeypots are information system resources used for capturing and analyzing cyber attacks. High-interaction honeypots are capable of capturing attacks in their totality and hence are an ideal choice for capturing multi-stage cyber attacks. The term multi-stage attack is an abstraction that refers to a class of cyber attacks consisting of multiple attack stages. These attack stages are executed either by malicious codes, scripts or sometimes even inbuilt system tools. In the work presented in this paper we propose a framework for capturing, analyzing and modelling multi-stage cyber attacks. The objective of our work is to devise an effective mechanism for the classification of multi-stage cyber attacks. The proposed framework comprises a network of high-interaction honeypots augmented with an attack analysis engine. The analysis engine performs rule-based labeling of captured honeypot data. The labeling engine labels the attack data as generic events. These events are further fused to generate attack graphs. The resulting attack graphs are used to characterize and later classify the multi-stage cyber attacks.
Mohanasruthi, V., Chakraborty, Abhishek, Thanudas, B., Sreelal, S., Manoj, B. S.  2020.  An Efficient Malware Detection Technique Using Complex Network-Based Approach. 2020 National Conference on Communications (NCC). :1–6.
System security is becoming an indispensable part of our daily life due to the rapid proliferation of unknown malware attacks. Recent malware has been found to have a very complicated structure that is hard to detect with traditional malware detection techniques such as antivirus, intrusion detection systems, and network scanners. In this paper, we propose a complex network-based malware detection technique, Malware Detection using Complex Network (MDCN), that considers the Application Program Interface Call Transition Matrix (API-CTM) to generate a complex network topology and then extracts various feature sets by analyzing different metrics of the complex network to distinguish malware from benign applications. The generated feature set is then sent to several machine learning classifiers, which include naive Bayes, support vector machine, random forest, and multilayer perceptron, to comparatively analyze the performance of the MDCN-based technique. The analysis reveals that MDCN shows higher accuracy, with fewer false-positive cases, when the multilayer perceptron-based classifier is used for the detection of malware. The MDCN technique can efficiently be deployed in the design of an integrated enterprise network security system.
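To make the idea of an API call transition matrix concrete, here is a minimal sketch, assuming a single trace of API call names as input. It counts transitions between consecutive calls and derives a few simple network metrics from the resulting directed graph. The trace, the function names, and the choice of metrics are hypothetical; the abstract does not list the metrics MDCN actually uses.

```python
from collections import Counter

def call_transition_counts(api_calls):
    """Count transitions between consecutive API calls in one trace."""
    return Counter(zip(api_calls, api_calls[1:]))

def network_features(transitions):
    """Derive simple directed-graph metrics from the transition counts."""
    nodes = {name for edge in transitions for name in edge}
    # Ignore self-loops when counting distinct directed edges.
    edges = {(src, dst) for src, dst in transitions if src != dst}
    n = len(nodes)
    density = len(edges) / (n * (n - 1)) if n > 1 else 0.0
    out_degree = Counter(src for src, _ in edges)
    return {
        "nodes": n,
        "edges": len(edges),
        "density": density,
        "max_out_degree": max(out_degree.values(), default=0),
    }

# Hypothetical API trace.
trace = ["CreateFile", "WriteFile", "WriteFile",
         "CloseHandle", "CreateFile", "WriteFile"]
transitions = call_transition_counts(trace)
features = network_features(transitions)
```

A feature dictionary like this could be flattened into a vector and passed to any classifier.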
Yan, Fan, Liu, Jia, Gu, Liang, Chen, Zelong.  2020.  A Semi-Supervised Learning Scheme to Detect Unknown DGA Domain Names Based on Graph Analysis. 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :1578–1583.
A large number of malware families use domain generation algorithms (DGA) to randomly generate a large number of domain names. It is a good way to bypass conventional blacklists of domain names, because we cannot predict which of the randomly generated domain names are selected for command and control (C&C) communications. An effective approach for detecting known DGA families is to investigate the malware with reverse engineering to find the adopted generation algorithms. As reverse engineering cannot handle the variants of DGA families, some studies leverage supervised learning to find new variants. However, the explainability of supervised learning is low, and it cannot find previously unseen DGA families. In this paper, we propose a graph-based semi-supervised learning scheme to track the evolution of known DGA families and find previously unseen DGA families. With a domain relation graph, we can clearly figure out how new variants relate to known DGA domain names, which yields better explainability. We deployed the proposed scheme in real network scenarios and show that it can not only comprehensively and precisely find known DGA families, but also find new DGA families which have not been seen before.
Yang, Ping, Shu, Hui, Kang, Fei, Bu, Wenjuan.  2020.  Automatically Generating Malware Summary Using Semantic Behavior Graphs (SBGs). 2020 Information Communication Technologies Conference (ICTC). :282–291.
In malware behavior analysis, the analysis of control flow and data flow has limitations. Researchers have analyzed data flow with dynamic taint analysis tools; however, this is costly. In this paper, we propose a method of generating malware summaries based on semantic behavior graphs (SBGs) to address this issue. We consider the various situations in which behaviors can be associated, and first give an algorithm for generating semantic behavior graphs. Semantic behavior graphs are composed of behavior nodes and associated data edges. Then, we extract behaviors and the logical relationships between behaviors from the semantic behavior graphs, and finally generate a summary of malware behaviors with their true intention. Experimental results show that our approach can effectively identify and describe malicious behaviors and generate accurate behavior summaries.