Biblio
Advanced persistent threats (APT) have increased in recent times as a result of the rise in interest by nation-states and sophisticated corporations to obtain high profile information. Typically, APT attacks are more challenging to detect since they leverage zero-day attacks and common benign tools. Furthermore, these attack campaigns are often prolonged to evade detection. We leverage an approach that uses a provenance graph to obtain execution traces of host nodes in order to detect anomalous behavior. By using the provenance graph, we extract features that are then used to train an online adaptive metric learning. Online metric learning is a deep learning method that learns a function to minimize the separation between similar classes and maximizes the separation between dis-similar instances. We compare our approach with baseline models and we show our method outperforms the baseline models by increasing detection accuracy on average by 11.3 % and increases True positive rate (TPR) on average by 18.3 %.
In this paper, we formulate a combinatorial optimization problem that aims to maximize the accuracy of a lower bound estimate of the probability of security of a multi-robot system (MRS), while minimizing the computational complexity involved in its calculation. Security of an MRS is defined using the well-known control theoretic notion of left invertiblility, and the probability of security of an MRS can be calculated using binary decision diagrams (BDDs). The complexity of a BDD depends on the number of disjoint path sets considered during its construction. Taking into account all possible disjoint paths results in an exact probability of security, however, selecting an optimal subset of disjoint paths leads to a good estimate of the probability while significantly reducing computation. To deal with the dynamic nature of MRSs, we introduce two methods: (1) multi-point optimization, a technique that requires some a priori knowledge of the topology of the MRS over time, and (2) online optimization, a technique that does not require a priori knowledge, but must construct BDDs while the MRS is operating. Finally, our approach is validated on an MRS performing a rendezvous objective while exchanging information according to a noisy state agreement process.
Malicious domain names are consistently changing. It is challenging to keep blacklists of malicious domain names up-to-date because of the time lag between its creation and detection. Even if a website is clean itself, it does not necessarily mean that it won't be used as a pivot point to redirect users to malicious destinations. To address this issue, this paper demonstrates how to use linkage analysis and open-source threat intelligence to visualize the relationship of malicious domain names whilst verifying their categories, i.e., drive-by download, unwanted software etc. Featured by a graph-based model that could present the inter-connectivity of malicious domain names in a dynamic fashion, the proposed approach proved to be helpful for revealing the group patterns of different kinds of malicious domain names. When applied to analyze a blacklisted set of URLs in a real enterprise network, it showed better effectiveness than traditional methods and yielded a clearer view of the common patterns in the data.
{Static characteristic extraction method Control flow-based features proposed by Ding has the ability to detect malicious code with higher accuracy than traditional Text-based methods. However, this method resolved NP-hard problem in a graph, therefore it is not feasible with the large-size and high-complexity programs. So, we propose the C500-CFG algorithm in Control flow-based features based on the idea of dynamic programming, solving Ding's NP-hard problem in O(N2) time complexity, where N is the number of basic blocks in decom-piled executable codes. Our algorithm is more efficient and more outstanding in detecting malware than Ding's algorithm: fast processing time, allowing processing large files, using less memory and extracting more feature information. Applying our algorithms with IoT data sets gives outstanding results on 2 measures: Accuracy = 99.34%
We classify .NET files as either benign or malicious by examining directed graphs derived from the set of functions comprising the given file. Each graph is viewed probabilistically as a Markov chain where each node represents a code block of the corresponding function, and by computing the PageRank vector (Perron vector with transport), a probability measure can be defined over the nodes of the given graph. Each graph is vectorized by computing Lebesgue antiderivatives of hand-engineered functions defined on the vertex set of the given graph against the PageRank measure. Files are subsequently vectorized by aggregating the set of vectors corresponding to the set of graphs resulting from decompiling the given file. The result is a fast, intuitive, and easy-to-compute glass-box vectorization scheme, which can be leveraged for training a standalone classifier or to augment an existing feature space. We refer to this vectorization technique as PageRank Measure Integration Vectorization (PMIV). We demonstrate the efficacy of PMIV by training a vanilla random forest on 2.5 million samples of decompiled. NET, evenly split between benign and malicious, from our in-house corpus and compare this model to a baseline model which leverages a text-only feature space. The median time needed for decompilation and scoring was 24ms. 11Code available at https://github.com/gtownrocks/grafuple.
The limited information on the cyberattacks available in the unclassified regime, hardens standardizing the analysis. We address the problem of modeling and analyzing cyberattacks using a multimodal graph approach. We formulate the stages, actors, and outcomes of cyberattacks as a multimodal graph. Multimodal graph nodes include cyberattack victims, adversaries, autonomous systems, and the observed cyber events. In multimodal graphs, single-modality graphs are interconnected according to their interaction. We apply community and centrality analysis on the graph to obtain in-depth insights into the attack. In community analysis, we cluster those nodes that exhibit “strong” inter-modal ties. We further use centrality to rank the nodes according to their importance. Classifying nodes according to centrality provides the progression of the attack from the attacker to the targeted nodes. We apply our methods to two popular case studies, namely GhostNet and Putter Panda and demonstrate a clear distinction in the attack stages.
In cloud computing environments with many virtual machines, containers, and other systems, an epidemic of malware can be crippling and highly threatening to business processes. In this vision paper, we introduce a hierarchical approach to performing malware detection and analysis using several recent advances in machine learning on graphs, hypergraphs, and natural language. We analyze individual systems and their logs, inspecting and understanding their behavior with attentional sequence models. Given a feature representation of each system's logs using this procedure, we construct an attributed network of the cloud with systems and other components as vertices and propose an analysis of malware with inductive graph and hypergraph learning models. With this foundation, we consider the multicloud case, in which multiple clouds with differing privacy requirements cooperate against the spread of malware, proposing the use of federated learning to perform inference and training while preserving privacy. Finally, we discuss several open problems that remain in defending cloud computing environments against malware related to designing robust ecosystems, identifying cloud-specific optimization problems for response strategy, action spaces for malware containment and eradication, and developing priors and transfer learning tasks for machine learning models in this area.
The rapid growth of Android malware apps poses a great security threat to users thus it is very important and urgent to detect Android malware effectively. What's more, the increasing unknown malware and evasion technique also call for novel detection method. In this paper, we focus on API feature and develop a novel method to detect Android malware. First, we propose a novel selection method for API feature related with the malware class. However, such API also has a legitimate use in benign app thus causing FP problem (misclassify benign as malware). Second, we further explore structure relationships between these APIs and map to a matrix interpreted as the hand-refined API-based feature graph. Third, a CNN-based classifier is developed for the API-based feature graph classification. Evaluations of a real-world dataset containing 3,697 malware apps and 3,312 benign apps demonstrate that selected API feature is effective for Android malware classification, just top 20 APIs can achieve high F1 of 94.3% under Random Forest classifier. When the available API features are few, classification performance including FPR indicator can achieve effective improvement effectively by complementing our further work.
With the rapid development of the mobile Internet, Android has been the most popular mobile operating system. Due to the open nature of Android, c countless malicious applications are hidden in a large number of benign applications, which pose great threats to users. Most previous malware detection approaches mainly rely on features such as permissions, API calls, and opcode sequences. However, these approaches fail to capture structural semantics of applications. In this paper, we propose AMDroid that leverages function call graphs (FCGs) representing the behaviors of applications and applies graph kernels to automatically learn the structural semantics of applications from FCGs. We evaluate AMDroid on the Genome Project, and the experimental results show that AMDroid is effective to detect Android malware with 97.49% detection accuracy.
IoT malware detection using control flow graph (CFG)-based features and deep learning networks are widely explored. The main goal of this study is to investigate the robustness of such models against adversarial learning. We designed two approaches to craft adversarial IoT software: off-the-shelf methods and Graph Embedding and Augmentation (GEA) method. In the off-the-shelf adversarial learning attack methods, we examine eight different adversarial learning methods to force the model to misclassification. The GEA approach aims to preserve the functionality and practicality of the generated adversarial sample through a careful embedding of a benign sample to a malicious one. Intensive experiments are conducted to evaluate the performance of the proposed method, showing that off-the-shelf adversarial attack methods are able to achieve a misclassification rate of 100%. In addition, we observed that the GEA approach is able to misclassify all IoT malware samples as benign. The findings of this work highlight the essential need for more robust detection tools against adversarial learning, including features that are not easy to manipulate, unlike CFG-based features. The implications of the study are quite broad, since the approach challenged in this work is widely used for other applications using graphs.
The rapid growth of Android malware has posed severe security threats to smartphone users. On the basis of the familial trait of Android malware observed by previous work, the familial analysis is a promising way to help analysts better focus on the commonalities of malware samples within the same families, thus reducing the analytical workload and accelerating malware analysis. The majority of existing approaches rely on supervised learning and face three main challenges, i.e., low accuracy, low efficiency, and the lack of labeled dataset. To address these challenges, we first construct a fine-grained behavior model by abstracting the program semantics into a set of subgraphs. Then, we propose SRA, a novel feature that depicts the similarity relationships between the Structural Roles of sensitive API call nodes in subgraphs. An SRA is obtained based on graph embedding techniques and represented as a vector, thus we can effectively reduce the high complexity of graph matching. After that, instead of training a classifier with labeled samples, we construct malware link network based on SRAs and apply community detection algorithms on it to group the unlabeled samples into groups. We implement these ideas in a system called GefDroid that performs Graph embedding based familial analysis of AnDroid malware using unsupervised learning. Moreover, we conduct extensive experiments to evaluate GefDroid on three datasets with ground truth. The results show that GefDroid can achieve high agreements (0.707-0.883 in term of NMI) between the clustering results and the ground truth. Furthermore, GefDroid requires only linear run-time overhead and takes around 8.6s to analyze a sample on average, which is considerably faster than the previous work.
Malware scanning of an app market is expected to be scalable and effective. However, existing approaches use either syntax-based features which can be evaded by transformation attacks or semantic-based features which are usually extracted by performing expensive program analysis. Therefor, in this paper, we propose a lightweight graph-based approach to perform Android malware detection. Instead of traditional heavyweight static analysis, we treat function call graphs of apps as social networks and perform social-network-based centrality analysis to represent the semantic features of the graphs. Our key insight is that centrality provides a succinct and fault-tolerant representation of graph semantics, especially for graphs with certain amount of inaccurate information (e.g., inaccurate call graphs). We implement a prototype system, MalScan, and evaluate it on datasets of 15,285 benign samples and 15,430 malicious samples. Experimental results show that MalScan is capable of detecting Android malware with up to 98% accuracy under one second which is more than 100 times faster than two state-of-the-art approaches, namely MaMaDroid and Drebin. We also demonstrate the feasibility of MalScan on market-wide malware scanning by performing a statistical study on over 3 million apps. Finally, in a corpus of dataset collected from Google-Play app market, MalScan is able to identify 18 zero-day malware including malware samples that can evade detection of existing tools.
Trust is known to be a key component in human social relationships. It is trust that defines human behavior with others to a large extent. Generative models have been extensively used in social networks study to simulate different characteristics and phenomena in social graphs. In this work, an attempt is made to understand how trust in social graphs can be combined with generative modeling techniques to generate trust-based social graphs. These generated social graphs are then compared with the original social graphs to evaluate how trust helps in generative modeling. Two well-known social network data sets i.e. the soc-Bitcoin and the wiki administrator network data sets are used in this work. Social graphs are generated from these data sets and then compared with the original graphs along with other standard generative modeling techniques to see how trust is a good component in this. Other Generative modeling techniques have been available for a while but this investigation with the real social graph data sets validate that trust can be an important factor in generative modeling.
Nowadays, Microblog has become an important online social networking platform, and a large number of users share information through Microblog. Many malicious users have released various false news driven by various interests, which seriously affects the availability of Microblog platform. Therefore, the evaluation of Microblog user credibility has become an important research issue. This paper proposes a microblog user credibility evaluation algorithm based on trust propagation. In view of the high consumption and low precision caused by malicious users' attacking algorithms and manual selection of seed sets by establishing false social relationships, this paper proposes two optimization strategies: pruning algorithm based on social activity and similarity and based on The seed node selection algorithm of clustering. The pruning algorithm can trim off the attack edges established by malicious users and normal users. The seed node selection algorithm can efficiently select the highly available seed node set, and finally use the user social relationship graph to perform the two-way propagation trust scoring, so that the low trusted user has a lower trusted score and thus identifies the malicious user. The related experiments verify the effectiveness of the trustworthiness-based user credibility evaluation algorithm in the evaluation of Microblog user credibility.
We propose a distributed machine-learning architecture to predict trustworthiness of sensor services in Mobile Edge Computing (MEC) based Internet of Things (IoT) services, which aligns well with the goals of MEC and requirements of modern IoT systems. The proposed machine-learning architecture models training a distributed trust prediction model over a topology of MEC-environments as a Network Lasso problem, which allows simultaneous clustering and optimization on large-scale networked-graphs. We then attempt to solve it using Alternate Direction Method of Multipliers (ADMM) in a way that makes it suitable for MEC-based IoT systems. We present analytical and simulation results to show the validity and efficiency of the proposed solution.
Malware is pervasive and poses serious threats to normal operation of business processes in cloud. Cloud computing environments typically have hundreds of hosts that are connected to each other, often with high risk trust assumptions and/or protection mechanisms that are not difficult to break. Malware often exploits such weaknesses, as its immediate goal is often to spread itself to as many hosts as possible. Detecting this propagation is often difficult to address because the malware may reside in multiple components across the software or hardware stack. In this scenario, it is usually best to contain the malware to the smallest possible number of hosts, and it's also critical for system administration to resolve the issue in a timely manner. Furthermore, resolution often requires that several participants across different organizational teams scramble together to address the intrusion. In this vision paper, we define this problem in detail. We then present our vision of decentralized malware containment and the challenges and issues associated with this vision. The approach of containment involves detection and response using graph analytics coupled with a blockchain framework. We propose the use of a dominance frontier for profile nodes which must be involved in the containment process. Smart contracts are used to obtain consensus amongst the involved parties. The paper presents a basic implementation of this proposal. We have further discussed some open problems related to our vision.

