Biblio
In this paper we study the densest subgraph problem, which plays a key role in many graph mining applications. The goal of the problem is to find a subset of nodes that induces a graph with maximum average degree. The problem has been extensively studied in the past few decades under a variety of different settings. Several exact and approximation algorithms were proposed. However, as normal graph can only model objects with pairwise relationships, the densest subgraph problem fails in identifying communities under relationships that involve more than 2 objects, e.g., in a network connecting authors by publications. We consider in this work the densest subgraph problem in hypergraphs, which generalizes the problem to a wider class of networks in which edges might have different cardinalities and contain more than 2 nodes. We present two exact algorithms and a near-linear time r-approximation algorithm for the problem, where r is the maximum cardinality of an edge in the hypergraph. We also consider the dynamic version of the problem, in which an adversary can insert or delete an edge from the hypergraph in each round and the goal is to maintain efficiently an approximation of the densest subgraph. We present two dynamic approximation algorithms in this paper with amortized polog update time, for any ε \textbackslashtextgreater 0. For the case when there are only insertions, the approximation ratio we maintain is r(1+ε), while for the fully dynamic case, the ratio is r2(1+ε). Extensive experiments are performed on large real datasets to validate the effectiveness and efficiency of our algorithms.
As a vital component of variety cyber attacks, malicious domain detection becomes a hot topic for cyber security. Several recent techniques are proposed to identify malicious domains through analysis of DNS data because much of global information in DNS data which cannot be affected by the attackers. The attackers always recycle resources, so they frequently change the domain - IP resolutions and create new domains to avoid detection. Therefore, multiple malicious domains are hosted by the same IPs and multiple IPs also host same malicious domains in simultaneously, which create intrinsic association among them. Hence, using the labeled domains which can be traced back from queries history of all domains to verify and figure out the association of them all. Graphs seem the best candidate to represent for this relationship and there are many algorithms developed on graph with high performance. A graph-based interface can be developed and transformed to the graph mining task of inferring graph node's reputation scores using improvements of the belief propagation algorithm. Then higher reputation scores the nodes reveal, the more malicious probabilities they infer. For demonstration, this paper proposes a malicious domain detection technique and evaluates on a real-world dataset. The dataset is collected from DNS data servers which will be used for building a DNS graph. The proposed technique achieves high performance in accuracy rates over 98.3%, precision and recall rates as: 99.1%, 98.6%. Especially, with a small set of labeled domains (legitimate and malicious domains), the technique can discover a large set of potential malicious domains. The results indicate that the method is strongly effective in detecting malicious domains.
Detection of insider threats relies on monitoring individuals and their interactions with organizational resources. Identification of anomalous insiders typically relies on supervised learning models that use labeled data. However, such labeled data is not easily obtainable. The labeled data that does exist is also limited by current insider threat detection methods and undetected insiders would not be included. These models also inherently assume that the insider threat is not rapidly evolving between model generation and use of the model in detection. Yet there is a large body of research that illustrates that the insider threat changes significantly after some types of precipitating events, such as layoffs, significant restructuring, and plant or facility closure. To capture this temporal evolution of user-system interactions, we use an unsupervised learning framework to evaluate whether potential insider threat events are triggered following precipitating events. The analysis leverages a bipartite graph of user and system interactions. The approach shows a clear correlation between precipitating events and the number of apparent anomalies. The results of our empirical analysis show a clear shift in behaviors after events which have previously been shown to increase insider activity, specifically precipitating events. We argue that this metadata about the level of insider threat behaviors validates the potential of the approach. We apply our method to a dataset that comprises interactions between engineers and software components in an enterprise version control system spanning more than 22 years. We use this unlabeled dataset and automatically detect statistically significant events. We show that there is statistically significant evidence that a subset of users diversify their committing behavior after precipitating events have been announced. Although these findings do not constitute detection of insider threat events per se, they do identify patterns of potentially malicious high-risk insider behavior. They reinforce the idea that insider operations can be motivated by the insiders' environment. Our proposed framework outperforms algorithms based on naive random approaches and algorithms using volume dependent statistics. This graph mining technique has potential for early detection of insider threat behavior in user-system interactions independent of the volume of interactions. The proposed method also enables organizations without a corpus of identified insider threats to train its own anomaly detection system.
Understanding and fending off attack campaigns against organizations, companies and individuals, has become a global struggle. As today's threat actors become more determined and organized, isolated efforts to detect and reveal threats are no longer effective. Although challenging, this situation can be significantly changed if information about security incidents is collected, shared and analyzed across organizations. To this end, different exchange data formats such as STIX, CyBOX, or IODEF have been recently proposed and numerous CERTs are adopting these threat intelligence standards to share tactical and technical threat insights. However, managing, analyzing and correlating the vast amount of data available from different sources to identify relevant attack patterns still remains an open problem. In this paper we present Mantis, a platform for threat intelligence that enables the unified analysis of different standards and the correlation of threat data trough a novel type-agnostic similarity algorithm based on attributed graphs. Its unified representation allows the security analyst to discover similar and related threats by linking patterns shared between seemingly unrelated attack campaigns through queries of different complexity. We evaluate the performance of Mantis as an information retrieval system for threat intelligence in different experiments. In an evaluation with over 14,000 CyBOX objects, the platform enables retrieving relevant threat reports with a mean average precision of 80%, given only a single object from an incident, such as a file or an HTTP request. We further illustrate the performance of this analysis in two case studies with the attack campaigns Stuxnet and Regin.
In this paper we propose Mastino, a novel defense system to detect malware download events. A download event is a 3-tuple that identifies the action of downloading a file from a URL that was triggered by a client (machine). Mastino utilizes global situation awareness and continuously monitors various network- and system-level events of the clients' machines across the Internet and provides real time classification of both files and URLs to the clients upon submission of a new, unknown file or URL to the system. To enable detection of the download events, Mastino builds a large download graph that captures the subtle relationships among the entities of download events, i.e. files, URLs, and machines. We implemented a prototype version of Mastino and evaluated it in a large-scale real-world deployment. Our experimental evaluation shows that Mastino can accurately classify malware download events with an average of 95.5% true positive (TP), while incurring less than 0.5% false positives (FP). In addition, we show the Mastino can classify a new download event as either benign or malware in just a fraction of a second, and is therefore suitable as a real time defense system.
Given "who-trusts/distrusts-whom" information, how can we propagate the trust and distrust? With the appearance of fraudsters in social network sites, the importance of trust prediction has increased. Most such methods use only explicit and implicit trust information (e.g., if Smith likes several of Johnson's reviews, then Smith implicitly trusts Johnson), but they do not consider distrust. In this paper, we propose PIN-TRUST, a novel method to handle all three types of interaction information: explicit trust, implicit trust, and explicit distrust. The novelties of our method are the following: (a) it is carefully designed, to take into account positive, implicit, and negative information, (b) it is scalable (i.e., linear on the input size), (c) most importantly, it is effective and accurate. Our extensive experiments with a real dataset, Epinions.com data, of 100K nodes and 1M edges, confirm that PIN-TRUST is scalable and outperforms existing methods in terms of prediction accuracy, achieving up to 50.4 percentage relative improvement.
Recently, PIN-TRUST, a method to predict future trust relationships between users is proposed. PIN-TRUST out-performs existing trust prediction methods by exploiting all types of interactions between users and the reciprocation of ones. In this paper, we validate whether its consideration on the reciprocation of interactions is really effective in trust prediction. Furthermore, we consider a new concept, the "uncertainty" of untrustworthy users that is devised to reflect the difficulty on modeling the activities of untrustworthy users in PIN-TRUST. Then, we also validate the effectiveness this uncertainty concepts. Through the validation, we reveal that the consideration of the reciprocation of interactions is effective for trust prediction with PIN-TRUST, and it is necessary to regard the uncertainty of untrustworthy users same as that of other users.