Biblio
Deep web refers to sites that cannot be found by search engines and makes up the 96% of the digital world. The dark web is the part of the deep web that can only be accessed through specialised tools and anonymity networks. To avoid monitoring and control, communities that seek for anonymization are moving to the dark web. In this work, we scrape five dark web forums and construct five graphs to model user connections. These networks are then studied and compared using data mining techniques and social network analysis tools; for each community we identify the key actors, we study the social connections and interactions, we observe the small world effect, and we highlight the type of discussions among the users. Our results indicate that only a small subset of users are influential, while the rapid dissemination of information and resources between users may affect behaviours and formulate ideas for future members.
With increasing monitoring and regulation by platforms, communities with criminal interests are moving to the dark web, which hosts content ranging from whistle-blowing and privacy, to drugs, terrorism, and hacking. Using post discussion data from six dark web forums we construct six interaction graphs and use social network analysis tools to study these underground communities. We observe the structure of each network to highlight structural patterns and identify nodes of importance through network centrality analysis. Our findings suggest that in the majority of the forums some members are highly connected and form hubs, while most members have a lower number of connections. When examining the posting activities of central nodes we found that most of the central nodes post in sub-forums with broader topics, such as general discussions and tutorials. These members play different roles in the different forums, and within each forum we identified diverse user profiles.
While the rapid adaptation of mobile devices changes our daily life more conveniently, the threat derived from malware is also increased. There are lots of research to detect malware to protect mobile devices, but most of them adopt only signature-based malware detection method that can be easily bypassed by polymorphic and metamorphic malware. To detect malware and its variants, it is essential to adopt behavior-based detection for efficient malware classification. This paper presents a system that classifies malware by using common behavioral characteristics along with malware families. We measure the similarity between malware families with carefully chosen features commonly appeared in the same family. With the proposed similarity measure, we can classify malware by malware's attack behavior pattern and tactical characteristics. Also, we apply community detection algorithm to increase the modularity within each malware family network aggregation. To maintain high classification accuracy, we propose a process to derive the optimal weights of the selected features in the proposed similarity measure. During this process, we find out which features are significant for representing the similarity between malware samples. Finally, we provide an intuitive graph visualization of malware samples which is helpful to understand the distribution and likeness of the malware networks. In the experiment, the proposed system achieved 97% accuracy for malware classification and 95% accuracy for prediction by K-fold cross-validation using the real malware dataset.
We present new algorithms for Personalized PageRank estimation and Personalized PageRank search. First, for the problem of estimating Personalized PageRank (PPR) from a source distribution to a target node, we present a new bidirectional estimator with simple yet strong guarantees on correctness and performance, and 3x to 8x speedup over existing estimators in experiments on a diverse set of networks. Moreover, it has a clean algebraic structure which enables it to be used as a primitive for the Personalized PageRank Search problem: Given a network like Facebook, a query like "people named John," and a searching user, return the top nodes in the network ranked by PPR from the perspective of the searching user. Previous solutions either score all nodes or score candidate nodes one at a time, which is prohibitively slow for large candidate sets. We develop a new algorithm based on our bidirectional PPR estimator which identifies the most relevant results by sampling candidates based on their PPR; this is the first solution to PPR search that can find the best results without iterating through the set of all candidate results. Finally, by combining PPR sampling with sequential PPR estimation and Monte Carlo, we develop practical algorithms for PPR search, and we show via experiments that our algorithms are efficient on networks with billions of edges.
While email plays a growingly important role on the Internet, we are faced with more severe challenges brought by compromised email accounts, especially for the administrators of institutional email service providers. Inspired by the previous experience on spam filtering and compromised accounts detection, we propose several criteria, like Success Outdegree Proportion, Reverse Pagerank, Recipient Clustering Coefficient and Legitimate Recipient Proportion, for compromised email accounts detection from the perspective of graph topology in this paper. Specifically, several widely used social network analysis metrics are used and adapted according to the characteristics of mail log analysis. We evaluate our methods on a dataset constructed by mining the one month (30 days) mail log from an university with 118,617 local users and 11,460,399 mail log entries. The experimental results demonstrate that our methods achieve very positive performance, and we also prove that these methods can be efficiently applied on even larger datasets.
The modern malware poses serious security threats because of its evolved capability of using staged and persistent attack while remaining undetected over a long period of time to perform a number of malicious activities. The challenge for malicious actors is to gain initial control of the victim's machine by bypassing all the security controls. The most favored bait often used by attackers is to deceive users through a trusting or interesting email containing a malicious attachment or a malicious link. To make the email credible and interesting the cybercriminals often perform reconnaissance activities to find background information on the potential target. To this end, the value of information found on the discarded or stolen storage devices is often underestimated or ignored. In this paper, we present the partial results of analysis of one such hard disk that was purchased from the open market. The data found on the disk contained highly sensitive personal and organizational data. The results from the case study will be useful in not only understanding the involved risk but also creating awareness of related threats.
Decision makers need capabilities to quickly model and effectively assess consequences of actions and reactions in crisis de-escalation environments. The creation and what-if exercising of such models has traditionally had onerous resource requirements. This research demonstrates fast and viable ways to build such models in operational environments. Through social network extraction from texts, network analytics to identify key actors, and then simulation to assess alternative interventions, advisors can support practicing and execution of crisis de-escalation activities. We describe how we used this approach as part of a scenario-driven modeling effort. We demonstrate the strength of moving from data to models and the advantages of data-driven simulation, which allow for iterative refinement. We conclude with a discussion of the limitations of this approach and anticipated future work.
Formal methods, models and tools for social big data analytics are largely limited to graph theoretical approaches such as social network analysis (SNA) informed by relational sociology. There are no other unified modeling approaches to social big data that integrate the conceptual, formal and software realms. In this paper, we first present and discuss a theory and conceptual model of social data. Second, we outline a formal model based on set theory and discuss the semantics of the formal model with a real-world social data example from Facebook. Third, we briefly present and discuss the Social Data Analytics Tool (SODATO) that realizes the conceptual model in software and provisions social data analysis based on the conceptual and formal models. Fourth and last, based on the formal model and sentiment analysis of text, we present a method for profiling of artifacts and actors and apply this technique to the data analysis of big social data collected from Facebook page of the fast fashion company, H&M.
In this article, researcher collaboration patterns and research topics on Intelligence and Security Informatics (ISI) are investigated using social network analysis approaches. The collaboration networks exhibit scale-free property and small-world effect. From these networks, the authors obtain the key researchers, institutions, and three important topics.
With the advent of social networks and cloud computing, the amount of multimedia data produced and communicated within social networks is rapidly increasing. In the mean time, social networking platform based on cloud computing has made multimedia big data sharing in social network easier and more efficient. The growth of social multimedia, as demonstrated by social networking sites such as Facebook and YouTube, combined with advances in multimedia content analysis, underscores potential risks for malicious use such as illegal copying, piracy, plagiarism, and misappropriation. Therefore, secure multimedia sharing and traitor tracing issues have become critical and urgent in social network. In this paper, we propose a scheme for implementing the Tree-Structured Harr (TSH) transform in a homomorphic encrypted domain for fingerprinting using social network analysis with the purpose of protecting media distribution in social networks. The motivation is to map hierarchical community structure of social network into tree structure of TSH transform for JPEG2000 coding, encryption and fingerprinting. Firstly, the fingerprint code is produced using social network analysis. Secondly, the encrypted content is decomposed by the TSH transform. Thirdly, the content is fingerprinted in the TSH transform domain. At last, the encrypted and fingerprinted contents are delivered to users via hybrid multicast-unicast. The use of fingerprinting along with encryption can provide a double-layer of protection to media sharing in social networks. Theory analysis and experimental results show the effectiveness of the proposed scheme.