Biblio
In this research, we examine and develop an expert system that automates crime category classification and threat level assessment using information collected by crawling the dark web. We constructed a bag of words from 250 dark web posts and developed an expert system that takes term frequencies as input and classifies posts into six criminal categories (drugs, stolen credit cards, passwords, counterfeit products, child pornography, and others) and three threat levels (high, middle, low). Contrary to prior expectations, our simple and explainable expert system performs competitively with existing systems. In short, our experiment with 1,500 dark web posts shows a recall of 76.4% for six-category crime classification and 83% for three-level threat discrimination on 100 randomly sampled posts.
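As a rough illustration of the mechanism sketched above, the following minimal Python snippet scores a post against per-category keyword lists derived from a bag of words and returns the best-matching category; the keyword lists and category labels here are hypothetical placeholders, not the paper's actual vocabulary or rules.

```python
# Minimal sketch of a term-frequency-based expert system.
# The keyword lists below are hypothetical, not the paper's rules.
from collections import Counter

CATEGORY_KEYWORDS = {
    "drugs":       {"cocaine", "mdma", "heroin", "gram"},
    "credit_card": {"cvv", "fullz", "dumps", "visa"},
    "passwords":   {"credentials", "login", "password", "combo"},
    "counterfeit": {"replica", "counterfeit", "fake", "passport"},
    "other":       set(),
}

def classify(post: str) -> str:
    """Score each category by the frequency of its keywords; return the best."""
    tokens = Counter(post.lower().split())          # naive whitespace tokenization
    scores = {cat: sum(tokens[w] for w in kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"

print(classify("selling fresh cvv and visa fullz"))  # -> credit_card
```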
Cyber threat intelligence (CTI) necessitates automated monitoring of dark web platforms (e.g., Dark Net Markets and carding shops) on a large scale. While there are existing methods for collecting data from the surface web, large-scale dark web data collection is commonly hindered by anti-crawling measures, of which text-based CAPTCHA is the most prohibitive. Text-based CAPTCHA requires the user to recognize a combination of hard-to-read characters. Dark web CAPTCHA patterns are intentionally designed with additional background noise and variable character length to prevent automated CAPTCHA breaking. Existing CAPTCHA-breaking methods cannot remedy these challenges and are therefore not applicable to the dark web. In this study, we propose a novel framework for breaking text-based CAPTCHA in the dark web. The proposed framework utilizes a Generative Adversarial Network (GAN) to counteract dark web-specific background noise and leverages an enhanced character segmentation algorithm. Our proposed method was evaluated on both benchmark and dark web CAPTCHA testbeds. The proposed method significantly outperformed state-of-the-art baseline methods on all datasets, achieving a success rate of over 92.08% on the dark web testbeds. Our research enables the CTI community to develop advanced capabilities for large-scale dark web monitoring.
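To make the segmentation step concrete, the following sketch shows a baseline connected-component character segmentation with OpenCV; this is only an assumed illustration of the generic step the paper improves upon, not its GAN denoiser or enhanced segmentation algorithm.

```python
# Baseline character segmentation via connected components (OpenCV).
# NOT the paper's enhanced algorithm; just a generic illustration.
import cv2

def segment_characters(path: str, min_area: int = 30):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Otsu binarization with inverted polarity (characters become foreground)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    n, _, stats, _ = cv2.connectedComponentsWithStats(binary)
    # Drop the background (label 0) and tiny noise blobs
    boxes = [stats[i] for i in range(1, n) if stats[i, cv2.CC_STAT_AREA] >= min_area]
    boxes.sort(key=lambda s: s[cv2.CC_STAT_LEFT])   # left-to-right reading order
    return [gray[y:y + h, x:x + w] for x, y, w, h, _ in boxes]
```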
Nowadays, there is a flood of harmful data such as nude photos and child pornography, which is deeply damaging to people. In addition, people also distribute drugs through unknown dark channels. In particular, most such transactions are made through the Deep Web, the dark path. "The Deep Web refers to an encrypted network that is not indexed by search engines like Google. Users must use Tor to visit sites on the dark web" [4]. In other words, the Dark Web uses Tor's encryption client. Therefore, users can visit multiple sites on the dark web without knowing who operates them. In this paper, we propose a key idea based on the current status of such crimes, and a crime information visualization system for the Deep Web has been developed. The status of the deep web is analyzed and the data is visualized using Java. It is expected that the program will help in more efficient management and monitoring of crime on hidden parts of the web such as the deep web, torrents, etc.
Personally identifiable information (PII) has become a major target of cyber-attacks, causing severe losses to data breach victims. To protect data breach victims, researchers focus on collecting exposed PII to assess privacy risk and identify at-risk individuals. However, existing studies mostly rely on exposed PII collected from either the dark web or the surface web. Due to the wide exposure of PII on both the dark web and surface web, collecting from only the dark web or the surface web could result in an underestimation of privacy risk. Despite its research and practical value, jointly collecting PII from both sources is a non-trivial task. In this paper, we summarize our effort to systematically identify, collect, and monitor a total of 1,212,004,819 exposed PII records across both the dark web and surface web. Our effort resulted in 5.8 million stolen SSNs, 845,000 stolen credit/debit cards, and 1.2 billion stolen account credentials. From the surface web, we identified and collected over 1.3 million PII records of the victims whose PII is exposed on the dark web. To the best of our knowledge, this is the largest academic collection of exposed PII, which, if properly anonymized, enables various privacy research inquiries, including assessing privacy risk and identifying at-risk populations.
An emerging Internet business is residential proxy (RESIP) as a service, in which a provider utilizes hosts within residential networks (in contrast to those running in a datacenter) to relay their customers' traffic, in an attempt to avoid server-side blocking and detection. Despite the prominent roles these services could play in the underground business world, little has been done to understand whether they are indeed involved in cybercrime and how they operate, due to the challenges in identifying their RESIPs, not to mention any in-depth analysis of them. In this paper, we report the first study on RESIPs, which sheds light on the behaviors and the ecosystem of these elusive gray services. Our research employed an infiltration framework, including our own clients for RESIP services and the servers they visited, to detect 6 million RESIP IPs across 230+ countries and 52K+ ISPs. The observed addresses were analyzed and the hosts behind them were further fingerprinted using a new profiling system. Our effort led to several previously unknown and surprising findings about RESIP services. Despite the providers' claims that the proxy hosts join willingly, many proxies run on likely compromised hosts, including IoT devices. Through cross-matching the hosts we discovered with labeled PUP (potentially unwanted program) logs provided by a leading IT company, we uncovered various illicit operations performed by RESIP hosts, including illegal promotion, fast fluxing, phishing, malware hosting, and others. We also reverse-engineered the RESIP services' internal infrastructures and uncovered their potential rebranding and reselling behaviors. Our research takes the first step toward understanding this new Internet service, contributing to the effective control of its security risks.
This paper presents the results of a qualitative study on discussions about two major law enforcement interventions against Dark Net Market (DNM) users extracted from relevant Reddit forums. We assess the impact of Operation Hyperion and Operation Bayonet (combined with the closure of the site Hansa) by analyzing posts and comments made by users of two Reddit forums created for the discussion of Dark Net Markets. The operations are compared in terms of the size of the discussions, the consequences recorded, and the opinions shared by forum users. We find that Operation Bayonet generated a higher number of discussions on Reddit, and from the qualitative analysis of such discussions it appears that this operation also had a greater impact on the DNM ecosystem.
The clear, social, and dark web have lately been identified as rich sources of valuable cyber-security information that, given the appropriate tools and methods, may be identified, crawled, and subsequently leveraged into actionable cyber-threat intelligence. In this work, we focus on the information gathering task and present a novel crawling architecture for transparently harvesting data from security websites in the clear web, security forums in the social web, and hacker forums/marketplaces in the dark web. The proposed architecture adopts a two-phase approach to data harvesting. Initially, a machine-learning-based crawler is used to direct the harvesting towards websites of interest, while in the second phase state-of-the-art statistical language modelling techniques are used to represent the harvested information in a latent low-dimensional feature space and rank it based on its potential relevance to the task at hand. The proposed architecture is realised using exclusively open-source tools, and a preliminary evaluation with crowdsourced results demonstrates its effectiveness.
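A hedged sketch of the second-phase ranking idea follows: documents are embedded in a latent low-dimensional space (here TF-IDF followed by truncated SVD, which may differ from the paper's language model) and ranked by cosine similarity against a CTI-oriented query; the corpus and query strings are toy examples.

```python
# Sketch: embed harvested documents in a latent space and rank them by
# relevance to a CTI query. Toy corpus; not the paper's actual model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "new exploit kit sold on forum, targets outdated CMS plugins",
    "recipe thread: best pizza dough hydration",
    "fresh database dump with emails and hashed passwords for sale",
]
query = ["zero-day exploit credentials dump marketplace"]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs + query)

svd = TruncatedSVD(n_components=2, random_state=0)   # tiny latent space for the toy corpus
Z = svd.fit_transform(X)

scores = cosine_similarity(Z[-1:], Z[:-1]).ravel()   # query vs. documents
for rank, i in enumerate(scores.argsort()[::-1], 1):
    print(rank, round(float(scores[i]), 3), docs[i])
```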
A multitude of leaked data can be purchased through the Dark Web nowadays. Recent reports highlight that the largest footprints of leaked data, which range from employee passwords to intellectual property, are linked to governmental institutions. According to OWL Cybersecurity, the US Navy is most affected. When leaked data includes personal files, the impact can be severe. For example, it can be the cornerstone of sophisticated social engineering attacks, of obtaining credentials for illegal system access, or of installing malicious code in the target network. If personally identifiable information or sensitive data, access plans, strategies or intellectual property are traded on the Dark Web, this could pose a threat to the armed forces. The actual impact, role, and dimension of information traded on the Dark Web are rarely analysed. Is the available data authentic and useful? Can it endanger the capabilities of armed forces? These questions are even more challenging, as several well-known cases of deanonymization have been published over recent years, raising the question of whether anyone would really use the Dark Web to sell highly sensitive information. In contrast, fake offers from scammers can be found regularly, set up only to cheat possible buyers. A victim of illegal offers on the Dark Web will typically not go to the police. The paper analyses the technical base of the Dark Web and examines possibilities of deanonymization. After an analysis of Dark Web marketplaces and the articles traded there, a discussion of the potential risks to military operations is used to identify recommendations on how to minimize the risk. The analysis concludes that surveillance of the Dark Web is necessary to increase the chance of identifying sensitive information early; but the 'open' internet, the surface web and the Deep Web, actually poses the more important risk factor, as it is, in practice, more difficult to surveil than the Dark Web, and only a small share of breached information is traded on the latter.
The Dark Web, a conglomerate of services hidden from search engines and regular users, is used by cyber criminals to offer all kinds of illegal services and goods. Multiple Dark Web offerings are highly relevant for the cyber security domain in anticipating and preventing attacks, such as information about zero-day exploits, stolen datasets with login information, or botnets available for hire. In this work, we analyze and discuss the challenges related to information gathering in the Dark Web for cyber security intelligence purposes. To facilitate information collection and the analysis of large amounts of unstructured data, we present BlackWidow, a highly automated modular system that monitors Dark Web services and fuses the collected data into a single analytics framework. BlackWidow relies on a Docker-based microservice architecture which permits the combination of both preexisting and customized machine learning tools. BlackWidow represents all extracted data and the corresponding relationships extracted from posts in a large knowledge graph, which is made available to its security analyst users for search and interactive visual exploration. Using BlackWidow, we conduct a study of seven popular services on the Deep and Dark Web across three different languages with almost 100,000 users. Within less than two days of monitoring time, BlackWidow managed to collect years of relevant information in the areas of cyber security and fraud monitoring. We show that BlackWidow can infer relationships between authors and forums and detect trends for cybersecurity-related topics. Finally, we discuss exemplary case studies surrounding leaked data and preparation for malicious activity.
With the rapid development of the Internet, the dark net has also come into wide use [1]. Due to the anonymity of the dark net, many criminals commit crimes there, and it is difficult for law enforcement officials to track their identities using traditional network investigation techniques based on IP addresses [2]. Threat information comes mainly from dark web forums and dark web markets. In this paper, we introduce the current mainstream dark net communication system, Tor, and develop a visual dark web forum post association analysis system that graphically displays the relationships between forum messages and posters, helping law enforcement officers explore deeper clues for analyzing crimes on the dark net.
Cybercriminals widely use the dark web and its illegal functionalities, contributing to a worldwide crisis. More than half of criminal and terror activities are conducted through the dark web, such as cryptocurrency dealings, the sale of human organs, red rooms, child pornography, arms deals, drug deals, the hiring of assassins and hackers, hacking software and malware programs, etc. Law enforcement agencies such as the FBI, NSA, Interpol, Mossad, and FSB continually conduct surveillance programs on the dark web to trace down criminals and terrorists and to stop crimes and terror activities. This paper is about dark web markets and surveillance programs. In depth, the research discusses how to access the dark web securely and how law enforcement agencies track down users exhibiting terror-related behaviours and activities. Moreover, the paper discusses dark web sites where users can find jihadist services and anonymous markets, including safety precautions.
With the development of network services, people's privacy requirements continue to increase. Beyond providing anonymous user communication, it is also necessary to protect the anonymity of the server. At the same time, there are many threatening crime messages on the dark net, yet many scholars lack the ability or expertise to conduct research on dark net threat intelligence. Therefore, this paper designs a Hadoop-based framework for hidden threat intelligence. The framework uses HDFS as the underlying storage system and builds an HBase-based distributed database to store and manage threat intelligence information. According to the heterogeneous types of the forums, a web crawler collects data through the anonymous Tor tool. The framework is used to identify the characteristics of key dark net criminal networks, which lays the basis for later dark net research.
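The collection-and-storage path described above might look roughly like the following sketch, which fetches a forum page through a local Tor SOCKS proxy and writes it to HBase; the proxy port, Thrift gateway, table name, and column family are all assumptions, not the paper's configuration.

```python
# Sketch: crawl a page via Tor and store the raw HTML in HBase.
# Assumes a local Tor client (port 9050), an HBase Thrift gateway, and the
# requests[socks] and happybase packages; table/column names are made up.
import requests
import happybase

TOR_PROXIES = {"http": "socks5h://127.0.0.1:9050",
               "https": "socks5h://127.0.0.1:9050"}

def crawl_and_store(onion_url: str, row_key: str) -> None:
    html = requests.get(onion_url, proxies=TOR_PROXIES, timeout=60).text
    conn = happybase.Connection("localhost")          # HBase Thrift server
    table = conn.table("threat_intel")                # hypothetical table
    table.put(row_key.encode(),
              {b"raw:html": html.encode("utf-8"),
               b"raw:url": onion_url.encode("utf-8")})
    conn.close()
```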
In recent years, cyber attack techniques have become increasingly sophisticated, and blocking attacks is more and more difficult, even when one kind of countermeasure or another is taken. To handle this situation successfully, it is crucial to have a prediction of cyber attacks, appropriate precautions, and effective utilization of the cyber intelligence that enables these actions. Malicious hackers share various kinds of information through particular communities such as the dark web, indicating that a great deal of intelligence exists in cyberspace. This paper focuses on forums on the dark web and proposes an approach to extract, from a huge number of forums, those which include important information or intelligence, and to identify the traits of each forum using methodologies such as machine learning and natural language processing. This approach will allow us to grasp the emerging threats in cyberspace and take appropriate measures against malicious activities.
We contribute a scalable, open source implementation of the Pooled Time Series (PoT) algorithm from CVPR 2015. The algorithm is evaluated on approximately 6800 human trafficking (HT) videos collected from the deep and dark web, and on an open dataset: the Human Motion Database (HMDB). We describe PoT and our motivation for using it on larger data and the issues we encountered. Our new solution reimagines PoT as an Apache Hadoop-based algorithm. We demonstrate that our new Hadoop-based algorithm successfully identifies similar videos in the HT and HMDB datasets and we evaluate the algorithm qualitatively and quantitatively.
This paper explores the process of collective crisis problem-solving in the darkweb. We conducted a preliminary study on one of the Tor-based darkweb forums, during the shutdown of two marketplaces. Content analysis suggests that distrust permeated the forum during the marketplace shutdowns. We analyzed the debates concerned with suspicious claims and conspiracies. The results suggest that a black-market crisis potentially offers an opportunity for cyber-intelligence to disrupt the darkweb by engendering internal conflicts. At the same time, the study also shows that darkweb members were adept at reaching collective solutions by sharing new market information, more secure technologies, and alternative routes for economic activities.
Tor is a well known and widely used darknet, known for its anonymity. However, while its protocol and relay security have already been extensively studied, to date there is no comprehensive analysis of the structure and privacy of its Web hidden services. To fill this gap, we developed a dedicated analysis platform and used it to crawl and analyze over 1.5M URLs hosted in 7257 onion domains. For each page we analyzed its links, resources, and redirection graphs, as well as the language and category distribution. According to our experiments, Tor hidden services are organized in a sparse but highly connected graph, in which around 10% of the onion sites are completely isolated. Our study also measures for the first time the tight connection that exists between Tor hidden services and the Surface Web. In fact, more than 20% of the onion domains we visited imported resources from the Surface Web, and links to the Surface Web are even more prevalent than links to other onion domains. Finally, we measured for the first time the prevalence and the nature of web tracking in Tor hidden services, showing that, albeit not as widespread as in the Surface Web, tracking is notably present also in the Dark Web: more than 40% of the scripts are used for this purpose, with 70% of them being completely new tracking scripts unknown to existing anti-tracking solutions.
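As a toy illustration of one of the reported graph measurements, the snippet below computes the share of completely isolated onion domains in a hypothetical domain-level link graph.

```python
# Toy measurement: fraction of onion domains with no in- or out-links.
# The nodes and edges are invented, not the paper's crawl data.
import networkx as nx

G = nx.DiGraph()
G.add_nodes_from(["a.onion", "b.onion", "c.onion", "d.onion"])
G.add_edges_from([("a.onion", "b.onion"), ("b.onion", "c.onion")])

isolated = [n for n in G.nodes if G.degree(n) == 0]
print(f"{len(isolated) / G.number_of_nodes():.0%} isolated")   # -> 25% isolated
```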
Searching and retrieving information from the Web is a primary activity needed to monitor the development and usage of Web resources. Possible benefits include improving user experience (e.g. by optimizing query results) and enforcing data/user security (e.g. by identifying harmful websites). Motivated by the lack of ready-to-use solutions, in this paper we present a flexible and accessible toolkit for structure and content mining, able to crawl, download, extract and index resources from the Web. While being easily configurable to work in the "surface" Web, our suite is specifically tailored to explore the Tor dark Web, i.e. the ensemble of Web servers composing the world's most famous darknet. Notably, the toolkit is not just a Web scraper, but it includes two mining modules, respectively able to prepare content to be fed to an (external) semantic engine, and to reconstruct the graph structure of the explored portion of the Web. Besides discussing in detail the design, features and performance of our toolkit, we report the findings of a preliminary run over Tor, which clarify the potential of our solution.
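The structure-mining idea, reconstructing a domain-level link graph from crawled pages, can be sketched with the standard library alone; the parser and edge format below are illustrative assumptions, not the toolkit's actual implementation.

```python
# Sketch: extract outgoing links from a crawled page and record
# domain-to-domain edges (stdlib only; illustrative, not the toolkit's code).
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    def __init__(self, base_url: str):
        super().__init__()
        self.base, self.edges = base_url, set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                src = urlparse(self.base).netloc
                dst = urlparse(urljoin(self.base, href)).netloc
                if dst and dst != src:
                    self.edges.add((src, dst))   # cross-domain edge

page = '<a href="http://exampleonion2.onion/board">forum</a>'
p = LinkExtractor("http://exampleonion1.onion/index.html")
p.feed(page)
print(p.edges)   # {('exampleonion1.onion', 'exampleonion2.onion')}
```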
The Dark Web is known as the part of the Internet operated by decentralized, anonymity-preserving protocols like Tor. To date, the research community has focused on understanding the size and characteristics of the Dark Web and the services and goods offered in its underground markets. However, little is known about the attack landscape in the Dark Web. For the traditional Web, it is now well understood how websites are exploited, as well as the important role played by Google Dorks and automated attack bots in forming a sort of "background attack noise" to which public websites are exposed. This paper tries to understand whether these basic concepts and components have a parallel in the Dark Web. In particular, by deploying a high-interaction honeypot in the Tor network for a period of seven months, we conducted a measurement study of the types of attacks and of the attackers' behavior that affect this still relatively unknown corner of the Web.
The new Tor network (version 6.0.5) can help domestic users easily get "over the wall", and of course criminals may also use it to visit deep and dark websites. The paper analyzes the core technology of the new Tor network, the new traffic obfuscation technology based on the meek plug-in, and uses a real instance to verify the new Tor network's fast connectivity. On the basis of analyzing the traffic obfuscation mechanism and Tor-based network crime, it puts forward some measures to prevent the use of the Tor network to commit network crime.
Traffic classification, i.e. associating network traffic with the application that generated it, is an important tool for several tasks spanning different fields (security, management, traffic engineering, R&D). This process is challenged by applications that preserve Internet users' privacy by encrypting the communication content, and even more by anonymity tools, which additionally hide the source, the destination, and the nature of the communication. In this paper, leveraging a public dataset released in 2017, we provide (repeatable) classification results with the aim of investigating to what degree the specific anonymity tool (and the traffic it hides) can be identified, when compared to the traffic of the other considered anonymity tools, using machine learning approaches based on statistical features alone. To this end, four classifiers are trained and tested on the dataset: (i) Naïve Bayes, (ii) Bayesian Network, (iii) C4.5, and (iv) Random Forest. Results show that the three considered anonymity networks (Tor, I2P, JonDonym) can be easily distinguished (with an accuracy of 99.99%), and even the specific application generating the traffic can be identified (with an accuracy of 98.00%).
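A minimal sketch of this classification setup is shown below: a Random Forest trained on per-flow statistical features to separate the three anonymity networks. The feature values are synthetic toys, not the 2017 public dataset used in the paper.

```python
# Sketch: Random Forest on per-flow statistical features (synthetic data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# toy features per flow: [mean packet size, flow duration, packets per second]
X = np.vstack([rng.normal(loc, 1.0, size=(200, 3))
               for loc in ([5, 2, 8], [9, 4, 3], [2, 7, 6])])
y = np.repeat(["Tor", "I2P", "JonDonym"], 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```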