Biblio

Filters: Keyword is dark web

Kobayashi, H., Kadoguchi, M., Hayashi, S., Otsuka, A., Hashimoto, M.  2020.  An Expert System for Classifying Harmful Content on the Dark Web. 2020 IEEE International Conference on Intelligence and Security Informatics (ISI). :1–6.

In this research, we examine and develop an expert system with a mechanism to automate crime-category classification and threat-level assessment, using information collected by crawling the dark web. We constructed a bag of words from 250 posts on the dark web and developed an expert system which takes term frequencies as input and classifies posts into six criminal categories (drugs, stolen credit cards, passwords, counterfeit products, child pornography, and others) and three threat levels (high, middle, low). Contrary to prior expectations, our simple and explainable expert system performs competitively with other existing systems. In short, our experiment with 1,500 posts on the dark web shows a recall rate of 76.4% for the six-category crime classification and 83% for the three-level threat discrimination on 100 randomly sampled posts.
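
A minimal sketch of the kind of term-frequency expert system the abstract describes, assuming a hand-built keyword lexicon; the category lists, threat terms, and rules below are invented for illustration and are not the paper's actual lexicon.

```python
from collections import Counter

# Hypothetical keyword lexicon; the paper's actual lexicon is not reproduced here.
CATEGORY_TERMS = {
    "drugs": {"mdma", "cannabis", "gram", "shipping"},
    "stolen_cards": {"cvv", "dumps", "fullz", "visa"},
    "passwords": {"combo", "credentials", "login", "leak"},
    "counterfeit": {"replica", "fake", "counterfeit"},
}
THREAT_TERMS = {"high": {"fullz", "exploit", "0day"}, "middle": {"leak", "dumps"}}

def classify(post: str):
    """Score a post against each category by raw keyword frequency."""
    tf = Counter(post.lower().split())
    scores = {cat: sum(tf[t] for t in terms) for cat, terms in CATEGORY_TERMS.items()}
    category = max(scores, key=scores.get) if any(scores.values()) else "others"
    if any(tf[t] for t in THREAT_TERMS["high"]):
        level = "high"
    elif any(tf[t] for t in THREAT_TERMS["middle"]):
        level = "middle"
    else:
        level = "low"
    return category, level

print(classify("selling fullz and cvv dumps visa"))  # ('stolen_cards', 'high')
```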

Zhang, N., Ebrahimi, M., Li, W., Chen, H.  2020.  A Generative Adversarial Learning Framework for Breaking Text-Based CAPTCHA in the Dark Web. 2020 IEEE International Conference on Intelligence and Security Informatics (ISI). :1–6.

Cyber threat intelligence (CTI) necessitates automated monitoring of dark web platforms (e.g., Dark Net Markets and carding shops) on a large scale. While there are existing methods for collecting data from the surface web, large-scale dark web data collection is commonly hindered by anti-crawling measures. Text-based CAPTCHA, the most prohibitive of these measures, requires the user to recognize a combination of hard-to-read characters. Dark web CAPTCHA patterns are intentionally designed with additional background noise and variable character length to prevent automated CAPTCHA breaking. Existing CAPTCHA-breaking methods cannot overcome these challenges and are therefore not applicable to the dark web. In this study, we propose a novel framework for breaking text-based CAPTCHA in the dark web. The proposed framework utilizes a Generative Adversarial Network (GAN) to counteract dark web-specific background noise and leverages an enhanced character segmentation algorithm. Our method was evaluated on both benchmark and dark web CAPTCHA testbeds, significantly outperforming the state-of-the-art baseline methods on all datasets and achieving a success rate above 92.08% on the dark web testbeds. Our research enables the CTI community to develop advanced capabilities for large-scale dark web monitoring.
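
The GAN denoiser itself is beyond a short sketch; as a hedged stand-in, the snippet below illustrates only the character-segmentation stage, using a classic column-projection heuristic. This is an assumption for illustration, not the paper's enhanced segmentation algorithm.

```python
import numpy as np

def segment_characters(binary_img: np.ndarray, min_width: int = 3):
    """Split a binarized CAPTCHA (1 = ink, 0 = background) into per-character
    slices using a vertical projection profile."""
    column_ink = binary_img.sum(axis=0)            # ink pixels per column
    in_char, start, spans = False, 0, []
    for x, ink in enumerate(column_ink):
        if ink > 0 and not in_char:
            in_char, start = True, x
        elif ink == 0 and in_char:
            in_char = False
            if x - start >= min_width:
                spans.append((start, x))
    if in_char and binary_img.shape[1] - start >= min_width:
        spans.append((start, binary_img.shape[1]))
    return [binary_img[:, a:b] for a, b in spans]

# Toy example: two 4-column "characters" separated by background columns.
img = np.zeros((10, 16), dtype=int)
img[2:8, 1:5] = 1
img[2:8, 9:13] = 1
print([c.shape for c in segment_characters(img)])  # [(10, 4), (10, 4)]
```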

Park, W.  2020.  A Study on Analytical Visualization of Deep Web. 2020 22nd International Conference on Advanced Communication Technology (ICACT). :81–83.

Nowadays there is a flood of harmful content online, such as nude photos and child pornography, and drugs are also distributed through hidden channels. Most such transactions are made through the Deep Web. "Deep Web refers to an encrypted network that is not detected on search engines like Google; users must use Tor to visit sites on the dark web" [4]. In other words, the Dark Web is reached through Tor's encryption client, so users can visit its sites without learning who operates them. In this paper, we analyze the current status of such crimes and develop a crime-information visualization system for the Deep Web. The status of the Deep Web is analyzed and the data visualized using Java. The program is expected to support more efficient management and monitoring of crime on hidden parts of the web such as the Deep Web and torrent networks.

Liu, Y., Lin, F. Y., Ahmad-Post, Z., Ebrahimi, M., Zhang, N., Hu, J. L., Xin, J., Li, W., Chen, H.  2020.  Identifying, Collecting, and Monitoring Personally Identifiable Information: From the Dark Web to the Surface Web. 2020 IEEE International Conference on Intelligence and Security Informatics (ISI). :1–6.

Personally identifiable information (PII) has become a major target of cyber-attacks, causing severe losses to data breach victims. To protect data breach victims, researchers focus on collecting exposed PII to assess privacy risk and identify at-risk individuals. However, existing studies mostly rely on exposed PII collected from either the dark web or the surface web. Due to the wide exposure of PII on both the dark web and surface web, collecting from only the dark web or the surface web could result in an underestimation of privacy risk. Despite its research and practical value, jointly collecting PII from both sources is a non-trivial task. In this paper, we summarize our effort to systematically identify, collect, and monitor a total of 1,212,004,819 exposed PII records across both the dark web and surface web. Our effort resulted in 5.8 million stolen SSNs, 845,000 stolen credit/debit cards, and 1.2 billion stolen account credentials. From the surface web, we identified and collected over 1.3 million PII records of the victims whose PII is exposed on the dark web. To the best of our knowledge, this is the largest academic collection of exposed PII, which, if properly anonymized, enables various privacy research inquiries, including assessing privacy risk and identifying at-risk populations.
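
A hedged sketch of the cross-source matching step such a pipeline implies: dark web and surface web breach records are joined on a normalized, hashed email key. The record layout and field names are invented for illustration; the paper's actual schema is not reproduced here.

```python
import hashlib

def key(email: str) -> str:
    """Normalize and hash an email so records can be joined without storing it raw."""
    return hashlib.sha256(email.strip().lower().encode()).hexdigest()

# Invented example records from the two source types.
dark_web = [{"email": "Alice@example.com", "ssn": "***-**-1234"}]
surface_web = [{"email": "alice@example.com", "password": "hunter2"}]

index = {key(r["email"]): r for r in dark_web}
joined = [{**index[key(r["email"])], **r}
          for r in surface_web if key(r["email"]) in index]
print(joined)  # one victim whose PII is exposed on both the dark and surface web
```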

Mi, Xianghang, Feng, Xuan, Liao, Xiaojing, Liu, Baojun, Wang, XiaoFeng, Qian, Feng, Li, Zhou, Alrwais, Sumayah, Sun, Limin, Liu, Ying.  2019.  Resident Evil: Understanding Residential IP Proxy as a Dark Service. 2019 IEEE Symposium on Security and Privacy (SP). :1185–1201.

An emerging Internet business is residential proxy (RESIP) as a service, in which a provider utilizes hosts within residential networks (in contrast to those running in a datacenter) to relay its customers' traffic, in an attempt to avoid server-side blocking and detection. Despite the prominent roles these services could play in the underground business world, little has been done to understand whether they are indeed involved in cybercrime and how they operate, due to the challenges in identifying their RESIPs, not to mention any in-depth analysis of them. In this paper, we report the first study on RESIPs, which sheds light on the behaviors and the ecosystem of these elusive gray services. Our research employed an infiltration framework, including our clients for RESIP services and the servers they visited, to detect 6 million RESIP IPs across 230+ countries and 52K+ ISPs. The observed addresses were analyzed and the hosts behind them were further fingerprinted using a new profiling system. Our effort led to several surprising findings about RESIP services. Despite the providers' claims that the proxy hosts join willingly, many proxies run on likely compromised hosts, including IoT devices. By cross-matching the hosts we discovered against labeled PUP (potentially unwanted program) logs provided by a leading IT company, we uncovered various illicit operations performed by RESIP hosts, including illegal promotion, fast fluxing, phishing, malware hosting, and others. We also reverse engineered RESIP services' internal infrastructures and uncovered their potential rebranding and reselling behaviors. Our research takes the first step toward understanding this new Internet service, contributing to the effective control of its security risks.

Bradley, Cerys, Stringhini, Gianluca.  2019.  A Qualitative Evaluation of Two Different Law Enforcement Approaches on Dark Net Markets. 2019 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW). :453–463.

This paper presents the results of a qualitative study on discussions about two major law enforcement interventions against Dark Net Market (DNM) users extracted from relevant Reddit forums. We assess the impact of Operation Hyperion and Operation Bayonet (combined with the closure of the site Hansa) by analyzing posts and comments made by users of two Reddit forums created for the discussion of Dark Net Markets. The operations are compared in terms of the size of the discussions, the consequences recorded, and the opinions shared by forum users. We find that Operation Bayonet generated a higher number of discussions on Reddit, and from the qualitative analysis of such discussions it appears that this operation also had a greater impact on the DNM ecosystem.

Koloveas, Paris, Chantzios, Thanasis, Tryfonopoulos, Christos, Skiadopoulos, Spiros.  2019.  A Crawler Architecture for Harvesting the Clear, Social, and Dark Web for IoT-Related Cyber-Threat Intelligence. 2019 IEEE World Congress on Services (SERVICES). 2642-939X:3–8.

The clear, social, and dark web have lately been identified as rich sources of valuable cyber-security information that, given the appropriate tools and methods, may be identified, crawled, and subsequently leveraged into actionable cyber-threat intelligence. In this work, we focus on the information-gathering task and present a novel crawling architecture for transparently harvesting data from security websites in the clear web, security forums in the social web, and hacker forums/marketplaces in the dark web. The proposed architecture adopts a two-phase approach to data harvesting. Initially, a machine learning-based crawler directs the harvesting towards websites of interest, while in the second phase state-of-the-art statistical language modelling techniques represent the harvested information in a latent low-dimensional feature space and rank it by its potential relevance to the task at hand. The proposed architecture is realised using exclusively open-source tools, and a preliminary evaluation with crowdsourced results demonstrates its effectiveness.
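
A minimal sketch of the second-phase idea, assuming a TF-IDF plus truncated-SVD (LSA) latent space as a stand-in for the paper's statistical language-modelling technique; the mini-corpus and query are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Invented mini-corpus; the real harvest would be crawled pages and posts.
docs = [
    "new mirai botnet variant targets iot cameras",
    "cheap holiday offers book now",
    "zero day exploit for router firmware sold on forum",
]
query = ["iot malware exploit"]

tfidf = TfidfVectorizer().fit(docs + query)
svd = TruncatedSVD(n_components=2, random_state=0).fit(tfidf.transform(docs))
doc_vecs = svd.transform(tfidf.transform(docs))
query_vec = svd.transform(tfidf.transform(query))

# Rank harvested documents by relevance to the CTI query in the latent space.
scores = cosine_similarity(query_vec, doc_vecs)[0]
for rank, i in enumerate(scores.argsort()[::-1], 1):
    print(rank, round(float(scores[i]), 3), docs[i])
```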

Koch, Robert.  2019.  Hidden in the Shadow: The Dark Web - A Growing Risk for Military Operations? 2019 11th International Conference on Cyber Conflict (CyCon). 900:1–24.

A multitude of leaked data can be purchased through the Dark Web nowadays. Recent reports highlight that the largest footprints of leaked data, which range from employee passwords to intellectual property, are linked to governmental institutions. According to OWL Cybersecurity, the US Navy is most affected. For leaked data such as personal files, this can have a severe impact: it can be the cornerstone of sophisticated social engineering attacks, a means of obtaining credentials for illegal system access, or a path to installing malicious code in the target network. If personally identifiable information, sensitive data, access plans, strategies, or intellectual property are traded on the Dark Web, this could pose a threat to the armed forces. The actual impact, role, and dimension of information traded on the Dark Web are rarely analysed. Is the available data authentic and useful? Can it endanger the capabilities of armed forces? These questions are even more challenging given the several well-known cases of deanonymization published over recent years, which raise the question of whether somebody really would use the Dark Web to sell highly sensitive information. In contrast, fake offers from scammers can be found regularly, set up only to cheat possible buyers. A victim of illegal offers on the Dark Web will typically not go to the police. The paper analyses the technical base of the Dark Web and examines possibilities of deanonymization. After an analysis of Dark Web marketplaces and the articles traded there, a discussion of the potential risks to military operations is used to identify recommendations on how to minimize the risk. The analysis concludes that surveillance of the Dark Web is necessary to increase the chance of identifying sensitive information early; but the `open' Internet, i.e. the Surface Web and the Deep Web, actually poses the more important risk factor, as it is in practice more difficult to surveil than the Dark Web, and only a small share of breached information is traded on the latter.

Schäfer, Matthias, Fuchs, Markus, Strohmeier, Martin, Engel, Markus, Liechti, Marc, Lenders, Vincent.  2019.  BlackWidow: Monitoring the Dark Web for Cyber Security Information. 2019 11th International Conference on Cyber Conflict (CyCon). 900:1–21.

The Dark Web, a conglomerate of services hidden from search engines and regular users, is used by cyber criminals to offer all kinds of illegal services and goods. Multiple Dark Web offerings are highly relevant for the cyber security domain in anticipating and preventing attacks, such as information about zero-day exploits, stolen datasets with login information, or botnets available for hire. In this work, we analyze and discuss the challenges related to information gathering in the Dark Web for cyber security intelligence purposes. To facilitate information collection and the analysis of large amounts of unstructured data, we present BlackWidow, a highly automated modular system that monitors Dark Web services and fuses the collected data in a single analytics framework. BlackWidow relies on a Docker-based microservice architecture which permits the combination of both preexisting and customized machine learning tools. BlackWidow represents all data and relationships extracted from posts in a large knowledge graph, which is made available to its security-analyst users for search and interactive visual exploration. Using BlackWidow, we conduct a study of seven popular services on the Deep and Dark Web across three different languages with almost 100,000 users. Within less than two days of monitoring time, BlackWidow managed to collect years of relevant information in the areas of cyber security and fraud monitoring. We show that BlackWidow can infer relationships between authors and forums and detect trends for cybersecurity-related topics. Finally, we discuss exemplary case studies surrounding leaked data and preparation for malicious activity.
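
A hedged sketch of the knowledge-graph idea, assuming a simple undirected graph of author, forum, and topic nodes; the records and schema below are invented and are not BlackWidow's actual data model.

```python
import networkx as nx

# Invented forum records for illustration.
posts = [
    {"author": "crab", "forum": "market-A", "topic": "credential dumps"},
    {"author": "crab", "forum": "forum-B", "topic": "phishing kits"},
    {"author": "lynx", "forum": "forum-B", "topic": "phishing kits"},
]

g = nx.Graph()
for p in posts:
    g.add_node(p["author"], kind="author")
    g.add_node(p["forum"], kind="forum")
    g.add_node(p["topic"], kind="topic")
    g.add_edge(p["author"], p["forum"])
    g.add_edge(p["author"], p["topic"])

# Authors linked through a shared forum - the kind of author/forum
# relationship inference the abstract describes.
shared = [n for n in g.neighbors("forum-B") if g.nodes[n]["kind"] == "author"]
print(shared)  # ['crab', 'lynx']
```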

Yang, Ying, Yang, Lina, Yang, Meihong, Yu, Huanhuan, Zhu, Guichun, Chen, Zhenya, Chen, Lijuan.  2019.  Dark web forum correlation analysis research. 2019 IEEE 8th Joint International Information Technology and Artificial Intelligence Conference (ITAIC). :1216–1220.

With the rapid development of the Internet, the dark network has also come into wide use [1]. Due to its anonymity, many criminals commit crimes there, and it is difficult for law enforcement officials to track their identities using traditional network investigation techniques based on IP addresses [2]. Threat information comes mainly from dark web forums and dark web markets. In this paper, we introduce the current mainstream dark network communication system, Tor, and develop a visual dark web forum post association analysis system that graphically displays the relationships between forum messages and posters, helping law enforcement officers explore deeper clues and analyze crimes in the dark network.

Godawatte, Kithmini, Raza, Mansoor, Murtaza, Mohsin, Saeed, Ather.  2019.  Dark Web Along With The Dark Web Marketing And Surveillance. 2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT). :483–485.

Cybercriminals make wide use of the dark web and its illegal functionality. More than half of criminal and terror activities are conducted through the dark web, such as cryptocurrency dealings, the sale of human organs, red rooms, child pornography, arms deals, drug deals, the hiring of assassins and hackers, and hacking software and malware programs. Law enforcement and intelligence agencies such as the FBI, NSA, Interpol, Mossad, and FSB run surveillance programs on the dark web to trace down criminals and terrorists and stop crimes and terror activities. This paper is about dark web marketing and surveillance programs. It discusses how to access the dark web securely and how law enforcement agencies track down users exhibiting terror-related behaviours and activities. Moreover, the paper discusses dark web sites offering jihadist services and anonymous markets, along with safety precautions.

Yang, Ying, Yu, Huanhuan, Yang, Lina, Yang, Ming, Chen, Lijuan, Zhu, Guichun, Wen, Liqiang.  2019.  Hadoop-based Dark Web Threat Intelligence Analysis Framework. 2019 IEEE 3rd Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC). :1088–1091.

With the development of network services, people's privacy requirements continue to increase. Beyond providing anonymous user communication, it is also necessary to protect the anonymity of servers. At the same time, many threatening crime messages circulate on the dark network, yet many scholars lack the ability or expertise to conduct research on dark web threat intelligence. Therefore, this paper designs a Hadoop-based framework for dark web threat intelligence. The framework uses HDFS as the underlying storage system and builds an HBase-based distributed database on top of it to store and manage threat-intelligence information. A web crawler, adapted to each forum's heterogeneous structure, collects data through the anonymous Tor tool. The framework is used to identify the characteristics of key dark web criminal networks, laying the groundwork for later dark web research.
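
A minimal sketch of the collection-and-storage path such a framework suggests, assuming a local Tor SOCKS proxy on port 9050 and an HBase Thrift gateway on localhost; the .onion address and table/column names are placeholders, not real services.

```python
import happybase  # HBase client; assumes a running HBase Thrift server
import requests   # needs requests[socks] for the socks5h:// scheme

# Fetch a page through the local Tor SOCKS proxy so DNS also resolves inside Tor.
proxies = {"http": "socks5h://127.0.0.1:9050",
           "https": "socks5h://127.0.0.1:9050"}
url = "http://exampleonionaddress.onion/"   # placeholder onion address
html = requests.get(url, proxies=proxies, timeout=60).text

# Store the raw page in an HBase table, one row per URL.
conn = happybase.Connection("localhost")
table = conn.table("forum_pages")           # hypothetical table with a 'raw' family
table.put(url.encode(), {b"raw:html": html.encode()})
```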

Kadoguchi, Masashi, Hayashi, Shota, Hashimoto, Masaki, Otsuka, Akira.  2019.  Exploring the Dark Web for Cyber Threat Intelligence Using Machine Learning. 2019 IEEE International Conference on Intelligence and Security Informatics (ISI). :200–202.

In recent years, cyber attack techniques have grown increasingly sophisticated, and blocking attacks is more and more difficult, even when one countermeasure or another is taken. To handle this situation successfully, it is crucial to predict cyber attacks, take appropriate precautions, and make effective use of the cyber intelligence that enables these actions. Malicious hackers share various kinds of information through particular communities such as the dark web, indicating that a great deal of intelligence exists in cyberspace. This paper focuses on dark web forums and proposes an approach that extracts, from huge numbers of forums, those containing important information or intelligence, and identifies the traits of each forum using methodologies such as machine learning and natural language processing. This approach will allow us to grasp emerging threats in cyberspace and take appropriate measures against malicious activities.
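
A toy sketch of one plausible pipeline for this kind of forum triage, assuming paragraph-vector (doc2vec) features feeding a simple classifier; the corpus, labels, and hyperparameters are invented and may differ from the paper's actual setup.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.linear_model import LogisticRegression

# Invented tiny corpus: 1 = post with threat-intelligence value, 0 = noise.
posts = [("selling fresh combo list smtp access", 1),
         ("zero day exploit private sale escrow", 1),
         ("anyone watch the game last night", 0),
         ("best pizza recipe thread", 0)]

tagged = [TaggedDocument(words=text.split(), tags=[i])
          for i, (text, _) in enumerate(posts)]
model = Doc2Vec(tagged, vector_size=16, min_count=1, epochs=50, seed=1)

X = [model.dv[i] for i in range(len(posts))]
y = [label for _, label in posts]
clf = LogisticRegression().fit(X, y)

vec = model.infer_vector("fresh exploit for sale".split())
print(clf.predict([vec]))  # ideally [1] on this toy data
```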

Popov, Oliver, Bergman, Jesper, Valassi, Christian.  2018.  A Framework for a Forensically Sound Harvesting the Dark Web. Proceedings of the Central European Cybersecurity Conference 2018. :13:1–13:7.

The generative and transformative nature of the Internet, which has become a synonym for the infrastructure of contemporary digital society, also makes it a place for unsavoury and illegal activities such as fraud, human trafficking, exchange of controlled substances, arms smuggling, extremism, and terrorism. Legitimate concerns such as anonymity and privacy are exploited for the proliferation of nefarious deeds in the parts of the Internet termed the deep web and the dark web. The cryptographic and anonymity mechanisms employed by dark web miscreants create serious problems for law enforcement agencies and other legal institutions seeking to monitor, control, investigate, prosecute, and prevent the range of criminal events that should not be part of the Internet, or of human society in general. The paper describes research on developing a framework for identifying, collecting, analysing, and reporting information from the dark web in a forensically sound manner. The framework should provide the fundamentals for creating a real-life system that could be used as a tool by law enforcement institutions, digital forensics researchers, and practitioners to explore and study illicit actions and their consequences on the dark web. The design science paradigm is used to develop the framework, while international security and forensic experts are behind the ex-ante evaluation of the basic components and their functionality, the architecture, and the organization of the system. Finally, we discuss future work concerning the implementation of the framework, along with the introduction of some intelligent modules that should empower the tool with adaptability, effectiveness, and efficiency.

Mattmann, Chris A., Sharan, Madhav.  2017.  Scalable Hadoop-Based Pooled Time Series of Big Video Data from the Deep Web. Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval. :117–120.

We contribute a scalable, open source implementation of the Pooled Time Series (PoT) algorithm from CVPR 2015. The algorithm is evaluated on approximately 6800 human trafficking (HT) videos collected from the deep and dark web, and on an open dataset: the Human Motion Database (HMDB). We describe PoT and our motivation for using it on larger data and the issues we encountered. Our new solution reimagines PoT as an Apache Hadoop-based algorithm. We demonstrate that our new Hadoop-based algorithm successfully identifies similar videos in the HT and HMDB datasets and we evaluate the algorithm qualitatively and quantitatively.
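
A toy sketch of the temporal-pooling idea behind PoT, assuming per-frame feature vectors are pooled over time into one descriptor per video and videos are compared by descriptor similarity; the random features and the mean/max pooling pair are illustrative simplifications of the actual PoT operators.

```python
import numpy as np

def pot_descriptor(frame_feats: np.ndarray) -> np.ndarray:
    """Pool a (frames x dims) time series into one vector by concatenating
    mean and max pooling - a simplified stand-in for the PoT operators."""
    return np.concatenate([frame_feats.mean(axis=0), frame_feats.max(axis=0)])

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
video_a = rng.random((120, 8))                           # 120 frames, 8-dim features
video_b = video_a + rng.normal(0, 0.01, video_a.shape)   # near-duplicate clip
video_c = rng.random((90, 8))                            # unrelated clip

da, db, dc = map(pot_descriptor, (video_a, video_b, video_c))
print(cosine(da, db) > cosine(da, dc))  # True: the near-duplicate ranks higher
```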

Kwon, K. Hazel, Priniski, J. Hunter, Sarkar, Soumajyoti, Shakarian, Jana, Shakarian, Paulo.  2017.  Crisis and Collective Problem Solving in Dark Web: An Exploration of a Black Hat Forum. Proceedings of the 8th International Conference on Social Media & Society. :45:1–45:5.

This paper explores the process of collective crisis problem-solving in the darkweb. We conducted a preliminary study on one of the Tor-based darkweb forums, during the shutdown of two marketplaces. Content analysis suggests that distrust permeated the forum during the marketplace shutdowns. We analyzed the debates concerned with suspicious claims and conspiracies. The results suggest that a black-market crisis potentially offers an opportunity for cyber-intelligence to disrupt the darkweb by engendering internal conflicts. At the same time, the study also shows that darkweb members were adept at reaching collective solutions by sharing new market information, more secure technologies, and alternative routes for economic activities.

Sanchez-Rola, Iskander, Balzarotti, Davide, Santos, Igor.  2017.  The Onions Have Eyes: A Comprehensive Structure and Privacy Analysis of Tor Hidden Services. Proceedings of the 26th International Conference on World Wide Web. :1251–1260.

Tor is a well-known and widely used darknet, known for its anonymity. However, while its protocol and relay security have already been extensively studied, to date there is no comprehensive analysis of the structure and privacy of its Web hidden services. To fill this gap, we developed a dedicated analysis platform and used it to crawl and analyze over 1.5M URLs hosted in 7257 onion domains. For each page we analyzed its links, resources, and redirection graphs, as well as the language and category distribution. According to our experiments, Tor hidden services are organized in a sparse but highly connected graph, in which around 10% of the onion sites are completely isolated. Our study also measures for the first time the tight connection that exists between Tor hidden services and the Surface Web. In fact, more than 20% of the onion domains we visited imported resources from the Surface Web, and links to the Surface Web are even more prevalent than links to other onion domains. Finally, we measured for the first time the prevalence and the nature of web tracking in Tor hidden services, showing that, albeit not as widespread as on the Surface Web, tracking is notably present in the Dark Web as well: more than 40% of the scripts are used for this purpose, with 70% of them being completely new tracking scripts unknown to existing anti-tracking solutions.
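
A minimal sketch of the resource-origin measurement described above: given the resource URLs extracted from a hidden-service page, count how many are imported from the Surface Web rather than from .onion hosts. The URLs below are invented placeholders.

```python
from urllib.parse import urlparse

def is_onion(url: str) -> bool:
    """True if the URL's host is a Tor hidden service."""
    host = urlparse(url).hostname or ""
    return host.endswith(".onion")

# Placeholder resources as a crawler might extract from one onion page.
resources = [
    "http://exampleonionsitev3abc.onion/style.css",
    "https://cdn.example.com/jquery.min.js",
    "http://anotheronionexample.onion/logo.png",
]
surface = [r for r in resources if not is_onion(r)]
print(f"{len(surface)}/{len(resources)} resources imported from the Surface Web")
```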

Celestini, Alessandro, Guarino, Stefano.  2017.  Design, Implementation and Test of a Flexible Tor-oriented Web Mining Toolkit. Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics. :19:1–19:10.

Searching and retrieving information from the Web is a primary activity needed to monitor the development and usage of Web resources. Possible benefits include improving user experience (e.g. by optimizing query results) and enforcing data/user security (e.g. by identifying harmful websites). Motivated by the lack of ready-to-use solutions, in this paper we present a flexible and accessible toolkit for structure and content mining, able to crawl, download, extract and index resources from the Web. While being easily configurable to work on the "surface" Web, our suite is specifically tailored to explore the Tor dark Web, i.e. the ensemble of Web servers composing the world's most famous darknet. Notably, the toolkit is not just a Web scraper: it includes two mining modules, respectively able to prepare content to be fed to an (external) semantic engine and to reconstruct the graph structure of the explored portion of the Web. Other than discussing in detail the design, features and performance of our toolkit, we report the findings of a preliminary run over Tor that clarify the potential of our solution.

Catakoglu, Onur, Balduzzi, Marco, Balzarotti, Davide.  2017.  Attacks Landscape in the Dark Side of the Web. Proceedings of the Symposium on Applied Computing. :1739–1746.

The Dark Web is known as the part of the Internet operated by decentralized and anonymity-preserving protocols like Tor. To date, the research community has focused on understanding the size and characteristics of the Dark Web and the services and goods offered in its underground markets. However, little is still known about the attacks landscape in the Dark Web. For the traditional Web, it is now well understood how websites are exploited, as well as the important role played by Google Dorks and automated attack bots in forming a sort of "background attack noise" to which public websites are exposed. This paper tries to understand whether these basic concepts and components have a parallel in the Dark Web. In particular, by deploying a high-interaction honeypot in the Tor network for a period of seven months, we conducted a measurement study of the types of attacks and of the attackers' behavior that affect this still relatively unknown corner of the Web.

Lin, Z., Tong, L., Zhijie, M., Zhen, L.  2017.  Research on Cyber Crime Threats and Countermeasures about Tor Anonymous Network Based on Meek Confusion Plug-in. 2017 International Conference on Robots Intelligent System (ICRIS). :246–249.

The new Tor network (version 6.0.5) can help domestic users easily circumvent censorship ("climb over the wall"), and of course criminals may use it to visit deep and dark websites as well. The paper analyzes the core technology of the new Tor network, namely its new traffic obfuscation technology based on the meek plug-in, and uses a real instance to verify the new Tor network's fast connectivity. On the basis of analyzing the traffic obfuscation mechanism and Tor-based network crime, it puts forward some measures to prevent the Tor network from being used to commit network crime.
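
For context, a hedged sketch of how a meek-style bridge is configured on the client side, here launched programmatically with the stem library; the bridge address, front domain, and obfs4proxy path are placeholders (real meek bridge lines come from Tor's bridge distribution channels), so this is illustrative only.

```python
import stem.process  # pip install stem; assumes a local tor binary on PATH

# Placeholder bridge details (RFC 5737/2606 example address and domains).
config = {
    "UseBridges": "1",
    "SocksPort": "9050",
    "ClientTransportPlugin": "meek_lite exec /usr/bin/obfs4proxy",
    "Bridge": "meek_lite 192.0.2.1:443 url=https://example.com/ front=cdn.example.com",
}

# Domain-fronted traffic to the bridge obscures that Tor is being used at all.
tor = stem.process.launch_tor_with_config(config=config, timeout=120)
print("Tor with a meek-style transport is up on socks5://127.0.0.1:9050")
tor.kill()
```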

Montieri, A., Ciuonzo, D., Aceto, G., Pescape, A.  2017.  Anonymity Services Tor, I2P, JonDonym: Classifying in the Dark. 2017 29th International Teletraffic Congress (ITC 29). 1:81–89.

Traffic classification, i.e. associating network traffic with the application that generated it, is an important tool for several tasks spanning different fields (security, management, traffic engineering, R&D). This process is challenged by applications that preserve Internet users' privacy by encrypting the communication content, and even more by anonymity tools, which additionally hide the source, the destination, and the nature of the communication. In this paper, leveraging a public dataset released in 2017, we provide (repeatable) classification results with the aim of investigating to what degree the specific anonymity tool (and the traffic it hides) can be identified, when compared to the traffic of the other considered anonymity tools, using machine learning approaches based on statistical features alone. To this end, four classifiers are trained and tested on the dataset: (i) Naïve Bayes, (ii) Bayesian Network, (iii) C4.5, and (iv) Random Forest. Results show that the three considered anonymity networks (Tor, I2P, JonDonym) can be easily distinguished (with an accuracy of 99.99%), and even the specific application generating the traffic can be identified (with an accuracy of 98.00%).
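
A minimal sketch of this style of experiment with one of the four classifiers (Random Forest), assuming synthetic statistical flow features in place of the real dataset; the feature semantics and class separation below are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 4 statistical features per flow (e.g., mean packet size,
# flow duration, bytes up, bytes down), one Gaussian blob per anonymity tool.
rng = np.random.default_rng(0)
tools = {"Tor": (600, 20), "I2P": (900, 25), "JonDonym": (400, 15)}
X = np.vstack([rng.normal(mu, sd, size=(200, 4)) for mu, sd in tools.values()])
y = np.repeat(list(tools), 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(f"accuracy: {clf.score(X_te, y_te):.4f}")  # near-perfect on this toy data
```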

Rahayuda, I. G. S., Santiari, N. P. L.  2017.  Crawling and cluster hidden web using crawler framework and fuzzy-KNN. 2017 5th International Conference on Cyber and IT Service Management (CITSM). :1–7.

Today almost everyone uses the internet for daily activities, whether social, academic, work, or business. But only a few of us are aware that we generally access only a small part of the internet. The internet, or the World Wide Web, is divided into several levels, such as the surface web, the deep web, and the dark web, and accessing the deep or dark web is a dangerous thing. This research examines web content across these levels. For a faster and safer search, a crawler framework is used; the search process yields various kinds of data that are stored in a database. A classification process is then applied to determine the level of each website, using the fuzzy-KNN method on the crawling results contained in the database. The crawler framework generates data in the form of URL addresses, page info, and more, and the crawled data are compared with predefined sample data. The fuzzy-KNN classification yields the web level of each entry based on the values of the words specified in the sample data. From the research conducted on several test datasets, we found 20% surface web, 7.5% bergie web, 20% deep web, 22.5% charter web, and 30% dark web. Since the research was done only on limited test data, more data are needed to obtain better results. Better crawler frameworks can also speed up crawling, especially at certain web levels, because not all crawler frameworks can work at a particular level; the Tor browser can be used, but the crawler framework sometimes cannot work.
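
A compact sketch of the fuzzy-KNN step, assuming Keller-style inverse-distance-weighted memberships over hand-made keyword-count features; the features, labels, and parameters are invented for illustration.

```python
import numpy as np

def fuzzy_knn(x, X, y, n_classes, k=3, m=2.0, eps=1e-9):
    """Fuzzy k-NN: class memberships from inverse-distance-weighted votes
    of the k nearest labeled samples (fuzzifier m controls the weighting)."""
    d = np.linalg.norm(X - x, axis=1)
    nn = np.argsort(d)[:k]
    w = 1.0 / (d[nn] ** (2.0 / (m - 1.0)) + eps)   # inverse-distance weights
    u = np.zeros(n_classes)
    for weight, j in zip(w, nn):
        u[y[j]] += weight
    return u / u.sum()                              # memberships sum to 1

# Toy features, e.g. counts of level-indicative keywords per crawled page.
X = np.array([[9, 0], [8, 1], [1, 8], [0, 9], [5, 5]], dtype=float)
y = np.array([0, 0, 1, 1, 1])   # 0 = surface web, 1 = dark web (illustrative)
print(fuzzy_knn(np.array([7.0, 2.0]), X, y, n_classes=2))  # mostly class 0
```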

Rafiuddin, M. F. B., Minhas, H., Dhubb, P. S.  2017.  A dark web story in-depth research and study conducted on the dark web based on forensic computing and security in Malaysia. 2017 IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI). :3049–3055.

The following is research conducted on the Dark Web to study and identify its ins and outs: what the dark web is all about, the various methods available to access it, and more. The researchers have also included the steps and precautions taken before the dark web was opened. Apart from that, the findings and the website links/URLs are included, along with descriptions of the sites. The primary usage of the dark web and some of the researchers' experience are further documented in this research paper.

Zulkarnine, A. T., Frank, R., Monk, B., Mitchell, J., Davies, G.  2016.  Surfacing collaborated networks in dark web to find illicit and criminal content. 2016 IEEE Conference on Intelligence and Security Informatics (ISI). :109–114.

The Tor network, a hidden part of the Internet, is becoming an ideal hosting ground for illegal activities and services, including large drug markets, financial fraud, espionage, and child sexual abuse. Researchers and law enforcement rely on manual investigations, which are both time-consuming and ultimately inefficient. The first part of this paper explores illicit and criminal content identified by prominent researchers in the dark web. We previously developed a web crawler that automatically searched websites on the internet based on pre-defined keywords and followed the hyperlinks in order to create a map of the network. This crawler demonstrated previous success in locating and extracting data on child exploitation images, videos, keywords, and linkages on the public internet. However, as Tor functions differently at the TCP level and uses socket connections, further technical challenges arise when crawling Tor. Other inherent challenges for advanced Tor crawling include scalability, content-selection tradeoffs, and social obligation. We discuss these challenges and the measures taken to meet them. Our modified web crawler for Tor, termed the "Dark Crawler", has been able to access Tor while simultaneously accessing the public internet. We present initial findings regarding what extremist and terrorist content is present in Tor and how this content is connected in a mapped network that facilitates dark web crimes. Our results so far indicate that the most popular websites in the dark web act as catalysts for dark web expansion by providing the necessary knowledge base, support, and services to build Tor hidden services and onion websites.

Park, A. J., Beck, B., Fletche, D., Lam, P., Tsang, H. H.  2016.  Temporal analysis of radical dark web forum users. 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). :880–883.

Extremist groups have turned to the Internet and social media sites as a means of sharing information amongst one another. This research study analyzes forum posts and finds people who show radical tendencies through the use of natural language processing and sentiment analysis. The forum data used are from six Islamic forums on the Dark Web, made available for security research. This research project uses a POS tagger to isolate keywords and nouns that can be utilized by the sentiment analysis program, which then determines the polarity of the post: each post is scored as either positive or negative. These scores are aggregated into monthly radical scores for each user. Once these time clusters are mapped, changes in users' opinions over time may be interpreted as rising or falling levels of radicalism. Each user is then compared on a timeline to other radical users and events to determine possible connections or relationships. The ability to analyze a forum for an overall change in attitude can be an indicator of unrest and of possible radical actions or terrorism.
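
A toy sketch of this kind of pipeline, assuming NLTK's POS tagger for the noun/keyword isolation step and the VADER sentiment model for polarity, with posts aggregated into per-user monthly scores; the posts and schema are invented, and the study's actual tagger and scorer may differ.

```python
from collections import defaultdict
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

# Resource names cover both older and newer NLTK releases.
for pkg in ("punkt", "punkt_tab", "averaged_perceptron_tagger",
            "averaged_perceptron_tagger_eng", "vader_lexicon"):
    nltk.download(pkg, quiet=True)

# Invented posts: (user, "YYYY-MM", text).
posts = [
    ("user1", "2016-01", "We should respond peacefully and talk."),
    ("user1", "2016-02", "They are evil and deserve destruction."),
]

sia = SentimentIntensityAnalyzer()
monthly = defaultdict(list)
for user, month, text in posts:
    # Isolate nouns with the POS tagger, then score the polarity.
    nouns = [w for w, tag in nltk.pos_tag(nltk.word_tokenize(text))
             if tag.startswith("NN")]
    score = sia.polarity_scores(" ".join(nouns) or text)["compound"]
    monthly[(user, month)].append(score)

for key, scores in sorted(monthly.items()):
    print(key, round(sum(scores) / len(scores), 3))  # monthly score per user
```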