Bibliography
Cyber threat intelligence (CTI) is vital for effective cybersecurity decisions because it provides timely, relevant, and actionable information about emerging threats. Monitoring the dark web to generate CTI is an emerging trend in cybersecurity. As a result, building CTI capabilities through dark web investigation is a significant focus for cybersecurity companies such as Deepwatch, DarkOwl, SixGill, ThreatConnect, CyLance, ZeroFox, and many others. Dark web marketplace (DWM) monitoring tools are also of great interest to law enforcement agencies (LEAs). Because darknet market participants operate anonymously and online transactions are pseudo-anonymous, identifying and investigating them is challenging, and keeping up with DWMs poses significant difficulties for LEAs today. Nevertheless, DWM offerings give LEAs insight into the dark web economy. The present work is one such attempt to describe and analyze dark web market data collected for CTI using a dark web crawler. After processing and labeling, the authors obtain a dataset covering 53 DWMs with their product listings and pricing.
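The abstract does not describe the crawler itself, but a minimal sketch of the basic ingredient, fetching a marketplace page through a local Tor SOCKS proxy and pulling out listing titles and prices, might look as follows. The onion address and CSS selectors are hypothetical, and running it requires a local Tor daemon plus `requests[socks]` and `beautifulsoup4`.

```python
# Minimal sketch: fetch a (hypothetical) dark web market page over Tor and
# extract listing titles and prices. Not the paper's actual pipeline.
import requests
from bs4 import BeautifulSoup

TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",   # socks5h lets Tor resolve .onion names
    "https": "socks5h://127.0.0.1:9050",
}

def fetch_listings(onion_url):
    """Return (title, price) pairs scraped from one market page."""
    resp = requests.get(onion_url, proxies=TOR_PROXIES, timeout=60)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    listings = []
    for item in soup.select(".listing"):            # hypothetical selector
        title = item.select_one(".title")
        price = item.select_one(".price")
        if title and price:
            listings.append((title.get_text(strip=True),
                             price.get_text(strip=True)))
    return listings

# Example (hypothetical address):
# print(fetch_listings("http://exampledwmxyz.onion/market"))
```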
Web scraping is the technique of extracting desired data in an automated way by scanning the internal links and content of a website; this activity is usually performed by systematically programmed bots. This paper explains our proposed solution to protect blog content from theft and from being copied to other destinations by mitigating scraping bots. To achieve this, we apply two steps at two levels: first, at the main blog page level, we hinder crawler bots by adding extra empty article anchors among the real articles; second, at the article page level, we add a random number of empty, hidden spans containing randomly generated text throughout the article body. To assess the solution, we apply it to a local project developed in PHP with the Laravel framework and define four criteria to measure its effectiveness. The results show that the file size before and after applying the solution is effectively unchanged, and that the processing time increases by only a few milliseconds, which remains within the acceptable range. Using an HTML-similarity tool, we obtain very good results: the style remains identical, with only slight changes to the structure. Finally, to assess the effect on bots, a scraper bot was re-run and the programmed middleware produced the expected results. These results show that the solution is feasible to adopt and use to protect blog content.
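The paper's implementation is PHP/Laravel middleware; a minimal Python sketch of the article-level idea, interleaving hidden decoy spans with random text among the real paragraphs so naive scrapers copy junk while readers never see it, could look like this. The `x-decoy` class name is an assumption and would be hidden via CSS (e.g. `display:none`).

```python
# Sketch of the decoy-injection idea: mix hidden spans of random text into
# the article body. Illustrative only; not the authors' Laravel middleware.
import random
import string

DECOY_CLASS = "x-decoy"  # hypothetical class name, hidden with CSS

def random_text(length=12):
    return "".join(random.choices(string.ascii_lowercase + " ", k=length))

def inject_decoys(paragraphs, min_decoys=2, max_decoys=5):
    """Return article HTML with a random number of hidden decoy spans mixed in."""
    out = []
    for p in paragraphs:
        out.append(f"<p>{p}</p>")
        for _ in range(random.randint(min_decoys, max_decoys)):
            out.append(f'<span class="{DECOY_CLASS}">{random_text()}</span>')
    return "\n".join(out)

print(inject_decoys(["First real paragraph.", "Second real paragraph."]))
```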
We investigate what we call the "Bitcoin Generator Scam" (BGS), a simple system in which the scammers promise to "generate" new bitcoins using the ones that were sent to them. A typical offer will suggest that, for a small fee, one could receive within minutes twice the amount of bitcoins submitted. BGS is clearly not a very sophisticated attack. The modus operandi is simply to put up a web page showing the address to which the money should be sent and wait for the payback. The pages are then indexed by search engines, ready to be found by victims looking for free bitcoins. We describe here a generic system to find and analyze scams such as BGS. We have trained a classifier to detect these pages, and we have a crawler searching for instances using a series of search engines. We then monitor the instances that we find to trace payments and bitcoin addresses that are being used over time. Unlike most bitcoin-based scam monitoring systems, we do not rely on analyzing transactions on the blockchain to find scam instances. Instead, we proactively find these instances through the web pages advertising the scam. Thus our system is able to find addresses with very few transactions, or even none at all. Indeed, over half of the addresses that have eventually received funds were detected before receiving any transactions. The data for this paper was collected over four months, from November 2019 to February 2020. We have found more than 1,300 addresses directly associated with the scam, hosted on over 500 domains. Overall, these addresses have received (at least) over 5 million USD through the scam, with an average of 47.3 USD per transaction.
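One building block of such a system is pulling candidate Bitcoin addresses out of scam pages so they can be monitored over time. A short illustrative sketch, assuming a reachable page URL and covering legacy (base58) and bech32 address formats with a deliberately loose regex, is below; the classifier, search-engine crawler, and payment tracing are not reproduced.

```python
# Sketch: extract Bitcoin-address-like strings from a page's HTML so they can
# be watched for later transactions. Illustrative, not the paper's system.
import re
import requests

BTC_ADDRESS_RE = re.compile(
    r"\b(?:[13][a-km-zA-HJ-NP-Z1-9]{25,34}|bc1[a-z0-9]{25,62})\b"
)

def extract_btc_addresses(url):
    """Fetch a page and return the set of candidate Bitcoin addresses found."""
    html = requests.get(url, timeout=30).text
    return set(BTC_ADDRESS_RE.findall(html))

# Example (hypothetical scam page):
# print(extract_btc_addresses("http://example-free-bitcoin-generator.test/"))
```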
Malicious domain names are constantly changing. It is challenging to keep blacklists of malicious domain names up to date because of the time lag between their creation and detection. Even if a website is itself clean, that does not mean it will not be used as a pivot point to redirect users to malicious destinations. To address this issue, this paper demonstrates how to use linkage analysis and open-source threat intelligence to visualize the relationships among malicious domain names while verifying their categories, e.g., drive-by download, unwanted software, etc. Featuring a graph-based model that presents the inter-connectivity of malicious domain names in a dynamic fashion, the proposed approach proved helpful for revealing the group patterns of different kinds of malicious domain names. When applied to analyze a blacklisted set of URLs in a real enterprise network, it showed better effectiveness than traditional methods and yielded a clearer view of the common patterns in the data.
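The core of the graph-based view can be sketched in a few lines: domains become nodes, observed redirects or links become edges, and connected components surface groups of related infrastructure. The edge list below is a hypothetical stand-in for blacklist and open-source intelligence data; the paper's dynamic visualization and category verification are not shown.

```python
# Sketch of graph-based linkage analysis over malicious domains.
import networkx as nx

# (source_domain, destination_domain) pairs, e.g. from observed redirect chains
edges = [
    ("landing-a.test", "payload-x.test"),
    ("landing-b.test", "payload-x.test"),
    ("landing-c.test", "payload-y.test"),
]

G = nx.DiGraph()
G.add_edges_from(edges)

# Domains that share infrastructure fall into the same weakly connected component
for i, component in enumerate(nx.weakly_connected_components(G)):
    print(f"group {i}: {sorted(component)}")
```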
The clear, social, and dark web have lately been identified as rich sources of valuable cyber-security information that, given the appropriate tools and methods, may be identified, crawled, and subsequently leveraged into actionable cyber-threat intelligence. In this work, we focus on the information gathering task and present a novel crawling architecture for transparently harvesting data from security websites in the clear web, security forums in the social web, and hacker forums/marketplaces in the dark web. The proposed architecture adopts a two-phase approach to data harvesting. Initially, a machine-learning-based crawler is used to direct the harvesting towards websites of interest, while in the second phase state-of-the-art statistical language modelling techniques are used to represent the harvested information in a latent low-dimensional feature space and rank it based on its potential relevance to the task at hand. The proposed architecture is realised using exclusively open-source tools, and a preliminary evaluation with crowdsourced results demonstrates its effectiveness.
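A sketch of the second phase under common choices: harvested documents are embedded in a latent low-dimensional space (here TF-IDF followed by truncated SVD, i.e. LSA) and ranked by cosine similarity to a short task description. The paper's exact language-modelling technique may differ; the documents and task string below are illustrative.

```python
# Sketch: rank harvested documents by relevance in a latent feature space.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "new remote code execution exploit released for a popular CMS",
    "forum thread selling stolen credit card dumps",
    "recipe blog post about sourdough bread",
]
task = "cyber threat intelligence on exploits and stolen data"

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(docs + [task])

svd = TruncatedSVD(n_components=2, random_state=0)   # latent low-dimensional space
Z = svd.fit_transform(X)

scores = cosine_similarity(Z[:-1], Z[-1:]).ravel()   # relevance to the task
for score, doc in sorted(zip(scores, docs), reverse=True):
    print(f"{score:.2f}  {doc}")
```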
This paper studies the principle of vulnerability generation and the mechanism of cross-site scripting attacks, and designs a dynamic cross-site scripting vulnerability detection technique based on existing black-box vulnerability detection theory. The dynamic detection process consists of five steps: crawling, feature construction, attack simulation, result detection, and report generation. The crawling strategy in the crawler module and the construction algorithm in the feature-construction module are the key points of this detection process. Finally, following the detection technique proposed in this paper, a detection tool is implemented on Linux in Python to test web applications. Experiments were run to verify the results and compare them with the test results of other existing tools, to analyze the usability, advantages, and disadvantages of the detection method, and to confirm the feasibility of applying dynamic detection to cross-site scripting vulnerability detection.
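The attack-simulation and result-detection steps boil down to injecting a marker payload and checking whether it comes back unescaped. A minimal reflected-XSS probe in that spirit is sketched below; the target URL and parameter name are hypothetical, and real scanners also crawl forms and handle DOM/stored XSS.

```python
# Sketch: probe one URL parameter for reflected XSS with a unique marker payload.
import requests

PAYLOAD = '<script>alert("xss-probe-7f3a")</script>'  # unique, easy-to-find marker

def probe_reflected_xss(url, param):
    """Return True if the payload is reflected unescaped in the response body."""
    resp = requests.get(url, params={param: PAYLOAD}, timeout=30)
    return PAYLOAD in resp.text

# Example (hypothetical target):
# print(probe_reflected_xss("http://testsite.local/search", "q"))
```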
SQL injection is a well-known method of executing SQL queries to retrieve sensitive information from a website's connected database. It poses a threat to poorly coded applications, and SQL injection is still considered one of the top 10 vulnerabilities even in 2018. To keep track of the vulnerabilities each website faces, we employ a tool called Acunetix, which finds the vulnerabilities of a specific website and also suggests preventive measures. Using this implementation, we discover vulnerabilities in an actual website. Such a real-world implementation would be useful for instructional use in a foundational cybersecurity course.
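For instructional purposes, the underlying flaw and its standard fix can be shown in a few lines. The sketch below uses sqlite3 purely for brevity (the paper itself relies on the Acunetix scanner, not code-level analysis): string concatenation lets crafted input rewrite the query, while a parameterized query keeps the input as data.

```python
# Sketch: vulnerable string-built query vs. safe parameterized query.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

user_input = "' OR '1'='1"  # classic injection string

# Vulnerable: input is spliced directly into the SQL text
rows = conn.execute(
    f"SELECT * FROM users WHERE name = '{user_input}'"
).fetchall()
print("concatenated query returned:", rows)   # leaks every row

# Safe: input is bound as a parameter
rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", (user_input,)
).fetchall()
print("parameterized query returned:", rows)  # returns nothing
```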
Phishers often exploit users' trust in the appearance of a site by using webpages that are visually similar to an authentic site. In the past, various research studies have tried to identify and classify the factors contributing to the detection of phishing websites. The focus of this research is to establish a strong relationship between those identified content-based heuristics and the legitimacy of a website by analyzing training sets of websites (both phishing and legitimate) and, in the process, to analyze new patterns and report findings. Many existing phishing detection tools are often not very accurate because they depend mostly on old databases of previously identified phishing websites. However, thousands of new phishing websites appear every year targeting financial institutions, cloud storage/file hosting sites, government websites, and others. This paper presents a framework called Phishing-Detective that detects phishing websites based on existing and newly found heuristics. For this framework, a web crawler was developed to scrape the contents of phishing and legitimate websites. These contents were analyzed to rate the heuristics and their contribution scale factor towards the illegitimacy of a website. The data set collected by the web scraper was then analyzed using a data mining tool to find patterns and report findings. A case study shows how this framework can be used to detect a phishing website. This research is still in progress but shows a new way of finding and using heuristics and the sum of their contributing weights to effectively and accurately detect phishing websites. Further development of this framework is discussed at the end of the paper.
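The kind of content-based signal such a crawler might extract can be sketched as follows; the specific heuristics and any weighting here are illustrative assumptions, not the paper's rated scale factors.

```python
# Sketch: scrape a page and compute a few simple content-based phishing signals.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse

def phishing_heuristics(url):
    """Return a few content-based signals for a given page."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    host = urlparse(url).netloc

    anchors = soup.find_all("a", href=True)
    external = [a for a in anchors if urlparse(a["href"]).netloc not in ("", host)]

    return {
        "has_password_field": bool(soup.find("input", {"type": "password"})),
        "external_link_ratio": len(external) / len(anchors) if anchors else 0.0,
        "form_posts_offsite": any(
            urlparse(f.get("action", "")).netloc not in ("", host)
            for f in soup.find_all("form")
        ),
    }

# Example:
# print(phishing_heuristics("https://example.com/login"))
```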
Attacks against websites are increasing rapidly with the expansion of web services. The increasing number and diversity of web services make such attacks difficult to prevent due to the many known vulnerabilities in websites. To overcome this problem, it is necessary to collect the most recent attacks using decoy web honeypots and to implement countermeasures against malicious threats. Web honeypots collect not only malicious accesses by attackers but also benign accesses such as those by web search crawlers. Thus, it is essential to develop a means of automatically separating malicious accesses from mixed collected data containing both malicious and benign accesses. In particular, detecting vulnerability scanning, which is a preliminary step of an attack, is important for preventing attacks. In this study, we focus on classifying web crawling and vulnerability scanning accesses, since these accesses are too similar to distinguish individually. We propose a feature vector based on features of collective accesses, e.g., intervals between request arrivals and the dispersion of source port numbers, obtained with multiple honeypots deployed in different networks. Through evaluation using data collected from 37 honeypots in a real network, we show that features of collective accesses are advantageous for classifying vulnerability scanning and crawler traffic.
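Two of the collective-access features named above can be computed per source IP from simple (timestamp, source port) log entries, as in the sketch below. The log format is a stand-in for data aggregated across multiple honeypots, and the exact feature set in the paper is broader.

```python
# Sketch: per-source-IP collective-access features from honeypot log entries.
from statistics import mean, pstdev

def collective_features(accesses):
    """accesses: list of (timestamp_seconds, source_port) for one source IP."""
    accesses = sorted(accesses)
    times = [t for t, _ in accesses]
    ports = [p for _, p in accesses]
    intervals = [b - a for a, b in zip(times, times[1:])] or [0.0]
    return {
        "mean_interval": mean(intervals),   # crawlers tend to pace their requests
        "port_dispersion": pstdev(ports),   # scanners often churn source ports
        "request_count": len(accesses),
    }

print(collective_features([(0.0, 50311), (0.4, 50987), (0.9, 51422), (1.1, 52010)]))
```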