Bibliography
Internet technology has made surveillance widespread and access to resources easier than ever before. This boon carries countless advantages, yet it also makes protecting privacy harder for the masses while supplying anonymity to the few hacktivists. The ever-increasing frequency and scale of cyber-attacks has not only crippled private organizations but has also left Law Enforcement Agencies (LEAs) in a fix: the data show a surge in cases of cyber-bullying and ransomware attacks, while the force lacks the manpower to tackle such cases at a more granular level. What is needed is a tool, an automated assistant, that helps security officers cut down the precious time spent in the very first phase of information gathering: reconnaissance. Trawling the surface web along with the deep and dark web is a tedious job that requires documenting the perpetrator's digital footprint and identifying any Indicators of Compromise (IOCs). TORSION, which automates web reconnaissance using the Open Source Intelligence (OSINT) paradigm, extracts metadata from popular indexed social sites and from un-indexed dark-web onion sites, provided it has some related intelligence on the target. TORSION's workflow matches accounts across various top indexed sites, generates a dossier on the target, and exports the collected metadata to a PDF file that can be referenced later.
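To make the described workflow concrete, the following is a minimal sketch of how account matching across indexed sites and dossier export could be wired together. The site list, the profile-URL templates, and the plain-text output (standing in for TORSION's PDF export) are all illustrative assumptions, not TORSION's actual implementation.

```python
# Illustrative sketch only: the site list, URL patterns, and output
# format are assumptions, not TORSION's actual implementation.
import requests

# Hypothetical profile-URL templates for a few indexed social sites.
SITE_TEMPLATES = {
    "GitHub": "https://github.com/{username}",
    "Reddit": "https://www.reddit.com/user/{username}",
    "Twitter": "https://twitter.com/{username}",
}

def match_accounts(username: str) -> dict[str, str]:
    """Probe each site for a profile page and record the hits."""
    hits = {}
    for site, template in SITE_TEMPLATES.items():
        url = template.format(username=username)
        try:
            resp = requests.get(url, timeout=10, allow_redirects=True)
            if resp.status_code == 200:  # crude existence check
                hits[site] = url
        except requests.RequestException:
            continue  # unreachable site: skip rather than abort the scan
    return hits

def export_dossier(username: str, hits: dict[str, str], path: str) -> None:
    """Write a plain-text dossier; a PDF library (e.g. reportlab)
    would take this role in a TORSION-style tool."""
    with open(path, "w") as f:
        f.write(f"Dossier for target: {username}\n")
        for site, url in hits.items():
            f.write(f"- {site}: {url}\n")

if __name__ == "__main__":
    target = "example_user"
    export_dossier(target, match_accounts(target), "dossier.txt")
```

Probing profile URLs by HTTP status code is the simplest form of account matching (the approach popularized by tools such as Sherlock); a production tool would add per-site heuristics, since many sites return 200 with a "user not found" page.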
Onion sites on the dark web operate using the Tor Hidden Service (HS) protocol to shield their locations on the Internet, which (among other features) enables these sites to host malicious and illegal content while being resistant to legal action and seizure. Identifying and monitoring such illicit sites on the dark web is of high relevance to the Computer Security and Law Enforcement communities. We have developed an automated infrastructure that crawls and indexes content from onion sites into a large-scale data repository, called LIGHTS, with over 100M pages. In this paper we describe Automated Tool for Onion Labeling (ATOL), a novel scalable analysis service developed to conduct a thematic assessment of the content of onion sites in the LIGHTS repository. ATOL has three core components – (a) a novel keyword discovery mechanism (ATOLKeyword) which extends analyst-provided keywords for different categories by suggesting new descriptive and discriminative keywords that are relevant for the categories; (b) a classification framework (ATOLClassify) that uses the discovered keywords to map onion site content to a set of categories when sufficient labeled data is available; (c) a clustering framework (ATOLCluster) that can leverage information from multiple external heterogeneous knowledge sources, ranging from domain expertise to Bitcoin transaction data, to categorize onion content in the absence of sufficient supervised data. The paper presents empirical results of ATOL on onion datasets derived from the LIGHTS repository, and additionally benchmarks ATOL's algorithms on the publicly available 20 Newsgroups dataset to demonstrate the reproducibility of its results. On the LIGHTS dataset, ATOLClassify gives a 12% performance gain over an analyst-provided baseline, while ATOLCluster gives a 7% improvement over state-of-the-art semi-supervised clustering algorithms. We also discuss how ATOL has been deployed and externally evaluated, as part of the LIGHTS system.
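As a rough illustration of the keyword-driven pipeline that ATOLKeyword and ATOLClassify describe, the sketch below ranks terms that are over-represented in each category's TF-IDF profile and then trains a plain classifier on the labeled text. The toy corpus, the scoring rule, and the scikit-learn models are assumptions for illustration only, not ATOL's published algorithms.

```python
# Illustrative sketch only: ATOL's actual keyword-discovery and
# classification algorithms are more sophisticated; this shows the
# general shape of a keyword-driven pipeline on labeled page text.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy labeled corpus standing in for analyst-labeled onion pages.
pages = [
    "buy bitcoin escrow marketplace vendor",
    "wallet mixer tumbler bitcoin anonymous",
    "forum discussion privacy anonymity tor",
    "community board posts threads members",
]
labels = ["market", "market", "forum", "forum"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(pages)
terms = np.array(vectorizer.get_feature_names_out())

# Keyword discovery (ATOLKeyword-like step): rank terms by how much
# their mean TF-IDF weight within a category exceeds the corpus-wide
# mean, favoring words that are descriptive *and* discriminative.
for category in set(labels):
    mask = np.array([lbl == category for lbl in labels])
    in_cat = X[mask].mean(axis=0).A1
    overall = X.mean(axis=0).A1
    top = terms[np.argsort(in_cat - overall)[::-1][:3]]
    print(category, "->", list(top))

# Classification (ATOLClassify-like step) once labeled data suffices.
clf = LogisticRegression().fit(X, labels)
print(clf.predict(vectorizer.transform(["bitcoin vendor escrow"])))
```

The ATOLCluster component has no counterpart in this sketch; combining heterogeneous signals such as Bitcoin transaction data with text features, as the paper describes, goes well beyond a simple supervised pipeline like this one.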