Visible to the public Biblio

Filters: Keyword is social web  [Clear All Filters]
2020-07-10
Koloveas, Paris, Chantzios, Thanasis, Tryfonopoulos, Christos, Skiadopoulos, Spiros.  2019.  A Crawler Architecture for Harvesting the Clear, Social, and Dark Web for IoT-Related Cyber-Threat Intelligence. 2019 IEEE World Congress on Services (SERVICES). 2642-939X:3—8.

The clear, social, and dark web have lately been identified as rich sources of valuable cyber-security information that -given the appropriate tools and methods-may be identified, crawled and subsequently leveraged to actionable cyber-threat intelligence. In this work, we focus on the information gathering task, and present a novel crawling architecture for transparently harvesting data from security websites in the clear web, security forums in the social web, and hacker forums/marketplaces in the dark web. The proposed architecture adopts a two-phase approach to data harvesting. Initially a machine learning-based crawler is used to direct the harvesting towards websites of interest, while in the second phase state-of-the-art statistical language modelling techniques are used to represent the harvested information in a latent low-dimensional feature space and rank it based on its potential relevance to the task at hand. The proposed architecture is realised using exclusively open-source tools, and a preliminary evaluation with crowdsourced results demonstrates its effectiveness.

2017-11-03
Collarana, Diego, Lange, Christoph, Auer, Sören.  2016.  FuhSen: A Platform for Federated, RDF-based Hybrid Search. Proceedings of the 25th International Conference Companion on World Wide Web. :171–174.
The increasing amount of structured and semi-structured information available on the Web and in distributed information systems, as well as the Web's diversification into different segments such as the Social Web, the Deep Web, or the Dark Web, requires new methods for horizontal search. FuhSen is a federated, RDF-based, hybrid search platform that searches, integrates and summarizes information about entities from distributed heterogeneous information sources using Linked Data. As a use case, we present scenarios where law enforcement institutions search and integrate data spread across these different Web segments to identify cases of organized crime. We present the architecture and implementation of FuhSen and explain the queries that can be addressed with this new approach.