Biblio

Filters: Keyword is RDF
2017-11-03
Collarana, Diego, Lange, Christoph, Auer, Sören.  2016.  FuhSen: A Platform for Federated, RDF-based Hybrid Search. Proceedings of the 25th International Conference Companion on World Wide Web. :171–174.
The increasing amount of structured and semi-structured information available on the Web and in distributed information systems, together with the Web's diversification into segments such as the Social Web, the Deep Web, and the Dark Web, requires new methods for horizontal search. FuhSen is a federated, RDF-based, hybrid search platform that searches, integrates, and summarizes information about entities from distributed heterogeneous information sources using Linked Data. As a use case, we present scenarios in which law enforcement institutions search and integrate data spread across these different Web segments to identify cases of organized crime. We present the architecture and implementation of FuhSen and explain the kinds of queries that this new approach can address.
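
As a rough illustration of what "federated, RDF-based hybrid search" involves, the following sketch queries two public SPARQL endpoints for entities matching a keyword and naively merges the hits by label. It is not FuhSen's implementation: the endpoint choice, the query, and the merge-by-label step are assumptions made for the example; a real system would add per-source ranking and proper entity resolution.

    # Minimal federated-search sketch (illustrative; not the FuhSen codebase).
    from SPARQLWrapper import SPARQLWrapper, JSON

    ENDPOINTS = [
        "https://dbpedia.org/sparql",         # first Linked Data source
        "https://query.wikidata.org/sparql",  # second source to federate over
    ]

    QUERY = """
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT DISTINCT ?entity ?label WHERE {
            ?entity rdfs:label ?label .
            FILTER(CONTAINS(LCASE(STR(?label)), "%s"))
            FILTER(LANG(?label) = "en")
        } LIMIT 10
    """

    def search_entities(keyword):
        """Query every endpoint and group the resulting entity URIs by label."""
        merged = {}
        for url in ENDPOINTS:
            client = SPARQLWrapper(url)
            client.setQuery(QUERY % keyword.lower())
            client.setReturnFormat(JSON)
            for row in client.query().convert()["results"]["bindings"]:
                # Naive merge: treat identical labels as the same entity.
                merged.setdefault(row["label"]["value"], []).append(row["entity"]["value"])
        return merged

    for label, uris in search_entities("interpol").items():
        print(label, "->", uris)

Note that a full label scan can be slow on large public endpoints; a deployed federated search would push keyword matching down to each source's full-text index.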
2017-05-30
Cuzzocrea, Alfredo, Pirrò, Giuseppe.  2016.  A Semantic-web-technology-based Framework for Supporting Knowledge-driven Digital Forensics. Proceedings of the 8th International Conference on Management of Digital EcoSystems. :58–66.

The usage of Information and Communication Technologies (ICTs) pervades everyday life. While ICT has contributed to improving the quality of our lives, it is also true that new forms of (cyber)crime have emerged in this setting. The diversity and amount of information forensic investigators need to cope with when tackling a cybercrime case call for tools and techniques in which knowledge is the main actor. Current approaches leave to the investigator the chore of integrating the diverse sources of evidence relevant to a case, thus hindering the automatic generation of reusable knowledge. This paper describes an architecture that lifts the classical phases of a digital forensic investigation to a knowledge-driven setting. We discuss how languages and technologies originating from the Semantic Web initiative can complement digital forensics tools so that knowledge becomes a first-class citizen. Our architecture makes it possible to perform complex forensic investigations in an integrated way and, as a by-product, to build a knowledge base that can be consulted to gain insights from previous cases. Our proposal has been inspired by real-world scenarios emerging in the context of an Italian research project on cyber security.
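
To give a flavor of the knowledge-driven setting described above, the following sketch encodes evidence items from two sources as RDF triples and asks a cross-source question in SPARQL, using the Python rdflib library. The ex: vocabulary and the data are invented for the example and are not the paper's ontology.

    # Evidence-as-knowledge sketch (illustrative vocabulary and data).
    from rdflib import Graph, Namespace, RDF

    EX = Namespace("http://example.org/forensics#")
    g = Graph()
    g.bind("ex", EX)

    # Evidence extracted from two different sources, integrated in one graph.
    g.add((EX.phone1, RDF.type, EX.Device))
    g.add((EX.phone1, EX.ownedBy, EX.suspectA))
    g.add((EX.phone1, EX.contacted, EX.phone2))   # from call records
    g.add((EX.phone2, EX.ownedBy, EX.suspectB))   # from a seizure report

    # A question that spans both sources: who is linked through device contacts?
    QUERY = """
        PREFIX ex: <http://example.org/forensics#>
        SELECT ?a ?b WHERE {
            ?d1 ex:ownedBy ?a ; ex:contacted ?d2 .
            ?d2 ex:ownedBy ?b .
        }
    """
    for a, b in g.query(QUERY):
        print(a, "is linked to", b)

Because the integrated result is itself an RDF graph, it can be stored and queried again in later cases, which is the reusable knowledge the abstract refers to.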

2017-03-07
Farid, Mina, Roatis, Alexandra, Ilyas, Ihab F., Hoffmann, Hella-Franziska, Chu, Xu.  2016.  CLAMS: Bringing Quality to Data Lakes. Proceedings of the 2016 International Conference on Management of Data. :2089–2092.

With the increasing incentive for enterprises to ingest as much data as they can into what are commonly referred to as "data lakes", and with the recent development of multiple technologies to support this "load-first" paradigm, the new environment presents serious data management challenges. Among them, assessing data quality and cleaning large volumes of heterogeneous data become essential tasks in unveiling the value of big data. The desired use of unstructured and semi-structured data in large volumes makes current data cleaning tools (primarily designed for relational data) not directly applicable. We present CLAMS, a system to discover and enforce expressive integrity constraints over large amounts of lake data with very limited schema information (e.g., data represented as RDF triples). This demonstration shows how CLAMS discovers constraints and the schemas they are defined over simultaneously. CLAMS also introduces a scale-out solution to detect errors in the raw data efficiently. CLAMS interacts with human experts both to validate the discovered constraints and to suggest data repairs. CLAMS has been deployed in a real large-scale enterprise data lake and evaluated on a real data set of 1.2 billion triples, where it was able to spot multiple obscure data inconsistencies and errors early in the data processing stack, providing substantial value to the enterprise.
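
As a toy version of the kind of check CLAMS automates at scale, the following sketch loads a handful of "lake" triples with rdflib and flags violations of one integrity constraint, namely that a predicate is functional (at most one value per subject). The data, the vocabulary, and the constraint are invented for the example; CLAMS itself discovers such constraints together with their schemas and detects errors in a scale-out fashion.

    # One-constraint validation sketch (illustrative; not the CLAMS system).
    from rdflib import Graph, Namespace

    EX = Namespace("http://example.org/lake#")
    g = Graph()
    # Raw lake triples, including a quality problem: p1 has two manufacturers.
    g.add((EX.p1, EX.manufacturer, EX.acme))
    g.add((EX.p1, EX.manufacturer, EX.globex))
    g.add((EX.p2, EX.manufacturer, EX.acme))

    def functional_violations(graph, predicate):
        """Return subjects that violate 'at most one value per subject'."""
        seen = {}
        for s, o in graph.subject_objects(predicate):
            seen.setdefault(s, set()).add(o)
        return {s: vals for s, vals in seen.items() if len(vals) > 1}

    for subj, values in functional_violations(g, EX.manufacturer).items():
        print("violation:", subj, sorted(str(v) for v in values))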