A Collective, Probabilistic Approach to Schema Mapping

Title: A Collective, Probabilistic Approach to Schema Mapping
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Kimmig, A., Memory, A., Miller, R. J., Getoor, L.
Conference Name: 2017 IEEE 33rd International Conference on Data Engineering (ICDE)
Keywords: CMD, Cognition, collective mapping discovery, collective probabilistic schema mapping approach, Companies, compositionality, Conferences, Data engineering, data mining, data structures, inference mechanisms, mapping selection, meta data, metadata, metadata constraints, Metadata Discovery Problem, metadata-only approaches, Optimization, Probabilistic logic, probabilistic reasoning, probabilistic reasoning techniques, probability, pubcrawl, Resiliency, Scalability, schema mapping optimization problem, uncertainty handling
Abstract

We propose a probabilistic approach to the problem of schema mapping. Our approach is declarative, scalable, and extensible. It builds upon recent results in both schema mapping and probabilistic reasoning and contributes novel techniques in both fields. We introduce the problem of mapping selection, that is, choosing the best mapping from a space of potential mappings, given both metadata constraints and a data example. As selection has to reason holistically about the inputs and the dependencies between the chosen mappings, we define a new schema mapping optimization problem which captures interactions between mappings. We then introduce Collective Mapping Discovery (CMD), our solution to this problem using state-of-the-art probabilistic reasoning techniques, which allows for inconsistencies and incompleteness. Using hundreds of realistic integration scenarios, we demonstrate that the accuracy of CMD is more than 33% above that of metadata-only approaches already for small data examples, and that CMD routinely finds perfect mappings even if a quarter of the data is inconsistent.

DOI: 10.1109/ICDE.2017.140
Citation Key: kimmig_collective_2017