Linking Deutsche Bundesbank Company Data Using Machine-Learning-Based Classification: Extended Abstract

Title: Linking Deutsche Bundesbank Company Data Using Machine-Learning-Based Classification: Extended Abstract
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Schild, Christopher-J.; Schultz, Simone
Conference Name: Proceedings of the Second International Workshop on Data Science for Macro-Modeling
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-4407-4
Keywords: Company Data, Data integration, pubcrawl170201, Record Linkage, Semi-Supervised Classification
Abstract

We present a process for linking various Deutsche Bundesbank data sources on companies, based on semi-automatic classification. The linkage process involves data cleaning and harmonization, blocking, construction of comparison features, and the training and testing of a statistical classification model on a "ground-truth" subset of known matches and non-matches. The evaluation of our method shows that the process limits the need for manual classification to a small percentage of ambiguously classified match candidates.
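
The steps named in the abstract (cleaning and harmonization, blocking, comparison features, a trained classifier, and manual review of ambiguous candidates) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which model, features, or blocking key were used. The record fields (`name`, `city`, `zip`), the legal-form list, the choice of logistic regression, and the two review thresholds are all assumptions made for illustration.

```python
import math
import re
from difflib import SequenceMatcher

# Hypothetical list of legal-form tokens to drop during harmonization.
LEGAL_FORMS = {"gmbh", "ag", "kg", "ohg", "ug", "mbh", "co"}


def clean_name(name):
    """Cleaning/harmonization: lowercase, strip punctuation, drop legal forms."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)


def candidate_pairs(left, right, key="zip"):
    """Blocking: pair only records that agree on the (assumed) blocking key."""
    blocks = {}
    for rec in right:
        blocks.setdefault(rec[key], []).append(rec)
    return [(a, b) for a in left for b in blocks.get(a[key], [])]


def features(a, b):
    """Comparison features: name similarity in [0, 1] and city agreement."""
    name_sim = SequenceMatcher(
        None, clean_name(a["name"]), clean_name(b["name"])
    ).ratio()
    same_city = 1.0 if a["city"].strip().lower() == b["city"].strip().lower() else 0.0
    return [name_sim, same_city]


def predict_proba(model, x):
    """Estimated match probability under a logistic model (weights, bias)."""
    weights, bias = model
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by plain stochastic gradient descent.

    X holds feature vectors for the labelled "ground-truth" pairs,
    y holds 1 for known matches and 0 for known non-matches.
    """
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = predict_proba((weights, bias), xi) - yi
            weights = [w - lr * err * v for w, v in zip(weights, xi)]
            bias -= lr * err
    return weights, bias


def classify(p, lower=0.2, upper=0.8):
    """Route a candidate pair; the thresholds are illustrative assumptions."""
    if p >= upper:
        return "match"
    if p <= lower:
        return "non-match"
    return "manual review"
```

In use, one would build feature vectors for the labelled pairs, call `train_logistic` on them, and then apply `classify(predict_proba(model, features(a, b)))` to every pair returned by `candidate_pairs`. Only the pairs that fall between the two thresholds would go to a human reviewer, which is the sense in which such a process limits manual classification to ambiguous candidates.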

URL: http://doi.acm.org/10.1145/2951894.2951896
DOI: 10.1145/2951894.2951896
Citation Key: schild_linking_2016