Linking Deutsche Bundesbank Company Data Using Machine-Learning-Based Classification: Extended Abstract
Title | Linking Deutsche Bundesbank Company Data Using Machine-Learning-Based Classification: Extended Abstract |
Publication Type | Conference Paper |
Year of Publication | 2016 |
Authors | Schild, Christopher-J., Schultz, Simone |
Conference Name | Proceedings of the Second International Workshop on Data Science for Macro-Modeling |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-4503-4407-4 |
Keywords | Company Data, Data integration, Record Linkage, Semi-Supervised Classification |
Abstract | We present a process for linking various Deutsche Bundesbank data sources on companies based on semi-automatic classification. The linkage process involves data cleaning and harmonization, blocking, construction of comparison features, and training and testing a statistical classification model on a "ground-truth" subset of known matches and non-matches. The evaluation of our method shows that the process limits the need for manual classification to a small percentage of ambiguously classified match candidates. |
URL | http://doi.acm.org/10.1145/2951894.2951896 |
DOI | 10.1145/2951894.2951896 |
Citation Key | schild_linking_2016 |
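
The linkage steps named in the abstract (cleaning and harmonization, blocking, comparison features, classification with a manual-review band) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The record fields (`id`, `name`, `zip`), the legal-form list, the choice of postal code as blocking key, and the two thresholds are all assumptions made for illustration. In particular, the hand-set thresholds on a single string-similarity feature stand in for the statistical model that the paper trains on ground-truth matches and non-matches.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher

# Assumption: a tiny, illustrative list of legal-form tokens to strip.
LEGAL_FORMS = re.compile(r"\b(gmbh|ag|kg)\b")


def normalize(name):
    """Cleaning/harmonization: lowercase, drop punctuation and legal forms."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    name = LEGAL_FORMS.sub(" ", name)
    return " ".join(name.split())


def candidate_pairs(a_records, b_records):
    """Blocking: only compare records that share a postal code (assumed key)."""
    blocks = defaultdict(list)
    for b in b_records:
        blocks[b["zip"]].append(b)
    for a in a_records:
        for b in blocks.get(a["zip"], []):
            yield a, b


def name_similarity(a, b):
    """Comparison feature: string similarity of the normalized names."""
    return SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()


def classify(score, upper=0.9, lower=0.6):
    """Stand-in for a trained classifier: two hand-set thresholds.

    Scores between the thresholds are left for manual review.
    """
    if score >= upper:
        return "match"
    if score <= lower:
        return "non-match"
    return "ambiguous"


def link(a_records, b_records):
    """Run blocking, feature construction, and classification on all candidates."""
    return {
        (a["id"], b["id"]): classify(name_similarity(a, b))
        for a, b in candidate_pairs(a_records, b_records)
    }


# Hypothetical records from two sources.
A = [
    {"id": "a1", "name": "Beispiel Maschinenbau GmbH", "zip": "60311"},
    {"id": "a2", "name": "Nordlicht Handel AG", "zip": "20095"},
]
B = [
    {"id": "b1", "name": "BEISPIEL MASCHINENBAU GmbH.", "zip": "60311"},
    {"id": "b2", "name": "Nordlicht AG", "zip": "20095"},
    {"id": "b3", "name": "Beispiel Maschinenbau GmbH", "zip": "80331"},
    {"id": "b4", "name": "Suedwind Logistik KG", "zip": "20095"},
]

results = link(A, B)
for pair, label in sorted(results.items()):
    print(pair, label)
```

Only pairs labeled `ambiguous` would go to a human reviewer, which is the effect the abstract describes. Blocking keeps the pair `("a1", "b3")` from ever being compared because the postal codes differ, which shows the usual trade-off: fewer comparisons at the risk of missing matches whose blocking key disagrees.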