Title | Datafusion: Taking Source Confidences into Account |
Publication Type | Conference Paper |
Year of Publication | 2018 |
Authors | Ahmed, Alaa H., Sadri, Fereidoon |
Conference Name | Proceedings of the 8th International Conference on Information Systems and Technologies |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-4503-6404-1 |
Keywords | composability, data fusion, fusion precision, pubcrawl, source confidence, source trustworthiness, trustworthiness |
Abstract | Data fusion is a form of information integration where large amounts of data mined from sources such as web sites, Twitter feeds, Facebook postings, blogs, email messages, news streams, and the like are integrated. Such data is inherently uncertain and unreliable. The sources have different degrees of accuracy and the data mining process itself incurs additional uncertainty. The main goal of data fusion is to discover the correct data among the uncertain and possibly conflicting mined data. We investigate a data fusion approach that, in addition to the accuracy of sources, incorporates the correctness (confidence) measures that most data mining approaches associate with mined data. There are a number of advantages in incorporating these confidences. First, we do not require a training set. The initial training set is obtained using the confidence measures. More importantly, a more accurate fusion can result by taking the confidences into account. We present an approach to determine the correctness threshold using users' feedback, and show it can significantly improve the accuracy of data fusion. We evaluate the performance and accuracy of our data fusion approach for two groups of experiments. In the first group, data sources contain random (unintentional) errors. In the second group, data sources contain intentional falsifications. |
URL | http://doi.acm.org/10.1145/3200842.3200854 |
DOI | 10.1145/3200842.3200854 |
Citation Key | ahmed_datafusion:_2018 |