Biblio

Filters: Author is Sadri, Fereidoon
2019-03-11
Ahmed, Alaa H., Sadri, Fereidoon.  2018.  Datafusion: Taking Source Confidences into Account. Proceedings of the 8th International Conference on Information Systems and Technologies. :9:1–9:6.
Data fusion is a form of information integration in which large amounts of data mined from sources such as web sites, Twitter feeds, Facebook postings, blogs, email messages, and news streams are integrated. Such data is inherently uncertain and unreliable: the sources have different degrees of accuracy, and the data mining process itself introduces additional uncertainty. The main goal of data fusion is to discover the correct data among the uncertain and possibly conflicting mined data. We investigate a data fusion approach that, in addition to the accuracy of sources, incorporates the correctness (confidence) measures that most data mining approaches associate with mined data. Incorporating these confidences has a number of advantages. First, we do not require a training set in advance, because the initial training set is obtained using the confidence measures. More importantly, taking the confidences into account can yield a more accurate fusion. We present an approach to determine the correctness threshold using users' feedback, and show that it can significantly improve the accuracy of data fusion. We evaluate the performance and accuracy of our data fusion approach in two groups of experiments. In the first group, data sources contain random (unintentional) errors. In the second group, data sources contain intentional falsifications.
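
The abstract does not spell out the fusion algorithm, so the following is only a rough illustration of the general idea of combining source accuracy with per-claim confidence. It uses simple weighted voting as a stand-in technique, not the authors' method. The function name `fuse`, the tuple layout of `claims`, the sample sources, and the fixed `threshold` value are all assumptions made for this sketch; in the paper the threshold is determined from user feedback.

```python
from collections import defaultdict


def fuse(claims, source_accuracy, threshold=0.5):
    """Pick one value per item by accuracy-and-confidence weighted voting.

    claims: iterable of (source, item, value, confidence) tuples (assumed layout).
    source_accuracy: dict mapping source name to an accuracy in [0, 1].
    threshold: claims with confidence below this are ignored (assumed fixed here).
    """
    scores = defaultdict(lambda: defaultdict(float))
    for source, item, value, confidence in claims:
        if confidence < threshold:
            continue  # drop claims the mining step itself was unsure about
        # Each surviving claim votes with weight accuracy * confidence.
        scores[item][value] += source_accuracy[source] * confidence
    # For every item, keep the value with the highest total weight.
    return {item: max(votes, key=votes.get) for item, votes in scores.items()}


# Hypothetical sample data, invented for illustration only.
claims = [
    ("blog", "capital_of_australia", "Sydney", 0.9),
    ("news", "capital_of_australia", "Canberra", 0.8),
    ("tweets", "capital_of_australia", "Canberra", 0.6),
    ("tweets", "population", "25M", 0.3),
]
accuracy = {"blog": 0.5, "news": 0.9, "tweets": 0.6}

fused = fuse(claims, accuracy)
print(fused)
```

In this toy data, the two moderately confident claims from more accurate sources outweigh a single high-confidence claim from a less accurate source, and the low-confidence `population` claim is filtered out unless the threshold is lowered.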