Datafusion: Taking Source Confidences into Account

Title: Datafusion: Taking Source Confidences into Account
Publication Type: Conference Paper
Year of Publication: 2018
Authors: Ahmed, Alaa H.; Sadri, Fereidoon
Conference Name: Proceedings of the 8th International Conference on Information Systems and Technologies
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-6404-1
Keywords: composability, data fusion, fusion precision, pubcrawl, source confidence, source trustworthiness, trustworthiness
Abstract: Data fusion is a form of information integration where large amounts of data mined from sources such as web sites, Twitter feeds, Facebook postings, blogs, email messages, news streams, and the like are integrated. Such data is inherently uncertain and unreliable. The sources have different degrees of accuracy and the data mining process itself incurs additional uncertainty. The main goal of data fusion is to discover the correct data among the uncertain and possibly conflicting mined data. We investigate a data fusion approach that, in addition to the accuracy of sources, incorporates the correctness (confidence) measures that most data mining approaches associate with mined data. There are a number of advantages in incorporating these confidences. First, we do not require a training set. The initial training set is obtained using the confidence measures. More importantly, a more accurate fusion can result by taking the confidences into account. We present an approach to determine the correctness threshold using users' feedback, and show it can significantly improve the accuracy of data fusion. We evaluate the performance and accuracy of our data fusion approach for two groups of experiments. In the first group, data sources contain random (unintentional) errors. In the second group, data sources contain intentional falsifications.
URL: http://doi.acm.org/10.1145/3200842.3200854
DOI: 10.1145/3200842.3200854
Citation Key: ahmed_datafusion:_2018
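
Illustrative note: the abstract describes combining source accuracy with the per-claim confidence reported by the mining step, and filtering by a correctness threshold. The Python sketch below shows one simple way such a combination could work, namely confidence-weighted voting. It is not the authors' algorithm, which this record does not specify. The function name `fuse`, the tuple layout of a claim, the multiplicative weighting, and all sample values are assumptions made for illustration only.

```python
from collections import defaultdict


def fuse(claims, source_accuracy, threshold=0.5):
    """Pick one value per item by confidence-weighted voting.

    claims: iterable of (source, item, value, confidence) tuples
            (hypothetical layout, not taken from the paper).
    source_accuracy: dict mapping source name to an accuracy in [0, 1].
    threshold: claims with confidence below this value are ignored.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for source, item, value, confidence in claims:
        if confidence < threshold:
            continue  # drop claims the mining step was unsure about
        # Assumed weighting: source accuracy times claim confidence.
        scores[item][value] += source_accuracy[source] * confidence
    # For each item, keep the value with the highest total score.
    return {item: max(votes, key=votes.get) for item, votes in scores.items()}


# Made-up sample data: three sources disagree about one item.
claims = [
    ("web", "capital", "A", 0.9),
    ("blog", "capital", "B", 0.6),
    ("feed", "capital", "B", 0.4),
]
accuracy = {"web": 0.8, "blog": 0.7, "feed": 0.9}

with_threshold = fuse(claims, accuracy, threshold=0.5)
without_threshold = fuse(claims, accuracy, threshold=0.0)
```

In this made-up example the threshold changes the outcome: with a threshold of 0.5 the low-confidence claim from `feed` is dropped and value `A` wins, while with no threshold the two claims for `B` together outscore `A`. An item whose claims all fall below the threshold does not appear in the result.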