Mathematical Formulation and Implementation of Query Inversion Techniques in RDBMS for Tracking Data Provenance

Title: Mathematical Formulation and Implementation of Query Inversion Techniques in RDBMS for Tracking Data Provenance
Publication Type: Conference Paper
Year of Publication: 2019
Authors: Tabassum, Anika; Nady, Anannya Islam; Rezwanul Huq, Mohammad
Conference Name: 2019 7th International Conference on Information and Communication Technology (ICoICT)
Keywords: Biology, composability, data origin, data provenance, data provenance finding, data provenance tracking, Data-Intensive Applications, database, Databases, finance, Flowcharts, History, Human Behavior, inverse queries, inversion queries, mathematical formulation, mathematical formulations, Metrics, Physics, Provenance, pubcrawl, query inversion techniques, query processing, RDBMS, relational algebra, relational algebra operations, relational database management system, relational databases, Resiliency, Springs, unexpected results
Abstract: Massive amounts of data are now produced from many different sources, and many applications process these data to discover insights. These applications sometimes produce unexpected results, and manually tracing back to the data origin to find the source of the errors is not feasible. To avoid this problem, data must be accompanied by the context of how they were processed and analyzed. Data-intensive applications such as e-Science in particular require transparency, so we need to understand how data have been processed and transformed. In this paper, we propose a mathematical formulation and implementation of query inversion techniques to trace the provenance of data in a relational database management system (RDBMS). We build mathematical formulations of inverse queries for most of the relational algebra operations and present the formula for join operations in this paper. We then implement these inversion formulas, and the experiment shows that our proposed inverse queries can successfully trace results back to the original data, i.e., find the data provenance.
DOI: 10.1109/ICoICT.2019.8835290
Citation Key: tabassum_mathematical_2019