Visible to the public Dynamic Functional Dependency Discovery with Dynamic Hitting Set Enumeration

TitleDynamic Functional Dependency Discovery with Dynamic Hitting Set Enumeration
Publication TypeConference Paper
Year of Publication2022
AuthorsXiao, Renjie, Yuan, Yong'an, Tan, Zijing, Ma, Shuai, Wang, Wei
Conference Name2022 IEEE 38th International Conference on Data Engineering (ICDE)
Date Publishedmay
KeywordsConferences, data deletion, data dependency, Data engineering, Data processing, Data profiling, Functional Dependency, Heuristic algorithms, pubcrawl, Scalability, Semantic Web, Task Analysis
AbstractFunctional dependencies (FDs) are widely applied in data management tasks. Since FDs on data are usually unknown, FD discovery techniques are studied for automatically finding hidden FDs from data. In this paper, we develop techniques to dynamically discover FDs in response to changes on data. Formally, given the complete set S of minimal and valid FDs on a relational instance r, we aim to find the complete set S$^\textrm\textbackslashprime$ of minimal and valid FDs on roplus\textbackslashDelta r, where \textbackslashDelta r is a set of tuple insertions and deletions. Different from the batch approaches that compute S$^\textrm\textbackslashprime$ on roplus\textbackslashDelta r from scratch, our dynamic method computes S$^\textrm\textbackslashprime$ in response to \textbackslashtriangle\textbackslashuparrow. by leveraging the known S on r, and avoids processing the whole of r for each update from \textbackslashDelta r. We tackle dynamic FD discovery on roplus\textbackslashDelta r by dynamic hitting set enumeration on the difference-set of roplus\textbackslashDelta r. Specifically, (1) leveraging auxiliary structures built on r, we first present an efficient algorithm to update the difference-set of r to that of roplus\textbackslashDelta r. (2) We then compute S$^\textrm\textbackslashprime$, by recasting dynamic FD discovery as dynamic hitting set enumeration on the difference-set of roplus\textbackslashDelta r and developing novel techniques for dynamic hitting set enumeration. (3) We finally experimentally verify the effectiveness and efficiency of our approaches, using real-life and synthetic data. The results show that our dynamic FD discovery method outperforms the batch counterparts on most tested data, even when \textbackslashDelta r is up to 30 % of r.
DOI10.1109/ICDE53745.2022.00026
Citation Keyxiao_dynamic_2022