2023-06-16
Xiao, Renjie, Yuan, Yong'an, Tan, Zijing, Ma, Shuai, Wang, Wei.  2022.  Dynamic Functional Dependency Discovery with Dynamic Hitting Set Enumeration. 2022 IEEE 38th International Conference on Data Engineering (ICDE). :286–298.
Functional dependencies (FDs) are widely applied in data management tasks. Since the FDs that hold on data are usually unknown, FD discovery techniques have been studied for automatically finding hidden FDs from data. In this paper, we develop techniques to dynamically discover FDs in response to changes in the data. Formally, given the complete set $\Sigma$ of minimal and valid FDs on a relational instance $r$, we aim to find the complete set $\Sigma'$ of minimal and valid FDs on $r \oplus \Delta r$, where $\Delta r$ is a set of tuple insertions and deletions. Unlike batch approaches, which compute $\Sigma'$ on $r \oplus \Delta r$ from scratch, our dynamic method computes $\Sigma'$ in response to $\Delta r$ by leveraging the known $\Sigma$ on $r$, and avoids processing the whole of $r$ for each update in $\Delta r$. We tackle dynamic FD discovery on $r \oplus \Delta r$ by dynamic hitting set enumeration on the difference-set of $r \oplus \Delta r$. Specifically, (1) leveraging auxiliary structures built on $r$, we first present an efficient algorithm to update the difference-set of $r$ to that of $r \oplus \Delta r$. (2) We then compute $\Sigma'$ by recasting dynamic FD discovery as dynamic hitting set enumeration on the difference-set of $r \oplus \Delta r$ and developing novel techniques for dynamic hitting set enumeration. (3) We finally verify the effectiveness and efficiency of our approaches experimentally, using real-life and synthetic data. The results show that our dynamic FD discovery method outperforms its batch counterparts on most of the tested data, even when $\Delta r$ is up to 30% of $r$.
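The reduction the abstract relies on can be illustrated with a small sketch. The difference set of a tuple pair is the set of attributes on which the two tuples disagree, and an FD $X \to A$ holds exactly when every pair that disagrees on $A$ also disagrees on some attribute of $X$, i.e. $X$ hits $D \setminus \{A\}$ for every difference set $D$ containing $A$; the minimal valid FDs with right-hand side $A$ are then the minimal hitting sets of that family. The Python below is a minimal sketch under stated assumptions, not the authors' algorithm: tuples are plain Python tuples indexed by attribute position, the difference-set is kept as a multiset with pair counts and updated naively by comparing each inserted or deleted tuple against every current tuple (the paper instead uses auxiliary structures to avoid this scan), and minimal hitting sets are found by brute force and recomputed in full rather than enumerated dynamically. The names `diff_set`, `DifferenceSet`, `minimal_hitting_sets`, `discover_fds` and `show` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def diff_set(t1, t2):
    """Attribute positions on which two tuples disagree."""
    return frozenset(i for i, (a, b) in enumerate(zip(t1, t2)) if a != b)


class DifferenceSet:
    """Multiset of pairwise difference sets, updated per inserted/deleted tuple.

    Counts are kept so a difference set disappears only after every tuple pair
    producing it is gone. Each update scans all current tuples (naive, O(|r|)),
    an assumption of this sketch rather than the paper's auxiliary structures.
    """

    def __init__(self, rows):
        self.rows = Counter()    # current tuples, with duplicates counted
        self.counts = Counter()  # difference set -> number of tuple pairs
        for t in rows:
            self.insert(t)

    def insert(self, t):
        # New pairs: t against every existing tuple (including copies of t).
        for s, n in self.rows.items():
            self.counts[diff_set(t, s)] += n
        self.rows[t] += 1

    def delete(self, t):
        if self.rows[t] == 0:
            raise KeyError(t)
        self.rows[t] -= 1
        if self.rows[t] == 0:
            del self.rows[t]
        # Removed pairs: t against every remaining tuple.
        for s, n in self.rows.items():
            d = diff_set(t, s)
            self.counts[d] -= n
            if self.counts[d] == 0:
                del self.counts[d]

    def distinct(self):
        return set(self.counts)


def minimal_hitting_sets(sets, universe):
    """Brute-force minimal hitting sets of `sets` within `universe` (small schemas only)."""
    found = []
    for k in range(len(universe) + 1):  # increasing size, so survivors are minimal
        for cand in combinations(sorted(universe), k):
            c = frozenset(cand)
            if any(f <= c for f in found):
                continue  # contains a smaller hitting set, so not minimal
            if all(c & s for s in sets):
                found.append(c)
    return found


def discover_fds(dset, num_attrs):
    """Minimal non-trivial FDs as (lhs: frozenset of positions, rhs: position)."""
    diffs = dset.distinct()
    fds = []
    for a in range(num_attrs):
        # X -> a holds iff X intersects D - {a} for every difference set D containing a.
        # An empty D - {a} (a pair differing only on a) leaves no valid LHS.
        targets = {d - {a} for d in diffs if a in d}
        for lhs in minimal_hitting_sets(targets, set(range(num_attrs)) - {a}):
            fds.append((lhs, a))
    return fds


def show(dset, names):
    """Render the discovered FDs with one-letter attribute names, sorted."""
    return sorted(
        "".join(names[i] for i in sorted(lhs)) + " -> " + names[a]
        for lhs, a in discover_fds(dset, len(names))
    )


if __name__ == "__main__":
    ds = DifferenceSet([(1, "x", 10), (1, "x", 20), (2, "y", 30)])
    print(show(ds, "ABC"))  # ['A -> B', 'B -> A', 'C -> A', 'C -> B']
    ds.insert((1, "z", 40))  # Delta r: a single insertion
    print(show(ds, "ABC"))  # ['B -> A', 'C -> A', 'C -> B']
    ds.delete((1, "z", 40))  # undo it; the original FDs return
    print(show(ds, "ABC"))
```

In this toy run the inserted tuple agrees with the first two on attribute A but not on B, so A -> B is invalidated while the other FDs survive. Only the difference-set is maintained incrementally here; the FDs are re-derived from it in full after each update, which is the step the paper's dynamic hitting set enumeration is described as avoiding.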