Title | Dynamic Functional Dependency Discovery with Dynamic Hitting Set Enumeration |
Publication Type | Conference Paper |
Year of Publication | 2022 |
Authors | Xiao, Renjie, Yuan, Yong'an, Tan, Zijing, Ma, Shuai, Wang, Wei |
Conference Name | 2022 IEEE 38th International Conference on Data Engineering (ICDE) |
Date Published | may |
Keywords | Conferences, data deletion, data dependency, Data engineering, Data processing, Data profiling, Functional Dependency, Heuristic algorithms, pubcrawl, Scalability, Semantic Web, Task Analysis |
Abstract | Functional dependencies (FDs) are widely applied in data management tasks. Since FDs on data are usually unknown, FD discovery techniques are studied for automatically finding hidden FDs from data. In this paper, we develop techniques to dynamically discover FDs in response to changes on data. Formally, given the complete set S of minimal and valid FDs on a relational instance r, we aim to find the complete set S$^\textrm\textbackslashprime$ of minimal and valid FDs on roplus\textbackslashDelta r, where \textbackslashDelta r is a set of tuple insertions and deletions. Different from the batch approaches that compute S$^\textrm\textbackslashprime$ on roplus\textbackslashDelta r from scratch, our dynamic method computes S$^\textrm\textbackslashprime$ in response to \textbackslashtriangle\textbackslashuparrow. by leveraging the known S on r, and avoids processing the whole of r for each update from \textbackslashDelta r. We tackle dynamic FD discovery on roplus\textbackslashDelta r by dynamic hitting set enumeration on the difference-set of roplus\textbackslashDelta r. Specifically, (1) leveraging auxiliary structures built on r, we first present an efficient algorithm to update the difference-set of r to that of roplus\textbackslashDelta r. (2) We then compute S$^\textrm\textbackslashprime$, by recasting dynamic FD discovery as dynamic hitting set enumeration on the difference-set of roplus\textbackslashDelta r and developing novel techniques for dynamic hitting set enumeration. (3) We finally experimentally verify the effectiveness and efficiency of our approaches, using real-life and synthetic data. The results show that our dynamic FD discovery method outperforms the batch counterparts on most tested data, even when \textbackslashDelta r is up to 30 % of r. |
DOI | 10.1109/ICDE53745.2022.00026 |
Citation Key | xiao_dynamic_2022 |