Biblio

Found 7 results

Filters: Keyword is Data engineering

2023-06-16

Xiao, Renjie, Yuan, Yong'an, Tan, Zijing, Ma, Shuai, Wang, Wei. 2022. Dynamic Functional Dependency Discovery with Dynamic Hitting Set Enumeration. 2022 IEEE 38th International Conference on Data Engineering (ICDE). :286–298.

Functional dependencies (FDs) are widely applied in data management tasks. Since FDs on data are usually unknown, FD discovery techniques are studied for automatically finding hidden FDs from data. In this paper, we develop techniques to dynamically discover FDs in response to changes on data. Formally, given the complete set Σ of minimal and valid FDs on a relational instance r, we aim to find the complete set Σ′ of minimal and valid FDs on r ⊕ Δr, where Δr is a set of tuple insertions and deletions. Different from the batch approaches that compute Σ′ on r ⊕ Δr from scratch, our dynamic method computes Σ′ in response to Δr by leveraging the known Σ on r, and avoids processing the whole of r for each update from Δr. We tackle dynamic FD discovery on r ⊕ Δr by dynamic hitting set enumeration on the difference-set of r ⊕ Δr. Specifically, (1) leveraging auxiliary structures built on r, we first present an efficient algorithm to update the difference-set of r to that of r ⊕ Δr. (2) We then compute Σ′, by recasting dynamic FD discovery as dynamic hitting set enumeration on the difference-set of r ⊕ Δr and developing novel techniques for dynamic hitting set enumeration. (3) We finally experimentally verify the effectiveness and efficiency of our approaches, using real-life and synthetic data. The results show that our dynamic FD discovery method outperforms the batch counterparts on most tested data, even when Δr is up to 30% of r.
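The link between difference sets and hitting sets mentioned in this abstract can be illustrated with a small batch, brute-force sketch. It is not the paper's dynamic algorithm: it recomputes everything from scratch, and the names `difference_sets` and `minimal_fds` are hypothetical. It assumes the standard formulation in which X → A is a valid FD exactly when X intersects every difference set that contains A (with A removed).

```python
from itertools import combinations


def difference_sets(rows, attrs):
    """Collect, for every pair of tuples, the set of attributes on which they differ."""
    diffs = set()
    for t1, t2 in combinations(rows, 2):
        d = frozenset(a for a in attrs if t1[a] != t2[a])
        if d:
            diffs.add(d)
    return diffs


def minimal_fds(rows, attrs):
    """Brute-force enumeration of minimal valid FDs as (lhs_tuple, rhs) pairs.

    For each right-hand side A, the minimal left-hand sides are the minimal
    hitting sets of {D - {A} : D is a difference set containing A}.
    """
    diffs = difference_sets(rows, attrs)
    fds = []
    for rhs in attrs:
        # Only tuple pairs that differ on rhs can violate X -> rhs.
        targets = [d - {rhs} for d in diffs if rhs in d]
        if any(not t for t in targets):
            continue  # two tuples differ only on rhs, so no FD with this rhs holds
        others = [a for a in attrs if a != rhs]
        found = []
        # Enumerate candidates by increasing size so minimal sets are found first.
        for size in range(len(others) + 1):
            for lhs in combinations(others, size):
                s = frozenset(lhs)
                if any(f <= s for f in found):
                    continue  # a subset already works, so s is not minimal
                if all(s & t for t in targets):
                    found.append(s)
        fds.extend((tuple(sorted(x)), rhs) for x in found)
    return fds


# Illustrative data (invented for this sketch, not taken from the paper).
r = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 2, "C": 1},
    {"A": 2, "B": 1, "C": 2},
]
result = minimal_fds(r, ["A", "B", "C"])
```

The enumeration is exponential in the number of attributes, which is one reason dedicated hitting set enumeration techniques, such as those the abstract refers to, matter in practice.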

2023-03-31

Soderi, Mirco, Kamath, Vignesh, Breslin, John G.. 2022. A Demo of a Software Platform for Ubiquitous Big Data Engineering, Visualization, and Analytics, via Reconfigurable Micro-Services, in Smart Factories. 2022 IEEE International Conference on Smart Computing (SMARTCOMP). :1–3.

Intelligent, smart, Cloud, reconfigurable manufacturing, and remote monitoring, all intersect in modern industry and mark the path toward more efficient, effective, and sustainable factories. Many obstacles are found along the path, including legacy machineries and technologies, security issues, and software that is often hard, slow, and expensive to adapt to face unforeseen challenges and needs in this fast-changing ecosystem. Light-weight, portable, loosely coupled, easily monitored, variegated software components, supporting Edge, Fog and Cloud computing, that can be (re)created, (re)configured and operated from remote through Web requests in a matter of milliseconds, and that rely on libraries of ready-to-use tasks also extendable from remote through sub-second Web requests, constitute a fertile technological ground on top of which fourth-generation industries can be built. In this demo it will be shown how starting from a completely virgin Docker Engine, it is possible to build, configure, destroy, rebuild, operate, exclusively from remote, exclusively via API calls, computation networks that are capable to (i) raise alerts based on configured thresholds or trained ML models, (ii) transform Big Data streams, (iii) produce and persist Big Datasets on the Cloud, (iv) train and persist ML models on the Cloud, (v) use trained models for one-shot or stream predictions, (vi) produce tabular visualizations, line plots, pie charts, histograms, at real-time, from Big Data streams. Also, it will be shown how easily such computation networks can be upgraded with new functionalities at real-time, from remote, via API calls.

ISSN: 2693-8340

2022-07-15

Fan, Wenqi, Derr, Tyler, Zhao, Xiangyu, Ma, Yao, Liu, Hui, Wang, Jianping, Tang, Jiliang, Li, Qing. 2021. Attacking Black-box Recommendations via Copying Cross-domain User Profiles. 2021 IEEE 37th International Conference on Data Engineering (ICDE). :1583–1594.

Recommender systems, which aim to suggest personalized lists of items for users, have drawn a lot of attention. In fact, many of these state-of-the-art recommender systems have been built on deep neural networks (DNNs). Recent studies have shown that these deep neural networks are vulnerable to attacks, such as data poisoning, which generate fake users to promote a selected set of items. Correspondingly, effective defense strategies have been developed to detect these generated users with fake profiles. Thus, new strategies of creating more ‘realistic’ user profiles to promote a set of items should be investigated to further understand the vulnerability of DNNs based recommender systems. In this work, we present a novel framework CopyAttack. It is a reinforcement learning based black-box attacking method that harnesses real users from a source domain by copying their profiles into the target domain with the goal of promoting a subset of items. CopyAttack is constructed to both efficiently and effectively learn policy gradient networks that first select, then further refine/craft user profiles from the source domain, and ultimately copy them into the target domain. CopyAttack’s goal is to maximize the hit ratio of the targeted items in the Top-k recommendation list of the users in the target domain. We conducted experiments on two real-world datasets and empirically verified the effectiveness of the proposed framework. The implementation of CopyAttack is available at https://github.com/wenqifan03/CopyAttack.

2022-05-03

Zeighami, Sepanta, Ghinita, Gabriel, Shahabi, Cyrus. 2021. Secure Dynamic Skyline Queries Using Result Materialization. 2021 IEEE 37th International Conference on Data Engineering (ICDE). :157–168.

Skyline computation is an increasingly popular query, with broad applicability to many domains. Given the trend to outsource databases, and due to the sensitive nature of the data (e.g., in healthcare), it is essential to evaluate skylines on encrypted datasets. Research efforts acknowledged the importance of secure skyline computation, but existing solutions suffer from several shortcomings: (i) they only provide ad-hoc security; (ii) they are prohibitively expensive; or (iii) they rely on assumptions such as the presence of multiple non-colluding parties in the protocol. Inspired by solutions for secure nearest-neighbors, we conjecture that a secure and efficient way to compute skylines is through result materialization. However, materialization is much more challenging for skylines queries due to large space requirements. We show that pre-computing skyline results while minimizing storage overhead is NP-hard, and we provide heuristics that solve the problem more efficiently, while maintaining storage at reasonable levels. Our algorithms are novel and also applicable to regular skyline computation, but we focus on the encrypted setting where materialization reduces the response time of skyline queries from hours to seconds. Extensive experiments show that we clearly outperform existing work in terms of performance, and our security analysis proves that we obtain a small (and quantifiable) data leakage.
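For readers unfamiliar with the query type, a dynamic skyline can be sketched in plaintext as below. This is only an illustration of the query semantics under the common definition (points are compared by their per-dimension absolute distance to a query point q, smaller being better). It involves no encryption and no result materialization, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every dimension and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def dynamic_skyline(points, q):
    """Return the points not dominated by any other point relative to query q.

    Each point is first mapped to its per-dimension absolute distance from q;
    dominance is then checked in that mapped space with a simple nested loop.
    """
    mapped = [tuple(abs(x - c) for x, c in zip(p, q)) for p in points]
    return [
        p
        for p, m in zip(points, mapped)
        if not any(dominates(other, m) for other in mapped)
    ]


# Illustrative data (invented for this sketch).
points = [(1, 9), (3, 3), (9, 1), (6, 6)]
near_origin = dynamic_skyline(points, (0, 0))
near_six = dynamic_skyline(points, (6, 6))
```

Because the result changes with q, as the two calls show, precomputing answers for all possible queries requires partitioning the query space, which is where the storage concerns raised in the abstract come from.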

2022-02-09

Zheng, Shiyuan, Xie, Hong, Lui, John C.S.. 2021. Social Visibility Optimization in OSNs with Anonymity Guarantees: Modeling, Algorithms and Applications. 2021 IEEE 37th International Conference on Data Engineering (ICDE). :2063–2068.

Online social network (OSN) is an ideal venue to enhance one's visibility. This paper considers how a user (called requester) in an OSN selects a small number of available users and invites them as new friends/followers so as to maximize his "social visibility". More importantly, the requester has to do this under the anonymity setting, which means he is not allowed to know the neighborhood information of these available users in the OSN. In this paper, we first develop a mathematical model to quantify the social visibility and formulate the problem of visibility maximization with anonymity guarantee, abbreviated as "VisMAX-A". Then we design an algorithmic framework named as "AdaExp", which adaptively expands the requester's visibility in multiple rounds. In each round of the expansion, AdaExp uses a query oracle with anonymity guarantee to select only one available user. By using probabilistic data structures like the k-minimum values (KMV) sketch, we design an efficient query oracle with anonymity guarantees. We also conduct experiments on real-world social networks and validate the effectiveness of our algorithms.
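The k-minimum values (KMV) sketch named in this abstract is a general-purpose structure for estimating the number of distinct items in a stream. A minimal sketch of the basic idea follows. It is not the paper's query oracle and says nothing about its anonymity guarantees. The class name `KMV`, the helper `_hash01`, and the choice of SHA-256 as the hash are assumptions made for illustration.

```python
import bisect
import hashlib


def _hash01(item):
    """Map an item to a pseudo-uniform value in [0, 1) using SHA-256 (an assumption)."""
    digest = hashlib.sha256(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KMV:
    """Keep the k smallest distinct hash values seen so far."""

    def __init__(self, k):
        self.k = k
        self.values = []  # sorted ascending, at most k entries

    def add(self, item):
        v = _hash01(item)
        i = bisect.bisect_left(self.values, v)
        if i < len(self.values) and self.values[i] == v:
            return  # duplicate item (same hash), nothing to do
        if len(self.values) < self.k:
            self.values.insert(i, v)
        elif v < self.values[-1]:
            self.values.insert(i, v)
            self.values.pop()  # drop the largest to keep only k values

    def estimate(self):
        """Estimate the distinct count as (k - 1) / (k-th smallest hash value)."""
        if len(self.values) < self.k:
            return float(len(self.values))  # fewer than k distinct items: exact
        return (self.k - 1) / self.values[-1]


# Illustrative usage: 10,000 distinct user ids, each inserted twice.
sketch = KMV(k=256)
for repeat in range(2):
    for user_id in range(10_000):
        sketch.add(f"user-{user_id}")
approx = sketch.estimate()
```

The sketch stores only k hash values regardless of stream length, and the estimate's relative error shrinks roughly with the square root of k.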

2020-10-16

Zhang, Yiwei, Deng, Sanhong, Zhang, Yue, Kong, Jia. 2019. Research on Government Information Sharing Model Using Blockchain Technology. 2019 10th International Conference on Information Technology in Medicine and Education (ITME). :726–729.

Research Purpose: The distributed, traceable, and secure nature of blockchain technology is applicable to the construction of new government information resource models, which could eliminate the barn effect and trust issues in government information sharing and promote the transformation of government affairs from management to service. It is also of great significance to the sharing of government information and the construction of service-oriented e-government. Proposed Methods: By analyzing the current problems of government information sharing, combined with literature research, this paper proposes the theoretical framework and advantages of blockchain technology applied to government information management and sharing, and expounds the blockchain-based solution. It also constructs a government information sharing model based on blockchain, and gives implementation strategies at the technical and management levels. Results and Conclusion: The government information sharing model based on the blockchain solution and the transparency of government information can be used as a research framework for information interaction analysis between the government and users. It can also promote the construction and development of information sharing for the Chinese government, as well as provide a unified information sharing solution at the departmental and regional levels for e-government.

2017-12-12

Kimmig, A., Memory, A., Miller, R. J., Getoor, L.. 2017. A Collective, Probabilistic Approach to Schema Mapping. 2017 IEEE 33rd International Conference on Data Engineering (ICDE). :921–932.

We propose a probabilistic approach to the problem of schema mapping. Our approach is declarative, scalable, and extensible. It builds upon recent results in both schema mapping and probabilistic reasoning and contributes novel techniques in both fields. We introduce the problem of mapping selection, that is, choosing the best mapping from a space of potential mappings, given both metadata constraints and a data example. As selection has to reason holistically about the inputs and the dependencies between the chosen mappings, we define a new schema mapping optimization problem which captures interactions between mappings. We then introduce Collective Mapping Discovery (CMD), our solution to this problem using state-of-the-art probabilistic reasoning techniques, which allows for inconsistencies and incompleteness. Using hundreds of realistic integration scenarios, we demonstrate that the accuracy of CMD is more than 33% above that of metadata-only approaches already for small data examples, and that CMD routinely finds perfect mappings even if a quarter of the data is inconsistent.