Biblio

Filters: Keyword is data warehouses
Zalozhnev, Alexey Yu., Ginz, Vasily N., Loktionov, Anatoly Eu. 2022. Intelligent System and Human-Computer Interaction for Personal Data Cyber Security in Medicaid Enterprises. 2022 International Conference on Electrical, Computer and Energy Technologies (ICECET). :1–4.
Intelligent Systems for Personal Data Cyber Security are a critical component of Personal Information Management in Medicaid Enterprises. They combine components of Cyber Security Systems with Human-Computer Interaction and draw on technology and principles from the Internet of Things. In the authors' opinion, the software-hardware concepts and solutions presented in this report are a step toward the development of Intelligent Systems for Personal Data Cyber Security in Medicaid Enterprises, and they may also be useful to developers of such systems.
Fuhry, Benny, Jayanth Jain, H. A., Kerschbaum, Florian. 2021. EncDBDB: Searchable Encrypted, Fast, Compressed, In-Memory Database Using Enclaves. 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). :438–450.
Data confidentiality is an important requirement for clients when outsourcing databases to the cloud. Trusted execution environments (TEEs), such as Intel SGX, offer an efficient solution to this confidentiality problem. However, existing TEE-based solutions are not optimized for column-oriented, in-memory databases and pose impractical memory requirements on the enclave. We present EncDBDB, a novel approach for client-controlled encryption of column-oriented, in-memory databases that allows range searches using an enclave. EncDBDB offers nine encrypted dictionaries, which provide different security, performance, and storage-efficiency trade-offs for the data. It is especially suited for complex, read-oriented, analytic queries as present, e.g., in data warehouses. The computational overhead compared to plaintext processing is within a millisecond even for databases with millions of entries, and the leakage is limited. Compressed encrypted data requires less space than a corresponding plaintext column. Furthermore, EncDBDB's enclave is very small, reducing the potential for security-relevant implementation errors and side-channel leakages.
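To make the dictionary-encoding idea concrete, here is a minimal, hypothetical sketch of a range search over an encrypted dictionary, with a plain Python function standing in for the enclave and a toy XOR cipher standing in for real authenticated encryption; none of the names or details come from the EncDBDB implementation.

    # Illustrative sketch only: dictionary-encoded column with an encrypted
    # dictionary and a range search performed inside a (simulated) enclave.
    # The "encryption" is a toy XOR placeholder, NOT real encryption, and the
    # enclave is just a Python function.

    SECRET_KEY = 0x5A  # stands in for a key sealed inside the enclave

    def toy_encrypt(value):
        return value ^ SECRET_KEY  # placeholder cipher

    def toy_decrypt(value):
        return value ^ SECRET_KEY

    # Plaintext column (e.g., order quantities) on the data owner's side.
    column = [17, 42, 5, 42, 99, 17, 63]

    # Dictionary encoding: distinct values get dictionary IDs; the column
    # itself becomes a vector of IDs.
    dictionary = sorted(set(column))                    # [5, 17, 42, 63, 99]
    value_to_id = {v: i for i, v in enumerate(dictionary)}
    encoded_column = [value_to_id[v] for v in column]   # [1, 2, 0, 2, 4, 1, 3]

    # What the untrusted server stores: encrypted dictionary + ID vector.
    encrypted_dictionary = [toy_encrypt(v) for v in dictionary]

    def enclave_range_search(enc_dict, low, high):
        """Runs 'inside' the enclave: decrypt only the small dictionary and
        return the IDs of values falling within [low, high]."""
        return {i for i, c in enumerate(enc_dict) if low <= toy_decrypt(c) <= high}

    # Untrusted side: filter the large ID vector with the IDs the enclave returned.
    matching_ids = enclave_range_search(encrypted_dictionary, 10, 50)
    result_rows = [r for r, vid in enumerate(encoded_column) if vid in matching_ids]
    print(result_rows)  # rows with values in [10, 50] -> [0, 1, 3, 5]

The point of this split is that only the small dictionary of distinct values is ever decrypted inside the enclave, while the much larger ID vector is filtered outside it, which is consistent with the abstract's emphasis on a small enclave and compressed data.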
Baker, Oras, Thien, Chuong Nguyen. 2020. A New Approach to Use Big Data Tools to Substitute Unstructured Data Warehouse. 2020 IEEE Conference on Big Data and Analytics (ICBDA). :26–31.
Data warehouses and big data have become the trend for organising data effectively. Business data originate from various kinds of sources and take different forms, from conventional structured data to unstructured data, and they are the input for producing the information essential for business sustainability. This research navigates the complicated designs of common big data and data warehousing technologies to propose an effective approach to designing and building an unstructured textual data warehouse, a crucial tool for most enterprises nowadays for decision making and gaining competitive advantage. Using IBM BigInsights Text Analytics, PostgreSQL, and Pentaho, an unstructured data warehouse was implemented and performed well on unstructured text from Amazon review datasets; the proposed approach thus offers a practical solution for building an unstructured data warehouse.
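As a rough illustration of turning unstructured reviews into warehouse facts, the sketch below extracts a product ID, star rating, and crude keyword sentiment from invented review strings and loads them into a relational table. Python's built-in sqlite3 stands in for PostgreSQL, and the keyword rules stand in for the text-analytics step, so this is not the authors' pipeline.

    # Minimal ETL sketch: extract structured facts from raw review text and
    # load them into a relational fact table. The review format, keyword
    # lists, and table schema are invented for illustration.
    import re
    import sqlite3

    raw_reviews = [
        "ASIN:B00X4WHP5E stars:5 Great battery life, excellent value.",
        "ASIN:B00X4WHP5E stars:2 Poor packaging and the charger failed.",
    ]

    POSITIVE = {"great", "excellent", "good"}
    NEGATIVE = {"poor", "failed", "bad"}

    def extract(review):
        """Turn one unstructured review into a structured (asin, stars, sentiment) row."""
        asin = re.search(r"ASIN:(\S+)", review).group(1)
        stars = int(re.search(r"stars:(\d)", review).group(1))
        words = set(re.findall(r"[a-z]+", review.lower()))
        sentiment = len(words & POSITIVE) - len(words & NEGATIVE)
        return asin, stars, sentiment

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE review_fact (asin TEXT, stars INTEGER, sentiment INTEGER)")
    conn.executemany("INSERT INTO review_fact VALUES (?, ?, ?)",
                     [extract(r) for r in raw_reviews])

    # A warehouse-style aggregate query over the extracted facts.
    for row in conn.execute(
            "SELECT asin, AVG(stars), SUM(sentiment) FROM review_fact GROUP BY asin"):
        print(row)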
Uy, Francis Aldrine A., Vea, Larry A., Binag, Matthew G., Diaz, Keith Anshilo L., Gallardo, Roy G., Navarro, Kevin Jorge A., Pulido, Maria Teresa R., Pinca, Ryan Christopher B., Rejuso, Billy John Rudolfh I., Santos, Carissa Jane R. 2020. The Potential of New Data Sources in a Data-Driven Transportation, Operation, Management and Assessment System (TOMAS). 2020 IEEE Conference on Technologies for Sustainability (SusTech). :1–8.
We present our journey in constructing the first integrated data warehouse for Philippine transportation research, in the hope of developing a Transportation Decision Support System for impact studies and policy making. We share how we collected data from diverse sources, processed them into a homogeneous format, and applied them to our multimodal platform. We also list the challenges we encountered, including bureaucratic delays, data privacy concerns, lack of software, and overlapping datasets. The data warehouse shall serve as a public resource for researchers and professionals, and for government officials to make better-informed policies. The warehouse will also function within our multimodal platform for measurement, modelling, and visualization of road transportation. This work is our contribution to improving the transportation situation in the Philippines at both the local and national levels, boosting our economy and overall quality of life.
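A small, hypothetical example of the "diverse sources into a homogeneous format" step: two imagined feeds (probe-vehicle GPS records and roadside counter records) are normalized into one common staging schema. All field names, units, and source names are invented for illustration; the actual TOMAS sources and schema are not described in the abstract.

    # Sketch of normalizing heterogeneous transport records into one schema
    # before loading into a warehouse. Everything here is hypothetical.
    from datetime import datetime, timezone

    def from_gps_probe(rec):
        """Imaginary probe-vehicle feed with epoch timestamps and km/h speeds."""
        return {
            "observed_at": datetime.fromtimestamp(rec["epoch"], tz=timezone.utc),
            "road_id": rec["segment"],
            "speed_kph": float(rec["speed_kmh"]),
            "source": "gps_probe",
        }

    def from_loop_counter(rec):
        """Imaginary roadside counter feed with ISO timestamps and m/s speeds."""
        return {
            "observed_at": datetime.fromisoformat(rec["timestamp"]),
            "road_id": rec["station"],
            "speed_kph": rec["mean_speed_ms"] * 3.6,
            "source": "loop_counter",
        }

    staging = [
        from_gps_probe({"epoch": 1_600_000_000, "segment": "EDSA-12", "speed_kmh": 18}),
        from_loop_counter({"timestamp": "2020-09-13T12:26:40+00:00",
                           "station": "EDSA-12", "mean_speed_ms": 5.0}),
    ]
    for row in staging:
        print(row)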
Alzahrani, A., Feki, J. 2020. Toward a Natural Language-Based Approach for the Specification of Decisional-Users Requirements. 2020 3rd International Conference on Computer Applications & Information Security (ICCAIS). :1–6.
The number of organizations adopting Data Warehouse (DW) technology along with data analytics in order to improve the effectiveness of their decision-making processes is steadily increasing. Despite the efforts invested, DW design remains a challenging research domain. More precisely, the design quality of the DW depends on several aspects; among them, the requirement-gathering phase is a critical and complex task. In this context, we propose a natural language (NL) template-based design approach that is twofold: firstly, it facilitates the involvement of decision-makers in the early steps of DW design, since NL is a natural means of encouraging decision-makers to express their requirements as query-like English sentences; secondly, it aims to generate a DW multidimensional schema from a set of gathered requirements (OLAP, On-Line Analytical Processing, queries written according to the suggested NL templates). The approach articulates around (i) two NL templates for specifying multidimensional components and (ii) a set of five heuristic rules for extracting multidimensional concepts from requirements. Finally, we are developing a software prototype that accepts the decision-makers' requirements and then automatically identifies the multidimensional components of the DW model.
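To illustrate the template idea (not the paper's actual two templates or five heuristic rules), the sketch below parses a query-like English sentence against one invented NL template and extracts a candidate measure and its dimensions.

    # Toy illustration of NL-template-based requirement parsing; the template
    # and the example sentence are invented for this sketch.
    import re

    TEMPLATE = re.compile(
        r"analyze (?:the )?total (?P<measure>[\w ]+?) by (?P<dims>[\w ,]+)",
        re.IGNORECASE,
    )

    def parse_requirement(sentence):
        """Return a candidate fact measure and dimension list, or None if the
        sentence does not follow the template."""
        m = TEMPLATE.match(sentence.strip())
        if not m:
            return None
        measure = m.group("measure").strip()
        dims = [d.strip() for d in re.split(r",| and ", m.group("dims")) if d.strip()]
        return {"fact_measure": measure, "dimensions": dims}

    print(parse_requirement("Analyze the total sales amount by product, store and month"))
    # {'fact_measure': 'sales amount', 'dimensions': ['product', 'store', 'month']}

In a fuller prototype, rules of this kind would then map the extracted measure to a fact table and each dimension to a dimension table of the candidate multidimensional schema.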
Zheng, N., Alawini, A., Ives, Z. G. 2019. Fine-Grained Provenance for Matching ETL. 2019 IEEE 35th International Conference on Data Engineering (ICDE). :184–195.
Data provenance tools capture the steps used to produce analyses. However, scientists must choose among workflow provenance systems, which allow arbitrary code but only track provenance at the granularity of files; provenance APIs, which provide tuple-level provenance but incur overhead in all computations; and database provenance tools, which track tuple-level provenance through relational operators and support optimization but cover only a limited subset of data science tasks. None of these solutions is well suited for tracing errors introduced during common ETL, record alignment, and matching tasks over data types such as strings and images. Scientists need new capabilities to identify the sources of errors, determine why different code versions produce different results, and identify which parameter values affect output. We propose PROVision, a provenance-driven troubleshooting tool that supports ETL and matching computations and traces extraction of content within data objects. PROVision extends database-style provenance techniques to capture equivalences, support optimizations, and enable selective evaluation. We formalize our extensions, implement them in the PROVision system, and validate their effectiveness and scalability for common ETL and matching tasks.
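The following toy example shows the general flavour of tuple-level provenance through a matching step: each output pair carries the identifiers of the input rows it was derived from, so a wrong match can be traced back to its sources. The matcher and data are invented; this is not PROVision's mechanism.

    # Simplified tuple-level provenance for a record-matching step.
    left = [("l1", "Jon Smith"), ("l2", "Ann Lee")]
    right = [("r1", "John Smith"), ("r2", "Anne Lee"), ("r3", "Bob Roe")]

    def similar(a, b):
        """Toy matcher: same initials of first and last name."""
        return [w[0].lower() for w in a.split()] == [w[0].lower() for w in b.split()]

    matches = []
    for lid, lname in left:
        for rid, rname in right:
            if similar(lname, rname):
                # Output tuple plus its provenance: the input IDs it derives from.
                matches.append({"pair": (lname, rname), "provenance": {lid, rid}})

    for m in matches:
        print(m)
    # An incorrect pair points directly at the source rows to inspect.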