Biblio

Filters: Keyword is scientific workflows
2021-09-16
Konjaang, J. Kok, Xu, Lina.  2020.  Cost Optimised Heuristic Algorithm (COHA) for Scientific Workflow Scheduling in IaaS Cloud Environment. 2020 IEEE 6th Intl Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference on High Performance and Smart Computing (HPSC), and IEEE Intl Conference on Intelligent Data and Security (IDS). :162–168.
Cloud computing, a multipurpose and high-performance form of internet-based computing, can model and transform a large range of application requirements into a set of workflow tasks. It allows users to represent their computational needs conveniently for data retrieval, reformatting, and analysis. However, workflow applications are big data applications and often take many hours to finish executing due to their nature and data size. In this paper, we study cost-optimised scheduling algorithms in the cloud and propose a novel task-splitting algorithm named the Cost Optimised Heuristic Algorithm (COHA), which enables the cloud scheduler to optimise execution cost. In this algorithm, large tasks are split into sub-tasks to reduce their execution time. The design purpose is to enable all tasks to adequately meet their deadlines. We have carefully tested the performance of COHA with a list of workflow inputs. The simulation results convincingly demonstrate that COHA can effectively perform VM allocation and deployment and handle randomly arriving tasks. It can efficiently reduce execution costs while also allowing all tasks to finish properly before their deadlines. Overall, the improvements in our algorithm reduce the execution cost by 32.5% for Sipht, 3.9% for Montage, and 1.2% for CyberShake workflows when compared with state-of-the-art work.
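The abstract gives no code; as a rough illustration of the deadline-driven task-splitting idea it describes, here is a minimal Python sketch. The task model, the VM speed parameter, and the equal-split rule are illustrative assumptions, not COHA's actual procedure.

```python
# Illustrative sketch of deadline-driven task splitting (not the actual COHA
# algorithm; the task model, splitting rule, and names are assumptions).
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    length: float    # workload, e.g. million instructions
    deadline: float  # seconds

def split_for_deadline(task, vm_mips):
    """Split a task into equal sub-tasks so each can finish before the
    deadline when the sub-tasks run in parallel on VMs of speed vm_mips."""
    runtime = task.length / vm_mips
    if runtime <= task.deadline:
        return [task]                      # already meets its deadline
    n = int(-(-runtime // task.deadline))  # ceil(runtime / deadline)
    sub_len = task.length / n
    return [Task(f"{task.name}.{i}", sub_len, task.deadline) for i in range(n)]

# Example: a task that would run 30 s on a 1000-MIPS VM but must finish in 10 s
parts = split_for_deadline(Task("t1", 30000, 10.0), vm_mips=1000)
print([p.name for p in parts])  # ['t1.0', 't1.1', 't1.2']
```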
Du, Xin, Tang, Songtao, Lu, Zhihui, Wu, Jie, Gai, Keke, Hung, Patrick C.K..  2020.  A Novel Data Placement Strategy for Data-Sharing Scientific Workflows in Heterogeneous Edge-Cloud Computing Environments. 2020 IEEE International Conference on Web Services (ICWS). :498–507.
The deployment of datasets in the heterogeneous edge-cloud computing paradigm has received increasing attention in state-of-the-art research. However, due to their large sizes and the existence of private scientific datasets, finding an optimal data placement strategy that can minimize data transmission as well as improve performance remains a persistent problem. In this study, the advantages of both edge and cloud computing are combined to construct a data placement model that works for multiple scientific workflows. The most difficult research challenge is to provide a data placement strategy that considers shared datasets, both within individual workflows and among multiple workflows, across various geographically distributed environments. According to the constructed model, not only the storage capacity of edge micro-datacenters but also the data transfer between multiple clouds across regions must be considered. To address this issue, we considered the characteristics of this model and identified the factors that cause transmission delay. We propose a discrete particle swarm optimization algorithm with differential evolution (DE-DPSO) to distribute the datasets during workflow execution and, based on it, a new data placement strategy named DE-DPSO-DPS. DE-DPSO-DPS is evaluated in several experiments designed in simulated heterogeneous edge-cloud computing environments. The results demonstrate that our data placement strategy can effectively reduce the data transmission time and achieve superior performance compared to traditional strategies for data-sharing scientific workflows.
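As a rough illustration of the discrete-PSO idea the abstract names, here is a toy Python sketch that places datasets onto a few edge/cloud sites to minimize an assumed transfer cost. The cost model, the capacity penalty, and the update rule (including the DE-style random mutation) are assumptions for illustration, not the paper's DE-DPSO.

```python
# Toy sketch of discrete-PSO-style data placement (the cost model, capacity
# constraint, and update rule are illustrative, not the paper's DE-DPSO).
import random

N_DATASETS, N_SITES, N_PARTICLES, ITERS = 8, 3, 20, 100
random.seed(0)
# transfer_cost[i][j]: assumed cost of serving dataset i from site j
transfer_cost = [[random.uniform(1, 10) for _ in range(N_SITES)]
                 for _ in range(N_DATASETS)]
capacity = [4, 4, 4]  # assumed max datasets each edge/cloud site can hold

def cost(placement):
    c = sum(transfer_cost[i][s] for i, s in enumerate(placement))
    for j in range(N_SITES):  # penalize exceeding a site's storage capacity
        over = placement.count(j) - capacity[j]
        if over > 0:
            c += 100 * over
    return c

swarm = [[random.randrange(N_SITES) for _ in range(N_DATASETS)]
         for _ in range(N_PARTICLES)]
pbest = list(swarm)
gbest = min(swarm, key=cost)

for _ in range(ITERS):
    for k, p in enumerate(swarm):
        # discrete "velocity": each position moves toward pbest/gbest with
        # high probability, plus a DE-style random mutation otherwise
        new = [random.choice([x, pb, gb]) if random.random() < 0.9
               else random.randrange(N_SITES)
               for x, pb, gb in zip(p, pbest[k], gbest)]
        swarm[k] = new
        if cost(new) < cost(pbest[k]):
            pbest[k] = new
    gbest = min(pbest + [gbest], key=cost)

print("best placement:", gbest, "cost:", round(cost(gbest), 2))
```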
2021-09-01
Gegan, Ross, Mao, Christina, Ghosal, Dipak, Bishop, Matt, Peisert, Sean.  2020.  Anomaly Detection for Science DMZs Using System Performance Data. 2020 International Conference on Computing, Networking and Communications (ICNC). :492–496.
Science DMZs are specialized networks that enable large-scale distributed scientific research, providing efficient and guaranteed performance while transferring large amounts of data at high rates. The high-speed performance of a Science DMZ is made viable via data transfer nodes (DTNs), which are therefore a critical point of failure. DTNs are usually monitored with network intrusion detection systems (NIDS). However, NIDS do not consider system performance data, such as network I/O interrupts and context switches, which can also be useful in revealing anomalous system performance potentially arising from external network-based attacks or insider attacks. In this paper, we demonstrate how system performance metrics can be applied to securing a DTN in a Science DMZ network. Specifically, we evaluate the effectiveness of system performance data in detecting TCP SYN flood attacks on a DTN using DBSCAN (a density-based clustering algorithm) for anomaly detection. Our results demonstrate that system interrupts and context switches can be used to successfully detect TCP SYN floods, suggesting that system performance data could be effective in detecting a variety of attacks not easily detected through network monitoring alone.
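The approach is straightforward to prototype: run DBSCAN over host metrics and treat noise points (label -1) as anomalies. The Python sketch below uses the feature pair named in the abstract (interrupts and context switches), but the synthetic data and the eps/min_samples values are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of DBSCAN-based anomaly detection on host metrics.
# The simulated data and DBSCAN parameters are illustrative assumptions.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Simulated per-second samples: [interrupts/s, context switches/s]
normal = rng.normal([5000, 8000], [300, 500], size=(500, 2))
flood = rng.normal([30000, 25000], [2000, 2000], size=(10, 2))  # SYN-flood-like
X = StandardScaler().fit_transform(np.vstack([normal, flood]))

labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(X)
print("samples flagged as anomalous (label -1):", int((labels == -1).sum()))
```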
2020-03-16
Singh, Rina, Graves, Jeffrey A., Anantharaj, Valentine, Sukumar, Sreenivas R..  2019.  Evaluating Scientific Workflow Engines for Data and Compute Intensive Discoveries. 2019 IEEE International Conference on Big Data (Big Data). :4553–4560.
Workflow engines used to script scientific experiments involving numerical simulation, data analysis, instruments, edge sensors, and artificial intelligence have to deal with the complexities of hardware, software, resource availability, and the collaborative nature of science. In this paper, we survey workflow engines used in data-intensive and compute-intensive discovery pipelines from scientific disciplines such as astronomy, high energy physics, earth system science, bio-medicine, and material science and present a qualitative analysis of their respective capabilities. We compare five popular workflow engines and their differentiated approaches to job orchestration, job launching, data management and provenance, security authentication, ease-of-use, workflow description, and scripting semantics. The comparisons presented in this paper allow practitioners to choose the appropriate engine for their scientific experiment and lead to recommendations for future work.
2019-10-28
Huang, Jingwei.  2018.  From Big Data to Knowledge: Issues of Provenance, Trust, and Scientific Computing Integrity. 2018 IEEE International Conference on Big Data (Big Data). :2197–2205.
This paper addresses the nature of data and knowledge, the relation between them, the variety of views characteristic of Big Data (in which data may come from many different sources reflecting different viewpoints), and the associated essential issues of data provenance, knowledge provenance, scientific computing integrity, and trust in the data science process. In the move toward data-intensive science and engineering, it is of paramount importance to ensure Scientific Computing Integrity (SCI). A failure of SCI may be caused by malicious attacks, natural environmental changes, mistakes by scientists, operational mistakes, faults in supporting systems, faults in processes, and errors in the data or theories on which the research relies. The complexity of scientific workflows and large provenance graphs, as well as the various causes of SCI failures, makes ensuring SCI extremely difficult. Provenance and trust play a critical role in evaluating SCI. This paper reports our progress in building a model for provenance-based trust reasoning about SCI.
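As a rough sketch of what provenance-based trust reasoning can look like, the Python snippet below propagates trust scores through a small provenance graph using a "weakest input" (min) combination rule. The graph, base trust values, and combination rule are assumptions for illustration, not the paper's model.

```python
# Tiny sketch of trust propagation over a provenance graph (the combination
# rule and all values are assumptions for illustration).
from functools import lru_cache

provenance = {                 # artifact -> artifacts it was derived from
    "result": ["analysis"],
    "analysis": ["raw_data", "script"],
    "raw_data": [], "script": [],
}
base_trust = {"raw_data": 0.9, "script": 0.95}  # trust in source artifacts

@lru_cache(maxsize=None)
def trust(node):
    deps = provenance[node]
    if not deps:
        return base_trust.get(node, 1.0)
    # a derived artifact is no more trustworthy than its least trusted input
    return min(trust(d) for d in deps)

print("trust(result) =", trust("result"))  # 0.9 under the min rule
```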