Biblio | CPS-VO

Suriarachchi, I., Withana, S., Plale, B.. 2018. Big Provenance Stream Processing for Data Intensive Computations. 2018 IEEE 14th International Conference on e-Science (e-Science). :245–255.

In the business and research landscape of today, data analysis consumes public and proprietary data from numerous sources, and utilizes any one or more of popular data-parallel frameworks such as Hadoop, Spark and Flink. In the Data Lake setting these frameworks co-exist. Our earlier work has shown that data provenance in Data Lakes can aid with both traceability and management. The sheer volume of fine-grained provenance generated in a multi-framework application motivates the need for on-the-fly provenance processing. We introduce a new parallel stream processing algorithm that reduces fine-grained provenance while preserving backward and forward provenance. The algorithm is resilient to provenance events arriving out-of-order. It is evaluated using several strategies for partitioning a provenance stream. The evaluation shows that the parallel algorithm performs well in processing out-of-order provenance streams, with good scalability and accuracy.

Wang, Xiao, Sabne, Amit, Kisner, Sherman, Raghunathan, Anand, Bouman, Charles, Midkiff, Samuel. 2016. High Performance Model Based Image Reconstruction. Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. :2:1–2:12.

Computed Tomography (CT) Image Reconstruction is an important technique used in a wide range of applications, ranging from explosive detection, medical imaging to scientific imaging. Among available reconstruction methods, Model Based Iterative Reconstruction (MBIR) produces higher quality images and allows for the use of more general CT scanner geometries than is possible with more commonly used methods. The high computational cost of MBIR, however, often makes it impractical in applications for which it would otherwise be ideal. This paper describes a new MBIR implementation that significantly reduces the computational cost of MBIR while retaining its benefits. It describes a novel organization of the scanner data into super-voxels (SV) that, combined with a super-voxel buffer (SVB), dramatically increase locality and prefetching, enable parallelism across SVs and lead to an average speedup of 187 on 20 cores.

Kishore, N., Kapoor, B.. 2014. An efficient parallel algorithm for hash computation in security and forensics applications. Advance Computing Conference (IACC), 2014 IEEE International. :873-877.

Hashing algorithms are used extensively in information security and digital forensics applications. This paper presents an efficient parallel algorithm hash computation. It's a modification of the SHA-1 algorithm for faster parallel implementation in applications such as the digital signature and data preservation in digital forensics. The algorithm implements recursive hash to break the chain dependencies of the standard hash function. We discuss the theoretical foundation for the work including the collision probability and the performance implications. The algorithm is implemented using the OpenMP API and experiments performed using machines with multicore processors. The results show a performance gain by more than a factor of 3 when running on the 8-core configuration of the machine.

Kishore, N., Kapoor, B.. 2014. An efficient parallel algorithm for hash computation in security and forensics applications. Advance Computing Conference (IACC), 2014 IEEE International. :873-877.

Hashing algorithms are used extensively in information security and digital forensics applications. This paper presents an efficient parallel algorithm hash computation. It's a modification of the SHA-1 algorithm for faster parallel implementation in applications such as the digital signature and data preservation in digital forensics. The algorithm implements recursive hash to break the chain dependencies of the standard hash function. We discuss the theoretical foundation for the work including the collision probability and the performance implications. The algorithm is implemented using the OpenMP API and experiments performed using machines with multicore processors. The results show a performance gain by more than a factor of 3 when running on the 8-core configuration of the machine.