Biblio

Filters: Keyword is hash join
2020-06-08
Tang, Deyou, Zhang, Yazhuo, Zeng, Qingmiao.  2019.  Optimization of Hardware-oblivious and Hardware-conscious Hash-join Algorithms on KNL. 2019 4th International Conference on Cloud Computing and Internet of Things (CCIOT). :24–28.
Investigations of hash join algorithms on multi-core and many-core platforms have shown that carefully tuned hash join implementations can outperform simple hash joins on most multi-core servers. However, hardware-oblivious hash joins have shown competitive performance on many-core platforms. Knights Landing (KNL) has received attention in the field of parallel computing for its massively data-parallel nature and high memory bandwidth, but neither hardware-oblivious nor hardware-conscious hash join algorithms have been systematically discussed and evaluated with respect to KNL's characteristics (high bandwidth, cluster mode, etc.). In this paper, we present the design and implementation of state-of-the-art hardware-oblivious and hardware-conscious hash joins that are tuned to exploit various KNL hardware characteristics. Using a thorough evaluation, we show that: 1) memory allocation strategies based on KNL's architecture are effective for both hardware-oblivious and hardware-conscious hash join algorithms; 2) hardware architecture features remain non-negligible factors in improving the efficiency of hash join algorithms.
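
As a rough illustration of the two families named in this abstract, the sketch below contrasts a join that builds one hash table over the whole build relation (the hardware-oblivious style) with one that first partitions both inputs so each partition's table stays small (the hardware-conscious style). It is a minimal single-threaded sketch, not the paper's implementation: the function names, the `(key, payload)` row layout, and the `bits` parameter are assumptions made for illustration, and no KNL-specific tuning is shown.

```python
from collections import defaultdict


def no_partition_join(build, probe):
    """Hardware-oblivious style: one hash table over the entire build side."""
    table = defaultdict(list)
    for key, payload in build:
        table[key].append(payload)
    # Probe phase: emit one output row per matching build row.
    return [(key, b, p) for key, p in probe for b in table.get(key, [])]


def radix_partition(rows, bits):
    """Split rows into 2**bits partitions using the low bits of the key hash."""
    mask = (1 << bits) - 1
    parts = [[] for _ in range(1 << bits)]
    for key, payload in rows:
        parts[hash(key) & mask].append((key, payload))
    return parts


def partitioned_join(build, probe, bits=4):
    """Hardware-conscious style: partition both inputs, then join pairwise."""
    out = []
    build_parts = radix_partition(build, bits)
    probe_parts = radix_partition(probe, bits)
    for b_part, p_part in zip(build_parts, probe_parts):
        # Matching keys always land in the same partition index.
        out.extend(no_partition_join(b_part, p_part))
    return out
```

Both functions return the same set of joined rows; only the order may differ. In a tuned implementation, `bits` would typically be chosen so that each partition's hash table fits in a cache level, which is the kind of hardware dependence the abstract refers to.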
2017-05-16
Shin, Mincheol, Roh, Hongchan, Jung, Wonmook, Park, Sanghyun.  2016.  Optimizing Hash Partitioning for Solid State Drives. Proceedings of the 31st Annual ACM Symposium on Applied Computing. :1000–1007.

The use of flashSSDs has increased rapidly in a wide range of areas due to their superior energy efficiency, shorter access time, and higher bandwidth compared to HDDs. The internal parallelism created by the multiple flash memory packages embedded in a flashSSD is one of the unique features of flashSSDs. Many new DBMS technologies have been developed for flashSSDs, but query processing for flashSSDs has drawn less attention than other DBMS technologies. Hash partitioning is widely used in query processing algorithms to materialize their intermediate results efficiently. In this paper, we propose a novel hash partitioning algorithm that exploits the internal parallelism of flashSSDs. The devised hash partitioning method outperforms the traditional hash partitioning technique regardless of the amount of available main memory and independently of the buffer management strategy (blocked I/O vs. page-sized I/O). We implemented our method based on the source code of the PostgreSQL storage manager. PostgreSQL relation files created by the TPC-H workload were employed in the experiments. Our method was found to be up to 3.55 times faster than the traditional method with blocked I/O, and 2.36 times faster than the traditional method with page-sized I/O.
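
For readers unfamiliar with the baseline, the traditional buffered hash partitioning that this abstract compares against can be sketched as follows. This is a generic illustration, not the paper's SSD-aware algorithm or PostgreSQL code: the `hash_partition` name, the `flush` callback, and the `buffer_rows` parameter are hypothetical, and rows are assumed to be tuples whose first field is the partitioning key.

```python
def hash_partition(rows, num_partitions, buffer_rows, flush):
    """Route each row to a partition buffer; write a buffer out when it fills.

    flush(pid, batch) stands in for one write request to partition pid.
    A small buffer_rows loosely models page-sized I/O, a large one blocked I/O.
    """
    buffers = [[] for _ in range(num_partitions)]
    for row in rows:
        pid = hash(row[0]) % num_partitions
        buffers[pid].append(row)
        if len(buffers[pid]) >= buffer_rows:
            flush(pid, buffers[pid])
            buffers[pid] = []
    # Write out whatever remains in partially filled buffers.
    for pid, batch in enumerate(buffers):
        if batch:
            flush(pid, batch)
```

A caller might pass a `flush` that appends each batch to a per-partition file. The number of `flush` calls falls as `buffer_rows` grows, which is the trade-off between memory use and I/O request size that the two buffer management strategies represent.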