Biblio
Deadlock is one of the critical problems in Message Passing Interface (MPI) programs. At present, most techniques for detecting MPI deadlocks rely on exhaustively exploring all execution paths of a program, which is extremely inefficient. Moreover, as the number of wildcard receive events and processes grows, the number of execution paths increases exponentially, further worsening the situation. To alleviate this problem, we propose a deadlock detection approach called SAMPI that is based on match-sets and avoids exploring execution paths. In this approach, a match detection rule based on the Lazy Lamport Clocks Protocol is employed to form rough match-sets. We then design three refinement algorithms based on the non-overtaking rule and the MPI communication mechanism to refine the match-sets. Finally, deadlocks are detected by analyzing the refined match-sets. We evaluated SAMPI experimentally on 15 programs, and the results show that it is highly efficient at detecting deadlocks in MPI programs, especially programs with many interleavings.
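As background for the class of bug SAMPI targets, the following minimal C/MPI sketch (not taken from the paper) shows a wildcard receive whose nondeterministic match decides whether the program completes or deadlocks; SAMPI's match-set analysis aims to expose such cases without enumerating every interleaving.

```c
/* Minimal illustration (not SAMPI itself): a wildcard receive whose
 * match decides whether the program completes or deadlocks.
 * Run with 3 processes, e.g. mpirun -np 3 ./wildcard_deadlock        */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, buf = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* First receive accepts a message from any sender. */
        MPI_Recv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* Second receive insists on rank 1.  If the wildcard above
         * already consumed rank 1's only message, no matching send
         * remains and rank 0 blocks here forever (deadlock).        */
        MPI_Recv(&buf, 1, MPI_INT, 1, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else if (rank == 1 || rank == 2) {
        buf = rank;
        MPI_Send(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```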
Hardware/software co-design on hybrid architectures, such as the Xilinx Zynq SoC combining an ARM CPU with FPGA fabric, provides a high-performance, low-power platform for accelerating the RSA algorithm. This paper adopts the non-subtraction Montgomery algorithm and the Chinese Remainder Theorem (CRT) to implement high-speed RSA processors, and deploys a 48-node cluster based on the Zynq SoC to achieve extremely high scalability and throughput for RSA computation. In this design, the ARM cores handle node-to-node communication with the Message Passing Interface (MPI), while the FPGA fabric handles the complex calculations. The experimental results show that overall performance scales linearly with the number of nodes, and the cluster achieves a 6×–9× speedup over a multi-core desktop (Intel i7-3770) and performance comparable to a 288-core many-core server. In addition, it delivers up to 2.5× better energy efficiency than these two traditional platforms.
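As a toy-scale illustration of the CRT decryption step that the paper offloads to its Montgomery-based RSA processors, the following C sketch uses classic textbook RSA parameters and plain 64-bit arithmetic; the parameter values and structure are illustrative assumptions, not the paper's FPGA implementation.

```c
/* Toy-scale sketch of CRT-based RSA decryption with textbook
 * parameters: two half-size exponentiations replace one full-size
 * one, which is the source of the CRT speedup.  Not the paper's code. */
#include <stdio.h>
#include <stdint.h>

/* Square-and-multiply modular exponentiation: base^exp mod m. */
static uint64_t modpow(uint64_t base, uint64_t exp, uint64_t m)
{
    uint64_t result = 1;
    base %= m;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

int main(void)
{
    /* Textbook RSA parameters: n = p*q = 3233, e = 17, d = 2753. */
    const uint64_t p = 61, q = 53, d = 2753;
    const uint64_t dp = d % (p - 1);   /* exponent reduced mod p-1 */
    const uint64_t dq = d % (q - 1);   /* exponent reduced mod q-1 */
    const uint64_t qinv = 38;          /* q^-1 mod p, precomputed  */
    const uint64_t c = 2790;           /* ciphertext of m = 65     */

    /* Two half-size exponentiations instead of one mod-n exponentiation. */
    uint64_t m1 = modpow(c, dp, p);
    uint64_t m2 = modpow(c, dq, q);

    /* Garner's recombination: m = m2 + q * (qinv*(m1 - m2) mod p). */
    uint64_t h = (qinv * ((m1 + p - m2 % p) % p)) % p;
    uint64_t m = m2 + h * q;

    printf("recovered plaintext: %llu\n", (unsigned long long)m);  /* 65 */
    return 0;
}
```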
MPI includes all processes in MPI_COMM_WORLD; this is untenable for reasons of scale, resiliency, and overhead. This paper offers a new approach, extending MPI with a new concept called Sessions, which makes two key contributions: a tighter integration with the underlying runtime system; and a scalable route to communication groups. This is a fundamental change in how we organise and address MPI processes that removes well-known scalability barriers by no longer requiring the global communicator MPI_COMM_WORLD.
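For orientation, the sketch below shows the Sessions-style workflow as it was later standardized in MPI 4.0: a session is initialized, a group is derived from a named process set, and a communicator is built from that group, with no use of MPI_COMM_WORLD or MPI_Init. The exact calls are the MPI 4.0 forms and may differ in detail from the proposal described here.

```c
/* Sessions-style workflow (MPI 4.0 form): build a communicator from a
 * named process set without touching MPI_COMM_WORLD.                  */
#include <mpi.h>
#include <stdio.h>

int main(void)
{
    MPI_Session session;
    MPI_Group group;
    MPI_Comm comm;
    int rank, size;

    /* Initialize an isolated session instead of the global MPI state. */
    MPI_Session_init(MPI_INFO_NULL, MPI_ERRORS_RETURN, &session);

    /* Derive a group from the runtime-provided "mpi://WORLD" process
     * set; applications can equally target smaller, named psets.      */
    MPI_Group_from_session_pset(session, "mpi://WORLD", &group);

    /* Create a communicator directly from the group. */
    MPI_Comm_create_from_group(group, "example.sessions.demo",
                               MPI_INFO_NULL, MPI_ERRORS_RETURN, &comm);
    MPI_Group_free(&group);

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    printf("rank %d of %d (no MPI_COMM_WORLD used)\n", rank, size);

    MPI_Comm_free(&comm);
    MPI_Session_finalize(&session);
    return 0;
}
```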
In this paper, parallelization and high-performance computing are used to enable ultrafast transient stability analysis that can be employed in a real-time environment to quickly perform "what-if" simulations involving system dynamics phenomena. EPRI's Extended Transient Midterm Simulation Program (ETMSP) is modified and enhanced for this work. Contingency analysis is scaled to large numbers of contingencies using Message Passing Interface (MPI)-based parallelization. Simulations of thousands of contingencies are performed on a high-performance computing machine, and the results show that parallelizing over contingencies with MPI provides good scalability and computational gains. Different ways to reduce the Input/Output (I/O) bottleneck are explored, and the findings indicate that architecting the machine with a larger local disk and maintaining a local file system significantly improves the scaling results. Thread-level parallelization of the sparse linear solver is also explored using the SuperLU_MT library.
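A minimal sketch of the parallelization-over-contingencies pattern is given below; the simulate_contingency stub is a hypothetical placeholder for an ETMSP dynamics run, and the round-robin assignment is an assumed, simplified distribution scheme rather than the paper's scheduler.

```c
/* Minimal sketch of parallelization over contingencies: each MPI rank
 * simulates a static round-robin share of the contingency list.  The
 * simulate_contingency() stub is a hypothetical stand-in for an ETMSP
 * transient-stability run; it is not part of the paper's code.        */
#include <mpi.h>
#include <stdio.h>

#define NUM_CONTINGENCIES 2000

/* Hypothetical placeholder: returns 1 if the case is stable, else 0. */
static int simulate_contingency(int id) { return id % 7 != 0; }

int main(int argc, char **argv)
{
    int rank, size, local_stable = 0, total_stable = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Round-robin assignment: rank r handles cases r, r+size, r+2*size, ... */
    for (int id = rank; id < NUM_CONTINGENCIES; id += size)
        local_stable += simulate_contingency(id);

    /* Gather the stable-case count on rank 0. */
    MPI_Reduce(&local_stable, &total_stable, 1, MPI_INT, MPI_SUM,
               0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("%d of %d contingencies stable\n",
               total_stable, NUM_CONTINGENCIES);

    MPI_Finalize();
    return 0;
}
```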