Biblio
To solve the problems associated with large data volume real-time processing, heterogeneous systems using various computing devices are increasingly used. The characteristic of solving this class of problems is related to the fact that there are two directions for improving methods of real-time data analysis: the first is the development of algorithms and approaches to analysis, and the second is the development of hardware and software. This article reviews the main approaches to the architecture of a hardware-software solution for traffic capture and deep packet inspection (DPI) in data transmission networks with a bandwidth of 80 Gbit/s and higher. At the moment there are software and hardware tools that allow designing the architecture of capture system and deep packet inspection: 1) Using only the central processing unit (CPU); 2) Using only the graphics processing unit (GPU); 3) Using the central processing unit and graphics processing unit simultaneously (CPU + GPU). In this paper, we consider these key approaches. Also attention is paid to both hardware and software requirements for the architecture of solutions. Pain points and remedies are described.
Graphics processing unit (GPU) has been applied successfully in many scientific computing realms due to its superior performances on float-pointing calculation and memory bandwidth, and has great potential in power system applications. The N-1 static security analysis (SSA) appears to be a candidate application in which massive alternating current power flow (ACPF) problems need to be solved. However, when applying existing GPU-accelerated algorithms to solve N-1 SSA problem, the degree of parallelism is limited because existing researches have been devoted to accelerating the solution of a single ACPF. This paper therefore proposes a GPU-accelerated solution that creates an additional layer of parallelism among batch ACPFs and consequently achieves a much higher level of overall parallelism. First, this paper establishes two basic principles for determining well-designed GPU algorithms, through which the limitation of GPU-accelerated sequential-ACPF solution is demonstrated. Next, being the first of its kind, this paper proposes a novel GPU-accelerated batch-QR solver, which packages massive number of QR tasks to formulate a new larger-scale problem and then achieves higher level of parallelism and better coalesced memory accesses. To further improve the efficiency of solving SSA, a GPU-accelerated batch-Jacobian-Matrix generating and contingency screening is developed and carefully optimized. Lastly, the complete process of the proposed GPU-accelerated batch-ACPF solution for SSA is presented. Case studies on an 8503-bus system show dramatic computation time reduction is achieved compared with all reported existing GPU-accelerated methods. In comparison to UMFPACK-library-based single-CPU counterpart using Intel Xeon E5-2620, the proposed GPU-accelerated SSA framework using NVIDIA K20C achieves up to 57.6 times speedup. It can even achieve four times speedup when compared to one of the fastest multi-core CPU parallel computing solution using KLU library. The prop- sed batch-solving method is practically very promising and lays a critical foundation for many other power system applications that need to deal with massive subtasks, such as Monte-Carlo simulation and probabilistic power flow.