Visible to the public Biblio

Filters: Keyword is CUDA  [Clear All Filters]
2020-06-26
M, Raviraja Holla, D, Suma.  2019.  Memory Efficient High-Performance Rotational Image Encryption. 2019 International Conference on Communication and Electronics Systems (ICCES). :60—64.

Image encryption is an essential part of a Visual Cryptography. Existing traditional sequential encryption techniques are infeasible to real-time applications. High-performance reformulations of such methods are increasingly growing over the last decade. These reformulations proved better performances over their sequential counterparts. A rotational encryption scheme encrypts the images in such a way that the decryption is possible with the rotated encrypted images. A parallel rotational encryption technique makes use of a high-performance device. But it less-leverages the optimizations offered by them. We propose a rotational image encryption technique which makes use of memory coalescing provided by the Compute Unified Device Architecture (CUDA). The proposed scheme achieves improved global memory utilization and increased efficiency.

2019-12-30
Razaque, Abdul, Jinrui, Wang, Zancheng, Wang, Hani, Qassim Bani, Khaskheli, Murad Ali, Bhutto, Waseem Ahmed.  2018.  Integration of CPU and GPU to Accelerate RSA Modular Exponentiation Operation. 2018 IEEE Long Island Systems, Applications and Technology Conference (LISAT). :1-6.

Now-a-days, the security of data becomes more and more important, people store many personal information in their phones. However, stored information require security and maintain privacy. Encryption algorithm has become the main force of maintaining the security of data. Thus, the algorithm complexity and encryption efficiency have become the main measurement of whether the encryption algorithm is save or not. With the development of hardware, we have many tools to improve the algorithm at present. Because modular exponentiation in RSA algorithm can be divided into several parts mathematically. In this paper, we introduce a conception by dividing the process of encryption and add the model into graphics process unit (GPU). By using GPU's capacity in parallel computing, the core of RSA can be accelerated by using central process unit (CPU) and GPU. Compute unified device architecture (CUDA) is a platform which can combine CPU and GPU together to realize GPU parallel programming and this is the tool we use to perform experience of accelerating RSA algorithm. This paper will also build up a mathematical model to help understand the mechanism of RSA encryption algorithm.

2018-09-12
Khazankin, G. R., Komarov, S., Kovalev, D., Barsegyan, A., Likhachev, A..  2017.  System architecture for deep packet inspection in high-speed networks. 2017 Siberian Symposium on Data Science and Engineering (SSDSE). :27–32.

To solve the problems associated with large data volume real-time processing, heterogeneous systems using various computing devices are increasingly used. The characteristic of solving this class of problems is related to the fact that there are two directions for improving methods of real-time data analysis: the first is the development of algorithms and approaches to analysis, and the second is the development of hardware and software. This article reviews the main approaches to the architecture of a hardware-software solution for traffic capture and deep packet inspection (DPI) in data transmission networks with a bandwidth of 80 Gbit/s and higher. At the moment there are software and hardware tools that allow designing the architecture of capture system and deep packet inspection: 1) Using only the central processing unit (CPU); 2) Using only the graphics processing unit (GPU); 3) Using the central processing unit and graphics processing unit simultaneously (CPU + GPU). In this paper, we consider these key approaches. Also attention is paid to both hardware and software requirements for the architecture of solutions. Pain points and remedies are described.

2018-05-02
Lin, Frank Po-Chen, Phoa, Frederick Kin Hing.  2017.  A Performance Study of Parallel Programming via CPU and GPU on Swarm Intelligence Based Evolutionary Algorithm. Proceedings of the 2017 International Conference on Intelligent Systems, Metaheuristics & Swarm Intelligence. :1–5.
Algorithm parallelization diversifies a complicated computing task into small parts, and thus it receives wide attention when it is implemented to evolutionary algorithms (EA). This works considers a recently developed EA called the Swarm Intelligence Based (SIB) method as a benchmark to compare the performance of two types of parallel computing approaches: a CPU-based approach via OpenMP and a GPU-based approach via CUDA. The experiments are conducted to solve an optimization problem in the search of supersaturated designs via the SIB method. Unlike conventional suggestions, we show that the CPU-based OpenMP outperforms CUDA at the execution time. At the end of this paper, we provide several potential problems in GPU parallel computing towards EA and suggest to use CPU-based OpenMP for parallel computing of EA.
2017-02-14
A. Motamedi, M. Najafi, N. Erami.  2015.  "Parallel secure turbo code for security enhancement in physical layer". 2015 Signal Processing and Intelligent Systems Conference (SPIS). :179-184.

Turbo code has been one of the important subjects in coding theory since 1993. This code has low Bit Error Rate (BER) but decoding complexity and delay are big challenges. On the other hand, considering the complexity and delay of separate blocks for coding and encryption, if these processes are combined, the security and reliability of communication system are guaranteed. In this paper a secure decoding algorithm in parallel on General-Purpose Graphics Processing Units (GPGPU) is proposed. This is the first prototype of a fast and parallel Joint Channel-Security Coding (JCSC) system. Despite of encryption process, this algorithm maintains desired BER and increases decoding speed. We considered several techniques for parallelism: (1) distribute decoding load of a code word between multiple cores, (2) simultaneous decoding of several code words, (3) using protection techniques to prevent performance degradation. We also propose two kinds of optimizations to increase the decoding speed: (1) memory access improvement, (2) the use of new GPU properties such as concurrent kernel execution and advanced atomics to compensate buffering latency.