Biblio
Image encryption is an essential part of visual cryptography. Existing traditional sequential encryption techniques are infeasible for real-time applications, and high-performance reformulations of such methods have grown increasingly common over the last decade, demonstrating better performance than their sequential counterparts. A rotational encryption scheme encrypts images in such a way that decryption is possible from the rotated encrypted images. A parallel rotational encryption technique makes use of a high-performance device but under-utilizes the optimizations such a device offers. We propose a rotational image encryption technique that makes use of the memory coalescing provided by the Compute Unified Device Architecture (CUDA). The proposed scheme achieves improved global memory utilization and increased efficiency.
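A minimal sketch of the coalescing idea this abstract describes, not the authors' implementation: the XOR keystream, the 90-degree rotation mapping, and all names are illustrative assumptions. The kernel stages a tile of the image in shared memory so that both the row-major loads and the rotated stores are issued by consecutive threads at consecutive global-memory addresses, i.e. coalesced.

```cuda
// Hypothetical sketch: tiled rotate-and-encrypt with coalesced global-memory
// access. Not the paper's algorithm; the keystream XOR stands in for
// whatever cipher the scheme actually uses.
#include <cuda_runtime.h>
#include <cstdio>

#define TILE 32

__global__ void rotateEncrypt(const unsigned char *in, unsigned char *out,
                              const unsigned char *ks, int width, int height)
{
    // +1 column of padding avoids shared-memory bank conflicts.
    __shared__ unsigned char tile[TILE][TILE + 1];

    // Coalesced load: consecutive threadIdx.x values read consecutive bytes.
    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < width && y < height)
        tile[threadIdx.y][threadIdx.x] = in[y * width + x] ^ ks[y * width + x];
    __syncthreads();

    // A 90-degree clockwise rotation sends input pixel (x, y) to
    // (height - 1 - y, x) in an output image of width `height`. Swapping
    // the roles of threadIdx.x/y below makes consecutive threads write
    // consecutive output bytes, so the store coalesces as well.
    int ix = blockIdx.x * TILE + threadIdx.y;   // input column
    int iy = blockIdx.y * TILE + threadIdx.x;   // input row
    if (ix < width && iy < height)
        out[ix * height + (height - 1 - iy)] = tile[threadIdx.x][threadIdx.y];
}

int main()
{
    const int W = 512, H = 512;
    unsigned char *in, *out, *ks;
    cudaMallocManaged(&in,  W * H);
    cudaMallocManaged(&out, W * H);
    cudaMallocManaged(&ks,  W * H);
    for (int i = 0; i < W * H; ++i) { in[i] = i & 0xFF; ks[i] = (7 * i + 3) & 0xFF; }

    dim3 block(TILE, TILE);
    dim3 grid((W + TILE - 1) / TILE, (H + TILE - 1) / TILE);
    rotateEncrypt<<<grid, block>>>(in, out, ks, W, H);
    cudaDeviceSynchronize();
    printf("ciphertext[0] = %02x\n", out[0]);

    cudaFree(in); cudaFree(out); cudaFree(ks);
    return 0;
}
```

Without the shared-memory tile, either the read or the rotated write would be strided and each warp access would split into many memory transactions; staging through the tile is the standard way to keep both sides coalesced.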
Nowadays data security is becoming more and more important, as people store a great deal of personal information on their phones; this stored information must be kept secure and private. Encryption algorithms have become the main means of maintaining data security, so algorithmic complexity and encryption efficiency have become the main measures by which an encryption algorithm is judged. With the development of hardware, many tools are now available for improving these algorithms. Because the modular exponentiation in the RSA algorithm can be divided into several parts mathematically, this paper introduces an approach that partitions the encryption process and moves the resulting model onto the graphics processing unit (GPU). By exploiting the GPU's capacity for parallel computing, the core of RSA can be accelerated using the central processing unit (CPU) and GPU together. Compute Unified Device Architecture (CUDA) is a platform that combines the CPU and GPU to realize GPU parallel programming, and it is the tool we use in our experiments on accelerating the RSA algorithm. The paper also builds a mathematical model to help explain the mechanism of the RSA encryption algorithm.
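One way to illustrate the mathematical split the abstract alludes to is the binary decomposition m^e mod n = prod_i (m^(2^i))^(e_i) mod n, where e_i is bit i of the exponent: each factor can be computed by a separate thread and the factors combined in a logarithmic tree reduction. The sketch below is a toy under stated assumptions, not necessarily the paper's decomposition: it assumes 32-bit moduli so a 64-bit intermediate suffices, whereas real RSA needs multi-thousand-bit arithmetic and a big-number library.

```cuda
// Toy sketch of splitting modular exponentiation across GPU threads.
// Assumption: 32-bit moduli only (real RSA moduli are 2048+ bits).
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdint>

__host__ __device__ uint32_t mulmod(uint32_t a, uint32_t b, uint32_t n)
{
    return (uint32_t)((uint64_t)a * b % n);  // 64-bit intermediate avoids overflow
}

// One thread per exponent bit: thread i computes m^(2^i) mod n by i
// squarings, contributes it only if bit i of e is set, then a log-step
// tree reduction multiplies the 32 partial factors modulo n.
__global__ void modexp32(uint32_t m, uint32_t e, uint32_t n, uint32_t *out)
{
    __shared__ uint32_t part[32];
    int i = threadIdx.x;

    uint32_t p = m % n;
    for (int s = 0; s < i; ++s) p = mulmod(p, p, n);
    part[i] = ((e >> i) & 1u) ? p : 1u;
    __syncthreads();

    for (int stride = 16; stride > 0; stride >>= 1) {
        if (i < stride) part[i] = mulmod(part[i], part[i + stride], n);
        __syncthreads();
    }
    if (i == 0) *out = part[0];
}

int main()
{
    // Textbook toy key: p = 61, q = 53, n = 3233, e = 17, message m = 65.
    uint32_t n = 3233, e = 17, m = 65, *d_c;
    cudaMallocManaged(&d_c, sizeof(uint32_t));

    modexp32<<<1, 32>>>(m, e, n, d_c);
    cudaDeviceSynchronize();

    // Serial square-and-multiply reference on the host.
    uint32_t ref = 1, base = m % n;
    for (uint32_t ee = e; ee; ee >>= 1) {
        if (ee & 1u) ref = mulmod(ref, base, n);
        base = mulmod(base, base, n);
    }
    printf("GPU: %u, CPU reference: %u\n", *d_c, ref);  // both print 2790
    cudaFree(d_c);
    return 0;
}
```

Note the deliberate inefficiency: thread i redundantly performs i squarings rather than sharing intermediate squares, so total work is quadratic in the bit length; a work-efficient variant would precompute the squared powers once and only parallelize the reduction.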
To solve the problems associated with real-time processing of large data volumes, heterogeneous systems using various computing devices are increasingly used. A characteristic of this class of problems is that there are two directions for improving real-time data analysis: the development of algorithms and analysis approaches, and the development of hardware and software. This article reviews the main approaches to the architecture of a hardware-software solution for traffic capture and deep packet inspection (DPI) in data transmission networks with a bandwidth of 80 Gbit/s and higher. At present there are software and hardware tools that allow designing the architecture of a capture and deep packet inspection system: 1) using only the central processing unit (CPU); 2) using only the graphics processing unit (GPU); 3) using the central processing unit and graphics processing unit together (CPU + GPU). This paper considers these key approaches, paying attention to both the hardware and software requirements of each architecture; pain points and remedies are described.
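As a concrete illustration of the GPU and CPU + GPU approaches, the hypothetical kernel below scans a CPU-captured batch of packets for a byte signature, one thread block per packet. It is a naive sketch, assuming fixed-size packet slots and a single pattern; production DPI engines use DFA or Aho-Corasick multi-pattern matching and zero-copy capture paths, none of which is shown.

```cuda
// Naive sketch of GPU-side payload scanning for a CPU+GPU DPI pipeline.
// All sizes, names, and the single-signature match are assumptions.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstring>

#define MAX_PKT 1536   // fixed slot per packet (illustrative)
#define N_PKTS  4

__constant__ unsigned char d_sig[16];   // signature kept in constant memory

__global__ void scanPackets(const unsigned char *pkts, const int *lens,
                            int sigLen, int *hitOffset)
{
    int p = blockIdx.x;                              // one block per packet
    const unsigned char *pl = pkts + (size_t)p * MAX_PKT;

    // Threads stride across candidate match offsets within the payload.
    for (int off = threadIdx.x; off + sigLen <= lens[p]; off += blockDim.x) {
        bool match = true;
        for (int k = 0; k < sigLen; ++k)
            if (pl[off + k] != d_sig[k]) { match = false; break; }
        if (match) atomicExch(&hitOffset[p], off);   // record a match offset
    }
}

int main()
{
    const char *sig = "GET /";
    int sigLen = (int)strlen(sig);
    cudaMemcpyToSymbol(d_sig, sig, sigLen);

    unsigned char *pkts; int *lens, *hits;
    cudaMallocManaged(&pkts, N_PKTS * MAX_PKT);
    cudaMallocManaged(&lens, N_PKTS * sizeof(int));
    cudaMallocManaged(&hits, N_PKTS * sizeof(int));

    for (int p = 0; p < N_PKTS; ++p) {
        memset(pkts + p * MAX_PKT, 0, MAX_PKT);
        lens[p] = 64;
        hits[p] = -1;                                 // -1 means "no match"
    }
    memcpy(pkts + 1 * MAX_PKT + 10, sig, sigLen);     // plant a hit in packet 1

    scanPackets<<<N_PKTS, 128>>>(pkts, lens, sigLen, hits);
    cudaDeviceSynchronize();
    for (int p = 0; p < N_PKTS; ++p)
        printf("packet %d: match offset %d\n", p, hits[p]);

    cudaFree(pkts); cudaFree(lens); cudaFree(hits);
    return 0;
}
```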
Turbo codes have been an important subject in coding theory since 1993. They achieve a low Bit Error Rate (BER), but decoding complexity and delay are major challenges. On the other hand, considering the complexity and delay of separate blocks for coding and encryption, combining these processes guarantees both the security and the reliability of the communication system. In this paper a secure parallel decoding algorithm on General-Purpose Graphics Processing Units (GPGPUs) is proposed. This is the first prototype of a fast and parallel Joint Channel-Security Coding (JCSC) system. Despite the encryption process, the algorithm maintains the desired BER and increases decoding speed. We consider several techniques for parallelism: (1) distributing the decoding load of a codeword across multiple cores, (2) decoding several codewords simultaneously, and (3) using protection techniques to prevent performance degradation. We also propose two kinds of optimization to increase decoding speed: (1) improved memory access, and (2) the use of newer GPU features such as concurrent kernel execution and advanced atomics to compensate for buffering latency.
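The sketch below shows only the orchestration side of techniques (2) in both lists, with the decoder itself stubbed out: one thread block per codeword gives simultaneous decoding of many codewords, and multiple CUDA streams let the host-to-device buffering of one batch overlap the kernel of another, the latency-hiding role the abstract assigns to concurrent kernel execution. Every name and size here is an assumption; a real max-log-MAP/BCJR decoder would replace the stub kernel.

```cuda
// Orchestration sketch only: the SISO decoder kernel is a stub, and all
// sizes and names are illustrative, not the paper's implementation.
#include <cuda_runtime.h>
#include <cstdio>

#define CW_LEN 1024   // soft values (LLRs) per codeword, illustrative

// Stub for one SISO half-iteration: a real decoder would run the
// forward/backward trellis recursions here. One block per codeword.
__global__ void sisoDecode(const float *llrIn, float *llrOut)
{
    int w = blockIdx.x;
    for (int i = threadIdx.x; i < CW_LEN; i += blockDim.x)
        llrOut[w * CW_LEN + i] = llrIn[w * CW_LEN + i];
}

int main()
{
    const int nWords = 64, nStreams = 4, perStream = nWords / nStreams;
    size_t bytes = (size_t)nWords * CW_LEN * sizeof(float);

    float *hIn, *dIn, *dOut;
    cudaMallocHost(&hIn, bytes);   // pinned host memory enables async copies
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    for (int i = 0; i < nWords * CW_LEN; ++i) hIn[i] = 0.5f;

    cudaStream_t streams[nStreams];
    for (int k = 0; k < nStreams; ++k) cudaStreamCreate(&streams[k]);

    for (int k = 0; k < nStreams; ++k) {
        size_t off   = (size_t)k * perStream * CW_LEN;
        size_t chunk = (size_t)perStream * CW_LEN * sizeof(float);
        // The copy in stream k can overlap the kernel running in stream k-1,
        // hiding buffering latency behind decoding work.
        cudaMemcpyAsync(dIn + off, hIn + off, chunk,
                        cudaMemcpyHostToDevice, streams[k]);
        sisoDecode<<<perStream, 256, 0, streams[k]>>>(dIn + off, dOut + off);
    }
    cudaDeviceSynchronize();
    printf("decoded %d codewords across %d streams\n", nWords, nStreams);

    for (int k = 0; k < nStreams; ++k) cudaStreamDestroy(streams[k]);
    cudaFreeHost(hIn); cudaFree(dIn); cudaFree(dOut);
    return 0;
}
```

Technique (1), splitting one codeword's decoding load across cores, would additionally partition the trellis of a single codeword among the threads of a block, which the stub above does not attempt.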