Visible to the public Biblio

Filters: Keyword is parallelism  [Clear All Filters]
2023-03-03
Yang, Gangqiang, Shi, Zhengyuan, Chen, Cheng, Xiong, Hailiang, Hu, Honggang, Wan, Zhiguo, Gai, Keke, Qiu, Meikang.  2022.  Work-in-Progress: Towards a Smaller than Grain Stream Cipher: Optimized FPGA Implementations of Fruit-80. 2022 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES). :19–20.
Fruit-80, an ultra-lightweight stream cipher with 80-bit secret key, is oriented toward resource constrained devices in the Internet of Things. In this paper, we propose area and speed optimization architectures of Fruit-80 on FPGAs. The area optimization architecture reuses NFSR&LFSR feedback functions and achieves the most suitable ratio of look-up-tables and flip-flops. The speed optimization architecture adopts a hybrid approach for parallelization and reduces the latency of long data paths by pre-generating primary feedback and inserting flip-flops. In conclusion, the optimal throughput-to-area ratio of the speed optimization architecture is better than that of Grain v1. The area optimization architecture occupies only 35 slices on Xilinx Spartan-3 FPGA, smaller than that of Grain and other common stream ciphers. To the best of our knowledge, this result sets a new record of the minimum area in lightweight cipher implementations on FPGA.
2022-05-19
Zhang, Feng, Pan, Zaifeng, Zhou, Yanliang, Zhai, Jidong, Shen, Xipeng, Mutlu, Onur, Du, Xiaoyong.  2021.  G-TADOC: Enabling Efficient GPU-Based Text Analytics without Decompression. 2021 IEEE 37th International Conference on Data Engineering (ICDE). :1679–1690.
Text analytics directly on compression (TADOC) has proven to be a promising technology for big data analytics. GPUs are extremely popular accelerators for data analytics systems. Unfortunately, no work so far shows how to utilize GPUs to accelerate TADOC. We describe G-TADOC, the first framework that provides GPU-based text analytics directly on compression, effectively enabling efficient text analytics on GPUs without decompressing the input data. G-TADOC solves three major challenges. First, TADOC involves a large amount of dependencies, which makes it difficult to exploit massive parallelism on a GPU. We develop a novel fine-grained thread-level workload scheduling strategy for GPU threads, which partitions heavily-dependent loads adaptively in a fine-grained manner. Second, in developing G-TADOC, thousands of GPU threads writing to the same result buffer leads to inconsistency while directly using locks and atomic operations lead to large synchronization overheads. We develop a memory pool with thread-safe data structures on GPUs to handle such difficulties. Third, maintaining the sequence information among words is essential for lossless compression. We design a sequence-support strategy, which maintains high GPU parallelism while ensuring sequence information. Our experimental evaluations show that G-TADOC provides 31.1× average speedup compared to state-of-the-art TADOC.
2020-03-04
Sadkhan, Sattar B., Yaseen, Basim S..  2019.  Hybrid Method to Implement a Parallel Search of the Cryptosystem Keys. 2019 International Conference on Advanced Science and Engineering (ICOASE). :204–207.

The current paper proposes a method to combine the theoretical concepts of the parallel processing created by the DNA computing and GA environments, with the effectiveness novel mechanism of the distinction and discover of the cryptosystem keys. Three-level contributions to the current work, the first is the adoption of a final key sequence mechanism by the principle of interconnected sequence parts, the second to exploit the principle of the parallel that provides GA in the search for the counter value of the sequences of the challenge to the mechanism of the discrimination, the third, the most important and broadening the breaking of the cipher, is the harmony of the principle of the parallelism that has found via the DNA computing to discover the basic encryption key. The proposed method constructs a combined set of files includes binary sequences produced from substitution of the guess attributes of the binary equations system of the cryptosystem, as well as generating files that include all the prospects of the DNA strands for all successive cipher characters, the way to process these files to be obtained from the first character file, where extract a key sequence of each sequence from mentioned file and processed with the binary sequences that mentioned the counter produced from GA. The aim of the paper is exploitation and implementation the theoretical principles of the parallelism that providing via biological environment with the new sequences recognition mechanism in the cryptanalysis.

2018-02-21
Zhou, G., Feng, Y., Bo, R., Chien, L., Zhang, X., Lang, Y., Jia, Y., Chen, Z..  2017.  GPU-Accelerated Batch-ACPF Solution for N-1 Static Security Analysis. IEEE Transactions on Smart Grid. 8:1406–1416.

Graphics processing unit (GPU) has been applied successfully in many scientific computing realms due to its superior performances on float-pointing calculation and memory bandwidth, and has great potential in power system applications. The N-1 static security analysis (SSA) appears to be a candidate application in which massive alternating current power flow (ACPF) problems need to be solved. However, when applying existing GPU-accelerated algorithms to solve N-1 SSA problem, the degree of parallelism is limited because existing researches have been devoted to accelerating the solution of a single ACPF. This paper therefore proposes a GPU-accelerated solution that creates an additional layer of parallelism among batch ACPFs and consequently achieves a much higher level of overall parallelism. First, this paper establishes two basic principles for determining well-designed GPU algorithms, through which the limitation of GPU-accelerated sequential-ACPF solution is demonstrated. Next, being the first of its kind, this paper proposes a novel GPU-accelerated batch-QR solver, which packages massive number of QR tasks to formulate a new larger-scale problem and then achieves higher level of parallelism and better coalesced memory accesses. To further improve the efficiency of solving SSA, a GPU-accelerated batch-Jacobian-Matrix generating and contingency screening is developed and carefully optimized. Lastly, the complete process of the proposed GPU-accelerated batch-ACPF solution for SSA is presented. Case studies on an 8503-bus system show dramatic computation time reduction is achieved compared with all reported existing GPU-accelerated methods. In comparison to UMFPACK-library-based single-CPU counterpart using Intel Xeon E5-2620, the proposed GPU-accelerated SSA framework using NVIDIA K20C achieves up to 57.6 times speedup. It can even achieve four times speedup when compared to one of the fastest multi-core CPU parallel computing solution using KLU library. The prop- sed batch-solving method is practically very promising and lays a critical foundation for many other power system applications that need to deal with massive subtasks, such as Monte-Carlo simulation and probabilistic power flow.

2017-06-27
Jordan, Michael I..  2016.  On Computational Thinking, Inferential Thinking and Data Science. Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures. :47–47.

The rapid growth in the size and scope of datasets in science and technology has created a need for novel foundational perspectives on data analysis that blend the inferential and computational sciences. That classical perspectives from these fields are not adequate to address emerging problems in "Big Data" is apparent from their sharply divergent nature at an elementary level-in computer science, the growth of the number of data points is a source of "complexity" that must be tamed via algorithms or hardware, whereas in statistics, the growth of the number of data points is a source of "simplicity" in that inferences are generally stronger and asymptotic results can be invoked. On a formal level, the gap is made evident by the lack of a role for computational concepts such as "runtime" in core statistical theory and the lack of a role for statistical concepts such as "risk" in core computational theory. I present several research vignettes aimed at bridging computation and statistics, including the problem of inference under privacy and communication constraints, and ways to exploit parallelism so as to trade off the speed and accuracy of inference.

2017-05-22
Anderson, Brian, Bergstrom, Lars, Goregaokar, Manish, Matthews, Josh, McAllister, Keegan, Moffitt, Jack, Sapin, Simon.  2016.  Engineering the Servo Web Browser Engine Using Rust. Proceedings of the 38th International Conference on Software Engineering Companion. :81–89.

All modern web browsers –- Internet Explorer, Firefox, Chrome, Opera, and Safari –- have a core rendering engine written in C++. This language choice was made because it affords the systems programmer complete control of the underlying hardware features and memory in use, and it provides a transparent compilation model. Unfortunately, this language is complex (especially to new contributors!), challenging to write correct parallel code in, and highly susceptible to memory safety issues that potentially lead to security holes. Servo is a project started at Mozilla Research to build a new web browser engine that preserves the capabilities of these other browser engines but also both takes advantage of the recent trends in parallel hardware and is more memory-safe. We use a new language, Rust, that provides us a similar level of control of the underlying system to C++ but which statically prevents many memory safety issues and provides direct support for parallelism and concurrency. In this paper, we show how a language with an advanced type system can address many of the most common security issues and software engineering challenges in other browser engines, while still producing code that has the same performance and memory profile. This language is also quite accessible to new open source contributors and employees, even those without a background in C++ or systems programming. We also outline several pitfalls encountered along the way and describe some potential areas for future improvement.

2017-05-16
Fu, Zhe, Liu, Zhi, Li, Jun.  2016.  ParaRegex: Towards Fast Regular Expression Matching in Parallel. Proceedings of the 2016 Symposium on Architectures for Networking and Communications Systems. :113–114.

In this paper, we propose ParaRegex, a novel approach for fast parallel regular expression matching. ParaRegex is a framework that implements data-parallel regular expression matching for deterministic finite automaton based methods. Experimental evaluation shows that ParaRegex produces a fast matching engine with speeds of up to 6 times compared to sequential implementations on a commodity 8-thread workstation.

2017-02-14
A. Motamedi, M. Najafi, N. Erami.  2015.  "Parallel secure turbo code for security enhancement in physical layer". 2015 Signal Processing and Intelligent Systems Conference (SPIS). :179-184.

Turbo code has been one of the important subjects in coding theory since 1993. This code has low Bit Error Rate (BER) but decoding complexity and delay are big challenges. On the other hand, considering the complexity and delay of separate blocks for coding and encryption, if these processes are combined, the security and reliability of communication system are guaranteed. In this paper a secure decoding algorithm in parallel on General-Purpose Graphics Processing Units (GPGPU) is proposed. This is the first prototype of a fast and parallel Joint Channel-Security Coding (JCSC) system. Despite of encryption process, this algorithm maintains desired BER and increases decoding speed. We considered several techniques for parallelism: (1) distribute decoding load of a code word between multiple cores, (2) simultaneous decoding of several code words, (3) using protection techniques to prevent performance degradation. We also propose two kinds of optimizations to increase the decoding speed: (1) memory access improvement, (2) the use of new GPU properties such as concurrent kernel execution and advanced atomics to compensate buffering latency.