Visible to the public Biblio

Filters: Keyword is high-performance  [Clear All Filters]
2022-02-22
Tan, Qinyun, Xiao, Kun, He, Wen, Lei, Pinyuan, Chen, Lirong.  2021.  A Global Dynamic Load Balancing Mechanism with Low Latency for Micokernel Operating System. 2021 7th International Symposium on System and Software Reliability (ISSSR). :178—187.
As Internet of Things(IOT) devices become intelli-gent, more powerful computing capability is required. Multi-core processors are widely used in IoT devices because they provide more powerful computing capability while ensuring low power consumption. Therefore, it requires the operating system on IoT devices to support and optimize the scheduling algorithm for multi-core processors. Nowadays, microkernel-based operating systems, such as QNX Neutrino RTOS and HUAWEI Harmony OS, are widely used in IoT devices because of their real-time and security feature. However, research on multi-core scheduling for microkernel operating systems is relatively limited, especially for load balancing mechanisms. Related research is still mainly focused on the traditional monolithic operating systems, such as Linux. Therefore, this paper proposes a low-latency, high- performance, and high real-time centralized global dynamic multi-core load balancing method for the microkernel operating system. It has been implemented and tested on our own microkernel operating system named Mginkgo. The test results show that when there is load imbalance in the system, load balancing can be performed automatically so that all processors in the system can try to achieve the maximum throughput and resource utilization. And the latency brought by load balancing to the system is very low, about 4882 cycles (about 6.164us) triggered by new task creation and about 6596 cycles (about 8.328us) triggered by timing. In addition, we also tested the improvement of system throughput and CPU utilization. The results show that load balancing can improve the CPU utilization by 20% under the preset case, while the CPU utilization occupied by load balancing is negligibly low, about 0.0082%.
2020-06-26
M, Raviraja Holla, D, Suma.  2019.  Memory Efficient High-Performance Rotational Image Encryption. 2019 International Conference on Communication and Electronics Systems (ICCES). :60—64.

Image encryption is an essential part of a Visual Cryptography. Existing traditional sequential encryption techniques are infeasible to real-time applications. High-performance reformulations of such methods are increasingly growing over the last decade. These reformulations proved better performances over their sequential counterparts. A rotational encryption scheme encrypts the images in such a way that the decryption is possible with the rotated encrypted images. A parallel rotational encryption technique makes use of a high-performance device. But it less-leverages the optimizations offered by them. We propose a rotational image encryption technique which makes use of memory coalescing provided by the Compute Unified Device Architecture (CUDA). The proposed scheme achieves improved global memory utilization and increased efficiency.

2019-12-05
Leißa, Roland, Boesche, Klaas, Hack, Sebastian, Pérard-Gayot, Arsène, Membarth, Richard, Slusallek, Philipp, Müller, André, Schmidt, Bertil.  2018.  AnyDSL: A Partial Evaluation Framework for Programming High-Performance Libraries. Proc. ACM Program. Lang.. 2:119:1-119:30.

This paper advocates programming high-performance code using partial evaluation. We present a clean-slate programming system with a simple, annotation-based, online partial evaluator that operates on a CPS-style intermediate representation. Our system exposes code generation for accelerators (vectorization/parallelization for CPUs and GPUs) via compiler-known higher-order functions that can be subjected to partial evaluation. This way, generic implementations can be instantiated with target-specific code at compile time. In our experimental evaluation we present three extensive case studies from image processing, ray tracing, and genome sequence alignment. We demonstrate that using partial evaluation, we obtain high-performance implementations for CPUs and GPUs from one language and one code base in a generic way. The performance of our codes is mostly within 10%, often closer to the performance of multi man-year, industry-grade, manually-optimized expert codes that are considered to be among the top contenders in their fields.