Visible to the public Biblio

Filters: Keyword is Performance optimization  [Clear All Filters]
2020-11-02
Shen, Hanji, Long, Chun, Li, Jun, Wan, Wei, Song, Xiaofan.  2018.  A Method for Performance Optimization of Virtual Network I/O Based on DPDK-SRIOV*. 2018 IEEE International Conference on Information and Automation (ICIA). :1550—1554.
Network security testing devices play important roles in Cyber security. Most of the current network security testing devices are based on proprietary hardware, however, the virtual network security tester needs high network I/O throughput performance. Therefore, the solution of the problem, which provides high-performance network I/O in the virtual scene will be explained in this paper. The method we proposed for virtualized network I/O performance optimization on a general hardware platform is able to achieve the I/O throughput performance of the proprietary hardware. The Single Root I/O Virtualization (SRIOV) of the physical network card is divided into a plurality of virtual network function of VF, furthermore, it can be added to different VF and VM. Extensive experiment illustrated that the virtualization and the physical network card sharing based on hardware are realized, and they can be used by Data Plane Development Kit (DPDK) and SRIOV technology. Consequently, the test instrument applications in virtual machines achieves the rate of 10Gps and meet the I/O requirement.
2020-05-15
Fan, Renshi, Du, Gaoming, Xu, Pengfei, Li, Zhenmin, Song, Yukun, Zhang, Duoli.  2019.  An Adaptive Routing Scheme Based on Q-learning and Real-time Traffic Monitoring for Network-on-Chip. 2019 IEEE 13th International Conference on Anti-counterfeiting, Security, and Identification (ASID). :244—248.
In the Network on Chip (NoC), performance optimization has always been a research focus. Compared with the static routing scheme, dynamical routing schemes can better reduce the data of packet transmission latency under network congestion. In this paper, we propose a dynamical Q-learning routing approach with real-time monitoring of NoC. Firstly, we design a real-time monitoring scheme and the corresponding circuits to record the status of traffic congestion for NoC. Secondly, we propose a novel method of Q-learning. This method finds an optimal path based on the lowest traffic congestion. Finally, we dynamically redistribute network tasks to increase the packet transmission speed and balance the traffic load. Compared with the C-XY routing and DyXY routing, our method achieved improvement in terms of 25.6%-49.5% and 22.9%-43.8%.
2019-01-16
Hasslinger, G., Ntougias, K., Hasslinger, F., Hohlfeld, O..  2018.  Comparing Web Cache Implementations for Fast O(1) Updates Based on LRU, LFU and Score Gated Strategies. 2018 IEEE 23rd International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD). :1–7.
To be applicable to high user request workloads, web caching strategies benefit from low implementation and update effort. In this regard, the Least Recently Used (LRU) replacement principle is a simple and widely-used method. Despite its popularity, LRU has deficits in the achieved hit rate performance and cannot consider transport and network optimization criteria for selecting content to be cached. As a result, many alternatives have been proposed in the literature, which improve the cache performance at the cost of higher complexity. In this work, we evaluate the implementation complexity and runtime performance of LRU, Least Frequently Used (LFU), and score based strategies in the class of fast O(1) updates with constant effort per request. We implement Window LFU (W-LFU) within this class and show that O(1) update effort can be achieved. We further compare fast update schemes of Score Gated LRU and new Score Gated Polling (SGP). SGP is simpler than LRU and provides full flexibility for arbitrary score assessment per data object as information basis for performance optimization regarding network cost and quality measures.
2018-02-21
Liu, M., Yan, Y. J., Li, W..  2017.  Implementation and optimization of A5-1 algorithm on coarse-grained reconfigurable cryptographic logic array. 2017 IEEE 12th International Conference on ASIC (ASICON). :279–282.

A5-1 algorithm is a stream cipher used to encrypt voice data in GSM, which needs to be realized with high performance due to real-time requirements. Traditional implementation on FPGA or ASIC can't obtain a trade-off among performance, cost and flexibility. To this aim, this paper introduces CGRCA to implement A5-1, and in order to optimize the performance and resource consumption, this paper proposes a resource-based path seeking (RPS) algorithm to develop an advanced implementation. Experimental results show that final optimal throughput of A5-1 implemented on CGRCA is 162.87Mbps when the frequency is 162.87MHz, and the set-up time is merely 87 cycles, which is optimal among similar works.

2017-05-16
Anh, Pham Nguyen Quang, Fan, Rui, Wen, Yonggang.  2016.  Balanced Hashing and Efficient GPU Sparse General Matrix-Matrix Multiplication. Proceedings of the 2016 International Conference on Supercomputing. :36:1–36:12.

General sparse matrix-matrix multiplication (SpGEMM) is a core component of many algorithms. A number of recent works have used high throughput graphics processing units (GPUs) to accelerate SpGEMM. However, exploiting the power of GPUs for SpGEMM requires addressing a number of challenges, including highly imbalanced workloads and large numbers of inefficient random global memory accesses. This paper presents a SpGEMM algorithm which uses several novel techniques to overcome these problems. We first propose two low cost methods to achieve perfect load balancing during the most expensive step in SpGEMM. Next, we show how to eliminate nearly all random global memory accesses using shared memory based hash tables. To optimize the performance of the hash tables, we propose a lightweight method to estimate the number of nonzeros in the output matrix. We compared our algorithm to the CUSP, CUSPARSE and the state-of-the-art BHSPARSE GPU SpGEMM algorithms, and show that it performs 5.6x, 2.4x and 1.5x better on average, and up to 11.8x, 9.5x and 2.5x better in the best case, respectively. Furthermore, we show that our algorithm performs especially well on highly imbalanced and unstructured matrices.