Visible to the public Biblio

Filters: Keyword is hardware acceleration  [Clear All Filters]
2023-06-22
Cheng, Xin, Wang, Mei-Qi, Shi, Yu-Bo, Lin, Jun, Wang, Zhong-Feng.  2022.  Magical-Decomposition: Winning Both Adversarial Robustness and Efficiency on Hardware. 2022 International Conference on Machine Learning and Cybernetics (ICMLC). :61–66.
Model compression is one of the most preferred techniques for efficiently deploying deep neural networks (DNNs) on resource- constrained Internet of Things (IoT) platforms. However, the simply compressed model is often vulnerable to adversarial attacks, leading to a conflict between robustness and efficiency, especially for IoT devices exposed to complex real-world scenarios. We, for the first time, address this problem by developing a novel framework dubbed Magical-Decomposition to simultaneously enhance both robustness and efficiency for hardware. By leveraging a hardware-friendly model compression method called singular value decomposition, the defending algorithm can be supported by most of the existing DNN hardware accelerators. To step further, by using a recently developed DNN interpretation tool, the underlying scheme of how the adversarial accuracy can be increased in the compressed model is highlighted clearly. Ablation studies and extensive experiments under various attacks/models/datasets consistently validate the effectiveness and scalability of the proposed framework.
ISSN: 2160-1348
2020-06-12
Grochol, David, Sekanina, Lukas.  2018.  Fast Reconfigurable Hash Functions for Network Flow Hashing in FPGAs. 2018 NASA/ESA Conference on Adaptive Hardware and Systems (AHS). :257—263.

Efficient monitoring of high speed computer networks operating with a 100 Gigabit per second (Gbps) data throughput requires a suitable hardware acceleration of its key components. We present a platform capable of automated designing of hash functions suitable for network flow hashing. The platform employs a multi-objective linear genetic programming developed for the hash function design. We evolved high-quality hash functions and implemented them in a field programmable gate array (FPGA). Several evolved hash functions were combined together in order to form a new reconfigurable hash function. The proposed reconfigurable design significantly reduces the area on a chip while the maximum operation frequency remains very close to the fastest hash functions. Properties of evolved hash functions were compared with the state-of-the-art hash functions in terms of the quality of hashing, area and operation frequency in the FPGA.

2020-05-22
Abdelhadi, Ameer M.S., Bouganis, Christos-Savvas, Constantinides, George A..  2019.  Accelerated Approximate Nearest Neighbors Search Through Hierarchical Product Quantization. 2019 International Conference on Field-Programmable Technology (ICFPT). :90—98.
A fundamental recurring task in many machine learning applications is the search for the Nearest Neighbor in high dimensional metric spaces. Towards answering queries in large scale problems, state-of-the-art methods employ Approximate Nearest Neighbors (ANN) search, a search that returns the nearest neighbor with high probability, as well as techniques that compress the dataset. Product-Quantization (PQ) based ANN search methods have demonstrated state-of-the-art performance in several problems, including classification, regression and information retrieval. The dataset is encoded into a Cartesian product of multiple low-dimensional codebooks, enabling faster search and higher compression. Being intrinsically parallel, PQ-based ANN search approaches are amendable for hardware acceleration. This paper proposes a novel Hierarchical PQ (HPQ) based ANN search method as well as an FPGA-tailored architecture for its implementation that outperforms current state of the art systems. HPQ gradually refines the search space, reducing the number of data compares and enabling a pipelined search. The mapping of the architecture on a Stratix 10 FPGA device demonstrates over ×250 speedups over current state-of-the-art systems, opening the space for addressing larger datasets and/or improving the query times of current systems.
2020-02-10
Zhang, Junjie, Sun, Tianfu.  2019.  Multi-core Heterogeneous Video Processing System Design. 2019 IEEE 13th International Conference on Anti-counterfeiting, Security, and Identification (ASID). :178–182.
In order to accelerate the image processing speed, in this paper, a multi-core heterogeneous computing technology based on the Xilinx Zynq platform is proposed. The proposed technique could accelerate the real-time video image processing system through hardware acceleration. In order to verify the proposed technique, an Otsu binarized hardware-accelerated IP is designed in FPGA and interacts with ARM through the AXI bus. Compared with the existing homogeneous architecture processor computing, the image processing speed of the proposed technique with multi-core heterogeneous acceleration processing is significantly accelerated.
2018-05-01
Cowart, R., Coe, D., Kulick, J., Milenković, A..  2017.  An Implementation and Experimental Evaluation of Hardware Accelerated Ciphers in All-Programmable SoCs. Proceedings of the SouthEast Conference. :34–41.
The protection of confidential information has become very important with the increase of data sharing and storage on public domains. Data confidentiality is accomplished through the use of ciphers that encrypt and decrypt the data to impede unauthorized access. Emerging heterogeneous platforms provide an ideal environment to use hardware acceleration to improve application performance. In this paper, we explore the performance benefits of an AES hardware accelerator versus the software implementation for multiple cipher modes on the Zynq 7000 All-Programmable System-on-a-Chip (SoC). The accelerator is implemented on the FPGA fabric of the SoC and utilizes DMA for interfacing to the CPU. File encryption and decryption of varying file sizes are used as the workload, with execution time and throughput as the metrics for comparing the performance of the hardware and software implementations. The performance evaluations show that the accelerated AES operations achieve a speedup of 7 times relative to its software implementation and throughput upwards of 350 MB/s for the counter cipher mode, and modest improvements for other cipher modes.
2018-04-30
Cowart, R., Coe, D., Kulick, J., Milenković, A..  2017.  An Implementation and Experimental Evaluation of Hardware Accelerated Ciphers in All-Programmable SoCs. Proceedings of the SouthEast Conference. :34–41.

The protection of confidential information has become very important with the increase of data sharing and storage on public domains. Data confidentiality is accomplished through the use of ciphers that encrypt and decrypt the data to impede unauthorized access. Emerging heterogeneous platforms provide an ideal environment to use hardware acceleration to improve application performance. In this paper, we explore the performance benefits of an AES hardware accelerator versus the software implementation for multiple cipher modes on the Zynq 7000 All-Programmable System-on-a-Chip (SoC). The accelerator is implemented on the FPGA fabric of the SoC and utilizes DMA for interfacing to the CPU. File encryption and decryption of varying file sizes are used as the workload, with execution time and throughput as the metrics for comparing the performance of the hardware and software implementations. The performance evaluations show that the accelerated AES operations achieve a speedup of 7 times relative to its software implementation and throughput upwards of 350 MB/s for the counter cipher mode, and modest improvements for other cipher modes.

2017-10-18
Ahmad, Abdul Mutaal, Lukowicz, Paul, Cheng, Jingyuan.  2016.  FPGA Based Hardware Acceleration of Sensor Matrix. Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct. :793–802.
This paper describes the hardware acceleration of various feature calculation functions used in activity recognition. In this work we have used a large scale sensing matrix which recognizes and counts gym exercises. Human activity is played on pressure matrix and the sensor data is sent to computer using a wired protocol for further processing. The recorded data from matrix is huge making it impractical to process on a smart phone. We propose a FPGA (Field Programmable Gate Array) based processing methodology which not only accelerates sensing data processing but also reduces the size of 2D sensor data matrix to 10 features. The resultant feature set can be transferred using wireless medium to a smart phone or other processing unit where the classification can be done. Our system takes a matrix of arbitrary size and output a 'features' set for each matrix frame. We used HLS (High Level Synthesis), an approach to write algorithm for FPGA using SystemC/C/C++ instead of traditional VHDL/Verilog. Results show promising improvement in processing time as compared to Matlab. Since the size of data is reduced, wireless medium can be use to transmit data. Additionally, the development time for FPGA designs is greatly reduced due to the usage of an abstracted high level synthesis approach. This system is currently developed for pressure sensing system but this strategy can be applied to other sensing application like temperature sensor grid.