Hardware Design of Polynomial Multiplication for Byte-Level Ring-LWE Based Cryptosystem

Submitted by aekwall on Mon, 03/15/2021 - 11:10am

Title	Hardware Design of Polynomial Multiplication for Byte-Level Ring-LWE Based Cryptosystem
Publication Type	Conference Paper
Year of Publication	2020
Authors	Khuchit, U., Wu, L., Zhang, X., Yin, Y., Batsukh, A., Mongolyn, B., Chinbat, M.
Conference Name	2020 IEEE 14th International Conference on Anti-counterfeiting, Security, and Identification (ASID)
Date Published	October
Keywords	BRAMs, byte-level modulus, byte-level ring-LWE based cryptosystem, compiler security, compositionality, computational time-consuming block, cryptography, DSPs, field programmable gate arrays, Hardware, high level synthesis, high-level synthesis based hardware design methodology, ideal lattice, LAC, lattice-based cryptography, learning (artificial intelligence), logic design, Metrics, multiplication core, NIST, NIST PQC Standardization Process, polynomial multiplication, polynomials, post quantum cryptography, program compilers, pubcrawl, Resiliency, ring learning with error problem, ring LWE, Scalability, Software algorithms, Table lookup, time 4.3985 ns, time 5.052 ns, time 5.133 ns, Timing, Vivado HLS compiler, Xilinx Artix-7 family FPGA
Abstract	An ideal lattice is defined over a ring learning with errors (Ring-LWE) problem. Polynomial multiplication over the ring is the most computationally intensive and time-consuming block in lattice-based cryptography. This paper presents the first hardware design of the polynomial multiplication for LAC, one of the Round-2 candidates of the NIST PQC Standardization Process, which has a byte-level modulus p=251. The proposed architecture supports polynomial multiplication for different degrees n (n=512/1024/2048). To design the scheme, we used the Vivado HLS compiler, a high-level synthesis based hardware design methodology that turns software algorithms into hardware implementations. The design takes 274/280/291 FFs and 204/217/208 LUTs on the Xilinx Artix-7 family FPGA, the platform requested by the NIST PQC competition for hardware implementations. The multiplication core uses only 1/1/2 18Kb BRAMs, 1/1/1 DSPs, and 90/94/95 slices on the board. The timing results for the three degrees n are 5.052/4.3985/5.133 ns, respectively.
DOI	10.1109/ASID50160.2020.9271725
Citation Key	khuchit_hardware_2020
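
The operation the abstract refers to can be illustrated with a short software model. The sketch below is not the paper's HLS design. It is a plain schoolbook multiplication that assumes the negacyclic ring Z_p[x]/(x^n + 1) with p = 251; the abstract gives the modulus and the degrees but does not name the reduction polynomial, so that choice is an assumption. The function name `polymul_negacyclic` is hypothetical.

```python
P = 251  # byte-level modulus stated in the abstract


def polymul_negacyclic(a, b, p=P):
    """Schoolbook product of a and b in Z_p[x]/(x^n + 1).

    The ring x^n + 1 is an assumption; coefficients are listed from
    the constant term upward, and both operands have length n.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("operands must have the same length n")
    c = [0] * n
    for i, ai in enumerate(a):
        if ai == 0:
            continue  # sparse operands skip most of the inner loop
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                c[k] = (c[k] + ai * bj) % p
            else:
                # x^n = -1 in this ring, so wrapped terms are subtracted
                c[k - n] = (c[k - n] - ai * bj) % p
    return c
```

Because every coefficient stays below 251, each one fits in a single byte, which is what "byte-level" refers to. The same function handles n = 512, 1024, or 2048, since only the list length changes.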
