Title | Hardware Design of Polynomial Multiplication for Byte-Level Ring-LWE Based Cryptosystem |
Publication Type | Conference Paper |
Year of Publication | 2020 |
Authors | Khuchit, U., Wu, L., Zhang, X., Yin, Y., Batsukh, A., Mongolyn, B., Chinbat, M. |
Conference Name | 2020 IEEE 14th International Conference on Anti-counterfeiting, Security, and Identification (ASID) |
Date Published | October |
Keywords | BRAMs, byte-level modulus, byte-level ring-LWE based cryptosystem, compiler security, compositionality, computational time-consuming block, cryptography, DSPs, field programmable gate arrays, Hardware, high level synthesis, high-level synthesis based hardware design methodology, ideal lattice, LAC, lattice-based cryptography, learning (artificial intelligence), logic design, Metrics, multiplication core, NIST, NIST PQC Standardization Process, polynomial multiplication, polynomials, post quantum cryptography, program compilers, pubcrawl, Resiliency, ring learning with error problem, ring LWE, Scalability, Software algorithms, Table lookup, time 4.3985 ns, time 5.052 ns, time 5.133 ns, Timing, Vivado HLS compiler, Xilinx Artix-7 family FPGA |
Abstract | An ideal lattice underlies the ring learning with errors (Ring-LWE) problem. Polynomial multiplication over the ring is the most computationally intensive and time-consuming block in lattice-based cryptography. This paper presents the first hardware design of the polynomial multiplication for LAC, one of the Round-2 candidates of the NIST PQC Standardization Process, which has a byte-level modulus p=251. The proposed architecture supports polynomial multiplication for different degrees n (n=512/1024/2048). To design the scheme, we used the Vivado HLS compiler, a high-level synthesis based hardware design methodology, which can compile and optimize software algorithms into actual hardware designs. The design takes 274/280/291 FFs and 204/217/208 LUTs on the Xilinx Artix-7 family FPGA, the platform requested by the NIST PQC competition for hardware implementations. The multiplication core uses only 1/1/2 18Kb BRAMs, 1/1/1 DSPs, and 90/94/95 slices on the board. For the respective degrees n, the design achieves timing results of 5.052/4.3985/5.133 ns. |
DOI | 10.1109/ASID50160.2020.9271725 |
Citation Key | khuchit_hardware_2020 |
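
The operation named in the abstract, polynomial multiplication over a ring with byte-level modulus p=251, can be illustrated with a small software reference model. The sketch below is not the paper's architecture or its HLS source. It assumes the ring is Z_p[x]/(x^n + 1), which the record itself does not state, and it uses plain schoolbook multiplication. The function name `negacyclic_mul` and the constant `P` are hypothetical names chosen for this example.

```python
P = 251  # byte-level modulus p given in the abstract


def negacyclic_mul(a, b, p=P):
    """Schoolbook product of a and b in Z_p[x]/(x^n + 1).

    Assumption: the reduction polynomial is x^n + 1, so any term that
    wraps past degree n-1 re-enters with its sign flipped.
    Coefficient lists are ordered from degree 0 upward.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("operands must have the same length n")
    c = [0] * n
    for i, ai in enumerate(a):
        if ai == 0:
            continue  # zero coefficients contribute nothing
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                c[k] = (c[k] + ai * bj) % p
            else:
                # x^n = -1 in the assumed ring: subtract the wrapped term
                c[k - n] = (c[k - n] - ai * bj) % p
    return c


if __name__ == "__main__":
    # Toy size n=4 for readability; the abstract's sizes are 512/1024/2048.
    print(negacyclic_mul([1, 2, 3, 4], [5, 6, 7, 8]))  # [195, 215, 2, 60]
```

Because every coefficient stays below 251, each value fits in one byte, which is what makes a compact multiplication core plausible. A model like this is mainly useful as a golden reference for checking hardware output, not as a statement of how the reported design schedules its operations.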