Biblio
Cooperation of software and hardware with hybrid architectures, such as Xilinx Zynq SoC combining ARM CPU and FPGA fabric, is a high-performance and low-power platform for accelerating RSA Algorithm. This paper adopts the none-subtraction Montgomery algorithm and the Chinese Remainder Theorem (CRT) to implement high-speed RSA processors, and deploys a 48-node cluster infrastructure based on Zynq SoC to achieve extremely high scalability and throughput of RSA computing. In this design, we use the ARM to implement node-to-node communication with the Message Passing Interface (MPI) while use the FPGA to handle complex calculation. Finally, the experimental results show that the overall performance is linear with the number of nodes. And the cluster achieves 6× 9× speedup against a multi-core desktop (Intel i7-3770) and comparable performance to a many-core server (288-core). In addition, we gain up to 2.5× energy efficiency compared to these two traditional platforms.
The deceleration of transistor feature size scaling has motivated growing adoption of specialized accelerators implemented as GPUs, FPGAs, ASICs, and more recently new types of computing such as neuromorphic, bio-inspired, ultra low energy, reversible, stochastic, optical, quantum, combinations, and others unforeseen. There is a tension between specialization and generalization, with the current state trending to master slave models where accelerators (slaves) are instructed by a general purpose system (master) running an Operating System (OS). Traditionally, an OS is a layer between hardware and applications and its primary function is to manage hardware resources and provide a common abstraction to applications. Does this function, however, apply to new types of computing paradigms? This paper revisits OS functionality for memristor-based accelerators. We explore one accelerator implementation, the Dot Product Engine (DPE), for a select pattern of applications in machine learning, imaging, and scientific computing and a small set of use cases. We explore typical OS functionality, such as reconfiguration, partitioning, security, virtualization, and programming. We also explore new types of functionality, such as precision and trustworthiness of reconfiguration. We claim that making an accelerator, such as the DPE, more general will result in broader adoption and better utilization.
Authentication and encryption within an embedded system environment using cameras, sensors, thermostats, autonomous vehicles, medical implants, RFID, etc. is becoming increasing important with ubiquitious wireless connectivity. Hardware-based authentication and encryption offer several advantages in these types of resource-constrained applications, including smaller footprints and lower energy consumption. Bitstring and key generation implemented with Physical Unclonable Functions or PUFs can further reduce resource utilization for authentication and encryption operations and reduce overall system cost by eliminating on-chip non-volatile-memory (NVM). In this paper, we propose a dynamic partial reconfiguration (DPR) strategy for implementing both authentication and encryption using a PUF for bitstring and key generation on FPGAs as a means of optimizing the utilization of the limited area resources. We show that the time and energy penalties associated with DPR are small in modern SoC-based architectures, such as the Xilinx Zynq SoC, and therefore, the overall approach is very attractive for emerging resource-constrained IoT applications.
Internet-connected embedded systems have limited capabilities to defend themselves against remote hacking attacks. The potential effects of such attacks, however, can have a significant impact in the context of the Internet of Things, industrial control systems, smart health systems, etc. Embedded systems cannot effectively utilize existing software-based protection mechanisms due to limited processing capabilities and energy resources. We propose a novel hardware-based monitoring technique that can detect if the embedded operating system or any running application deviates from the originally programmed behavior due to an attack. We present an FPGA-based prototype implementation that shows the effectiveness of such a security approach.
In today's world, we are surrounded by variety of computer vision applications e.g. medical imaging, bio-metrics, security, surveillance and robotics. Most of these applications require real time processing of a single image or sequence of images. This real time image/video processing requires high computational power and specialized hardware architecture and can't be achieved using general purpose CPUs. In this paper, a FPGA based generic canny edge detector is introduced. Edge detection is one of the basic steps in image processing, image analysis, image pattern recognition, and computer vision. We have implemented a re-sizable canny edge detector IP on programmable logic (PL) of PYNQ-Platform. The IP is integrated with HDMI input/output blocks and can process 1080p input video stream at 60 frames per second. As mentioned the canny edge detection IP is scalable with respect to frame size i.e. depending on the input frame size, the hardware architecture can be scaled up or down by changing the template parameters. The offloading of canny edge detection from PS to PL causes the CPU usage to drop from about 100% to 0%. Moreover, hardware based edge detector runs about 14 times faster than the software based edge detector running on Cortex-A9 ARM processor.
The size of counterfeiting activities is increasing day by day. These activities are encountered especially in electronics market. In this paper, a countermeasure against counterfeiting on intellectual properties (IP) on Field-Programmable Gate Arrays (FPGA) is proposed. FPGA vendors provide bitstream ciphering as an IP security solution such as battery-backed or non-volatile FPGAs. However, these solutions are secure as long as they can keep decryption key away from third parties. Key storage and key transfer over unsecure channels expose risks for these solutions. In this work, physical unclonable functions (PUFs) have been used for key generation. Generating a key from a circuit in the device solves key transfer problem. Proposed system goes through different phases when it operates. Therefore, partial reconfiguration feature of FPGAs is essential for feasibility of proposed system.
The trend in computing is towards the use of FPGAs to improve performance at reduced costs. An indication of this is the adoption of FPGAs for data centre and server application acceleration by notable technological giants like Microsoft, Amazon, and Baidu. The continued protection of Intellectual Properties (IPs) on the FPGA has thus become both more important and challenging. To facilitate IP security, FPGA vendors have provided bitstream authentication and encryption. However, advancements in FPGA programming technology have engendered a bitstream manipulation technique like partial bitstream relocation (PBR), which is promising in terms of reducing bitstream storage cost and facilitating adaptability. Meanwhile, encrypted bitstreams are not amenable to PBR. In this paper, we present three methods for performing encrypted PBR with varying overheads of resources and time. These methods ensure that PBR can be applied to bitstreams without losing the protection of IPs.
QR codes, intended for maximum accessibility are widely in use these days and can be scanned readily by mobile phones. Their ease of accessibility makes them vulnerable to attacks and tampering. Certain scenarios require a QR code to be accessed by a group of users only. This is done by making the QR code cryptographically secure with the help of a password (key) for encryption and decryption. Symmetric key algorithms like AES requires the sender and the receiver to have a shared secret key. However, the whole motive of security fails if the shared key is not secure enough. Therefore, in our design we secure the key, which is a grey image using RSA algorithm. In this paper, FPGA implementation of 1024 bit RSA encryption and decryption is presented. For encryption, computation of modular exponentiation for 1024 bit size with accuracy and efficiency is needed and it is carried out by repeated modular multiplication technique. For decryption, L-R binary approach is used which deploys modular multiplication module. Efficiency in our design is achieved in terms of throughput/area ratio as compared to existing implementations. QR codes security is demonstrated by deploying AES-RSA hybrid design in Xilinx System Generator(XSG). XSG helps in hardware co-simulation and reduces the difficulty in structural design. Further, to ensure efficient encryption of the shared key by RSA, histograms of the images of key before and after encryption are generated and analysed for strength of encryption.
This work presents the proof of concept implementation for the first hardware-based design of Moving Target Defense over IPv6 (MT6D) in full Register Transfer Level (RTL) logic, with future sights on an embedded Application-Specified Integrated Circuit (ASIC) implementation. Contributions are an IEEE 802.3 Ethernet stream-based in-line network packet processor with a specialized Complex Instruction Set Computer (CISC) instruction set architecture, RTL-based Network Time Protocol v4 synchronization, and a modular crypto engine. Traditional static network addressing allows attackers the incredible advantage of taking time to plan and execute attacks against a network. To counter, MT6D provides a network host obfuscation technique that offers network-based keyed access to specific hosts without altering existing network infrastructure and is an excellent technique for protecting the Internet of Things, IPv6 over Low Power Wireless Personal Area Networks, and high value globally routable IPv6 interfaces. This is done by crypto-graphically altering IPv6 network addresses every few seconds in a synchronous manner at all endpoints. A border gateway device can be used to intercept select packets to unobtrusively perform this action. Software driven implementations have posed many challenges, namely, constant code maintenance to remain compliant with all library and kernel dependencies, the need for a host computing platform, and less than optimal throughput. This work seeks to overcome these challenges in a lightweight system to be developed for practical wide deployment.
In this paper, we propose a hardware-based defense system in Software-Defined Networking architecture to protect against the HTTP GET Flooding attacks, one of the most dangerous Distributed Denial of Service (DDoS) attacks in recent years. Our defense system utilizes per-URL counting mechanism and has been implemented on FPGA as an extension of a NetFPGA-based OpenFlow switch.
In this paper a random number generation method based on a piecewise linear one dimensional (PL1D) discrete time chaotic maps is proposed for applications in cryptography and steganography. Appropriate parameters are determined by examining the distribution of underlying chaotic signal and random number generator (RNG) is numerically verified by four fundamental statistical test of FIPS 140-2. Proposed design is practically realized on the field programmable analog and digital arrays (FPAA-FPGA). Finally it is experimentally verified that the presented RNG fulfills the NIST 800-22 randomness test without post processing.
To enhance the encryption and anti-translation capability of the information, we constructed a five-dimensional chaotic system. Combined with the Lü system, a time-switched system with multiple chaotic attractors is realized in the form of a digital circuit. Some characteristics of the five-dimensional system are analyzed, such as Poincare mapping, the Lyapunov exponent spectrum, and bifurcation diagram. The analysis shows that the system exhibits chaotic characteristics for a wide range of parameter values. We constructed a time-switched expression between multiple chaotic attractors using the communication between a microcontroller unit (MCU) and field programmable gate array (FPGA). The system can quickly switch between different chaotic attractors within the chaotic system and between chaotic systems at any time, leading to signal sources with more variability, diversity, and complexity for chaotic encryption.
We propose to use a genetic algorithm to evolve novel reconfigurable hardware to implement elliptic curve cryptographic combinational logic circuits. Elliptic curve cryptography offers high security-level with a short key length making it one of the most popular public-key cryptosystems. Furthermore, there are no known sub-exponential algorithms for solving the elliptic curve discrete logarithm problem. These advantages render elliptic curve cryptography attractive for incorporating in many future cryptographic applications and protocols. However, elliptic curve cryptography has proven to be vulnerable to non-invasive side-channel analysis attacks such as timing, power, visible light, electromagnetic, and acoustic analysis attacks. In this paper, we use a genetic algorithm to address this vulnerability by evolving combinational logic circuits that correctly implement elliptic curve cryptographic hardware that is also resistant to simple timing and power analysis attacks. Using a fitness function composed of multiple objectives - maximizing correctness, minimizing propagation delays and minimizing circuit size, we can generate correct combinational logic circuits resistant to non-invasive, side channel attacks. To the best of our knowledge, this is the first work to evolve a cryptography circuit using a genetic algorithm. We implement evolved circuits in hardware on a Xilinx Kintex-7 FPGA. Results reveal that the evolutionary algorithm can successfully generate correct, and side-channel resistant combinational circuits with negligible propagation delay.
Public-key cryptography schemes are widely used due to their high level of security. As a very efficient one among public-key cryptosystems, elliptic curve cryptography (ECC) has been studied for years. Researchers used to improve the efficiency of ECC through point multiplication, which is the most important and complex operation of ECC. In our research, we use special families of curves and prime fields which have special properties. After that, we introduce the instruction set architecture (ISA) extension method to accelerate this algorithm (192-bit private key) and build an ECC\_ASIP model with six new ECC custom instructions. Finally, the ECC\_ASIP model is implemented in a field-programmable gate array (FPGA) platform. The persuasive experiments have been conducted to evaluate the performance of our new model in the aspects of the performance, the code storage space and hardware resources. Experimental results show that our processor improves 69.6% in the execution efficiency and requires only 6.2% more hardware resources.
Lightweight block ciphers, which are required for IoT devices, have attracted attention. Simeck, which is one of the most popular lightweight block ciphers, can be implemented on IoT devices in the smallest area. Regarding the hardware security, the threat of electromagnetic analysis has been reported. However, electromagnetic analysis of Simeck has not been reported. Therefore, this study proposes a dedicated electromagnetic analysis for a lightweight block cipher Simeck to ensure the safety of IoT devices in the future. To our knowledge, this is the first electromagnetic analysis for Simeck. Experiments using a FPGA prove the validity of the proposed method.
Cyber-physical system integrity requires both hardware and software security. Many of the cyber attacks are successful as they are designed to selectively target a specific hardware or software component in an embedded system and trigger its failure. Existing security measures also use attack vector models and isolate the malicious component as a counter-measure. Isolated security primitives do not provide the overall trust required in an embedded system. Trust enhancements are proposed to a hardware security platform, where the trust specifications are implemented in both software and hardware. This distribution of trust makes it difficult for a hardware-only or software-only attack to cripple the system. The proposed approach is applied to a smart grid application consisting of third-party soft IP cores, where an attack on this module can result in a blackout. System integrity is preserved in the event of an attack and the anomalous behavior of the IP core is recorded by a supervisory module. The IP core also provides a snapshot of its trust metric, which is logged for further diagnostics.
In the near future, billions of new smart devices will connect the big network of the Internet of Things, playing an important key role in our daily life. Allowing IPv6 on the low-power resource constrained devices will lead research to focus on novel approaches that aim to improve the efficiency, security and performance of the 6LoWPAN adaptation layer. This work in progress paper proposes a hardware-based Network Packet Filtering (NPF) and an IPv6 Link-local address calculator which is able to filter the received IPv6 packets, offering nearly 18% overhead reduction. The goal is to obtain a System-on-Chip implementation that can be deployed in future IEEE 802.15.4 radio modules.
Radio-frequency identification (RFID) are becoming a part of our everyday life with a wide range of applications such as labeling products and supply chain management and etc. These smart and tiny devices have extremely constrained resources in terms of area, computational abilities, memory, and power. At the same time, security and privacy issues remain as an important problem, thus with the large deployment of low resource devices, increasing need to provide security and privacy among such devices, has arisen. Resource-efficient cryptographic incipient become basic for realizing both security and efficiency in constrained environments and embedded systems like RFID tags and sensor nodes. Among those primitives, lightweight block cipher plays a significant role as a building block for security systems. In 2014 Manoj Kumar et al proposed a new Lightweight block cipher named as FeW, which are suitable for extremely constrained environments and embedded systems. In this paper, we simulate and synthesize the FeW block cipher. Implementation results of the FeW cryptography algorithm on a FPGA are presented. The design target is efficiency of area and cost.
The vulnerabilities in today's supply chain have raised serious concerns about the security and trustworthiness of electronic components and systems. Testing for device provenance, detection of counterfeit integrated circuits/systems, and traceability are challenging issues to address. In this paper, we develop a novel RFID-based system suitable for electronic component and system Counterfeit detection and System Traceability called CST. CST is composed of different types of on-chip sensors and in-system structures that provide the information needed to detect multiple counterfeit IC types (recycled, cloned, etc.), verify the authenticity of the system with some degree of confidence, and track/identify boards. Central to CST is an RFID tag employed as storage and a channel to read the information from different types of chips on the printed circuit board (PCB) in both power-off and power-on scenarios. Simulations and experimental results using Spartan 3E FPGAs demonstrate the effectiveness of this system. The efficiency of the radio frequency (RF) communication has also been verified via a PCB prototype with a printed slot antenna.
Soft microprocessors are vital components of many embedded FPGA systems. As the application domain for FPGAs expands, the security of the software used by soft processors increases in importance. Although software confidentiality approaches (e.g. encryption) are effective, code obfuscation is known to be an effective enhancement that further deters code understanding for attackers. The availability of specialization in FPGAs provides a unique opportunity for code obfuscation on a per-application basis with minimal hardware overhead. In this paper we describe a new technique to obfuscate soft microprocessor code which is located outside the FPGA chip in an unprotected area. Our approach provides customizable, data-dependent control flow modification to make it difficult for attackers to easily understand program behavior. The application of the approach to three benchmarks illustrates a control flow cyclomatic complexity increase of about 7× with a modest logic overhead for the soft processor.