Biblio
This paper presents some verifications and improved considerations of NAND PUF, which was introduced recently [1]. For embedded system such as IC cards, the secret data in memory is vulnerable, so it has to be encrypted and secured. PUF circuit is sensitive to environmental condition, especially in the temperature range influences and variations of current and voltages. This proposed bank IC card would be operated in AB class standard, i.e. voltage would be constant except for power mode changing. Nevertheless, operational temperatures may vary such as the situation of outdoor ATM. Thus, this paper presented some results of our PUF work in Cadence, also on FPGA board. Around 5ns is spent for stabilization of our PUF output that is under variance temperature when power mode changes. Inter Hamming distances is 48.9%, very near to uniqueness and robustness value, that our PUF is feasible to use in bankcard. The maximum error rates are HDintra(0$^\circ$C) = 3.9961 and HDintra(80$^\circ$C) = 3.9916 where at antipoles, while the minimum error rate is HDintra(20$^\circ$C) = 2.9 at room temperature. For improvement, Repetition, LDPC and SEC-DED codes are considered that would eliminate error rates.
Advancements in semiconductor domain gave way to realize numerous applications in Video Surveillance using Computer vision and Deep learning, Video Surveillances in Industrial automation, Security, ADAS, Live traffic analysis etc. through image understanding improves efficiency. Image understanding requires input data with high precision which is dependent on Image resolution and location of camera. The data of interest can be thermal image or live feed coming for various sensors. Composite(CVBS) is a popular video interface capable of streaming upto HD(1920x1080) quality. Unlike high speed serial interfaces like HDMI/MIPI CSI, Analog composite video interface is a single wire standard supporting longer distances. Image understanding requires edge detection and classification for further processing. Sobel filter is one the most used edge detection filter which can be embedded into live stream. This paper proposes Zynq FPGA based system design for video surveillance with Sobel edge detection, where the input Composite video decoded (Analog CVBS input to YCbCr digital output), processed in HW and streamed to HDMI display simultaneously storing in SD memory for later processing. The HW design is scalable for resolutions from VGA to Full HD for 60fps and 4K for 24fps. The system is built on Xilinx ZC702 platform and TVP5146 to showcase the functional path.
In spite of numerous advantages of biometrics-based personal authentication systems over traditional security systems based on token or knowledge, they are vulnerable to attacks that can decrease their security considerably. In this paper, we propose a new hardware solution to protect biometric templates such as fingerprint. The proposed scheme is based on chaotic N × N grid multi-scroll system and it is implemented on Xilinx FPGA. The hardware implementation is achieved by applying numerical solution methods in our study, we use EM (Euler Method). Simulation and experimental results show that the proposed scheme allows a low cost image encryption for embedded systems while still providing a good trade-off between performance and hardware resources. Indeed, security analysis performed to the our scheme, is strong against known different attacks, such as: brute force, statistical, differential, and entropy. Therefore, the proposed chaos-based multiscroll encryption algorithm is suitable for use in securing embedded biometric systems.
Hardware Trojans, implantable at a myriad of points within the supply chain, are difficult to detect and identify. By emulating systems on programmable hardware, the authors have created a tool from which to create and evaluate Trojan attack signatures and therefore enable better Trojan detection (for in-service systems) and prevention (for in-design systems).
The Internet of Things (IoT) holds great potential for productivity, quality control, supply chain efficiencies and overall business operations. However, with this broader connectivity, new vulnerabilities and attack vectors are being introduced, increasing opportunities for systems to be compromised by hackers and targeted attacks. These vulnerabilities pose severe threats to a myriad of IoT applications within areas such as manufacturing, healthcare, power and energy grids, transportation and commercial building management. While embedded OEMs offer technologies, such as hardware Trusted Platform Module (TPM), that deploy strong chain-of-trust and authentication mechanisms, still they struggle to protect against vulnerabilities introduced by vendors and end users, as well as additional threats posed by potential technical vulnerabilities and zero-day attacks. This paper proposes a pro-active policy-based approach, enforcing the principle of least privilege, through hardware Security Policy Engine (SPE) that actively monitors communication of applications and system resources on the system communication bus (ARM AMBA-AXI4). Upon detecting a policy violation, for example, a malicious application accessing protected storage, it counteracts with predefined mitigations to limit the attack. The proposed SPE approach widely complements existing embedded hardware and software security technologies, targeting the mitigation of risks imposed by unknown vulnerabilities of embedded applications and protocols.
The dynamically changing landscape of DDoS threats increases the demand for advanced security solutions. The rise of massive IoT botnets enables attackers to mount high-intensity short-duration ”volatile ephemeral” attack waves in quick succession. Therefore the standard human-in-the-loop security center paradigm is becoming obsolete. To battle the new breed of volatile DDoS threats, the intrusion detection system (IDS) needs to improve markedly, at least in reaction times and in automated response (mitigation). Designing such an IDS is a daunting task as network operators are traditionally reluctant to act - at any speed - on potentially false alarms. The primary challenge of a low reaction time detection system is maintaining a consistently low false alarm rate. This paper aims to show how a practical FPGA-based DDoS detection and mitigation system can successfully address this. Besides verifying the model and algorithms with real traffic ”in the wild”, we validate the low false alarm ratio. Accordingly, we describe a methodology for determining the false alarm ratio for each involved threat type, then we categorize the causes of false detection, and provide our measurement results. As shown here, our methods can effectively mitigate the volatile ephemeral DDoS attacks, and accordingly are usable both in human out-of-loop and on-the-loop next-generation security solutions.
This paper describes the work done to design a SoC platform for real-time on-line pattern search in TCP packets for Deep Packet Inspection (DPI) applications. The platform is based on a Xilinx Zynq programmable SoC and includes an accelerator that implements a pattern search engine that extends the original Boyer-Moore algorithm with timing and logical rules, that produces a very complex set of rules. Also, the platform implements different modes of operation, including SIMD and MISD parallelism, which can be configured on-line. The platform is scalable depending of the analysis requirement up to 8 Gbps. High-Level synthesis and platform based design methodologies have been used to reduce the time to market of the completed system.
Deep Packet Inspection systems such as Snort and Bro express complex rules with regular expressions. In Snort, the search of a regular expression is performed with a Non-deterministic Finite Automaton (NFA). Traversing an NFA sequentially with a CPU is not deterministic in time, and it can be very time consuming. The sequential traversal of an NFA with a CPU is not deterministic in time consequently it can be time consuming. A fully parallel NFA implemented in hardware can search all rules, but most of the time only a small part is active. Furthermore, a string filter determines the traversal of an NFA. This paper proposes an FPGA Intermediate Fabric that can efficiently search regular expressions. The architecture is configured for a specific NFA based on a partial match of a rule found by the string filter. It can thus support all rules from a set such as Snort, while significantly reduce compute resources and power con-sumption compared to a fully parallel implementation. Multiple parameters can be selected to find the best tradeoff between resource consumption and the number and types of supported expressions. This architecture was implemented on a Xilinx R XC7VX1140 Virtex-7. The reported implementation, can sustain up to 512 regular expressions, while requiring 2% of the slices and 16% of the BRAM resources, for a throughput of 200 million characters per second.
A high definition(HD) wide dynamic video surveillance system is designed and implemented based on Field Programmable Gate Array(FPGA). This system is composed of three subsystems, which are video capture, video wide dynamic processing and video display subsystem. The images in the video are captured directly through the camera that is configured in a pattern have long exposure in odd frames and short exposure in even frames. The video data stream is buffered in DDR2 SDRAM to obtain two adjacent frames. Later, the image data fusion is completed by fusing the long exposure image with the short exposure image (pixel by pixel). The video image display subsystem can display the image through a HDMI interface. The system is designed on the platform of Lattice ECP3-70EA FPGA, and camera is the Panasonic MN34229 sensor. The experimental result shows that this system can expand dynamic range of the HD video with 30 frames per second and a resolution equal to 1920*1080 pixels by real-time wide dynamic range (WDR) video processing, and has a high practical value.
In this paper, machine learning attacks are performed on a novel hybrid delay based Arbiter Ring Oscillator PUF (AROPUF). The AROPUF exhibits improved results when compared to traditional Arbiter Physical Unclonable Function (APUF). The challenge-response pairs (CRPs) from both PUFs are fed to the multilayered perceptron model (MLP) with one hidden layer. The results show that the CRPs generated from the proposed AROPUF has more training and prediction errors when compared to the APUF, thus making it more difficult for the adversary to predict the CRPs.
Approximate Computing aims at trading off computational accuracy against improvements regarding performance, resource utilization and power consumption by making use of the capability of many applications to tolerate a certain loss of quality. A key issue is the dependency of the impact of approximation on the input data as well as user preferences and environmental conditions. In this context, we therefore investigate the concept of self-adaptive image processing that is able to autonomously adapt 2D-convolution filter operators of different accuracy degrees by means of partial reconfiguration on Field-Programmable-Gate-Arrays (FPGAs). Experimental evaluation shows that the dynamic system is able to better exploit a given error tolerance than any static approximation technique due to its responsiveness to changes in input data. Additionally, it provides a user control knob to select the desired output quality via the metric threshold at runtime.
The Department of Homeland Security Cyber Security Division (CSD) chose Moving Target Defense as one of the fourteen primary Technical Topic Areas pertinent to securing federal networks and the larger Internet. Moving Target Defense over IPv6 (MT6D) employs an obscuration technique offering keyed access to hosts at a network level without altering existing network infrastructure. This is accomplished through cryptographic dynamic addressing, whereby a new network address is bound to an interface every few seconds in a coordinated manner. The goal of this research is to produce a Register Transfer Level (RTL) network security processor implementation to enable the production of an Application Specific Integrated Circuit (ASIC) variant of MT6D processor for wide deployment. RTL development is challenging in that it must provide system level functions that are normally provided by the Operating System's kernel and supported libraries. This paper presents the architectural design of a hardware engine for MT6D (HE-MT6D) and is complete in simulation. Unique contributions are an inline stream-based network packet processor with a Complex Instruction Set Computer (CISC) architecture, Network Time Protocol listener, and theoretical increased performance over previous software implementations.
A5-1 algorithm is a stream cipher used to encrypt voice data in GSM, which needs to be realized with high performance due to real-time requirements. Traditional implementation on FPGA or ASIC can't obtain a trade-off among performance, cost and flexibility. To this aim, this paper introduces CGRCA to implement A5-1, and in order to optimize the performance and resource consumption, this paper proposes a resource-based path seeking (RPS) algorithm to develop an advanced implementation. Experimental results show that final optimal throughput of A5-1 implemented on CGRCA is 162.87Mbps when the frequency is 162.87MHz, and the set-up time is merely 87 cycles, which is optimal among similar works.
The transition effect ring oscillator (TERO) based true random number generator (TRNG) was proposed by Varchola and Drutarovsky in 2010. There were several stochastic models for this advanced TRNG based on ring oscillator. This paper proposed an improved TERO based TRNG and implements both on Altera Cyclone series FPGA platform and on a 0.13um CMOS ASIC process. FPGA experimental results show that this balanced TERO TRNG is in good performance as the experimental data results past the national institute of standards and technology (NIST) test in 1M bit/s. The TRNG is feasible for a security SoC.
Authentication and encryption within an embedded system environment using cameras, sensors, thermostats, autonomous vehicles, medical implants, RFID, etc. is becoming increasing important with ubiquitious wireless connectivity. Hardware-based authentication and encryption offer several advantages in these types of resource-constrained applications, including smaller footprints and lower energy consumption. Bitstring and key generation implemented with Physical Unclonable Functions or PUFs can further reduce resource utilization for authentication and encryption operations and reduce overall system cost by eliminating on-chip non-volatile-memory (NVM). In this paper, we propose a dynamic partial reconfiguration (DPR) strategy for implementing both authentication and encryption using a PUF for bitstring and key generation on FPGAs as a means of optimizing the utilization of the limited area resources. We show that the time and energy penalties associated with DPR are small in modern SoC-based architectures, such as the Xilinx Zynq SoC, and therefore, the overall approach is very attractive for emerging resource-constrained IoT applications.
QR codes, intended for maximum accessibility are widely in use these days and can be scanned readily by mobile phones. Their ease of accessibility makes them vulnerable to attacks and tampering. Certain scenarios require a QR code to be accessed by a group of users only. This is done by making the QR code cryptographically secure with the help of a password (key) for encryption and decryption. Symmetric key algorithms like AES requires the sender and the receiver to have a shared secret key. However, the whole motive of security fails if the shared key is not secure enough. Therefore, in our design we secure the key, which is a grey image using RSA algorithm. In this paper, FPGA implementation of 1024 bit RSA encryption and decryption is presented. For encryption, computation of modular exponentiation for 1024 bit size with accuracy and efficiency is needed and it is carried out by repeated modular multiplication technique. For decryption, L-R binary approach is used which deploys modular multiplication module. Efficiency in our design is achieved in terms of throughput/area ratio as compared to existing implementations. QR codes security is demonstrated by deploying AES-RSA hybrid design in Xilinx System Generator(XSG). XSG helps in hardware co-simulation and reduces the difficulty in structural design. Further, to ensure efficient encryption of the shared key by RSA, histograms of the images of key before and after encryption are generated and analysed for strength of encryption.
This work presents the proof of concept implementation for the first hardware-based design of Moving Target Defense over IPv6 (MT6D) in full Register Transfer Level (RTL) logic, with future sights on an embedded Application-Specified Integrated Circuit (ASIC) implementation. Contributions are an IEEE 802.3 Ethernet stream-based in-line network packet processor with a specialized Complex Instruction Set Computer (CISC) instruction set architecture, RTL-based Network Time Protocol v4 synchronization, and a modular crypto engine. Traditional static network addressing allows attackers the incredible advantage of taking time to plan and execute attacks against a network. To counter, MT6D provides a network host obfuscation technique that offers network-based keyed access to specific hosts without altering existing network infrastructure and is an excellent technique for protecting the Internet of Things, IPv6 over Low Power Wireless Personal Area Networks, and high value globally routable IPv6 interfaces. This is done by crypto-graphically altering IPv6 network addresses every few seconds in a synchronous manner at all endpoints. A border gateway device can be used to intercept select packets to unobtrusively perform this action. Software driven implementations have posed many challenges, namely, constant code maintenance to remain compliant with all library and kernel dependencies, the need for a host computing platform, and less than optimal throughput. This work seeks to overcome these challenges in a lightweight system to be developed for practical wide deployment.
Public-key cryptography schemes are widely used due to their high level of security. As a very efficient one among public-key cryptosystems, elliptic curve cryptography (ECC) has been studied for years. Researchers used to improve the efficiency of ECC through point multiplication, which is the most important and complex operation of ECC. In our research, we use special families of curves and prime fields which have special properties. After that, we introduce the instruction set architecture (ISA) extension method to accelerate this algorithm (192-bit private key) and build an ECC\_ASIP model with six new ECC custom instructions. Finally, the ECC\_ASIP model is implemented in a field-programmable gate array (FPGA) platform. The persuasive experiments have been conducted to evaluate the performance of our new model in the aspects of the performance, the code storage space and hardware resources. Experimental results show that our processor improves 69.6% in the execution efficiency and requires only 6.2% more hardware resources.
Lightweight block ciphers, which are required for IoT devices, have attracted attention. Simeck, which is one of the most popular lightweight block ciphers, can be implemented on IoT devices in the smallest area. Regarding the hardware security, the threat of electromagnetic analysis has been reported. However, electromagnetic analysis of Simeck has not been reported. Therefore, this study proposes a dedicated electromagnetic analysis for a lightweight block cipher Simeck to ensure the safety of IoT devices in the future. To our knowledge, this is the first electromagnetic analysis for Simeck. Experiments using a FPGA prove the validity of the proposed method.
Multimedia authentication is an integral part of multimedia signal processing in many real-time and security sensitive applications, such as video surveillance. In such applications, a full-fledged video digital rights management (DRM) mechanism is not applicable due to the real time requirement and the difficulties in incorporating complicated license/key management strategies. This paper investigates the potential of multimedia authentication from a brand new angle by employing hardware-based security primitives, such as physical unclonable functions (PUFs). We show that the hardware security approach is not only capable of accomplishing the authentication for both the hardware device and the multimedia stream but, more importantly, introduce minimum performance, resource, and power overhead. We justify our approach using a prototype PUF implementation on Xilinx FPGA boards. Our experimental results on the real hardware demonstrate the high security and low overhead in multimedia authentication obtained by using hardware security approaches.
The threat of inserting malicious logic in hardware design is increasing as the digital supply chains are becoming more deep and span the whole globe. Ring oscillators (ROs) can be used to detect deviations of circuit operations due to changes of its layout caused by the insertion of a hardware Trojan horse (Trojan). The placement and the length of the ring oscillator are two important parameters that define an RO sensitivity and capability to detect malicious alternations. We propose and study the use of ring oscillators with variable lengths, configurable at the runtime. Such oscillators push further the envelope for the attackers, as their design must be undetectable by all supported lengths. We study the efficiency of our proposal on defending against a family of hardware Trojans against an implementation of the AES cryptographic algorithm on an FPGA.
Wearable personal health monitoring systems can offer a cost effective solution for human healthcare. These systems must provide both highly accurate, secured and quick processing and delivery of vast amount of data. In addition, wearable biomedical devices are used in inpatient, outpatient, and at home e-Patient care that must constantly monitor the patient's biomedical and physiological signals 24/7. These biomedical applications require sampling and processing multiple streams of physiological signals with strict power and area footprint. The processing typically consists of feature extraction, data fusion, and classification stages that require a large number of digital signal processing and machine learning kernels. In response to these requirements, in this paper, a low-power, domain-specific many-core accelerator named Power Efficient Nano Clusters (PENC) is proposed to map and execute the kernels of these applications. Experimental results show that the manycore is able to reduce energy consumption by up to 80% and 14% for DSP and machine learning kernels, respectively, when optimally parallelized. The performance of the proposed PENC manycore when acting as a coprocessor to an Intel Atom processor is compared with existing commercial off-the-shelf embedded processing platforms including Intel Atom, Xilinx Artix-7 FPGA, and NVIDIA TK1 ARM-A15 with GPU SoC. The results show that the PENC manycore architecture reduces the energy by as much as 10X while outperforming all off-the-shelf embedded processing platforms across all studied machine learning classifiers.