Biblio
The IEEE Std. 1687 (IJTAG) was designed to provide on-chip access to the various embedded instruments (e.g. built-in self test, sensors, etc.) in complex system-on-chip designs. IJTAG facilitates access to on-chip instruments from third party intellectual property providers with hidden test-data registers. Although access to on-chip instruments provides valuable data specifically for debug and diagnosis, it can potentially expose the design to untrusted sources and instruments that can sniff and possibly manipulate the data that is being shifted through the IJTAG network. This paper provides a comprehensive protection scheme against data sniffing and data integrity attacks by selectively isolating the data flowing through the IJTAG network. The proposed scheme is modeled as a graph coloring problem to optimize the number of isolation signals required to protect the design. It is shown that combining the proposed approach with other existing schemes can also bolster the security against unauthorized user access as well. The proposed countermeasure is shown to add minimal overhead in terms of area and power consumption.
Concurrency vulnerabilities are extremely harmful and can be frequently exploited to launch severe attacks. Due to the non-determinism of multithreaded executions, it is very difficult to detect them. Recently, data race detectors and techniques based on maximal casual model have been applied to detect concurrency vulnerabilities. However, the former are ineffective and the latter report many false negatives. In this paper, we present CONVUL, an effective tool for concurrency vulnerability detection. CONVUL is based on exchangeable events, and adopts novel algorithms to detect three major kinds of concurrency vulnerabilities. In our experiments, CONVUL detected 9 of 10 known vulnerabilities, while other tools only detected at most 2 out of these 10 vulnerabilities. The 10 vulnerabilities are available at https://github.com/mryancai/ConVul.
A parallel brute force attack on RC4 algorithm based on FPGA (Field Programmable Gate Array) with an efficient style has been presented. The main idea of this design is to use number of forecast keying methods to reduce the overall clock pulses required depended to key searching operation by utilizes on-chip BRAMs (block RAMs) of FPGA for maximizing the total number of key searching unit with taking into account the highest clock rate. Depending on scheme, 32 key searching units and main controller will be used in one Xilinx XC3S1600E-4 FPGA device, all these units working in parallel and each unit will be searching in a specific range of keys, by comparing the current result with the well-known cipher text if its match the found flag signal will change from 0 to 1 and the main controller will receive this signal and stop the searching operation. This scheme operating at 128-MHz clock frequency and gives us key searching speed of 7.7 × 106 keys/sec. Testing all possible keys (40-bits length), requires only around 39.5h.
Real-time clock circuits are widely used in modern electronic systems to provide time information to the systems at the beginning of the system power-on. In this paper, we present two types of Hardware Trojan designs that employ the time information as the trigger conditions. One is a real-time based Trojan, which will attack a system at some specific realworld time. The other is a relative-time based Trojan, which will be triggered when a specific time period passes after the system is powered on. In either case when a Trojan is triggered its payload may corrupt the system or leakage internal information to the outside world. Experimental results show that the extra power consumption, area overhead and delay time are all quite small and thus the detection of the Trojans is difficult by using traditional side-channel detection methods.
Atomic multicast is a communication primitive that delivers messages to multiple groups of processes according to some total order, with each group receiving the projection of the total order onto messages addressed to it. To be scalable, atomic multicast needs to be genuine, meaning that only the destination processes of a message should participate in ordering it. In this paper we propose a novel genuine atomic multicast protocol that in the absence of failures takes as low as 3 message delays to deliver a message when no other messages are multicast concurrently to its destination groups, and 5 message delays in the presence of concurrency. This improves the latencies of both the fault-tolerant version of classical Skeen's multicast protocol (6 or 12 message delays, depending on concurrency) and its recent improvement by Coelho et al. (4 or 8 message delays). To achieve such low latencies, we depart from the typical way of guaranteeing fault-tolerance by replicating each group with Paxos. Instead, we weave Paxos and Skeen's protocol together into a single coherent protocol, exploiting opportunities for white-box optimisations. We experimentally demonstrate that the superior theoretical characteristics of our protocol are reflected in practical performance pay-offs.
The performance of many data security and reliability applications depends on computations in finite fields \$\textbackslashtextGF (2ˆm)\$. In finite field arithmetic, field multiplication is a complex operation and is also used in other operations such as inversion and exponentiation. By considering the application domain needs, a variety of efficient algorithms and architectures are proposed in the literature for field \$\textbackslashtextGF (2ˆm)\$ multiplier. With the rapid emergence of Internet of Things (IoT) and Wireless Sensor Networks (WSN), many resource-constrained devices such as IoT edge devices and WSN end nodes came into existence. The data bus width of these constrained devices is typically smaller. Digit-level architectures which can make use of the full data bus are suitable for these devices. In this paper, we propose a new fully digit-serial polynomial basis finite field \$\textbackslashtextGF (2ˆm)\$ multiplier where both the operands enter the architecture concurrently at digit-level. Though there are many digit-level multipliers available for polynomial basis multiplication in the literature, it is for the first time to propose a fully digit-serial polynomial basis multiplier. The proposed multiplication scheme is based on the multiplication scheme presented in the literature for a redundant basis multiplication. The proposed polynomial basis multiplication results in a high-throughput architecture. This multiplier is applicable for a class of trinomials, and this class of irreducible polynomials is highly desirable for IoT edge devices since it allows the least area and time complexities. The proposed multiplier achieves better throughput when compared with previous digit-level architectures.
Diffie-Hellman and RSA encryption/decryption involve computationally intensive cryptographic operations such as modular exponentiation. Computing modular exponentiation using appropriate pre-computed pairs of bases and exponents was first proposed by Boyko et al. In this paper, we present a reconfigurable architecture for pre-computation methods to compute modular exponentiation and thereby speeding up RSA and Diffie-Hellman like protocols. We choose Diffie-Hellman key pair (a, ga mod p) to illustrate the efficiency of Boyko et al's scheme in hardware architecture that stores pre-computed values ai and corresponding gai in individual block RAM. We use a Pseudo-random number generator (PRNG) to randomly choose ai values that are added and corresponding gai values are multiplied using modular multiplier to arrive at a new pair (a, ga mod p). Further, we present the advantage of using Montgomery and interleaved methods for batch multiplication to optimise time and area. We show that a 1024-bit modular exponentiation can be performed in less than 73$μ$s at a clock rate of 200MHz on a Xilinx Virtex 7 FPGA.
In VLSI industry the design cycle is categorized into Front End Design and Back End Design. Front End Design flow is from Specifications to functional verification of RTL design. Back End Design is from logic synthesis to fabrication of chip. Handheld devices like Mobile SOC's is an amalgamation of many components like GPU, camera, sensor, display etc. on one single chip. In order to integrate these components protocols are needed. One such protocol in the emerging trend is I3C protocol. I3C is abbreviated as Improved Inter Integrated Circuit developed by Mobile Industry Processor Interface (MIPI) alliance. Most probably used for the interconnection of sensors in Mobile SOC's. The main motivation of adapting the standard is for the increase speed and low pin count in most of the hardware chips. The bus protocol is backward compatible with I2C devices. The paper includes detailed study I3C bus protocol and developing verification environment for the protocol. The test bench environment is written and verified using system Verilog and UVM. The Universal Verification Methodology (UVM) is base class library built using System Verilog which provides the fundamental blocks needed to quickly develop reusable and well-constructed verification components and test environments. The Functional Coverage of around 93.55 % and Code Coverage of around 98.89 % is achieved by verification closure.
For aerospace FPGA software products, traditional simulation method faces severe challenges to verify product requirements under complicated scenarios. Given the increasing maturity of formal verification technology, this method can significantly improve verification work efficiency and product design quality, by expanding coverage on those "blind spots" in product design which were not easily identified previously. Taking UART communication as an example, this paper proposes several critical points to use formal verification for asynchronous communication protocol. Experiments and practices indicate that formal verification for asynchronous communication protocol can effectively reduce the time required, ensure a complete verification process and more importantly, achieve more accurate and intuitive results.
With the growth of technology, designs became more complex and may contain bugs. This makes verification an indispensable part in product development. UVM describe a standard method for verification of designs which is reusable and portable. This paper verifies IIC bus protocol using Universal Verification Methodology. IIC controller is designed in Verilog using Vivado. It have APB interface and its function and code coverage is carried out in Mentor graphic Questasim 10.4e. This work achieved 83.87% code coverage and 91.11% functional coverage.
Scan design is a universal design for test (DFT) technology to increase the observability and controllability of the circuits under test by using scan chains. However, it also leads to a potential security problem that attackers can use scan design as a backdoor to extract confidential information. Researchers have tried to address this problem by using secure scan structures that usually have some keys to confirm the identities of users. However, the traditional methods to store intermediate data or keys in memory are also under high risk of being attacked. In this paper, we propose a dynamic-key secure DFT structure that can defend scan-based and memory attacks without decreasing the system performance and the testability. The main idea is to build a scan design key generator that can generate the keys dynamically instead of storing and using keys in the circuit statically. Only specific patterns derived from the original test patterns are valid to construct the keys and hence the attackers cannot shift in any other patterns to extract correct internal response from the scan chains or retrieve the keys from memory. Analysis results show that the proposed method can achieve a very high security level and the security level will not decrease no matter how many guess rounds the attackers have tried due to the dynamic nature of our method.
A hardware Trojan (HT) denotes the malicious addition or modification of circuit elements. The purpose of this work is to improve the HT detection sensitivity in ICs using power side-channel analysis. This paper presents three detection techniques in power based side-channel analysis by increasing Trojan-to-circuit power consumption and reducing the variation effect in the detection threshold. Incorporating the three proposed methods has demonstrated that a realistic fine-grain circuit partitioning and an improved pattern set to increase HT activation chances can magnify Trojan detectability.
Design for Testability (DfT) techniques allow devices to be tested at various levels of the manufacturing process. Scan architecture is a dominantly used DfT technique, which supports a high level of fault coverage, observability and controllability. However, scan architecture can be used by hardware attackers to gain critical information stored within the device. The security threats due to an unrestricted access provided by scan architecture has to be addressed to ensure hardware security. In this work, a solution based on the Clock and Data Recovery (CDR) method has been presented to authenticate users and limit the access to the scan architecture to authorized users. As compared to the available solution the proposed method presents a robust performance and reduces the area overhead by more than 10%.
Embedded electronic devices and sensors such as smartphones, smart watches, medical implants, and Wireless Sensor Nodes (WSN) are making the “Internet of Things” (IoT) a reality. Such devices often require cryptographic services such as authentication, integrity and non-repudiation, which are provided by Public-Key Cryptography (PKC). As these devices are severely resource-constrained, choosing a suitable cryptographic system is challenging. Pairing Based Cryptography (PBC) is among the best candidates to implement PKC in lightweight devices. In this research, we present a fast and energy efficient implementation of PBC based on Barreto-Naehrig (BN) curves and optimal Ate pairing using hardware/software co-design. Our solution consists of a hardware-based Montgomery multiplier, and pairing software running on an ARM Cortex A9 processor in a Zynq-7020 System-on-Chip (SoC). The multiplier is protected against simple power analysis (SPA) and differential power analysis (DPA), and can be instantiated with a variable number of processing elements (PE). Our solution improves performance (in terms of latency) over an open-source software PBC implementation by factors of 2.34 and 2.02, for 256- and 160-bit field sizes, respectively, as measured in the Zynq-7020 SoC.
Hardware Trojans that can be easily embedded in synchronous clock generation circuits typical of what are used in large digital systems are discussed. These Trojans are both visible and transparent. Since they are visible, they will penetrate split-lot manufacturing security methods and their transparency will render existing detection methods ineffective.
Tamper detection circuits provide the first and most important defensive wall in protecting electronic modules containing security data. A widely used procedure is to cover the entire module with a foil containing fine conductive mesh, which detects intrusion attempts. Detection circuits are further classified as passive or active. Passive circuits have the advantage of low power consumption, however they are unable to detect small variations in the conductive mesh parameters. Since modern tools provide an upper leverage over the passive method, the most efficient way to protect security modules is thus to use active circuits. The active tamper detection circuits are typically probing the conductive mesh with short pulses, analyzing its response in terms of delay and shape. The method proposed in this paper generates short pulses at one end of the mesh and analyzes the response at the other end. Apart from measuring pulse delay, the analysis includes a frequency domain characterization of the system, determining whether there has been an intrusion or not, by comparing it to a reference (un-tampered with) spectrum. The novelty of this design is the combined analysis, in time and frequency domains, of the small variations in mesh characteristic parameters.
Caching query results is an efficient technique for Web search engines. A state-of-the-art approach named Static-Dynamic Cache (SDC) is widely used in practice. Replacement policy is the key factor on the performance of cache system, and has been widely studied such as LIRS, ARC, CLOCK, SKLRU and RANDOM in different research areas. In this paper, we discussed replacement policies for static-dynamic cache and conducted the experiments on real large scale query logs from two famous commercial Web search engine companies. The experimental results show that ARC replacement policy could work well with static-dynamic cache, especially for large scale query results cache.
Industrial Control Systems (ICS) are found in critical infrastructure such as for power generation and water treatment. When security requirements are incorporated into an ICS, one needs to test the additional code and devices added do improve the prevention and detection of cyber attacks. Conducting such tests in legacy systems is a challenge due to the high availability requirement. An approach using Timed Automata (TA) is proposed to overcome this challenge. This approach enables assessment of the effectiveness of an attack detection method based on process invariants. The approach has been demonstrated in a case study on one stage of a 6- stage operational water treatment plant. The model constructed captured the interactions among components in the selected stage. In addition, a set of attacks, attack detection mechanisms, and security specifications were also modeled using TA. These TA models were conjoined into a network and implemented in UPPAAL. The models so implemented were found effective in detecting the attacks considered. The study suggests the use of TA as an effective tool to model an ICS and study its attack detection mechanisms as a complement to doing so in a real plant-operational or under design.
The Department of Homeland Security Cyber Security Division (CSD) chose Moving Target Defense as one of the fourteen primary Technical Topic Areas pertinent to securing federal networks and the larger Internet. Moving Target Defense over IPv6 (MT6D) employs an obscuration technique offering keyed access to hosts at a network level without altering existing network infrastructure. This is accomplished through cryptographic dynamic addressing, whereby a new network address is bound to an interface every few seconds in a coordinated manner. The goal of this research is to produce a Register Transfer Level (RTL) network security processor implementation to enable the production of an Application Specific Integrated Circuit (ASIC) variant of MT6D processor for wide deployment. RTL development is challenging in that it must provide system level functions that are normally provided by the Operating System's kernel and supported libraries. This paper presents the architectural design of a hardware engine for MT6D (HE-MT6D) and is complete in simulation. Unique contributions are an inline stream-based network packet processor with a Complex Instruction Set Computer (CISC) architecture, Network Time Protocol listener, and theoretical increased performance over previous software implementations.