Biblio

Filters: Keyword is microprocessor chips
2019-12-17
Huang, Bo-Yuan, Ray, Sayak, Gupta, Aarti, Fung, Jason M., Malik, Sharad.  2018.  Formal Security Verification of Concurrent Firmware in SoCs Using Instruction-Level Abstraction for Hardware. 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC). :1-6.

Formal security verification of firmware interacting with hardware in modern Systems-on-Chip (SoCs) is a critical research problem. This faces the following challenges: (1) design complexity and heterogeneity, (2) semantics gaps between software and hardware, (3) concurrency between firmware/hardware and between Intellectual Property Blocks (IPs), and (4) expensive bit-precise reasoning. In this paper, we present a co-verification methodology to address these challenges. We model hardware using the Instruction-Level Abstraction (ILA), capturing firmware-visible behavior at the architecture level. This enables integrating hardware behavior with firmware in each IP into a single thread. The co-verification with multiple firmware across IPs is formulated as a multi-threaded program verification problem, for which we leverage software verification techniques. We also propose an optimization using abstraction to prevent expensive bit-precise reasoning. The evaluation of our methodology on an industry SoC Secure Boot design demonstrates its applicability in SoC security verification.

2019-05-01
Gundabolu, S., Wang, X..  2018.  On-chip Data Security Against Untrustworthy Software and Hardware IPs in Embedded Systems. 2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI). :644–649.

State-of-the-art system-on-chip (SoC) field programmable gate arrays (FPGAs) integrate hard powerful ARM processor cores and the reconfigurable logic fabric on a single chip in addition to many commonly needed high performance and high-bandwidth peripherals. The increasing reliance on untrustworthy third-party IP (3PIP) cores, including both hardware and software in FPGA-based embedded systems has made the latter increasingly vulnerable to security attacks. Detection of trojans in 3PIPs is extremely difficult to current static detection methods since there is no golden reference model for 3PIPs. Moreover, many FPGA-based embedded systems do not have the support of security services typically found in operating systems. In this paper, we present our run-time, low-cost, and low-latency hardware and software based solution for protecting data stored in on-chip memory blocks, which has attracted little research attention. The implemented memory protection design consists of a hierarchical top-down structure and controls memory access from software IPs running on the processor and hardware IPs running in the FPGA, based on a set of rules or access rights configurable at run time. Additionally, virtual addressing and encryption of data for each memory help protect confidentiality of data in case of a failure of the memory protection unit, making it hard for the attacker to gain access to the data stored in the memory. The design is implemented and tested on the Intel (Altera) DE1-SoC board featuring a SoC FPGA that integrates a dual-core ARM processor with reconfigurable logic and hundreds of memory blocks. The experimental results and case studies show that the protection model is successful in eliminating malicious IPs from the system without need for reconfiguration of the FPGA. It prevents unauthorized accesses from untrusted IPs, while arbitrating access from trusted IPs generating legal memory requests, without incurring a serious area or latency penalty.
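
The run-time configurable access rights described in this abstract can be illustrated with a minimal sketch. It is not the authors' design: the class name `MemoryProtectionUnit`, the permission bits, and the master/block identifiers are all hypothetical, and the hierarchy, virtual addressing, and encryption mentioned in the abstract are left out.

```python
from dataclasses import dataclass, field

# Hypothetical permission bits; the paper does not specify an encoding.
READ = 0b01
WRITE = 0b10


@dataclass
class MemoryProtectionUnit:
    """Toy rule table: (master, block) -> permission bitmask."""
    rules: dict = field(default_factory=dict)

    def grant(self, master: str, block: str, perms: int) -> None:
        # Rules can be added or changed while the system is running.
        self.rules[(master, block)] = perms

    def revoke(self, master: str, block: str) -> None:
        # Cutting off a misbehaving IP needs no reconfiguration, only a rule change.
        self.rules.pop((master, block), None)

    def check(self, master: str, block: str, op: int) -> bool:
        # Default deny: an unknown (master, block) pair has no rights.
        return bool(self.rules.get((master, block), 0) & op)


mpu = MemoryProtectionUnit()
mpu.grant("cpu_task", "bram0", READ | WRITE)
mpu.grant("hw_ip_1", "bram0", READ)
```

Under these assumptions, `mpu.check("hw_ip_1", "bram0", WRITE)` is denied while the same IP's reads are allowed, and calling `mpu.revoke("hw_ip_1", "bram0")` removes its access entirely.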

2019-01-21
Zhang, Z., Li, Z., Xia, C., Cui, J., Ma, J..  2018.  H-Securebox: A Hardened Memory Data Protection Framework on ARM Devices. 2018 IEEE Third International Conference on Data Science in Cyberspace (DSC). :325–332.

ARM devices (mobile phones, IoT devices) are becoming more popular in daily life due to their low power consumption and cost. These devices carry a large amount of users' private information, which attracts attackers' attention and increases the security risk. Operating systems (e.g., Android, Linux) implement many memory data protection strategies for users' private information. However, a monolithic OS may contain security vulnerabilities that an attacker can exploit to gain root or even kernel privilege. Once the attacker obtains kernel privilege, all data protection strategies are defeated and users' private information can be stolen. In this paper, we propose a hardened memory data protection framework called H-Securebox to defeat kernel-level memory data theft attacks. H-Securebox leverages the ARM hardware virtualization technique to protect data in memory with hypervisor privilege. We designed three types of H-Securebox for program developers to use. Although the attacker may have kernel privilege, she cannot touch private data inside H-Securebox, since hypervisor privilege is higher than kernel privilege. With the H-Securebox system implemented with the assistance of a tiny hypervisor on a Raspberry Pi 2 development board, we measure the performance overhead of our system and perform security evaluations. The results show that the overhead is negligible and that a malicious application with root or kernel privilege cannot access the private data protected by our system.

2018-12-10
Shathanaa, R., Ramasubramanian, N..  2018.  Improving Power & Latency Metrics for Hardware Trojan Detection During High Level Synthesis. 2018 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT). :1–7.

The globalization and outsourcing of the semiconductor industry have raised serious concerns about the trustworthiness of hardware. Importing third-party IP cores into integrated circuit design has opened the gates to new forms of attack on hardware. Hardware Trojans embedded in third-party IPs have necessitated a secure IC design process. Design-for-Trust techniques aimed at detecting Hardware Trojans come with overhead in terms of area, latency and power consumption. In this work, we present a Cuckoo Search algorithm based Design Space Exploration process for finding low-cost hardware solutions during High Level Synthesis. The exploration is conducted with respect to datapath resource allocation for single and nested loops. The proposed algorithm is compared with existing Hardware Trojan detection mechanisms, and experimental results show that it achieves a 3x improvement in cost compared to existing algorithms.

2018-02-21
Grgić, K., Kovačevic, Z., Čik, V. K..  2017.  Performance analysis of symmetric block cryptosystems on Android platform. 2017 International Conference on Smart Systems and Technologies (SST). :155–159.

The symmetric block ciphers, which represent a core element for building cryptographic communications systems and protocols, are used in providing message confidentiality, authentication and integrity. Various limitations in hardware and software resources, especially in terminal devices used in mobile communications, affect the selection of appropriate cryptosystem and its parameters. In this paper, an implementation of three symmetric ciphers (DES, 3DES, AES) used in different operating modes are analyzed on Android platform. The cryptosystems' performance is analyzed in different scenarios using several variable parameters: cipher, key size, plaintext size and number of threads. Also, the influence of parallelization supported by multi-core CPUs on cryptosystem performance is analyzed. Finally, some conclusions about the parameter selection for optimal efficiency are given.

Bai, Xu, Jiang, Lei, Dai, Qiong, Yang, Jiajia, Tan, Jianlong.  2017.  Acceleration of RSA processes based on hybrid ARM-FPGA cluster. 2017 IEEE Symposium on Computers and Communications (ISCC). :682–688.

Cooperation of software and hardware on hybrid architectures, such as the Xilinx Zynq SoC combining an ARM CPU and FPGA fabric, provides a high-performance and low-power platform for accelerating the RSA algorithm. This paper adopts the none-subtraction Montgomery algorithm and the Chinese Remainder Theorem (CRT) to implement high-speed RSA processors, and deploys a 48-node cluster infrastructure based on the Zynq SoC to achieve extremely high scalability and throughput of RSA computing. In this design, we use the ARM to implement node-to-node communication with the Message Passing Interface (MPI) while using the FPGA to handle complex calculation. The experimental results show that the overall performance is linear with the number of nodes. The cluster achieves a 6× to 9× speedup against a multi-core desktop (Intel i7-3770) and comparable performance to a many-core server (288-core). In addition, we gain up to 2.5× energy efficiency compared to these two traditional platforms.
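
The CRT step named in this abstract can be sketched in a few lines. This is only an illustration with toy parameters (p = 61, q = 53, e = 17 are assumptions chosen for readability, far too small for real use), and Python's built-in `pow` stands in for the Montgomery multiplier that the paper implements in hardware.

```python
def rsa_crt_decrypt(c: int, p: int, q: int, d: int) -> int:
    """RSA decryption via the Chinese Remainder Theorem (Garner recombination)."""
    dp = d % (p - 1)          # reduced private exponent modulo p - 1
    dq = d % (q - 1)          # reduced private exponent modulo q - 1
    q_inv = pow(q, -1, p)     # q^{-1} mod p (requires Python 3.8+)
    m1 = pow(c, dp, p)        # half-size exponentiation modulo p
    m2 = pow(c, dq, q)        # half-size exponentiation modulo q
    h = (q_inv * (m1 - m2)) % p
    return m2 + h * q         # unique result in [0, p*q)


# Toy key, for illustration only.
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))

message = 65
cipher = pow(message, e, n)
recovered = rsa_crt_decrypt(cipher, p, q, d)
```

The point of the technique is that the two exponentiations work on operands half the size of n and are independent of each other, so they can run in parallel.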

2018-02-02
Bruel, P., Chalamalasetti, S. R., Dalton, C., Hajj, I. El, Goldman, A., Graves, C., Hwu, W. m, Laplante, P., Milojicic, D., Ndu, G. et al..  2017.  Generalize or Die: Operating Systems Support for Memristor-Based Accelerators. 2017 IEEE International Conference on Rebooting Computing (ICRC). :1–8.

The deceleration of transistor feature size scaling has motivated growing adoption of specialized accelerators implemented as GPUs, FPGAs, ASICs, and more recently new types of computing such as neuromorphic, bio-inspired, ultra low energy, reversible, stochastic, optical, quantum, combinations, and others unforeseen. There is a tension between specialization and generalization, with the current state trending to master slave models where accelerators (slaves) are instructed by a general purpose system (master) running an Operating System (OS). Traditionally, an OS is a layer between hardware and applications and its primary function is to manage hardware resources and provide a common abstraction to applications. Does this function, however, apply to new types of computing paradigms? This paper revisits OS functionality for memristor-based accelerators. We explore one accelerator implementation, the Dot Product Engine (DPE), for a select pattern of applications in machine learning, imaging, and scientific computing and a small set of use cases. We explore typical OS functionality, such as reconfiguration, partitioning, security, virtualization, and programming. We also explore new types of functionality, such as precision and trustworthiness of reconfiguration. We claim that making an accelerator, such as the DPE, more general will result in broader adoption and better utilization.

2018-01-23
Karam, R., Hoque, T., Ray, S., Tehranipoor, M., Bhunia, S..  2017.  MUTARCH: Architectural diversity for FPGA device and IP security. 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC). :611–616.

Field Programmable Gate Arrays (FPGAs) are being increasingly deployed in diverse applications including the emerging Internet of Things (IoT), biomedical, and automotive systems. However, security of the FPGA configuration file (i.e. bitstream), especially during in-field reconfiguration, as well as effective safeguards against unauthorized tampering and piracy during operation, are notably lacking. The current practice of bitstream encryption is only available in high-end FPGAs, incurs unacceptably high overhead for area/energy-constrained devices, and is susceptible to side channel attacks. In this paper, we present a fundamentally different and novel approach to FPGA security that can protect against all major attacks on FPGA, namely, unauthorized in-field reprogramming, piracy of FPGA intellectual property (IP) blocks, and targeted malicious modification of the bitstream. Our approach applies the security through diversity principle, which is often used in the software domain, to FPGAs. We make each device architecturally different from the others using both physical (static) and logical (time-varying) configuration keys, ensuring that attackers cannot use a priori knowledge about one device to mount an attack on another. It therefore mitigates the economic motivation for attackers to reverse engineer the bitstream and IP. The approach is compatible with modern remote upgrade techniques, and requires only small modifications to existing FPGA tool flows, making it an attractive addition to the FPGA security suite. Our experimental results show that the proposed approach achieves provably high security against tampering and piracy with worst-case 14% latency overhead and 13% area overhead.

2017-12-28
Panetta, J., Filho, P. R. P. S., Laranjeira, L. A. F., Teixeira, C. A..  2017.  Scalability of CPU and GPU Solutions of the Prime Elliptic Curve Discrete Logarithm Problem. 2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD). :33–40.

Elliptic curve asymmetric cryptography has achieved increased popularity due to its capability of providing levels of security comparable to other existing cryptographic systems while requiring less computational work. Pollard Rho and Parallel Collision Search, the fastest known sequential and parallel algorithms for breaking this cryptographic system, have been successfully applied over time to break ever-increasing bit-length system instances using implementations heavily optimized for the available hardware. This work presents portable, general implementations of a Parallel Collision Search based solution for prime elliptic curve asymmetric cryptographic systems that use publicly available big integer libraries and make no assumption on prime curve properties. It investigates which bit-length keys can be broken in reasonable time by a user who has access to state-of-the-art, public HPC equipment with CPUs and GPUs. The final implementation breaks a 79-bit system in about two hours using 80 GPUs and a 94-bit system in about 15 hours using 256 GPUs. Extensive experimentation investigates scalability of CPU, GPU and CPU+GPU runs. The discussed results indicate that speed-up is not a good metric for parallel scalability. This paper proposes and evaluates a new metric that is better suited for this task.

2017-12-04
Johnston, B., Lee, B., Angove, L., Rendell, A..  2017.  Embedded Accelerators for Scientific High-Performance Computing: An Energy Study of OpenCL Gaussian Elimination Workloads. 2017 46th International Conference on Parallel Processing Workshops (ICPPW). :59–68.

Energy efficient High-Performance Computing (HPC) is becoming increasingly important. Recent ventures into this space have introduced an unlikely candidate to achieve exascale scientific computing hardware with a small energy footprint. ARM processors and embedded GPU accelerators originally developed for energy efficiency in mobile devices, where battery life is critical, are being repurposed and deployed in the next generation of supercomputers. Unfortunately, the performance of executing scientific workloads on many of these devices is largely unknown, yet the bulk of computation required in high-performance supercomputers is scientific. We present an analysis of one such scientific code, in the form of Gaussian Elimination, and evaluate both execution time and energy used on a range of embedded accelerator SoCs. These include three ARM CPUs and two mobile GPUs. Understanding how these low power devices perform on scientific workloads will be critical in the selection of appropriate hardware for these supercomputers, for how can we estimate the performance of tens of thousands of these chips if the performance of one is largely unknown?

2017-11-27
Yi, Su-Wen, Li, Wei, Dai, Zi-Bin, Liu, Jun-Wei.  2016.  A compact and efficient architecture for elliptic curve cryptographic processor. 2016 13th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT). :1276–1280.

In this paper, a dual-field elliptic curve cryptographic processor is proposed to support arbitrary curves of up to 576 bits in both fields. In addition, two heterogeneous function units are coupled with the processor for parallel finite-field operations, based on an analysis of the characteristics of elliptic curve cryptographic algorithms. To reduce hardware complexity, clustering technology is adopted in the processor. Finally, a fast Montgomery modular division algorithm and its implementation are proposed, based on Kaliski's Montgomery modular inversion. Using UMC 90-nm CMOS 1P9M technology, the proposed processor occupies 0.86 mm² and can perform scalar multiplication in 0.34 ms in GF(p160) and 0.22 ms in GF(2^160), respectively. Compared to other elliptic curve cryptographic processors, our design is advantageous in hardware efficiency and speed moderation.

2017-03-08
Saxena, U., Bachhan, O. P., Majumdar, R..  2015.  Static and dynamic malware behavioral analysis based on arm based board. 2015 2nd International Conference on Computing for Sustainable Global Development (INDIACom). :272–277.

A honeypot is a trap set to detect attempts at unauthorized use of information systems. But setting up honeypots and keeping them running 24x7 is rather expensive, and there is always a risk that a skillful hacker or dangerous malware may break through and compromise the whole system. As the name suggests, a honeypot is a pot full of honey that lures bees; in a network scenario, a honeypot is a valuable tool that lures attackers. It helps to detect and analyze malicious activity on a network. However, honeypots used by commercial organizations do not share data, and large honeypots provide read-only data. We propose an ARM-based device having all the capabilities of honeypots to lure attackers. Current honeypots are based on large networks, but we aim to make a cost-effective device that can be deployed in a small network. This research helps us to build a device, based on an ARM board and CCFIS software, that is easy to install and cost-effective. The CCFIS Sensor helps us to capture malware and analyze the attack. We reverse engineered honeypots to learn how they capture malware. During reverse engineering we learned the pros and cons of honeypots, which are addressed in the CCFIS Sensor. After completing the device, we compared honeypots and the CCFIS Sensor to check the effectiveness of the device.

Voyiatzis, I., Sgouropoulou, C., Estathiou, C..  2015.  Detecting untestable hardware Trojan with non-intrusive concurrent on line testing. 2015 10th International Conference on Design Technology of Integrated Systems in Nanoscale Era (DTIS). :1–2.

Hardware Trojans are an emerging threat that intrudes in the design and manufacturing cycle of chips and has gained much attention lately due to the severity of the problems it poses to the chip supply chain. Typically, hardware Trojans are not detected during the usual manufacturing testing because they are activated as an effect of a rare event. A class of published HTs is based on the geometrical characteristics of the circuit and is claimed to be undetectable, in the sense that their activation cannot be detected. In this work we study the effect of continuously monitoring the inputs of the module under test with respect to the detection of HTs possibly inserted in the module, either in the design or the manufacturing stage.

Reis, R..  2015.  Trends on EDA for low power. 2015 IEEE MTT-S International Conference on Numerical Electromagnetic and Multiphysics Modeling and Optimization (NEMO). :1–4.

One of the main issues in the design of modern integrated circuits is power reduction. Mainly in digital circuits, the power consumption was defined by the dynamic power consumption, during decades. But in the new NanoCMOs technologies, the static power due to the leakage current is becoming the main issue in power consumption. As the leakage power is related to the amount of components, it is becoming mandatory to reduce the amount of transistors in any type of design, to reduce power consumption. So, it is important to obtain new EDA algorithms and tools to optimize the amount of components (transistors). It is also needed tools for the layout design automation that are able to design any network of components that is provided by an optimization tool that is able to reduce the size of the network of components. It is presented an example of a layout design automation tool that can do the layout of any network of transistors using transistors of any size. Another issue for power optimization is the use of tools and algorithms for gate sizing. The designer can manage the sizing of transistors to reduce power consumption, without compromising the clock frequency. There are two types of gate sizing, discrete gate sizing and continuous gate sizing. The discrete gate sizing tools are used when it is being used a cell library that has only few available sizes for each cell. The continuous gate sizing considers that the EDA tool can define any transistor sizing. In this case, the designer needs to have a layout design tool able to do the layout of transistors with any size. It will be presented the winner tools of the ISPD Contest 2012 and 2013. Also, it will be discussed the inclusion of our gate sizing algorithms in an industrial flow used to design state-of-the-art microprocessors. Another type of EDA tool that is becoming more and more useful is the visualization tools that provide an animated visual output of the running of EDA tools. This kind of tools is very useful to show to the tool developers how the tool is running. So, the EDA developers can use this information to improve the algorithms used in an EDA Tool.
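
The distinction this abstract draws between dynamic and static power can be written with the usual first-order CMOS model. The abstract gives no formulas, so the symbols below are assumptions: $\alpha$ is the switching activity factor, $C$ the switched capacitance, $V_{DD}$ the supply voltage, $f$ the clock frequency, and $I_{leak}$ the total leakage current.

```latex
P_{\text{total}} = P_{\text{dyn}} + P_{\text{static}}, \qquad
P_{\text{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f, \qquad
P_{\text{static}} \approx I_{\text{leak}} \, V_{DD}
```

In this approximation $I_{leak}$ is the sum of the leakage of the individual transistors, which is consistent with the abstract's argument that reducing the transistor count reduces leakage power.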

2017-02-27
Kainth, M., Krishnan, L., Narayana, C., Virupaksha, S. G., Tessier, R..  2015.  Hardware-assisted code obfuscation for FPGA soft microprocessors. 2015 Design, Automation Test in Europe Conference Exhibition (DATE). :127–132.

Soft microprocessors are vital components of many embedded FPGA systems. As the application domain for FPGAs expands, the security of the software used by soft processors increases in importance. Although software confidentiality approaches (e.g. encryption) are effective, code obfuscation is known to be an effective enhancement that further deters code understanding for attackers. The availability of specialization in FPGAs provides a unique opportunity for code obfuscation on a per-application basis with minimal hardware overhead. In this paper we describe a new technique to obfuscate soft microprocessor code which is located outside the FPGA chip in an unprotected area. Our approach provides customizable, data-dependent control flow modification to make it difficult for attackers to easily understand program behavior. The application of the approach to three benchmarks illustrates a control flow cyclomatic complexity increase of about 7× with a modest logic overhead for the soft processor.
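
The cyclomatic complexity metric cited in this abstract is commonly computed from a control flow graph with McCabe's formula M = E − N + 2P (edges, nodes, connected components). The sketch below assumes that standard definition; the abstract does not say how the authors measured it, and the node names are hypothetical.

```python
def cyclomatic_complexity(edges, num_components=1):
    """McCabe's M = E - N + 2P for a control flow graph given as (src, dst) pairs."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * num_components


# Straight-line code: a single path through the graph.
straight = [("entry", "a"), ("a", "exit")]

# One if/else: two independent paths.
branch = [("entry", "then"), ("entry", "else"), ("then", "exit"), ("else", "exit")]

# One loop: the back edge adds a second independent path.
loop = [("entry", "cond"), ("cond", "body"), ("body", "cond"), ("cond", "exit")]
```

Every added conditional branch or back edge raises M by one, which is why control flow modification of the kind described here shows up directly in this metric.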

2017-02-13
S. V. Trivedi, M. A. Hasamnis.  2015.  "Development of platform using NIOS II soft core processor for image encryption and decryption using AES algorithm". 2015 International Conference on Communications and Signal Processing (ICCSP). :1147-1151.

In our digital world internet is a widespread channel for transmission of information. Information that is transmitted can be in form of messages, images, audios and videos. Due to this escalating use of digital data exchange cryptography and network security has now become very important in modern digital communication network. Cryptography is a method of storing and transmitting data in a particular form so that only those for whom it is intended can read and process it. The term cryptography is most often associated with scrambling plaintext into ciphertext. This process is called as encryption. Today in industrial processes images are very frequently used, so it has become essential for us to protect the confidential image data from unauthorized access. In this paper Advanced Encryption Standard (AES) which is a symmetric algorithm is used for encryption and decryption of image. Performance of Advanced Encryption Standard algorithm is further enhanced by adding a key stream generator W7. NIOS II soft core processor is used for implementation of encryption and decryption algorithm. A system is designed with the help of SOPC (System on programmable chip) builder tool which is available in QUARTUS II (Version 10.1) environment using NIOS II soft core processor. Developed single core system is implemented using Altera DE2 FPGA board (Cyclone II EP2C35F672). Using MATLAB the image is read and then by using DWT (Discrete Wavelet Transform) the image is compressed. The image obtained after compression is now given as input to proposed AES encryption algorithm. The output of encryption algorithm is given as input to decryption algorithm in order to get back the original image. The implementation of which is done on the developed single core platform using NIOS II processor. Finally the output is analyzed in MATLAB by plotting histogram of original and encrypted image.

2015-04-30
Zheng, J.X., Dongfang Li, Potkonjak, M..  2014.  A secure and unclonable embedded system using instruction-level PUF authentication. Field Programmable Logic and Applications (FPL), 2014 24th International Conference on. :1-4.

In this paper we present a secure and unclonable embedded system design that can target either an FPGA or an ASIC technology. The premise of the security is that the executed machine code and the executing environment (the embedded processor) will authenticate each other at a per-instruction basis using Physical Unclonable Functions (PUFs) that are built into the processor. The PUFs ensure that the execution of the binary code may only proceed if the binary is compiled with the correct intrinsic knowledge of the PUFs, and that such intrinsic knowledge is virtually unique to each processor and therefore unclonable. We will explain how to implement and integrate the PUFs into the processor's execution environment such that each instruction is authenticated and de-obfuscated on-demand and how to transform an ordinary binary executable into PUF-aware, obfuscated binaries. We will also present a prototype system on a Xilinx Spartan6-based FPGA board.
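
The idea of binding a binary to one processor can be illustrated with a toy model. This is not the authors' scheme: a keyed hash stands in for the physical PUF (a real PUF derives its response from manufacturing variation, not a stored secret), and the XOR masking, function names, and instruction words are all hypothetical.

```python
import hashlib
import hmac


def simulated_puf(device_secret: bytes, challenge: int) -> int:
    """Stand-in for a hardware PUF: maps a challenge to a 32-bit device-specific response."""
    digest = hmac.new(device_secret, challenge.to_bytes(4, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def obfuscate(program, device_secret):
    """'Compile' step: mask each 32-bit instruction word with the response for its address."""
    return [word ^ simulated_puf(device_secret, addr) for addr, word in enumerate(program)]


def fetch_and_deobfuscate(binary, device_secret):
    """'Execute' step: the processor unmasks each word on demand as it is fetched."""
    return [word ^ simulated_puf(device_secret, addr) for addr, word in enumerate(binary)]


program = [0x00A00093, 0x00B00113, 0x002081B3]  # arbitrary example words
binary = obfuscate(program, b"device-A")
on_device_a = fetch_and_deobfuscate(binary, b"device-A")
on_device_b = fetch_and_deobfuscate(binary, b"device-B")
```

In this model only the device whose responses were used at compile time recovers the original instruction stream; any other device decodes different words.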