Bibliography
Nowadays, data security is becoming more and more important, as people store a great deal of personal information on their phones. This stored information must be kept secure and its privacy maintained. Encryption algorithms have become the main means of maintaining data security; thus, algorithmic complexity and encryption efficiency have become the main measures of whether an encryption algorithm is safe. With the development of hardware, many tools are now available for improving such algorithms. Because the modular exponentiation in the RSA algorithm can be divided into several parts mathematically, in this paper we introduce an approach that divides the encryption process and offloads the resulting model to the graphics processing unit (GPU). By exploiting the GPU's capacity for parallel computing, the core of RSA can be accelerated using the central processing unit (CPU) and GPU together. Compute Unified Device Architecture (CUDA) is a platform that combines the CPU and GPU to enable GPU parallel programming, and it is the tool we use in our experiments on accelerating the RSA algorithm. This paper also builds a mathematical model to help explain the mechanism of the RSA encryption algorithm.
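For context, the operation this paper decomposes is RSA's modular exponentiation (c = m^e mod n). Below is a minimal serial square-and-multiply sketch in C, using 64-bit toy operands rather than the multi-word big integers of real RSA; the paper's actual CUDA decomposition across GPU threads is not detailed in the abstract and is not reproduced here.

    #include <stdint.h>

    /* Minimal sketch: right-to-left square-and-multiply modular
       exponentiation, the core RSA operation (c = m^e mod n).
       The 64-bit operands and the unsigned __int128 intermediate
       (a GCC/Clang extension) are illustrative simplifications;
       real RSA uses multi-word big-integer arithmetic. */
    static uint64_t mod_exp(uint64_t base, uint64_t exp, uint64_t mod)
    {
        uint64_t result = 1;
        base %= mod;
        while (exp > 0) {
            if (exp & 1)   /* this exponent bit is set: multiply it in */
                result = (uint64_t)((unsigned __int128)result * base % mod);
            base = (uint64_t)((unsigned __int128)base * base % mod);  /* square */
            exp >>= 1;
        }
        return result;
    }

Each squaring step depends on the previous one, so the parallel speedup the paper targets must come from splitting the computation mathematically (the division into parts mentioned in the abstract) rather than from naively parallelizing this loop.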
In this paper we examine the use of covert channels based on CPU load in order to achieve persistent user identification across browser sessions. In particular, we demonstrate that an HTML5 video, a GIF image, or CSS animations on a webpage can be used to force the CPU to produce a sequence of distinct load levels, even without JavaScript or any client-side code. These load levels can then be captured either by another browsing session, running on the same or a different browser in parallel to the browsing session we want to identify, or by a malicious app installed on the device. To obtain a good estimate of the CPU load caused by the target session, the receiver can observe system statistics about CPU activity (app), or constantly measure the time it takes to execute a known code segment (app and browser). Furthermore, for mobile devices we propose a sensor-based approach to estimating the CPU load, which exploits disturbances in the magnetometer sensor data caused by high CPU activity. Captured loads can be decoded and translated into an identifying bit string, which is transmitted back to the attacker. Due to the way the loads are produced, these methods are applicable even in highly restrictive browsers, such as the Tor Browser, and run unnoticed by the end user. Therefore, unlike existing ways of web tracking, our methods circumvent most of the existing countermeasures, as they store the identifying information outside the browsing session being targeted. Finally, we thoroughly evaluate and assess each presented method of generating and receiving the signal, and provide an overview of potential countermeasures.
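To illustrate the timing-based receiver described above (measuring a known code segment), the following is a hypothetical C sketch: it repeatedly times a fixed busy loop and emits a '1' whenever the measurement is slowed by high CPU load from the sender. The workload size, sampling scheme, and threshold are assumptions for illustration, not values from the paper.

    #include <stdio.h>
    #include <time.h>

    /* Hypothetical receiver sketch: the time needed to run a fixed,
       known code segment rises when another process (the sender's
       animation) loads the CPU.  The loop size and threshold below
       are illustrative assumptions requiring calibration in practice. */
    static double time_known_segment(void)
    {
        struct timespec t0, t1;
        volatile unsigned long sink = 0;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (unsigned long i = 0; i < 5000000UL; i++)  /* fixed workload */
            sink += i;
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    }

    int main(void)
    {
        const double threshold = 0.05;   /* assumed calibration value */
        for (int i = 0; i < 64; i++) {   /* sample one bit per period */
            double elapsed = time_known_segment();
            putchar(elapsed > threshold ? '1' : '0');  /* high load = 1 */
        }
        putchar('\n');
        return 0;
    }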
To solve the problems associated with real-time processing of large data volumes, heterogeneous systems using various computing devices are increasingly used. A characteristic of this class of problems is that there are two directions for improving real-time data analysis methods: the first is the development of algorithms and analysis approaches, and the second is the development of hardware and software. This article reviews the main approaches to the architecture of a hardware-software solution for traffic capture and deep packet inspection (DPI) in data transmission networks with a bandwidth of 80 Gbit/s and higher. At present, the available software and hardware tools allow the architecture of a capture and deep packet inspection system to be designed in three ways: 1) using only the central processing unit (CPU); 2) using only the graphics processing unit (GPU); 3) using the central processing unit and graphics processing unit together (CPU + GPU). In this paper, we consider these key approaches. Attention is also paid to both the hardware and software requirements of each architecture. Pain points and remedies are described.
This paper introduces a newly developed Object-Oriented Open Software Architecture designed to support security applications while leveraging the capabilities offered by dedicated Open Hardware devices. Specifically, we target the SEcube™ platform, an Open Hardware security platform based on a 3D SiP (System in Package) designed and produced by Blu5 Group. The platform integrates three components employed for security in a single package: a Cortex-M4 CPU, an FPGA, and an EAL5+ certified Smart Card. The Open Software Architecture targets both the host machine and the security device, together with secure communication between them. To maximize its usability, this architecture is organized into several abstraction layers, ranging from hardware interfaces to device drivers, and from security APIs to advanced applications such as secure messaging and data protection. We aim to release a multi-platform Open Source security framework in which software and hardware cooperate to hide classical security concepts, such as cryptographic algorithms and keys, from both developers and end users, focusing instead on common operational security concepts like groups and policies.
Energy-efficient High-Performance Computing (HPC) is becoming increasingly important. Recent ventures into this space have introduced an unlikely candidate for achieving exascale scientific computing with a small energy footprint: ARM processors and embedded GPU accelerators, originally developed for energy efficiency in mobile devices where battery life is critical, are being repurposed and deployed in the next generation of supercomputers. Unfortunately, the performance of executing scientific workloads on many of these devices is largely unknown, yet the bulk of the computation required in high-performance supercomputers is scientific. We present an analysis of one such scientific code, in the form of Gaussian elimination, and evaluate both the execution time and the energy used on a range of embedded accelerator SoCs, including three ARM CPUs and two mobile GPUs. Understanding how these low-power devices perform on scientific workloads will be critical in selecting appropriate hardware for these supercomputers, for how can we estimate the performance of tens of thousands of these chips if the performance of one is largely unknown?
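As a point of reference for the benchmarked kernel, below is a minimal serial Gaussian elimination (forward elimination with partial pivoting) in C. It shows the structure of the computation only; the ARM and mobile-GPU implementations evaluated in the paper are not reproduced here.

    #include <math.h>

    /* Minimal serial sketch of the Gaussian elimination kernel named
       above: forward elimination with partial pivoting on an n x n
       system, with the matrix stored row-major in a[] and the
       right-hand side in b[].  Back-substitution is omitted for brevity. */
    static void gaussian_eliminate(double *a, double *b, int n)
    {
        for (int k = 0; k < n; k++) {
            /* partial pivoting: bring the largest |a[i][k]| to row k */
            int pivot = k;
            for (int i = k + 1; i < n; i++)
                if (fabs(a[i * n + k]) > fabs(a[pivot * n + k]))
                    pivot = i;
            if (pivot != k) {
                for (int j = 0; j < n; j++) {
                    double tmp = a[k * n + j];
                    a[k * n + j] = a[pivot * n + j];
                    a[pivot * n + j] = tmp;
                }
                double tmp = b[k]; b[k] = b[pivot]; b[pivot] = tmp;
            }
            /* eliminate column k below the pivot row */
            for (int i = k + 1; i < n; i++) {
                double factor = a[i * n + k] / a[k * n + k];
                for (int j = k; j < n; j++)
                    a[i * n + j] -= factor * a[k * n + j];
                b[i] -= factor * b[k];
            }
        }
    }

The O(n^3) inner update loops are what dominate execution time and energy in a benchmark of this kind, which is why the kernel is a natural target for accelerator offload.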