Visible to the public Biblio

Filters: Keyword is Instruction sets  [Clear All Filters]
2020-07-27
Gorodnichev, Mikhail G., Kochupalov, Alexander E., Gematudinov, Rinat A..  2018.  Asynchronous Rendering of Texts in iOS Applications. 2018 IEEE International Conference "Quality Management, Transport and Information Security, Information Technologies" (IT QM IS). :643–645.
This article is devoted to new asynchronous methods for rendering text information in mobile applications for iOS operating system.
2020-06-26
M, Raviraja Holla, D, Suma.  2019.  Memory Efficient High-Performance Rotational Image Encryption. 2019 International Conference on Communication and Electronics Systems (ICCES). :60—64.

Image encryption is an essential part of a Visual Cryptography. Existing traditional sequential encryption techniques are infeasible to real-time applications. High-performance reformulations of such methods are increasingly growing over the last decade. These reformulations proved better performances over their sequential counterparts. A rotational encryption scheme encrypts the images in such a way that the decryption is possible with the rotated encrypted images. A parallel rotational encryption technique makes use of a high-performance device. But it less-leverages the optimizations offered by them. We propose a rotational image encryption technique which makes use of memory coalescing provided by the Compute Unified Device Architecture (CUDA). The proposed scheme achieves improved global memory utilization and increased efficiency.

2020-05-22
Despotovski, Filip, Gusev, Marjan, Zdraveski, Vladimir.  2018.  Parallel Implementation of K-Nearest-Neighbors for Face Recognition. 2018 26th Telecommunications Forum (℡FOR). :1—4.
Face recognition is a fast-expanding field of research. Countless classification algorithms have found use in face recognition, with more still being developed, searching for better performance and accuracy. For high-dimensional data such as images, the K-Nearest-Neighbours classifier is a tempting choice. However, it is very computationally-intensive, as it has to perform calculations on all items in the stored dataset for each classification it makes. Fortunately, there is a way to speed up the process by performing some of the calculations in parallel. We propose a parallel CUDA implementation of the KNN classifier and then compare it to a serial implementation to demonstrate its performance superiority.
2020-03-30
Kim, Sejin, Oh, Jisun, Kim, Yoonhee.  2019.  Data Provenance for Experiment Management of Scientific Applications on GPU. 2019 20th Asia-Pacific Network Operations and Management Symposium (APNOMS). :1–4.
Graphics Processing Units (GPUs) are getting popularly utilized for multi-purpose applications in order to enhance highly performed parallelism of computation. As memory virtualization methods in GPU nodes are not efficiently provided to deal with diverse memory usage patterns for these applications, the success of their execution depends on exclusive and limited use of physical memory in GPU environments. Therefore, it is important to predict a pattern change of GPU memory usage during runtime execution of an application. Data provenance extracted from application characteristics, GPU runtime environments, input, and execution patterns from runtime monitoring, is defined for supporting application management to set runtime configuration and predict an experimental result, and utilize resource with co-located applications. In this paper, we define data provenance of an application on GPUs and manage data by profiling the execution of CUDA scientific applications. Data provenance management helps to predict execution patterns of other similar experiments and plan efficient resource configuration.
2020-03-16
Kholidy, Hisham A..  2019.  Towards A Scalable Symmetric Key Cryptographic Scheme: Performance Evaluation and Security Analysis. 2019 2nd International Conference on Computer Applications Information Security (ICCAIS). :1–6.
In most applications, security attributes are pretty difficult to meet but it becomes even a bigger challenge when talking about Grid Computing. To secure data passes in Grid Systems, we need a professional scheme that does not affect the overall performance of the grid system. Therefore, we previously developed a new security scheme “ULTRA GRIDSEC” that is used to accelerate the performance of the symmetric key encryption algorithms for both stream and block cipher encryption algorithms. The scheme is used to accelerate the security of data pass between elements of our newly developed pure peer-to-peer desktop grid framework, “HIMAN”. It also enhances the security of the encrypted data resulted from the scheme and prevents the problem of weak keys of the encryption algorithms. This paper covers the analysis and evaluation of this scheme showing the different factors affecting the scheme performance, and covers the efficiency of the scheme from the security prospective. The experimental results are highlighted for two types of encryption algorithms, TDES as an example for the block cipher algorithms, and RC4 as an example for the stream cipher algorithms. The scheme speeds up the former algorithm by 202.12% and the latter one by 439.7%. These accelerations are also based on the running machine's capabilities.
2020-03-09
Hăjmăȿan, Gheorghe, Mondoc, Alexandra, Creț, Octavian.  2019.  Bytecode Heuristic Signatures for Detecting Malware Behavior. 2019 Conference on Next Generation Computing Applications (NextComp). :1–6.
For a long time, the most important approach for detecting malicious applications was the use of static, hash-based signatures. This approach provides a fast response time, has a low performance overhead and is very stable due to its simplicity. However, with the rapid growth in the number of malware, as well as their increased complexity in terms of polymorphism and evasion, the era of reactive security solutions started to fade in favor of new, proactive approaches such as behavior based detection. We propose a novel approach that uses an interpreter virtual machine to run proactive behavior heuristics from bytecode signatures, thus combining the advantages of behavior based detection with those of signatures. Based on our approximation, using this approach we succeeded to reduce by 85% the time required to update a behavior based detection solution to detect new threats, while continuing to benefit from the versatility of behavior heuristics.
2020-02-10
Wan, Shengye, Sun, Jianhua, Sun, Kun, Zhang, Ning, Li, Qi.  2019.  SATIN: A Secure and Trustworthy Asynchronous Introspection on Multi-Core ARM Processors. 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). :289–301.

On ARM processors with TrustZone security extension, asynchronous introspection mechanisms have been developed in the secure world to detect security policy violations in the normal world. These mechanisms provide security protection via passively checking the normal world snapshot. However, since previous secure world checking solutions require to suspend the entire rich OS, asynchronous introspection has not been widely adopted in the real world. Given a multi-core ARM system that can execute the two worlds simultaneously on different cores, secure world introspection can check the rich OS without suspension. However, we identify a new normal-world evasion attack that can defeat the asynchronous introspection by removing the attacking traces in parallel from one core when the security checking is performing on another core. We perform a systematic study on this attack and present its efficiency against existing asynchronous introspection mechanisms. As the countermeasure, we propose a secure and trustworthy asynchronous introspection mechanism called SATIN, which can efficiently detect the evasion attacks by increasing the attackers' evasion time cost and decreasing the defender's execution time under a safe limit. We implement a prototype on an ARM development board and the experimental results show that SATIN can effectively prevent evasion attacks on multi-core systems with a minor system overhead.

2020-01-13
Gopaluni, Jitendra, Unwala, Ishaq, Lu, Jiang, Yang, Xiaokun.  2019.  Graphical User Interface for OpenThread. 2019 IEEE 16th International Conference on Smart Cities: Improving Quality of Life Using ICT IoT and AI (HONET-ICT). :235–237.
This paper presents an implementation of a Graphical User Interface (GUI) for the OpenThread software. OpenThread is a software package for Thread. Thread is a networking protocol for Internet of Things (IoT) designed for home automation. OpenThread package was released by Nest Labs as an open source implementation of the Thread specification v1.1.1. The OpenThread includes IPv6, 6LoWPAN, IEEE 802.15.4 with MAC security, Mesh Link Establishment, and Mesh Routing. OpenThread includes all Thread supported device types and supports both SOC and NCP implementations. OpenThread runs on Linux and allows the users to use it as a simulator with a command line interface. This research is focused on adding a Graphical User Interface (GUI) to the OpenThread. The GUI package is implemented in TCL/Tk (Tool Control Language). OpenThread with a GUI makes working with OpenThread much easier for researchers and students. The GUI also makes it easier to visualize the Thread network and its operations.
2019-12-17
Huang, Jeff.  2018.  UFO: Predictive Concurrency Use-After-Free Detection. 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). :609-619.

Use-After-Free (UAF) vulnerabilities are caused by the program operating on a dangling pointer and can be exploited to compromise critical software systems. While there have been many tools to mitigate UAF vulnerabilities, UAF remains one of the most common attack vectors. UAF is particularly di cult to detect in concurrent programs, in which a UAF may only occur with rare thread schedules. In this paper, we present a novel technique, UFO, that can precisely predict UAFs based on a single observed execution trace with a provably higher detection capability than existing techniques with no false positives. The key technical advancement of UFO is an extended maximal thread causality model that captures the largest possible set of feasible traces that can be inferred from a given multithreaded execution trace. By formulating UAF detection as a constraint solving problem atop this model, we can explore a much larger thread scheduling space than classical happens-before based techniques. We have evaluated UFO on several real-world large complex C/C++ programs including Chromium and FireFox. UFO scales to real-world systems with hundreds of millions of events in their execution and has detected a large number of real concurrency UAFs.

Zhao, Shixiong, Gu, Rui, Qiu, Haoran, Li, Tsz On, Wang, Yuexuan, Cui, Heming, Yang, Junfeng.  2018.  OWL: Understanding and Detecting Concurrency Attacks. 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). :219-230.
Just like bugs in single-threaded programs can lead to vulnerabilities, bugs in multithreaded programs can also lead to concurrency attacks. We studied 31 real-world concurrency attacks, including privilege escalations, hijacking code executions, and bypassing security checks. We found that compared to concurrency bugs' traditional consequences (e.g., program crashes), concurrency attacks' consequences are often implicit, extremely hard to be observed and diagnosed by program developers. Moreover, in addition to bug-inducing inputs, extra subtle inputs are often needed to trigger the attacks. These subtle features make existing tools ineffective to detect concurrency attacks. To tackle this problem, we present OWL, the first practical tool that models general concurrency attacks' implicit consequences and automatically detects them. We implemented OWL in Linux and successfully detected five new concurrency attacks, including three confirmed and fixed by developers, and two exploited from previously known and well-studied concurrency bugs. OWL has also detected seven known concurrency attacks. Our evaluation shows that OWL eliminates 94.1% of the reports generated by existing concurrency bug detectors as false positive, greatly reducing developers' efforts on diagnosis. All OWL source code, concurrency attack exploit scripts, and results are available on github.com/hku-systems/owl.
Li, Ming, Hawrylak, Peter, Hale, John.  2019.  Concurrency Strategies for Attack Graph Generation. 2019 2nd International Conference on Data Intelligence and Security (ICDIS). :174-179.

The network attack graph is a powerful tool for analyzing network security, but the generation of a large-scale graph is non-trivial. The main challenge is from the explosion of network state space, which greatly increases time and storage costs. In this paper, three parallel algorithms are proposed to generate scalable attack graphs. An OpenMP-based programming implementation is used to test their performance. Compared with the serial algorithm, the best performance from the proposed algorithms provides a 10X speedup.

2019-11-04
Harrison, William L., Allwein, Gerard.  2018.  Semantics-Directed Prototyping of Hardware Runtime Monitors. 2018 International Symposium on Rapid System Prototyping (RSP). :42-48.

Building memory protection mechanisms into embedded hardware is attractive because it has the potential to neutralize a host of software-based attacks with relatively small performance overhead. A hardware monitor, being at the lowest level of the system stack, is more difficult to bypass than a software monitor and hardware-based protections are also potentially more fine-grained than is possible in software: an individual instruction executing on a processor may entail multiple memory accesses, all of which may be tracked in hardware. Finally, hardware-based protection can be performed without the necessity of altering application binaries. This article presents a proof-of-concept codesign of a small embedded processor with a hardware monitor protecting against ROP-style code reuse attacks. While the case study is small, it indicates, we argue, an approach to rapid-prototyping runtime monitors in hardware that is quick, flexible, and extensible as well as being amenable to formal verification.

2019-10-14
Rong, Z., Xie, P., Wang, J., Xu, S., Wang, Y..  2018.  Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks. 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP). :1–8.

With the implementation of W ⊕ X security model on computer system, Return-Oriented Programming(ROP) has become the primary exploitation technique for adversaries. Although many solutions that defend against ROP exploits have been proposed, they still suffer from various shortcomings. In this paper, we propose a new way to mitigate ROP attacks that are based on return instructions. We clean the scratch registers which are also the parameter registers based on the features of ROP malicious code and calling convention. A prototype is implemented on x64-based Linux platform based on Pin. Preliminary experimental results show that our method can efficiently mitigate conventional ROP attacks.

2019-09-26
Khatchadourian, R., Tang, Y., Bagherzadeh, M., Ahmed, S..  2019.  Safe Automated Refactoring for Intelligent Parallelization of Java 8 Streams. 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). :619-630.

Streaming APIs are becoming more pervasive in mainstream Object-Oriented programming languages. For example, the Stream API introduced in Java 8 allows for functional-like, MapReduce-style operations in processing both finite and infinite data structures. However, using this API efficiently involves subtle considerations like determining when it is best for stream operations to run in parallel, when running operations in parallel can be less efficient, and when it is safe to run in parallel due to possible lambda expression side-effects. In this paper, we present an automated refactoring approach that assists developers in writing efficient stream code in a semantics-preserving fashion. The approach, based on a novel data ordering and typestate analysis, consists of preconditions for automatically determining when it is safe and possibly advantageous to convert sequential streams to parallel and unorder or de-parallelize already parallel streams. The approach was implemented as a plug-in to the Eclipse IDE, uses the WALA and SAFE analysis frameworks, and was evaluated on 11 Java projects consisting of ?642K lines of code. We found that 57 of 157 candidate streams (36.31%) were refactorable, and an average speedup of 3.49 on performance tests was observed. The results indicate that the approach is useful in optimizing stream code to their full potential.

2019-01-21
Nemati, H., Dagenais, M. R..  2018.  VM processes state detection by hypervisor tracing. 2018 Annual IEEE International Systems Conference (SysCon). :1–8.

The diagnosis of performance issues in cloud environments is a challenging problem, due to the different levels of virtualization, the diversity of applications and their interactions on the same physical host. Moreover, because of privacy, security, ease of deployment and execution overhead, an agent-less method, which limits its data collection to the physical host level, is often the only acceptable solution. In this paper, a precise host-based method, to recover wait state for the processes inside a given Virtual Machine (VM), is proposed. The virtual Process State Detection (vPSD) algorithm computes the state of processes through host kernel tracing. The state of a virtual Process (vProcess) is displayed in an interactive trace viewer (Trace Compass) for further inspection. Our proposed VM trace analysis algorithm has been open-sourced for further enhancements and for the benefit of other developers. Experimental evaluations were conducted using a mix of workload types (CPU, Disk, and Network), with different applications like Hadoop, MySQL, and Apache. vPSD, being based on host hypervisor tracing, brings a lower overhead (around 0.03%) as compared to other approaches.

2018-02-28
Sagisi, J., Tront, J., Marchany, R..  2017.  System architectural design of a hardware engine for moving target IPv6 defense over IEEE 802.3 Ethernet. MILCOM 2017 - 2017 IEEE Military Communications Conference (MILCOM). :551–556.

The Department of Homeland Security Cyber Security Division (CSD) chose Moving Target Defense as one of the fourteen primary Technical Topic Areas pertinent to securing federal networks and the larger Internet. Moving Target Defense over IPv6 (MT6D) employs an obscuration technique offering keyed access to hosts at a network level without altering existing network infrastructure. This is accomplished through cryptographic dynamic addressing, whereby a new network address is bound to an interface every few seconds in a coordinated manner. The goal of this research is to produce a Register Transfer Level (RTL) network security processor implementation to enable the production of an Application Specific Integrated Circuit (ASIC) variant of MT6D processor for wide deployment. RTL development is challenging in that it must provide system level functions that are normally provided by the Operating System's kernel and supported libraries. This paper presents the architectural design of a hardware engine for MT6D (HE-MT6D) and is complete in simulation. Unique contributions are an inline stream-based network packet processor with a Complex Instruction Set Computer (CISC) architecture, Network Time Protocol listener, and theoretical increased performance over previous software implementations.

2018-02-21
Zhao, S., Ding, X..  2017.  On the Effectiveness of Virtualization Based Memory Isolation on Multicore Platforms. 2017 IEEE European Symposium on Security and Privacy (EuroS P). :546–560.

Virtualization based memory isolation has been widely used as a security primitive in many security systems. This paper firstly provides an in-depth analysis of its effectiveness in the multicore setting, a first in the literature. Our study reveals that memory isolation by itself is inadequate for security. Due to the fundamental design choices in hardware, it faces several challenging issues including page table maintenance, address mapping validation and thread identification. As demonstrated by our attacks implemented on XMHF and BitVisor, these issues undermine the security of memory isolation. Next, we propose a new isolation approach that is immune to the aforementioned problems. In our design, the hypervisor constructs a fully isolated micro computing environment (FIMCE) that exposes a minimal attack surface to an untrusted OS on a multicore platform. By virtue of its architectural niche, FIMCE offers stronger assurance and greater versatility than memory isolation. We have built a prototype of FIMCE and measured its performance. To show the benefits of using FIMCE as a building block, we have also implemented several practical applications which cannot be securely realized by using memory isolation alone.

2018-01-16
Sagisi, J., Tront, J., Bradley, R. M..  2017.  Platform agnostic, scalable, and unobtrusive FPGA network processor design of moving target defense over IPv6 (MT6D) over IEEE 802.3 Ethernet. 2017 IEEE International Symposium on Hardware Oriented Security and Trust (HOST). :165–165.

This work presents the proof of concept implementation for the first hardware-based design of Moving Target Defense over IPv6 (MT6D) in full Register Transfer Level (RTL) logic, with future sights on an embedded Application-Specified Integrated Circuit (ASIC) implementation. Contributions are an IEEE 802.3 Ethernet stream-based in-line network packet processor with a specialized Complex Instruction Set Computer (CISC) instruction set architecture, RTL-based Network Time Protocol v4 synchronization, and a modular crypto engine. Traditional static network addressing allows attackers the incredible advantage of taking time to plan and execute attacks against a network. To counter, MT6D provides a network host obfuscation technique that offers network-based keyed access to specific hosts without altering existing network infrastructure and is an excellent technique for protecting the Internet of Things, IPv6 over Low Power Wireless Personal Area Networks, and high value globally routable IPv6 interfaces. This is done by crypto-graphically altering IPv6 network addresses every few seconds in a synchronous manner at all endpoints. A border gateway device can be used to intercept select packets to unobtrusively perform this action. Software driven implementations have posed many challenges, namely, constant code maintenance to remain compliant with all library and kernel dependencies, the need for a host computing platform, and less than optimal throughput. This work seeks to overcome these challenges in a lightweight system to be developed for practical wide deployment.

2017-11-27
Qin, Y., Wang, H., Jia, Z., Xia, H..  2016.  A flexible and scalable implementation of elliptic curve cryptography over GF(p) based on ASIP. 2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC). :1–8.

Public-key cryptography schemes are widely used due to their high level of security. As a very efficient one among public-key cryptosystems, elliptic curve cryptography (ECC) has been studied for years. Researchers used to improve the efficiency of ECC through point multiplication, which is the most important and complex operation of ECC. In our research, we use special families of curves and prime fields which have special properties. After that, we introduce the instruction set architecture (ISA) extension method to accelerate this algorithm (192-bit private key) and build an ECC\_ASIP model with six new ECC custom instructions. Finally, the ECC\_ASIP model is implemented in a field-programmable gate array (FPGA) platform. The persuasive experiments have been conducted to evaluate the performance of our new model in the aspects of the performance, the code storage space and hardware resources. Experimental results show that our processor improves 69.6% in the execution efficiency and requires only 6.2% more hardware resources.

2017-02-23
Jia, L., Sen, S., Garg, D., Datta, A..  2015.  "A Logic of Programs with Interface-Confined Code". 2015 IEEE 28th Computer Security Foundations Symposium. :512–525.

Interface-confinement is a common mechanism that secures untrusted code by executing it inside a sandbox. The sandbox limits (confines) the code's interaction with key system resources to a restricted set of interfaces. This practice is seen in web browsers, hypervisors, and other security-critical systems. Motivated by these systems, we present a program logic, called System M, for modeling and proving safety properties of systems that execute adversary-supplied code via interface-confinement. In addition to using computation types to specify effects of computations, System M includes a novel invariant type to specify the properties of interface-confined code. The interpretation of invariant type includes terms whose effects satisfy an invariant. We construct a step-indexed model built over traces and prove the soundness of System M relative to the model. System M is the first program logic that allows proofs of safety for programs that execute adversary-supplied code without forcing the adversarial code to be available for deep static analysis. System M can be used to model and verify protocols as well as system designs. We demonstrate the reasoning principles of System M by verifying the state integrity property of the design of Memoir, a previously proposed trusted computing system.

2017-02-21
Q. Wang, Y. Ren, M. Scaperoth, G. Parmer.  2015.  "SPeCK: a kernel for scalable predictability". 21st IEEE Real-Time and Embedded Technology and Applications Symposium. :121-132.

Multi- and many-core systems are increasingly prevalent in embedded systems. Additionally, isolation requirements between different partitions and criticalities are gaining in importance. This difficult combination is not well addressed by current software systems. Parallel systems require consistency guarantees on shared data-structures often provided by locks that use predictable resource sharing protocols. However, as the number of cores increase, even a single shared cache-line (e.g. for the lock) can cause significant interference. In this paper, we present a clean-slate design of the SPeCK kernel, the next generation of our COMPOSITE OS, that attempts to provide a strong version of scalable predictability - where predictability bounds made on a single core, remain constant with an increase in cores. Results show that, despite using a non-preemptive kernel, it has strong scalable predictability, low average-case overheads, and demonstrates better response-times than a state-of-the-art preemptive system.

2017-02-14
J. Brynielsson, R. Sharma.  2015.  "Detectability of low-rate HTTP server DoS attacks using spectral analysis". 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). :954-961.

Denial-of-Service (DoS) attacks pose a threat to any service provider on the internet. While traditional DoS flooding attacks require the attacker to control at least as much resources as the service provider in order to be effective, so-called low-rate DoS attacks can exploit weaknesses in careless design to effectively deny a service using minimal amounts of network traffic. This paper investigates one such weakness found within version 2.2 of the popular Apache HTTP Server software. The weakness concerns how the server handles the persistent connection feature in HTTP 1.1. An attack simulator exploiting this weakness has been developed and shown to be effective. The attack was then studied with spectral analysis for the purpose of examining how well the attack could be detected. Similar to other papers on spectral analysis of low-rate DoS attacks, the results show that disproportionate amounts of energy in the lower frequencies can be detected when the attack is present. However, by randomizing the attack pattern, an attacker can efficiently reduce this disproportion to a degree where it might be impossible to correctly identify an attack in a real world scenario.

2015-05-06
Balkesen, C., Teubner, J., Alonso, G., Ozsu, M.T..  2014.  Main-Memory Hash Joins on Modern Processor Architectures. Knowledge and Data Engineering, IEEE Transactions on. PP:1-1.

Existing main-memory hash join algorithms for multi-core can be classified into two camps. Hardware-oblivious hash join variants do not depend on hardware-specific parameters. Rather, they consider qualitative characteristics of modern hardware and are expected to achieve good performance on any technologically similar platform. The assumption behind these algorithms is that hardware is now good enough at hiding its own limitations-through automatic hardware prefetching, out-of-order execution, or simultaneous multi-threading (SMT)-to make hardware-oblivious algorithms competitive without the overhead of carefully tuning to the underlying hardware. Hardware-conscious implementations, such as (parallel) radix join, aim to maximally exploit a given architecture by tuning the algorithm parameters (e.g., hash table sizes) to the particular features of the architecture. The assumption here is that explicit parameter tuning yields enough performance advantages to warrant the effort required. This paper compares the two approaches under a wide range of workloads (relative table sizes, tuple sizes, effects of sorted data, etc.) and configuration parameters (VM page sizes, number of threads, number of cores, SMT, SIMD, prefetching, etc.). The results show that hardware-conscious algorithms generally outperform hardware-oblivious ones. However, on specific workloads and special architectures with aggressive simultaneous multi-threading, hardware-oblivious algorithms are competitive. The main conclusion of the paper is that, in existing multi-core architectures, it is still important to carefully tailor algorithms to the underlying hardware to get the necessary performance. But processor developments may require to revisit this conclusion in the future.