Biblio

Filters: Keyword is Program processors
2020-03-04
Voronych, Artur, Nyckolaychuk, Lyubov, Vozna, Nataliia, Pastukh, Taras.  2019.  Methods and Special Processors of Entropy Signal Processing. 2019 IEEE 15th International Conference on the Experience of Designing and Application of CAD Systems (CADSM). :1–4.

An analysis of applied tasks and methods of entropy signal processing is carried out in this article. Theoretical comments are given on the specific schemes of special processors for the determination of probability and correlation activity. The perspective of the influence of probabilistic entropy of C. Shannon as cipher signal receivers is reviewed. Examples of entropy-manipulated signals and system characteristics of the proposed special processors are given.
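
As a rough illustration of the quantity that entropy signal processing builds on, the following sketch estimates the Shannon entropy of a sampled signal from its empirical symbol frequencies. The function name `shannon_entropy` and the sample sequences are assumptions made for illustration; they are not taken from the paper, whose special processors are hardware designs.

```python
import math
from collections import Counter


def shannon_entropy(samples):
    """Estimate Shannon entropy (bits per symbol) from empirical frequencies."""
    if not samples:
        return 0.0
    counts = Counter(samples)
    total = len(samples)
    # H = sum(p * log2(1 / p)) over every observed symbol
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical signals: one constant, one spread evenly over four levels
constant = [1, 1, 1, 1, 1, 1, 1, 1]
uniform = [0, 1, 2, 3, 0, 1, 2, 3]

print(shannon_entropy(constant))
print(shannon_entropy(uniform))
```

A constant signal carries no uncertainty, while a signal spread evenly over four levels needs two bits per symbol.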

2020-02-18
Pasyeka, Mykola, Sheketa, Vasyl, Pasieka, Nadiia, Chupakhina, Svitlana, Dronyuk, Ivanna.  2019.  System Analysis of Caching Requests on Network Computing Nodes. 2019 3rd International Conference on Advanced Information and Communications Technologies (AICT). :216–222.

A systematic study of technologies and concepts used for the design and construction of distributed fail-safe web systems has been conducted. The general principles of the design of distributed web systems and the information technologies used in their design are considered. As a result of the research, it became clear that data backup is a determining attribute of serving most web systems. Thus, the main role in building modern web systems is played by scaling. Scaling in distributed systems is used when performing a particular operation requires a large amount of computing resources. There are two scaling options, namely vertical and horizontal. Vertical scaling increases the performance of existing components in order to increase overall productivity. However, horizontal scaling is used for the construction of distributed systems. In horizontal scaling, the system is split into small components that are placed on various physical computers. This approach allows the addition of new nodes to increase the productivity of the web system as a whole.

2019-12-02
Kelly, Daniel M., Wellons, Christopher C., Coffman, Joel, Gearhart, Andrew S..  2019.  Automatically Validating the Effectiveness of Software Diversity Schemes. 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks – Supplemental Volume (DSN-S). :1–2.
Software diversity promises to invert the current balance of power in cybersecurity by preventing exploit reuse. Nevertheless, the comparative evaluation of diversity techniques has received scant attention. In ongoing work, we use the DARPA Cyber Grand Challenge (CGC) environment to assess the effectiveness of diversifying compilers in mitigating exploits. Our approach provides a quantitative comparison of diversity strategies and demonstrates wide variation in their effectiveness.
Besson, Frédéric, Dang, Alexandre, Jensen, Thomas.  2019.  Information-Flow Preservation in Compiler Optimisations. 2019 IEEE 32nd Computer Security Foundations Symposium (CSF). :230–23012.

Correct compilers perform program transformations preserving input/output behaviours of programs. Yet, correctness does not prevent program optimisations from introducing information-flow leaks that would make the target program more vulnerable to side-channel attacks than the source program. To tackle this problem, we propose a notion of Information-Flow Preserving (IFP) program transformation which ensures that a target program is no more vulnerable to passive side-channel attacks than a source program. To protect against a wide range of attacks, we model an attacker who is granted arbitrary memory accesses for a pre-defined set of observation points. We propose a compositional proof principle for proving that a transformation is IFP. Using this principle, we show how a translation validation technique can be used to automatically verify and even close information-flow leaks introduced by standard compiler passes such as dead-store elimination and register allocation. The technique has been experimentally validated on the CompCert C compiler.

Simon, Laurent, Chisnall, David, Anderson, Ross.  2018.  What You Get is What You C: Controlling Side Effects in Mainstream C Compilers. 2018 IEEE European Symposium on Security and Privacy (EuroS&P). :1–15.
Security engineers have been fighting with C compilers for years. A careful programmer would test for null pointer dereferencing or division by zero; but the compiler would fail to understand, and optimize the test away. Modern compilers now have dedicated options to mitigate this. But when a programmer tries to control side effects of code, such as to make a cryptographic algorithm execute in constant time, the problem remains. Programmers devise complex tricks to obscure their intentions, but compiler writers find ever smarter ways to optimize code. A compiler upgrade can suddenly and without warning open a timing channel in previously secure code. This arms race is pointless and has to stop. We argue that we must stop fighting the compiler, and instead make it our ally. As a starting point, we analyze the ways in which compiler optimization breaks implicit properties of crypto code; and add guarantees for two of these properties in Clang/LLVM. Our work explores what is actually involved in controlling side effects on modern CPUs with a standard toolchain. Similar techniques can and should be applied to other security properties; achieving intentions by compiler commands or annotations makes them explicit, so we can reason about them. It is already understood that explicitness is essential for cryptographic protocol security and for compiler performance; it is essential for language security too. We therefore argue that this should be only the first step in a sustained engineering effort.
2019-10-14
Kocher, P., Horn, J., Fogh, A., Genkin, D., Gruss, D., Haas, W., Hamburg, M., Lipp, M., Mangard, S., Prescher, T. et al..  2019.  Spectre Attacks: Exploiting Speculative Execution. 2019 IEEE Symposium on Security and Privacy (SP). :1–19.

Modern processors use branch prediction and speculative execution to maximize performance. For example, if the destination of a branch depends on a memory value that is in the process of being read, CPUs will try to guess the destination and attempt to execute ahead. When the memory value finally arrives, the CPU either discards or commits the speculative computation. Speculative logic is unfaithful in how it executes, can access the victim's memory and registers, and can perform operations with measurable side effects. Spectre attacks involve inducing a victim to speculatively perform operations that would not occur during correct program execution and which leak the victim's confidential information via a side channel to the adversary. This paper describes practical attacks that combine methodology from side channel attacks, fault attacks, and return-oriented programming that can read arbitrary memory from the victim's process. More broadly, the paper shows that speculative execution implementations violate the security assumptions underpinning numerous software security mechanisms, including operating system process separation, containerization, just-in-time (JIT) compilation, and countermeasures to cache timing and side-channel attacks. These attacks represent a serious threat to actual systems since vulnerable speculative execution capabilities are found in microprocessors from Intel, AMD, and ARM that are used in billions of devices. While makeshift processor-specific countermeasures are possible in some cases, sound solutions will require fixes to processor designs as well as updates to instruction set architectures (ISAs) to give hardware architects and software developers a common understanding as to what computation state CPU implementations are (and are not) permitted to leak.

Tymburibá, M., Sousa, H., Pereira, F..  2019.  Multilayer ROP Protection Via Microarchitectural Units Available in Commodity Hardware. 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). :315–327.

This paper presents a multilayer protection approach to guard programs against Return-Oriented Programming (ROP) attacks. Upper layers validate most of a program's control flow at a low computational cost; thus, not compromising runtime. Lower layers provide strong enforcement guarantees to handle more suspicious flows; thus, enhancing security. Our multilayer system combines techniques already described in the literature with verifications that we introduce in this paper. We argue that modern versions of x86 processors already provide the microarchitectural units necessary to implement our technique. We demonstrate the effectiveness of our multilayer protection on an extensive suite of benchmarks, which includes: SPEC CPU2006; the three most popular web browsers; 209 benchmarks distributed with LLVM; and four well-known systems shown to be vulnerable to ROP exploits. Our experiments indicate that we can protect programs with almost no overhead in practice, allying the good performance of lightweight security techniques with the high dependability of heavyweight approaches.

2019-05-01
Li, J. H., Schafer, D., Whelihan, D., Lassini, S., Evancich, N., Kwak, K. J., Vai, M., Whitman, H..  2018.  Designing Secure and Resilient Embedded Avionics Systems. 2018 IEEE Cybersecurity Development (SecDev). :139–139.

Over the past decade, the reliance on Unmanned Aerial Systems (UAS) to carry out critical missions has grown drastically. With an increased reliance on UAS as mission assets and the dependency of UAS on cyber resources, cyber security of UAS must be improved by adopting sound security principles and relevant technologies from the computing community. On the other hand, the traditional avionics community, being aware of the importance of cyber security, is looking at new architecture and designs that can accommodate both the traditional safety oriented principles as well as the cyber security principles and techniques. It is with the effective and timely convergence of these domains that a holistic approach and co-design can meet the unique requirements of modern systems and operations. In this paper, authors from both the cyber security and avionics domains describe our joint effort and insights obtained during the course of designing secure and resilient embedded avionics systems.

2019-03-18
Condé, R. C. R., Maziero, C. A., Will, N. C..  2018.  Using Intel SGX to Protect Authentication Credentials in an Untrusted Operating System. 2018 IEEE Symposium on Computers and Communications (ISCC). :00158–00163.

An important principle in computational security is to reduce the attack surface by keeping the Trusted Computing Base (TCB) small. Even so, no security technique ensures full protection against any adversary. Thus, sensitive applications should be designed with several layers of protection so that, even if a layer is violated, sensitive content will not be compromised. In 2015, Intel released the Software Guard Extensions (SGX) technology in its processors. This mechanism allows applications to allocate enclaves, which are private memory regions that can hold code and data. Other applications and even privileged code, like the OS kernel and the BIOS, are not able to access enclaves' contents. This paper presents a novel password file protection scheme, which uses Intel SGX to protect authentication credentials in the PAM authentication framework, commonly used in UNIX systems. We defined and implemented an SGX-enabled version of the pam_unix.so authentication module, called UniSGX. This module uses an SGX enclave to handle the credentials supplied by the user and to check them against the password file. To add an extra security layer, the password file is stored using SGX sealing. A threat model was proposed to assess the security of the proposed solution. The obtained results show that the proposed solution is secure against the threat model considered, and that its performance overhead is acceptable from the user's point of view. The scheme presented here is also suitable for other authentication frameworks.

2019-03-06
Leung, C. K., Hoi, C. S. H., Pazdor, A. G. M., Wodi, B. H., Cuzzocrea, A..  2018.  Privacy-Preserving Frequent Pattern Mining from Big Uncertain Data. 2018 IEEE International Conference on Big Data (Big Data). :5101-5110.

As we are living in the era of big data, high volumes of wide varieties of data which may be of different veracity (e.g., precise data, imprecise and uncertain data) are easily generated or collected at a high velocity in many real-life applications. Embedded in these big data is valuable knowledge and useful information, which can be discovered by big data science solutions. As a popular data science task, frequent pattern mining aims to discover implicit, previously unknown and potentially useful information and valuable knowledge in terms of sets of frequently co-occurring merchandise items and/or events. Many of the existing frequent pattern mining algorithms use a transaction-centric mining approach to find frequent patterns from precise data. However, there are situations in which an item-centric mining approach is more appropriate, and there are also situations in which data are imprecise and uncertain. Hence, in this paper, we present an item-centric algorithm for mining frequent patterns from big uncertain data. In recent years, big data have been gaining attention from the research community, driven by relevant technological innovations (e.g., clouds) and novel paradigms (e.g., social networks). As big data are typically published online to support knowledge management and fruition processes, these big data are usually handled by multiple owners with possible secure multi-party computation issues. Thus, privacy and security of big data have become a fundamental problem in this research context. In this paper, we present not only an item-centric algorithm for mining frequent patterns from big uncertain data, but also a privacy-preserving algorithm. In other words, we present a privacy-preserving item-centric algorithm for mining frequent patterns from big uncertain data. Results of our analytical and empirical evaluation show the effectiveness of our algorithm in mining frequent patterns from big uncertain data in a privacy-preserving manner.

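
To make "frequent patterns from uncertain data" concrete, the sketch below computes the expected support of a pattern, a measure commonly used in the uncertain-data mining literature. The abstract does not say which measure the authors use, so this choice, the independence assumption, the function name `expected_support`, and the toy database are all assumptions. The privacy-preserving part of the paper is not modelled here.

```python
from math import prod


def expected_support(pattern, transactions):
    """Sum, over all transactions, of the probability that the whole pattern occurs.

    Each transaction maps an item to its existential probability. Items are
    treated as independent, which is an assumption of this sketch.
    """
    total = 0.0
    for t in transactions:
        if all(item in t for item in pattern):
            total += prod(t[item] for item in pattern)
    return total


# Hypothetical uncertain database: item -> existential probability
db = [
    {"a": 0.5, "b": 0.5},
    {"a": 1.0, "c": 0.25},
    {"a": 0.5, "b": 1.0, "c": 0.5},
]

print(expected_support(("a", "b"), db))
```

A pattern would then be reported as frequent when its expected support meets a user-chosen threshold.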
2019-02-18
Singh, S., Saini, H. S..  2018.  Security approaches for data aggregation in Wireless Sensor Networks against Sybil Attack. 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT). :190–193.

A wireless sensor network consists of several important elements, such as sensors, a base station, and users. A sensor can measure many non-electrical quantities, such as pressure, temperature, and sound, and transmit this information to the base station using an internal transceiver. The security of this transmitted data is very important, as the data may contain important information. Because wireless sensor networks have many applications in the military and civil domains, their security has become a critical concern. The Sybil attack is a critical attack that can affect the routing protocols, fair resource allocation, data aggregation, and misbehavior detection parameters of a network. A number of techniques for detecting Sybil nodes have already been designed to overcome the Sybil attack. Of these, a few techniques that can improve the true detection rate and reduce the false detection rate are discussed in this paper.

2019-02-13
Dessouky, G., Abera, T., Ibrahim, A., Sadeghi, A..  2018.  LiteHAX: Lightweight Hardware-Assisted Attestation of Program Execution. 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). :1–8.

Unlike traditional processors, embedded Internet of Things (IoT) devices lack resources to incorporate protection against modern sophisticated attacks resulting in critical consequences. Remote attestation (RA) is a security service to establish trust in the integrity of a remote device. While conventional RA is static and limited to detecting malicious modification to software binaries at load-time, recent research has made progress towards runtime attestation, such as attesting the control flow of an executing program. However, existing control-flow attestation schemes are inefficient and vulnerable to sophisticated data-oriented programming (DOP) attacks, which subvert these schemes while keeping the control flow of the code intact. In this paper, we present LiteHAX, an efficient hardware-assisted remote attestation scheme for RISC-based embedded devices that enables detecting both control-flow attacks and DOP attacks. LiteHAX continuously tracks both the control-flow and data-flow events of a program executing on a remote device and reports them to a trusted verifying party. We implemented and evaluated LiteHAX on a RISC-V System-on-Chip (SoC) and show that it has minimal performance and area overhead.

2018-09-12
Domínguez, A., Carballo, P. P., Núñez, A..  2017.  Programmable SoC platform for deep packet inspection using enhanced Boyer-Moore algorithm. 2017 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC). :1–8.

This paper describes the work done to design a SoC platform for real-time on-line pattern search in TCP packets for Deep Packet Inspection (DPI) applications. The platform is based on a Xilinx Zynq programmable SoC and includes an accelerator that implements a pattern search engine that extends the original Boyer-Moore algorithm with timing and logical rules, which produces a very complex set of rules. Also, the platform implements different modes of operation, including SIMD and MISD parallelism, which can be configured on-line. The platform is scalable, depending on the analysis requirement, up to 8 Gbps. High-level synthesis and platform-based design methodologies have been used to reduce the time to market of the completed system.
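
For readers unfamiliar with the search family involved, here is a minimal software sketch of the Boyer-Moore-Horspool simplification, which keeps only the bad-character shift of the original Boyer-Moore algorithm. It is not the enhanced, rule-extended hardware engine described in the paper; the function name `horspool_search` and the sample payload are assumptions made for illustration.

```python
def horspool_search(text: bytes, pattern: bytes) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Bad-character table: distance from a byte's last occurrence
    # (excluding the final pattern byte) to the end of the pattern.
    shift = {b: m - 1 - i for i, b in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        # Skip ahead based on the byte aligned with the pattern's last position.
        pos += shift.get(text[pos + m - 1], m)
    return -1


# Hypothetical packet payload
payload = b"GET /index.html HTTP/1.1"
print(horspool_search(payload, b"HTTP"))
```

The skip table is what lets the search jump over several bytes of input per comparison step, the property that makes this family attractive for high-rate packet inspection.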

2018-06-07
Whatmough, P. N., Lee, S. K., Lee, H., Rama, S., Brooks, D., Wei, G. Y..  2017.  14.3 A 28nm SoC with a 1.2GHz 568nJ/prediction sparse deep-neural-network engine with >0.1 timing error rate tolerance for IoT applications. 2017 IEEE International Solid-State Circuits Conference (ISSCC). :242–243.

This paper presents a 28nm SoC with a programmable FC-DNN accelerator design that demonstrates: (1) HW support to exploit data sparsity by eliding unnecessary computations (4× energy reduction); (2) improved algorithmic error tolerance using sign-magnitude number format for weights and datapath computation; (3) improved circuit-level timing violation tolerance in datapath logic via time-borrowing; (4) combined circuit and algorithmic resilience with Razor timing violation detection to reduce energy via VDD scaling or increase throughput via FCLK scaling; and (5) high classification accuracy (98.36% for MNIST test set) while tolerating aggregate timing violation rates >10^-1. The accelerator achieves a minimum energy of 0.36μJ/pred at 667MHz, maximum throughput at 1.2GHz and 0.57μJ/pred, or a 10%-margined operating point at 1GHz and 0.58μJ/pred.

2018-02-21
Bai, Xu, Jiang, Lei, Dai, Qiong, Yang, Jiajia, Tan, Jianlong.  2017.  Acceleration of RSA processes based on hybrid ARM-FPGA cluster. 2017 IEEE Symposium on Computers and Communications (ISCC). :682–688.

Cooperation of software and hardware with hybrid architectures, such as Xilinx Zynq SoC combining ARM CPU and FPGA fabric, is a high-performance and low-power platform for accelerating the RSA algorithm. This paper adopts the none-subtraction Montgomery algorithm and the Chinese Remainder Theorem (CRT) to implement high-speed RSA processors, and deploys a 48-node cluster infrastructure based on Zynq SoC to achieve extremely high scalability and throughput of RSA computing. In this design, we use the ARM to implement node-to-node communication with the Message Passing Interface (MPI) while using the FPGA to handle complex calculation. Finally, the experimental results show that the overall performance is linear with the number of nodes. The cluster achieves a 6× to 9× speedup against a multi-core desktop (Intel i7-3770) and comparable performance to a many-core server (288-core). In addition, we gain up to 2.5× energy efficiency compared to these two traditional platforms.
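
The role of the Chinese Remainder Theorem in speeding up RSA can be sketched in a few lines: the private-key operation is split into two exponentiations with half-size moduli, which are then recombined. The code below is a software illustration only, with toy parameters chosen for this example; it does not reproduce the Montgomery multiplier or the FPGA design from the paper, and the function name `rsa_crt_decrypt` is an assumption.

```python
def rsa_crt_decrypt(c, p, q, d):
    """Decrypt c with RSA using the Chinese Remainder Theorem (Garner's form)."""
    dp = d % (p - 1)        # private exponent reduced modulo p - 1
    dq = d % (q - 1)        # private exponent reduced modulo q - 1
    q_inv = pow(q, -1, p)   # modular inverse of q mod p (Python 3.8+)
    m1 = pow(c, dp, p)      # half-size exponentiation modulo p
    m2 = pow(c, dq, q)      # half-size exponentiation modulo q
    h = (q_inv * (m1 - m2)) % p
    return m2 + h * q       # recombine into the result modulo p * q


# Toy parameters, far too small for real use
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))
message = 65
cipher = pow(message, e, n)

print(rsa_crt_decrypt(cipher, p, q, d) == message)
```

Because the two exponentiations are independent, they can also run concurrently, which is one reason CRT suits hardware implementations.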

Kinsy, M. A., Khadka, S., Isakov, M., Farrukh, A..  2017.  Hermes: Secure heterogeneous multicore architecture design. 2017 IEEE International Symposium on Hardware Oriented Security and Trust (HOST). :14–20.

The emergence of general-purpose system-on-chip (SoC) architectures has given rise to a number of significant security challenges. The current trend in SoC design is system-level integration of heterogeneous technologies consisting of a large number of processing elements such as programmable RISC cores, memory, DSPs, and accelerator function units/ASIC. These processing elements may come from different providers, and application executable code may have varying levels of trust. Some of the pressing architecture design questions are: (1) how to implement multi-level user-defined security; (2) how to optimally and securely share resources and data among processing elements. In this work, we develop a secure multicore architecture, named Hermes. It represents a new architectural framework that integrates multiple processing elements (called tenants) of secure and non-secure cores into the same chip design while (a) maintaining individual tenant security, (b) preventing data leakage and corruption, and (c) promoting collaboration among the tenants. The Hermes architecture is based on a programmable secure router interface and a trust-aware routing algorithm. With 17% hardware overhead, it enables the implementation of processing-element-oblivious secure multicore systems with a programmable distributed group key management scheme.

Silva, M. R., Zeferino, C. A..  2017.  Confidentiality and Authenticity in a Platform Based on Network-on-Chip. 2017 VII Brazilian Symposium on Computing Systems Engineering (SBESC). :225–230.

In many-core systems, the processing elements are interconnected using Networks-on-Chip. An example of an on-chip network is SoCIN, a low-cost interconnect architecture whose original design did not take into account security aspects. This network is vulnerable to eavesdropping and spoofing attacks, which limits its use in systems that require security. This work addresses this issue and aims to ensure the security properties of confidentiality and authenticity of SoCIN-based systems. For this, we propose the use of security mechanisms based on symmetric encryption at the network level using the AES (Advanced Encryption Standard) model. A reference multi-core platform was implemented and prototyped in programmable logic in order to perform experiments to evaluate the implemented mechanisms. Results demonstrate the effectiveness of the proposed solution in protecting the system against the target attacks. The impact on the network performance is acceptable and the silicon overhead is equivalent to other solutions found in the literature.

Zheng, H., Zhang, X..  2017.  Optimizing Task Assignment with Minimum Cost on Heterogeneous Embedded Multicore Systems Considering Time Constraint. 2017 IEEE 3rd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS). :225–230.

Time and cost are the most critical performance metrics for computer systems, including embedded systems, and especially for battery-based embedded systems, such as PCs, mainframe computers, and smartphones. Most of the previous work focuses on saving energy in a deterministic way by taking the average or worst scenario into account. However, such deterministic approaches are usually inappropriate for modeling energy consumption because of uncertainties in conditional instructions on processors and time-varying external environments. By studying the relationship between energy consumption, execution time, and completion probability of tasks on heterogeneous multi-core architectures, this paper proposes an optimal energy efficiency and system performance model and the OTHAP (Optimizing Task Heterogeneous Assignment with Probability) algorithm to address the Processor and Voltage Assignment with Probability (PVAP) problem of data-dependent aperiodic tasks in real-time embedded systems, ensuring that all the tasks can be done under the time constraint with a guaranteed probability. We adopt a task DAG (Directed Acyclic Graph) to model the PVAP problem. We first use a processor scheduling algorithm to map the task DAG onto a set of voltage-variable processors, and then use our dynamic programming algorithm to assign a proper voltage to each task. The experimental results demonstrate that our approach outperforms state-of-the-art algorithms in this field (maximum improvement of 24.6%).

2018-02-14
Dou, C., Chen, W. H., Chen, Y. J., Lin, H. T., Lin, W. Y., Ho, M. S., Chang, M. F..  2017.  Challenges of emerging memory and memristor based circuits: Nonvolatile logics, IoT security, deep learning and neuromorphic computing. 2017 IEEE 12th International Conference on ASIC (ASICON). :140–143.

Emerging nonvolatile memory (NVM) devices are not limited to building nonvolatile memory macros. They can also be used in developing nonvolatile logics (nvLogics) for nonvolatile processors, security circuits for the internet of things (IoT), and computing-in-memory (CIM) for artificial intelligence (AI) chips. This paper explores the challenges in circuit designs of emerging memory devices for application in nonvolatile logics, security circuits, and CIM for deep neural networks (DNN). Several silicon-verified examples of these circuits are reviewed in this paper.

2018-01-10
Patrignani, M., Garg, D..  2017.  Secure Compilation and Hyperproperty Preservation. 2017 IEEE 30th Computer Security Foundations Symposium (CSF). :392–404.

The area of secure compilation aims to design compilers which produce hardened code that can withstand attacks from low-level co-linked components. So far, there is no formal correctness criterion for secure compilers that comes with a clear understanding of what security properties the criterion actually provides. Ideally, we would like a criterion that, if fulfilled by a compiler, guarantees that large classes of security properties of source language programs continue to hold in the compiled program, even as the compiled program is run against adversaries with low-level attack capabilities. This paper provides such a novel correctness criterion for secure compilers, called trace-preserving compilation (TPC). We show that TPC preserves a large class of security properties, namely all safety hyperproperties. Further, we show that TPC preserves more properties than full abstraction, the de-facto criterion used for secure compilation. Then, we show that several fully abstract compilers described in literature satisfy an additional, common property, which implies that they also satisfy TPC. As an illustration, we prove that a fully abstract compiler from a typed source language to an untyped target language satisfies TPC.

2017-04-20
Tan, B., Biglari-Abhari, M., Salcic, Z..  2016.  A system-level security approach for heterogeneous MPSoCs. 2016 Conference on Design and Architectures for Signal and Image Processing (DASIP). :74–81.

Embedded systems are becoming increasingly complex as designers integrate different functionalities into a single application for execution on heterogeneous hardware platforms. In this work we propose a system-level security approach in order to provide isolation of tasks without the need to trust a central authority at run-time. We discuss security requirements that can be found in complex embedded systems that use heterogeneous execution platforms, and by regulating memory access we create mechanisms that allow safe use of shared IP with direct memory access, as well as shared libraries. We also present a prototype Isolation Unit that checks memory transactions and allows for dynamic configuration of permissions.

2017-03-08
Perez, R..  2015.  Silicon systems security and building a root of trust. 2015 IEEE Asian Solid-State Circuits Conference (A-SSCC). :1–4.

This paper briefly presents a position that hardware-based roots of trust, integrated in silicon with System-on-Chip (SoC) solutions, represent the most current stage in a progression of technologies aimed at realizing the most foundational computer security concepts. A brief look at this historical progression from a personal perspective is followed by an overview of more recent developments, with particular focus on a root of trust for cryptographic key provisioning and SoC feature management that is aimed at achieving supply chain assurances and serves as a basis for trust that is linked to properties enforced in hardware. The author assumes no prior knowledge of these concepts and developments by the reader.

2015-05-06
Kishore, N., Kapoor, B..  2014.  An efficient parallel algorithm for hash computation in security and forensics applications. Advance Computing Conference (IACC), 2014 IEEE International. :873-877.

Hashing algorithms are used extensively in information security and digital forensics applications. This paper presents an efficient parallel algorithm for hash computation. It is a modification of the SHA-1 algorithm for faster parallel implementation in applications such as digital signatures and data preservation in digital forensics. The algorithm implements a recursive hash to break the chain dependencies of the standard hash function. We discuss the theoretical foundation for the work, including the collision probability and the performance implications. The algorithm is implemented using the OpenMP API, and experiments were performed using machines with multicore processors. The results show a performance gain by more than a factor of 3 when running on the 8-core configuration of the machine.
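
The general idea of breaking the chain dependency can be sketched as a two-level hash: independent chunks are hashed in parallel, and the resulting digests are hashed once more. This is only an illustration of that idea, not the authors' algorithm. It uses Python threads rather than OpenMP, and the function name `chunked_sha1`, the chunk size, and the worker count are assumptions.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def chunked_sha1(data: bytes, chunk_size: int = 1 << 20, workers: int = 4) -> str:
    """Hash fixed-size chunks independently, then hash the concatenated digests."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, so the result does not depend on scheduling
        digests = list(pool.map(lambda c: hashlib.sha1(c).digest(), chunks))
    return hashlib.sha1(b"".join(digests)).hexdigest()


sample = b"abcdefgh"
print(chunked_sha1(sample, chunk_size=4))
```

Note that the output intentionally differs from the standard SHA-1 digest of the same input, so both parties must agree on the chunk size for results to be comparable.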

2015-05-04
Gimenez, A., Gamblin, T., Rountree, B., Bhatele, A., Jusufi, I., Bremer, P.-T., Hamann, B..  2014.  Dissecting On-Node Memory Access Performance: A Semantic Approach. High Performance Computing, Networking, Storage and Analysis, SC14: International Conference for. :166-176.

Optimizing memory access is critical for performance and power efficiency. CPU manufacturers have developed sampling-based performance measurement units (PMUs) that report precise costs of memory accesses at specific addresses. However, this data is too low-level to be meaningfully interpreted and contains an excessive amount of irrelevant or uninteresting information. We have developed a method to gather fine-grained memory access performance data for specific data objects and regions of code with low overhead and attribute semantic information to the sampled memory accesses. This information provides the context necessary to more effectively interpret the data. We have developed a tool that performs this sampling and attribution and used the tool to discover and diagnose performance problems in real-world applications. Our techniques provide useful insight into the memory behaviour of applications and allow programmers to understand the performance ramifications of key design decisions: domain decomposition, multi-threading, and data motion within distributed memory systems.