Visible to the public Biblio

Filters: Keyword is DRAM chips  [Clear All Filters]
2021-03-22
Jeong, S., Kang, S., Yang, J.-S..  2020.  PAIR: Pin-aligned In-DRAM ECC architecture using expandability of Reed-Solomon code. 2020 57th ACM/IEEE Design Automation Conference (DAC). :1–6.
The computation speed of computer systems is getting faster and the memory has been enhanced in performance and density through process scaling. However, due to the process scaling, DRAMs are recently suffering from numerous inherent faults. DRAM vendors suggest In-DRAM Error Correcting Code (IECC) to cope with the unreliable operation. However, the conventional IECC schemes have concerns about miscorrection and performance degradation. This paper proposes a pin-aligned In-DRAM ECC architecture using the expandability of a Reed-Solomon code (PAIR), that aligns ECC codewords with DQ pin lines (data passage of DRAM). PAIR is specialized in managing widely distributed inherent faults without the performance degradation, and its correction capability is sufficient to correct burst errors as well. The experimental results analyzed with the latest DRAM model show that the proposed architecture achieves up to 106 times higher reliability than XED with 14% performance improvement, and 10 times higher reliability than DUO with a similar performance, on average.
2020-10-30
Xu, Lai, Yu, Rongwei, Wang, Lina, Liu, Weijie.  2019.  Memway: in-memorywaylaying acceleration for practical rowhammer attacks against binaries. Tsinghua Science and Technology. 24:535—545.

The Rowhammer bug is a novel micro-architectural security threat, enabling powerful privilege-escalation attacks on various mainstream platforms. It works by actively flipping bits in Dynamic Random Access Memory (DRAM) cells with unprivileged instructions. In order to set up Rowhammer against binaries in the Linux page cache, the Waylaying algorithm has previously been proposed. The Waylaying method stealthily relocates binaries onto exploitable physical addresses without exhausting system memory. However, the proof-of-concept Waylaying algorithm can be easily detected during page cache eviction because of its high disk I/O overhead and long running time. This paper proposes the more advanced Memway algorithm, which improves on Waylaying in terms of both I/O overhead and speed. Running time and disk I/O overhead are reduced by 90% by utilizing Linux tmpfs and inmemory swapping to manage eviction files. Furthermore, by combining Memway with the unprivileged posix fadvise API, the binary relocation step is made 100 times faster. Equipped with our Memway+fadvise relocation scheme, we demonstrate practical Rowhammer attacks that take only 15-200 minutes to covertly relocate a victim binary, and less than 3 seconds to flip the target instruction bit.

2020-04-06
Chin, Paul, Cao, Yuan, Zhao, Xiaojin, Zhang, Leilei, Zhang, Fan.  2019.  Locking Secret Data in the Vault Leveraging Fuzzy PUFs. 2019 Asian Hardware Oriented Security and Trust Symposium (AsianHOST). :1–6.

Physical Unclonable Functions (PUFs) are considered as an attractive low-cost security anchor. The unique features of PUFs are dependent on the Nanoscale variations introduced during the manufacturing variations. Most PUFs exhibit an unreliability problem due to aging and inherent sensitivity to the environmental conditions. As a remedy to the reliability issue, helper data algorithms are used in practice. A helper data algorithm generates and stores the helper data in the enrollment phase in a secure environment. The generated helper data are used then for error correction, which can transform the unique feature of PUFs into a reproducible key. The key can be used to encrypt secret data in the security scheme. In contrast, this work shows that the fuzzy PUFs can be used to secret important data directly by an error-tolerant protocol without the enrollment phase and error-correction algorithm. In our proposal, the secret data is locked in a vault leveraging the unique fuzzy pattern of PUF. Although the noise exists, the data can then be released only by this unique PUF. The evaluation was performed on the most prominent intrinsic PUF - DRAM PUF. The test results demonstrate that our proposal can reach an acceptable reconstruction rate in various environment. Finally, the security analysis of the new proposal is discussed.

2019-12-02
Li, Congwu, Lin, Jingqiang, Cai, Quanwei, Luo, Bo.  2018.  Peapods: OS-Independent Memory Confidentiality for Cryptographic Engines. 2018 IEEE Intl Conf on Parallel Distributed Processing with Applications, Ubiquitous Computing Communications, Big Data Cloud Computing, Social Computing Networking, Sustainable Computing Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom). :862–869.
Cryptography is widely adopted in computer systems to protect the confidentiality of sensitive information. The security relies on the assumption that cryptography keys are never leaked, which may be broken by the memory disclosure attacks, e.g., the Heartbleed and coldboot attacks. Various schemes are proposed to defend against memory disclosure attacks, e.g., performing the cryptographic computations in registers, or adopting the hardware features (e.g., Intel TSX and Intel SGX) to ensure that the plaintext of the cryptography key never appears in memory. However, these schemes are still not widely deployed due to the following limitations: (a) Most of the schemes are deployed in the OS kernel and require the root (or administrator) privileges of the host; and (b) They require the programmers to integrate these protection schemes in the implementation of different cryptography algorithms on different platforms. In this paper, we propose a tool implemented in Clang/LLVM, named Peapods, which provides the user-mode protection for cryptographic keys in software engines. It introduces one qualifier and three intrinsics for the programmers to specify the sensitive variables and code fragments to be protected, making it easier to be deployed. Peapods adopts transactional memory to protect cryptographic keys, while it is OS-independent and does not require the cryptographic computation performed in the OS kernel. Peapods supports the automatic protection between transactions for better performance. We have implemented the prototype of Peapods. Evaluation results demonstrate that Peapods achieves the design goals with a modest overhead (less than 10%).
2018-06-07
Marques, J., Andrade, J., Falcao, G..  2017.  Unreliable memory operation on a convolutional neural network processor. 2017 IEEE International Workshop on Signal Processing Systems (SiPS). :1–6.

The evolution of convolutional neural networks (CNNs) into more complex forms of organization, with additional layers, larger convolutions and increasing connections, established the state-of-the-art in terms of accuracy errors for detection and classification challenges in images. Moreover, as they evolved to a point where Gigabytes of memory are required for their operation, we have reached a stage where it becomes fundamental to understand how their inference capabilities can be impaired if data elements somehow become corrupted in memory. This paper introduces fault-injection in these systems by simulating failing bit-cells in hardware memories brought on by relaxing the 100% reliable operation assumption. We analyze the behavior of these networks calculating inference under severe fault-injection rates and apply fault mitigation strategies to improve on the CNNs resilience. For the MNIST dataset, we show that 8x less memory is required for the feature maps memory space, and that in sub-100% reliable operation, fault-injection rates up to 10-1 (with most significant bit protection) can withstand only a 1% error probability degradation. Furthermore, considering the offload of the feature maps memory to an embedded dynamic RAM (eDRAM) system, using technology nodes from 65 down to 28 nm, up to 73 80% improved power efficiency can be obtained.

2018-05-09
Shin, S., Tuck, J., Solihin, Y..  2017.  Hiding the Long Latency of Persist Barriers Using Speculative Execution. 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA). :175–186.

Byte-addressable non-volatile memory technology is emerging as an alternative for DRAM for main memory. This new Non-Volatile Main Memory (NVMM) allows programmers to store important data in data structures in memory instead of serializing it to the file system, thereby providing a substantial performance boost. However, modern systems reorder memory operations and utilize volatile caches for better performance, making it difficult to ensure a consistent state in NVMM. Intel recently announced a new set of persistence instructions, clflushopt, clwb, and pcommit. These new instructions make it possible to implement fail-safe code on NVMM, but few workloads have been written or characterized using these new instructions. In this work, we describe how these instructions work and how they can be used to implement write-ahead logging based transactions. We implement several common data structures and kernels and evaluate the performance overhead incurred over traditional non-persistent implementations. In particular, we find that persistence instructions occur in clusters along with expensive fence operations, they have long latency, and they add a significant execution time overhead, on average by 20.3% over code with logging but without fence instructions to order persists. To deal with this overhead and alleviate the performance bottleneck, we propose to speculate past long latency persistency operations using checkpoint-based processing. Our speculative persistence architecture reduces the execution time overheads to only 3.6%.

2015-05-06
Kanizo, Y., Hay, D., Keslassy, I..  2015.  Maximizing the Throughput of Hash Tables in Network Devices with Combined SRAM/DRAM Memory. Parallel and Distributed Systems, IEEE Transactions on. 26:796-809.

Hash tables form a core component of many algorithms as well as network devices. Because of their large size, they often require a combined memory model, in which some of the elements are stored in a fast memory (for example, cache or on-chip SRAM) while others are stored in much slower memory (namely, the main memory or off-chip DRAM). This makes the implementation of real-life hash tables particularly delicate, as a suboptimal choice of the hashing scheme parameters may result in a higher average query time, and therefore in a lower throughput. In this paper, we focus on multiple-choice hash tables. Given the number of choices, we study the tradeoff between the load of a hash table and its average lookup time. The problem is solved by analyzing an equivalent problem: the expected maximum matching size of a random bipartite graph with a fixed left-side vertex degree. Given two choices, we provide exact results for any finite system, and also deduce asymptotic results as the fast memory size increases. In addition, we further consider other variants of this problem and model the impact of several parameters. Finally, we evaluate the performance of our models on Internet backbone traces, and illustrate the impact of the memories speed difference on the choice of parameters. In particular, we show that the common intuition of entirely avoiding slow memory accesses by using highly efficient schemes (namely, with many fast-memory choices) is not always optimal.
 

Hyesook Lim, Kyuhee Lim, Nara Lee, Kyong-Hye Park.  2014.  On Adding Bloom Filters to Longest Prefix Matching Algorithms. Computers, IEEE Transactions on. 63:411-423.

High-speed IP address lookup is essential to achieve wire-speed packet forwarding in Internet routers. Ternary content addressable memory (TCAM) technology has been adopted to solve the IP address lookup problem because of its ability to perform fast parallel matching. However, the applicability of TCAMs presents difficulties due to cost and power dissipation issues. Various algorithms and hardware architectures have been proposed to perform the IP address lookup using ordinary memories such as SRAMs or DRAMs without using TCAMs. Among the algorithms, we focus on two efficient algorithms providing high-speed IP address lookup: parallel multiple-hashing (PMH) algorithm and binary search on level algorithm. This paper shows how effectively an on-chip Bloom filter can improve those algorithms. A performance evaluation using actual backbone routing data with 15,000-220,000 prefixes shows that by adding a Bloom filter, the complicated hardware for parallel access is removed without search performance penalty in parallel-multiple hashing algorithm. Search speed has been improved by 30-40 percent by adding a Bloom filter in binary search on level algorithm.
 

2015-05-04
Hilgers, C., Macht, H., Muller, T., Spreitzenbarth, M..  2014.  Post-Mortem Memory Analysis of Cold-Booted Android Devices. IT Security Incident Management IT Forensics (IMF), 2014 Eighth International Conference on. :62-75.

As recently shown in 2013, Android-driven smartphones and tablet PCs are vulnerable to so-called cold boot attacks. With physical access to an Android device, forensic memory dumps can be acquired with tools like FROST that exploit the remanence effect of DRAM to read out what is left in memory after a short reboot. While FROST can in some configurations be deployed to break full disk encryption, encrypted user partitions are usually wiped during a cold boot attack, such that a post-mortem analysis of main memory remains the only source of digital evidence. Therefore, we provide an in-depth analysis of Android's memory structures for system and application level memory. To leverage FROST in the digital investigation process of Android cases, we provide open-source Volatility plugins to support an automated analysis and extraction of selected Dalvik VM memory structures.

2014-09-17
Chang Liu, Hicks, M., Shi, E..  2013.  Memory Trace Oblivious Program Execution. Computer Security Foundations Symposium (CSF), 2013 IEEE 26th. :51-65.

Cloud computing allows users to delegate data and computation to cloud service providers, at the cost of giving up physical control of their computing infrastructure. An attacker (e.g., insider) with physical access to the computing platform can perform various physical attacks, including probing memory buses and cold-boot style attacks. Previous work on secure (co-)processors provides hardware support for memory encryption and prevents direct leakage of sensitive data over the memory bus. However, an adversary snooping on the bus can still infer sensitive information from the memory access traces. Existing work on Oblivious RAM (ORAM) provides a solution for users to put all data in an ORAM; and accesses to an ORAM are obfuscated such that no information leaks through memory access traces. This method, however, incurs significant memory access overhead. This work is the first to leverage programming language techniques to offer efficient memory-trace oblivious program execution, while providing formal security guarantees. We formally define the notion of memory-trace obliviousness, and provide a type system for verifying that a program satisfies this property. We also describe a compiler that transforms a program into a structurally similar one that satisfies memory trace obliviousness. To achieve optimal efficiency, our compiler partitions variables into several small ORAM banks rather than one large one, without risking security. We use several example programs to demonstrate the efficiency gains our compiler achieves in comparison with the naive method of placing all variables in the same ORAM.