Biblio
Filters: Keyword is persistent memory
Analysis and Mitigation of Data Sanitization Overhead in DAX File Systems. 2022 IEEE 40th International Conference on Computer Design (ICCD). :255–258.
2022. A direct access (DAX) file system maximizes the benefit of persistent memory (PM)’s low latency by removing the page cache layer from the file system access paths. However, this paper reveals that data block allocation in DAX file systems is in general significantly slower than in conventional file systems, because DAX file systems must zero out newly allocated blocks to prevent leakage of old data previously stored in them. The slowed block allocation significantly degrades file write performance. In addition, this paper proposes an off-critical-path data block sanitization scheme tailored for DAX file systems. The proposed scheme detaches the zero-out operation from the latency-critical I/O path and instead zeroes released data blocks in the background. Its design principle is applicable to most DAX file systems. For evaluation, we implemented our approach in Ext4-DAX and XFS-DAX. Our evaluation showed that the proposed scheme reduces append write latency by 36.8% and improves the performance of FileBench’s fileserver workload by 30.4%, YCSB’s workload A on RocksDB by 3.3%, and the Redis-benchmark by 7.4% on average.
ISSN: 2576-6996
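The off-critical-path idea described in the abstract above can be illustrated with a small in-memory sketch. This is not the paper's Ext4-DAX or XFS-DAX implementation: the class `BlockAllocator`, its methods, the `budget` parameter, and the synchronous fallback when no pre-zeroed block is available are all hypothetical choices made for illustration.

```python
from collections import deque

BLOCK_SIZE = 8  # tiny block size, chosen only to keep the example readable


class BlockAllocator:
    """Hypothetical allocator that zeroes released blocks off the write path."""

    def __init__(self, num_blocks):
        self.blocks = [bytearray(BLOCK_SIZE) for _ in range(num_blocks)]
        self.clean = deque(range(num_blocks))  # blocks already zeroed
        self.dirty = deque()                   # released blocks with old data
        self.sync_zeroes = 0                   # zero-outs done on the write path

    def _zero(self, block_id):
        self.blocks[block_id][:] = bytes(BLOCK_SIZE)

    def release(self, block_id):
        # Releasing is cheap: the block is only queued for later sanitization.
        self.dirty.append(block_id)

    def sanitize_in_background(self, budget):
        # Stand-in for a background worker: zero up to `budget` released blocks.
        done = 0
        while self.dirty and done < budget:
            block_id = self.dirty.popleft()
            self._zero(block_id)
            self.clean.append(block_id)
            done += 1
        return done

    def allocate(self):
        # Fast path: hand out a block that was zeroed ahead of time.
        if self.clean:
            return self.clean.popleft()
        # Assumed fallback: zero synchronously so old data never leaks.
        if self.dirty:
            block_id = self.dirty.popleft()
            self._zero(block_id)
            self.sync_zeroes += 1
            return block_id
        raise MemoryError("no free blocks")


alloc = BlockAllocator(2)
first = alloc.allocate()
alloc.blocks[first][:] = b"secret!!"   # old data that must not leak
alloc.release(first)
alloc.sanitize_in_background(budget=4)
```

In this sketch the cost of zeroing moves from `allocate` to `sanitize_in_background`, so a later allocation of the same block finds it already clean.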
Temporal Exposure Reduction Protection for Persistent Memory. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA). :908–924.
2022. The long-living nature and byte-addressability of persistent memory (PM) amplify the importance of strong memory protections. This paper develops temporal exposure reduction protection (TERP) as a framework for enforcing memory safety. Aiming to minimize the time when a PM region is accessible, TERP offers a complementary dimension of memory protection. The paper gives a formal definition of TERP and explores the semantics space of TERP constructs and their relations with security and composability in both sequential and parallel executions. It proposes programming system and architecture solutions to the key challenges in adopting TERP, drawing on novel support in both compilers and hardware to efficiently meet the exposure time target. Experiments validate the efficacy of the proposed support for TERP in both efficiency and exposure time minimization.
ISSN: 2378-203X
Romulus: Efficient Algorithms for Persistent Transactional Memory. Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures. :271–282.
2018. Byte-addressable persistent memory eliminates the need for serialization and deserialization of data to and from persistent storage, allowing applications to interact with it through common store and load instructions. In the event of a process or system failure, applications rely on persistence techniques to provide consistent storage of data in non-volatile memory (NVM). For most of these techniques, consistency is ensured through logging of updates, with the consequent intensive cache line flushing and persistence fences necessary to guarantee correctness. Undo-log-based approaches require store interposition and persistence fences before each in-place modification. Redo-log-based techniques can execute transactions using just two persistence fences, although they require store and load interposition, which may incur a performance penalty for large transactions. So far, these techniques have been difficult to integrate with known memory allocators, requiring allocators or garbage collectors specifically designed for NVM. We present Romulus, a user-level library for persistent transactional memory (PTM) that provides durable transactions through the use of twin copies of the data. A transaction in Romulus requires at most four persistence fences, regardless of the transaction size. Romulus uses only store interposition. Any sequential implementation of a memory allocator can be adapted to work with Romulus. Thanks to its lightweight design and low synchronization overhead, Romulus achieves twice the throughput of current state-of-the-art PTMs in update-only workloads, and more than one order of magnitude in read-mostly scenarios.
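The twin-copy idea in the abstract above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the Romulus library's API: the class `TwinCopyStore`, the state names, the exact step order, and the `crash_after` parameter are hypothetical, and `_fence` only counts where a real implementation would flush cache lines and issue a persistence fence.

```python
IDLE, MUTATING, COPYING = "IDLE", "MUTATING", "COPYING"


class TwinCopyStore:
    """Hypothetical in-memory model of durable transactions over twin copies."""

    def __init__(self, size):
        self.state = IDLE
        self.main = [0] * size   # copy that transactions modify in place
        self.back = [0] * size   # twin copy holding the last committed data
        self.fences = 0          # number of simulated persistence fences

    def _fence(self):
        # Stand-in for cache line flushes followed by a persistence fence.
        self.fences += 1

    def transaction(self, writes, crash_after=None):
        """Apply {index: value} writes; crash_after simulates a failure point."""
        self.state = MUTATING
        self._fence()                      # fence 1: state durable before writes
        for index, value in writes.items():
            self.main[index] = value
        if crash_after == "mutate":
            return                         # simulated crash during mutation
        self._fence()                      # fence 2: main copy is durable
        self.state = COPYING
        self._fence()                      # fence 3: commit point
        if crash_after == "commit":
            return                         # simulated crash before the copy
        self.back = list(self.main)        # simplified: copies everything
        self._fence()                      # fence 4: twin copy is durable
        self.state = IDLE                  # a stale COPYING only repeats the copy

    def recover(self):
        if self.state == MUTATING:
            self.main = list(self.back)    # roll back the unfinished transaction
        elif self.state == COPYING:
            self.back = list(self.main)    # finish the committed transaction
        self.state = IDLE


store = TwinCopyStore(4)
store.transaction({0: 7, 2: 9})
```

The fence count in this sketch is fixed at four per committed transaction no matter how many entries `writes` contains, which mirrors the bound stated in the abstract; whole-array copying is a simplification for brevity.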