Visible to the public Biblio

Filters: Keyword is non-volatile memory  [Clear All Filters]
2022-03-01
Huang, Shanshi, Peng, Xiaochen, Jiang, Hongwu, Luo, Yandong, Yu, Shimeng.  2021.  Exploiting Process Variations to Protect Machine Learning Inference Engine from Chip Cloning. 2021 IEEE International Symposium on Circuits and Systems (ISCAS). :1–5.
Machine learning inference engine is of great interest to smart edge computing. Compute-in-memory (CIM) architecture has shown significant improvements in throughput and energy efficiency for hardware acceleration. Emerging nonvolatile memory (eNVM) technologies offer great potentials for instant on and off by dynamic power gating. Inference engine is typically pre-trained by the cloud and then being deployed to the field. There is a new security concern on cloning of the weights stored on eNVM-based CIM chip. In this paper, we propose a countermeasure to the weight cloning attack by exploiting the process variations of the periphery circuitry. In particular, we use weight fine-tuning to compensate the analog-to-digital converter (ADC) offset for a specific chip instance while inducing significant accuracy drop for cloned chip instances. We evaluate our proposed scheme on a CIFAR-10 classification task using a VGG- 8 network. Our results show that with precisely chosen transistor size on the employed SAR-ADC, we could maintain 88% 90% accuracy for the fine-tuned chip while the same set of weights cloned on other chips will only have 20 40% accuracy on average. The weight fine-tune could be completed within one epoch of 250 iterations. On average only 0.02%, 0.025%, 0.142% of cells are updated for 2-bit, 4-bit, 8-bit weight precisions in each iteration.
2018-03-19
Imani, Mohsen, Gupta, Saransh, Rosing, Tajana.  2017.  Ultra-Efficient Processing In-Memory for Data Intensive Applications. Proceedings of the 54th Annual Design Automation Conference 2017. :6:1–6:6.

Recent years have witnessed a rapid growth in the domain of Internet of Things (IoT). This network of billions of devices generates and exchanges huge amount of data. The limited cache capacity and memory bandwidth make transferring and processing such data on traditional CPUs and GPUs highly inefficient, both in terms of energy consumption and delay. However, many IoT applications are statistical at heart and can accept a part of inaccuracy in their computation. This enables the designers to reduce complexity of processing by approximating the results for a desired accuracy. In this paper, we propose an ultra-efficient approximate processing in-memory architecture, called APIM, which exploits the analog characteristics of non-volatile memories to support addition and multiplication inside the crossbar memory, while storing the data. The proposed design eliminates the overhead involved in transferring data to processor by virtually bringing the processor inside memory. APIM dynamically configures the precision of computation for each application in order to tune the level of accuracy during runtime. Our experimental evaluation running six general OpenCL applications shows that the proposed design achieves up to 20x performance improvement and provides 480x improvement in energy-delay product, ensuring acceptable quality of service. In exact mode, it achieves 28x energy savings and 4.8x speed up compared to the state-of-the-art GPU cores.

2017-05-18
Boehm, Hans-J., Chakrabarti, Dhruva R..  2016.  Persistence Programming Models for Non-volatile Memory. Proceedings of the 2016 ACM SIGPLAN International Symposium on Memory Management. :55–67.

It is expected that DRAM memory will be augmented, and perhaps eventually replaced, by one of several up-and-coming memory technologies. These are all non-volatile, in that they retain their contents without power. This allows primary memory to be used as a fast disk replacement. It also enables more aggressive programming models that directly leverage persistence of primary memory. However, it is challenging to maintain consistency of memory in such an environment. There is no consensus on the right programming model for doing so, and subtle differences can have large, and sometimes surprising, effects on the implementation and its performance. The existing literature describes multiple programming systems that provide point solutions to the selective persistence for user data structures. Real progress in this area requires a choice of programming model, which we cannot reasonably make without a real understanding of the design space. Point solutions are insufficient. We systematically explore what we consider to be the most promising part of the space, precisely defining semantics and identifying implementation costs. This allows us to be much more explicit and precise about semantic and implementation trade-offs that were usually glossed over in prior work. It also exposes some promising new design alternatives.