Biblio

Filters: Keyword is size 28.0 nm
2021-02-15
Hu, X., Deng, C., Yuan, B.  2020.  Reduced-Complexity Singular Value Decomposition For Tucker Decomposition: Algorithm And Hardware. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). :1793–1797.
Tensors, as the multidimensional generalization of matrices, are naturally suited for representing and processing high-dimensional data. To date, tensors have been widely adopted in data-intensive applications such as machine learning and big data analysis. However, because tensors are inherently large, tensor algorithms, i.e., the procedures that synthesize, transform, or decompose tensors, are very expensive in computation and storage, which hinders the further adoption of tensors in many application scenarios, especially on resource-constrained hardware platforms. In this paper, we propose a reduced-complexity SVD (Singular Value Decomposition) scheme, SVD being the key operation in Tucker decomposition. By using iterative self-multiplication, the proposed scheme significantly reduces the storage and computational costs of SVD, and thereby the complexity of the overall process. A corresponding hardware architecture is then developed in 28 nm CMOS technology. Our synthesized design achieves 102 GOPS with 1.09 mm² area and 37.6 mW power consumption, providing a promising solution for accelerating Tucker decomposition.
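To make the connection between SVD and Tucker decomposition concrete, the following is a minimal NumPy sketch of a HOSVD-style Tucker decomposition in which the per-mode SVD is replaced by eigendecomposition of a repeatedly self-multiplied Gram matrix. It assumes that the paper's "iterative self-multiplication" amounts to squaring the Gram matrix of each mode unfolding; the function names, the number of squarings, and the example sizes are illustrative assumptions, not details taken from the paper.

import numpy as np

def leading_singular_vectors(unfolding, rank, num_squarings=3):
    """Approximate the leading left singular vectors of a mode unfolding.

    Sketch: instead of a full SVD of the (possibly very wide) unfolding,
    form the small Gram matrix G = U U^T and square it repeatedly, which
    sharpens the dominant eigen-directions before the eigendecomposition.
    """
    gram = unfolding @ unfolding.T            # n_mode x n_mode, independent of the other modes' sizes
    for _ in range(num_squarings):            # iterative self-multiplication
        gram = gram @ gram
        gram /= np.linalg.norm(gram)          # rescale to avoid overflow
    eigvals, eigvecs = np.linalg.eigh(gram)   # eigenvectors of G are left singular vectors of the unfolding
    order = np.argsort(eigvals)[::-1][:rank]
    return eigvecs[:, order]

def tucker_hosvd(tensor, ranks):
    """Tucker decomposition via HOSVD, using the reduced-complexity step above."""
    factors = []
    for mode, rank in enumerate(ranks):
        unfolding = np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)
        factors.append(leading_singular_vectors(unfolding, rank))
    core = tensor
    for factor in factors:
        # Contracting axis 0 each time cycles through the modes in order.
        core = np.tensordot(core, factor, axes=([0], [0]))
    return core, factors

if __name__ == "__main__":
    # Example: compress a random 16x16x16 tensor to multilinear rank (4, 4, 4).
    X = np.random.rand(16, 16, 16)
    core, factors = tucker_hosvd(X, (4, 4, 4))
    print(core.shape, [f.shape for f in factors])

The appeal of this formulation for hardware is that each Gram matrix has the size of a single mode, so the dominant cost becomes repeated small matrix-matrix products rather than a full-width SVD.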
2018-06-07
Marques, J., Andrade, J., Falcao, G.  2017.  Unreliable memory operation on a convolutional neural network processor. 2017 IEEE International Workshop on Signal Processing Systems (SiPS). :1–6.

The evolution of convolutional neural networks (CNNs) into more complex forms of organization, with additional layers, larger convolutions, and more connections, has established the state of the art in accuracy for image detection and classification challenges. Moreover, as CNNs have grown to the point where gigabytes of memory are required for their operation, it has become essential to understand how their inference capabilities are impaired when data elements in memory become corrupted. This paper introduces fault injection into these systems by simulating failing bit-cells in hardware memories, i.e., by relaxing the assumption of 100% reliable operation. We analyze the inference behavior of these networks under severe fault-injection rates and apply fault-mitigation strategies to improve the CNNs' resilience. For the MNIST dataset, we show that 8x less memory suffices for the feature-map memory space, and that under sub-100% reliable operation, fault-injection rates up to 10⁻¹ (with most-significant-bit protection) can be tolerated with only a 1% degradation in error probability. Furthermore, when the feature-map memory is offloaded to an embedded dynamic RAM (eDRAM) system using technology nodes from 65 nm down to 28 nm, power-efficiency improvements of up to 73–80% can be obtained.
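As a rough illustration of the kind of fault injection described here, the following is a minimal NumPy sketch that flips random bits in an 8-bit quantized feature map to emulate failing memory bit-cells, with optional most-significant-bit protection. The uint8 storage layout, the independent per-bit failure model, and the function name are assumptions for illustration, not the authors' simulator.

import numpy as np

def inject_bit_faults(feature_map, fault_rate, protect_msb=True, rng=None):
    """Flip random bits of an 8-bit feature map to emulate unreliable memory.

    Each bit-cell fails independently with probability `fault_rate`;
    protect_msb=True mimics most-significant-bit protection by never
    flipping bit 7.
    """
    rng = np.random.default_rng() if rng is None else rng
    q = feature_map.astype(np.uint8)               # assume uint8 storage of activations
    bits = np.arange(7) if protect_msb else np.arange(8)
    for bit in bits:
        flips = rng.random(q.shape) < fault_rate   # which cells of this bit plane fail
        q = np.where(flips, q ^ np.uint8(1 << bit), q)
    return q

if __name__ == "__main__":
    # Example: inject faults at a 10^-1 bit-cell failure rate with MSB protection.
    fmap = np.random.randint(0, 256, size=(1, 32, 14, 14), dtype=np.uint8)
    faulty = inject_bit_faults(fmap, fault_rate=1e-1)
    print("fraction of changed values:", np.mean(faulty != fmap))

Running the corrupted feature maps through the rest of the network and comparing classification error against the fault-free baseline gives the kind of error-probability degradation curve the abstract reports.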