Sequence Modeling with Hierarchical Deep Generative Models with Dual Memory
Title | Sequence Modeling with Hierarchical Deep Generative Models with Dual Memory |
Publication Type | Conference Paper |
Year of Publication | 2017 |
Authors | Zheng, Yanan; Wen, Lijie; Wang, Jianmin; Yan, Jun; Ji, Lei |
Conference Name | Proceedings of the 2017 ACM on Conference on Information and Knowledge Management |
Publisher | ACM |
Conference Location | Singapore |
ISBN Number | 978-1-4503-4918-5 |
Keywords | dual memory mechanism, hierarchical deep generative models, Human Behavior, inference and learning, Metrics, pubcrawl, random key generation, resilience, Resiliency, Scalability, sequence modeling |
Abstract | Deep Generative Models (DGMs) can extract high-level representations from massive amounts of unlabeled data and are interpretable from a probabilistic perspective, characteristics that favor sequence modeling tasks. However, modeling sequences with DGMs remains a major challenge. Unlike real-valued data, which can be fed directly into a model, sequence data consist of discrete elements that must first be transformed into some representation. This leads to two challenges. First, high-level features are sensitive both to small variations in the input and to the way the data are represented. Second, the model is prone to losing long-term information over multiple transformations. In this paper, we propose a Hierarchical Deep Generative Model with Dual Memory to address both challenges, together with a method for performing inference and learning on the model efficiently. The proposed model extends basic DGMs with an improved, hierarchically organized multi-layer architecture. In addition, it incorporates memory along two directions, denoted broad memory and deep memory. The model is trained end-to-end by optimizing a variational lower bound on the data log-likelihood with an improved stochastic variational method. We perform experiments on several tasks with various datasets. On language modeling, our method significantly outperforms state-of-the-art results in terms of generative performance. Extended experiments, including document modeling and sentiment analysis, demonstrate the effectiveness of the dual memory mechanism and the learned latent representations. Random text generation provides an intuitive illustration of the model's advantages. |
URL | https://dl.acm.org/citation.cfm?doid=3132847.3132952 |
DOI | 10.1145/3132847.3132952 |
Citation Key | zheng_sequence_2017 |