A Hierarchy-to-Sequence Attentional Neural Machine Translation Model
Title | A Hierarchy-to-Sequence Attentional Neural Machine Translation Model |
Publication Type | Journal Article |
Year of Publication | 2018 |
Authors | Su, Jinsong, Zeng, Jiali, Xiong, Deyi, Liu, Yang, Wang, Mingxuan, Xie, Jun |
Journal | IEEE/ACM Transactions on Audio, Speech, and Language Processing |
Volume | 26 |
Pagination | 623–632 |
Date Published | March 2018 |
ISSN | 2329-9304 |
Keywords | attention models, Chinese-English translation, clause level, compositionality, Context modeling, conventional NMT model, Decoding, English-German translation, grammars, hierarchical neural network structure, Hierarchy-to-sequence, hierarchy-to-sequence attentional neural machine translation model, hierarchy-to-sequence attentional NMT model, language translation, learning (artificial intelligence), long parallel sentences, natural language processing, neural machine translation, neural nets, optimal model parameters, parameter learning, pubcrawl, recurrent neural nets, Recurrent neural networks, segmented clause sequence, segmented clauses, semantic compositionality modeling, Semantics, sequence-to-sequence attentional neural machine translation, short clauses, Speech, speech processing, text analysis, Training, translation prediction |
Abstract | Although sequence-to-sequence attentional neural machine translation (NMT) has made great progress recently, it faces two challenges: learning optimal model parameters for long parallel sentences and exploiting different scopes of context. In this paper, partially inspired by the idea of segmenting a long sentence into short clauses, each of which can be easily translated by NMT, we propose a hierarchy-to-sequence attentional NMT model to address these two challenges. Our encoder takes the segmented clause sequence as input and uses a hierarchical neural network structure to model words, clauses, and sentences at different levels, with two layers of recurrent neural networks modeling semantic compositionality at the word and clause levels. Correspondingly, the decoder translates the segmented clauses sequentially and applies two types of attention models to capture interclause and intraclause contexts for translation prediction. In this way, we not only improve parameter learning but also better exploit different scopes of context for translation. Experimental results on Chinese-English and English-German translation demonstrate the superiority of the proposed model over the conventional NMT model. |
URL | https://ieeexplore.ieee.org/document/8246560 |
DOI | 10.1109/TASLP.2018.2789721 |
Citation Key | su_hierarchy–sequence_2018 |
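For readers skimming this entry, the following is a minimal, illustrative NumPy sketch of the hierarchy-to-sequence idea summarized in the abstract: a word-level RNN encodes each pre-segmented clause, a clause-level RNN composes the resulting clause vectors, and a single decoder step combines intraclause attention (over the words of the clause being translated) with interclause attention (over all clause states). The function names, toy dimensions, and plain tanh/dot-product formulations are assumptions for illustration only; they do not reproduce the paper's actual architecture, parameterization, or training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only; the paper's real sizes differ).
emb_dim, hid_dim = 8, 16

def rnn_encode(inputs, W_x, W_h):
    """Simple tanh RNN; returns one hidden state per input step."""
    h = np.zeros(hid_dim)
    states = []
    for x in inputs:
        h = np.tanh(W_x @ x + W_h @ h)
        states.append(h)
    return np.stack(states)

def attention(query, keys):
    """Dot-product attention; returns a weighted sum of `keys`."""
    scores = keys @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ keys

# A sentence pre-segmented into clauses, each clause a matrix of word embeddings.
clauses = [rng.normal(size=(n_words, emb_dim)) for n_words in (4, 3, 5)]

# Word-level RNN: encode each clause independently into word states.
W_x = rng.normal(size=(hid_dim, emb_dim)) * 0.1
W_h = rng.normal(size=(hid_dim, hid_dim)) * 0.1
word_states = [rnn_encode(c, W_x, W_h) for c in clauses]

# Clause-level RNN: compose the final word state of each clause into clause states.
clause_inputs = [ws[-1] for ws in word_states]
W_cx = rng.normal(size=(hid_dim, hid_dim)) * 0.1
W_ch = rng.normal(size=(hid_dim, hid_dim)) * 0.1
clause_states = rnn_encode(clause_inputs, W_cx, W_ch)

# One decoder step while translating clause 1: intraclause attention looks at
# the words of that clause, interclause attention looks at all clause states.
decoder_state = rng.normal(size=hid_dim)
intra_ctx = attention(decoder_state, word_states[1])
inter_ctx = attention(decoder_state, clause_states)
context = np.concatenate([intra_ctx, inter_ctx])  # would feed the next word prediction
print(context.shape)  # (32,)
```

In a trained system the weight matrices would be learned, the recurrent units would typically be gated (GRU/LSTM), and the attention scores would use trained parameters rather than raw dot products; the sketch only shows how the two attention scopes sit on top of the two-level encoder.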