
Title: A Hierarchy-to-Sequence Attentional Neural Machine Translation Model
Publication Type: Journal Article
Year of Publication: 2018
Authors: Su, Jinsong; Zeng, Jiali; Xiong, Deyi; Liu, Yang; Wang, Mingxuan; Xie, Jun
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 26
Pagination: 623–632
Date Published: March 2018
ISSN: 2329-9304
Keywords: attention models, Chinese-English translation, clause level, compositionality, context modeling, conventional NMT model, decoding, English-German translation, grammars, hierarchical neural network structure, hierarchy-to-sequence attentional NMT model, language translation, learning (artificial intelligence), long parallel sentences, natural language processing, neural machine translation, neural nets, optimal model parameters, parameter learning, recurrent neural networks, segmented clause sequence, segmented clauses, semantic compositionality modeling, semantics, sequence-to-sequence attentional neural machine translation, short clauses, speech processing, text analysis, training, translation prediction
Abstract

Although sequence-to-sequence attentional neural machine translation (NMT) has achieved great progress recently, it still faces two challenges: learning optimal model parameters for long parallel sentences and fully exploiting contexts of different scopes. In this paper, partially inspired by the idea of segmenting a long sentence into short clauses, each of which can be easily translated by NMT, we propose a hierarchy-to-sequence attentional NMT model to address these two challenges. Our encoder takes the segmented clause sequence as input and uses a hierarchical neural network structure to model words, clauses, and sentences at different levels, with two layers of recurrent neural networks modeling semantic compositionality at the word and clause levels. Correspondingly, the decoder translates the segmented clauses sequentially while applying two types of attention models to capture inter-clause and intra-clause contexts for translation prediction. In this way, we not only improve parameter learning but also better exploit contexts of different scopes for translation. Experimental results on Chinese-English and English-German translation demonstrate the superiority of the proposed model over conventional NMT models.
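The architecture sketched in the abstract can be illustrated with a toy numpy example. This is a minimal sketch under stated assumptions, not the paper's implementation: it uses plain tanh RNN cells and dot-product attention in place of the paper's gated units and learned attention scoring, with random toy weights and pre-segmented clauses of word vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy hidden size (hypothetical; not from the paper)

def rnn(inputs, Wx, Wh):
    """Simple tanh RNN over a list of D-dim vectors; returns all hidden states."""
    h = np.zeros(D)
    states = []
    for x in inputs:
        h = np.tanh(Wx @ x + Wh @ h)
        states.append(h)
    return np.stack(states)

def attention(query, keys):
    """Dot-product attention: softmax over scores, then weighted sum of keys."""
    scores = keys @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ keys

# Hypothetical input: one source sentence pre-segmented into 3 clauses,
# each a sequence of word embeddings.
clauses = [rng.standard_normal((n, D)) for n in (4, 3, 5)]

Wx, Wh = rng.standard_normal((D, D)), rng.standard_normal((D, D))
Vx, Vh = rng.standard_normal((D, D)), rng.standard_normal((D, D))

# Word-level RNN: encode each clause; the last hidden state summarizes it.
word_states = [rnn(list(c), Wx, Wh) for c in clauses]
clause_summaries = [s[-1] for s in word_states]

# Clause-level RNN: model compositionality across the clause summaries.
clause_states = rnn(clause_summaries, Vx, Vh)

# One decoding step: inter-clause attention over clause-level states, and
# intra-clause attention over the word states of the clause being translated.
dec_state = rng.standard_normal(D)
inter_ctx = attention(dec_state, clause_states)    # inter-clause context
cur = 1                                            # index of current clause
intra_ctx = attention(dec_state, word_states[cur]) # intra-clause context
combined = np.concatenate([inter_ctx, intra_ctx])  # would feed the predictor
print(combined.shape)
```

In the full model both RNN layers would be bidirectional and trained jointly with the decoder; here the sketch only shows how the two attention scopes yield separate context vectors that are combined for each target prediction.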

URL: https://ieeexplore.ieee.org/document/8246560
DOI: 10.1109/TASLP.2018.2789721
Citation Key: su_hierarchy–sequence_2018