A Universal Bert-Based Front-End Model for Mandarin Text-To-Speech Synthesis

Title: A Universal Bert-Based Front-End Model for Mandarin Text-To-Speech Synthesis
Publication Type: Conference Paper
Year of Publication: 2021
Authors: Bai, Zilong; Hu, Beibei
Conference Name: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date Published: June
Keywords: Acoustics, BERT, compositionality, Conferences, expandability, feature extraction, front-end, multi-task learning, Predictive models, pubcrawl, Resiliency, Signal processing, speech processing, Task Analysis, text-to-speech
Abstract: The front-end text processing module is considered an essential part that significantly influences the intelligibility and naturalness of a Mandarin text-to-speech system. For commercial text-to-speech systems, the Mandarin front-end should meet the requirements of high accuracy and low time latency while also ensuring maintainability. In this paper, we propose a universal BERT-based model that can be used for various tasks in the Mandarin front-end without changing its architecture. The feature extractor and classifiers in the model are shared across several sub-tasks, which improves expandability and maintainability. We trained and evaluated the model on polyphone disambiguation, text normalization, and prosodic boundary prediction, both as single-task modules and with multi-task learning. Results show that the model maintains high performance for single-task modules and achieves higher accuracy and lower time latency for multi-task modules, indicating that the proposed universal front-end model is promising as a maintainable Mandarin front-end for commercial applications.
DOI: 10.1109/ICASSP39728.2021.9414935
Citation Key: bai_universal_2021
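The abstract describes a shared feature extractor (BERT) feeding lightweight per-task classifiers for the different front-end sub-tasks. The toy sketch below illustrates that routing idea only; the encoder, head logic, label sets, and all names are illustrative stand-ins, not the authors' implementation.

```python
# Hypothetical sketch of the shared-encoder, multi-head front-end idea:
# one feature extractor serves several Mandarin front-end sub-tasks
# (e.g. polyphone disambiguation, prosodic boundary prediction), each
# with its own small classifier head. All names/shapes are illustrative.

def shared_encoder(chars):
    # Stand-in for BERT: map each character to a toy 2-d feature vector.
    return [[float(ord(c) % 7), float(len(chars))] for c in chars]

class Head:
    """A per-task classifier head over shared per-token features."""
    def __init__(self, labels):
        self.labels = labels

    def predict(self, features):
        # Dummy rule-based "classifier": index labels by feature sum.
        return [self.labels[int(sum(f)) % len(self.labels)] for f in features]

class FrontEnd:
    """Universal front-end: shared encoder + task-specific heads."""
    def __init__(self):
        self.heads = {
            "polyphone": Head(["tone1", "tone2"]),
            "prosody": Head(["#0", "#1", "#2"]),
        }

    def run(self, text, task):
        feats = shared_encoder(text)  # computed once, reused by any head
        return self.heads[task].predict(feats)

fe = FrontEnd()
print(fe.run("你好", "prosody"))    # one boundary label per character
print(fe.run("你好", "polyphone"))  # same features, different head
```

Because every task consumes the same encoder output, adding a new sub-task only means registering a new head, which is the expandability/maintainability argument the abstract makes.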