Adversarial Networks-Based Speech Enhancement with Deep Regret Loss

Title: Adversarial Networks-Based Speech Enhancement with Deep Regret Loss
Publication Type: Conference Paper
Year of Publication: 2022
Authors: Pardede, Hilman; Zilvan, Vicky; Ramdan, Ade; Yuliani, Asri R.; Suryawati, Endang; Kusumowardani, Renni
Conference Name: 2022 5th International Conference on Networking, Information Systems and Security: Envisage Intelligent Systems in 5g//6G-based Interconnected Digital Worlds (NISS)
Keywords: Deep Learning, deep regret analytic generative adversarial networks, Generative Adversarial Learning, generative adversarial networks, Metrics, music, pubcrawl, resilience, Resiliency, Scalability, security, speech based system, speech enhancement, Stability analysis, Training
Abstract: Speech enhancement is often applied in speech-based systems because speech signals are prone to additive background noise. While traditional speech-processing methods have long been used for speech enhancement, advances in deep learning have prompted many efforts to apply it to this task. With deep learning, networks learn a mapping from noisy signals to clean ones and thus learn to reconstruct clean speech. As a consequence, deep learning methods can reduce the so-called musical noise that traditional speech enhancement methods often introduce. Currently, a popular deep learning architecture for speech enhancement is the generative adversarial network (GAN). However, the cross-entropy loss employed in GAN often makes training unstable, so many GAN implementations replace it with a least-squares loss. In this paper, to improve the training stability of GAN with the cross-entropy loss, we propose using deep regret analytic generative adversarial networks (DRAGAN) for speech enhancement, which applies a gradient penalty to the cross-entropy loss. We also employ relativistic rules to stabilize GAN training and apply them to both the least-squares and DRAGAN losses. Our experiments suggest that the proposed method improves speech quality over the least-squares loss on several objective quality metrics.
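Note: as a rough illustration of the two stabilization ideas named in the abstract, the sketch below shows a DRAGAN-style gradient penalty and a relativistic discriminator loss in PyTorch. The discriminator D, the input batches, and the penalty weight lambda_gp are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical sketch of the stabilization techniques described above;
# assumes a discriminator module D and real/fake waveform batches.
import torch
import torch.nn.functional as F

def dragan_gradient_penalty(D, x_real, lambda_gp=10.0):
    """DRAGAN-style penalty: keep the discriminator's gradient norm near 1
    at points perturbed around the real data manifold."""
    # Perturb real samples with noise scaled by the batch standard deviation.
    noise = 0.5 * x_real.std() * torch.rand_like(x_real)
    x_hat = (x_real + noise).detach().requires_grad_(True)
    d_out = D(x_hat)
    grads = torch.autograd.grad(
        outputs=d_out, inputs=x_hat,
        grad_outputs=torch.ones_like(d_out),
        create_graph=True)[0]
    grad_norm = grads.reshape(grads.size(0), -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()

def relativistic_d_loss(D, x_real, x_fake):
    """Relativistic rule: the discriminator scores real samples relative
    to fakes before the cross-entropy loss is applied."""
    diff = D(x_real) - D(x_fake)
    return F.binary_cross_entropy_with_logits(diff, torch.ones_like(diff))
```

In a training loop, the penalty would typically be added to the (relativistic) cross-entropy discriminator loss before backpropagation; the same relativistic rule can also be combined with a least-squares objective, as the abstract describes.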
DOI: 10.1109/NISS55057.2022.10085296
Citation Key: pardede_adversarial_2022