Multilingual Text-to-Speech Training Using Cross Language Voice Conversion and Self-supervised Learning of Speech Representations

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Abstract

State-of-the-art text-to-speech (TTS) models can generate high-fidelity monolingual speech, but it is still challenging to synthesize multilingual speech from the same speaker. One major hurdle is training data: it is hard to find speakers who have native proficiency in several languages. One way of mitigating this issue is to generate a polyglot corpus through voice conversion. In this paper, we train such a multilingual TTS system using a novel cross-lingual voice conversion model trained on speaker-invariant features extracted from a speech representation model that was pre-trained on 53 languages through self-supervised learning [1]. To further improve the shift toward the target speaker's identity, we also add a speaker similarity loss term during training. We then use this model to convert multilingual, multi-speaker speech data to the voice of the target speaker. By augmenting the data with 4 other languages, we train a multilingual TTS system in the voice of a native monolingual English speaker, speaking 5 languages (English, French, German, Italian, and Spanish). Our system achieves a higher mean opinion score (MOS) than the multi-speaker baseline system for all languages, specifically: 3.74 vs. 3.62 for Spanish, 3.11 vs. 2.71 for German, 3.47 vs. 2.84 for Italian, and 2.72 vs. 2.41 for French.
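The abstract mentions a speaker similarity loss term but does not give its form. A common choice, assumed here and not confirmed by the abstract, is one minus the cosine similarity between a speaker embedding of the converted speech and one of the target speaker's speech, added to the main training objective with a weight. The embeddings are assumed to come from some speaker encoder that the abstract does not specify. The function names and the `weight` parameter below are hypothetical, and the sketch uses plain Python lists in place of real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def speaker_similarity_loss(converted_emb: Sequence[float],
                            target_emb: Sequence[float]) -> float:
    """Hypothetical speaker loss: 1 - cos(converted, target).

    Ranges over [0, 2]; it is 0 when the converted speech's speaker
    embedding points in the same direction as the target speaker's.
    """
    return 1.0 - cosine_similarity(converted_emb, target_emb)


def total_loss(reconstruction_loss: float,
               converted_emb: Sequence[float],
               target_emb: Sequence[float],
               weight: float = 1.0) -> float:
    """Hypothetical combined objective: main loss plus a weighted speaker term."""
    return reconstruction_loss + weight * speaker_similarity_loss(converted_emb, target_emb)


if __name__ == "__main__":
    # Orthogonal embeddings give a speaker loss of 1.0.
    print(speaker_similarity_loss([1.0, 0.0], [0.0, 1.0]))
    # Combined with a reconstruction term of 0.5 and weight 0.1.
    print(total_loss(0.5, [1.0, 0.0], [0.0, 1.0], weight=0.1))
```

Because cosine similarity ignores vector length, this term only pushes the converted speech's speaker embedding toward the direction of the target speaker's embedding. That matches the goal of shifting speaker identity without constraining other properties of the embedding.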
