Talking Face Generation With Multilingual TTS

Hyoung-Kyu Song, Sang Hoon Woo, Junhyeok Lee, Seungmin Yang, Hyunjae Cho, Youseong Lee, Dongho Choi, Kang-wook Kim; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 21425-21430


Recent studies in talking face generation have focused on building a model that generalizes from any source speech to any target identity. A number of works have already claimed this functionality, adding that their models will also generalize to any language. However, using languages from different language families, we show that these models do not perform well when the training language and the testing language are sufficiently different. We reduce the scope of the problem to building a language-robust talking face generation system on seen identities, i.e., the target identity is the same as the training identity. In this work, we introduce a talking face generation system that generalizes to different languages. We evaluate the efficacy of our system using a multilingual text-to-speech system. We present the joint text-to-speech system and the talking face generation system as a neural dubbing system. Our demo is available at Also, our screencast is uploaded at
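The abstract's neural dubbing system chains two components: a multilingual TTS model synthesizes speech from text, and a talking face generator renders video frames driven by that speech. The following is a minimal sketch of that two-stage structure only; the functions, shapes, and pacing constants are placeholder stubs assumed for illustration, not the paper's actual models.

```python
# Hypothetical sketch of a neural dubbing pipeline: text -> speech (TTS),
# then speech -> video frames (talking face generation). All models here
# are stubs; shapes and pacing are illustrative assumptions only.
import numpy as np


def multilingual_tts(text: str, language: str, sample_rate: int = 16000) -> np.ndarray:
    """Stub TTS: returns a silent waveform, ~80 ms of audio per character."""
    samples_per_char = sample_rate * 80 // 1000  # integer math keeps lengths exact
    return np.zeros(samples_per_char * len(text), dtype=np.float32)


def talking_face_generator(audio: np.ndarray, identity_id: str,
                           fps: int = 25, sample_rate: int = 16000) -> np.ndarray:
    """Stub face generator: one blank 96x96 RGB frame per 1/fps s of audio."""
    n_frames = max(1, len(audio) * fps // sample_rate)
    return np.zeros((n_frames, 96, 96, 3), dtype=np.uint8)


def neural_dubber(text: str, language: str, identity_id: str) -> np.ndarray:
    """Chain the two stages: synthesized speech drives the face generator."""
    audio = multilingual_tts(text, language)
    return talking_face_generator(audio, identity_id)


frames = neural_dubber("Hello, world!", language="en", identity_id="seen_speaker_01")
print(frames.shape)  # (26, 96, 96, 3): 13 chars -> 1.04 s of audio -> 26 frames at 25 fps
```

The stub also reflects the paper's problem scope: `identity_id` is restricted to a speaker seen during training, while `text` and `language` may vary.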

Related Material

@InProceedings{Song_2022_CVPR,
  author    = {Song, Hyoung-Kyu and Woo, Sang Hoon and Lee, Junhyeok and Yang, Seungmin and Cho, Hyunjae and Lee, Youseong and Choi, Dongho and Kim, Kang-wook},
  title     = {Talking Face Generation With Multilingual TTS},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2022},
  pages     = {21425-21430}
}