Talking Head Anime 4: Distillation for Real-Time Performance
Abstract
We study the problem of creating a character model that can be controlled in real time from a single image of an anime character. A solution would greatly reduce the cost of creating avatars for computer games and other interactive applications. Talking Head Anime 3 (THA3) is an open source project that attempts to address the problem directly. It takes as input (1) an image of an anime character's upper body and (2) a 45-dimensional pose vector, and it outputs a new image of the same character taking the specified pose. The range of possible movements is expressive enough for personal avatars and certain types of game characters. THA3's main limitation is its speed: it achieves interactive frame rates (approximately 20 FPS) only on a very powerful GPU (an Nvidia Titan RTX or better). Based on the insight that avatars and game characters do not need to change their appearance frequently, we propose a technique to distill the system into a small student neural network (< 2 MB) specific to a particular character. The student model can generate 512×512 animation frames in real time (at least 30 FPS) on consumer gaming GPUs while preserving the image quality of the teacher model. For the first time, our technique makes the whole system practical for real-time applications.
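To make the distillation idea concrete, the following is a minimal PyTorch sketch of such a teacher-to-student training loop. It assumes a pretrained THA3-style teacher callable as teacher(character_image, poses); the StudentNet architecture, the uniform pose sampling, and the L1 loss are illustrative stand-ins, not the paper's actual design.

```python
import torch
import torch.nn as nn

POSE_DIM = 45  # size of the THA3 pose vector

class StudentNet(nn.Module):
    """Tiny per-character generator: 45-D pose -> 512x512 image.
    A hypothetical architecture, not the one from the paper."""
    def __init__(self, out_channels=4):
        super().__init__()
        self.fc = nn.Linear(POSE_DIM, 128 * 8 * 8)
        stages = []
        for c_in, c_out in [(128, 64), (64, 32), (32, 16),
                            (16, 8), (8, 8), (8, out_channels)]:
            stages += [nn.Upsample(scale_factor=2),  # 8x8 -> ... -> 512x512
                       nn.Conv2d(c_in, c_out, 3, padding=1),
                       nn.ReLU()]
        stages.append(nn.Conv2d(out_channels, out_channels, 3, padding=1))
        self.deconv = nn.Sequential(*stages)

    def forward(self, pose):
        x = self.fc(pose).view(-1, 128, 8, 8)
        return self.deconv(x)

def distill(teacher, character_image, steps=100_000, batch=8, lr=1e-4):
    """Fit the student to mimic the teacher on one fixed character image."""
    student = StudentNet()
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(steps):
        poses = torch.rand(batch, POSE_DIM)      # sample random poses
        with torch.no_grad():                    # teacher renders the targets
            targets = teacher(character_image, poses)
        loss = nn.functional.l1_loss(student(poses), targets)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student  # small, character-specific weights
```

At inference time the student needs only the 45-dimensional pose vector, since the character's appearance is baked into its weights; this is what makes per-frame generation cheap enough for consumer GPUs.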
Related Material

[pdf] [supp] [bibtex]

@InProceedings{Khungurn_2025_WACV,
  author    = {Khungurn, Pramook},
  title     = {Talking Head Anime 4: Distillation for Real-Time Performance},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  month     = {February},
  year      = {2025},
  pages     = {5018-5029}
}