IP-FaceDiff: Identity-Preserving Facial Video Editing with Diffusion

Tharun Anand, Aryan Garg, Kaushik Mitra; Proceedings of the Winter Conference on Applications of Computer Vision (WACV) Workshops, 2025, pp. 248-258

Abstract


Facial video editing has become increasingly important for content creators, enabling the manipulation of facial expressions and attributes. However, existing models encounter challenges such as poor editing quality, high computational costs, and difficulties in preserving facial identity across diverse edits. Additionally, these models are often constrained to editing predefined facial attributes, limiting their flexibility to diverse editing prompts. To address these challenges, we propose a novel facial video editing framework that leverages the rich latent space of pre-trained text-to-image (T2I) diffusion models and fine-tunes them specifically for facial video editing tasks. Our approach introduces a targeted fine-tuning scheme that enables high-quality, localized, text-driven edits while ensuring identity preservation across video frames. Additionally, by using pre-trained T2I models during inference, our approach reduces editing time by 80% while maintaining temporal consistency throughout the video sequence. We evaluate the effectiveness of our approach through extensive testing across a wide range of challenging scenarios, including varying head poses, complex action sequences, and diverse facial expressions. Our method consistently outperforms existing techniques, demonstrating superior performance across a broad set of metrics and benchmarks.
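
The core recipe the abstract describes, fine-tuning a pre-trained T2I diffusion model under an identity-preservation constraint, can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration under stated assumptions, not the paper's implementation: it combines the standard diffusion denoising loss with an identity term from a frozen face-embedding network. The Stable Diffusion checkpoint name, the toy face_embedder stand-in (a placeholder for a pre-trained face-recognition backbone such as an ArcFace-style model), and the loss weight lambda_id are all assumptions, since the abstract does not specify these components.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model_id = "runwayml/stable-diffusion-v1-5"  # assumed base T2I model

    # Pre-trained T2I components: the VAE stays frozen, only the UNet is tuned.
    vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae").to(device).eval()
    unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet").to(device).train()
    scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")
    for p in vae.parameters():
        p.requires_grad_(False)

    # Hypothetical frozen face-embedding network (stand-in for a real
    # face-recognition backbone); NOT the paper's identity model.
    face_embedder = nn.Sequential(
        nn.Conv2d(3, 32, 4, stride=4), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 128),
    ).to(device).eval()
    for p in face_embedder.parameters():
        p.requires_grad_(False)

    optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)

    def training_step(frames, text_emb, lambda_id=0.1):
        """One fine-tuning step: denoising loss + identity-preservation loss.

        frames:   (B, 3, 512, 512) video frames in [-1, 1]
        text_emb: (B, 77, 768) prompt embeddings from a CLIP text encoder
        """
        # Encode frames into the VAE latent space.
        latents = vae.encode(frames).latent_dist.sample() * vae.config.scaling_factor

        # Standard diffusion objective: predict the noise added at a random step.
        noise = torch.randn_like(latents)
        timesteps = torch.randint(0, scheduler.config.num_train_timesteps,
                                  (latents.shape[0],), device=device)
        noisy = scheduler.add_noise(latents, noise, timesteps)
        pred = unet(noisy, timesteps, encoder_hidden_states=text_emb).sample
        denoise_loss = F.mse_loss(pred, noise)

        # Identity term: the clean image implied by the model's prediction
        # should keep the same face embedding as the source frame.
        alpha = scheduler.alphas_cumprod.to(device)[timesteps].view(-1, 1, 1, 1)
        pred_x0 = (noisy - (1 - alpha).sqrt() * pred) / alpha.sqrt()
        recon = vae.decode(pred_x0 / vae.config.scaling_factor).sample
        id_loss = 1 - F.cosine_similarity(face_embedder(recon),
                                          face_embedder(frames)).mean()

        loss = denoise_loss + lambda_id * id_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

Reusing the frozen pre-trained components at inference is also what the abstract credits for the reported 80% reduction in editing time: only the UNet is adapted, so no per-video model retraining is required.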

Related Material


[pdf]
[bibtex]
@InProceedings{Anand_2025_WACV,
    author    = {Anand, Tharun and Garg, Aryan and Mitra, Kaushik},
    title     = {IP-FaceDiff: Identity-Preserving Facial Video Editing with Diffusion},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {February},
    year      = {2025},
    pages     = {248-258}
}