UniPaint: Unified Space-time Video Inpainting via Mixture-of-Experts
Abstract
In this paper, we present UniPaint, a unified generative space-time video inpainting framework that enables spatial-temporal inpainting and interpolation. Unlike existing methods that treat video inpainting and video interpolation as two distinct tasks, we tackle both with a single inpainting framework and observe that the two tasks can mutually enhance synthesis performance. Specifically, we first introduce a plug-and-play space-time video inpainting adapter, which can be employed in various personalized models. The key insight is a Mixture-of-Experts (MoE) attention that covers the various tasks. We then design a space-time masking strategy for the training stage so that the two tasks reinforce each other and improve performance. UniPaint produces high-quality and aesthetically pleasing results, achieving the best quantitative results across various tasks and scale setups. The code and checkpoints are available at https://github.com/mmmmm-w/UniPaint.
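
The abstract names two components only in high-level terms: an MoE attention and a space-time masking strategy. The PyTorch sketch below is a minimal, hypothetical illustration of both ideas; the names (MoEAttention, sample_space_time_mask), the two-expert soft gate, and the masking probabilities are assumptions for illustration, not the paper's actual implementation.

# Hypothetical sketch of the two ideas named in the abstract. The expert
# layout, gating, and mask sampler are illustrative assumptions, not the
# paper's released code.
import random
import torch
import torch.nn as nn

class MoEAttention(nn.Module):
    """Soft mixture of attention experts over flattened space-time tokens."""
    def __init__(self, dim: int, num_heads: int = 8, num_experts: int = 2):
        super().__init__()
        # Each expert is an ordinary self-attention block that can specialize
        # (via training) toward one task, e.g. spatial inpainting vs.
        # temporal interpolation.
        self.experts = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads, batch_first=True)
            for _ in range(num_experts)
        )
        self.gate = nn.Linear(dim, num_experts)  # per-token expert scores

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim), tokens = frames * height * width patches.
        weights = torch.softmax(self.gate(x), dim=-1)         # (B, T, E)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            attn_out, _ = expert(x, x, x)                     # self-attention
            out = out + weights[..., i : i + 1] * attn_out    # gated blend
        return out

def sample_space_time_mask(frames: int, h: int, w: int, p_temporal: float = 0.5):
    """Training-time mask sampler: whole-frame masks emulate interpolation,
    per-frame spatial boxes emulate inpainting (1 = masked, 0 = visible)."""
    mask = torch.zeros(frames, h, w)
    if random.random() < p_temporal:
        mask[1:-1] = 1.0                          # keep only the end frames
    else:
        y0, x0 = random.randrange(h // 2), random.randrange(w // 2)
        mask[:, y0 : y0 + h // 2, x0 : x0 + w // 2] = 1.0  # same box, all frames
    return mask

# Usage: 4 frames of 16x16 latent patches, channel dim 64.
tokens = torch.randn(1, 4 * 16 * 16, 64)
print(MoEAttention(dim=64)(tokens).shape)         # torch.Size([1, 1024, 64])
print(sample_space_time_mask(4, 16, 16).mean())   # fraction of masked pixels

With a soft gate every token uses every expert, weighted per token; training on the mixed masking distribution is what lets the gate route spatially masked tokens and temporally masked tokens to the appropriate expert.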
Related Material

@InProceedings{Wan_2025_ICCV,
  author    = {Wan, Zhen and Qi, Chenyang and Liu, Zhiheng and Gui, Tao and Ma, Yue},
  title     = {UniPaint: Unified Space-time Video Inpainting via Mixture-of-Experts},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
  month     = {October},
  year      = {2025},
  pages     = {1882-1892}
}