Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion

Linzhan Mou, Jun-Kun Chen, Yu-Xiong Wang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 20176-20185

Abstract


This paper proposes Instruct 4D-to-4D that achieves 4D awareness and spatial-temporal consistency for 2D diffusion models to generate high-quality instruction-guided dynamic scene editing results. Traditional applications of 2D diffusion models in dynamic scene editing often result in inconsistency primarily due to their inherent frame-by-frame editing methodology. Addressing the complexities of extending instruction-guided editing to 4D our key insight is to treat a 4D scene as a pseudo-3D scene decoupled into two sub-problems: achieving temporal consistency in video editing and applying these edits to the pseudo-3D scene. Following this we first enhance the Instruct-Pix2Pix (IP2P) model with an anchor-aware attention module for batch processing and consistent editing. Additionally we integrate optical flow-guided appearance propagation in a sliding window fashion for more precise frame-to-frame editing and incorporate depth-based projection to manage the extensive data of pseudo-3D scenes followed by iterative editing to achieve convergence. We extensively evaluate our approach in various scenes and editing instructions and demonstrate that it achieves spatially and temporally consistent editing results with significantly enhanced detail and sharpness over the prior art. Notably Instruct 4D-to-4D is general and applicable to both monocular and challenging multi-camera scenes.

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Mou_2024_CVPR, author = {Mou, Linzhan and Chen, Jun-Kun and Wang, Yu-Xiong}, title = {Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {20176-20185} }