CAMEL: CAusal Motion Enhancement Tailored for Lifting Text-driven Video Editing

Zhang, Guiwei; Zhang, Tianyu; Niu, Guanglin; Tan, Zichang; Bai, Yalong; Yang, Qing

Guiwei Zhang, Tianyu Zhang, Guanglin Niu, Zichang Tan, Yalong Bai, Qing Yang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 9079-9088

Abstract

Text-driven video editing poses significant challenges in exhibiting flicker-free visual continuity while preserving the inherent motion patterns of original videos. Existing methods operate under a paradigm where motion and appearance are intricately intertwined. This coupling leads to the network either over-fitting appearance content, and so failing to capture motion patterns, or focusing on motion patterns at the expense of content generalization to diverse textual scenarios. Inspired by the pivotal role of the wavelet transform in dissecting video sequences, we propose CAusal Motion Enhancement tailored for Lifting text-driven video editing (CAMEL), a novel technique with two core designs. First, we introduce motion prompts designed to summarize motion concepts from video templates through direct optimization. The optimized prompts are purposefully integrated into the latent representations of diffusion models to enhance the motion fidelity of generated results. Second, to enhance motion coherence and extend the generalization of appearance content to creative textual prompts, we propose the causal motion-enhanced attention mechanism. This mechanism is implemented in tandem with a novel causal motion filter, synergistically enhancing the motion coherence of disentangled high-frequency components while preserving the generalization of appearance content across various textual scenarios. Extensive experimental results show the superior performance of CAMEL.
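As a rough illustration of how a wavelet transform can separate a video into low-frequency and high-frequency temporal components, the sketch below applies a one-level Haar transform along the time axis. It is not CAMEL's implementation: the abstract does not say which wavelet is used, so the Haar choice, the flat-list frame layout, and the function names `haar_temporal_split` and `haar_temporal_merge` are all assumptions made for this example.

```python
import math


def haar_temporal_split(frames):
    """One-level Haar transform along time (assumed wavelet, for illustration).

    frames: list of T frames (T even), each a flat list of pixel values.
    Returns (low, high), each a list of T // 2 frames.
    """
    if len(frames) % 2 != 0:
        raise ValueError("expected an even number of frames")
    s = math.sqrt(2.0)
    low, high = [], []
    for t in range(0, len(frames), 2):
        a, b = frames[t], frames[t + 1]
        # Scaled sum of a frame pair: slowly varying content.
        low.append([(x + y) / s for x, y in zip(a, b)])
        # Scaled difference of a frame pair: frame-to-frame change.
        high.append([(x - y) / s for x, y in zip(a, b)])
    return low, high


def haar_temporal_merge(low, high):
    """Invert haar_temporal_split and return the original frame sequence."""
    s = math.sqrt(2.0)
    frames = []
    for l, h in zip(low, high):
        frames.append([(x + y) / s for x, y in zip(l, h)])
        frames.append([(x - y) / s for x, y in zip(l, h)])
    return frames
```

For a completely static clip, every frame pair is identical, so the high-frequency band is all zeros; any nonzero entries there reflect change between consecutive frames, which is the sense in which high-frequency components relate to motion in this toy setting.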

Related Material

[bibtex]

@InProceedings{Zhang_2024_CVPR,
    author    = {Zhang, Guiwei and Zhang, Tianyu and Niu, Guanglin and Tan, Zichang and Bai, Yalong and Yang, Qing},
    title     = {CAMEL: CAusal Motion Enhancement Tailored for Lifting Text-driven Video Editing},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {9079--9088}
}