Implicit Motion Function

Yue Gao, Jiahao Li, Lei Chu, Yan Lu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 19278-19289

Abstract


Recent advancements in video modeling extensively rely on optical flow to represent the relationships across frames but this approach often lacks efficiency and fails to model the probability of the intrinsic motion of objects. In addition conventional encoder-decoder frameworks in video processing focus on modeling the correlation in the encoder leading to limited generative capabilities and redundant intermediate representations. To address these challenges this paper proposes a novel Implicit Motion Function (IMF) method. Our approach utilizes a low-dimensional latent token as the implicit representation along with the use of cross-attention to implicitly model the correlation between frames. This enables the implicit modeling of temporal correlations and understanding of object motions. Our method not only improves sparsity and efficiency in representation but also explores the generative capabilities of the decoder by integrating correlation modeling within it. The IMF framework facilitates video editing and other generative tasks by allowing the direct manipulation of latent tokens. We validate the effectiveness of IMF through extensive experiments on multiple video tasks demonstrating superior performance in terms of reconstructed video quality compression efficiency and generation ability.

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Gao_2024_CVPR, author = {Gao, Yue and Li, Jiahao and Chu, Lei and Lu, Yan}, title = {Implicit Motion Function}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {19278-19289} }