Learning Conditional Space-Time Prompt Distributions for Video Class-Incremental Learning

Zou, Xiaohan; Ma, Wenchao; Zhao, Shu

Xiaohan Zou, Wenchao Ma, Shu Zhao; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 4862-4873

Abstract

Recent advances in prompt-based learning have significantly improved image and video class-incremental learning. However, the prompts learned by these methods often fail to capture the diverse and informative characteristics of videos and struggle to generalize to future tasks and classes. To address these challenges, we propose modeling the distribution of space-time prompts conditioned on the input video using a diffusion model. This generative approach allows the model to naturally handle the diverse characteristics of videos, leading to more robust prompt learning and better generalization. Additionally, we develop a simple yet effective mechanism to transfer the token relationship modeling capabilities of pre-trained image transformers to spatio-temporal modeling in videos. We evaluate our approach on four established benchmarks, where it shows remarkable improvements over existing state-of-the-art methods in video class-incremental learning.

Related Material

[bibtex]

@InProceedings{Zou_2025_CVPR,
    author    = {Zou, Xiaohan and Ma, Wenchao and Zhao, Shu},
    title     = {Learning Conditional Space-Time Prompt Distributions for Video Class-Incremental Learning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {4862-4873}
}