BibTeX
@InProceedings{Wu_2025_WACV,
    author    = {Wu, Tz-Ying and Min, Kyle and Tripathi, Subarna and Vasconcelos, Nuno},
    title     = {Ego-VPA: Egocentric Video Understanding with Parameter-Efficient Adaptation},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {9240-9250}
}
Ego-VPA: Egocentric Video Understanding with Parameter-Efficient Adaptation
Abstract
Video understanding typically requires fine-tuning a large backbone when adapting to new domains. In this paper, we leverage egocentric video foundation models (Ego-VFMs) based on video-language pre-training and propose a parameter-efficient adaptation method for egocentric video tasks, namely Ego-VPA. It employs a local sparse approximation of each video frame/text feature using basis prompts, and the selected basis prompts are used to synthesize video/text prompts. Since the basis prompts are shared across frames and modalities, it models context fusion and cross-modal transfer efficiently. Experiments show that Ego-VPA excels at lightweight adaptation (with only 0.84% learnable parameters), largely improving over baselines and reaching the performance of full fine-tuning.
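The sketch below illustrates, in a heavily simplified form, the idea stated in the abstract: each frame/text feature is approximated by a sparse combination of shared basis prompts, and the selected bases are used to synthesize modality-specific prompts. All names, tensor shapes, and the top-k selection rule are assumptions made for illustration only; this is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

class BasisPromptSynthesizer(torch.nn.Module):
    """Hypothetical sketch: synthesize prompts from a shared basis via sparse approximation."""

    def __init__(self, num_basis: int = 64, dim: int = 512, k: int = 4):
        super().__init__()
        # Basis prompts shared across frames and across the video/text modalities (assumed sizes).
        self.basis = torch.nn.Parameter(torch.randn(num_basis, dim) * 0.02)
        self.k = k

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_tokens, dim) frame or text features.
        # Score every feature against every basis prompt (cosine similarity, an assumed choice).
        sims = F.normalize(feats, dim=-1) @ F.normalize(self.basis, dim=-1).T
        # Keep only the top-k basis prompts per feature: a local sparse approximation.
        topk_sims, topk_idx = sims.topk(self.k, dim=-1)
        weights = topk_sims.softmax(dim=-1)                       # (batch, tokens, k)
        selected = self.basis[topk_idx]                           # (batch, tokens, k, dim)
        # Synthesize one prompt per feature as a weighted sum of the selected bases.
        prompts = (weights.unsqueeze(-1) * selected).sum(dim=-2)  # (batch, tokens, dim)
        return prompts

# Usage: applying the same module (hence the same basis prompts) to both video-frame and
# text features is what enables cross-modal sharing in this sketch.
synth = BasisPromptSynthesizer()
video_prompts = synth(torch.randn(2, 8, 512))   # e.g., 8 frame features per clip
text_prompts = synth(torch.randn(2, 12, 512))   # e.g., 12 text-token features
```

In this sketch, the only learnable tensor is the small basis-prompt matrix, which is consistent with the lightweight-adaptation claim (only 0.84% learnable parameters) in the abstract.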