@InProceedings{Kwon_2025_WACV,
    author    = {Kwon, Donghyeon and Kim, Inho and Kwak, Suha},
    title     = {Boosting Semi-Supervised Video Action Detection with Temporal Context},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {847-858}
}
Boosting Semi-Supervised Video Action Detection with Temporal Context
Abstract
This paper studies semi-supervised learning for video action detection (VAD), in which only a small portion of the training videos is labeled and the rest remain unlabeled. Existing semi-supervised methods for VAD mainly leverage the spatial context of unlabeled videos and leave their temporal context largely unexplored. To address this, we present a novel semi-supervised learning framework that effectively incorporates spatio-temporal context during training. We first introduce a new augmentation strategy, temporal cross-view augmentation, which encourages representations that are robust across clips depicting the same action but not aligned on the time axis. We also propose a new context fusion method, global-local context fusion, which enhances the features of each frame by incorporating those of other frames within the same clip; this lets the model actively exploit the spatio-temporal context of videos and leads to significant performance improvements. We evaluated our framework on UCF101-24 and JHMDB-21, where it outperformed all existing methods in every evaluation setting.
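The two components named in the abstract can be illustrated with a small, framework-free sketch. This is not the authors' implementation, and the abstract does not specify these details: sampling two equal-length clips of one video offset by a random shift, comparing the views only on the frames they share with a mean-squared consistency term, and fusing each frame with additive mean-pooled global (whole-clip) and local (neighboring-frame) context are all assumptions made for illustration. Every function name below is hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def sample_cross_view_clips(
    num_frames: int, clip_len: int, max_shift: int, rng: random.Random
) -> Tuple[List[int], List[int]]:
    """Hypothetical: draw two clips of one video whose start times differ by a
    random shift in [1, max_shift], so the views overlap but are not time-aligned."""
    if not 1 <= max_shift < clip_len:
        raise ValueError("need 1 <= max_shift < clip_len so the clips overlap")
    if clip_len + max_shift > num_frames:
        raise ValueError("video too short for the requested clip length and shift")
    shift = rng.randint(1, max_shift)
    start_a = rng.randint(0, num_frames - clip_len - shift)
    start_b = start_a + shift
    clip_a = list(range(start_a, start_a + clip_len))
    clip_b = list(range(start_b, start_b + clip_len))
    return clip_a, clip_b


def shared_frame_pairs(clip_a: Sequence[int], clip_b: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (position in clip_a, position in clip_b) for every video frame that
    both clips contain, i.e. where the two views' per-frame outputs can be compared."""
    pos_b = {frame: j for j, frame in enumerate(clip_b)}
    return [(i, pos_b[f]) for i, f in enumerate(clip_a) if f in pos_b]


def cross_view_consistency(
    preds_a: Sequence[Vector], preds_b: Sequence[Vector], pairs: Sequence[Tuple[int, int]]
) -> float:
    """Assumed loss: mean (over shared frames) of the squared difference between
    the two views' per-frame prediction vectors."""
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        total += sum((x - y) ** 2 for x, y in zip(preds_a[i], preds_b[j]))
    return total / len(pairs)


def global_local_fusion(features: Sequence[Vector], window: int) -> List[Vector]:
    """Hypothetical fusion: each frame feature plus the mean over the whole clip
    (global context) plus the mean over frames within `window` steps (local context)."""
    n, dim = len(features), len(features[0])
    global_ctx = [sum(f[d] for f in features) / n for d in range(dim)]
    fused = []
    for t, feat in enumerate(features):
        local = features[max(0, t - window): min(n, t + window + 1)]
        local_ctx = [sum(f[d] for f in local) / len(local) for d in range(dim)]
        fused.append([feat[d] + global_ctx[d] + local_ctx[d] for d in range(dim)])
    return fused


if __name__ == "__main__":
    rng = random.Random(0)
    clip_a, clip_b = sample_cross_view_clips(num_frames=40, clip_len=8, max_shift=4, rng=rng)
    print(clip_a, clip_b, shared_frame_pairs(clip_a, clip_b))
    print(global_local_fusion([[1.0], [2.0], [3.0]], window=1))  # [[4.5], [6.0], [7.5]]
```

Restricting the consistency term to frame indices that both clips contain is one simple way to make the training signal tolerant of a temporal offset between views. The fixed mean pooling in the fusion step stands in for whatever learned mechanism the paper actually uses; an attention module would be a natural substitute.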