Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation

Linjiang Huang, Liang Wang, Hongsheng Li; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 3272-3281

Abstract


Weakly supervised temporal action localization targets at localizing temporal boundaries of actions and simultaneously identify their categories with only video-level category labels. Many existing methods seek to generate pseudo labels for bridging the discrepancy between classification and localization, but usually only make use of limited contextual information for pseudo label generation. To alleviate this problem, we propose a representative snippet summarization and propagation framework. Our method seeks to mine the representative snippets in each video for better propagating information between video snippets. For each video, its own representative snippets and the representative snippets from a memory bank are propagated to update the input features in an intra- and inter-video manner. The pseudo labels are generated from the temporal class activation maps of the updated features to rectify the predictions of the main branch. Our method obtains superior performance in comparison to the existing methods on two benchmarks, THUMOS14 and ActivityNet1.3, achieving gains as high as 1.2% in terms of average mAP on THUMOS14. Our code is available at https://github.com/LeonHLJ/RSKP.

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Huang_2022_CVPR, author = {Huang, Linjiang and Wang, Liang and Li, Hongsheng}, title = {Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2022}, pages = {3272-3281} }