Hybrid Active Learning via Deep Clustering for Video Action Detection

Rana, Aayush J.; Rawat, Yogesh S.

Aayush J. Rana, Yogesh S. Rawat; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 18867-18877

Abstract

In this work, we focus on reducing the annotation cost for video action detection which requires costly frame-wise dense annotations. We study a novel hybrid active learning (AL) strategy which performs efficient labeling using both intra-sample and inter-sample selection. The intra-sample selection leads to labeling of fewer frames in a video as opposed to inter-sample selection which operates at video level. This hybrid strategy reduces the annotation cost from two different aspects leading to significant labeling cost reduction. The proposed approach utilize Clustering-Aware Uncertainty Scoring (CLAUS), a novel label acquisition strategy which relies on both informativeness and diversity for sample selection. We also propose a novel Spatio-Temporal Weighted (STeW) loss formulation, which helps in model training under limited annotations. The proposed approach is evaluated on UCF-101-24 and J-HMDB-21 datasets demonstrating its effectiveness in significantly reducing the annotation cost where it consistently outperforms other baselines. Project details available at https://sites.google.com/view/activesparselabeling/home

Related Material

[pdf] [supp]

[bibtex]

@InProceedings{Rana_2023_CVPR, author = {Rana, Aayush J. and Rawat, Yogesh S.}, title = {Hybrid Active Learning via Deep Clustering for Video Action Detection}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2023}, pages = {18867-18877} }