- [pdf] [supp] [code]
SCOAD: Single-frame Click Supervision for Online Action Detection
Online action detection based on supervised learning requires heavy manual annotation, which is difficult to obtain and may be impractical in real applications. Weakly supervised online action detection (WOAD) can effectively mitigate the problem of substantial labeling costs by using video-level labels. In this paper, we revisit WOAD and propose a weakly supervised online action detection using click-level labels for training, named Single-frame Click Supervision for Online Action Detection (SCOAD). Comparatively, click-level labels can effectively improve prediction accuracy by carrying a small amount of temporal information without massively increase the difficulty and cost of annotation. Specifically, SCOAD includes two joint training modules, i.e., Action Instance Miner (AIM) and Online Action Detector (OAD). To provide more guidance for training network as accuracy as possible, AIM mines pseudo-action instances under the supervision of click labels. Meanwhile, we generate video similarity instances offline by the similarity between video frames and use it to perform finer granularity filtering of error instances generated by AIM. OAD is trained jointly with AIM for online action detection by the pseudo frame-level labels converted from the filtered pseudo-action instances. We conduct extensive experiments on two benchmark datasets to demonstrate that SCOAD can effectively mine and utilize the small amount of temporal information in click-level labels. Code is available at https://github.com/zstarN70/SCOAD.git.