Temporal Structure Mining for Weakly Supervised Action Detection

Tan Yu, Zhou Ren, Yuncheng Li, Enxu Yan, Ning Xu, Junsong Yuan; The IEEE International Conference on Computer Vision (ICCV), 2019, pp. 5522-5531

Abstract


Different from the fully-supervised action detection problem that is dependent on expensive frame-level annotations, weakly supervised action detection (WSAD) only needs video-level annotations, making it more practical for real-world applications. Existing WSAD methods detect action instances by scoring each video segment (a stack of frames) individually. Most of them fail to model the temporal relations among video segments and cannot effectively characterize action instances possessing latent temporal structure. To alleviate this problem in WSAD, we propose the temporal structure mining (TSM) approach. In TSM, each action instance is modeled as a multi-phase process and phase evolving within an action instance, i.e., the temporal structure, is exploited. Meanwhile, the video background is modeled by a background phase, which separates different action instances in an untrimmed video. In this framework, phase filters are used to calculate the confidence scores of the presence of an action's phases in each segment. Since in the WSAD task, frame-level annotations are not available and thus phase filters cannot be trained directly. To tackle the challenge, we treat each segment's phase as a hidden variable. We use segments' confidence scores from each phase filter to construct a table and determine hidden variables, i.e., phases of segments, by a maximal circulant path discovery along the table. Experiments conducted on three benchmark datasets demonstrate the state-of-the-art performance of the proposed TSM.

Related Material


[pdf]
[bibtex]
@InProceedings{Yu_2019_ICCV,
author = {Yu, Tan and Ren, Zhou and Li, Yuncheng and Yan, Enxu and Xu, Ning and Yuan, Junsong},
title = {Temporal Structure Mining for Weakly Supervised Action Detection},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}
}