Improving Action Localization by Progressive Cross-Stream Cooperation

Rui Su, Wanli Ouyang, Luping Zhou, Dong Xu; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 12016-12025

Abstract


Spatio-temporal action localization comprises three levels of tasks: spatial localization, action classification, and temporal segmentation. In this work, we propose a new Progressive Cross-stream Cooperation (PCSC) framework that iteratively improves action localization results and generates better bounding boxes for one stream (i.e., Flow/RGB) by leveraging both region proposals and features from the other stream (i.e., RGB/Flow). Specifically, we first generate a larger set of region proposals by combining the latest region proposals from both streams, from which we can readily obtain a larger set of labelled training samples to help learn better action detection models. Second, we propose a new message passing approach to pass information from one stream to the other in order to learn better representations, which also leads to better action detection models. As a result, our iterative framework progressively improves action localization results at the frame level. To improve action localization results at the video level, we additionally propose a new strategy to train class-specific actionness detectors for better temporal segmentation, which can be readily learnt by using the training samples around temporal boundaries. Comprehensive experiments on two benchmark datasets, UCF-101-24 and J-HMDB, demonstrate the effectiveness of our newly proposed approaches for spatio-temporal action localization in realistic scenarios.
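The abstract's first step, combining the latest region proposals from the RGB and Flow streams into one larger proposal set, can be sketched as a union of the two boxes lists followed by duplicate suppression. The paper does not specify the merging procedure, so the sketch below is an assumption: it uses plain greedy non-maximum suppression (`nms` and `combine_stream_proposals` are hypothetical helper names, not from the paper).

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes.

    Returns the indices of the boxes kept, highest-scoring first.
    """
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection rectangle between the top box and the remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        # Drop near-duplicates of the kept box; keep the rest for later rounds.
        order = rest[iou <= iou_thresh]
    return keep

def combine_stream_proposals(rgb_boxes, rgb_scores,
                             flow_boxes, flow_scores, iou_thresh=0.5):
    """Union the proposal sets from both streams, then suppress duplicates."""
    boxes = np.vstack([rgb_boxes, flow_boxes])
    scores = np.concatenate([rgb_scores, flow_scores])
    keep = nms(boxes, scores, iou_thresh)
    return boxes[keep], scores[keep]
```

In this reading, a proposal found by only one stream survives the merge, which is what enlarges the labelled training set for the other stream; overlapping detections of the same actor collapse to the higher-scoring box.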

Related Material


[pdf]
[bibtex]
@InProceedings{Su_2019_CVPR,
author = {Su, Rui and Ouyang, Wanli and Zhou, Luping and Xu, Dong},
title = {Improving Action Localization by Progressive Cross-Stream Cooperation},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
}