Temporal Action Localization With Pyramid of Score Distribution Features

Jun Yuan, Bingbing Ni, Xiaokang Yang, Ashraf A. Kassim; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3093-3102

Abstract


We investigate feature design and classification architectures for temporal action localization. The task is to detect and label actions in untrimmed videos, which is more challenging than classifying pre-segmented videos. The main difficulties are the uncertainty about where an action occurs and the use of information from different scales. We propose two innovations to address them. First, we propose a Pyramid of Score Distribution Feature (PSDF) that captures motion information at multiple resolutions centered at each detection window. This feature mitigates the influence of unknown action position and duration, and yields a significant performance gain over previous detection approaches. Second, we further exploit inter-frame consistency by incorporating PSDF into state-of-the-art Recurrent Neural Networks, which gives an additional performance gain in detecting actions in temporally untrimmed videos. We tested our action localization framework on the THUMOS'15 and MPII Cooking Activities datasets; on both, it shows a large performance improvement over previous attempts.
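The abstract does not spell out how PSDF is computed, so the following is only a minimal sketch of the general idea of a multi-resolution feature centered at a detection window. It assumes per-frame class scores are already available, and it assumes that each pyramid level summarizes the scores in a centered window by their per-class mean and maximum. The function name `pyramid_score_features`, the window sizes, and the choice of statistics are all hypothetical and are not taken from the paper.

```python
from statistics import fmean
from typing import List, Sequence


def pyramid_score_features(
    scores: Sequence[Sequence[float]],
    center: int,
    window_sizes: Sequence[int] = (1, 3, 5),
) -> List[float]:
    """Summarize class scores around `center` at several window lengths.

    `scores[t][c]` is the score of class `c` at frame `t`. For each window
    size (an assumed pyramid level), the per-class mean and max over the
    centered window are appended to the feature vector.
    """
    num_frames = len(scores)
    num_classes = len(scores[0])
    features: List[float] = []
    for size in window_sizes:
        half = size // 2
        # Clip the window to the video boundaries.
        start = max(0, center - half)
        stop = min(num_frames, center + half + 1)
        window = scores[start:stop]
        for c in range(num_classes):
            column = [frame[c] for frame in window]
            features.append(fmean(column))
            features.append(max(column))
    return features


# Toy example: 4 frames, 2 classes, feature centered at frame 1.
toy_scores = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
toy_feature = pyramid_score_features(toy_scores, center=1, window_sizes=(1, 3))
```

Under these assumptions the feature length is `2 * num_classes * len(window_sizes)`. Because every level is centered on the same frame, wider levels add context while the narrowest level stays focused on the frame itself, which is one plausible way to reduce sensitivity to unknown action position and duration.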

Related Material


[bibtex]
@InProceedings{Yuan_2016_CVPR,
author = {Yuan, Jun and Ni, Bingbing and Yang, Xiaokang and Kassim, Ashraf A.},
title = {Temporal Action Localization With Pyramid of Score Distribution Features},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2016}
}