Embedding Sequential Information Into Spatiotemporal Features for Action Recognition

Yuancheng Ye, YingLi Tian; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2016, pp. 37-45

Abstract


In this paper, we introduce a novel framework for video-based action recognition, which incorporates sequential information with spatiotemporal features. Specifically, spatiotemporal features are extracted from sliced clips of videos, and then a recurrent neural network is applied to embed the sequential information into the final feature representation of the video. In contrast to most current deep learning methods for video-based tasks, our framework incorporates both the long-term dependencies and the spatiotemporal information of the clips in a video. To extract the spatiotemporal features from the clips, both dense trajectories (DT) and a recently proposed 3D convolutional neural network, C3D, are applied in our experiments. Our proposed framework is evaluated on the benchmark datasets UCF101 and HMDB51, and achieves performance comparable to state-of-the-art results.
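The sketch below illustrates the general idea described in the abstract, not the authors' actual implementation: one spatiotemporal feature vector per sliced clip (assumed here to be a 4096-d C3D-style activation) is fed in temporal order to a recurrent network, and the final hidden state serves as the video-level representation for classification. The layer sizes, LSTM choice, and clip count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ClipSequenceClassifier(nn.Module):
    """Hypothetical sketch: embed sequential information across per-clip features."""

    def __init__(self, feat_dim=4096, hidden_dim=512, num_classes=101):
        super().__init__()
        # The recurrent layer captures long-term dependencies across clips.
        self.rnn = nn.LSTM(input_size=feat_dim, hidden_size=hidden_dim,
                           batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, clip_features):
        # clip_features: (batch, num_clips, feat_dim), one vector per sliced clip.
        _, (h_n, _) = self.rnn(clip_features)
        video_repr = h_n[-1]              # final hidden state = video representation
        return self.classifier(video_repr)

# Example: two videos, 8 clips each, 4096-d features, 101 classes (as in UCF101).
model = ClipSequenceClassifier()
clip_feats = torch.randn(2, 8, 4096)
logits = model(clip_feats)                # shape (2, 101)
```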

Related Material


[bibtex]
@InProceedings{Ye_2016_CVPR_Workshops,
author = {Ye, Yuancheng and Tian, YingLi},
title = {Embedding Sequential Information Into Spatiotemporal Features for Action Recognition},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2016}
}