Human Action Recognition Based on Temporal Pose CNN and Multi-Dimensional Fusion

Yi Huang, Shang-Hong Lai, Shao-Heng Tai; Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018

Abstract


To take advantage of recent advances in human pose estimation from images, we develop a deep neural network model for action recognition from videos that computes temporal human pose features with a 3D CNN. The proposed temporal pose features can provide more discriminative human action information than previous video features such as appearance and short-term motion. In addition, we propose a novel fusion network that combines temporal pose, spatial, and motion feature maps for classification by bridging the dimensional difference between 3D and 2D CNN feature maps. Experiments on the Sub-JHMDB and PennAction datasets show that the proposed action recognition system achieves higher accuracy than previous methods.
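The abstract does not say how the fusion network reconciles 3D and 2D feature maps, so the following is only a minimal sketch of one common way to do it, not the authors' method. It assumes the temporal pose features form a [T][C][H][W] map, collapses the time axis by averaging, and then concatenates the result with the 2D spatial and motion maps along the channel axis. The function names (`temporal_average`, `fuse`, `constant_map`), the averaging step, and the toy shapes are all hypothetical choices made for illustration.

```python
def temporal_average(feat_3d):
    """Collapse the time axis of a [T][C][H][W] nested list by averaging.

    Averaging is an assumption for this sketch; the paper may use a
    different (possibly learned) operation.
    """
    t_len = len(feat_3d)
    channels = len(feat_3d[0])
    height = len(feat_3d[0][0])
    width = len(feat_3d[0][0][0])
    return [
        [
            [
                sum(feat_3d[t][c][y][x] for t in range(t_len)) / t_len
                for x in range(width)
            ]
            for y in range(height)
        ]
        for c in range(channels)
    ]


def fuse(pose_3d, spatial_2d, motion_2d):
    """Reduce the 3D pose map to 2D, then concatenate all maps by channel."""
    pose_2d = temporal_average(pose_3d)
    for name, feat in (("spatial", spatial_2d), ("motion", motion_2d)):
        same_height = len(feat[0]) == len(pose_2d[0])
        same_width = len(feat[0][0]) == len(pose_2d[0][0])
        if not (same_height and same_width):
            raise ValueError(f"{name} map has a different spatial size")
    # Each map is a list of channels, so list concatenation stacks channels.
    return pose_2d + spatial_2d + motion_2d


def constant_map(value, channels, height, width):
    """Build a toy [C][H][W] feature map filled with a single value."""
    return [[[value] * width for _ in range(height)] for _ in range(channels)]


# Toy inputs: 4 time steps of 2-channel pose features, 1-channel 2D maps.
pose = [constant_map(float(t), 2, 3, 3) for t in range(4)]
spatial = constant_map(10.0, 1, 3, 3)
motion = constant_map(20.0, 1, 3, 3)
fused = fuse(pose, spatial, motion)
```

With these toy inputs, `fused` holds four channels of size 3x3: two averaged pose channels followed by the spatial and motion channels. A classifier head would then operate on the fused map.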

Related Material


[bibtex]
@InProceedings{Huang_2018_ECCV_Workshops,
author = {Huang, Yi and Lai, Shang-Hong and Tai, Shao-Heng},
title = {Human Action Recognition Based on Temporal Pose CNN and Multi-Dimensional Fusion},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV) Workshops},
month = {September},
year = {2018}
}