Video Action Recognition Based on Deeper Convolution Networks With Pair-Wise Frame Motion Concatenation

Yamin Han, Peng Zhang, Tao Zhuo, Wei Huang, Yanning Zhang; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2017, pp. 8-17

Abstract


Deep convolution networks have shown remarkable performance on various recognition tasks. However, recognition in real-world videos remains difficult: challenges such as cluttered backgrounds lead to large inter- and intra-class variations. In addition, data deficiency can degrade the designed model during learning and updating. To overcome these limitations, we propose a deeper-convolution-network-based approach with pair-wise motion concatenation, named deep temporal convolutional networks. A temporal motion accumulation mechanism is introduced as an effective data entry for the learning of convolution networks. To handle possible data deficiency, the beneficial practices of transferring ResNet-101 weights and augmenting data variation are adopted for robust recognition. Experiments on the UCF101 and ODAR datasets verify a preferable performance compared with state-of-the-art works.
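The abstract does not spell out how pair-wise frame motion is concatenated. A minimal sketch of one plausible interpretation, assuming each frame is stacked along the channel axis with its difference from the next frame as a crude motion cue (the helper name `pairwise_motion_concat` is hypothetical, not from the paper):

```python
import numpy as np

def pairwise_motion_concat(frames):
    """Stack each frame with its difference from the next frame.

    frames: array of shape (T, H, W, C)
    returns: array of shape (T-1, H, W, 2C)
    """
    frames = np.asarray(frames, dtype=np.float32)
    # Frame-to-frame difference as a simple motion approximation
    # (an assumption; the paper may use optical flow or another cue).
    motion = frames[1:] - frames[:-1]
    # Concatenate appearance and motion along the channel axis,
    # producing one combined input per frame pair.
    return np.concatenate([frames[:-1], motion], axis=-1)

# Toy clip: 4 frames of 8x8 RGB
clip = np.random.rand(4, 8, 8, 3).astype(np.float32)
stacked = pairwise_motion_concat(clip)
print(stacked.shape)  # (3, 8, 8, 6)
```

Each stacked sample could then serve as a single data entry to a 2-D convolution network, letting the network see appearance and short-range motion jointly.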

Related Material


[pdf]
[bibtex]
@InProceedings{Han_2017_CVPR_Workshops,
author = {Han, Yamin and Zhang, Peng and Zhuo, Tao and Huang, Wei and Zhang, Yanning},
title = {Video Action Recognition Based on Deeper Convolution Networks With Pair-Wise Frame Motion Concatenation},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {July},
year = {2017}
}