Temporal Domain Neural Encoder for Video Representation Learning

Hao Hu, Zhaowen Wang, Joon-Young Lee, Zhe Lin, Guo-Jun Qi; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2017, pp. 17-24

Abstract


We address the challenge of learning good video representations by explicitly modeling the relationships between visual concepts in the temporal domain. We propose a novel Temporal Preserving Recurrent Neural Network (TPRNN) that extracts and encodes visual dynamics from frame-level features given as input. The proposed network architecture captures temporal dynamics by tracking the ordinal relationships of co-occurring visual concepts, and constructs video representations from their temporal order patterns. The resulting representations effectively encode the temporal structure of dynamic patterns, making them more discriminative for human actions that differ only in the order of their constituent action patterns. We evaluate the proposed model on several real video datasets, and the results show that it outperforms the baseline models. In particular, we observe significant improvements on action classes that can be distinguished only by capturing the temporal order of action patterns.
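
The paper's exact TPRNN cell is specified in the pdf below; purely as a hypothetical sketch of the interface the abstract describes (frame-level features in, an order-sensitive video representation out, action-class scores on top), the following PyTorch snippet substitutes a plain GRU for the temporal-preserving cell. All names and dimensions here (FrameSequenceEncoder, feat_dim, 2048-d frame features, 101 classes) are illustrative assumptions, not taken from the paper.

    # Minimal sketch: a recurrent encoder over per-frame features.
    # NOTE: this uses a standard GRU as a stand-in; it is NOT the
    # paper's TPRNN cell, only the input/output interface it implies.
    import torch
    import torch.nn as nn

    class FrameSequenceEncoder(nn.Module):
        """Hypothetical placeholder for a temporal video encoder."""
        def __init__(self, feat_dim: int, hidden_dim: int, num_classes: int):
            super().__init__()
            self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
            self.classifier = nn.Linear(hidden_dim, num_classes)

        def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
            # frame_feats: (batch, num_frames, feat_dim),
            # e.g. CNN features extracted per frame
            _, h_n = self.rnn(frame_feats)      # h_n: (1, batch, hidden_dim)
            video_repr = h_n.squeeze(0)         # order-sensitive video encoding
            return self.classifier(video_repr)  # action-class logits

    # Usage: 8 videos, 32 frames each, 2048-d frame features, 101 classes.
    model = FrameSequenceEncoder(feat_dim=2048, hidden_dim=512, num_classes=101)
    logits = model(torch.randn(8, 32, 2048))    # -> shape (8, 101)

Because the final hidden state depends on the order in which frame features arrive, even this simplified recurrent encoder, unlike a temporal average pool, can separate actions whose frames differ only in ordering, which is the property the TPRNN is designed to strengthen.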

Related Material


[pdf]
[bibtex]
@InProceedings{Hu_2017_CVPR_Workshops,
  author    = {Hu, Hao and Wang, Zhaowen and Lee, Joon-Young and Lin, Zhe and Qi, Guo-Jun},
  title     = {Temporal Domain Neural Encoder for Video Representation Learning},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = {July},
  year      = {2017}
}