Temporal 3D ConvNets Using Temporal Transition Layer

Ali Diba, Mohsen Fayyaz, Vivek Sharma, A. Hossein Karami, M. Mahdi Arzani, Rahman Yousefzadeh, Luc Van Gool; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2018, pp. 1117-1121

Abstract


The work in this paper is driven by the question of how to exploit the temporal cues available in videos for accurate video classification, and for human action recognition in particular. Thus far, the vision community has focused on spatio-temporal approaches with fixed temporal convolution kernel depths. We introduce a new temporal layer that models variable temporal convolution kernel depths, and we embed this layer in our proposed 3D CNN. We extend the DenseNet architecture, which is normally 2D, with 3D filters and pooling kernels. We name our proposed video convolutional network Temporal 3D ConvNet (T3D), and its new temporal layer the Temporal Transition Layer (TTL). Our experiments show that T3D outperforms the current state-of-the-art methods on the HMDB51, UCF101 and Kinetics datasets.
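To make the two ideas in the abstract concrete (DenseNet extended to 3D, and a layer that mixes several temporal kernel depths), here is a minimal NumPy sketch under stated assumptions; it is not the authors' implementation. It assumes a TTL applies several 3D convolutions with different temporal depths in parallel, concatenates their outputs along the channel axis, and then applies 3D average pooling, and that it sits after a 3D dense block the way a transition layer does in DenseNet. The temporal depths (1, 3, 5), the 3x3 spatial kernels, the growth rate, the simplified ReLU-only dense layer (no batch norm), and all function names are hypothetical choices for illustration.

```python
import numpy as np

def conv3d_same(x, w):
    """Naive 3D convolution (cross-correlation, as in CNN layers), zero 'same' padding.
    x: (C_in, T, H, W); w: (C_out, C_in, kt, kh, kw) with odd kernel sizes."""
    c_out, c_in, kt, kh, kw = w.shape
    _, T, H, W = x.shape
    pt, ph, pw = kt // 2, kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (pt, pt), (ph, ph), (pw, pw)))  # zero padding
    y = np.zeros((c_out, T, H, W))
    for t in range(T):
        for i in range(H):
            for j in range(W):
                patch = xp[:, t:t + kt, i:i + kh, j:j + kw]  # (C_in, kt, kh, kw)
                # Contract over (C_in, kt, kh, kw) -> one value per output channel
                y[:, t, i, j] = np.tensordot(w, patch, axes=([1, 2, 3, 4], [0, 1, 2, 3]))
    return y

def avg_pool3d(x, k=(2, 2, 2)):
    """Non-overlapping 3D average pooling over (T, H, W); trailing remainders are dropped."""
    C, T, H, W = x.shape
    kt, kh, kw = k
    x = x[:, :T - T % kt, :H - H % kh, :W - W % kw]
    return x.reshape(C, T // kt, kt, H // kh, kh, W // kw, kw).mean(axis=(2, 4, 6))

def dense_block3d(x, weights):
    """DenseNet-style block with 3D filters: each layer sees all earlier feature maps.
    Simplified (assumption): ReLU after each conv, no batch normalization."""
    feats = x
    for w in weights:
        new = np.maximum(conv3d_same(feats, w), 0.0)
        feats = np.concatenate([feats, new], axis=0)  # channel-wise concatenation
    return feats

def temporal_transition_layer(x, kernels, pool=(2, 2, 2)):
    """Hypothetical TTL sketch: parallel 3D convs with different temporal depths,
    outputs concatenated along channels, then 3D average pooling."""
    branches = [conv3d_same(x, w) for w in kernels]
    return avg_pool3d(np.concatenate(branches, axis=0), pool)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.standard_normal((4, 8, 6, 6))  # (channels, frames, height, width)
    growth = 2                                # hypothetical growth rate
    block_w = [rng.standard_normal((growth, 4 + l * growth, 3, 3, 3)) * 0.1 for l in range(2)]
    feats = dense_block3d(clip, block_w)      # 4 + 2 * 2 = 8 channels
    depths = (1, 3, 5)                        # hypothetical temporal kernel depths
    ttl_w = [rng.standard_normal((2, 8, d, 3, 3)) * 0.1 for d in depths]
    out = temporal_transition_layer(feats, ttl_w)
    print(feats.shape, out.shape)             # (8, 8, 6, 6) (6, 4, 3, 3)
```

Because every branch uses "same" padding, branches with different temporal depths produce feature maps of identical size and can be concatenated directly; each branch differs only in how many neighboring frames it aggregates.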

Related Material


@InProceedings{Diba_2018_CVPR_Workshops,
author = {Diba, Ali and Fayyaz, Mohsen and Sharma, Vivek and Karami, A. Hossein and Arzani, M. Mahdi and Yousefzadeh, Rahman and Van Gool, Luc},
title = {Temporal 3D ConvNets Using Temporal Transition Layer},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2018}
}