Temporal Reasoning in Videos Using Convolutional Gated Recurrent Units

Debidatta Dwibedi, Pierre Sermanet, Jonathan Tompson; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2018, pp. 1111-1116

Abstract

Recently, deep learning based models have pushed state-of-the-art performance for the task of action recognition in videos. Yet, for many action recognition datasets like Kinetics and UCF101, the correct temporal order of frames does not appear to be essential to solving the task. We find that temporal order matters more for the recently introduced 20BN Something-Something dataset, where the task of fine-grained action recognition requires the model to perform temporal reasoning. We show that when temporal order matters, recurrent models can provide a significant boost in performance. Using qualitative methods, we show that when the task of action recognition requires temporal reasoning, the hidden states of the recurrent units encode meaningful state transitions.
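The paper's title names Convolutional Gated Recurrent Units, i.e. GRU cells whose gate computations use convolutions instead of dense matrix products, so the hidden state retains the spatial layout of the video frames. Below is a minimal NumPy sketch of such a cell; the class name, weight initialization, and kernel size are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def conv2d(x, w):
    """Naive 'same' 2D convolution (zero padding).
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k."""
    c_out, c_in, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    H, W = x.shape[1:]
    out = np.zeros((c_out, H, W))
    for o in range(c_out):
        for i in range(c_in):
            for u in range(k):
                for v in range(k):
                    out[o] += w[o, i, u, v] * xp[i, u:u + H, v:v + W]
    return out

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class ConvGRUCell:
    """Illustrative ConvGRU cell: standard GRU gating, but every
    matrix product is replaced by a convolution over feature maps."""
    def __init__(self, c_in, c_hid, k=3, seed=0):
        rng = np.random.default_rng(seed)
        sx, sh = (c_hid, c_in, k, k), (c_hid, c_hid, k, k)
        self.Wz, self.Uz = rng.normal(0, 0.1, sx), rng.normal(0, 0.1, sh)
        self.Wr, self.Ur = rng.normal(0, 0.1, sx), rng.normal(0, 0.1, sh)
        self.Wh, self.Uh = rng.normal(0, 0.1, sx), rng.normal(0, 0.1, sh)

    def step(self, x, h):
        z = sigmoid(conv2d(x, self.Wz) + conv2d(h, self.Uz))  # update gate
        r = sigmoid(conv2d(x, self.Wr) + conv2d(h, self.Ur))  # reset gate
        h_tilde = np.tanh(conv2d(x, self.Wh) + conv2d(r * h, self.Uh))
        return (1.0 - z) * h + z * h_tilde  # convex blend of old and new state

# Run the cell over a short toy "video": five 2-channel 8x8 frames.
cell = ConvGRUCell(c_in=2, c_hid=4)
h = np.zeros((4, 8, 8))
for t in range(5):
    frame = np.random.default_rng(t).normal(size=(2, 8, 8))
    h = cell.step(frame, h)
print(h.shape)  # (4, 8, 8): the hidden state stays spatial
```

Because the hidden state is a stack of feature maps rather than a flat vector, the cell can be dropped between convolutional layers of a frame-level CNN, letting the network aggregate spatial evidence across time.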

Related Material

[pdf]
[bibtex]
@InProceedings{Dwibedi_2018_CVPR_Workshops,
author = {Dwibedi, Debidatta and Sermanet, Pierre and Tompson, Jonathan},
title = {Temporal Reasoning in Videos Using Convolutional Gated Recurrent Units},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2018}
}