Residual Stacked RNNs for Action Recognition

Mohamed Ilyes Lakhal, Albert Clapes, Sergio Escalera, Oswald Lanz, Andrea Cavallaro; Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018

Abstract


Action recognition pipelines that use Recurrent Neural Networks (RNNs) are currently 5 − 10% less accurate than those based on Convolutional Neural Networks (CNNs). While most RNN-based works apply a 2D CNN to each frame to extract descriptors for action recognition, we extract spatiotemporal features with a 3D CNN and then learn the temporal relationships among these descriptors through a stacked residual recurrent neural network (Res-RNN). We introduce, for the first time, residual learning to counter the degradation problem in multi-layer RNNs, which have proved successful for temporal aggregation in two-stream action recognition pipelines. Finally, we use a late fusion strategy to combine the RGB and optical flow streams of the two-stream Res-RNN. Experimental results show that the proposed pipeline achieves competitive results on UCF-101 and state-of-the-art results among RNN-like architectures on the challenging HMDB-51 dataset.
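The core idea of the abstract can be illustrated with a minimal sketch: each layer of the stacked RNN learns a residual on top of its input sequence, so layer l computes h_l = RNN_l(h_{l-1}) + h_{l-1}. The sketch below uses plain tanh RNN cells over NumPy arrays; all names, dimensions, and initializations are illustrative assumptions, not the paper's actual architecture (which operates on 3D-CNN descriptors in a two-stream setup).

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_layer(x, Wx, Wh, b):
    """Run a tanh RNN over a (T, d) input sequence; return (T, d) outputs."""
    T, d = x.shape
    h = np.zeros(d)
    out = np.empty_like(x)
    for t in range(T):
        h = np.tanh(x[t] @ Wx + h @ Wh + b)
        out[t] = h
    return out

def res_rnn(x, layers):
    """Stack RNN layers with identity skip connections: h <- RNN(h) + h."""
    h = x
    for Wx, Wh, b in layers:
        h = rnn_layer(h, Wx, Wh, b) + h  # residual connection per layer
    return h

# Hypothetical sizes: sequence length T, feature dim d, L stacked layers.
T, d, L = 16, 8, 3
layers = [(0.1 * rng.standard_normal((d, d)),
           0.1 * rng.standard_normal((d, d)),
           np.zeros(d)) for _ in range(L)]
feats = rng.standard_normal((T, d))   # stand-in for per-clip 3D-CNN descriptors
out = res_rnn(feats, layers)
print(out.shape)  # (16, 8): same shape as the input sequence
```

Because the skip connection is the identity, a layer whose recurrent branch outputs zero passes its input through unchanged, which is exactly the property residual learning exploits to ease optimization of deeper stacks.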

Related Material


[bibtex]
@InProceedings{Lakhal_2018_ECCV_Workshops,
author = {Ilyes Lakhal, Mohamed and Clapes, Albert and Escalera, Sergio and Lanz, Oswald and Cavallaro, Andrea},
title = {Residual Stacked RNNs for Action Recognition},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV) Workshops},
month = {September},
year = {2018}
}