Learning Spatiotemporal Features for Infrared Action Recognition With 3D Convolutional Neural Networks

Zhuolin Jiang, Viktor Rozgic, Sancar Adali; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2017, pp. 115-123

Abstract


While action recognition on videos collected with visible-spectrum imaging has received much attention, action recognition in infrared (IR) videos is significantly less explored. Our objective is to exploit imaging data in this modality for the action recognition task. In this work, we propose a novel two-stream 3D convolutional neural network architecture by introducing a discriminative code layer and a corresponding discriminative code loss function. The proposed network processes sequences of IR images and of IR-based optical flow fields. We pretrain the 3D CNN model on the visible-spectrum Sports-1M action dataset and fine-tune it on the Infrared Action Recognition (InfAR) dataset. We conduct a detailed analysis of different fusion schemes (weighted average, single-layer and double-layer neural nets) applied to different 3D CNN outputs. Experimental results demonstrate that our approach achieves state-of-the-art average precision on the InfAR dataset.
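The simplest of the fusion schemes named above, the weighted average, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of softmax-normalized scores, the weight value, and the example logits are all assumptions made for the sketch, and the abstract does not specify which 3D CNN outputs are fused or how the weights are chosen.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_average_fusion(image_scores, flow_scores, image_weight=0.5):
    """Late fusion of two streams by a convex combination of class scores.

    image_scores / flow_scores: per-class scores from the IR-image stream
    and the optical-flow stream (hypothetical inputs for this sketch).
    image_weight: weight of the image stream; the flow stream receives
    1 - image_weight. The default of 0.5 is an assumption, not a value
    taken from the paper.
    """
    if len(image_scores) != len(flow_scores):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must lie in [0, 1]")
    w = image_weight
    return [w * a + (1.0 - w) * b for a, b in zip(image_scores, flow_scores)]


def predict(scores):
    """Return the index of the highest-scoring class."""
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up logits for a three-class toy problem (illustrative only).
image_probs = softmax([2.0, 1.0, 0.1])
flow_probs = softmax([0.5, 2.5, 0.3])
fused = weighted_average_fusion(image_probs, flow_probs, image_weight=0.5)
predicted_class = predict(fused)
```

In practice the weight would typically be tuned on held-out data. The single-layer and double-layer neural-net fusion schemes mentioned in the abstract replace this fixed convex combination with learned parameters.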

Related Material


[bibtex]
@InProceedings{Jiang_2017_CVPR_Workshops,
author = {Jiang, Zhuolin and Rozgic, Viktor and Adali, Sancar},
title = {Learning Spatiotemporal Features for Infrared Action Recognition With 3D Convolutional Neural Networks},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {July},
year = {2017}
}