A Long Short-Term Memory Convolutional Neural Network for First-Person Vision Activity Recognition

Girmaw Abebe, Andrea Cavallaro; Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops, 2017, pp. 1339-1346

Abstract

Temporal information is the main source of discriminating characteristics for the recognition of proprioceptive activities in first-person vision (FPV). In this paper, we propose a motion representation that uses stacked spectrograms. These spectrograms are generated over temporal windows from mean grid-optical-flow vectors and from the displacement vectors of the intensity centroid. The stacked representation enables us to use 2D convolutions to learn and extract global motion features. Moreover, we employ a long short-term memory (LSTM) network to recursively encode the temporal dependency among consecutive samples. Experimental results show that the proposed approach achieves state-of-the-art performance on the largest public dataset for FPV activity recognition.
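To make the motion representation concrete, the following is a minimal sketch of the described pipeline, not the authors' implementation: per-frame mean grid-optical-flow vectors and the intensity-centroid displacement form 1D motion signals; spectrograms of these signals over a temporal window are stacked as channels for a 2D CNN, and an LSTM links consecutive windows. The grid size, window length, spectrogram settings, and network shape below are illustrative assumptions.

# Minimal sketch of the described pipeline, NOT the authors' code.
# GRID, WIN, NPERSEG, and the network layout are assumed placeholders.
import cv2
import numpy as np
from scipy.signal import spectrogram
import torch
import torch.nn as nn

GRID = 4          # assumed: 4x4 grid for mean grid-optical-flow
WIN = 64          # assumed: frames per temporal window
NPERSEG = 16      # assumed: spectrogram segment length

def motion_signals(frames):
    """Per-frame 1D motion signals: mean optical flow per grid cell
    (x and y components) plus the intensity-centroid displacement."""
    signals = []
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    m = cv2.moments(prev)
    prev_c = np.array([m['m10'] / m['m00'], m['m01'] / m['m00']])
    for f in frames[1:]:
        gray = cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        h, w = gray.shape
        feats = []
        for i in range(GRID):            # mean (dx, dy) per grid cell
            for j in range(GRID):
                cell = flow[i*h//GRID:(i+1)*h//GRID,
                            j*w//GRID:(j+1)*w//GRID]
                feats.extend(cell.mean(axis=(0, 1)))
        m = cv2.moments(gray)            # intensity-centroid displacement
        c = np.array([m['m10'] / m['m00'], m['m01'] / m['m00']])
        feats.extend(c - prev_c)
        prev, prev_c = gray, c
        signals.append(feats)
    return np.asarray(signals)           # (T-1, 2*GRID*GRID + 2)

def stacked_spectrograms(signals):
    """One spectrogram per motion signal over the window, stacked as
    channels so that 2D convolutions see all of them jointly."""
    chans = [spectrogram(signals[:, k], nperseg=NPERSEG)[2]
             for k in range(signals.shape[1])]
    return np.stack(chans)               # (C, freq_bins, time_bins)

class ConvLSTM(nn.Module):
    """Assumed architecture: a small 2D CNN encodes each window's
    spectrogram stack; an LSTM links consecutive windows."""
    def __init__(self, in_ch, n_classes, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten())
        self.lstm = nn.LSTM(32 * 4 * 4, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                # x: (batch, windows, C, F, T)
        b, s = x.shape[:2]
        z = self.cnn(x.flatten(0, 1)).view(b, s, -1)
        out, _ = self.lstm(z)
        return self.head(out[:, -1])     # classify from the last window

# Toy usage: random frames stand in for one temporal window of video.
frames = [np.random.randint(0, 255, (120, 160, 3), np.uint8)
          for _ in range(WIN)]
x = torch.from_numpy(stacked_spectrograms(motion_signals(frames))).float()
model = ConvLSTM(in_ch=x.shape[0], n_classes=10)
logits = model(x[None, None])            # one batch item, one window

In practice a video would be split into consecutive windows, each converted to a spectrogram stack by the two functions above, and the resulting sequence fed to the CNN-LSTM as a (batch, windows, channels, freq, time) tensor.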

Related Material

[pdf]
[bibtex]
@InProceedings{Abebe_2017_ICCV,
author = {Abebe, Girmaw and Cavallaro, Andrea},
title = {A Long Short-Term Memory Convolutional Neural Network for First-Person Vision Activity Recognition},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops},
month = {Oct},
year = {2017},
pages = {1339-1346}
}