First Person Action Recognition Using Deep Learned Descriptors

Suriya Singh, Chetan Arora, C. V. Jawahar; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2620-2628

Abstract


We focus on the problem of the wearer's action recognition in first person, a.k.a. egocentric, videos. This problem is more challenging than third-person activity recognition due to the unavailability of the wearer's pose and the sharp movements in the videos caused by the wearer's natural head motion. Carefully crafted features based on hand and object cues have been shown to be successful on limited targeted datasets. We propose convolutional neural networks (CNNs) for end-to-end learning and classification of the wearer's actions. The proposed network exploits egocentric cues by capturing hand pose, head motion, and a saliency map. It is compact and can be trained from the relatively small number of labeled egocentric videos that are available. We show that the proposed network generalizes well and gives state-of-the-art performance on several disparate egocentric action datasets.
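The abstract mentions combining several egocentric cues (hand pose, head motion, saliency) inside one network. As a rough illustration of how such per-cue evidence might be combined, the sketch below averages softmax probabilities from three hypothetical per-stream classifiers (late fusion). This is not the authors' architecture; all names and score values are assumptions for illustration only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def fuse_streams(hand, motion, saliency):
    """Late fusion sketch: average the per-stream softmax probabilities
    and return the index of the most likely action class."""
    probs = [
        (h + m + s) / 3.0
        for h, m, s in zip(softmax(hand), softmax(motion), softmax(saliency))
    ]
    return max(range(len(probs)), key=probs.__getitem__), probs

# Hypothetical class scores for one clip over 4 action classes,
# one list per cue stream (values are made up).
hand = [2.0, 0.1, -1.0, 0.3]
motion = [1.5, 0.2, 0.0, -0.5]
saliency = [0.9, 1.1, -0.2, 0.4]
label, probs = fuse_streams(hand, motion, saliency)
print(label)  # index of the predicted action class
```

In the paper the cues are fused by the CNN itself rather than by averaging separate classifier outputs; this sketch only conveys the general idea of combining multiple egocentric evidence streams.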

Related Material


[pdf]
[bibtex]
@InProceedings{Singh_2016_CVPR,
  author    = {Singh, Suriya and Arora, Chetan and Jawahar, C. V.},
  title     = {First Person Action Recognition Using Deep Learned Descriptors},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2016}
}