Action and Interaction Recognition in First-Person Videos

Sanath Narayan, Mohan S. Kankanhalli, Kalpathi R. Ramakrishnan; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2014, pp. 512-518


In this work, we evaluate the performance of the popular dense trajectories approach on first-person action recognition datasets. A person moving around with a wearable camera will actively interact with humans and objects and will also passively observe others interacting. Hence, to represent real-world scenarios, a dataset must contain actions from the first-person perspective as well as the third-person perspective. For this purpose, we introduce a new dataset that contains actions from both perspectives, captured using a head-mounted camera. We employ a motion pyramidal structure for grouping the dense trajectory features. The relative strengths of motion along the trajectories are used to compute different bag-of-words descriptors, which are concatenated to form a single descriptor for the action. The motion pyramidal approach performs better than the baseline improved trajectory descriptors. The method achieves 96.7% on the JPL interaction dataset and 61.8% on our NUS interaction dataset. The same method is used to detect actions in long video sequences and achieves an average precision of 0.79 on the JPL interaction dataset.
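
The grouping-and-concatenation idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the pyramid is built, so the equal-width bins on relative motion strength, the doubling of groups per level, the L1 normalization, and the names `motion_pyramid_bow`, `codewords`, `motion`, `vocab_size`, and `levels` are all assumptions made for illustration.

```python
import numpy as np


def motion_pyramid_bow(codewords, motion, vocab_size, levels=2):
    """Concatenate bag-of-words histograms over motion-strength groups.

    codewords  : (N,) visual-word index assigned to each trajectory
    motion     : (N,) motion magnitude of each trajectory
    vocab_size : number of visual words in the codebook
    levels     : pyramid depth; level l uses 2**l groups (an assumption)
    """
    codewords = np.asarray(codewords, dtype=int)
    motion = np.asarray(motion, dtype=float)

    # Relative motion strength in [0, 1], measured against the strongest trajectory.
    peak = motion.max() if motion.size else 0.0
    rel = motion / peak if peak > 0 else np.zeros_like(motion)

    parts = []
    for level in range(levels):
        n_groups = 2 ** level
        # Equal-width bins on [0, 1]; the strongest trajectory falls in the last bin.
        group = np.minimum((rel * n_groups).astype(int), n_groups - 1)
        for g in range(n_groups):
            hist = np.bincount(codewords[group == g], minlength=vocab_size).astype(float)
            total = hist.sum()
            if total > 0:
                hist /= total  # L1-normalize each group's histogram
            parts.append(hist)

    # One descriptor per video: all group histograms side by side.
    return np.concatenate(parts)


desc = motion_pyramid_bow([0, 1, 1, 2], [1.0, 4.0, 2.0, 3.0], vocab_size=3, levels=2)
```

With `levels=2` and a three-word vocabulary, the descriptor has 3 × (1 + 2) = 9 entries: one histogram over all trajectories, followed by one for the weaker-motion group and one for the stronger-motion group. A descriptor of this kind would typically be passed to a classifier; the abstract does not say which one is used.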

Related Material

author = {Narayan, Sanath and Kankanhalli, Mohan S. and Ramakrishnan, Kalpathi R.},
title = {Action and Interaction Recognition in First-Person Videos},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2014}