Context-Aware Modeling and Recognition of Activities in Video

Yingying Zhu, Nandita M. Nayak, Amit K. Roy-Chowdhury; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 2491-2498


In this paper, rather than modeling activities in videos individually, we propose a hierarchical framework that jointly models and recognizes related activities using motion and various context features. This is motivated from the observations that the activities related in space and time rarely occur independently and can serve as the context for each other. Given a video, action segments are automatically detected using motion segmentation based on a nonlinear dynamical model. We aim to merge these segments into activities of interest and generate optimum labels for the activities. Towards this goal, we utilize a structural model in a max-margin framework that jointly models the underlying activities which are related in space and time. The model explicitly learns the duration, motion and context patterns for each activity class, as well as the spatio-temporal relationships for groups of them. The learned model is then used to optimally label the activities in the testing videos using a greedy search method. We show promising results on the VIRAT Ground Dataset demonstrating the benefit of joint modeling and recognizing activities in a wide-area scene.

Related Material

author = {Zhu, Yingying and Nayak, Nandita M. and Roy-Chowdhury, Amit K.},
title = {Context-Aware Modeling and Recognition of Activities in Video},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2013}