stagNet: An Attentive Semantic RNN for Group Activity Recognition

Mengshi Qi, Jie Qin, Annan Li, Yunhong Wang, Jiebo Luo, Luc Van Gool; Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 101-117


Group activity recognition plays a fundamental role in a variety of applications, e.g., sports video analysis and intelligent surveillance. Modeling the spatio-temporal contextual information in a scene remains a crucial yet challenging issue. We propose a novel attentive semantic recurrent neural network (RNN), namely stagNet, for understanding group activities in videos, based on spatio-temporal attention and a semantic graph. A semantic graph is explicitly modeled to describe the spatial context of the whole scene, and is further integrated with the temporal factor via a structural-RNN. Benefiting from its 'factor sharing' and 'message passing' mechanisms, our model is able to extract discriminative spatio-temporal features and capture inter-group relationships. Moreover, we adopt a spatio-temporal attention model to attend to key persons and frames for improved performance. Two widely used datasets are employed for performance evaluation, and extensive results demonstrate the superiority of our method.
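The abstract names a spatio-temporal attention model that attends to key persons and frames but does not give its formulation. The sketch that follows illustrates the general idea with generic two-stage soft-attention pooling in plain Python: persons are weighted within each frame, then the pooled frames are weighted across time. It is not stagNet's method. The dot-product scoring against fixed vectors `person_query` and `frame_query` is a hypothetical stand-in for learned attention parameters, and `features[t][p]` is assumed to hold a feature vector for person `p` in frame `t`.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(vectors: List[Vector], weights: List[float]) -> Vector:
    """Combine equal-length vectors using the given scalar weights."""
    dim = len(vectors[0])
    return [sum(w * v[d] for v, w in zip(vectors, weights)) for d in range(dim)]


def spatio_temporal_attention(
    features: List[List[Vector]],  # features[t][p]: person p in frame t (assumed layout)
    person_query: Vector,          # hypothetical stand-in for learned spatial attention weights
    frame_query: Vector,           # hypothetical stand-in for learned temporal attention weights
) -> Tuple[Vector, List[List[float]], List[float]]:
    frame_feats: List[Vector] = []
    person_weights: List[List[float]] = []
    for persons in features:
        # Spatial attention: score each person, then normalize within the frame.
        alpha = softmax([dot(f, person_query) for f in persons])
        person_weights.append(alpha)
        frame_feats.append(weighted_sum(persons, alpha))
    # Temporal attention: score each pooled frame, then normalize over time.
    beta = softmax([dot(f, frame_query) for f in frame_feats])
    video_feat = weighted_sum(frame_feats, beta)
    return video_feat, person_weights, beta


# Toy input with made-up numbers: 2 frames, 3 persons, 2-dim features.
features = [
    [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
    [[2.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
]
video_feat, alphas, beta = spatio_temporal_attention(features, [1.0, 0.0], [1.0, 0.0])
```

In this toy input the first person aligns best with `person_query`, so it receives the largest spatial weight in both frames. The second frame's pooled feature scores higher against `frame_query`, so it receives the larger temporal weight.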

BibTeX

author = {Qi, Mengshi and Qin, Jie and Li, Annan and Wang, Yunhong and Luo, Jiebo and Van Gool, Luc},
title = {stagNet: An Attentive Semantic RNN for Group Activity Recognition},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
month = {September},
year = {2018}