Concurrent Action Detection with Structural Prediction

Ping Wei, Nanning Zheng, Yibiao Zhao, Song-Chun Zhu; Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2013, pp. 3136-3143


Action recognition has often been posed as a classification problem, which assumes that a video sequence has only one action class label and that different actions are independent. However, a single human body can perform multiple actions concurrently, and different actions interact with each other. This paper proposes a concurrent action detection model in which action detection is formulated as a structural prediction problem. In this model, an interval in a video sequence can be described by multiple action labels. A detected action interval is determined both by the unary local detector and by its relations with other actions. We use a wavelet feature to represent the action sequence, and design a composite temporal logic descriptor to describe the action relations. The model parameters are trained by structural SVM learning. Given a long video sequence, a sequential decision window search algorithm is designed to detect the actions. Experiments on our newly collected concurrent action dataset demonstrate the strength of our method.
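
As a rough illustration of the idea that a detection is scored both by a unary detector and by its relations with other actions, the following minimal sketch sums per-interval scores and pairwise relation weights. It is not the paper's formulation: the `Interval` type, the three-way `temporal_relation` function, the `pair_weights` table, and all numbers are hypothetical stand-ins for the wavelet feature, the composite temporal logic descriptor, and the parameters learned by structural SVM.

```python
from itertools import combinations
from typing import Dict, List, NamedTuple, Tuple


class Interval(NamedTuple):
    """One candidate detection (hypothetical representation)."""
    label: str    # action class
    start: int    # first frame
    end: int      # last frame
    unary: float  # score from a local detector (made-up values below)


def temporal_relation(a: Interval, b: Interval) -> str:
    """Coarse relation between two intervals; a simplification chosen
    for this sketch, not the descriptor used in the paper."""
    if a.end < b.start:
        return "before"
    if b.end < a.start:
        return "after"
    return "overlap"


def structured_score(
    intervals: List[Interval],
    pair_weights: Dict[Tuple[str, str, str], float],
) -> float:
    """Sum of unary scores plus a weight for each pair of intervals,
    looked up by (label_a, label_b, relation). Missing pairs add 0."""
    score = sum(iv.unary for iv in intervals)
    for a, b in combinations(intervals, 2):
        key = (a.label, b.label, temporal_relation(a, b))
        score += pair_weights.get(key, 0.0)
    return score


# Hypothetical detections: two overlapping actions and one later action.
detections = [
    Interval("drink", 10, 40, 1.2),
    Interval("phone", 20, 60, 0.8),
    Interval("type", 70, 90, 0.5),
]
# Hypothetical weights: reward drink/phone overlap, penalize phone-then-type.
pair_weights = {
    ("drink", "phone", "overlap"): 0.5,
    ("phone", "type", "before"): -0.25,
}
total = structured_score(detections, pair_weights)
print(total)
```

Under these assumptions, two overlapping intervals with different labels both contribute to the score, which is how a single stretch of video can carry more than one action label.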

Related Material

author = {Wei, Ping and Zheng, Nanning and Zhao, Yibiao and Zhu, Song-Chun},
title = {Concurrent Action Detection with Structural Prediction},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
month = {December},
year = {2013}