Combining the Right Features for Complex Event Recognition

Kevin Tang, Bangpeng Yao, Li Fei-Fei, Daphne Koller; Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2013, pp. 2696-2703


In this paper, we tackle the problem of combining features extracted from video for complex event recognition. Feature combination is especially relevant for video data, as there are many features we can extract, ranging from image features computed from individual frames to video features that take temporal information into account. To combine features effectively, we propose a method that can select among different subsets of features, as some features or feature combinations may be uninformative for certain classes. We introduce a hierarchical method for combining features based on the AND/OR graph structure, where nodes in the graph represent combinations of different sets of features. Our method automatically learns the structure of the AND/OR graph using score-based structure learning, and we introduce an inference procedure that efficiently computes structure scores. We present promising results and analysis on the difficult and large-scale 2011 TRECVID Multimedia Event Detection dataset [17].
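
To make the idea of an AND/OR combination with score-based structure learning concrete, the following toy sketch may help. It is not the authors' model: the abstract does not specify the node semantics, the scoring function, or the inference procedure, so everything here is an assumption chosen for illustration. In particular, an AND node is assumed to average the classifier scores of the features it groups, the OR root is assumed to take the maximum over its AND children, the structure score is assumed to be plain training accuracy, and the search is a simple greedy one. The feature names (hog, motion, audio) and all function names are hypothetical.

```python
from itertools import combinations

def and_score(feature_scores, subset):
    """AND node (assumed semantics): average the scores of the grouped features."""
    return sum(feature_scores[name] for name in subset) / len(subset)

def or_score(feature_scores, subsets):
    """OR node (assumed semantics): take the best-scoring AND child."""
    return max(and_score(feature_scores, s) for s in subsets)

def structure_score(examples, labels, subsets, threshold=0.5):
    """Structure score (assumed): training accuracy of the thresholded OR output."""
    correct = 0
    for feature_scores, label in zip(examples, labels):
        predicted = or_score(feature_scores, subsets) >= threshold
        correct += int(predicted == label)
    return correct / len(labels)

def greedy_structure_search(examples, labels, feature_names, max_size=2):
    """Greedily add AND nodes (feature subsets) while the structure score improves."""
    candidates = [
        frozenset(combo)
        for size in range(1, max_size + 1)
        for combo in combinations(feature_names, size)
    ]
    chosen, best = [], -1.0
    improved = True
    while improved:
        improved = False
        best_candidate = None
        for candidate in candidates:
            if candidate in chosen:
                continue
            score = structure_score(examples, labels, chosen + [candidate])
            if score > best:  # strict improvement keeps the search finite
                best, best_candidate, improved = score, candidate, True
        if improved:
            chosen.append(best_candidate)
    return chosen, best

# Toy data: per-feature classifier scores for four videos of one event class.
EXAMPLES = [
    {"hog": 0.9, "motion": 0.2, "audio": 0.6},
    {"hog": 0.2, "motion": 0.9, "audio": 0.6},
    {"hog": 0.3, "motion": 0.3, "audio": 0.9},
    {"hog": 0.1, "motion": 0.2, "audio": 0.8},
]
LABELS = [True, True, False, False]

STRUCTURE, SCORE = greedy_structure_search(
    EXAMPLES, LABELS, ["hog", "motion", "audio"]
)
print(STRUCTURE, SCORE)
```

On this toy data the search keeps a single AND node grouping hog and motion and leaves out audio, which is uninformative for the class. That mirrors, in a very reduced form, the selectivity over feature subsets that the abstract describes.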

Related Material

author = {Tang, Kevin and Yao, Bangpeng and Fei-Fei, Li and Koller, Daphne},
title = {Combining the Right Features for Complex Event Recognition},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
month = {December},
year = {2013}