Deep Spatial Pyramid Ensemble for Cultural Event Recognition

Xiu-Shen Wei, Bin-Bin Gao, Jianxin Wu; The IEEE International Conference on Computer Vision (ICCV) Workshops, 2015, pp. 38-44


Semantic event recognition based only on image cues is a challenging problem in computer vision. To capture rich information and exploit important cues such as human poses, garments, and scene categories, we propose the Deep Spatial Pyramid Ensemble framework, which builds mainly on our previous work, Deep Spatial Pyramid (DSP). DSP builds universal and powerful image representations from CNN models. Specifically, we employ five deep networks trained on different data sources to extract five corresponding DSP representations of the event recognition images. To combine the complementary information in these representations, we ensemble them by both "early fusion" and "late fusion". Finally, based on the proposed framework, we present a solution for the Cultural Event Recognition track of the ChaLearn Looking at People (LAP) challenge held in association with ICCV 2015. Our framework achieved one of the best cultural event recognition performances in this challenge.
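The two fusion strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the fusion details, so this follows the common usage of the terms: early fusion concatenates the per-network feature vectors before a single classifier is trained, and late fusion combines the class scores produced by separate per-network classifiers. The function names, the L2 normalization step, and the uniform default weights are all assumptions made for illustration, not the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (assumed preprocessing step)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def early_fusion(features):
    """Early fusion (assumed form): concatenate the normalized feature
    vectors from each network into one long vector for one classifier."""
    fused = []
    for vec in features:
        fused.extend(l2_normalize(vec))
    return fused


def late_fusion(score_lists, weights=None):
    """Late fusion (assumed form): weighted average of the per-class
    scores produced by each network's own classifier."""
    n = len(score_lists)
    if weights is None:
        weights = [1.0 / n] * n  # hypothetical default: uniform weights
    num_classes = len(score_lists[0])
    return [
        sum(w * scores[c] for w, scores in zip(weights, score_lists))
        for c in range(num_classes)
    ]


# Toy stand-ins for two networks' features and class scores.
toy_features = [[3.0, 4.0], [0.0, 2.0, 0.0]]
toy_scores = [[0.2, 0.8], [0.6, 0.4]]

fused_vector = early_fusion(toy_features)
fused_scores = late_fusion(toy_scores)
predicted_class = max(range(len(fused_scores)), key=fused_scores.__getitem__)
```

In this sketch the fused vector has as many dimensions as all inputs combined, while late fusion keeps one score per class; with five networks, as in the abstract, the same functions would simply take five-element lists.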

Related Material

author = {Wei, Xiu-Shen and Gao, Bin-Bin and Wu, Jianxin},
title = {Deep Spatial Pyramid Ensemble for Cultural Event Recognition},
booktitle = {The IEEE International Conference on Computer Vision (ICCV) Workshops},
month = {December},
year = {2015}