Exploiting Feature Hierarchies With Convolutional Neural Networks for Cultural Event Recognition

Mengyi Liu, Xin Liu, Yan Li, Xilin Chen, Alexander G. Hauptmann, Shiguang Shan; The IEEE International Conference on Computer Vision (ICCV) Workshops, 2015, pp. 32-37


Cultural events are characteristic events closely tied to history and nationality, and they play an important role in passing cultural heritage across generations. However, automatically recognizing cultural events remains a great challenge, since it depends on understanding complex image content such as people, objects, and scene context. It is therefore natural to associate this task with other high-level vision problems, e.g., object detection, recognition, and scene understanding. In this paper, we address the problem by combining object/scene content mining and strong CNN-based image representation in a single framework. Specifically, for object/scene content mining, we employ selective search to extract a batch of bottom-up region proposals, which serve as key object/scene candidates in each event image; for CNN-based representation, we investigate two state-of-the-art deep architectures, VGGNet and GoogLeNet, and adapt them to our task by performing domain-specific (i.e., event) fine-tuning on both global images and hierarchical region proposals. These two models exploit spatial feature hierarchies in a complementary way, simultaneously capturing the global context and local evidence within the image. In our final submission to the ChaLearn LAP Challenge at ICCV 2015, nine kinds of features extracted from five different deep models were used, followed by two kinds of classifiers for decision-level fusion. Our method achieves the best performance, mAP = 0.854, among all participants in the cultural event recognition track.
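
The combination of global-image and region-proposal evidence, followed by decision-level fusion, can be illustrated with a minimal sketch. The abstract does not state how region scores are aggregated or which fusion rule is used, so the mean pooling over regions and the weighted score averaging below are assumptions made for illustration only, and the function names (`pool_region_scores`, `fuse_scores`, `predict_class`) are hypothetical.

```python
from typing import List, Optional, Sequence


def pool_region_scores(region_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class scores over all region proposals of one image.

    Assumption: mean pooling; the abstract does not specify the aggregation.
    """
    if not region_scores:
        raise ValueError("at least one region is required")
    n_classes = len(region_scores[0])
    pooled = [0.0] * n_classes
    for scores in region_scores:
        if len(scores) != n_classes:
            raise ValueError("all regions must score the same classes")
        for i, value in enumerate(scores):
            pooled[i] += value
    return [value / len(region_scores) for value in pooled]


def fuse_scores(
    score_sets: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Decision-level fusion as a weighted average of per-class scores.

    Assumption: weighted averaging; other fusion rules are equally plausible.
    """
    if not score_sets:
        raise ValueError("at least one score set is required")
    n_classes = len(score_sets[0])
    if any(len(scores) != n_classes for scores in score_sets):
        raise ValueError("all score sets must cover the same classes")
    if weights is None:
        weights = [1.0] * len(score_sets)
    if len(weights) != len(score_sets):
        raise ValueError("one weight per score set is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    fused = [0.0] * n_classes
    for weight, scores in zip(weights, score_sets):
        for i, value in enumerate(scores):
            fused[i] += weight * value / total
    return fused


def predict_class(fused: Sequence[float]) -> int:
    """Return the index of the highest fused score."""
    return max(range(len(fused)), key=lambda i: fused[i])


# Toy example with three event classes (values are made up).
global_scores = [0.2, 0.5, 0.3]                       # whole-image model
region_scores = [[0.1, 0.2, 0.7], [0.3, 0.2, 0.5]]    # two region proposals
local_scores = pool_region_scores(region_scores)
fused = fuse_scores([global_scores, local_scores])
label = predict_class(fused)
```

In this toy case the global model alone would favor the second class, while the pooled region evidence shifts the fused decision to the third, which is the kind of complementarity between global context and local evidence that the abstract describes.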


