Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach

Arash Vahdat, Kevin Cannons, Greg Mori, Sangmin Oh, Ilseo Kim; Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2013, pp. 1185-1192

Abstract


We present a compositional model for video event detection. A video is modeled using a collection of both global and segment-level features, and kernel functions are employed for similarity comparisons. The locations of salient, discriminative video segments are treated as a latent variable, allowing the model to explicitly ignore portions of the video that are unimportant for classification. We define a novel multiple kernel learning (MKL) latent support vector machine (SVM) that combines and re-weights multiple feature types in a principled fashion while operating within the latent variable framework. The compositional nature of the proposed model allows it to respond directly to the challenges of temporal clutter and intra-class variation, which are prevalent in unconstrained internet videos. Experimental results on the TRECVID Multimedia Event Detection 2011 (MED11) dataset demonstrate the efficacy of the method.
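
The two ideas in the abstract, a weighted combination of kernels and a latent choice of the most salient segment, can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the choice of linear and RBF base kernels, the fixed kernel weights, and the `support` data layout are all assumptions made for illustration. In particular, the paper learns the kernel weights jointly with the latent SVM, whereas this sketch takes them as given and only shows scoring.

```python
import math


def linear_kernel(x, y):
    """Dot product between two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def combined_kernel(x, y, kernels, weights):
    """Weighted sum of base kernels (the basic MKL-style combination)."""
    return sum(w * k(x, y) for k, w in zip(kernels, weights))


def score_video(global_feat, segment_feats, support, kernels, weights):
    """Score a video as a global term plus the best segment term.

    `support` is a hypothetical list of (alpha, global_vec, segment_vec)
    tuples standing in for a trained kernel classifier. The latent variable
    is the index of the segment, inferred by maximising the segment score.
    """
    global_score = sum(
        alpha * combined_kernel(global_feat, g_vec, kernels, weights)
        for alpha, g_vec, _ in support
    )
    segment_scores = [
        sum(
            alpha * combined_kernel(seg, s_vec, kernels, weights)
            for alpha, _, s_vec in support
        )
        for seg in segment_feats
    ]
    # Latent inference: keep only the most salient segment, ignore the rest.
    best_index = max(range(len(segment_scores)), key=segment_scores.__getitem__)
    return global_score + segment_scores[best_index], best_index
```

Taking the maximum over segments is what lets such a model disregard temporal clutter: segments that score poorly contribute nothing to the final decision.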

Related Material


@InProceedings{Vahdat_2013_ICCV,
author = {Vahdat, Arash and Cannons, Kevin and Mori, Greg and Oh, Sangmin and Kim, Ilseo},
title = {Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
month = {December},
year = {2013}
}