ACTIVE: Activity Concept Transitions in Video Event Classification

Chen Sun, Ram Nevatia; The IEEE International Conference on Computer Vision (ICCV), 2013, pp. 913-920


The goal of high-level event classification from videos is to assign a single, high-level event label to each query video. Traditional approaches represent each video as a set of low-level features and encode it into a fixed-length feature vector (e.g. Bag-of-Words), which leaves a large gap between low-level visual features and high-level events. Our paper addresses this problem by exploiting activity concept transitions in video events (ACTIVE). A video is treated as a sequence of short clips, each of which is an observation corresponding to a latent activity concept variable in a Hidden Markov Model (HMM). We propose to apply Fisher Kernel techniques so that the concept transitions over time can be encoded into a compact, fixed-length feature vector very efficiently. Our approach can utilize concept annotations from independent datasets, and works well even with a very small number of training samples. Experiments on the challenging NIST TRECVID Multimedia Event Detection (MED) dataset show that our approach performs favorably against the state of the art.
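
To make the encoding idea concrete, the following is a minimal sketch of how an HMM over clips can yield a fixed-length vector. It is not the authors' implementation. It assumes per-clip concept likelihoods `B[t][i]` are already available (for example from concept classifiers), that every transition probability is strictly positive, and it uses one common form of the Fisher score for transition parameters: expected transition counts divided by the transition probability, minus the expected occupancy of the source state. The function names `forward_backward` and `transition_fisher_vector` are hypothetical.

```python
import math


def forward_backward(pi, A, B):
    """Scaled forward-backward pass for an HMM with K latent concepts.

    pi: initial concept probabilities, length K
    A:  K x K transitions, A[i][j] = P(concept j at t+1 | concept i at t)
    B:  T x K likelihoods, B[t][i] = P(clip t | concept i) (assumed given)
    Returns (gamma, xi_sum, log_likelihood).
    """
    T, K = len(B), len(pi)
    alpha = [[0.0] * K for _ in range(T)]
    beta = [[1.0] * K for _ in range(T)]
    scale = [0.0] * T

    # Forward pass, normalizing each step to avoid numerical underflow.
    for t in range(T):
        for j in range(K):
            if t == 0:
                prior = pi[j]
            else:
                prior = sum(alpha[t - 1][i] * A[i][j] for i in range(K))
            alpha[t][j] = prior * B[t][j]
        scale[t] = sum(alpha[t])
        alpha[t] = [a / scale[t] for a in alpha[t]]

    # Backward pass, reusing the forward scale factors.
    for t in range(T - 2, -1, -1):
        for i in range(K):
            beta[t][i] = sum(
                A[i][j] * B[t + 1][j] * beta[t + 1][j] for j in range(K)
            ) / scale[t + 1]

    # Posterior concept occupancy per clip.
    gamma = [[alpha[t][i] * beta[t][i] for i in range(K)] for t in range(T)]

    # Expected number of i -> j transitions, summed over time.
    xi_sum = [[0.0] * K for _ in range(K)]
    for t in range(T - 1):
        for i in range(K):
            for j in range(K):
                xi_sum[i][j] += (
                    alpha[t][i] * A[i][j] * B[t + 1][j] * beta[t + 1][j]
                    / scale[t + 1]
                )

    log_likelihood = sum(math.log(s) for s in scale)
    return gamma, xi_sum, log_likelihood


def transition_fisher_vector(pi, A, B):
    """Return a K*K vector of Fisher scores w.r.t. the transition parameters."""
    gamma, xi_sum, _ = forward_backward(pi, A, B)
    T, K = len(B), len(pi)
    # Expected occupancy of each source concept over clips 0..T-2.
    occupancy = [sum(gamma[t][i] for t in range(T - 1)) for i in range(K)]
    return [
        xi_sum[i][j] / A[i][j] - occupancy[i]
        for i in range(K)
        for j in range(K)
    ]


# Toy usage: two concepts, videos of different lengths.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.2, 0.8]]
short_video = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
long_video = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9], [0.2, 0.8]]
fv_short = transition_fisher_vector(pi, A, short_video)
fv_long = transition_fisher_vector(pi, A, long_video)
```

Both `fv_short` and `fv_long` have length K*K = 4 regardless of the number of clips, which is the property that lets a standard classifier operate on videos of varying length. A full Fisher kernel would typically also normalize these scores (for instance by an approximation of the Fisher information matrix); that step is omitted here.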

Related Material

author = {Sun, Chen and Nevatia, Ram},
title = {ACTIVE: Activity Concept Transitions in Video Event Classification},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {December},
year = {2013}