Video Co-segmentation for Meaningful Action Extraction

Jiaming Guo, Zhuwen Li, Loong-Fah Cheong, Steven Zhiying Zhou; The IEEE International Conference on Computer Vision (ICCV), 2013, pp. 2232-2239


Given a pair of videos that share a common action, our goal is to segment both videos simultaneously to extract this common action. As a preprocessing step, we first remove background trajectories by motion-based figure-ground segmentation. To remove the remaining background and any extraneous actions, we propose a trajectory co-saliency measure, which captures the notion that trajectories recurring in all the videos should have their mutual saliency boosted. This requires a trajectory matching process that can compare trajectories that differ in length and are not necessarily spatiotemporally aligned, yet remains discriminative despite significant intra-class variation in the common action. We further leverage graph matching to enforce geometric coherence between regions, reducing feature ambiguity and matching errors. Finally, to classify the trajectories into common action and action outliers, we formulate the problem as a binary labeling of a Markov Random Field, in which the data term is measured by the trajectory co-saliency and the smoothness term is measured by the spatiotemporal consistency between trajectories. To evaluate the performance of our framework, we introduce a dataset containing clips of animal actions as well as human actions. Experimental results show that the proposed method performs well in common action extraction.
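
The final labeling step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes co-saliency scores already normalized to [0, 1], a Potts-style smoothness penalty on hypothetical neighbor weights, and it swaps in iterated conditional modes (ICM) as a simple local optimizer, since the abstract does not state which solver is used. The names `label_trajectories`, `cosaliency`, `edges`, and `smooth_weight` are illustrative only.

```python
from typing import Dict, List, Tuple


def label_trajectories(
    cosaliency: List[float],
    edges: Dict[Tuple[int, int], float],
    smooth_weight: float = 1.0,
    max_sweeps: int = 20,
) -> List[int]:
    """Binary MRF labeling by ICM: 1 = common action, 0 = action outlier.

    cosaliency[i] is an assumed score in [0, 1] for trajectory i.
    edges maps an undirected pair (i, j) to an assumed spatiotemporal
    consistency weight (larger means the two should share a label).
    """
    n = len(cosaliency)

    # Build symmetric neighbor lists from the undirected weighted edges.
    neighbors: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for (i, j), w in edges.items():
        neighbors[i].append((j, w))
        neighbors[j].append((i, w))

    # Initialise from the data term alone.
    labels = [1 if s > 0.5 else 0 for s in cosaliency]

    def local_cost(i: int, label: int) -> float:
        # Data term: calling a high-co-saliency trajectory an outlier is costly.
        data = (1.0 - cosaliency[i]) if label == 1 else cosaliency[i]
        # Potts smoothness: pay w for every neighbor with a different label.
        smooth = sum(w for j, w in neighbors[i] if labels[j] != label)
        return data + smooth_weight * smooth

    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            cost0, cost1 = local_cost(i, 0), local_cost(i, 1)
            if cost0 < cost1:
                best = 0
            elif cost1 < cost0:
                best = 1
            else:
                best = labels[i]  # keep the current label on ties
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break  # reached a local minimum of the energy
    return labels


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.4, 0.1, 0.2]
    links = {(0, 1): 1.0, (1, 2): 1.0, (3, 4): 1.0}
    print(label_trajectories(scores, links))
```

In this toy input, trajectory 2 has a weak score of its own but is linked to a confidently labeled neighbor, so the smoothness term pulls it into the common-action group. Setting `smooth_weight=0.0` reduces the procedure to thresholding the scores. ICM only finds a local minimum; an exact solver such as a graph cut would be the natural replacement for a binary energy of this form.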

Related Material

author = {Guo, Jiaming and Li, Zhuwen and Cheong, Loong-Fah and Zhiying Zhou, Steven},
title = {Video Co-segmentation for Meaningful Action Extraction},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {December},
year = {2013}