A Spatio-temporal Feature Based on Triangulation of Dense SURF

Do Hang Nga, Keiji Yanai; Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops, 2013, pp. 420-427

Abstract


In this paper, we propose a spatio-temporal feature based on the appearance and movement of SURF interest points. Given a video, we extract spatio-temporal features over every small set of frames. For each frame set, we first extract dense SURF keypoints from its first frame and estimate their optical flow at each subsequent frame. We then detect camera motion and compensate the flow vectors when camera motion is present. Next, we select interest points based on their movement-based relationships across the frame set. We then apply Delaunay triangulation to form triangles over the selected points. From each triangle we extract its shape feature along with trajectory-based visual features of its vertices. We show that concatenating these features with the SURF descriptor yields a spatio-temporal feature comparable to the state of the art. Our proposed spatio-temporal feature is expected to be robust and informative since it is based not on characteristics of individual points but on groups of related interest points. We apply Fisher Vector encoding to represent videos using the proposed feature. We conduct various experiments on UCF101, the largest action dataset of realistic videos to date, and show the effectiveness of our proposed method.
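The triangulation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the point trajectories are simulated here (in the paper they come from dense SURF keypoints tracked by optical flow), the displacement threshold is a hypothetical stand-in for the paper's movement-based selection, and only sorted side lengths are used as a simple triangle shape feature.

```python
import numpy as np
from scipy.spatial import Delaunay

# Simulated trajectories: P points tracked over T frames, shape (T, P, 2).
# In the paper these would be dense SURF keypoints followed by optical flow
# (with camera-motion compensation already applied).
rng = np.random.default_rng(0)
T, P = 5, 40
start = rng.uniform(0, 100, size=(1, P, 2))
steps = rng.normal(0, 1.0, size=(T - 1, P, 2))
traj = np.concatenate([start, start + np.cumsum(steps, axis=0)], axis=0)

def select_moving_points(traj, min_disp=1.0):
    """Keep points whose total displacement over the frame set exceeds a
    threshold -- a simplified stand-in for the paper's movement-based
    interest-point selection."""
    disp = np.linalg.norm(traj[-1] - traj[0], axis=1)
    return np.where(disp > min_disp)[0]

def triangle_shape_feature(pts):
    """Side lengths of a triangle, sorted so the feature does not depend
    on vertex ordering."""
    a = np.linalg.norm(pts[0] - pts[1])
    b = np.linalg.norm(pts[1] - pts[2])
    c = np.linalg.norm(pts[2] - pts[0])
    return np.sort([a, b, c])

idx = select_moving_points(traj)
pos = traj[0, idx]                 # positions of selected points in frame 1
tri = Delaunay(pos)                # Delaunay triangulation of selected points
feats = np.array([triangle_shape_feature(pos[s]) for s in tri.simplices])
print(feats.shape)                 # one 3-dim shape feature per triangle
```

In practice each such per-triangle feature would be concatenated with trajectory-based descriptors of the triangle's vertices and the SURF descriptor before Fisher Vector encoding.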

Related Material


[pdf]
[bibtex]
@InProceedings{Hang_2013_ICCV_Workshops,
author = {Do Hang Nga and Keiji Yanai},
title = {A Spatio-temporal Feature Based on Triangulation of Dense SURF},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops},
month = {December},
year = {2013},
pages = {420-427}
}