From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding

Weiyu Zhang, Menglong Zhu, Konstantinos G. Derpanis; Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2013, pp. 2248-2255

Abstract


This paper presents a novel approach for analyzing human actions in non-scripted, unconstrained video settings based on volumetric, x-y-t, patch classifiers, termed actemes. Unlike previous action-related work, the discovery of patch classifiers is posed as a strongly-supervised process. Specifically, keypoint labels (e.g., position) across spacetime are used in a data-driven training process to discover patches that are highly clustered in the spacetime keypoint configuration space. To support this process, a new human action dataset consisting of challenging consumer videos is introduced, where notably the action label, the 2D position of a set of keypoints and their visibilities are provided for each video frame. On a novel input video, each acteme is used in a sliding volume scheme to yield a set of sparse, non-overlapping detections. These detections provide the intermediate substrate for segmenting out the action. For action classification, the proposed representation shows significant improvement over state-of-the-art low-level features, while providing spatiotemporal localization as additional output. This output sheds further light on detailed action understanding.
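As an informal illustration of the sliding volume step described in the abstract, the sketch below scans an x-y-t video volume with a single linear acteme template and keeps sparse, non-overlapping detections via greedy volumetric non-maximum suppression. The function names, the linear scoring, the stride values, and the NMS details are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch (not the authors' code): slide an acteme template over a
# video volume, score each x-y-t window, then keep sparse, non-overlapping
# detections with greedy volumetric non-maximum suppression.
import numpy as np

def slide_acteme(video, acteme_w, stride=(4, 8, 8)):
    # Score every x-y-t window of `video` (T, H, W) with the filter `acteme_w` (t, h, w).
    t, h, w = acteme_w.shape
    T, H, W = video.shape
    detections = []
    for ti in range(0, T - t + 1, stride[0]):
        for yi in range(0, H - h + 1, stride[1]):
            for xi in range(0, W - w + 1, stride[2]):
                patch = video[ti:ti + t, yi:yi + h, xi:xi + w]
                score = float(np.sum(patch * acteme_w))  # linear classifier response
                detections.append((score, (ti, yi, xi, t, h, w)))
    return detections

def overlap(a, b):
    # Volumetric intersection-over-union of two (t0, y0, x0, dt, dh, dw) boxes.
    inter = 1.0
    for (s1, d1), (s2, d2) in zip(((a[0], a[3]), (a[1], a[4]), (a[2], a[5])),
                                  ((b[0], b[3]), (b[1], b[4]), (b[2], b[5]))):
        lo, hi = max(s1, s2), min(s1 + d1, s2 + d2)
        inter *= max(0, hi - lo)
    union = a[3] * a[4] * a[5] + b[3] * b[4] * b[5] - inter
    return inter / union if union > 0 else 0.0

def sparse_detections(detections, max_iou=0.0, top_k=10):
    # Greedy NMS: keep high-scoring windows whose volumes do not overlap.
    kept = []
    for score, box in sorted(detections, reverse=True):
        if all(overlap(box, kb) <= max_iou for _, kb in kept):
            kept.append((score, box))
        if len(kept) == top_k:
            break
    return kept

# Toy usage on random data, standing in for a real video and a trained acteme.
rng = np.random.default_rng(0)
video = rng.standard_normal((30, 64, 64))
acteme = rng.standard_normal((8, 16, 16))
print(sparse_detections(slide_acteme(video, acteme))[:3])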

Related Material


[pdf]
[bibtex]
@InProceedings{Zhang_2013_ICCV,
author = {Zhang, Weiyu and Zhu, Menglong and Derpanis, Konstantinos G.},
title = {From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
month = {December},
year = {2013},
pages = {2248-2255}
}