Action Graphs: Weakly-supervised Action Localization with Graph Convolution Networks

Maheen Rashid, Hedvig Kjellstrom, Yong Jae Lee; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2020, pp. 615-624

Abstract


We present a method for weakly-supervised action localization based on graph convolutions. To find and classify the video time segments that correspond to relevant action classes, a system must both identify discriminative time segments in each video and recover the full extent of each action. Achieving this with only weak, video-level labels requires the system to exploit similarity and dissimilarity between moments across the training videos to learn both how an action appears and the sub-actions that make up its full extent. However, current methods do not make explicit use of similarity between video moments to inform their localization and classification predictions. We present a novel method that uses graph convolutions to explicitly model this similarity. Our method builds similarity graphs that encode appearance and motion, and pushes the state of the art on THUMOS '14, ActivityNet 1.2, and Charades for weakly-supervised action localization.
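To make the core idea concrete, here is a minimal sketch of a graph-convolution step over per-segment video features, where the graph is a pairwise similarity (affinity) matrix between segments. This is an illustrative assumption, not the paper's exact architecture: the feature dimensions, the cosine affinity, and the row-softmax normalization are all hypothetical choices standing in for whatever the authors use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for per-segment video features: T time segments, D feature dims.
T, D, H = 6, 8, 4
X = rng.standard_normal((T, D))

def cosine_affinity(X):
    # Pairwise cosine similarity between segment features.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

def graph_conv(X, W):
    # One graph-convolution layer: aggregate each segment's features
    # over the similarity graph, then apply a linear projection + ReLU.
    A = cosine_affinity(X)
    # Row-softmax so each segment's neighbors contribute convex weights.
    A = np.exp(A) / np.exp(A).sum(axis=1, keepdims=True)
    return np.maximum(A @ X @ W, 0.0)

W = rng.standard_normal((D, H))  # hypothetical learned projection
out = graph_conv(X, W)
print(out.shape)  # (6, 4)
```

Each segment's output feature is thus a similarity-weighted mixture of all segments' features, which is what lets information about one moment of an action inform predictions at related moments.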

Related Material


@InProceedings{Rashid_2020_WACV,
author = {Rashid, Maheen and Kjellstrom, Hedvig and Lee, Yong Jae},
title = {Action Graphs: Weakly-supervised Action Localization with Graph Convolution Networks},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
month = {March},
year = {2020}
}