Weakly Supervised Graph Convolutional Neural Network for Human Action Localization

Daisuke Miki, Shi Chen, Kazuyuki Demachi; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2020, pp. 653-661

Abstract


Skeleton-based human action recognition from video sequences is currently an active topic of research. Conventionally, human action recognition is performed after conducting feature extraction on a given spatial-temporal representation of a human pose by using statistical methods or deep learning methods. The spatial and temporal features are globally evaluated by a classifier and used to determine which action is closest. However, the conventional methodology does not identify the temporal location of the action that determines the classification. To address this problem, we propose a skeleton-based human action recognition and localization method using weakly supervised graph convolutional neural networks, which are both spatially and temporally connected. In this method, human action localization is accomplished using time series data of human joint positions as input and then applying regression to find an expected value for each action at each time frame. Our weakly supervised training is based on multiple-instance learning inspired by deep ranking, and we devise a loss function so that high scores can be spontaneously learned for temporally important time frames. In this paper, we first explain the network architecture and then present a multiple-instance learning method for its optimization. In the experiment, we performed localization and classification of human actions by using this method and confirmed the temporal localization efficacy of the method.
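
As an illustration of the weakly supervised idea described in the abstract, the following is a minimal sketch of a generic multiple-instance ranking loss. It is not the loss function from the paper, which the abstract does not specify. It assumes that each sequence is a "bag" of per-frame scores, that a bag is summarized by its highest-scoring frame, and that a hinge margin pushes sequences containing the action above sequences that do not contain it. The function names (`bag_score`, `mil_ranking_loss`, `localize`), the margin, and the threshold are all hypothetical choices made for this example.

```python
from typing import List, Sequence


def bag_score(frame_scores: Sequence[float]) -> float:
    """Summarize a sequence (bag) by its highest per-frame score.

    Assumption: max pooling over time; other pooling choices are possible.
    """
    return max(frame_scores)


def mil_ranking_loss(
    positive_bag: Sequence[float],
    negative_bag: Sequence[float],
    margin: float = 1.0,  # hypothetical margin value
) -> float:
    """Hinge ranking loss between a bag that contains the action and one that does not.

    Only sequence-level labels are needed: the loss is zero once the best
    frame of the positive bag outscores the best frame of the negative bag
    by at least `margin`.
    """
    return max(0.0, margin - bag_score(positive_bag) + bag_score(negative_bag))


def localize(frame_scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return the indices of frames whose score exceeds `threshold`."""
    return [t for t, score in enumerate(frame_scores) if score > threshold]


# Toy per-frame scores (made up for illustration, not experimental results).
positive = [0.1, 0.2, 0.9, 0.8, 0.1]  # sequence labeled as containing the action
negative = [0.1, 0.3, 0.2, 0.1, 0.2]  # sequence labeled as not containing it

loss = mil_ranking_loss(positive, negative)
action_frames = localize(positive)
```

Because the loss depends only on sequence-level labels, minimizing a loss of this kind can raise the scores of the frames that best separate the two bags, and thresholding the per-frame scores then gives a temporal localization.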

Related Material


@InProceedings{Miki_2020_WACV,
author = {Miki, Daisuke and Chen, Shi and Demachi, Kazuyuki},
title = {Weakly Supervised Graph Convolutional Neural Network for Human Action Localization},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
month = {March},
year = {2020}
}