Spatial-Temporal Adaptive Graph Convolutional Network for Skeleton-based Action Recognition

Rui Hang, Minxian Li; Proceedings of the Asian Conference on Computer Vision (ACCV), 2022, pp. 1265-1281

Abstract


Skeleton-based action recognition approaches usually construct the skeleton sequence as spatial-temporal graphs and perform graph convolution on these graphs to extract discriminative features. However, due to the fixed topology shared among different poses and the lack of direct long-range temporal dependencies, it is not trivial to learn robust spatial-temporal features. Therefore, we present a spatial-temporal adaptive graph convolutional network (STA-GCN) to learn adaptive spatial and temporal topologies and effectively aggregate features for skeleton-based action recognition. The proposed network is composed of spatial adaptive graph convolution (SA-GC) and temporal adaptive graph convolution (TA-GC) with an adaptive topology encoder. The SA-GC can extract the spatial feature for each pose with the spatial adaptive topology, while the TA-GC can learn the temporal feature by adaptively modeling direct long-range temporal dependencies. On three large-scale skeleton action recognition datasets: NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton, the STA-GCN outperforms existing state-of-the-art methods.
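The paper's exact layer formulation is given in the full text; as a rough illustration of the adaptive-topology idea the abstract describes, the sketch below shows a generic adaptive graph convolution in which a fixed, normalized skeleton adjacency is augmented by a learned adjacency term (all function and variable names here are illustrative, not taken from the paper):

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetrically normalize an adjacency matrix with self-loops:
    D^(-1/2) (A + I) D^(-1/2)."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def adaptive_graph_conv(X, A_fixed, B_learned, W):
    """One adaptive graph-convolution layer (illustrative):
    features are aggregated over the fixed skeleton topology plus a
    learned topology correction, so the effective graph is no longer
    shared rigidly across all poses.
    X: (V, C_in) joint features, A_fixed/B_learned: (V, V), W: (C_in, C_out).
    """
    A = normalize_adjacency(A_fixed) + B_learned
    return A @ X @ W

# Toy 3-joint skeleton: a chain joint0 - joint1 - joint2.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
V, C_in, C_out = 3, 4, 8
rng = np.random.default_rng(0)
X = rng.standard_normal((V, C_in))
B = 0.01 * rng.standard_normal((V, V))   # would be a learnable parameter in practice
W = rng.standard_normal((C_in, C_out))

out = adaptive_graph_conv(X, A, B, W)
print(out.shape)  # (3, 8): one C_out-dimensional feature per joint
```

Applying the same pattern along the time axis (joints of the same index across frames, with a learned frame-to-frame adjacency) gives the flavor of direct long-range temporal modeling the abstract attributes to TA-GC.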

Related Material


[bibtex]
@InProceedings{Hang_2022_ACCV,
    author    = {Hang, Rui and Li, Minxian},
    title     = {Spatial-Temporal Adaptive Graph Convolutional Network for Skeleton-based Action Recognition},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2022},
    pages     = {1265-1281}
}