Spatial-Temporal Adaptive Graph Convolutional Network for Skeleton-based Action Recognition
Skeleton-based action recognition approaches usually model the skeleton sequence as spatial-temporal graphs and perform graph convolution on these graphs to extract discriminative features. However, because a fixed topology is shared among different poses and direct long-range temporal dependencies are lacking, learning robust spatial-temporal features is not trivial. Therefore, we present a spatial-temporal adaptive graph convolutional network (STA-GCN) that learns adaptive spatial and temporal topologies and effectively aggregates features for skeleton-based action recognition. The proposed network is composed of spatial adaptive graph convolution (SA-GC) and temporal adaptive graph convolution (TA-GC) with an adaptive topology encoder. The SA-GC extracts the spatial feature for each pose using a spatial adaptive topology, while the TA-GC learns the temporal feature by adaptively modeling direct long-range temporal dependencies. On three large-scale skeleton action recognition datasets, NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton, the STA-GCN outperforms the existing state-of-the-art methods.
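
To make the two adaptive operations more concrete, the following is a minimal NumPy sketch of the general idea, not the paper's implementation. It assumes that the adaptive topology is a row-normalized similarity matrix computed from the input features (an attention-style stand-in for the adaptive topology encoder, whose actual design the abstract does not specify). All function names, weight names, and shapes here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_topology(x, w_q, w_k):
    # ASSUMPTION: the topology is inferred from feature similarity.
    # x: (N, C) node features -> (N, N) adjacency with rows summing to 1.
    q, k = x @ w_q, x @ w_k
    return softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)

def spatial_adaptive_gc(x, a_fixed, w_q, w_k, w_out):
    # x: (T, V, C). Each frame (pose) gets its own joint-to-joint topology,
    # added to the fixed skeleton adjacency a_fixed of shape (V, V).
    out = np.empty((x.shape[0], x.shape[1], w_out.shape[1]))
    for t in range(x.shape[0]):
        a = a_fixed + adaptive_topology(x[t], w_q, w_k)
        out[t] = a @ x[t] @ w_out
    return out

def temporal_adaptive_gc(x, w_q, w_k, w_out):
    # x: (T, V, C). For each joint, a (T, T) topology links every pair of
    # frames directly, so distant frames can exchange features in one step.
    out = np.empty((x.shape[0], x.shape[1], w_out.shape[1]))
    for v in range(x.shape[1]):
        a = adaptive_topology(x[:, v], w_q, w_k)
        out[:, v] = a @ x[:, v] @ w_out
    return out

# Toy input: T frames, V joints, C channels, D embedding size (all arbitrary).
rng = np.random.default_rng(0)
T, V, C, D = 4, 5, 3, 2
x = rng.normal(size=(T, V, C))
a_fixed = np.eye(V)  # placeholder for a real skeleton adjacency matrix

w_q_s, w_k_s = rng.normal(size=(C, D)), rng.normal(size=(C, D))
w_q_t, w_k_t = rng.normal(size=(C, D)), rng.normal(size=(C, D))
w_out_s, w_out_t = rng.normal(size=(C, C)), rng.normal(size=(C, C))

h = spatial_adaptive_gc(x, a_fixed, w_q_s, w_k_s, w_out_s)
y = temporal_adaptive_gc(h, w_q_t, w_k_t, w_out_t)
print(y.shape)
```

In this sketch the spatial step builds one topology per frame and the temporal step builds one per joint; a trained model would learn the projection weights rather than sample them at random.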