Parallel Attention Interaction Network for Few-Shot Skeleton-Based Action Recognition

Xingyu Liu, Sanping Zhou, Le Wang, Gang Hua; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 1379-1388

Abstract


Learning discriminative features from very few labeled samples to identify novel classes has received increasing attention in skeleton-based action recognition. Existing works aim to learn action-specific embeddings by exploiting either intra-skeleton or inter-skeleton spatial associations, which may lead to less discriminative representations. To address this issue, we propose a novel Parallel Attention Interaction Network (PAINet) that incorporates two complementary branches to strengthen matching via inter-skeleton and intra-skeleton correlations. Specifically, a topology encoding module utilizing topological and physical information is proposed to enhance the modeling of interactive parts and joint pairs in both branches. In the Cross Spatial Alignment branch, we employ a spatial cross-attention module to establish joint associations across sequences, and a directional Average Symmetric Surface Metric is introduced to locate the closest temporal similarity. In parallel, the Cross Temporal Alignment branch incorporates a spatial self-attention module to aggregate spatial context within sequences and applies a temporal cross-attention network to correct temporal misalignment and compute similarity. Extensive experiments on three skeleton benchmarks, namely NTU-T, NTU-S, and Kinetics, demonstrate the superiority of our framework, which consistently outperforms state-of-the-art methods.
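The core operation described above is cross-attention between the joint features of two skeleton sequences, so that each joint of one sequence attends to every joint of the other. Below is a minimal, hypothetical PyTorch sketch of such a spatial cross-attention module; the class name, feature dimensions, and single-head formulation are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class SpatialCrossAttention(nn.Module):
    """Hypothetical sketch: single-head cross-attention in which the
    queries come from one skeleton sequence and the keys/values from
    another, yielding joint associations across sequences."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5  # standard dot-product attention scaling

    def forward(self, x_a: torch.Tensor, x_b: torch.Tensor) -> torch.Tensor:
        # x_a, x_b: (batch, num_joints, dim) joint features of two sequences
        q = self.to_q(x_a)
        k = self.to_k(x_b)
        v = self.to_v(x_b)
        # attention weights: each joint of x_a over all joints of x_b
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        # x_a's joints re-expressed as a weighted mix of x_b's joint features
        return attn @ v


# Example: align 25-joint skeletons (NTU-style) from a support and a query clip
module = SpatialCrossAttention(dim=64)
support = torch.randn(2, 25, 64)
query = torch.randn(2, 25, 64)
aligned = module(support, query)  # shape (2, 25, 64)
```

In a few-shot matching pipeline, the aligned features would then feed a similarity measure between support and query sequences; the self-attention variant in the other branch is the same computation with `x_a` and `x_b` being the same sequence.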

Related Material


[pdf]
[bibtex]
@InProceedings{Liu_2023_ICCV,
    author    = {Liu, Xingyu and Zhou, Sanping and Wang, Le and Hua, Gang},
    title     = {Parallel Attention Interaction Network for Few-Shot Skeleton-Based Action Recognition},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {1379-1388}
}