Semantic-guided Cross-Modal Prompt Learning for Skeleton-based Zero-shot Action Recognition

Anqi Zhu, Jingmin Zhu, James Bailey, Mingming Gong, Qiuhong Ke; Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025, pp. 13876-13885

Abstract


Skeleton-based human action recognition is promising due to its privacy preservation, robustness to visual challenges, and computational efficiency. In particular, the practical need to recognize unseen actions has led to increased interest in zero-shot skeleton-based action recognition (ZSSAR). Existing ZSSAR approaches often rely on manually crafted action descriptions or visual assumptions to enhance knowledge transfer, a strategy that is limited in flexibility and prone to inaccuracies and noise. To overcome this, we introduce Semantic-guided Cross-Modal Prompt Learning (SCoPLe), a novel framework that replaces manual guidance with data-driven prompt learning for the refinement and alignment of skeletal and textual features. Specifically, we introduce a dual-stream language prompting module that preserves the original semantic context from the pre-trained text encoder while still effectively tuning its output for ZSSAR task adaptation. We also introduce a joint-shaped prompting module that learns to tune skeleton features, and we incorporate an adaptive visual representation sampler that leverages text semantics to strengthen the cross-modal prompting interactions during skeleton-to-text embedding projection. Experimental results on the NTU-RGB+D and PKU-MMD datasets demonstrate the state-of-the-art performance of our method in both ZSSAR and generalized ZSSAR scenarios.
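
For readers new to the zero-shot setting, the sketch below illustrates only the generic matching step that the abstract refers to as skeleton-to-text embedding projection: a skeleton feature is projected into the text embedding space and assigned the unseen class whose text embedding is most similar. It is not SCoPLe itself and contains none of its prompting modules or its sampler. The function names, the linear projection `W`, the use of cosine similarity, and the toy vectors are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def project(W, x):
    """Apply a linear projection W (list of rows) to vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def zero_shot_predict(skeleton_feature, W, text_embeddings):
    """Return the class label whose text embedding is closest to the
    projected skeleton feature, together with all similarity scores."""
    z = project(W, skeleton_feature)
    scores = {label: cosine(z, emb) for label, emb in text_embeddings.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Toy data (hypothetical): a 3-D skeleton feature mapped to a 2-D text space.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
text_embeddings = {"wave": [1.0, 0.0], "kick": [0.0, 1.0]}

label, scores = zero_shot_predict([0.9, 0.1, 0.5], W, text_embeddings)
print(label)
```

In a real system the skeleton feature and the text embeddings would come from trained encoders; the hand-written vectors here only make the matching rule concrete.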

Related Material


[bibtex]
@InProceedings{Zhu_2025_CVPR,
    author    = {Zhu, Anqi and Zhu, Jingmin and Bailey, James and Gong, Mingming and Ke, Qiuhong},
    title     = {Semantic-guided Cross-Modal Prompt Learning for Skeleton-based Zero-shot Action Recognition},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {13876-13885}
}