Hierarchical-aware Orthogonal Disentanglement Framework for Fine-grained Skeleton-based Action Recognition

Haochen Chang, Pengfei Ren, Haoyang Zhang, Liang Xie, Hongbo Chen, Erwei Yin; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 11252-11261

Abstract


In recent years, skeleton-based action recognition has gained significant attention due to its robustness to varying environmental conditions. However, most existing methods struggle to distinguish fine-grained actions, because the motion features are subtle and inter-class variation is minimal, and they often fail to consider the underlying similarity relationships between action classes. To address these limitations, we propose a Hierarchical-aware Orthogonal Disentanglement framework (HiOD). We disentangle coarse-grained and fine-grained features by employing independent spatial-temporal granularity-aware bases, which encode semantic representations at varying levels of granularity. Additionally, we design a cross-granularity feature interaction mechanism that leverages complementary information between coarse-grained and fine-grained features. We further enhance the learning process through hierarchical prototype contrastive learning, which uses the parent class hierarchy to guide the learning of coarse-grained features while ensuring that fine-grained features remain distinguishable among the child classes of each parent. Extensive experiments on the FineGYM, FSD-10, NTU RGB+D, and NTU RGB+D 120 datasets demonstrate the superiority of our method in fine-grained action recognition tasks.
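
The abstract does not give formulas, so the following is only an illustrative sketch of two ideas it names: keeping coarse-grained and fine-grained bases orthogonal, and a two-level prototype contrastive objective guided by a parent/child class hierarchy. Everything here is an assumption made for illustration and not the paper's formulation: the function names, the Frobenius-norm penalty, the cosine-similarity softmax loss, the `temperature` value, and the `parent_of` array that maps each fine class to its parent class.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def orthogonality_penalty(coarse_basis, fine_basis):
    """Squared Frobenius norm of the cross-Gram matrix (assumed penalty).

    It is zero exactly when every coarse basis vector is orthogonal
    to every fine basis vector.
    """
    cross = coarse_basis @ fine_basis.T
    return float(np.sum(cross ** 2))


def masked_log_softmax(logits, mask):
    """Row-wise log-softmax restricted to entries where `mask` is True."""
    logits = np.where(mask, logits, -np.inf)
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_z = np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    return shifted - log_z


def hierarchical_prototype_loss(coarse_feats, fine_feats, fine_labels,
                                parent_of, coarse_protos, fine_protos,
                                temperature=0.1):
    """Return (coarse_loss, fine_loss) for a batch (hypothetical objective).

    coarse_loss pulls each coarse feature toward its parent-class prototype.
    fine_loss contrasts each fine feature only against prototypes of
    sibling classes, i.e. fine classes that share the same parent.
    """
    parents = parent_of[fine_labels]
    rows = np.arange(len(fine_labels))

    # Coarse level: softmax over all parent prototypes.
    coarse_logits = (l2_normalize(coarse_feats)
                     @ l2_normalize(coarse_protos).T) / temperature
    all_parents = np.ones_like(coarse_logits, dtype=bool)
    coarse_logp = masked_log_softmax(coarse_logits, all_parents)
    coarse_loss = -coarse_logp[rows, parents].mean()

    # Fine level: softmax over sibling prototypes only.
    fine_logits = (l2_normalize(fine_feats)
                   @ l2_normalize(fine_protos).T) / temperature
    sibling_mask = parent_of[None, :] == parents[:, None]
    fine_logp = masked_log_softmax(fine_logits, sibling_mask)
    fine_loss = -fine_logp[rows, fine_labels].mean()

    return float(coarse_loss), float(fine_loss)
```

In this sketch, restricting the fine-level softmax to sibling classes means the fine features are only asked to separate classes that the coarse level cannot already tell apart; whether the paper uses this restriction or a different weighting is not stated in the abstract.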

Related Material


[bibtex]
@InProceedings{Chang_2025_ICCV,
    author    = {Chang, Haochen and Ren, Pengfei and Zhang, Haoyang and Xie, Liang and Chen, Hongbo and Yin, Erwei},
    title     = {Hierarchical-aware Orthogonal Disentanglement Framework for Fine-grained Skeleton-based Action Recognition},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {11252-11261}
}