MaskCLR: Attention-Guided Contrastive Learning for Robust Action Representation Learning

Mohamed Abdelfattah, Mariam Hassan, Alexandre Alahi; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 18678-18687

Abstract


Current transformer-based skeletal action recognition models tend to focus on a limited set of joints and low-level motion patterns to predict action classes. This results in significant performance degradation under small skeleton perturbations or when the pose estimator changes between training and testing. In this work, we introduce MaskCLR, a new Masked Contrastive Learning approach for Robust skeletal action recognition. We propose an Attention-Guided Probabilistic Masking strategy to occlude the most important joints and encourage the model to explore a larger set of discriminative joints. Furthermore, we propose a Multi-Level Contrastive Learning paradigm to enforce the representations of standard and occluded skeletons to be class-discriminative, i.e., more compact within each class and more dispersed across different classes. Our approach helps the model capture the high-level action semantics instead of low-level joint variations, and can be conveniently incorporated into transformer-based models. Without loss of generality, we combine MaskCLR with three transformer backbones: the vanilla transformer, DSTFormer, and STTFormer. Extensive experiments on NTU60, NTU120, and Kinetics400 show that MaskCLR consistently outperforms previous state-of-the-art methods on standard and perturbed skeletons from different pose estimators, showing improved accuracy, generalization, and robustness. Project website: https://maskclr.github.io.
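The attention-guided probabilistic masking idea can be illustrated with a minimal sketch. The abstract does not specify how attention is turned into masking probabilities, so everything below is an assumption made for illustration: the function names, the softmax with a `temperature` parameter, sampling without replacement, and zeroing the masked joints in every frame are hypothetical choices, not the authors' implementation.

```python
import math
import random


def masking_probabilities(attention, temperature=1.0):
    """Turn per-joint attention scores into sampling probabilities (softmax).

    Assumption: a softmax is used here only for illustration.
    """
    scaled = [a / temperature for a in attention]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_joints_to_mask(attention, num_masked, rng, temperature=1.0):
    """Sample distinct joint indices, favouring highly attended joints."""
    probs = masking_probabilities(attention, temperature)
    candidates = list(range(len(probs)))
    chosen = []
    for _ in range(min(num_masked, len(candidates))):
        weights = [probs[j] for j in candidates]
        pick = rng.choices(candidates, weights=weights, k=1)[0]
        chosen.append(pick)
        candidates.remove(pick)  # sample without replacement
    return sorted(chosen)


def mask_skeleton(frames, masked_joints):
    """Return a copy of the sequence with the chosen joints zeroed in every frame."""
    masked = set(masked_joints)
    return [
        [(0.0, 0.0) if j in masked else joint for j, joint in enumerate(frame)]
        for frame in frames
    ]


# Toy example: 2 frames, 4 joints, 2D coordinates (all values are made up).
rng = random.Random(0)
attention = [0.1, 2.0, 0.3, 1.5]
frames = [
    [(0.1, 0.2), (0.3, 0.4), (0.5, 0.6), (0.7, 0.8)],
    [(0.2, 0.3), (0.4, 0.5), (0.6, 0.7), (0.8, 0.9)],
]
masked_joints = sample_joints_to_mask(attention, num_masked=2, rng=rng)
occluded = mask_skeleton(frames, masked_joints)
```

Because the joints are sampled rather than picked deterministically, highly attended joints are occluded most often, while other joints are still masked occasionally. In a setup like the one the abstract describes, the occluded sequence would then be encoded alongside the standard one and the two representations compared by the contrastive objective, which this sketch does not cover.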

Related Material


@InProceedings{Abdelfattah_2024_CVPR,
    author    = {Abdelfattah, Mohamed and Hassan, Mariam and Alahi, Alexandre},
    title     = {MaskCLR: Attention-Guided Contrastive Learning for Robust Action Representation Learning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {18678-18687}
}