Bootstrapped Representation Learning for Skeleton-Based Action Recognition

Olivier Moliner, Sangxia Huang, Kalle Åström; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2022, pp. 4154-4164

Abstract

In this work, we study self-supervised representation learning for 3D skeleton-based action recognition. We extend Bootstrap Your Own Latent (BYOL) to representation learning on skeleton sequence data and propose a new data augmentation strategy comprising two asymmetric transformation pipelines. We also introduce a multi-viewpoint sampling method that leverages multiple viewing angles of the same action captured by different cameras. In the semi-supervised setting, we show that performance can be further improved by knowledge distillation from wider networks, once again leveraging the unlabeled samples. We conduct extensive experiments on the NTU-60, NTU-120, and PKU-MMD datasets to demonstrate the performance of our proposed method. Our method consistently outperforms the current state of the art on linear evaluation, semi-supervised, and transfer learning benchmarks.
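To make the BYOL-style training scheme the abstract describes concrete, here is a minimal PyTorch sketch. It is an illustration, not the authors' implementation: the SkeletonBYOL and MLP classes, the head sizes, the momentum value tau, and the two stand-in augmentations are all assumptions; the paper's actual asymmetric transformation pipelines and multi-viewpoint sampling are more elaborate.

import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    """Projection/prediction head in the style of BYOL (sizes are assumptions)."""
    def __init__(self, dim_in, dim_hidden=1024, dim_out=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_in, dim_hidden),
            nn.BatchNorm1d(dim_hidden),
            nn.ReLU(inplace=True),
            nn.Linear(dim_hidden, dim_out),
        )
    def forward(self, x):
        return self.net(x)

class SkeletonBYOL(nn.Module):
    """Hypothetical BYOL wrapper: online encoder + predictor, EMA target encoder."""
    def __init__(self, encoder, feat_dim, tau=0.99):
        super().__init__()
        self.online_encoder = nn.Sequential(encoder, MLP(feat_dim))
        self.predictor = MLP(256, dim_out=256)  # prediction head, online branch only
        self.target_encoder = copy.deepcopy(self.online_encoder)
        for p in self.target_encoder.parameters():
            p.requires_grad = False  # target is updated by EMA, not by gradients
        self.tau = tau

    @torch.no_grad()
    def update_target(self):
        # Exponential moving average of the online weights into the target network.
        for po, pt in zip(self.online_encoder.parameters(),
                          self.target_encoder.parameters()):
            pt.data.mul_(self.tau).add_(po.data, alpha=1.0 - self.tau)

    def loss(self, v1, v2):
        # v1, v2: two differently augmented views of the same skeleton clip.
        p1 = self.predictor(self.online_encoder(v1))
        p2 = self.predictor(self.online_encoder(v2))
        with torch.no_grad():
            z1 = self.target_encoder(v1)
            z2 = self.target_encoder(v2)
        # Symmetrized negative cosine similarity between predictions and targets.
        return (2 - F.cosine_similarity(p1, z2, dim=-1)
                  - F.cosine_similarity(p2, z1, dim=-1)).mean()

# Toy usage with a placeholder encoder that flattens a (B, T, J, C) skeleton clip;
# the jitter and temporal flip below merely stand in for the paper's two
# asymmetric augmentation pipelines.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(50 * 25 * 3, 512))
model = SkeletonBYOL(encoder, feat_dim=512)
x = torch.randn(8, 50, 25, 3)                        # batch of skeleton sequences
v1, v2 = x + 0.01 * torch.randn_like(x), x.flip(1)   # two views of each clip
loss = model.loss(v1, v2)
loss.backward()
model.update_target()

One design point the sketch does preserve: only the online branch carries a predictor, and the target branch receives no gradients, which is what prevents the representation from collapsing without negative pairs.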

Related Material

BibTeX:
@InProceedings{Moliner_2022_CVPR,
    author    = {Moliner, Olivier and Huang, Sangxia and {\r{A}}str{\"o}m, Kalle},
    title     = {Bootstrapped Representation Learning for Skeleton-Based Action Recognition},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2022},
    pages     = {4154-4164}
}