How and What To Learn: Taxonomizing Self-Supervised Learning for 3D Action Recognition

Amor Ben Tanfous, Aimen Zerroug, Drew Linsley, Thomas Serre; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022, pp. 2696-2705

Abstract


There are two competing standards for self-supervised learning in action recognition from 3D skeletons. Su et al., 2020 used an auto-encoder architecture and an image reconstruction objective function to achieve state-of-the-art performance on the NTU60 C-View benchmark. Rao et al., 2020 used contrastive learning in the latent space to achieve state-of-the-art performance on the NTU60 C-Sub benchmark. Here, we reconcile these disparate approaches by developing a taxonomy of self-supervised learning for action recognition. We observe that leading approaches generally use one of two types of objective functions: those that seek to reconstruct the input from a latent representation ("Attractive" learning) versus those that also try to maximize the representation's distinctiveness ("Contrastive" learning). Independently, leading approaches also differ in how they implement these objective functions: there are those that optimize representations in the decoder output space and those that optimize representations in the network's latent space (encoder output). We find that combining these approaches leads to larger gains in performance and tolerance to transformations than are achievable by any individual method, leading to state-of-the-art performance on three standard action recognition datasets. We include links to our code and data.
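To make the taxonomy concrete, the sketch below contrasts the two objective types in PyTorch. This is a minimal illustration under our own assumptions, not the authors' implementation: the function names are hypothetical, a mean-squared reconstruction error stands in for "Attractive" learning in the decoder output space, and an InfoNCE-style loss stands in for "Contrastive" learning in the encoder's latent space.

import torch
import torch.nn.functional as F

def attractive_loss(decoded, target):
    # "Attractive" objective: reconstruct the input sequence from the
    # latent representation (optimized in the decoder output space).
    return F.mse_loss(decoded, target)

def contrastive_loss(z1, z2, temperature=0.1):
    # "Contrastive" (InfoNCE-style) objective: pull two augmented views of
    # the same sequence together in latent space while pushing them apart
    # from the other samples in the batch (encoder output space).
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature   # (N, N) pairwise similarities
    labels = torch.arange(z1.size(0))    # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

Combining the two objectives, as the abstract advocates, could then take the form of a weighted sum, e.g. attractive_loss(decoded, x) + lam * contrastive_loss(z1, z2); the exact combination used in the paper may differ.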

Related Material


BibTeX:
@InProceedings{Ben_Tanfous_2022_WACV,
    author    = {Ben Tanfous, Amor and Zerroug, Aimen and Linsley, Drew and Serre, Thomas},
    title     = {How and What To Learn: Taxonomizing Self-Supervised Learning for 3D Action Recognition},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2022},
    pages     = {2696-2705}
}