Multi-View Action Recognition Using Contrastive Learning

Ketul Shah, Anshul Shah, Chun Pong Lau, Celso M. de Melo, Rama Chellappa; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, pp. 3381-3391


We present a method for RGB-based action recognition from multi-view videos. The method is a supervised contrastive learning framework that learns a feature embedding robust to changes in viewpoint by leveraging multi-view data. We use an improved supervised contrastive loss and augment the positives with samples from synchronized viewpoints. We also propose using classifier probabilities to guide the selection of hard negatives in the contrastive loss, which yields a more discriminative representation: negative samples from classes that the classifier posterior marks as confusing are weighted higher. When trained on synthetic multi-view data, our method also generalizes across domains better than standard supervised training. Extensive experiments on real (NTU-60, NTU-120, NUMA) and synthetic (RoCoG) data demonstrate the effectiveness of our approach.
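
The loss described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact formula, so the weighting rule used here (a negative's term is scaled by 1 + beta times the anchor's classifier probability for that negative's class) is an assumption, as are the function name `weighted_supcon_loss` and the parameters `temperature` and `beta`. Samples that share a label, including synchronized views of the same clip, are treated as positives.

```python
import math


def _normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def weighted_supcon_loss(embeddings, labels, probs, temperature=0.1, beta=1.0):
    """Supervised contrastive loss with posterior-weighted negatives.

    embeddings: list of feature vectors, one per sample (any view).
    labels:     list of integer class labels.
    probs:      probs[i][c] is the classifier probability of class c for sample i.
    beta:       hypothetical strength of the hard-negative weighting (0 disables it).
    """
    z = [_normalize(v) for v in embeddings]
    n = len(z)
    total, count = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        exp_sim = {}
        denom = 0.0
        for j in range(n):
            if j == i:
                continue
            sim = sum(a * b for a, b in zip(z[i], z[j]))
            exp_sim[j] = math.exp(sim / temperature)
            if labels[j] == labels[i]:
                weight = 1.0
            else:
                # Assumed rule: negatives from classes the classifier confuses
                # with the anchor get a larger weight in the denominator.
                weight = 1.0 + beta * probs[i][labels[j]]
            denom += weight * exp_sim[j]
        total += -sum(math.log(exp_sim[p] / denom) for p in positives) / len(positives)
        count += 1
    return total / count if count else 0.0


# Toy batch: two classes, two samples (e.g. two views) per class.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
probs = [[0.6, 0.4], [0.7, 0.3], [0.2, 0.8], [0.5, 0.5]]

plain = weighted_supcon_loss(embeddings, labels, probs, beta=0.0)
weighted = weighted_supcon_loss(embeddings, labels, probs, beta=1.0)
```

With `beta=0.0` the function reduces to an ordinary supervised contrastive loss. A positive `beta` enlarges the denominator for confusing negatives, so the loss penalizes them more strongly.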

Related Material

@InProceedings{Shah_2023_WACV,
    author    = {Shah, Ketul and Shah, Anshul and Lau, Chun Pong and de Melo, Celso M. and Chellappa, Rama},
    title     = {Multi-View Action Recognition Using Contrastive Learning},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2023},
    pages     = {3381-3391}
}