Pose-Based Contrastive Learning for Domain Agnostic Activity Representations

David Schneider, Saquib Sarfraz, Alina Roitberg, Rainer Stiefelhagen; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2022, pp. 3433-3443

Abstract


While recognition accuracies of video classification models trained on conventional benchmarks are gradually saturating, recent studies raise alarm about the learned representations not generalizing well across different domains. Learning abstract concepts behind an activity instead of overfitting to the appearances and biases of a specific benchmark domain is vital for building generalizable behaviour understanding models. In this paper, we introduce Pose-based High Level View Contrasting (P-HLVC), a novel method that leverages human pose dynamics as a supervision signal aimed at learning domain-invariant activity representations. Our model learns to link image sequences to more abstract body pose information through iterative contrastive clustering and the Sinkhorn-Knopp algorithm, providing us with video representations more resistant to domain shifts. We demonstrate the effectiveness of our approach in a cross-domain action recognition setting and achieve significant improvements on the synthetic-to-real Sims4Action benchmark.
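The abstract names the Sinkhorn-Knopp algorithm but does not say how it is applied. The sketch below shows one common use in clustering-based contrastive learning: turning a batch of sample-to-cluster similarity scores into soft assignments that are spread evenly over the clusters, so that the clustering cannot collapse onto a single prototype. It is an illustration under stated assumptions, not the authors' implementation. The function name `sinkhorn_knopp`, the temperature `epsilon`, the iteration count `n_iters`, and the toy scores are all hypothetical.

```python
import math


def sinkhorn_knopp(scores, epsilon=0.05, n_iters=3):
    """Map a (samples x clusters) score matrix to balanced soft assignments.

    Assumed setup (not taken from the paper): scores[i][k] is the similarity
    of sample i to cluster prototype k; epsilon is a smoothing temperature.
    """
    n_samples, n_clusters = len(scores), len(scores[0])
    # Subtract the global maximum before exponentiating for numerical stability.
    top = max(max(row) for row in scores)
    q = [[math.exp((s - top) / epsilon) for s in row] for row in scores]
    for _ in range(n_iters):
        # Column step: rescale so every cluster holds total mass 1 / n_clusters.
        col_sums = [sum(q[i][k] for i in range(n_samples)) for k in range(n_clusters)]
        q = [[q[i][k] / (col_sums[k] * n_clusters) for k in range(n_clusters)]
             for i in range(n_samples)]
        # Row step: rescale so every sample holds total mass 1 / n_samples.
        row_sums = [sum(row) for row in q]
        q = [[v / (row_sums[i] * n_samples) for v in row] for i, row in enumerate(q)]
    # Multiply by n_samples so each row is a distribution over clusters.
    return [[v * n_samples for v in row] for row in q]


# Toy batch: every sample prefers cluster 0 before balancing.
toy_scores = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
assignments = sinkhorn_knopp(toy_scores, epsilon=1.0, n_iters=50)
```

In a setup like the one the abstract describes, assignments computed from one view (for example, the pose sequence) could serve as targets for predictions made from the other view (the image sequence); that pairing is an assumption here, since the abstract does not spell out the training objective.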

Related Material


@InProceedings{Schneider_2022_CVPR,
    author    = {Schneider, David and Sarfraz, Saquib and Roitberg, Alina and Stiefelhagen, Rainer},
    title     = {Pose-Based Contrastive Learning for Domain Agnostic Activity Representations},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2022},
    pages     = {3433-3443}
}