ViewCLR: Learning Self-Supervised Video Representation for Unseen Viewpoints

Srijan Das, Michael S. Ryoo; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, pp. 5573-5583

Abstract


Self-supervised video representation learning predominantly focuses on discriminating instances generated by simple data augmentation schemes. However, the learned representation often fails to generalize over unseen camera viewpoints. To this end, we propose ViewCLR, which learns self-supervised video representation invariant to camera viewpoint changes. We introduce a viewpoint generator, which can be considered a learnable augmentation for any self-supervised pretext task, to generate a latent viewpoint representation of a video. ViewCLR maximizes the similarity between the representation of the latent viewpoint and that of the original viewpoint, enabling the learned video encoder to generalize over unseen camera viewpoints. Experiments on cross-view benchmark datasets, including the NTU RGB+D dataset, show that ViewCLR stands as a state-of-the-art viewpoint-invariant self-supervised method.
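
To make the idea concrete, below is a minimal PyTorch-style sketch (not the authors' implementation) of the general training pattern the abstract describes: a learnable generator produces a latent-viewpoint version of a clip representation, and a contrastive (InfoNCE-style) objective pulls the generated-viewpoint representation toward the original-viewpoint representation. The module names (ViewpointGenerator, info_nce), the stand-in encoder, and all dimensions are illustrative assumptions, not details from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ViewpointGenerator(nn.Module):
    """Hypothetical learnable augmentation: maps a latent clip feature to a
    'latent viewpoint' of the same clip."""
    def __init__(self, dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, z):
        return self.net(z)

def info_nce(q, k, temperature=0.1):
    """InfoNCE-style loss: matching (original, generated-viewpoint) pairs are
    positives; other clips in the batch act as negatives."""
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    logits = q @ k.t() / temperature           # (B, B) similarity matrix
    labels = torch.arange(q.size(0))           # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Toy usage with a placeholder encoder; a real setup would use a video backbone
# producing clip-level features.
encoder = nn.Linear(2048, 512)                 # stand-in for the video encoder
generator = ViewpointGenerator(dim=512)
opt = torch.optim.Adam(list(encoder.parameters()) + list(generator.parameters()), lr=1e-4)

clips = torch.randn(8, 2048)                   # fake clip features for illustration
z = encoder(clips)                             # original-viewpoint representation
z_view = generator(z)                          # latent-viewpoint representation
loss = info_nce(z, z_view)
opt.zero_grad()
loss.backward()
opt.step()

How the generator itself is constrained (so that it produces plausible viewpoint changes rather than arbitrary features) is specific to the paper and not reproduced in this sketch.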

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Das_2023_WACV,
    author    = {Das, Srijan and Ryoo, Michael S.},
    title     = {ViewCLR: Learning Self-Supervised Video Representation for Unseen Viewpoints},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2023},
    pages     = {5573-5583}
}