Enhancing 2D Representation Learning with a 3D Prior

Mehmet Aygun, Prithviraj Dhar, Zhicheng Yan, Oisin Mac Aodha, Rakesh Ranjan; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 7750-7760

Abstract


Learning robust and effective representations of visual data is a fundamental task in computer vision. Traditionally, this is achieved by training models with expensive-to-obtain supervised data. Self-supervised learning attempts to circumvent the requirement for labeled data by learning representations from raw visual data alone. However, unlike humans, who obtain rich 3D information from their binocular vision and through motion, the majority of current self-supervised methods are tasked with learning from monocular 2D images alone. This is noteworthy, as it has been demonstrated that shape-centric visual processing is more robust than texture-biased automated methods. Inspired by this, we propose a new approach for strengthening existing self-supervised methods by explicitly enforcing a strong 3D structural prior directly on the model during training. Through experiments across a range of datasets, we demonstrate that our 3D-aware representations are more robust than conventional self-supervised baselines.

Related Material


[bibtex]
@InProceedings{Aygun_2024_CVPR,
  author    = {Aygun, Mehmet and Dhar, Prithviraj and Yan, Zhicheng and Mac Aodha, Oisin and Ranjan, Rakesh},
  title     = {Enhancing 2D Representation Learning with a 3D Prior},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = {June},
  year      = {2024},
  pages     = {7750-7760}
}