Unsupervised 3D Structure Inference from Category-Specific Image Collections

Weikang Wang, Dongliang Cao, Florian Bernard; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 10704-10714

Abstract


Understanding 3D object structure from image collections of general object categories remains a long-standing challenge in computer vision. Due to the high relevance of image keypoints (e.g. for graph matching, controlling generative models, scene understanding, etc.), in this work we specifically focus on inferring 3D structure in terms of sparse keypoints. Existing 3D keypoint inference approaches rely on strong priors, such as spatio-temporal consistency, multi-view images of the same object, 3D shape priors (e.g. templates, skeleton), or supervisory signals, e.g. in the form of 2D keypoint annotations. In contrast, we propose the first unsupervised 3D keypoint inference approach that can be trained for general object categories solely from an inhomogeneous image collection (containing different instances of objects from the same category). Our experiments show that our method not only improves upon unsupervised 2D keypoint inference, but more importantly, it also produces reasonable 3D structure for various object categories, both qualitatively and quantitatively.

Related Material


[bibtex]
@InProceedings{Wang_2024_CVPR,
    author    = {Wang, Weikang and Cao, Dongliang and Bernard, Florian},
    title     = {Unsupervised 3D Structure Inference from Category-Specific Image Collections},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {10704-10714}
}