Fine-grained Prototypical Voting with Heterogeneous Mixup for Semi-supervised 2D-3D Cross-modal Retrieval

Fan Zhang, Xian-Sheng Hua, Chong Chen, Xiao Luo; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 17016-17026

Abstract


This paper studies the problem of semi-supervised 2D-3D retrieval, which aims to align both labeled and unlabeled 2D and 3D data in the same embedding space. The problem is challenging due to the complicated heterogeneous relationships between 2D and 3D data. Moreover, label scarcity in real-world applications hinders the generation of discriminative representations. In this paper, we propose a semi-supervised approach named Fine-grained Prototypical Voting with Heterogeneous Mixup (FIVE), which maps both 2D and 3D data into a common embedding space for cross-modal retrieval. Specifically, we generate fine-grained prototypes to model inter-class variation for both 2D and 3D data. Then, considering each unlabeled sample as a query, we retrieve relevant prototypes to vote for reliable and robust pseudo-labels, which serve as guidance for discriminative learning under label scarcity. Furthermore, to bridge the semantic gap between the two modalities, we mix cross-modal pairs with similar semantics in the embedding space and then perform similarity learning for cross-modal discrepancy reduction in a soft manner. FIVE as a whole is optimized with the consideration of sharpness to mitigate the impact of potential label noise. Extensive experiments on benchmark datasets validate the superiority of FIVE compared with a range of baselines in different settings. On average, FIVE outperforms the second-best approach by 4.74% on 3D MNIST, 12.94% on ModelNet10, and 22.10% on ModelNet40.
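
The two mechanisms named in the abstract, prototype voting for pseudo-labels and mixing of cross-modal embeddings, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`vote_pseudo_labels`, `mix_embeddings`), the use of cosine similarity, the top-k softmax-weighted vote, the `temperature` parameter, and the linear interpolation with a fixed `lam` are all assumptions made for illustration; the abstract does not specify how prototypes are built, how votes are weighted, or how the mixing coefficient is chosen.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def vote_pseudo_labels(queries, prototypes, proto_labels, num_classes,
                       k=3, temperature=0.1):
    """Assign a pseudo-label to each query by a weighted vote of its k
    most similar prototypes (hypothetical voting rule, for illustration).

    queries:      (n, d) embeddings of unlabeled samples
    prototypes:   (m, d) prototype embeddings, several per class
    proto_labels: (m,) integer class of each prototype
    Returns (labels, confidences), each of shape (n,).
    """
    q = l2_normalize(np.asarray(queries, dtype=float))
    p = l2_normalize(np.asarray(prototypes, dtype=float))
    proto_labels = np.asarray(proto_labels)

    sims = q @ p.T                                   # (n, m) cosine similarities
    topk = np.argsort(-sims, axis=1)[:, :k]          # indices of k nearest prototypes
    topk_sims = np.take_along_axis(sims, topk, axis=1)

    # Softmax over the retrieved prototypes turns similarities into vote weights.
    w = np.exp((topk_sims - topk_sims.max(axis=1, keepdims=True)) / temperature)
    w /= w.sum(axis=1, keepdims=True)

    # Accumulate each prototype's weight into the bin of its class.
    votes = np.zeros((q.shape[0], num_classes))
    rows = np.repeat(np.arange(q.shape[0]), topk.shape[1])
    np.add.at(votes, (rows, proto_labels[topk].ravel()), w.ravel())

    return votes.argmax(axis=1), votes.max(axis=1)


def mix_embeddings(z_2d, z_3d, lam):
    """Linearly interpolate a 2D embedding with a 3D embedding
    (a plain mixup in embedding space; lam is assumed to lie in [0, 1])."""
    return lam * np.asarray(z_2d, dtype=float) + (1.0 - lam) * np.asarray(z_3d, dtype=float)
```

In this sketch the vote confidence could be thresholded to discard unreliable pseudo-labels, and the mixed embedding would be compared against both source embeddings with a soft similarity target proportional to `lam`; both of these uses are assumptions rather than details stated in the abstract.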

Related Material


[bibtex]
@InProceedings{Zhang_2024_CVPR,
    author    = {Zhang, Fan and Hua, Xian-Sheng and Chen, Chong and Luo, Xiao},
    title     = {Fine-grained Prototypical Voting with Heterogeneous Mixup for Semi-supervised 2D-3D Cross-modal Retrieval},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {17016-17026}
}