Bridging Viewpoint Gaps: Geometric Reasoning Boosts Semantic Correspondence

Qiyang Qian, Hansheng Chen, Masayoshi Tomizuka, Kurt Keutzer, Qianqian Wang, Chenfeng Xu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 11579-11589

Abstract


Finding semantic correspondences between images is a challenging problem in computer vision, particularly under significant viewpoint changes. Previous methods rely on semantic features from pre-trained 2D models like Stable Diffusion and DINOv2, which often struggle to extract viewpoint-invariant features. To overcome this, we propose a novel approach that integrates geometric and semantic reasoning. Unlike prior methods relying on heuristic geometric enhancements, our framework fine-tunes DUSt3R on synthetic cross-instance data to reconstruct distinct objects in an aligned 3D space. By learning to deform these objects into similar shapes using semantic supervision, we enable efficient KNN-based geometric matching, followed by sparse semantic matching within local KNN candidates. While trained on synthetic data, our method generalizes effectively to real-world images, achieving up to 7.4-point improvements in zero-shot settings on the rigid-body subset of SPair-71k and up to 19.6-point gains under extreme viewpoint variations. Additionally, it accelerates runtime by up to 40 times, demonstrating both its robustness to viewpoint changes and its efficiency for practical applications.
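The two-stage matching described above (geometric KNN shortlisting, then semantic matching restricted to those candidates) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the point coordinates `src_xyz`/`tgt_xyz` (assumed to live in the aligned 3D space the reconstruction produces) and the per-point semantic features `src_feat`/`tgt_feat` are hypothetical inputs, and a brute-force distance matrix stands in for whatever KNN structure the authors use.

```python
import numpy as np

def knn_geometric_then_semantic(src_xyz, tgt_xyz, src_feat, tgt_feat, k=8):
    """For each source point, shortlist its k nearest target points in the
    aligned 3D space, then pick the best semantic match among those candidates.

    src_xyz, tgt_xyz:   (N_src, 3) and (N_tgt, 3) reconstructed 3D points
    src_feat, tgt_feat: (N_src, D) and (N_tgt, D) semantic features
    Returns an (N_src,) array of matched target indices.
    """
    # Pairwise Euclidean distances in the shared 3D space: (N_src, N_tgt).
    d_geo = np.linalg.norm(src_xyz[:, None, :] - tgt_xyz[None, :, :], axis=-1)
    # Indices of the k geometrically closest target points per source point.
    knn_idx = np.argsort(d_geo, axis=1)[:, :k]
    # Cosine similarity of semantic features, evaluated only on the
    # local KNN candidates rather than all N_src * N_tgt pairs.
    src_n = src_feat / np.linalg.norm(src_feat, axis=1, keepdims=True)
    tgt_n = tgt_feat / np.linalg.norm(tgt_feat, axis=1, keepdims=True)
    sim = src_n @ tgt_n.T                                  # (N_src, N_tgt)
    cand_sim = np.take_along_axis(sim, knn_idx, axis=1)    # (N_src, k)
    best = np.argmax(cand_sim, axis=1)
    return knn_idx[np.arange(len(src_xyz)), best]
```

Restricting the semantic comparison to k candidates per point is what makes the matching sparse; with a spatial index in place of the dense distance matrix, this is the source of the runtime savings the abstract reports.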

Related Material


@InProceedings{Qian_2025_CVPR,
    author    = {Qian, Qiyang and Chen, Hansheng and Tomizuka, Masayoshi and Keutzer, Kurt and Wang, Qianqian and Xu, Chenfeng},
    title     = {Bridging Viewpoint Gaps: Geometric Reasoning Boosts Semantic Correspondence},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {11579-11589}
}