Cross-View Completion Models are Zero-shot Correspondence Estimators

An, Honggyu; Kim, Jin Hyeon; Park, Seonghoon; Jung, Jaewoo; Han, Jisang; Hong, Sunghwan; Kim, Seungryong

Honggyu An, Jin Hyeon Kim, Seonghoon Park, Jaewoo Jung, Jisang Han, Sunghwan Hong, Seungryong Kim; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 1103-1115

Abstract

In this work, we analyze new aspects of cross-view completion, mainly through an analogy between cross-view completion and traditional self-supervised correspondence learning algorithms. Based on this analysis, we show that the cross-attention map of CroCo v2 reflects this correspondence information better than other correlations derived from its encoder or decoder features. We further verify the effectiveness of the cross-attention map by evaluating it on zero-shot and supervised dense geometric correspondence, as well as on multi-frame depth estimation.
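The central idea, that a cross-attention map can be read directly as a correspondence estimate, can be illustrated with a minimal sketch. It assumes we already have per-patch query vectors from one view and key vectors from the other view, taken from a single cross-attention layer. The helper names (`cross_attention_map`, `average_heads`, `argmax_matches`) are hypothetical, and the paper's actual choice of layer, head aggregation, and any refinement are not reproduced here. Each row of the softmax-normalized map is a distribution over the other view's patches, so taking its argmax yields a zero-shot patch match.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_map(queries, keys):
    # Scaled dot-product attention weights: row i is the attention of
    # query patch i (one view) over all key patches (the other view).
    scale = 1.0 / math.sqrt(len(queries[0]))
    return [
        softmax([scale * sum(q * k for q, k in zip(qv, kv)) for kv in keys])
        for qv in queries
    ]


def average_heads(maps):
    # Assumption: aggregate multi-head maps by a simple element-wise mean.
    n = len(maps)
    return [
        [sum(m[i][j] for m in maps) / n for j in range(len(maps[0][0]))]
        for i in range(len(maps[0]))
    ]


def argmax_matches(attn):
    # For each query patch, pick the key patch with the highest attention weight.
    return [max(range(len(row)), key=row.__getitem__) for row in attn]


# Toy example: three 2-D key patches and three query patches that are
# (scaled) copies of the keys in a permuted order.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
queries = [[-4.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
attn = cross_attention_map(queries, keys)
matches = argmax_matches(attn)  # [2, 0, 1]
```

In this toy setup each query is aligned with exactly one key, so the argmax recovers the permutation. With real features, the map is noisier, and practical pipelines typically add steps such as averaging over heads or layers and checking that matches agree in both directions.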

Related Material

@InProceedings{An_2025_CVPR,
    author    = {An, Honggyu and Kim, Jin Hyeon and Park, Seonghoon and Jung, Jaewoo and Han, Jisang and Hong, Sunghwan and Kim, Seungryong},
    title     = {Cross-View Completion Models are Zero-shot Correspondence Estimators},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {1103-1115}
}