MV-RoMa: From Pairwise Matching into Multi-View Track Reconstruction

Lee, Jongmin; Kang, Seungyeop; Yoo, Sungjoo

Jongmin Lee, Seungyeop Kang, Sungjoo Yoo; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 7446-7456

Abstract

Establishing consistent correspondences across images is essential for 3D vision tasks such as structure-from-motion (SfM), yet most existing matchers operate in a pairwise manner, often producing fragmented and geometrically inconsistent tracks when their predictions are chained across views. We propose MV-RoMa, a multi-view dense matching model that jointly estimates dense correspondences from a source image to multiple co-visible targets. Specifically, we design an efficient architecture that avoids the high computational cost of full cross-attention for multi-view feature interaction. It consists of (i) a multi-view encoder that leverages pairwise matching results as a geometric prior, and (ii) a multi-view matching refiner that refines correspondences using pixel-wise attention. Additionally, we propose a post-processing strategy that integrates our model's consistent multi-view correspondences as high-quality tracks for SfM. Across diverse and challenging benchmarks, MV-RoMa produces more reliable correspondences and substantially denser, more accurate 3D reconstructions than existing sparse and dense matching methods.
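The track inconsistency that motivates the work can be illustrated with a small sketch. The code below is not MV-RoMa and is not taken from the paper; it is a generic union-find chaining of pairwise matches, with hypothetical image names and keypoint indices, showing how independently predicted pairs can merge two different points of the same image into one track.

```python
from collections import defaultdict


def chain_tracks(pairwise_matches):
    """Merge pairwise matches into tracks with union-find.

    pairwise_matches: iterable of ((image_id, kp_id), (image_id, kp_id)).
    Returns a sorted list of tracks, each a sorted list of (image_id, kp_id).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairwise_matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(list)
    for node in list(parent):
        groups[find(node)].append(node)
    return sorted(sorted(group) for group in groups.values())


def is_consistent(track):
    """A track is consistent if it holds at most one point per image."""
    images = [image_id for image_id, _ in track]
    return len(images) == len(set(images))


# Hypothetical matches between images A, B, and C.
matches = [
    (("A", 0), ("B", 0)),
    (("B", 0), ("C", 0)),
    (("A", 0), ("C", 1)),  # disagrees with the chain A0 -> B0 -> C0
    (("A", 5), ("B", 7)),
]

tracks = chain_tracks(matches)
for track in tracks:
    print(track, "consistent" if is_consistent(track) else "INCONSISTENT")
```

In this toy input the first track ends up containing both keypoint 0 and keypoint 1 of image C, which is the kind of conflict a jointly estimated multi-view correspondence is meant to avoid.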

Related Material

[bibtex]

@InProceedings{Lee_2026_CVPR,
    author    = {Lee, Jongmin and Kang, Seungyeop and Yoo, Sungjoo},
    title     = {MV-RoMa: From Pairwise Matching into Multi-View Track Reconstruction},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {7446-7456}
}