Sky2Ground: A Benchmark for Site Modeling under Varying Altitude

Zengyan Wang, Sirshapan Mitra, Rajat Modi, Hui Lim, Yogesh Rawat; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 12227-12236

Abstract


In this work, we propose the problem of localizing cameras and rendering a scene given multiple images captured from ground, aerial, and satellite viewpoints. We introduce a dataset called Sky2Ground, which contains synthetic and real images across all three viewpoints, along with camera parameters, dense depth maps, and surface normals. Recent works have shown that transformer-based networks such as VGGT can infer scene parameters in a single forward pass. However, we show that simply fine-tuning such models reduces performance, and that this degradation cannot be solved by brute-force scaling alone. We find the culprit to be satellite images, which inject too much noise during learning. Therefore, we propose SkyNet to enable learning with satellite images. SkyNet is a two-stream neural network, with one stream explicitly processing satellite images and another processing all modalities together. We propose a restricted-attention mechanism, termed 'Masked-Satellite-Attention', which prevents ground and aerial images from interacting with satellite images. Further, SkyNet is optimized with strategies inspired by curriculum learning: sampling cameras that are far away from each other during training. Extensive experiments on our Sky2Ground dataset reveal that SkyNet outperforms existing methods by 23% in terms of absolute performance. Our dataset and code shall be made publicly available on Hugging Face.
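
The restricted-attention idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract only states that ground and aerial images are prevented from interacting with satellite images, so the sketch assumes one plausible reading, in which ground and aerial query tokens are blocked from attending to satellite key tokens while satellite queries remain unrestricted. The modality codes and the function names `satellite_attention_mask` and `masked_attention` are hypothetical.

```python
import numpy as np

# Hypothetical modality codes; the paper's actual token layout is not given here.
GROUND, AERIAL, SATELLITE = 0, 1, 2


def satellite_attention_mask(modality):
    """Return a boolean matrix where allowed[i, j] is True if query i may attend to key j.

    Assumption: only (non-satellite query, satellite key) pairs are blocked.
    """
    modality = np.asarray(modality)
    is_sat = modality == SATELLITE
    blocked = (~is_sat)[:, None] & is_sat[None, :]
    return ~blocked


def masked_attention(q, k, v, allowed):
    """Single-head scaled dot-product attention with a boolean mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Blocked pairs get -inf so their softmax weight is exactly zero.
    scores = np.where(allowed, scores, -np.inf)
    # Subtract the row maximum for numerical stability; every row keeps
    # at least one finite entry because each token may attend to itself.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


if __name__ == "__main__":
    # Four tokens: one ground, one aerial, two satellite.
    modality = [GROUND, AERIAL, SATELLITE, SATELLITE]
    allowed = satellite_attention_mask(modality)
    rng = np.random.default_rng(0)
    q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
    out, weights = masked_attention(q, k, v, allowed)
    print(allowed.astype(int))
    print(weights.round(3))
```

Under this assumption, the rows of the attention weights for ground and aerial tokens carry zero mass on the satellite columns, while the satellite rows can still draw on every modality. Whether SkyNet masks in one direction or both is not specified in the abstract.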

Related Material


[bibtex]
@InProceedings{Wang_2026_CVPR,
    author    = {Wang, Zengyan and Mitra, Sirshapan and Modi, Rajat and Lim, Hui and Rawat, Yogesh},
    title     = {Sky2Ground: A Benchmark for Site Modeling under Varying Altitude},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {12227-12236}
}