Rethinking Visual Geo-Localization for Large-Scale Applications

Gabriele Berton, Carlo Masone, Barbara Caputo; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 4878-4888


Visual Geo-localization (VG) is the task of estimating the position where a given photo was taken by comparing it with a large database of images of known locations. To investigate how existing techniques would perform on a real-world city-wide VG application, we build San Francisco eXtra Large, a new dataset covering a whole city and providing a wide range of challenging cases, with a size 30x bigger than the previous largest dataset for visual geo-localization. We find that current methods fail to scale to such large datasets, therefore we design a new highly scalable training technique, called CosPlace, which casts the training as a classification problem avoiding the expensive mining needed by the commonly used contrastive learning. We achieve state-of-the-art performance on a wide range of datasets, and find that CosPlace is robust to heavy domain changes. Moreover, we show that, compared to previous state of the art, CosPlace requires roughly 80% less GPU memory at train time and achieves better results with 8x smaller descriptors, paving the way for city-wide real-world visual geo-localization. Dataset, code and trained models are available for research purposes at

Related Material

[pdf] [supp] [arXiv]
@InProceedings{Berton_2022_CVPR, author = {Berton, Gabriele and Masone, Carlo and Caputo, Barbara}, title = {Rethinking Visual Geo-Localization for Large-Scale Applications}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2022}, pages = {4878-4888} }