[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Campos_2026_WACV,
  author    = {Campos, Ron and Vayani, Ashmal and Kulkarni, Parth Parag and Gupta, Rohit and Zafar, Aizan and Dutta, Aritra and Shah, Mubarak},
  title     = {GAEA: A Geolocation Aware Conversational Assistant},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  month     = {March},
  year      = {2026},
  pages     = {5236-5246}
}
GAEA: A Geolocation Aware Conversational Assistant
Abstract
Image geolocalization, in which an AI model traditionally predicts the precise GPS coordinates of an image, is a challenging task with many downstream applications. However, the user cannot use such a model to learn anything beyond the GPS coordinates; the model lacks both an understanding of the location and the conversational ability to communicate with the user. Recently, with the rapid progress of large multimodal models (LMMs), researchers have attempted to geolocalize images using both proprietary and open-source LMMs. However, these issues remain unaddressed: while LMMs perform well on general tasks, they struggle on specialized downstream tasks such as geolocalization. In this work, we address this problem by introducing GAEA, a conversational model that provides information about the location of an image as the user requires. Since no large-scale dataset exists to train such a model, we propose GAEA-1.4M, a comprehensive dataset comprising over 800k images and approximately 1.4M question-answer pairs, constructed by leveraging OpenStreetMap (OSM) attributes and geographical context clues. For quantitative evaluation, we propose GAEA-Bench, a diverse benchmark comprising 3.5k image-text pairs with varied question types for evaluating conversational capabilities. We evaluate 11 state-of-the-art open-source and proprietary LMMs and demonstrate that GAEA significantly outperforms the best open-source model, LLaVA-OneVision, by 18.2% and the best proprietary model, GPT-4o, by 7.2%. Our dataset, model, and code are available at https://ucf-crcv.github.io/GAEA.
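The abstract frames classical geolocalization as predicting an image's GPS coordinates. Such predictions are commonly scored by the great-circle (haversine) distance between the predicted and ground-truth coordinates; the sketch below is a generic illustration of that metric, not code from the paper (the function name and the spherical-Earth radius of 6371 km are our own assumptions).

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points,
    assuming a spherical Earth of radius 6371 km."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

# Error between a hypothetical predicted and ground-truth coordinate pair
print(haversine_km(48.8566, 2.3522, 51.5074, -0.1278))  # Paris vs. London, in km
```

Benchmarks of this kind typically report the fraction of predictions falling within fixed distance thresholds (e.g. 1 km, 25 km, 200 km) of the ground truth.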