Acknowledging Focus Ambiguity in Visual Questions
Abstract
No published work on visual question answering (VQA) accounts for ambiguity regarding where the content described in the question is located in the image. To fill this gap, we introduce VQ-FocusAmbiguity, the first VQA dataset that visually grounds each plausible image region a question could refer to when arriving at valid answers. We then analyze our dataset and compare it to existing datasets to reveal its unique properties. Finally, we benchmark modern models on two novel tasks related to acknowledging focus ambiguity: recognizing whether a visual question has focus ambiguity and locating all plausible focus regions within the image. Results show that the dataset is challenging for modern models. To facilitate future progress on these tasks, we publicly share the dataset with an evaluation server at https://vizwiz.org/tasks-and-datasets/focus-ambiguity-in-visual-questions/.
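The two benchmark tasks map naturally onto a simple data record (Task 1: a binary focus-ambiguity label; Task 2: the set of plausible focus regions) and a region-matching check. The sketch below is illustrative only: this page does not describe the dataset's actual annotation format (masks vs. boxes), field names, or the evaluation server's official metrics, so the FocusAmbiguityExample fields, the box representation, and the 0.5 IoU threshold are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (x_min, y_min, x_max, y_max); the box representation is an assumption,
# the dataset may instead provide segmentation masks.
Box = Tuple[float, float, float, float]

@dataclass
class FocusAmbiguityExample:
    """Hypothetical record layout for one visual question (field names are illustrative only)."""
    image_id: str
    question: str
    has_focus_ambiguity: bool      # Task 1 label: could the question refer to more than one region?
    plausible_regions: List[Box]   # Task 2 labels: every image region the question could refer to

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def all_regions_recovered(predicted: List[Box], ground_truth: List[Box], thresh: float = 0.5) -> bool:
    """One possible Task 2 check (not the official metric): count a prediction set as
    correct only if every plausible ground-truth region is matched by some predicted
    box with IoU >= thresh."""
    return all(any(iou(p, g) >= thresh for p in predicted) for g in ground_truth)
```

For intuition, a question such as "What color is the dog?" over an image containing two dogs would be an ambiguous example: has_focus_ambiguity would be True and plausible_regions would hold one region per dog, so a model must locate both to satisfy the check above.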
Related Material

BibTeX:
@InProceedings{Chen_2025_ICCV,
  author    = {Chen, Chongyan and Tseng, Yu-Yun and Li, Zhuoheng and Venkatesh, Anush and Gurari, Danna},
  title     = {Acknowledging Focus Ambiguity in Visual Questions},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2025},
  pages     = {1228-1238}
}