Advancing Open-Set Object Detection in Remote Sensing Using Multimodal Large Language Model

Nandini Saini, Ashudeep Dubey, Debasis Das, Chiranjoy Chattopadhyay; Proceedings of the Winter Conference on Applications of Computer Vision (WACV) Workshops, 2025, pp. 451-458

Abstract


In recent years, open-set recognition in remote sensing has attracted significant attention. The goal is to identify unknown objects during inference, extending the generalization of models trained on labeled data for known objects. However, obtaining bounding box annotations for unknown object categories at a large scale is prohibitively expensive. Multimodal large language models (MLLMs) offer a promising alternative, enabling the discovery of unknown object categories without the need for human intervention in labeling novel classes. In this paper, we propose a novel methodology that leverages MLLMs to address the dual challenges of detecting and categorizing unknown objects in remote sensing imagery. By integrating three diverse datasets (DOTA, DIOR, and NWPU VHR-10), we simulate real-world open-set conditions by partitioning object classes into known and unknown categories. The proposed methodology employs a two-step approach: (1) open-set object region detection, where known objects are identified using a model trained on labeled data, while threshold-based region proposal extraction is applied to detect unknown objects; and (2) discovery and semantic labeling of unknown objects using MLLM-based textual annotation. The contextual descriptions generated by the MLLM serve as human-interpretable pseudo-labels, which are further validated using vision-language similarity metrics. Experimental results demonstrate significant improvements in both detection (achieving high recall for unknown objects) and discovery (producing meaningful and accurate categorizations of novel objects). This work highlights the transformative potential of MLLMs for interpreting unknowns and paves the way for more robust open-set object detection in the remote sensing domain.
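The two-step approach described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no thresholds, scoring rules, or model names, so every name and value below (`Proposal`, `split_known_unknown`, `accept_pseudo_label`, the three thresholds, and the idea of filtering unknown candidates by overlap with known detections) is an assumption made for illustration. Region and text embeddings are assumed to come from some vision-language encoder that is not shown here.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Proposal:
    box: Box
    objectness: float  # class-agnostic "is this an object" score (assumed input)
    known_conf: float  # highest confidence over the known classes (assumed input)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def split_known_unknown(
    proposals: Sequence[Proposal],
    known_thresh: float = 0.5,        # hypothetical value
    objectness_thresh: float = 0.7,   # hypothetical value
    max_iou_with_known: float = 0.3,  # hypothetical value
) -> Tuple[List[Proposal], List[Proposal]]:
    """Step 1 (sketch): threshold-based split into known and unknown regions."""
    # Known objects: the trained detector is confident about a known class.
    known = [p for p in proposals if p.known_conf >= known_thresh]
    # Unknown candidates: object-like, not confidently known, and not
    # largely overlapping a region already explained by a known detection.
    unknown = [
        p for p in proposals
        if p.known_conf < known_thresh
        and p.objectness >= objectness_thresh
        and all(iou(p.box, k.box) <= max_iou_with_known for k in known)
    ]
    return known, unknown


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def accept_pseudo_label(
    region_embedding: Sequence[float],
    label_embedding: Sequence[float],
    min_similarity: float = 0.25,  # hypothetical value
) -> bool:
    """Step 2 (sketch): keep an MLLM pseudo-label only if its text embedding
    is similar enough to the embedding of the image region it describes."""
    return cosine_similarity(region_embedding, label_embedding) >= min_similarity
```

In this sketch, a proposal that is object-like but not confidently assigned to any known class becomes an unknown candidate, and the MLLM-generated description for that region would then be checked with `accept_pseudo_label` before being kept as a pseudo-label.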

Related Material


[bibtex]
@InProceedings{Saini_2025_WACV,
    author    = {Saini, Nandini and Dubey, Ashudeep and Das, Debasis and Chattopadhyay, Chiranjoy},
    title     = {Advancing Open-Set Object Detection in Remote Sensing Using Multimodal Large Language Model},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {February},
    year      = {2025},
    pages     = {451-458}
}