@InProceedings{Murtaza_2025_WACV,
    author    = {Murtaza, Shakeeb and Belharbi, Soufiane and Pedersoli, Marco and Granger, Eric},
    title     = {A Realistic Protocol for Evaluation of Weakly Supervised Object Localization},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {5367-5376}
}
A Realistic Protocol for Evaluation of Weakly Supervised Object Localization
Abstract
Weakly Supervised Object Localization (WSOL) allows training deep learning models for classification and localization (LOC) using only global class-level labels. The absence of bounding box (bbox) supervision during training raises challenges in the literature for hyper-parameter tuning, model selection, and evaluation. WSOL methods rely on a validation set with bbox annotations for model selection, and a test set with bbox annotations for threshold estimation when producing bboxes from localization maps. This approach, however, is not aligned with the WSOL setting, as these annotations are typically unavailable in real-world scenarios. Our initial empirical analysis shows a significant decline in LOC performance when model selection and threshold estimation rely solely on class labels and the image itself, respectively, compared to using manual bbox annotations. This highlights the importance of incorporating bbox labels for optimal model performance. In this paper, a new WSOL evaluation protocol is proposed that provides LOC information without the need for manual bbox annotations. In particular, we generate noisy pseudo-boxes from a pretrained off-the-shelf region proposal method, such as Selective Search, CLIP, or RPN, for model selection. These bboxes are also employed to estimate the threshold from LOC maps, circumventing the need for test-set bbox annotations. Our experiments with several WSOL methods on challenging natural and medical image datasets show that using the proposed pseudo-bboxes for validation facilitates model selection and threshold estimation, with LOC performance comparable to that of models selected using GT bboxes on the validation set and threshold estimation on the test set. It also outperforms models selected using class-level labels and then dynamically thresholded with only LOC maps.
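The threshold-estimation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each LOC map is a 2D grid of activations in [0, 1], that one pseudo-box per image is already available from some region proposal method, and that a box is extracted as the tightest rectangle around all pixels above the threshold (a simplification; WSOL pipelines commonly use the largest connected component instead). The names `box_from_map`, `iou`, and `estimate_threshold` are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# (x_min, y_min, x_max, y_max) in inclusive pixel coordinates.
Box = Tuple[int, int, int, int]


def box_from_map(loc_map: Sequence[Sequence[float]], threshold: float) -> Optional[Box]:
    """Tightest box around all pixels whose activation is >= threshold.

    Simplifying assumption: no connected-component analysis is performed.
    """
    xs = [x for row in loc_map for x, v in enumerate(row) if v >= threshold]
    ys = [y for y, row in enumerate(loc_map) for v in row if v >= threshold]
    if not xs:
        return None  # nothing survives the threshold
    return (min(xs), min(ys), max(xs), max(ys))


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes with inclusive coordinates."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + 1
    ih = min(a[3], b[3]) - max(a[1], b[1]) + 1
    inter = max(iw, 0) * max(ih, 0)
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def estimate_threshold(
    loc_maps: Sequence[Sequence[Sequence[float]]],
    pseudo_boxes: Sequence[Box],
    candidates: Sequence[float],
) -> Tuple[float, float]:
    """Return (threshold, mean IoU) for the candidate that best matches the pseudo-boxes."""
    best_t, best_score = candidates[0], -1.0
    for t in candidates:
        scores = []
        for loc_map, pseudo_box in zip(loc_maps, pseudo_boxes):
            box = box_from_map(loc_map, t)
            # An empty prediction counts as a complete miss.
            scores.append(iou(box, pseudo_box) if box is not None else 0.0)
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_t, best_score = t, mean_score
    return best_t, best_score
```

Under the same assumptions, the mean IoU returned by `estimate_threshold` could also serve as a model-selection score: compute it for each saved checkpoint on the validation images and keep the checkpoint with the highest value, so that no manual bbox annotation is consulted at any point.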