On Train-Test Class Overlap and Detection for Image Retrieval

Chull Hwan Song, Jooyoung Yoon, Taebaek Hwang, Shunghyun Choi, Yeong Hyeon Gu, Yannis Avrithis; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 17375-17384

Abstract


How important is it for training and evaluation sets to not have class overlap in image retrieval? We revisit Google Landmarks v2 clean, the most popular training set, by identifying and removing class overlap with Revisited Oxford and Paris, the most popular evaluation set. By comparing the original and the new RGLDv2-clean on a benchmark of reproduced state-of-the-art methods, our findings are striking. Not only is there a dramatic drop in performance, but it is inconsistent across methods, changing the ranking. What does it take to focus on objects of interest and ignore background clutter when indexing? Do we need to analyze the evaluation set? Do we need to train an object detector and the representation separately? Do we need location supervision? We introduce Single-stage Detect-to-Retrieve (CiDeR), an end-to-end, single-stage pipeline to detect objects of interest and extract a global image representation. We outperform previous state-of-the-art on both existing training sets and the new RGLDv2-clean.
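To make the "single-stage detect-and-describe" idea concrete, here is a minimal, hypothetical sketch, not the authors' released code: it assumes a learned spatial attention map acts as a soft detector over backbone features, and attention-weighted GeM pooling yields one L2-normalized global descriptor, trainable end to end without box-level supervision. All names (gem_pool, attn) are illustrative.

import torch
import torch.nn.functional as F

def gem_pool(feats, attn, p=3.0, eps=1e-6):
    """Attention-weighted generalized-mean (GeM) pooling.

    feats: (B, C, H, W) backbone feature map
    attn:  (B, 1, H, W) attention map in [0, 1], a soft "object of interest" detector
    """
    # Normalize the attention map so it sums to 1 over spatial locations.
    w = attn / (attn.sum(dim=(2, 3), keepdim=True) + eps)
    # GeM: raise activations to power p, take the (weighted) mean, then the p-th root.
    x = feats.clamp(min=eps).pow(p)
    pooled = (x * w).sum(dim=(2, 3)).pow(1.0 / p)
    # L2-normalize so descriptors are directly comparable by dot product.
    return F.normalize(pooled, dim=1)

# Toy usage with random stand-ins for a CNN backbone and an attention head.
B, C, H, W = 2, 512, 16, 16
feats = torch.randn(B, C, H, W).abs()                # non-negative, like post-ReLU features
attn = torch.sigmoid(torch.randn(B, 1, H, W))        # stands in for a learned attention head
desc = gem_pool(feats, attn)
print(desc.shape)                                    # torch.Size([2, 512])

Because the attention weighting is differentiable, the same retrieval loss that shapes the descriptor can also teach the attention head where the object of interest is, which is what makes a single-stage pipeline possible in place of a separately trained detector.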

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Song_2024_CVPR,
    author    = {Song, Chull Hwan and Yoon, Jooyoung and Hwang, Taebaek and Choi, Shunghyun and Gu, Yeong Hyeon and Avrithis, Yannis},
    title     = {On Train-Test Class Overlap and Detection for Image Retrieval},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {17375-17384}
}