Learning Spatial-context-aware Global Visual Feature Representation for Instance Image Retrieval

Zhongyan Zhang, Lei Wang, Luping Zhou, Piotr Koniusz; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 11250-11259

Abstract


In instance image retrieval, considering local spatial information within an image has proven effective in boosting retrieval performance, as demonstrated by geometric verification based on local visual descriptors. Nevertheless, it would be highly valuable to make ordinary global image representations spatial-context-aware, because image retrieval based on global representations is appealing for its algorithmic simplicity, low memory cost, and compatibility with sophisticated data structures. To this end, we propose a novel feature learning framework for instance image retrieval, which embeds local spatial context information into the learned global feature representations. Specifically, in parallel to the visual feature branch in a CNN backbone, we design a spatial context branch that consists of two modules called online token learning and distance encoding. For each local descriptor learned in the CNN, the former module indicates the types of its surrounding descriptors, while the latter module captures their spatial distribution. The visual feature branch and the spatial context branch are then fused to produce a single global feature representation per image. As demonstrated experimentally, the spatial-context-aware characteristic improves the performance of image retrieval based on global representations while maintaining all of its appealing properties. Our code is available at https://github.com/Zy-Zhang/SpCa
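
To illustrate why retrieval with a single global representation per image is algorithmically simple, the following minimal sketch ranks database images by cosine similarity to a query descriptor. It is not the authors' implementation (see the repository linked above for that); the function names `l2_normalize` and `rank_database` and the toy three-dimensional descriptors are hypothetical, chosen only for illustration, and the sketch assumes each image has already been reduced to one fixed-length vector.

```python
import math


def l2_normalize(vec):
    """Scale a descriptor to unit Euclidean length (zero vectors stay zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [v / norm for v in vec]


def rank_database(query, database):
    """Return (index, cosine similarity) pairs, most similar image first.

    `query` is one global descriptor; `database` is a list of global
    descriptors, one per database image (hypothetical toy inputs).
    """
    q = l2_normalize(query)
    scored = []
    for index, descriptor in enumerate(database):
        d = l2_normalize(descriptor)
        # For unit vectors, the dot product equals the cosine similarity.
        score = sum(a * b for a, b in zip(q, d))
        scored.append((index, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Toy descriptors, assumed for illustration only.
database = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query = [0.9, 0.1, 0.0]
ranking = rank_database(query, database)
```

Because each image contributes only one vector, the database costs one descriptor per image to store, and the exhaustive scan shown here could be swapped for an approximate nearest-neighbor index without changing the interface.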

Related Material


[bibtex]
@InProceedings{Zhang_2023_ICCV,
    author    = {Zhang, Zhongyan and Wang, Lei and Zhou, Luping and Koniusz, Piotr},
    title     = {Learning Spatial-context-aware Global Visual Feature Representation for Instance Image Retrieval},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {11250-11259}
}