ETR: An Efficient Transformer for Re-Ranking in Visual Place Recognition

Hao Zhang, Xin Chen, Heming Jing, Yingbin Zheng, Yuan Wu, Cheng Jin; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, pp. 5665-5674

Abstract


Visual place recognition is to estimate the geographical location of a given image, which is usually addressed by recognizing its similar reference images from a database. The reference images are usually retrieved via similarity search using global descriptor, and the local descriptors are used to re-rank the initial retrieved candidates. The local descriptors re-ranking can significantly improve the accuracy of global retrieval but comes at a high computational cost. To achieve a good trade-off between accuracy and efficiency, we propose an Efficient Transformer for Re-ranking (ETR), utilizing both global and local descriptors to re-rank the top candidates in a single shot. In contrast to traditional re-ranking methods, we leverage self-attention to capture relationships between local descriptors in a single image and cross-attention to explore the similarity of the image pairs. We show that the proposed model can be regarded as a general re-ranking algorithm for significantly boosting the performance of other global-only retrieval methods. Extensive experimental results show that our method outperforms state-of-the-arts and is orders of magnitude faster in terms of computational efficiency.

Related Material


[pdf]
[bibtex]
@InProceedings{Zhang_2023_WACV, author = {Zhang, Hao and Chen, Xin and Jing, Heming and Zheng, Yingbin and Wu, Yuan and Jin, Cheng}, title = {ETR: An Efficient Transformer for Re-Ranking in Visual Place Recognition}, booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)}, month = {January}, year = {2023}, pages = {5665-5674} }