Attentional Pyramid Pooling of Salient Visual Residuals for Place Recognition

Guohao Peng, Jun Zhang, Heshan Li, Danwei Wang; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 885-894

Abstract


The core of visual place recognition (VPR) lies in how to identify task-relevant visual cues and embed them into discriminative representations. Focusing on these two points, we propose a novel encoding strategy named Attentional Pyramid Pooling of Salient Visual Residuals (APPSVR). It incorporates three types of attention modules to model the saliency of local features along the individual, spatial and cluster dimensions respectively. (1) To inhibit task-irrelevant local features, a semantic-reinforced local weighting scheme is employed for local feature refinement; (2) To leverage spatial context, an attentional pyramid structure is constructed to adaptively encode regional features according to their relative spatial saliency; (3) To distinguish the varying importance of visual clusters to the task, a parametric normalization is proposed to adjust their contribution to image descriptor generation. Experiments demonstrate that APPSVR outperforms existing techniques and achieves new state-of-the-art performance on VPR benchmark datasets. Visualization shows that the saliency maps learned in a weakly supervised manner are largely consistent with human cognition.
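The encoding pipeline described above can be illustrated with a minimal NumPy sketch of saliency-weighted residual aggregation. This is an illustrative assumption based on the abstract only, not the authors' implementation: local features are softly assigned to visual-word centres (VLAD-style), their residuals are weighted by a per-feature saliency score, and each cluster's contribution is rescaled by a learnable importance weight standing in for the parametric normalization. All names and shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def salient_residual_pooling(features, centres, saliency, cluster_weights):
    """Hypothetical sketch of attention-weighted residual pooling.

    features:        (N, D) local descriptors from a CNN backbone
    centres:         (K, D) visual-word cluster centres
    saliency:        (N,)   per-feature attention scores
    cluster_weights: (K,)   per-cluster importance (stand-in for the
                            parametric normalization in the abstract)
    returns:         (K*D,) L2-normalised image descriptor
    """
    # soft-assign each local feature to the K clusters (VLAD-style)
    assign = softmax(features @ centres.T, axis=1)           # (N, K)
    # residual of every feature to every cluster centre
    residuals = features[:, None, :] - centres[None, :, :]   # (N, K, D)
    # weight residuals by soft assignment and per-feature saliency
    w = assign * saliency[:, None]                           # (N, K)
    vlad = (w[:, :, None] * residuals).sum(axis=0)           # (K, D)
    # intra-normalise each cluster, then rescale by cluster importance
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12
    vlad *= cluster_weights[:, None]
    desc = vlad.ravel()
    return desc / (np.linalg.norm(desc) + 1e-12)

# toy usage with random inputs
rng = np.random.default_rng(0)
N, K, D = 100, 8, 16
desc = salient_residual_pooling(
    rng.normal(size=(N, D)), rng.normal(size=(K, D)),
    rng.uniform(size=N), rng.uniform(0.5, 1.5, size=K))
print(desc.shape)  # (128,)
```

Down-weighting features by saliency before aggregation means distractors (e.g. dynamic objects) contribute little to the descriptor, while the per-cluster rescaling lets discriminative visual words dominate the final embedding.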

Related Material


[bibtex]
@InProceedings{Peng_2021_ICCV,
  author    = {Peng, Guohao and Zhang, Jun and Li, Heshan and Wang, Danwei},
  title     = {Attentional Pyramid Pooling of Salient Visual Residuals for Place Recognition},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2021},
  pages     = {885-894}
}