Imagine Before Go: Self-Supervised Generative Map for Object Goal Navigation

Sixian Zhang, Xinyao Yu, Xinhang Song, Xiaohan Wang, Shuqiang Jiang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 16414-16425

Abstract


The Object Goal Navigation (ObjectNav) task requires an agent to navigate to a specified target in an unseen environment. Since the environment layout is unknown, the agent needs to infer the unobserved contextual objects from partial observations, thereby deducing the likely location of the target. Previous end-to-end RL methods capture contextual relationships through implicit representations, but they lack a notion of geometry. Alternatively, modular methods construct local maps that record the observed geometric structure of the unseen environment; however, their lack of reasoning about contextual relations limits exploration efficiency. In this work, we propose the self-supervised generative map (SGM), a modular method that learns explicit contextual relations via self-supervised learning. The SGM is trained to leverage both episodic observations and general knowledge to reconstruct the masked pixels of a cropped global map. During navigation, the agent maintains an incomplete local semantic map, while the unknown regions of the local map are generated by the pre-trained SGM. Based on the generated map, the agent sets the predicted location of the target as the goal and moves towards it. Experiments on Gibson, MP3D, and HM3D show the effectiveness of our method.
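
The two stages described in the abstract (masking a cropped map for self-supervised reconstruction, then completing the local map and picking a goal) can be illustrated with a toy sketch. This is not the authors' implementation: the function names `mask_map`, `fill_unknown`, and `select_goal` are hypothetical, the map is a small grid of target-category scores, and the "generated" map is a hand-written stand-in for the output of a trained generative model.

```python
import random


def mask_map(semantic_map, mask_ratio, rng):
    """Hide a random fraction of cells (training-time masking).

    Hidden cells are marked with None. Returns the masked map and the
    set of hidden (row, col) positions a model would be asked to reconstruct.
    """
    h, w = len(semantic_map), len(semantic_map[0])
    cells = [(r, c) for r in range(h) for c in range(w)]
    hidden = set(rng.sample(cells, int(mask_ratio * len(cells))))
    masked = [
        [None if (r, c) in hidden else semantic_map[r][c] for c in range(w)]
        for r in range(h)
    ]
    return masked, hidden


def fill_unknown(local_map, generated_map):
    """Keep observed cells; use the generated value where the cell is unknown."""
    return [
        [gen if obs is None else obs for obs, gen in zip(obs_row, gen_row)]
        for obs_row, gen_row in zip(local_map, generated_map)
    ]


def select_goal(target_scores):
    """Return the (row, col) cell with the highest target score."""
    scored = (
        (score, (r, c))
        for r, row in enumerate(target_scores)
        for c, score in enumerate(row)
    )
    return max(scored, key=lambda item: item[0])[1]


# Assumed toy data: scores for the target category, None = not yet observed.
local_map = [[0.0, 0.1, None],
             [0.0, None, None]]
# Stand-in for the generative model's prediction over the whole crop.
generated_map = [[0.2, 0.3, 0.4],
                 [0.1, 0.9, 0.5]]

completed_map = fill_unknown(local_map, generated_map)
goal = select_goal(completed_map)
```

In this sketch the observed cells always override the generated ones, so the goal only moves into unobserved space when the prediction there scores higher than anything already seen. How the actual method combines observed and generated regions is not specified in the abstract.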

Related Material


[bibtex]
@InProceedings{Zhang_2024_CVPR,
    author    = {Zhang, Sixian and Yu, Xinyao and Song, Xinhang and Wang, Xiaohan and Jiang, Shuqiang},
    title     = {Imagine Before Go: Self-Supervised Generative Map for Object Goal Navigation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {16414-16425}
}