ENTL: Embodied Navigation Trajectory Learner

Klemen Kotar, Aaron Walsman, Roozbeh Mottaghi; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 10863-10872

Abstract


We propose Embodied Navigation Trajectory Learner (ENTL), a method for extracting long-sequence representations for embodied navigation. Our approach unifies world modeling, localization, and imitation learning into a single sequence prediction task. We train our model using vector-quantized predictions of future states conditioned on current states and actions. ENTL's generic architecture enables the sharing of the spatio-temporal sequence encoder across multiple challenging embodied tasks. We achieve competitive performance on navigation tasks using significantly less data than strong baselines while performing auxiliary tasks such as localization and future frame prediction (a proxy for world modeling). A key property of our approach is that the model is pre-trained without any explicit reward signal, which makes the resulting model generalizable to multiple tasks and environments.
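
As a rough illustration of what "vector-quantized predictions of future states" can mean in practice, the sketch below maps continuous frame features to discrete codebook indices and scores predicted logits against those indices with cross-entropy. This is not the authors' implementation: the abstract does not specify the encoder, the codebook, or the loss, so the function names (`quantize`, `next_token_loss`), the codebook size, and the feature dimension are all assumptions chosen for illustration, and the random arrays stand in for real encoder outputs.

```python
import numpy as np

def quantize(features, codebook):
    """Return, for each feature vector, the index of its nearest codebook entry."""
    # Squared Euclidean distance between every feature and every code: shape (N, K)
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def next_token_loss(logits, targets):
    """Mean cross-entropy between logits of shape (N, K) and target ids of shape (N,)."""
    # Subtract the row-wise max for a numerically stable log-softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))         # hypothetical: 8 codes of dimension 4
future_features = rng.normal(size=(5, 4))  # hypothetical features of 5 future frames

# Discrete targets that a sequence model would be trained to predict
targets = quantize(future_features, codebook)

# Placeholder for a sequence model's output given current states and actions
uniform_logits = np.zeros((5, 8))
loss = next_token_loss(uniform_logits, targets)
```

With uniform logits the loss equals the log of the codebook size, which is the chance-level value a trained predictor would be expected to improve on.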

Related Material


[bibtex]
@InProceedings{Kotar_2023_ICCV,
    author    = {Kotar, Klemen and Walsman, Aaron and Mottaghi, Roozbeh},
    title     = {ENTL: Embodied Navigation Trajectory Learner},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {10863-10872}
}