- [pdf] [supp] [arXiv]
Context-Aware Planning and Environment-Aware Memory for Instruction Following Embodied Agents
Accomplishing household tasks such as 'bringing a cup of water' requires to plan step-by-step actions by maintaining the knowledge about the spatial arrangement of objects and consequences of previous actions. Perception models of current embodied AI agents, however, often make mistakes due to lack of such knowledge but rely on imperfect learning of imitating agents or an algorithmic planner without the knowledge about the changed environment by the previous actions. To address the issue, we propose the CPEM (Context-aware Planner and Environment-aware Memory) embodied agent to incorporate the contextual information of previous actions for planning and maintaining spatial arrangement of objects with their states (e.g., if an object has been already moved or not) in the environment to the perception model for improving both visual navigation and object interactions. We observe that the proposed model achieves state-of-the-art task success performance in various metrics using a challenging interactive instruction following benchmark both in seen and unseen environments by large margins (up to +10.70% in unseen env.).