R2C: Mapping Room to Chessboard to Unlock LLM As Low-Level Action Planner

Bai, Ziyi; Li, Hanxuan; Fu, Bin; Xiong, Chuyan; Wang, Ruiping; Chen, Xilin

Ziyi Bai, Hanxuan Li, Bin Fu, Chuyan Xiong, Ruiping Wang, Xilin Chen; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 19456-19466

Abstract

This paper explores using large language models (LLMs) as low-level action planners for embodied tasks. While LLMs excel as the robot's "brain" for high-level planning, they struggle to directly control the "body" by generating precise low-level actions. This limitation arises because LLMs are strong at high-level conceptual understanding but handle spatial perception poorly, which restricts their potential in embodied tasks. We bridge this gap by enabling LLMs not only to comprehend complex instructions but also to produce actionable, low-level plans. We introduce Room to Chessboard (R2C), a novel semantic representation that maps environmental states onto a grid-based chessboard, empowering LLMs to generate specific low-level coordinates and guide the robot in a manner akin to playing a game of chess. To further enhance decision-making, we propose the Chain-of-Thought Decision (CoT-D) paradigm, which improves LLMs' interpretability and context-awareness in spatial reasoning. By jointly training LLMs for high-level task decomposition and low-level action generation, we create a unified "brain-body" system that handles complex, free-form instructions while producing precise low-level actions, allowing the robot to flexibly control its movements and adapt to varying tasks. We validate R2C using both fine-tuned open-source LLMs and GPT-4, demonstrating its effectiveness on the challenging ALFRED benchmark. Results show that with our R2C framework, LLMs can effectively act as low-level planners, generalizing across diverse settings and open-vocabulary robotic tasks. The code and demonstrations are available at: https://vipl-vsu.github.io/Room2Chessboard.
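To make the room-to-chessboard idea concrete, the following is a minimal sketch of one way continuous floor positions could be discretized into grid squares and serialized as text for an LLM prompt. It is not the authors' implementation: the `Chessboard` class, the `render_board` helper, the 0.25 m cell size, and the (x, z) floor-plane convention are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Chessboard:
    """Hypothetical grid laid over a rectangular room (illustrative names only)."""
    x_min: float      # room-coordinate origin of the board along x
    z_min: float      # room-coordinate origin of the board along z
    cell_size: float  # side length of one square, in metres (assumed value)
    rows: int
    cols: int

    def to_cell(self, x: float, z: float) -> Tuple[int, int]:
        """Map a floor position to a (row, col) square, clamped to the board."""
        col = int((x - self.x_min) // self.cell_size)
        row = int((z - self.z_min) // self.cell_size)
        row = min(max(row, 0), self.rows - 1)
        col = min(max(col, 0), self.cols - 1)
        return (row, col)

    def to_position(self, row: int, col: int) -> Tuple[float, float]:
        """Return the centre of a square in room coordinates (x, z)."""
        x = self.x_min + (col + 0.5) * self.cell_size
        z = self.z_min + (row + 0.5) * self.cell_size
        return (x, z)


def render_board(board: Chessboard, objects: Dict[str, Tuple[float, float]]) -> str:
    """Serialize object placements as text lines that a prompt could include."""
    lines = []
    for name, (x, z) in sorted(objects.items()):
        row, col = board.to_cell(x, z)
        lines.append(f"{name}: ({row}, {col})")
    return "\n".join(lines)


board = Chessboard(x_min=0.0, z_min=0.0, cell_size=0.25, rows=8, cols=8)
state = render_board(board, {"mug": (0.6, 1.1), "agent": (0.1, 0.1)})
print(state)
```

Under these assumptions, a coordinate returned by the LLM, such as square (4, 2), could be converted back into a metric navigation target with `board.to_position(4, 2)`.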

Related Material

[bibtex]

@InProceedings{Bai_2025_CVPR,
    author    = {Bai, Ziyi and Li, Hanxuan and Fu, Bin and Xiong, Chuyan and Wang, Ruiping and Chen, Xilin},
    title     = {R2C: Mapping Room to Chessboard to Unlock LLM As Low-Level Action Planner},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {19456-19466}
}