@InProceedings{Xu_2026_CVPR,
    author    = {Xu, Zirui and Yang, Biao and Ni, Rongrong and Zhou, Zhongkai and Shen, Shaobo},
    title     = {W2W: Language-Model-Based Trajectory Prediction with Reinforcement Learning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {23538-23548}
}
W2W: Language-Model-Based Trajectory Prediction with Reinforcement Learning
Abstract
Pedestrian trajectory prediction is crucial for applications such as autonomous driving and social robots. Recently, language model (LM)-based trajectory prediction has offered both prediction accuracy and interpretability. However, the L2 loss commonly used in trajectory prediction cannot be directly applied to LM optimization, resulting in degraded prediction performance. Moreover, current LM-based trajectory prediction methods lack explicit expressions of social interactions, and their scene descriptions are overly simplistic, making it challenging to impose practical scene constraints. To address these issues, we propose Write-to-Walk (W2W). First, we convert observed trajectories and interaction cues (companion/following/obstacle) into parsable textual prompts, so that interaction semantics are expressed more explicitly in the model input. Next, a T5-Small backbone is trained in two stages: (1) full-parameter supervised fine-tuning with cross-entropy loss for language learning, which enables formatted question answering; (2) reinforcement learning (RL), in which a reward function combining the average displacement error (ADE) with off-road penalties strengthens scene constraints, producing future trajectories that are consistent with the scene context and further improving prediction accuracy. Experiments on benchmark datasets (ETH/UCY and SDD) demonstrate that W2W remains competitive with recent LM-based prediction methods and strong trajectory prediction baselines on ADE/FDE. The interpretability of LMs further supports W2W's deployment in safety-critical domains such as autonomous driving.
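
The abstract does not give the exact form of the RL reward, so the following is only a minimal sketch of how a reward combining ADE with an off-road penalty might look. Every name here (`ade`, `off_road_fraction`, `reward`), the boolean grid used as a walkable-area mask, the `cell_size` parameter, and the `off_road_weight` coefficient are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def ade(pred: Sequence[Point], truth: Sequence[Point]) -> float:
    """Average displacement error: mean Euclidean distance per time step."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and equally long")
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(pred)


def off_road_fraction(
    pred: Sequence[Point],
    walkable: List[List[bool]],
    cell_size: float = 1.0,
) -> float:
    """Fraction of predicted points that fall outside the walkable mask.

    `walkable[row][col]` is a hypothetical occupancy grid; points outside
    the grid bounds are counted as off-road.
    """
    off = 0
    for x, y in pred:
        col = math.floor(x / cell_size)
        row = math.floor(y / cell_size)
        inside = 0 <= row < len(walkable) and 0 <= col < len(walkable[row])
        if not (inside and walkable[row][col]):
            off += 1
    return off / len(pred)


def reward(
    pred: Sequence[Point],
    truth: Sequence[Point],
    walkable: List[List[bool]],
    off_road_weight: float = 1.0,  # assumed trade-off coefficient
    cell_size: float = 1.0,
) -> float:
    """Higher is better: penalize both displacement error and off-road steps."""
    penalty = off_road_weight * off_road_fraction(pred, walkable, cell_size)
    return -ade(pred, truth) - penalty
```

Under these assumptions, a prediction that matches the ground truth and stays on walkable cells receives the maximum reward of zero, while each off-road step lowers the reward in addition to its displacement error.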