LangCoop: Collaborative Driving with Language
Abstract
Multi-agent collaboration holds great promise for enhancing the safety, reliability, and mobility of autonomous driving systems by enabling information sharing among multiple connected agents. However, existing multi-agent communication approaches are hindered by the limitations of their communication media, including high bandwidth demands, agent heterogeneity, and information loss. To address these challenges, we introduce LangCoop, a new paradigm for collaborative autonomous driving that leverages natural language as a compact yet expressive medium for inter-agent communication. LangCoop features two key innovations: Mixture Model Modular Chain-of-thought (M^3CoT) for structured zero-shot vision-language reasoning and Natural Language Information Packaging (LangPack) for efficiently packaging information into concise, language-based messages. Through extensive experiments conducted in the CARLA simulator, we demonstrate that LangCoop achieves a remarkable 96% reduction in communication bandwidth (<2 KB per message) compared to image-based communication, while maintaining competitive driving performance in closed-loop evaluation. Our comprehensive evaluation across various driving signals, prompting strategies, and model architectures illustrates LangCoop's capability to encode diverse perceptual and decision-making information, establishing natural language as a powerful, model-agnostic, and low-bandwidth paradigm for multi-agent collaborative systems.
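To make the bandwidth claim concrete, here is a minimal sketch of the LangPack idea: an agent serializes its perception and intent into a short natural-language message rather than transmitting raw sensor data. The message schema, field names, and `langpack_message` function below are illustrative assumptions, not the paper's actual format.

```python
# A minimal sketch (not the paper's actual message schema) of the LangPack idea:
# an agent packages perception and intent into a compact natural-language message
# instead of transmitting raw sensor data. All field names here are hypothetical.

def langpack_message(agent_id: str, observations: dict) -> bytes:
    """Serialize an agent's state into a concise natural-language message."""
    text = (
        f"Agent {agent_id}: I am at {observations['position']}, "
        f"heading {observations['heading_deg']} deg at {observations['speed_mps']} m/s. "
        f"I see: {'; '.join(observations['detections'])}. "
        f"My plan: {observations['intent']}."
    )
    return text.encode("utf-8")

msg = langpack_message(
    "ego_01",
    {
        "position": "intersection of Main and 3rd, northbound lane",
        "heading_deg": 92,
        "speed_mps": 8.5,
        "detections": [
            "pedestrian crossing from the east sidewalk",
            "occluded vehicle likely behind the bus on my right",
        ],
        "intent": "slow to 4 m/s and yield, then proceed straight",
    },
)

# A message like this is a few hundred bytes, versus hundreds of kilobytes
# for a single camera frame -- consistent with the paper's reported
# <2 KB per message.
print(len(msg), "bytes")  # e.g. ~300 bytes
```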
Related Material
[pdf] [arXiv] [bibtex]
@InProceedings{Gao_2025_CVPR,
  author    = {Gao, Xiangbo and Wu, Yuheng and Wang, Rujia and Liu, Chenxi and Zhou, Yang and Tu, Zhengzhong},
  title     = {LangCoop: Collaborative Driving with Language},
  booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops},
  month     = {June},
  year      = {2025},
  pages     = {4226-4237}
}