@InProceedings{Liang_2025_ICCV,
  author    = {Liang, Jiazhao and Fang, Yi},
  title     = {CrossCompanion: An Empathetic Real-Time Assistant Supporting Street Crossing for Low-Vision Users},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
  month     = {October},
  year      = {2025},
  pages     = {2466-2475}
}
CrossCompanion: An Empathetic Real-Time Assistant Supporting Street Crossing for Low-Vision Users
Abstract
With the rapid advancement of Vision Language Models (VLMs), new opportunities have emerged for assistive technologies. Yet, existing models often lack the empathy and clarity required to support individuals who are blind or have low vision in real-world scenarios. Current solutions are frequently expensive, difficult to deploy, or fail to deliver actionable guidance. More critically, these systems often neglect the emotional and psychological dimensions of vision loss, providing minimal empathetic interaction or support for mental well-being. This paper presents CrossCompanion, an empathy-based assistant tailored for safe street crossing, explicitly designed to address the needs of low-vision users through compassionate, real-time assistance. We introduce the Empathy Cross-Safe dataset comprising 7,886 images with detailed environmental annotations for street-level scenarios. Our system integrates real-time speech recognition and text-to-speech synthesis with a lightweight vision-language model, creating a conversational interface that generates emotionally supportive and practically useful guidance. Comprehensive evaluation demonstrates superior performance compared to existing baseline models across both technical accuracy and empathy metrics. The successful deployment on standard mobile hardware validates real-world applicability, helping users make safer crossing decisions with greater confidence and emotional well-being.
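The abstract describes a conversational loop that couples real-time speech recognition, a lightweight vision-language model, and text-to-speech synthesis. A minimal sketch of one such turn is shown below; every function and class here (`transcribe_speech`, `analyze_frame`, `compose_guidance`, `CrossingObservation`) is a hypothetical stand-in, not the authors' implementation, and the stubs return fixed values in place of real ASR and VLM inference.

```python
from dataclasses import dataclass

@dataclass
class CrossingObservation:
    """Environmental cues a street-level VLM might extract from a frame."""
    signal_state: str          # e.g. "walk", "dont_walk", "unknown"
    vehicles_approaching: bool

def transcribe_speech(audio: bytes) -> str:
    """Stub ASR: a real system would run a streaming recognizer here."""
    return "Is it safe to cross now?"

def analyze_frame(frame: bytes) -> CrossingObservation:
    """Stub perception: a real system would query the vision-language model."""
    return CrossingObservation(signal_state="walk", vehicles_approaching=False)

def compose_guidance(question: str, obs: CrossingObservation) -> str:
    """Turn perception into empathetic, actionable guidance."""
    if obs.signal_state == "walk" and not obs.vehicles_approaching:
        return ("You're doing great. The walk signal is on and I don't see "
                "approaching vehicles, so it is safe to cross now.")
    return ("Let's wait together for a moment. The signal isn't showing "
            "walk yet, or traffic is still moving.")

def assist(audio: bytes, frame: bytes) -> str:
    """One conversational turn: listen, look, then respond (TTS omitted)."""
    question = transcribe_speech(audio)
    observation = analyze_frame(frame)
    return compose_guidance(question, observation)

print(assist(b"", b""))
```

The design choice worth noting is that guidance generation is separated from perception, so the empathetic phrasing can be tuned independently of the model that reads the scene.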