Towards Trustworthy Autonomous Vehicles with Vision-Language Models Under Targeted and Untargeted Adversarial Attacks
Abstract
The integration of autonomous vehicles (AVs) into the transportation sector promises a transformative impact on mobility, safety, and efficiency. Yet, deploying these advanced systems in dynamically complex and unpredictable real-world environments presents substantial challenges, particularly in safeguarding their operational integrity against adversarial attacks. This paper rigorously examines the robustness of Vision-Language Models (VLMs) within AVs, emphasizing their resilience to both targeted and untargeted adversarial threats. We comprehensively evaluate four vision encoders: CLIP, TeCoA, FARE, and Sim-CLIP. These models are assessed for their ability to maintain accurate and reliable performance under adversarial manipulation, using a carefully preprocessed dataset designed to elicit semantically detailed scene descriptions for enhanced caption generation. The paper evaluates these models across various adversarial scenarios, establishing benchmarks for their capability to interpret complex multimodal inputs under subtle adversarial manipulations. The findings reveal notable differences in resilience across the models on various AV-based datasets, with Sim-CLIP outperforming the others in robustness while maintaining high accuracy under adversarial conditions.
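As an illustration of the attack setting described above, the following is a minimal sketch (not the authors' implementation) of an untargeted L-infinity PGD attack against a CLIP image encoder. The model choice (ViT-B/32), the epsilon and step-size values, and the embedding-distance objective are assumptions made for illustration only.

# Minimal sketch, assuming PyTorch and OpenAI's CLIP package are available.
# This is NOT the paper's code; hyperparameters and objective are illustrative.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float().eval()          # avoid fp16 gradient issues on GPU
model.requires_grad_(False)           # we only need gradients w.r.t. the perturbation

# CLIP's standard normalization constants
MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
STD = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

def pgd_untargeted(images, eps=4 / 255, alpha=1 / 255, steps=10):
    """Push adversarial image embeddings away from the clean embeddings.

    images: batch of pixel tensors in [0, 1], shape (B, 3, 224, 224).
    """
    images = images.to(device)
    with torch.no_grad():
        clean_emb = model.encode_image((images - MEAN) / STD)

    delta = torch.empty_like(images).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        adv = (images + delta).clamp(0, 1)
        adv_emb = model.encode_image((adv - MEAN) / STD)
        # Untargeted objective: maximize L2 distance to the clean embedding.
        loss = (adv_emb - clean_emb).norm(dim=-1).mean()
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()   # gradient-sign ascent step
            delta.clamp_(-eps, eps)              # project back into the eps-ball
            delta.grad.zero_()
    return (images + delta).clamp(0, 1).detach()

A targeted variant of the same sketch would instead minimize the distance to the embedding of a chosen target image or caption, steering the encoder toward an attacker-specified scene description rather than merely away from the correct one.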
Related Material
[pdf]
[bibtex]
@InProceedings{Fime_2025_CVPR,
    author    = {Fime, Awal Ahmed and Hossain, Md Zarif and Zaman, Saika and Shahid, Abdur R and Imteaj, Ahmed},
    title     = {Towards Trustworthy Autonomous Vehicles with Vision-Language Models Under Targeted and Untargeted Adversarial Attacks},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {619-628}
}