@InProceedings{Yang_2025_CVPR,
  author    = {Yang, Siwei and Wang, Zeyu and Ortiz, Diego and Burbano, Luis and Kantarcioglu, Murat and Cardenas, Alvaro and Xie, Cihang},
  title     = {Probing Vulnerabilities of Vision-LiDAR Based Autonomous Driving Systems},
  booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops},
  month     = {June},
  year      = {2025},
  pages     = {3561-3569}
}
Probing Vulnerabilities of Vision-LiDAR Based Autonomous Driving Systems
Abstract
Autonomous driving systems rely on advanced perception models to interpret their surroundings and make real-time driving decisions. Among these, Bird's Eye View (BEV) perception has emerged as a critical component, offering a unified 3D representation from multi-camera and sensor inputs. At the same time, the security vulnerabilities of BEV-based models are only beginning to be examined within adversarial machine learning research. This study provides a preliminary security analysis of BEV perception models, focusing on adversarial attacks across modalities, including both visual signals from cameras and point-cloud signals from LiDAR. Specifically, we examine the vulnerabilities of state-of-the-art models--including BEVDet, BEVDet4D, DAL, and BEVFormer--to different forms of adversarial attacks. Beyond the white-box setup, we also evaluate the transferability of these attacks to black-box models. Our findings reveal that, although multi-modal inputs significantly improve BEV models' detection performance, they also introduce new channels for adversarial attacks and hence increase vulnerability. As long as the adversarial attack covers all the modalities a model takes in--e.g., adversarial perturbation is added to both vision and LiDAR signals for a vision-LiDAR model--the attack achieves an almost complete success rate. Moreover, we show that the designed attacks transfer across entirely different BEV architectures. For example, adversarial input crafted against DAL, a CNN-based model, still transfers to the transformer-based BEVFusion and significantly degrades its performance.
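The abstract does not spell out the attack construction, but the core idea of perturbing every input modality at once can be illustrated with a standard FGSM-style gradient step (Goodfellow et al.) applied jointly to a camera input and a LiDAR input. The sketch below uses a toy linear "detection score" in place of a real BEV detector, so `w_cam`/`w_lidar`, the epsilon budgets, and the score function are all illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def score(cam, lidar, w_cam, w_lidar):
    # Toy stand-in for a detector's confidence: a linear score over both
    # modalities. A real attack would differentiate through a BEV model.
    return float(w_cam.ravel() @ cam.ravel() + w_lidar.ravel() @ lidar.ravel())

def fgsm_multimodal(cam, lidar, w_cam, w_lidar, eps_cam=0.03, eps_lidar=0.1):
    """One FGSM-style step applied to BOTH modalities at once, mirroring
    the paper's observation that attacks covering every input channel of a
    vision-LiDAR model are the most effective. Illustrative only."""
    # For a linear score, the gradient w.r.t. each input is just the weights.
    grad_cam, grad_lidar = w_cam, w_lidar
    # Descend the score (suppress detections) within a per-modality L-inf budget;
    # camera pixels additionally stay in their valid [0, 1] range.
    adv_cam = np.clip(cam - eps_cam * np.sign(grad_cam), 0.0, 1.0)
    adv_lidar = lidar - eps_lidar * np.sign(grad_lidar)
    return adv_cam, adv_lidar

# Example: the joint perturbation strictly lowers the toy detection score.
rng = np.random.default_rng(0)
cam = rng.uniform(0.2, 0.8, (4, 4))       # fake camera patch in [0, 1]
lidar = rng.normal(size=(8, 3))           # fake point cloud (x, y, z)
w_cam = rng.normal(size=(4, 4))
w_lidar = rng.normal(size=(8, 3))
adv_cam, adv_lidar = fgsm_multimodal(cam, lidar, w_cam, w_lidar)
assert score(adv_cam, adv_lidar, w_cam, w_lidar) < score(cam, lidar, w_cam, w_lidar)
```

Attacking only one modality leaves the other's contribution to the score intact, which is the toy analogue of the paper's finding that near-complete attack success requires perturbing all modalities a model consumes.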