Transformer-Based Sensor Fusion for Autonomous Driving: A Survey

Apoorv Singh; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023, pp. 3312-3317

Abstract


Sensor fusion is an essential topic in many perception systems, such as those used in autonomous driving and robotics. According to the dataset leaderboards, the combination of a transformer-based detection head and a CNN-based feature encoder that extracts features from raw sensor data has emerged as one of the top-performing sensor-fusion frameworks for 3D detection. In this work, we provide an in-depth literature survey of recent transformer-based 3D object detection methods, primarily focusing on sensor fusion. We also briefly review the basics of Vision Transformers (ViT) so that readers can easily follow the paper. Moreover, we briefly go through a few non-transformer-based, less dominant methods for sensor fusion in autonomous driving. In conclusion, we summarize the role that transformers play in the domain of sensor fusion and encourage future research in the field.

Related Material


@InProceedings{Singh_2023_ICCV,
    author    = {Singh, Apoorv},
    title     = {Transformer-Based Sensor Fusion for Autonomous Driving: A Survey},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {3312-3317}
}