UniPAD: A Universal Pre-training Paradigm for Autonomous Driving

Honghui Yang, Sha Zhang, Di Huang, Xiaoyang Wu, Haoyi Zhu, Tong He, Shixiang Tang, Hengshuang Zhao, Qibo Qiu, Binbin Lin, Xiaofei He, Wanli Ouyang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 15238-15250

Abstract


In the context of autonomous driving the significance of effective feature learning is widely acknowledged. While conventional 3D self-supervised pre-training methods have shown widespread success most methods follow the ideas originally designed for 2D images. In this paper we present UniPAD a novel self-supervised learning paradigm applying 3D volumetric differentiable rendering. UniPAD implicitly encodes 3D space facilitating the reconstruction of continuous 3D shape structures and the intricate appearance characteristics of their 2D projections. The flexibility of our method enables seamless integration into both 2D and 3D frameworks enabling a more holistic comprehension of the scenes. We manifest the feasibility and effectiveness of UniPAD by conducting extensive experiments on various 3D perception tasks. Our method significantly improves lidar- camera- and lidar-camera-based baseline by 9.1 7.7 and 6.9 NDS respectively. Notably our pre-training pipeline achieves 73.2 NDS for 3D object detection and 79.4 mIoU for 3D semantic segmentation on the nuScenes validation set achieving state-of-the-art results in comparison with previous methods.

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Yang_2024_CVPR, author = {Yang, Honghui and Zhang, Sha and Huang, Di and Wu, Xiaoyang and Zhu, Haoyi and He, Tong and Tang, Shixiang and Zhao, Hengshuang and Qiu, Qibo and Lin, Binbin and He, Xiaofei and Ouyang, Wanli}, title = {UniPAD: A Universal Pre-training Paradigm for Autonomous Driving}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {15238-15250} }