Visual Point Cloud Forecasting enables Scalable Autonomous Driving

Zetong Yang, Li Chen, Yanan Sun, Hongyang Li; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 14673-14684

Abstract


In contrast to extensive studies on general vision, pre-training for scalable visual autonomous driving remains seldom explored. Visual autonomous driving applications require features encompassing semantics, 3D geometry, and temporal information simultaneously for joint perception, prediction, and planning, posing dramatic challenges for pre-training. To resolve this, we introduce a new pre-training task termed visual point cloud forecasting: predicting future point clouds from historical visual input. The key merit of this task is that it captures the synergic learning of semantics, 3D structures, and temporal dynamics, and it therefore shows superiority in various downstream tasks. To cope with this new problem, we present ViDAR, a general model to pre-train downstream visual encoders. It first extracts historical embeddings with the encoder; these representations are then transformed into 3D geometric space via a novel Latent Rendering operator for future point cloud prediction. Experiments show significant gains in downstream tasks, e.g., 3.1% NDS on 3D detection, 10% error reduction on motion forecasting, and 15% lower collision rate on planning.
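To make the pre-training recipe concrete, below is a minimal, self-contained PyTorch sketch of the loop the abstract describes: encode historical frames, fuse them, and supervise against the future point cloud via a latent 3D prediction. Everything here is an illustrative assumption, not the authors' implementation: the module names (VisualEncoder, LatentRendering), the history length of 4 frames, the per-pixel depth-distribution proxy for Latent Rendering, and the voxelized occupancy target with a BCE loss are all placeholders standing in for the paper's actual architecture and rendering operator.

# Hypothetical sketch of a ViDAR-style pre-training step. All names, shapes,
# and losses are assumptions made for illustration.
import torch
import torch.nn as nn

class VisualEncoder(nn.Module):
    """Stand-in for the downstream image encoder being pre-trained."""
    def __init__(self, in_ch=3, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, imgs):  # imgs: (B, T, C, H, W) historical frames
        b, t = imgs.shape[:2]
        feats = self.net(imgs.flatten(0, 1))           # encode each frame
        return feats.view(b, t, *feats.shape[1:])      # (B, T, D, H', W')

class LatentRendering(nn.Module):
    """Toy proxy for the paper's Latent Rendering operator: maps fused
    latent features to a depth distribution queried along camera rays."""
    def __init__(self, dim=64, depth_bins=32):
        super().__init__()
        self.occ_head = nn.Conv2d(dim, depth_bins, 1)

    def forward(self, fused):                          # (B, D, H', W')
        return self.occ_head(fused).softmax(dim=1)     # (B, Z, H', W')

encoder, renderer = VisualEncoder(), LatentRendering()
fuse = nn.Conv2d(64 * 4, 64, 1)       # fuse 4 history frames (assumed length)

imgs = torch.randn(2, 4, 3, 128, 128)      # historical visual input (toy)
target_occ = torch.rand(2, 32, 32, 32)     # voxelized future point cloud (toy)

hist = encoder(imgs)                       # per-frame embeddings
fused = fuse(hist.flatten(1, 2))           # simplified temporal fusion
pred_occ = renderer(fused)                 # predicted future occupancy
loss = nn.functional.binary_cross_entropy(pred_occ, target_occ)
loss.backward()                            # gradients pre-train the encoder

The point of the sketch is the gradient flow: the forecasting loss on future geometry back-propagates through the rendering proxy into the visual encoder, which is the component later transferred to detection, forecasting, and planning.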

Related Material


@InProceedings{Yang_2024_CVPR,
    author    = {Yang, Zetong and Chen, Li and Sun, Yanan and Li, Hongyang},
    title     = {Visual Point Cloud Forecasting enables Scalable Autonomous Driving},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {14673-14684}
}