UnO: Unsupervised Occupancy Fields for Perception and Forecasting

Ben Agro, Quinlan Sykora, Sergio Casas, Thomas Gilles, Raquel Urtasun; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 14487-14496

Abstract


Perceiving the world and forecasting its future state is a critical task for self-driving. Supervised approaches leverage annotated object labels to learn a model of the world, traditionally with object detections and trajectory predictions, or temporal bird's-eye-view (BEV) occupancy fields. However, these annotations are expensive and typically limited to a set of predefined categories that do not cover everything we might encounter on the road. Instead, we learn to perceive and forecast a continuous 4D (spatio-temporal) occupancy field with self-supervision from LiDAR data. This unsupervised world model can be easily and effectively transferred to downstream tasks. We tackle point cloud forecasting by adding a lightweight learned renderer and achieve state-of-the-art performance on Argoverse 2, nuScenes, and KITTI. To further showcase its transferability, we fine-tune our model for BEV semantic occupancy forecasting and show that it outperforms the fully supervised state-of-the-art, especially when labeled data is scarce. Finally, when compared to the prior state-of-the-art on spatio-temporal geometric occupancy prediction, our 4D world model achieves a much higher recall of objects from classes relevant to self-driving.
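
The central idea, a continuous occupancy field queried at arbitrary (x, y, z, t) points and supervised by LiDAR ray geometry, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `OccupancyField` module, the per-query feature tensor `feats`, and the `ray_supervision` helper are hypothetical stand-ins. The self-supervision rule is the geometric one the abstract implies: points sampled between the sensor and a LiDAR return are labeled free, and a point at the return is labeled occupied.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OccupancyField(nn.Module):
    """Hypothetical minimal decoder: maps a continuous (x, y, z, t) query,
    concatenated with a scene feature vector, to an occupancy logit."""
    def __init__(self, feat_dim=128, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, queries, feats):
        # queries: (N, 4) continuous (x, y, z, t); feats: (N, feat_dim)
        return self.mlp(torch.cat([queries, feats], dim=-1)).squeeze(-1)

def ray_supervision(origins, endpoints, t, n_free=8, eps=0.1):
    """Self-supervised labels from LiDAR rays observed at time t:
    samples along each ray short of the return are free (label 0),
    and the return point itself is occupied (label 1)."""
    N = origins.shape[0]
    # Uniform samples in [0, 1 - eps) along each ray, stopping short of the hit.
    u = torch.rand(N, n_free, device=origins.device) * (1.0 - eps)
    free = origins[:, None] + u[..., None] * (endpoints - origins)[:, None]  # (N, n_free, 3)
    occ = endpoints[:, None]                                                 # (N, 1, 3)
    pts = torch.cat([free, occ], dim=1).reshape(-1, 3)
    labels = torch.cat([torch.zeros(N, n_free, device=origins.device),
                        torch.ones(N, 1, device=origins.device)], dim=1).reshape(-1)
    # Append the timestamp to form 4D (x, y, z, t) queries.
    queries = torch.cat([pts, torch.full_like(pts[:, :1], t)], dim=-1)
    return queries, labels

# Usage sketch: one training step on a batch of rays.
field = OccupancyField()
origins, endpoints = torch.zeros(32, 3), torch.randn(32, 3) * 10  # placeholder rays
queries, labels = ray_supervision(origins, endpoints, t=0.5)
feats = torch.randn(queries.shape[0], 128)  # assumed per-query scene features
loss = F.binary_cross_entropy_with_logits(field(queries, feats), labels)
```

Because the queries are continuous rather than voxelized, the same trained field can be sampled on whatever grid a downstream task needs, which is what makes the transfer to point cloud forecasting and BEV semantic occupancy straightforward.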

Related Material


@InProceedings{Agro_2024_CVPR,
  author    = {Agro, Ben and Sykora, Quinlan and Casas, Sergio and Gilles, Thomas and Urtasun, Raquel},
  title     = {UnO: Unsupervised Occupancy Fields for Perception and Forecasting},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2024},
  pages     = {14487-14496}
}