S3PT: Scene Semantics and Structure Guided Clustering to Boost Self-Supervised Pre-Training for Autonomous Driving

Maciej K. Wozniak, Hariprasath Govindarajan, Marvin Klingner, Camille Maurice, B Ravi Kiran, Senthil Yogamani; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 1660-1670

Abstract


Recent self-supervised clustering-based pre-training techniques like DINO and CriBo have shown impressive results for downstream detection and segmentation tasks. However, real-world applications such as autonomous driving face challenges with imbalanced object class and size distributions and complex scene geometries. In this paper, we propose S3PT, a novel scene semantics and structure guided clustering to provide more scene-consistent objectives for self-supervised training. Specifically, our contributions are threefold. First, we incorporate semantic distribution consistent clustering to encourage better representation of rare classes such as motorcycles or animals. Second, we introduce object diversity consistent spatial clustering to handle imbalanced and diverse object sizes, ranging from large background areas to small objects such as pedestrians and traffic signs. Third, we propose a depth-guided spatial clustering to regularize learning based on geometric information of the scene, thus further refining region separation on the feature level. Our learned representations significantly improve performance in downstream semantic segmentation and 3D object detection tasks on the nuScenes, nuImages, and Cityscapes datasets and show promising domain translation properties.

Related Material


@InProceedings{Wozniak_2025_WACV,
    author    = {Wozniak, Maciej K. and Govindarajan, Hariprasath and Klingner, Marvin and Maurice, Camille and Kiran, B Ravi and Yogamani, Senthil},
    title     = {S3PT: Scene Semantics and Structure Guided Clustering to Boost Self-Supervised Pre-Training for Autonomous Driving},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {1660-1670}
}