Self-Supervised Pre-Training for Semantic Segmentation in an Indoor Scene

Sulabh Shrestha, Yimeng Li, Jana Košecká; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops, 2024, pp. 625-635

Abstract


The ability to endow 3D models of indoor scenes with semantic information is integral to embodied agents performing tasks such as target-driven navigation, object search, and object rearrangement. We propose RegConsist, a method for environment-specific self-supervised pre-training of a semantic segmentation model that exploits the ability of a mobile robot to move through the environment and register multiple views of it. Using spatial and temporal consistency cues for pixel association, together with a novel and efficient region matching approach, we present a variant of contrastive learning that trains a DCNN to predict semantic segmentation from RGB views in the environment where the agent operates. The approach introduces several strategies for sampling pixel pairs from associated regions in overlapping views, and yields a more robust, better-performing pre-trained model when fine-tuned with a small amount of labeled data. RegConsist outperforms other self-supervised methods that pre-train on single-view images, and achieves competitive performance with models pre-trained for the same task on a different and larger dataset. We also perform ablation studies to analyze and demonstrate the efficacy of the proposed method.
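The cross-view contrastive idea described above can be sketched as an InfoNCE-style loss over pixel embeddings from two registered views, where corresponding pixels (e.g. associated via depth-based reprojection) form positive pairs and all other pixels act as negatives. This is a minimal illustrative sketch under those assumptions, not the authors' implementation; all names and the pairing scheme are hypothetical.

```python
import numpy as np

def cross_view_contrastive_loss(feats_a, feats_b, pairs, temperature=0.1):
    """InfoNCE-style loss over pixel embeddings from two registered views.

    feats_a, feats_b: (N, D) pixel embeddings from view A / view B.
    pairs: list of (i, j) index pairs; pixel i in view A is associated with
    pixel j in view B (e.g. via reprojection between overlapping views).
    Illustrative only; names and signature are not the paper's actual API.
    """
    # Normalize embeddings so dot products become cosine similarities.
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    sim = a @ b.T / temperature  # (N_a, N_b) scaled similarity matrix
    losses = []
    for i, j in pairs:
        # Positive: the associated pixel j; negatives: all other pixels in B.
        log_prob = sim[i, j] - np.log(np.exp(sim[i]).sum())
        losses.append(-log_prob)
    return float(np.mean(losses))

# Toy usage: two identical views with 4 pixel embeddings each, so the
# identity correspondences (k, k) are the positive pairs.
rng = np.random.default_rng(0)
f = rng.normal(size=(4, 8))
loss_same = cross_view_contrastive_loss(f, f, [(k, k) for k in range(4)])
```

In practice such a loss would be applied per region pair produced by the region association step, with pixel pairs sampled from each matched region rather than enumerated exhaustively.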

Related Material


[bibtex]
@InProceedings{Shrestha_2024_WACV,
  author    = {Shrestha, Sulabh and Li, Yimeng and Ko\v{s}eck\'a, Jana},
  title     = {Self-Supervised Pre-Training for Semantic Segmentation in an Indoor Scene},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
  month     = {January},
  year      = {2024},
  pages     = {625-635}
}