Prompting Foundational Models for Omni-supervised Instance Segmentation

Arnav M. Das, Ritwick Chaudhry, Kaustav Kundu, Davide Modolo; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 1583-1592

Abstract


Pixel-level mask annotation costs are a major bottleneck in training deep neural networks for instance segmentation. Recent promptable foundation models like the Segment Anything Model (SAM) and GroundedDINO (GDino) have shown impressive zero-shot performance on segmentation and object detection benchmarks. These models, while not capable of performing inference without some form of human supervision (prompting), are ideal for omni-supervised learning, where weak labels are used to derive supervisory signals for complex tasks. In our work, we use SAM and GDino as teacher models and prompt them with weak annotations to create high-quality pseudo-masks. These pseudo-masks are then used to train student instance segmentation models. We explore various weak annotations, such as bounding boxes, points, and image-level class labels, and show that a student model can achieve roughly 95% of a fully-supervised model's performance while reducing annotation costs by 7x. Our approach reduces the annotation cost of training instance segmentation models, making the task accessible to a wider range of applications.
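The teacher step described above, prompting SAM with a weak annotation to obtain a pseudo-mask, can be illustrated with a minimal sketch using the public segment-anything API. The checkpoint path, image file, and box coordinates below are placeholders for illustration, not details taken from the paper.

```python
# Minimal sketch: prompt SAM with a weak box annotation to produce a
# pseudo-mask that could serve as a training target for a student model.
# Checkpoint path, image file, and box coordinates are hypothetical.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

# Load a SAM teacher model (placeholder checkpoint path).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
predictor = SamPredictor(sam)

# Read an image and convert BGR (OpenCV default) to RGB, as SAM expects.
image = cv2.cvtColor(cv2.imread("image.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A weak box annotation in (x0, y0, x1, y1) format; a point prompt would
# instead use the point_coords/point_labels arguments of predict().
box = np.array([50, 60, 320, 400])
masks, scores, _ = predictor.predict(box=box, multimask_output=False)

# Binary pseudo-mask for this instance, usable as student supervision.
pseudo_mask = masks[0]
```

For image-level class labels, a grounded detector such as GDino could first produce boxes from the class name, which would then be fed to SAM as box prompts in the same way.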

Related Material


[pdf]
[bibtex]
@InProceedings{Das_2024_CVPR,
    author    = {Das, Arnav M. and Chaudhry, Ritwick and Kundu, Kaustav and Modolo, Davide},
    title     = {Prompting Foundational Models for Omni-supervised Instance Segmentation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {1583-1592}
}