Vision-Language Pseudo-Labels for Single-Positive Multi-Label Learning

Xin Xing, Zhexiao Xiong, Abby Stylianou, Srikumar Sastry, Liyu Gong, Nathan Jacobs; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 7799-7808

Abstract


We study a limited-label problem and present a novel approach to Single-Positive Multi-Label Learning. In the multi-label learning setting, a model learns to predict multiple labels or categories for a single input image. This contrasts with standard multi-class image classification, where the task is to predict a single label, out of many possible labels, for an image. Single-Positive Multi-Label Learning specifically considers learning to predict multiple labels when the training data contain only one positive annotation per image. Multi-label learning is a more natural task than single-label learning because real-world data often involve instances that belong to multiple categories simultaneously; however, most computer vision datasets contain single labels because collecting multiple high-quality annotations per image is inherently complex and costly. We propose a novel approach called Vision-Language Pseudo-Labeling (VLPL), which uses the vision-language model CLIP to suggest strong positive and negative pseudo-labels. Experimental results demonstrate the effectiveness of the proposed model. Our code and data will be made publicly available at https://github.com/mvrl/VLPL.
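To make the pseudo-labeling idea concrete, here is a minimal sketch; it is not the authors' implementation. It assumes CLIP's image-to-text similarity logits for one prompt per class have already been computed, converts them to a softmax distribution, and marks classes above a hypothetical positive threshold as positive, classes below a hypothetical negative threshold as negative, and everything else as unknown; the single annotated positive is always kept. The helper names (`make_pseudo_labels`, `masked_bce`), the threshold values, and the use of a masked binary cross-entropy loss are illustrative assumptions.

```python
import math
from typing import List, Optional


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_pseudo_labels(
    clip_logits: List[float],
    observed_positive: int,
    pos_thresh: float = 0.3,   # hypothetical value, not from the paper
    neg_thresh: float = 0.05,  # hypothetical value, not from the paper
) -> List[Optional[int]]:
    """Hypothetical helper: turn image-to-text logits into pseudo-labels.

    Returns 1 (positive), 0 (negative), or None (unknown, ignored in the loss).
    """
    if not 0.0 <= neg_thresh < pos_thresh <= 1.0:
        raise ValueError("expected 0 <= neg_thresh < pos_thresh <= 1")
    probs = softmax(clip_logits)
    labels: List[Optional[int]] = []
    for p in probs:
        if p >= pos_thresh:
            labels.append(1)      # confident positive pseudo-label
        elif p <= neg_thresh:
            labels.append(0)      # confident negative pseudo-label
        else:
            labels.append(None)   # uncertain: leave unlabeled
    labels[observed_positive] = 1  # the single annotated positive always wins
    return labels


def masked_bce(pred_probs: List[float], targets: List[Optional[int]],
               eps: float = 1e-7) -> float:
    """Binary cross-entropy averaged over classes whose target is not None."""
    terms = []
    for p, t in zip(pred_probs, targets):
        if t is None:
            continue
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        terms.append(-(t * math.log(p) + (1 - t) * math.log(1.0 - p)))
    return sum(terms) / len(terms) if terms else 0.0


if __name__ == "__main__":
    classes = ["dog", "cat", "ball", "car"]
    logits = [2.0, 0.0, 1.5, -2.0]  # made-up CLIP logits for one image
    labels = make_pseudo_labels(logits, observed_positive=0)
    print(dict(zip(classes, labels)))
    print(masked_bce([0.9, 0.5, 0.8, 0.1], labels))
```

With these made-up logits, `dog` (the observed positive) and `ball` become positives, `car` becomes a negative, and `cat` is left unknown and excluded from the loss. Ignoring uncertain classes, rather than treating every unobserved label as negative, is one way to avoid penalizing positives that were simply never annotated.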

Related Material


@InProceedings{Xing_2024_CVPR,
    author    = {Xing, Xin and Xiong, Zhexiao and Stylianou, Abby and Sastry, Srikumar and Gong, Liyu and Jacobs, Nathan},
    title     = {Vision-Language Pseudo-Labels for Single-Positive Multi-Label Learning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {7799-7808}
}