Leveraging Large-Scale Weakly Labeled Data for Semi-Supervised Mass Detection in Mammograms

Tang, Yuxing; Cao, Zhenjie; Zhang, Yanbo; Yang, Zhicheng; Ji, Zongcheng; Wang, Yiwei; Han, Mei; Ma, Jie; Xiao, Jing; Chang, Peng

Yuxing Tang, Zhenjie Cao, Yanbo Zhang, Zhicheng Yang, Zongcheng Ji, Yiwei Wang, Mei Han, Jie Ma, Jing Xiao, Peng Chang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 3855-3864

Abstract

Mammographic mass detection is an integral part of a computer-aided diagnosis system. Annotating a large number of mammograms at the pixel level to train a fully supervised mass detection model is costly and time-consuming. This paper presents a novel self-training framework for semi-supervised mass detection with soft image-level labels generated from diagnosis reports by Mammo-RoBERTa, a RoBERTa-based natural language processing model fine-tuned on the fully labeled data and associated mammography reports. Starting with a fully supervised model trained on the data with pixel-level masks, the proposed framework iteratively refines the model using the entire weakly labeled dataset (image-level soft labels) in a self-training fashion. A novel sample selection strategy is proposed to identify the most informative samples for each iteration, based on the current model output and the soft labels of the weakly labeled data. A soft cross-entropy loss and a soft focal loss are also designed to serve as the image-level and pixel-level classification losses, respectively. Our experimental results show that the proposed semi-supervised framework improves mass detection accuracy over the supervised baseline and outperforms previous state-of-the-art semi-supervised approaches that use weakly labeled data, in some cases by a large margin.
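The abstract names a soft cross-entropy loss and a soft focal loss but does not give their formulas. The sketch below is one plausible reading and should not be taken as the authors' exact formulation: the image-level loss is ordinary cross-entropy against a soft target distribution, and the pixel-level loss is assumed to take the form |q - p|^gamma * BCE(p, q), which reduces to the standard focal loss (without alpha weighting) when the target q is 0 or 1. The function names, the `gamma` default, and the NumPy implementation are illustrative choices.

```python
import numpy as np


def soft_cross_entropy(probs, soft_targets, eps=1e-12):
    """Mean cross-entropy between predicted probabilities and soft targets.

    Both inputs have shape (batch, classes); each row is a distribution.
    """
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    soft_targets = np.asarray(soft_targets, dtype=float)
    per_sample = -np.sum(soft_targets * np.log(probs), axis=1)
    return float(np.mean(per_sample))


def soft_binary_focal_loss(p, q, gamma=2.0, eps=1e-12):
    """Focal-style binary loss with soft targets q in [0, 1].

    Assumed form (not taken from the paper): |q - p|**gamma * BCE(p, q).
    Predictions already close to their soft target are down-weighted.
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    q = np.asarray(q, dtype=float)
    bce = -(q * np.log(p) + (1.0 - q) * np.log(1.0 - p))
    return float(np.mean(np.abs(q - p) ** gamma * bce))


if __name__ == "__main__":
    # Image-level example: two images, two classes (no mass, mass).
    image_probs = [[0.7, 0.3], [0.2, 0.8]]
    image_soft_labels = [[0.9, 0.1], [0.25, 0.75]]
    print(soft_cross_entropy(image_probs, image_soft_labels))

    # Pixel-level example: four pixels with soft mass probabilities.
    pixel_probs = [0.1, 0.6, 0.9, 0.4]
    pixel_soft_labels = [0.0, 0.8, 1.0, 0.3]
    print(soft_binary_focal_loss(pixel_probs, pixel_soft_labels))
```

With hard targets the first function is the usual cross-entropy, so the soft version only changes how much each class contributes per sample. In the second function, `gamma` controls how strongly well-fit pixels are down-weighted; setting it to 0 recovers plain binary cross-entropy with soft targets.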

Related Material

[bibtex]

@InProceedings{Tang_2021_CVPR,
    author    = {Tang, Yuxing and Cao, Zhenjie and Zhang, Yanbo and Yang, Zhicheng and Ji, Zongcheng and Wang, Yiwei and Han, Mei and Ma, Jie and Xiao, Jing and Chang, Peng},
    title     = {Leveraging Large-Scale Weakly Labeled Data for Semi-Supervised Mass Detection in Mammograms},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {3855-3864}
}