Weakly Semi-Supervised Detector-Based Video Classification with Temporal Context for Lung Ultrasound

Li, Gary Y.; Chen, Li; Zahiri, Mohsen; Balaraju, Naveen; Patil, Shubham; Mehanian, Courosh; Gregory, Cynthia; Gregory, Kenton; Raju, Balasundar; Kruecker, Jochen; Chen, Alvin

Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023, pp. 2483-2492

Abstract

For many challenging medical imaging tasks involving sequences, video-level labels alone are insufficient to train accurate disease classification models and do not carry information about the locations of relevant features. Alternatively, localization-based models such as detectors offer much stronger interpretability by indicating areas of suspicion, but require comprehensive frame-by-frame annotations by experts. We propose a method to address the trade-off between annotation burden and interpretability by performing simultaneous detection and classification on medical video sequences while requiring very limited frame-level supervision. Specifically, our approach aggregates individual predictions from a detection model into "tracklets" representing temporally consistent regions of pathology along the sequence. The tracklets are classified in a second stage to arrive at an overall video-level prediction. Both the detector and tracklet classifier are trained in a weakly semi-supervised manner using a large amount of video-annotated data alongside a limited set of frame annotations. We apply the approach to several challenging medical imaging tasks, namely localizing and predicting the presence or absence of lung consolidation and pleural effusion in ultrasound videos. We show that, with only a very small amount of additional frame-annotated data, the method provides strong model interpretability through localization and achieves state-of-the-art detection and classification, outperforming both direct video classifiers and comparable frame-based detectors trained without the added temporal context.
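The two-stage idea in the abstract (link per-frame detections into tracklets, then score the tracklets to obtain a video-level prediction) can be illustrated with a small sketch. The abstract does not say how detections are linked or how tracklets are classified, so everything below is an assumption made for illustration: greedy IoU linking between consecutive frames, the hypothetical thresholds `iou_thresh`, `max_gap`, `min_len`, and `threshold`, and a simple mean-confidence rule standing in for the learned tracklet classifier. It is not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Detection:
    frame: int    # frame index within the video
    box: Box      # bounding box predicted by the frame-level detector
    score: float  # detector confidence in [0, 1]


@dataclass
class Tracklet:
    detections: List[Detection] = field(default_factory=list)

    @property
    def last(self) -> Detection:
        return self.detections[-1]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def build_tracklets(detections: List[Detection],
                    iou_thresh: float = 0.3,
                    max_gap: int = 1) -> List[Tracklet]:
    """Greedy linking (an assumption, not the paper's method): attach each
    detection to the best-overlapping tracklet that ended in one of the
    previous `max_gap` frames, otherwise start a new tracklet."""
    tracklets: List[Tracklet] = []
    for det in sorted(detections, key=lambda d: (d.frame, -d.score)):
        best, best_iou = None, iou_thresh
        for t in tracklets:
            gap = det.frame - t.last.frame
            # gap >= 1 keeps at most one detection per frame in a tracklet
            if 1 <= gap <= max_gap:
                overlap = iou(t.last.box, det.box)
                if overlap >= best_iou:
                    best, best_iou = t, overlap
        if best is None:
            tracklets.append(Tracklet([det]))
        else:
            best.detections.append(det)
    return tracklets


def tracklet_score(t: Tracklet, min_len: int = 3) -> float:
    """Hypothetical stand-in for the learned tracklet classifier:
    mean detector confidence, with short tracklets scored as 0."""
    if len(t.detections) < min_len:
        return 0.0
    return sum(d.score for d in t.detections) / len(t.detections)


def classify_video(detections: List[Detection],
                   threshold: float = 0.5) -> Tuple[bool, float]:
    """Video-level prediction: maximum tracklet score compared to a threshold."""
    tracklets = build_tracklets(detections)
    video_score = max((tracklet_score(t) for t in tracklets), default=0.0)
    return video_score >= threshold, video_score
```

In this sketch, a region detected consistently across several consecutive frames yields a long tracklet and drives the video-level score, while an isolated high-confidence detection in a single frame is discarded by the `min_len` rule. That is one simple way to express the temporal-consistency intuition behind the approach; the tracklets that survive also indicate where in the video the suspected pathology lies.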

Related Material

@InProceedings{Li_2023_ICCV,
    author    = {Li, Gary Y. and Chen, Li and Zahiri, Mohsen and Balaraju, Naveen and Patil, Shubham and Mehanian, Courosh and Gregory, Cynthia and Gregory, Kenton and Raju, Balasundar and Kruecker, Jochen and Chen, Alvin},
    title     = {Weakly Semi-Supervised Detector-Based Video Classification with Temporal Context for Lung Ultrasound},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {2483-2492}
}